Background
Once started, Ollama serves model-inference requests; its command-line interface is similar to Docker's.
You may notice that a model loaded into Ollama seems to disappear after a short while.
In fact, nothing is lost: Ollama unloads models that have been idle to free memory, and because model loading is very fast, the model is simply reloaded on demand when the next request arrives.
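This idle-unload window can be tuned with Ollama's `keep_alive` parameter, accepted by the `/api/generate` and `/api/chat` endpoints: `-1` keeps the model resident indefinitely, `"5m"` (the default) unloads it after five idle minutes, and `0` unloads it immediately. A minimal sketch, assuming an Ollama server at the default `localhost:11434` (the helper names are illustrative, not part of Ollama):

```python
import json
import urllib.request

def keep_alive_payload(model, keep_alive=-1):
    """Build a /api/generate body that only adjusts keep_alive.

    keep_alive=-1 pins the model in memory; "5m" (the default)
    unloads it after five idle minutes; 0 unloads it at once.
    """
    return {"model": model, "keep_alive": keep_alive}

def pin_model(model, host="http://localhost:11434"):
    # An empty generate request with keep_alive=-1 loads the model
    # and keeps it resident until the server is restarted.
    body = json.dumps(keep_alive_payload(model)).encode()
    req = urllib.request.Request(
        host + "/api/generate", data=body,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Calling `pin_model("qwen2.5:latest")` once after startup is enough to stop the "disappearing model" effect, at the cost of the memory staying allocated.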
Code
from flask import Flask, request, Response
from openai import OpenAI

app = Flask(__name__)

# Point the OpenAI client at Ollama's OpenAI-compatible endpoint;
# the api_key is required by the client library but ignored by Ollama.
client = OpenAI(
    base_url="http://172.**.**.35:11434/v1/",
    api_key="ollama"
)

def event_stream(question):
    response = client.chat.completions.create(
        messages=[
            {"role": "system", "content": "You are an AI assistant."},
            {"role": "user", "content": question}
        ],
        model="qwen2.5:latest",
        stream=True
    )
    # Forward each chunk to the front end as a Server-Sent Event
    for chunk in response:
        if chunk.choices[0].finish_reason is not None:
            data = '[Done]'
        else:
            # delta.content can be None on role-only chunks
            data = chunk.choices[0].delta.content or ""
        yield 'data: %s\n\n' % data.replace("\n", "<br/>")

@app.route("/")
def index():
    return "Welcome to WhatsRAG!"

@app.route("/whatsrag/chat/stream", methods=["GET"])
def chat_stream():
    question = request.args.get('question')
    print(question)
    return Response(event_stream(question), content_type='text/event-stream')

if __name__ == "__main__":
    app.run(debug=True, host="0.0.0.0", port=12345)
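On the consuming side, a client reads the stream line by line, strips the `data: ` prefix, and stops at the `[Done]` sentinel emitted by `event_stream` above. A sketch of that parsing logic, shown offline against a canned list of lines (`parse_sse` is an illustrative helper, not a library function):

```python
def parse_sse(lines):
    """Yield payloads from an iterable of SSE lines.

    Stops when the server's '[Done]' sentinel (see event_stream
    above) is seen; blank separator lines are skipped.
    """
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip the blank line that ends each event
        payload = line[len("data: "):]
        if payload == "[Done]":
            return
        yield payload

# Offline usage example with a canned stream:
sample = ["data: Hello", "", "data: world", "", "data: [Done]"]
print(list(parse_sse(sample)))  # → ['Hello', 'world']
```

Against the live server, the same function works on `requests.get(url, stream=True).iter_lines(decode_unicode=True)`.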
Next step: learn how large models are invoked.