Cisco QoS: How Are Conversation Numbers Allocated?

This article describes how a router allocates conversation numbers to implement traffic classification and quality-of-service control. By applying policies such as Weighted Fair Queueing (WFQ), it ensures that different types of network traffic are handled appropriately. The article explains in detail the queue assignments and parameter settings for each class of traffic.

http://www.cisco.com/c/en/us/support/docs/quality-of-service-qos/qos-congestion-avoidance/10107-showpolicy.html

This section explains how your router allocates conversation numbers for the queues created when the service policy is applied.

Router# show policy-map interface s1/0.1 dlci 100 
 Serial1/0.1: DLCI 100 - 
 output : mypolicy 
  Class voice 
   Weighted Fair Queueing 
       Strict Priority 
       Output Queue: Conversation 72 
         Bandwidth 16 (kbps) Packets Matched 0 
        (pkts discards/bytes discards) 0/0 
  Class immediate-data 
   Weighted Fair Queueing 
       Output Queue: Conversation 73 
         Bandwidth 60 (%) Packets Matched 0 
         (pkts discards/bytes discards/tail drops) 0/0/0 
         mean queue depth: 0 
         drops: class  random   tail     min-th   max-th   mark-prob 
                0      0        0        64       128      1/10 
                1      0        0        71       128      1/10 
                2      0        0        78       128      1/10 
                3      0        0        85       128      1/10 
                4      0        0        92       128      1/10 
                5      0        0        99       128      1/10 
                6      0        0        106      128      1/10 
                7      0        0        113      128      1/10 
                rsvp   0        0        120      128      1/10 
  Class priority-data 
   Weighted Fair Queueing 
       Output Queue: Conversation 74 
         Bandwidth 40 (%) Packets Matched 0 Max Threshold 64 (packets) 
         (pkts discards/bytes discards/tail drops) 0/0/0 
  Class class-default 
   Weighted Fair Queueing 
       Flow Based Fair Queueing 
       Maximum Number of Hashed Queues 64  Max Threshold 20 (packets)

The class-default class is the default class to which traffic is directed if that traffic does not satisfy the match criteria of the other classes defined in the policy map. The fair-queue command allows you to specify the number of dynamic queues into which IP flows are sorted and classified. Alternatively, the router allocates a default number of queues derived from the bandwidth of the interface or VC. In either case, the supported values are powers of two in the range 16 to 4096.

These tables list the default values for interfaces and for ATM permanent virtual circuits (PVCs):

Default Number of Dynamic Queues as a Function of Interface Bandwidth

Bandwidth Range                                         Number of Dynamic Queues
Less than or equal to 64 kbps                           16
More than 64 kbps and less than or equal to 128 kbps    32
More than 128 kbps and less than or equal to 256 kbps   64
More than 256 kbps and less than or equal to 512 kbps   128
More than 512 kbps                                      256

Default Number of Dynamic Queues as a Function of ATM PVC Bandwidth

Bandwidth Range                                           Number of Dynamic Queues
Less than or equal to 128 kbps                            16
More than 128 kbps and less than or equal to 512 kbps     32
More than 512 kbps and less than or equal to 2000 kbps    64
More than 2000 kbps and less than or equal to 8000 kbps   128
More than 8000 kbps                                       256
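The bandwidth thresholds in the two tables above can be encoded as a small lookup. The sketch below is illustrative only (the function name is ours, not part of IOS); it simply replays the table rows in order:

```python
def default_dynamic_queues(bandwidth_kbps, atm_pvc=False):
    """Return the default number of WFQ dynamic queues for a given
    bandwidth, per the interface and ATM PVC tables above."""
    if atm_pvc:
        # (upper bandwidth limit in kbps, number of dynamic queues)
        thresholds = [(128, 16), (512, 32), (2000, 64), (8000, 128)]
    else:
        thresholds = [(64, 16), (128, 32), (256, 64), (512, 128)]
    for limit, queues in thresholds:
        if bandwidth_kbps <= limit:
            return queues
    return 256  # above the highest threshold in either table
```

For example, a 256 kbps serial interface defaults to 64 dynamic queues, which matches the sample output earlier: conversations 1 through 64 are the flow-based queues.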

Based on the number of reserved queues for weighted fair queuing, Cisco IOS assigns a conversation or queue number as shown in this table:

Conversation / Queue Number   Type of Traffic
1 - 256                       General flow-based traffic queues. Traffic that does not match to a user-created class will match to class-default and one of the flow-based queues.
257 - 263                     Reserved for Cisco Discovery Protocol (CDP) and for packets marked with an internal high-priority flag.
264                           Reserved queue for the priority class (classes configured with the priority command). Look for the "Strict Priority" value for the class in the show policy-map interface output. The priority queue uses a conversation ID equal to the number of dynamic queues plus eight.
265 and higher                Queues for user-created classes.
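The rule that the priority queue's conversation ID equals the number of dynamic queues plus eight can be checked against the sample output: with 64 dynamic queues, the voice class lands on conversation 72, and the user-created classes follow at 73 and 74. A minimal sketch of the allocation (the helper names are ours, not an IOS API):

```python
def priority_queue_conversation(dynamic_queues):
    """Conversation ID of the strict-priority queue:
    number of dynamic queues plus eight."""
    return dynamic_queues + 8

def user_class_conversation(dynamic_queues, class_index):
    """Conversation ID of the Nth user-created class (1-based),
    allocated immediately after the priority queue."""
    return priority_queue_conversation(dynamic_queues) + class_index
```

With 64 dynamic queues this yields 72 for the priority class, 73 for immediate-data, and 74 for priority-data, matching the show policy-map interface output above; with the full 256 dynamic queues it yields 264 for the priority class, matching the table.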

Confirming Your Service Policy

