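Background: the new large model packs several `data:` events into a single streamed chunk. Wow, no wonder it is so fast.
On closer inspection, though, some of the streamed data still gets cut off in the middle of an event, as in this example: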
b'data: {"id":"chatcmpl","object":"chat.completion.chunk","created":1715838818,"model":"gpt-05-13","system_fingerprint":"fp_729","choices":[{"index":0,"delta":{"content":"\xe4\xba\x8b"},"logprobs":null,"finish_reason":null}]}\n\ndata: {"id":"chatcmpl","object":"chat.completion.chunk","created":1715838818,"model":"gpt-05-13","system_fingerprint":"fp_729","choices":[{"index":0,"delta":{"content":"\xe6\xb3\x95\xe5\xbe\x8b"},"logprobs":null,"finish_reason":null}]}\n\ndata: {"id":"chatcmpl","object":"chat.completion.chunk","created":1715838818,"model":"gpt-05-13&#
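One way to cope with both behaviors (several events in one chunk, and an event cut off partway) is to buffer the raw bytes and parse only events that are terminated by a blank line. The following is a minimal sketch, not a definitive implementation. `SSEBuffer` and `feed` are hypothetical names introduced here. It assumes events are separated by `\n\n` as in the dump above, and that the stream may end with a `[DONE]` sentinel, which the example above does not show.

```python
import json


class SSEBuffer:
    """Hypothetical helper: collects raw bytes, returns only complete events."""

    def __init__(self):
        self._buf = b""

    def feed(self, chunk: bytes) -> list:
        """Append one network chunk and return the JSON payloads completed so far."""
        self._buf += chunk
        # Events end with a blank line; the last piece may be a cut-off event,
        # so it stays in the buffer until the rest of it arrives.
        *complete, self._buf = self._buf.split(b"\n\n")
        payloads = []
        for event in complete:
            for line in event.split(b"\n"):
                if not line.startswith(b"data:"):
                    continue
                data = line[len(b"data:"):].strip()
                # "[DONE]" is an assumed end-of-stream sentinel, skipped here.
                if data and data != b"[DONE]":
                    # Decode only whole events, so multi-byte UTF-8 is never split.
                    payloads.append(json.loads(data.decode("utf-8")))
        return payloads


# Usage: the second event is cut in the middle, like the dump above.
buf = SSEBuffer()
first = buf.feed(b'data: {"choices":[{"delta":{"content":"\xe4\xba\x8b"}}]}\n\ndata: {"choi')
second = buf.feed(b'ces":[{"delta":{"content":"\xe6\xb3\x95\xe5\xbe\x8b"}}]}\n\n')
text = "".join(p["choices"][0]["delta"]["content"] for p in first + second)
```

Working on bytes rather than decoded text is deliberate: a chunk boundary can fall inside a multi-byte character such as `\xe6\xb3\x95`, and decoding each chunk on its own would then fail.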