spring-ai-openai fails when calling Xinference 1.4.1

1. Xinference error log

This is from a call to the /v1/chat/completions endpoint.

2025-04-06 15:48:51 xinference | return await dependant.call(**values)
2025-04-06 15:48:51 xinference | File "/usr/local/lib/python3.10/dist-packages/xinference/api/restful_api.py", line 1945, in create_chat_completion
2025-04-06 15:48:51 xinference | raw_body = await request.json()
2025-04-06 15:48:51 xinference | File "/usr/local/lib/python3.10/dist-packages/starlette/requests.py", line 252, in json
2025-04-06 15:48:51 xinference | self._json = json.loads(body)
2025-04-06 15:48:51 xinference | File "/usr/lib/python3.10/json/init.py", line 346, in loads
2025-04-06 15:48:51 xinference | return _default_decoder.decode(s)
2025-04-06 15:48:51 xinference | File "/usr/lib/python3.10/json/decoder.py", line 337, in decode
2025-04-06 15:48:51 xinference | obj, end = self.raw_decode(s, idx=_w(s, 0).end())
2025-04-06 15:48:51 xinference | File "/usr/lib/python3.10/json/decoder.py", line 355, in raw_decode
2025-04-06 15:48:51 xinference | raise JSONDecodeError("Expecting value", s, err.value) from None
2025-04-06 15:48:51 xinference | json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

"Expecting value: line 1 column 1 (char 0)" means request.json() received an empty body: the JSON payload sent by the client never reached the handler intact.

2. Calling through the Python openai client works fine

3. A Wireshark packet capture reveals the problem

Capture of the Python openai client call:

Hypertext Transfer Protocol
POST /v1/chat/completions HTTP/1.1\r\n
Request Method: POST
Request URI: /v1/chat/completions
Request Version: HTTP/1.1
Host: localhost:9997\r\n
Accept-Encoding: gzip, deflate\r\n
Connection: keep-alive\r\n
Accept: application/json\r\n
Content-Type: application/json\r\n
User-Agent: OpenAI/Python 1.70.0\r\n
X-Stainless-Lang: python\r\n
X-Stainless-Package-Version: 1.70.0\r\n
X-Stainless-OS: Windows\r\n
X-Stainless-Arch: other:amd64\r\n
X-Stainless-Runtime: CPython\r\n
X-Stainless-Runtime-Version: 3.11.9\r\n
Authorization: Bearer not empty\r\n
X-Stainless-Async: false\r\n
x-stainless-retry-count: 0\r\n
x-stainless-read-timeout: 600\r\n
Content-Length: 95\r\n
\r\n
[Response in frame: 61]
[Full request URI: http://localhost:9997/v1/chat/completions]
File Data: 95 bytes
JavaScript Object Notation: application/json
JSON raw form: {"messages": [{"content": "你是谁","role": "user"}],"model": "qwen2-instruct","max_tokens": 1024}

Capture of the spring-ai call:

Hypertext Transfer Protocol, has 2 chunks (including last chunk)
POST /v1/chat/completions HTTP/1.1\r\n
Request Method: POST
Request URI: /v1/chat/completions
Request Version: HTTP/1.1
Connection: Upgrade, HTTP2-Settings\r\n
Host: 192.168.3.100:9997\r\n
HTTP2-Settings: AAEAAEAAAAIAAAAAAAMAAAAAAAQBAAAAAAUAAEAAAAYABgAA\r\n
Settings - Header table size : 16384
Settings Identifier: Header table size (1)
Header table size: 16384
Settings - Enable PUSH : 0
Settings Identifier: Enable PUSH (2)
Enable PUSH: 0
Settings - Max concurrent streams : 0
Settings Identifier: Max concurrent streams (3)
Max concurrent streams: 0
Settings - Initial Windows size : 16777216
Settings Identifier: Initial Windows size (4)
Initial Window Size: 16777216
Settings - Max frame size : 16384
Settings Identifier: Max frame size (5)
Max frame size: 16384
Settings - Max header list size : 393216
Settings Identifier: Max header list size (6)
Max header list size: 393216
Transfer-encoding: chunked\r\n
Upgrade: h2c\r\n
User-Agent: Java-http-client/17.0.14\r\n
Authorization: Bearer not empty\r\n
Content-Type: application/json\r\n
\r\n
[Full request URI: http://192.168.3.100:9997/v1/chat/completions]
HTTP chunked response
File Data: 143 bytes
JavaScript Object Notation: application/json
JSON raw form: {"messages": [{"content": "你好,介绍下你自己!","role": "user"}],"model": "qwen2-instruct","stream": false,"temperature": 0.7,"top_p": 0.7}

The problem is visible in the captures: spring-ai's HTTP client is attempting an upgrade to HTTP/2. The Python client sends a plain HTTP/1.1 request with a Content-Length header, while the Java client sends "Connection: Upgrade, HTTP2-Settings" and "Upgrade: h2c" with a chunked body and no Content-Length. A quick web search suggests Xinference does not support HTTP/2 (cleartext h2c), and it apparently cannot read this chunked upgrade request either, so request.json() ends up trying to decode an empty body.

Debugging the code

It turns out that the OpenAiApi used by the OpenAiChatModel class holds two HTTP clients:

import org.springframework.web.client.RestClient;
import org.springframework.web.reactive.function.client.WebClient;

private final RestClient restClient;
private final WebClient webClient;

Both of these are backed by the JDK's built-in jdk.internal.net.http.HttpClientImpl, which uses HTTP/2 by default.
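This default is easy to confirm outside of Spring. A minimal probe (the class name HttpVersionProbe is mine, purely for illustration):

import java.net.http.HttpClient;

public class HttpVersionProbe {

    public static void main(String[] args) {
        // The JDK client defaults to HTTP/2; on a cleartext http:// URL it sends an
        // HTTP/1.1 request carrying "Connection: Upgrade, HTTP2-Settings" and
        // "Upgrade: h2c" -- exactly the headers seen in the spring-ai capture above.
        HttpClient defaultClient = HttpClient.newHttpClient();
        System.out.println(defaultClient.version()); // HTTP_2

        // Pinning the version to HTTP/1.1 removes the h2c upgrade attempt entirely.
        HttpClient http11Client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1)
                .build();
        System.out.println(http11Client.version()); // HTTP_1_1
    }

}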

4. Patching the OpenAiApi class. Download the spring-ai source and locate

spring-ai-openai

under the following path:

\spring-ai\models\spring-ai-openai

The modified OpenAiApi.java is shown below. The changes are confined to the import block and the constructor, so the unchanged remainder of the file is elided here:

/** Copyright 2023-2025 the original author or authors.** Licensed under the Apache License, Version 2.0 (the "License");* you may not use this file except in compliance with the License.* You may obtain a copy of the License at**      https://www.apache.org/licenses/LICENSE-2.0** Unless required by applicable law or agreed to in writing, software* distributed under the License is distributed on an "AS IS" BASIS,* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.* See the License for the specific language governing permissions and* limitations under the License.*/package org.springframework.ai.openai.api;import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;
import java.util.function.Predicate;import com.fasterxml.jackson.annotation.JsonFormat;
import com.fasterxml.jackson.annotation.JsonIgnore;
import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.annotation.JsonInclude;
import com.fasterxml.jackson.annotation.JsonInclude.Include;
import com.fasterxml.jackson.annotation.JsonProperty;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;import org.springframework.ai.model.ApiKey;
import org.springframework.ai.model.ChatModelDescription;
import org.springframework.ai.model.ModelOptionsUtils;
import org.springframework.ai.model.NoopApiKey;
import org.springframework.ai.model.SimpleApiKey;
import org.springframework.ai.openai.api.common.OpenAiApiConstants;
import org.springframework.ai.retry.RetryUtils;
import org.springframework.core.ParameterizedTypeReference;
import org.springframework.http.HttpHeaders;
import org.springframework.http.MediaType;
import org.springframework.http.ResponseEntity;
import org.springframework.util.Assert;
import org.springframework.util.CollectionUtils;
import org.springframework.util.LinkedMultiValueMap;
import org.springframework.util.MultiValueMap;
import org.springframework.web.client.ResponseErrorHandler;
import org.springframework.web.client.RestClient;
import org.springframework.web.reactive.function.client.WebClient;
import okhttp3.ConnectionPool;
import okhttp3.OkHttpClient;
import org.springframework.http.client.ClientHttpRequestFactory;
import org.springframework.http.client.OkHttp3ClientHttpRequestFactory;
import java.util.concurrent.TimeUnit;
import java.time.Duration;
import io.netty.channel.ChannelOption;
import reactor.netty.http.client.HttpClient;
import org.springframework.http.client.reactive.ReactorClientHttpConnector;/*** Single class implementation of the* <a href="https://platform.openai.com/docs/api-reference/chat">OpenAI Chat Completion* API</a> and <a href="https://platform.openai.com/docs/api-reference/embeddings">OpenAI* Embedding API</a>.** @author Christian Tzolov* @author Michael Lavelle* @author Mariusz Bernacki* @author Thomas Vitale* @author David Frizelle* @author Alexandros Pappas*/
public class OpenAiApi {public static Builder builder() {return new Builder();}public static final OpenAiApi.ChatModel DEFAULT_CHAT_MODEL = ChatModel.GPT_4_O;public static final String DEFAULT_EMBEDDING_MODEL = EmbeddingModel.TEXT_EMBEDDING_ADA_002.getValue();private static final Predicate<String> SSE_DONE_PREDICATE = "[DONE]"::equals;private final String completionsPath;private final String embeddingsPath;private final RestClient restClient;private final WebClient webClient;private OpenAiStreamFunctionCallingHelper chunkMerger = new OpenAiStreamFunctionCallingHelper();/*** Create a new chat completion api.* @param baseUrl api base URL.* @param apiKey OpenAI apiKey.* @param headers the http headers to use.* @param completionsPath the path to the chat completions endpoint.* @param embeddingsPath the path to the embeddings endpoint.* @param restClientBuilder RestClient builder.* @param webClientBuilder WebClient builder.* @param responseErrorHandler Response error handler.*/public OpenAiApi(String baseUrl, ApiKey apiKey, MultiValueMap<String, String> headers, String completionsPath,String embeddingsPath, RestClient.Builder restClientBuilder, WebClient.Builder webClientBuilder,ResponseErrorHandler responseErrorHandler) {Assert.hasText(completionsPath, "Completions Path must not be null");Assert.hasText(embeddingsPath, "Embeddings Path must not be null");Assert.notNull(headers, "Headers must not be null");this.completionsPath = completionsPath;this.embeddingsPath = embeddingsPath;// @formatter:offConsumer<HttpHeaders> finalHeaders = h -> {if(!(apiKey instanceof NoopApiKey)) {h.setBearerAuth(apiKey.getValue());}h.setContentType(MediaType.APPLICATION_JSON);h.addAll(headers);};OkHttpClient okHttpClient = new OkHttpClient.Builder().connectTimeout(120, TimeUnit.SECONDS)   // 连接超时.readTimeout(120, TimeUnit.SECONDS)   // 读取超时.connectionPool(new ConnectionPool(100, 10, TimeUnit.MINUTES)).build();ClientHttpRequestFactory requestFactory = new OkHttp3ClientHttpRequestFactory(okHttpClient);this.restClient = restClientBuilder.baseUrl(baseUrl).defaultHeaders(finalHeaders).requestFactory(requestFactory).defaultStatusHandler(responseErrorHandler).build();// 1. 
创建 Reactor Netty 的 HttpClient 实例HttpClient reactorHttpClient = HttpClient.create().responseTimeout(Duration.ofSeconds(1000))      // 响应超时配置.option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 100000); // 连接超时配置ReactorClientHttpConnector clientHttpConnector = new ReactorClientHttpConnector(reactorHttpClient);this.webClient = webClientBuilder.clientConnector(clientHttpConnector).baseUrl(baseUrl).defaultHeaders(finalHeaders).build(); // @formatter:on}public static String getTextContent(List<ChatCompletionMessage.MediaContent> content) {return content.stream().filter(c -> "text".equals(c.type())).map(ChatCompletionMessage.MediaContent::text).reduce("", (a, b) -> a + b);}/*** Creates a model response for the given chat conversation.* @param chatRequest The chat completion request.* @return Entity response with {@link ChatCompletion} as a body and HTTP status code* and headers.*/public ResponseEntity<ChatCompletion> chatCompletionEntity(ChatCompletionRequest chatRequest) {return chatCompletionEntity(chatRequest, new LinkedMultiValueMap<>());}/*** Creates a model response for the given chat conversation.* @param chatRequest The chat completion request.* @param additionalHttpHeader Optional, additional HTTP headers to be added to the* request.* @return Entity response with {@link ChatCompletion} as a body and HTTP status code* and headers.*/public ResponseEntity<ChatCompletion> chatCompletionEntity(ChatCompletionRequest chatRequest,MultiValueMap<String, String> additionalHttpHeader) {Assert.notNull(chatRequest, "The request body can not be null.");Assert.isTrue(!chatRequest.stream(), "Request must set the stream property to false.");Assert.notNull(additionalHttpHeader, "The additional HTTP headers can not be null.");return this.restClient.post().uri(this.completionsPath).headers(headers -> headers.addAll(additionalHttpHeader)).body(chatRequest).retrieve().toEntity(ChatCompletion.class);}/*** Creates a streaming chat response for the given chat conversation.* @param chatRequest The chat completion request. Must have the stream property set* to true.* @return Returns a {@link Flux} stream from chat completion chunks.*/public Flux<ChatCompletionChunk> chatCompletionStream(ChatCompletionRequest chatRequest) {return chatCompletionStream(chatRequest, new LinkedMultiValueMap<>());}/*** Creates a streaming chat response for the given chat conversation.* @param chatRequest The chat completion request. 
Must have the stream property set* to true.* @param additionalHttpHeader Optional, additional HTTP headers to be added to the* request.* @return Returns a {@link Flux} stream from chat completion chunks.*/public Flux<ChatCompletionChunk> chatCompletionStream(ChatCompletionRequest chatRequest,MultiValueMap<String, String> additionalHttpHeader) {Assert.notNull(chatRequest, "The request body can not be null.");Assert.isTrue(chatRequest.stream(), "Request must set the stream property to true.");AtomicBoolean isInsideTool = new AtomicBoolean(false);return this.webClient.post().uri(this.completionsPath).headers(headers -> headers.addAll(additionalHttpHeader)).body(Mono.just(chatRequest), ChatCompletionRequest.class).retrieve().bodyToFlux(String.class)// cancels the flux stream after the "[DONE]" is received..takeUntil(SSE_DONE_PREDICATE)// filters out the "[DONE]" message..filter(SSE_DONE_PREDICATE.negate()).map(content -> ModelOptionsUtils.jsonToObject(content, ChatCompletionChunk.class))// Detect is the chunk is part of a streaming function call..map(chunk -> {if (this.chunkMerger.isStreamingToolFunctionCall(chunk)) {isInsideTool.set(true);}return chunk;})// Group all chunks belonging to the same function call.// Flux<ChatCompletionChunk> -> Flux<Flux<ChatCompletionChunk>>.windowUntil(chunk -> {if (isInsideTool.get() && this.chunkMerger.isStreamingToolFunctionCallFinish(chunk)) {isInsideTool.set(false);return true;}return !isInsideTool.get();})// Merging the window chunks into a single chunk.// Reduce the inner Flux<ChatCompletionChunk> window into a single// Mono<ChatCompletionChunk>,// Flux<Flux<ChatCompletionChunk>> -> Flux<Mono<ChatCompletionChunk>>.concatMapIterable(window -> {Mono<ChatCompletionChunk> monoChunk = window.reduce(new ChatCompletionChunk(null, null, null, null, null, null, null, null),(previous, current) -> this.chunkMerger.merge(previous, current));return List.of(monoChunk);})// Flux<Mono<ChatCompletionChunk>> -> Flux<ChatCompletionChunk>.flatMap(mono -> mono);}/*** Creates an embedding vector representing the input text or token array.* @param embeddingRequest The embedding request.* @return Returns list of {@link Embedding} wrapped in {@link EmbeddingList}.* @param <T> Type of the entity in the data list. Can be a {@link String} or* {@link List} of tokens (e.g. Integers). For embedding multiple inputs in a single* request, You can pass a {@link List} of {@link String} or {@link List} of* {@link List} of tokens. For example:** <pre>{@code List.of("text1", "text2", "text3") or List.of(List.of(1, 2, 3), List.of(3, 4, 5))} </pre>*/public <T> ResponseEntity<EmbeddingList<Embedding>> embeddings(EmbeddingRequest<T> embeddingRequest) {Assert.notNull(embeddingRequest, "The request body can not be null.");// Input text to embed, encoded as a string or array of tokens. 
To embed multiple// inputs in a single// request, pass an array of strings or array of token arrays.Assert.notNull(embeddingRequest.input(), "The input can not be null.");Assert.isTrue(embeddingRequest.input() instanceof String || embeddingRequest.input() instanceof List,"The input must be either a String, or a List of Strings or List of List of integers.");// The input must not exceed the max input tokens for the model (8192 tokens for// text-embedding-ada-002), cannot// be an empty string, and any array must be 2048 dimensions or less.if (embeddingRequest.input() instanceof List list) {Assert.isTrue(!CollectionUtils.isEmpty(list), "The input list can not be empty.");Assert.isTrue(list.size() <= 2048, "The list must be 2048 dimensions or less");Assert.isTrue(list.get(0) instanceof String || list.get(0) instanceof Integer || list.get(0) instanceof List,"The input must be either a String, or a List of Strings or list of list of integers.");}return this.restClient.post().uri(this.embeddingsPath).body(embeddingRequest).retrieve().toEntity(new ParameterizedTypeReference<>() {});}/*** OpenAI Chat Completion Models.* <p>* This enum provides a selective list of chat completion models available through the* OpenAI API, along with their key features and links to the official OpenAI* documentation for further details.* <p>* The models are grouped by their capabilities and intended use cases. For each* model, a brief description is provided, highlighting its strengths, limitations,* and any specific features. When available, the description also includes* information about the model's context window, maximum output tokens, and knowledge* cutoff date.* <p>* <b>References:</b>* <ul>* <li><a href="https://platform.openai.com/docs/models#gpt-4o">GPT-4o</a></li>* <li><a href="https://platform.openai.com/docs/models#gpt-4-and-gpt-4-turbo">GPT-4* and GPT-4 Turbo</a></li>* <li><a href="https://platform.openai.com/docs/models#gpt-3-5-turbo">GPT-3.5* Turbo</a></li>* <li><a href="https://platform.openai.com/docs/models#o1-and-o1-mini">o1 and* o1-mini</a></li>* <li><a href="https://platform.openai.com/docs/models#o3-mini">o3-mini</a></li>* </ul>*/public enum ChatModel implements ChatModelDescription {/*** <b>o1</b> is trained with reinforcement learning to perform complex reasoning.* It thinks before it answers, producing a long internal chain of thought before* responding to the user.* <p>* The latest o1 model supports both text and image inputs, and produces text* outputs (including Structured Outputs).* <p>* The knowledge cutoff for o1 is October, 2023.* <p>*/O1("o1"),/*** <b>o1-preview</b> is trained with reinforcement learning to perform complex* reasoning. It thinks before it answers, producing a long internal chain of* thought before responding to the user.* <p>* The latest o1-preview model supports both text and image inputs, and produces* text outputs (including Structured Outputs).* <p>* The knowledge cutoff for o1-preview is October, 2023.* <p>*/O1_PREVIEW("o1-preview"),/*** <b>o1-mini</b> is a faster and more affordable reasoning model compared to o1.* o1-mini currently only supports text inputs and outputs.* <p>* The knowledge cutoff for o1-mini is October, 2023.* <p>*/O1_MINI("o1-mini"),/*** <b>o3-mini</b> is our most recent small reasoning model, providing high* intelligence at the same cost and latency targets of o1-mini. o3-mini also* supports key developer features, like Structured Outputs, function calling,* Batch API, and more. 
Like other models in the o-series, it is designed to excel* at science, math, and coding tasks.* <p>* The knowledge cutoff for o3-mini models is October, 2023.* <p>*/O3_MINI("o3-mini"),/*** <b>GPT-4o ("omni")</b> is our versatile, high-intelligence flagship model. It* accepts both text and image inputs and produces text outputs (including* Structured Outputs).* <p>* The knowledge cutoff for GPT-4o models is October, 2023.* <p>*/GPT_4_O("gpt-4o"),/*** The <b>chatgpt-4o-latest</b> model ID continuously points to the version of* GPT-4o used in ChatGPT. It is updated frequently when there are significant* changes to ChatGPT's GPT-4o model.* <p>* Context window: 128,000 tokens. Max output tokens: 16,384 tokens. Knowledge* cutoff: October, 2023.*/CHATGPT_4_O_LATEST("chatgpt-4o-latest"),/*** <b>GPT-4o Audio</b> is a preview release model that accepts audio inputs and* outputs and can be used in the Chat Completions REST API.* <p>* The knowledge cutoff for GPT-4o Audio models is October, 2023.* <p>*/GPT_4_O_AUDIO_PREVIEW("gpt-4o-audio-preview"),/*** <b>GPT-4o-mini Audio</b> is a preview release model that accepts audio inputs* and outputs and can be used in the Chat Completions REST API.* <p>* The knowledge cutoff for GPT-4o-mini Audio models is October, 2023.* <p>*/GPT_4_O_MINI_AUDIO_PREVIEW("gpt-4o-mini-audio-preview"),/*** <b>GPT-4o-mini</b> is a fast, affordable small model for focused tasks. It* accepts both text and image inputs and produces text outputs (including* Structured Outputs). It is ideal for fine-tuning, and model outputs from a* larger model like GPT-4o can be distilled to GPT-4o-mini to produce similar* results at lower cost and latency.* <p>* The knowledge cutoff for GPT-4o-mini models is October, 2023.* <p>*/GPT_4_O_MINI("gpt-4o-mini"),/*** <b>GPT-4 Turbo</b> is a high-intelligence GPT model with vision capabilities,* usable in Chat Completions. Vision requests can now use JSON mode and function* calling.* <p>* The knowledge cutoff for the latest GPT-4 Turbo version is December, 2023.* <p>*/GPT_4_TURBO("gpt-4-turbo"),/*** <b>GPT-4-0125-preview</b> is the latest GPT-4 model intended to reduce cases of* “laziness” where the model doesn’t complete a task.* <p>* Context window: 128,000 tokens. Max output tokens: 4,096 tokens.*/GPT_4_0125_PREVIEW("gpt-4-0125-preview"),/*** Currently points to {@link #GPT_4_0125_PREVIEW}.* <p>* Context window: 128,000 tokens. Max output tokens: 4,096 tokens.*/GPT_4_1106_PREVIEW("gpt-4-1106-preview"),/*** <b>GPT-4 Turbo Preview</b> is a high-intelligence GPT model usable in Chat* Completions.* <p>* Currently points to {@link #GPT_4_0125_PREVIEW}.* <p>* Context window: 128,000 tokens. Max output tokens: 4,096 tokens.*/GPT_4_TURBO_PREVIEW("gpt-4-turbo-preview"),/*** <b>GPT-4</b> is an older version of a high-intelligence GPT model, usable in* Chat Completions.* <p>* Currently points to {@link #GPT_4_0613}.* <p>* Context window: 8,192 tokens. Max output tokens: 8,192 tokens.*/GPT_4("gpt-4"),/*** GPT-4 model snapshot.* <p>* Context window: 8,192 tokens. Max output tokens: 8,192 tokens.*/GPT_4_0613("gpt-4-0613"),/*** GPT-4 model snapshot.* <p>* Context window: 8,192 tokens. 
Max output tokens: 8,192 tokens.*/GPT_4_0314("gpt-4-0314"),/*** <b>GPT-3.5 Turbo</b> models can understand and generate natural language or* code and have been optimized for chat using the Chat Completions API but work* well for non-chat tasks as well.* <p>* As of July 2024, {@link #GPT_4_O_MINI} should be used in place of* gpt-3.5-turbo, as it is cheaper, more capable, multimodal, and just as fast.* gpt-3.5-turbo is still available for use in the API.* <p>* <p>* Context window: 16,385 tokens. Max output tokens: 4,096 tokens. Knowledge* cutoff: September, 2021.*/GPT_3_5_TURBO("gpt-3.5-turbo"),/*** <b>GPT-3.5 Turbo Instruct</b> has similar capabilities to GPT-3 era models.* Compatible with the legacy Completions endpoint and not Chat Completions.* <p>* Context window: 4,096 tokens. Max output tokens: 4,096 tokens. Knowledge* cutoff: September, 2021.*/GPT_3_5_TURBO_INSTRUCT("gpt-3.5-turbo-instruct");public final String value;ChatModel(String value) {this.value = value;}public String getValue() {return this.value;}@Overridepublic String getName() {return this.value;}}/*** The reason the model stopped generating tokens.*/public enum ChatCompletionFinishReason {/*** The model hit a natural stop point or a provided stop sequence.*/@JsonProperty("stop")STOP,/*** The maximum number of tokens specified in the request was reached.*/@JsonProperty("length")LENGTH,/*** The content was omitted due to a flag from our content filters.*/@JsonProperty("content_filter")CONTENT_FILTER,/*** The model called a tool.*/@JsonProperty("tool_calls")TOOL_CALLS,/*** Only for compatibility with Mistral AI API.*/@JsonProperty("tool_call")TOOL_CALL}/*** OpenAI Embeddings Models:* <a href="https://platform.openai.com/docs/models/embeddings">Embeddings</a>.*/public enum EmbeddingModel {/*** Most capable embedding model for both english and non-english tasks. DIMENSION:* 3072*/TEXT_EMBEDDING_3_LARGE("text-embedding-3-large"),/*** Increased performance over 2nd generation ada embedding model. DIMENSION: 1536*/TEXT_EMBEDDING_3_SMALL("text-embedding-3-small"),/*** Most capable 2nd generation embedding model, replacing 16 first generation* models. DIMENSION: 1536*/TEXT_EMBEDDING_ADA_002("text-embedding-ada-002");public final String value;EmbeddingModel(String value) {this.value = value;}public String getValue() {return this.value;}}/*** Represents a tool the model may call. Currently, only functions are supported as a* tool.*/@JsonInclude(JsonInclude.Include.NON_NULL)public static class FunctionTool {/*** The type of the tool. 
Currently, only 'function' is supported.*/@JsonProperty("type")private Type type = Type.FUNCTION;/*** The function definition.*/@JsonProperty("function")private Function function;public FunctionTool() {}/*** Create a tool of type 'function' and the given function definition.* @param type the tool type* @param function function definition*/public FunctionTool(Type type, Function function) {this.type = type;this.function = function;}/*** Create a tool of type 'function' and the given function definition.* @param function function definition.*/public FunctionTool(Function function) {this(Type.FUNCTION, function);}public Type getType() {return this.type;}public Function getFunction() {return this.function;}public void setType(Type type) {this.type = type;}public void setFunction(Function function) {this.function = function;}/*** Create a tool of type 'function' and the given function definition.*/public enum Type {/*** Function tool type.*/@JsonProperty("function")FUNCTION}/*** Function definition.*/@JsonInclude(JsonInclude.Include.NON_NULL)public static class Function {@JsonProperty("description")private String description;@JsonProperty("name")private String name;@JsonProperty("parameters")private Map<String, Object> parameters;@JsonProperty("strict")Boolean strict;@JsonIgnoreprivate String jsonSchema;/*** NOTE: Required by Jackson, JSON deserialization!*/@SuppressWarnings("unused")private Function() {}/*** Create tool function definition.* @param description A description of what the function does, used by the* model to choose when and how to call the function.* @param name The name of the function to be called. Must be a-z, A-Z, 0-9,* or contain underscores and dashes, with a maximum length of 64.* @param parameters The parameters the functions accepts, described as a JSON* Schema object. To describe a function that accepts no parameters, provide* the value {"type": "object", "properties": {}}.* @param strict Whether to enable strict schema adherence when generating the* function call. If set to true, the model will follow the exact schema* defined in the parameters field. 
Only a subset of JSON Schema is supported* when strict is true.*/public Function(String description, String name, Map<String, Object> parameters, Boolean strict) {this.description = description;this.name = name;this.parameters = parameters;this.strict = strict;}/*** Create tool function definition.* @param description tool function description.* @param name tool function name.* @param jsonSchema tool function schema as json.*/public Function(String description, String name, String jsonSchema) {this(description, name, ModelOptionsUtils.jsonToMap(jsonSchema), null);}public String getDescription() {return this.description;}public String getName() {return this.name;}public Map<String, Object> getParameters() {return this.parameters;}public void setDescription(String description) {this.description = description;}public void setName(String name) {this.name = name;}public void setParameters(Map<String, Object> parameters) {this.parameters = parameters;}public Boolean getStrict() {return this.strict;}public void setStrict(Boolean strict) {this.strict = strict;}public String getJsonSchema() {return this.jsonSchema;}public void setJsonSchema(String jsonSchema) {this.jsonSchema = jsonSchema;if (jsonSchema != null) {this.parameters = ModelOptionsUtils.jsonToMap(jsonSchema);}}}}/*** The type of modality for the model completion.*/public enum OutputModality {// @formatter:off@JsonProperty("audio")AUDIO,@JsonProperty("text")TEXT// @formatter:on}/*** Creates a model response for the given chat conversation.** @param messages A list of messages comprising the conversation so far.* @param model ID of the model to use.* @param store Whether to store the output of this chat completion request for use in* OpenAI's model distillation or evals products.* @param metadata Developer-defined tags and values used for filtering completions in* the OpenAI's dashboard.* @param frequencyPenalty Number between -2.0 and 2.0. Positive values penalize new* tokens based on their existing frequency in the text so far, decreasing the model's* likelihood to repeat the same line verbatim.* @param logitBias Modify the likelihood of specified tokens appearing in the* completion. Accepts a JSON object that maps tokens (specified by their token ID in* the tokenizer) to an associated bias value from -100 to 100. Mathematically, the* bias is added to the logits generated by the model prior to sampling. The exact* effect will vary per model, but values between -1 and 1 should decrease or increase* likelihood of selection; values like -100 or 100 should result in a ban or* exclusive selection of the relevant token.* @param logprobs Whether to return log probabilities of the output tokens or not. If* true, returns the log probabilities of each output token returned in the 'content'* of 'message'.* @param topLogprobs An integer between 0 and 5 specifying the number of most likely* tokens to return at each token position, each with an associated log probability.* 'logprobs' must be set to 'true' if this parameter is used.* @param maxTokens The maximum number of tokens that can be generated in the chat* completion. This value can be used to control costs for text generated via API.* This value is now deprecated in favor of max_completion_tokens, and is not* compatible with o1 series models.* @param maxCompletionTokens An upper bound for the number of tokens that can be* generated for a completion, including visible output tokens and reasoning tokens.* @param n How many chat completion choices to generate for each input message. 
Note* that you will be charged based on the number of generated tokens across all the* choices. Keep n as 1 to minimize costs.* @param outputModalities Output types that you would like the model to generate for* this request. Most models are capable of generating text, which is the default:* ["text"]. The gpt-4o-audio-preview model can also be used to generate audio. To* request that this model generate both text and audio responses, you can use:* ["text", "audio"].* @param audioParameters Parameters for audio output. Required when audio output is* requested with outputModalities: ["audio"].* @param presencePenalty Number between -2.0 and 2.0. Positive values penalize new* tokens based on whether they appear in the text so far, increasing the model's* likelihood to talk about new topics.* @param responseFormat An object specifying the format that the model must output.* Setting to { "type": "json_object" } enables JSON mode, which guarantees the* message the model generates is valid JSON.* @param seed This feature is in Beta. If specified, our system will make a best* effort to sample deterministically, such that repeated requests with the same seed* and parameters should return the same result. Determinism is not guaranteed, and* you should refer to the system_fingerprint response parameter to monitor changes in* the backend.* @param serviceTier Specifies the latency tier to use for processing the request.* This parameter is relevant for customers subscribed to the scale tier service. When* this parameter is set, the response body will include the service_tier utilized.* @param stop Up to 4 sequences where the API will stop generating further tokens.* @param stream If set, partial message deltas will be sent.Tokens will be sent as* data-only server-sent events as they become available, with the stream terminated* by a data: [DONE] message.* @param streamOptions Options for streaming response. Only set this when you set.* @param temperature What sampling temperature to use, between 0 and 1. Higher values* like 0.8 will make the output more random, while lower values like 0.2 will make it* more focused and deterministic. We generally recommend altering this or top_p but* not both.* @param topP An alternative to sampling with temperature, called nucleus sampling,* where the model considers the results of the tokens with top_p probability mass. So* 0.1 means only the tokens comprising the top 10% probability mass are considered.* We generally recommend altering this or temperature but not both.* @param tools A list of tools the model may call. Currently, only functions are* supported as a tool. Use this to provide a list of functions the model may generate* JSON inputs for.* @param toolChoice Controls which (if any) function is called by the model. none* means the model will not call a function and instead generates a message. auto* means the model can pick between generating a message or calling a function.* Specifying a particular function via {"type: "function", "function": {"name":* "my_function"}} forces the model to call that function. none is the default when no* functions are present. auto is the default if functions are present. Use the* {@link ToolChoiceBuilder} to create the tool choice value.* @param user A unique identifier representing your end-user, which can help OpenAI* to monitor and detect abuse.* @param parallelToolCalls If set to true, the model will call all functions in the* tools list in parallel. 
Otherwise, the model will call the functions in the tools* list in the order they are provided.*/@JsonInclude(Include.NON_NULL)public record ChatCompletionRequest(// @formatter:off@JsonProperty("messages") List<ChatCompletionMessage> messages,@JsonProperty("model") String model,@JsonProperty("store") Boolean store,@JsonProperty("metadata") Map<String, String> metadata,@JsonProperty("frequency_penalty") Double frequencyPenalty,@JsonProperty("logit_bias") Map<String, Integer> logitBias,@JsonProperty("logprobs") Boolean logprobs,@JsonProperty("top_logprobs") Integer topLogprobs,@JsonProperty("max_tokens") @Deprecated Integer maxTokens, // Use maxCompletionTokens instead@JsonProperty("max_completion_tokens") Integer maxCompletionTokens,@JsonProperty("n") Integer n,@JsonProperty("modalities") List<OutputModality> outputModalities,@JsonProperty("audio") AudioParameters audioParameters,@JsonProperty("presence_penalty") Double presencePenalty,@JsonProperty("response_format") ResponseFormat responseFormat,@JsonProperty("seed") Integer seed,@JsonProperty("service_tier") String serviceTier,@JsonProperty("stop") List<String> stop,@JsonProperty("stream") Boolean stream,@JsonProperty("stream_options") StreamOptions streamOptions,@JsonProperty("temperature") Double temperature,@JsonProperty("top_p") Double topP,@JsonProperty("tools") List<FunctionTool> tools,@JsonProperty("tool_choice") Object toolChoice,@JsonProperty("parallel_tool_calls") Boolean parallelToolCalls,@JsonProperty("user") String user,@JsonProperty("reasoning_effort") String reasoningEffort) {/*** Shortcut constructor for a chat completion request with the given messages, model and temperature.** @param messages A list of messages comprising the conversation so far.* @param model ID of the model to use.* @param temperature What sampling temperature to use, between 0 and 1.*/public ChatCompletionRequest(List<ChatCompletionMessage> messages, String model, Double temperature) {this(messages, model, null, null, null, null, null, null, null, null, null, null, null, null, null,null, null, null, false, null, temperature, null,null, null, null, null, null);}/*** Shortcut constructor for a chat completion request with text and audio output.** @param messages A list of messages comprising the conversation so far.* @param model ID of the model to use.* @param audio Parameters for audio output. 
Required when audio output is requested with outputModalities: ["audio"].*/public ChatCompletionRequest(List<ChatCompletionMessage> messages, String model, AudioParameters audio, boolean stream) {this(messages, model, null, null, null, null, null, null,null, null, null, List.of(OutputModality.AUDIO, OutputModality.TEXT), audio, null, null,null, null, null, stream, null, null, null,null, null, null, null, null);}/*** Shortcut constructor for a chat completion request with the given messages, model, temperature and control for streaming.** @param messages A list of messages comprising the conversation so far.* @param model ID of the model to use.* @param temperature What sampling temperature to use, between 0 and 1.* @param stream If set, partial message deltas will be sent.Tokens will be sent as data-only server-sent events* as they become available, with the stream terminated by a data: [DONE] message.*/public ChatCompletionRequest(List<ChatCompletionMessage> messages, String model, Double temperature, boolean stream) {this(messages, model, null, null, null, null, null, null, null, null, null,null, null, null, null, null, null, null, stream, null, temperature, null,null, null, null, null, null);}/*** Shortcut constructor for a chat completion request with the given messages, model, tools and tool choice.* Streaming is set to false, temperature to 0.8 and all other parameters are null.** @param messages A list of messages comprising the conversation so far.* @param model ID of the model to use.* @param tools A list of tools the model may call. Currently, only functions are supported as a tool.* @param toolChoice Controls which (if any) function is called by the model.*/public ChatCompletionRequest(List<ChatCompletionMessage> messages, String model,List<FunctionTool> tools, Object toolChoice) {this(messages, model, null, null, null, null, null, null, null, null, null,null, null, null, null, null, null, null, false, null, 0.8, null,tools, toolChoice, null, null, null);}/*** Shortcut constructor for a chat completion request with the given messages for streaming.** @param messages A list of messages comprising the conversation so far.* @param stream If set, partial message deltas will be sent.Tokens will be sent as data-only server-sent events* as they become available, with the stream terminated by a data: [DONE] message.*/public ChatCompletionRequest(List<ChatCompletionMessage> messages, Boolean stream) {this(messages, null, null, null, null, null, null, null, null, null, null,null, null, null, null, null, null, null, stream, null, null, null,null, null, null, null, null);}/*** Sets the {@link StreamOptions} for this request.** @param streamOptions The new stream options to use.* @return A new {@link ChatCompletionRequest} with the specified stream options.*/public ChatCompletionRequest streamOptions(StreamOptions streamOptions) {return new ChatCompletionRequest(this.messages, this.model, this.store, this.metadata, this.frequencyPenalty, this.logitBias, this.logprobs,this.topLogprobs, this.maxTokens, this.maxCompletionTokens, this.n, this.outputModalities, this.audioParameters, this.presencePenalty,this.responseFormat, this.seed, this.serviceTier, this.stop, this.stream, streamOptions, this.temperature, this.topP,this.tools, this.toolChoice, this.parallelToolCalls, this.user, this.reasoningEffort);}/*** Helper factory that creates a tool_choice of type 'none', 'auto' or selected function by name.*/public static class ToolChoiceBuilder {/*** Model can pick between generating a message or 
calling a function.*/public static final String AUTO = "auto";/*** Model will not call a function and instead generates a message*/public static final String NONE = "none";/*** Specifying a particular function forces the model to call that function.*/public static Object FUNCTION(String functionName) {return Map.of("type", "function", "function", Map.of("name", functionName));}}/*** Parameters for audio output. Required when audio output is requested with outputModalities: ["audio"].* @param voice Specifies the voice type.* @param format Specifies the output audio format.*/@JsonInclude(Include.NON_NULL)public record AudioParameters(@JsonProperty("voice") Voice voice,@JsonProperty("format") AudioResponseFormat format) {/*** Specifies the voice type.*/public enum Voice {/** Alloy voice */@JsonProperty("alloy") ALLOY,/** Echo voice */@JsonProperty("echo") ECHO,/** Fable voice */@JsonProperty("fable") FABLE,/** Onyx voice */@JsonProperty("onyx") ONYX,/** Nova voice */@JsonProperty("nova") NOVA,/** Shimmer voice */@JsonProperty("shimmer") SHIMMER}/*** Specifies the output audio format.*/public enum AudioResponseFormat {/** MP3 format */@JsonProperty("mp3") MP3,/** FLAC format */@JsonProperty("flac") FLAC,/** OPUS format */@JsonProperty("opus") OPUS,/** PCM16 format */@JsonProperty("pcm16") PCM16,/** WAV format */@JsonProperty("wav") WAV}}/*** @param includeUsage If set, an additional chunk will be streamed* before the data: [DONE] message. The usage field on this chunk* shows the token usage statistics for the entire request, and* the choices field will always be an empty array. All other chunks* will also include a usage field, but with a null value.*/@JsonInclude(Include.NON_NULL)public record StreamOptions(@JsonProperty("include_usage") Boolean includeUsage) {public static StreamOptions INCLUDE_USAGE = new StreamOptions(true);}} // @formatter:on/*** Message comprising the conversation.** @param rawContent The contents of the message. Can be either a {@link MediaContent}* or a {@link String}. The response message content is always a {@link String}.* @param role The role of the messages author. Could be one of the {@link Role}* types.* @param name An optional name for the participant. Provides the model information to* differentiate between participants of the same role. In case of Function calling,* the name is the function name that the message is responding to.* @param toolCallId Tool call that this message is responding to. Only applicable for* the {@link Role#TOOL} role and null otherwise.* @param toolCalls The tool calls generated by the model, such as function calls.* Applicable only for {@link Role#ASSISTANT} role and null otherwise.* @param refusal The refusal message by the assistant. Applicable only for* {@link Role#ASSISTANT} role and null otherwise.* @param audioOutput Audio response from the model. >>>>>>> bdb66e577 (OpenAI -* Support audio input modality)*/@JsonInclude(Include.NON_NULL)public record ChatCompletionMessage(// @formatter:off@JsonProperty("content") Object rawContent,@JsonProperty("role") Role role,@JsonProperty("name") String name,@JsonProperty("tool_call_id") String toolCallId,@JsonProperty("tool_calls")@JsonFormat(with = JsonFormat.Feature.ACCEPT_SINGLE_VALUE_AS_ARRAY) List<ToolCall> toolCalls,@JsonProperty("refusal") String refusal,@JsonProperty("audio") AudioOutput audioOutput) { // @formatter:on/*** Create a chat completion message with the given content and role. 
All other* fields are null.* @param content The contents of the message.* @param role The role of the author of this message.*/public ChatCompletionMessage(Object content, Role role) {this(content, role, null, null, null, null, null);}/*** Get message content as String.*/public String content() {if (this.rawContent == null) {return null;}if (this.rawContent instanceof String text) {return text;}throw new IllegalStateException("The content is not a string!");}/*** The role of the author of this message.*/public enum Role {/*** System message.*/@JsonProperty("system")SYSTEM,/*** User message.*/@JsonProperty("user")USER,/*** Assistant message.*/@JsonProperty("assistant")ASSISTANT,/*** Tool message.*/@JsonProperty("tool")TOOL}/*** An array of content parts with a defined type. Each MediaContent can be of* either "text", "image_url", or "input_audio" type. Only one option allowed.** @param type Content type, each can be of type text or image_url.* @param text The text content of the message.* @param imageUrl The image content of the message. You can pass multiple images* by adding multiple image_url content parts. Image input is only supported when* using the gpt-4-visual-preview model.* @param inputAudio Audio content part.*/@JsonInclude(Include.NON_NULL)public record MediaContent(// @formatter:off@JsonProperty("type") String type,@JsonProperty("text") String text,@JsonProperty("image_url") ImageUrl imageUrl,@JsonProperty("input_audio") InputAudio inputAudio) { // @formatter:on/*** Shortcut constructor for a text content.* @param text The text content of the message.*/public MediaContent(String text) {this("text", text, null, null);}/*** Shortcut constructor for an image content.* @param imageUrl The image content of the message.*/public MediaContent(ImageUrl imageUrl) {this("image_url", null, imageUrl, null);}/*** Shortcut constructor for an audio content.* @param inputAudio The audio content of the message.*/public MediaContent(InputAudio inputAudio) {this("input_audio", null, null, inputAudio);}/*** @param data Base64 encoded audio data.* @param format The format of the encoded audio data. Currently supports* "wav" and "mp3".*/@JsonInclude(Include.NON_NULL)public record InputAudio(// @formatter:off@JsonProperty("data") String data,@JsonProperty("format") Format format) {public enum Format {/** MP3 audio format */@JsonProperty("mp3") MP3,/** WAV audio format */@JsonProperty("wav") WAV} // @formatter:on}/*** Shortcut constructor for an image content.** @param url Either a URL of the image or the base64 encoded image data. The* base64 encoded image data must have a special prefix in the following* format: "data:{mimetype};base64,{base64-encoded-image-data}".* @param detail Specifies the detail level of the image.*/@JsonInclude(Include.NON_NULL)public record ImageUrl(@JsonProperty("url") String url, @JsonProperty("detail") String detail) {public ImageUrl(String url) {this(url, null);}}}/*** The relevant tool call.** @param index The index of the tool call in the list of tool calls. Required in* case of streaming.* @param id The ID of the tool call. This ID must be referenced when you submit* the tool outputs in using the Submit tool outputs to run endpoint.* @param type The type of tool call the output is required for. 
For now, this is* always function.* @param function The function definition.*/@JsonInclude(Include.NON_NULL)public record ToolCall(// @formatter:off@JsonProperty("index") Integer index,@JsonProperty("id") String id,@JsonProperty("type") String type,@JsonProperty("function") ChatCompletionFunction function) { // @formatter:onpublic ToolCall(String id, String type, ChatCompletionFunction function) {this(null, id, type, function);}}/*** The function definition.** @param name The name of the function.* @param arguments The arguments that the model expects you to pass to the* function.*/@JsonInclude(Include.NON_NULL)public record ChatCompletionFunction(// @formatter:off@JsonProperty("name") String name,@JsonProperty("arguments") String arguments) { // @formatter:on}/*** Audio response from the model.** @param id Unique identifier for the audio response from the model.* @param data Audio output from the model.* @param expiresAt When the audio content will no longer be available on the* server.* @param transcript Transcript of the audio output from the model.*/@JsonInclude(Include.NON_NULL)public record AudioOutput(// @formatter:off@JsonProperty("id") String id,@JsonProperty("data") String data,@JsonProperty("expires_at") Long expiresAt,@JsonProperty("transcript") String transcript) { // @formatter:on}}/*** Represents a chat completion response returned by model, based on the provided* input.** @param id A unique identifier for the chat completion.* @param choices A list of chat completion choices. Can be more than one if n is* greater than 1.* @param created The Unix timestamp (in seconds) of when the chat completion was* created.* @param model The model used for the chat completion.* @param serviceTier The service tier used for processing the request. This field is* only included if the service_tier parameter is specified in the request.* @param systemFingerprint This fingerprint represents the backend configuration that* the model runs with. 
Can be used in conjunction with the seed request parameter to* understand when backend changes have been made that might impact determinism.* @param object The object type, which is always chat.completion.* @param usage Usage statistics for the completion request.*/@JsonInclude(Include.NON_NULL)public record ChatCompletion(// @formatter:off@JsonProperty("id") String id,@JsonProperty("choices") List<Choice> choices,@JsonProperty("created") Long created,@JsonProperty("model") String model,@JsonProperty("service_tier") String serviceTier,@JsonProperty("system_fingerprint") String systemFingerprint,@JsonProperty("object") String object,@JsonProperty("usage") Usage usage) { // @formatter:on/*** Chat completion choice.** @param finishReason The reason the model stopped generating tokens.* @param index The index of the choice in the list of choices.* @param message A chat completion message generated by the model.* @param logprobs Log probability information for the choice.*/@JsonInclude(Include.NON_NULL)public record Choice(// @formatter:off@JsonProperty("finish_reason") ChatCompletionFinishReason finishReason,@JsonProperty("index") Integer index,@JsonProperty("message") ChatCompletionMessage message,@JsonProperty("logprobs") LogProbs logprobs) { // @formatter:on}}/*** Log probability information for the choice.** @param content A list of message content tokens with log probability information.* @param refusal A list of message refusal tokens with log probability information.*/@JsonInclude(Include.NON_NULL)public record LogProbs(@JsonProperty("content") List<Content> content,@JsonProperty("refusal") List<Content> refusal) {/*** Message content tokens with log probability information.** @param token The token.* @param logprob The log probability of the token.* @param probBytes A list of integers representing the UTF-8 bytes representation* of the token. Useful in instances where characters are represented by multiple* tokens and their byte representations must be combined to generate the correct* text representation. Can be null if there is no bytes representation for the* token.* @param topLogprobs List of the most likely tokens and their log probability, at* this token position. In rare cases, there may be fewer than the number of* requested top_logprobs returned.*/@JsonInclude(Include.NON_NULL)public record Content(// @formatter:off@JsonProperty("token") String token,@JsonProperty("logprob") Float logprob,@JsonProperty("bytes") List<Integer> probBytes,@JsonProperty("top_logprobs") List<TopLogProbs> topLogprobs) { // @formatter:on/*** The most likely tokens and their log probability, at this token position.** @param token The token.* @param logprob The log probability of the token.* @param probBytes A list of integers representing the UTF-8 bytes* representation of the token. Useful in instances where characters are* represented by multiple tokens and their byte representations must be* combined to generate the correct text representation. Can be null if there* is no bytes representation for the token.*/@JsonInclude(Include.NON_NULL)public record TopLogProbs(// @formatter:off@JsonProperty("token") String token,@JsonProperty("logprob") Float logprob,@JsonProperty("bytes") List<Integer> probBytes) { // @formatter:on}}}// Embeddings API/*** Usage statistics for the completion request.** @param completionTokens Number of tokens in the generated completion. 
     * Only applicable for completion requests.
     * @param promptTokens Number of tokens in the prompt.
     * @param totalTokens Total number of tokens used in the request (prompt +
     * completion).
     * @param promptTokensDetails Breakdown of tokens used in the prompt.
     * @param completionTokenDetails Breakdown of tokens used in a completion.
     * @param promptCacheHitTokens Number of tokens in the prompt that were served from
     * the cache (util for
     * <a href="https://api-docs.deepseek.com/api/create-chat-completion">DeepSeek</a>
     * support).
     * @param promptCacheMissTokens Number of tokens in the prompt that were not served
     * from the cache (util for
     * <a href="https://api-docs.deepseek.com/api/create-chat-completion">DeepSeek</a>
     * support).
     */
    @JsonInclude(Include.NON_NULL)
    @JsonIgnoreProperties(ignoreUnknown = true)
    public record Usage(
    // @formatter:off
            @JsonProperty("completion_tokens") Integer completionTokens,
            @JsonProperty("prompt_tokens") Integer promptTokens,
            @JsonProperty("total_tokens") Integer totalTokens,
            @JsonProperty("prompt_tokens_details") PromptTokensDetails promptTokensDetails,
            @JsonProperty("completion_tokens_details") CompletionTokenDetails completionTokenDetails,
            @JsonProperty("prompt_cache_hit_tokens") Integer promptCacheHitTokens,
            @JsonProperty("prompt_cache_miss_tokens") Integer promptCacheMissTokens) {
    // @formatter:on

        public Usage(Integer completionTokens, Integer promptTokens, Integer totalTokens) {
            this(completionTokens, promptTokens, totalTokens, null, null, null, null);
        }

        /**
         * Breakdown of tokens used in the prompt.
         *
         * @param audioTokens Audio input tokens present in the prompt.
         * @param cachedTokens Cached tokens present in the prompt.
         */
        @JsonInclude(Include.NON_NULL)
        public record PromptTokensDetails(
        // @formatter:off
                @JsonProperty("audio_tokens") Integer audioTokens,
                @JsonProperty("cached_tokens") Integer cachedTokens) {
        // @formatter:on
        }

        /**
         * Breakdown of tokens used in a completion.
         *
         * @param reasoningTokens Number of tokens generated by the model for reasoning.
         * @param acceptedPredictionTokens Number of tokens generated by the model for
         * accepted predictions.
         * @param audioTokens Number of tokens generated by the model for audio.
         * @param rejectedPredictionTokens Number of tokens generated by the model for
         * rejected predictions.
         */
        @JsonInclude(Include.NON_NULL)
        @JsonIgnoreProperties(ignoreUnknown = true)
        public record CompletionTokenDetails(
        // @formatter:off
                @JsonProperty("reasoning_tokens") Integer reasoningTokens,
                @JsonProperty("accepted_prediction_tokens") Integer acceptedPredictionTokens,
                @JsonProperty("audio_tokens") Integer audioTokens,
                @JsonProperty("rejected_prediction_tokens") Integer rejectedPredictionTokens) {
        // @formatter:on
        }

    }

    /**
     * Represents a streamed chunk of a chat completion response returned by model,
     * based on the provided input.
     *
     * @param id A unique identifier for the chat completion. Each chunk has the same ID.
     * @param choices A list of chat completion choices. Can be more than one if n is
     * greater than 1.
     * @param created The Unix timestamp (in seconds) of when the chat completion was
     * created. Each chunk has the same timestamp.
     * @param model The model used for the chat completion.
     * @param serviceTier The service tier used for processing the request. This field
     * is only included if the service_tier parameter is specified in the request.
     * @param systemFingerprint This fingerprint represents the backend configuration
     * that the model runs with. Can be used in conjunction with the seed request
     * parameter to understand when backend changes have been made that might impact
     * determinism.
     * @param object The object type, which is always 'chat.completion.chunk'.
     * @param usage Usage statistics for the completion request. Present in the last
     * chunk only if the StreamOptions.includeUsage is set to true.
     */
    @JsonInclude(Include.NON_NULL)
    public record ChatCompletionChunk(
    // @formatter:off
            @JsonProperty("id") String id,
            @JsonProperty("choices") List<ChunkChoice> choices,
            @JsonProperty("created") Long created,
            @JsonProperty("model") String model,
            @JsonProperty("service_tier") String serviceTier,
            @JsonProperty("system_fingerprint") String systemFingerprint,
            @JsonProperty("object") String object,
            @JsonProperty("usage") Usage usage) {
    // @formatter:on

        /**
         * Chat completion choice.
         *
         * @param finishReason The reason the model stopped generating tokens.
         * @param index The index of the choice in the list of choices.
         * @param delta A chat completion delta generated by streamed model responses.
         * @param logprobs Log probability information for the choice.
         */
        @JsonInclude(Include.NON_NULL)
        public record ChunkChoice(
        // @formatter:off
                @JsonProperty("finish_reason") ChatCompletionFinishReason finishReason,
                @JsonProperty("index") Integer index,
                @JsonProperty("delta") ChatCompletionMessage delta,
                @JsonProperty("logprobs") LogProbs logprobs) {
        // @formatter:on
        }

    }

    /**
     * Represents an embedding vector returned by embedding endpoint.
     *
     * @param index The index of the embedding in the list of embeddings.
     * @param embedding The embedding vector, which is a list of floats. The length of
     * vector depends on the model.
     * @param object The object type, which is always 'embedding'.
     */
    @JsonInclude(Include.NON_NULL)
    public record Embedding(
    // @formatter:off
            @JsonProperty("index") Integer index,
            @JsonProperty("embedding") float[] embedding,
            @JsonProperty("object") String object) {
    // @formatter:on

        /**
         * Create an embedding with the given index, embedding and object type set to
         * 'embedding'.
         * @param index The index of the embedding in the list of embeddings.
         * @param embedding The embedding vector, which is a list of floats. The length
         * of vector depends on the model.
         */
        public Embedding(Integer index, float[] embedding) {
            this(index, embedding, "embedding");
        }

    }

    /**
     * Creates an embedding vector representing the input text.
     *
     * @param <T> Type of the input.
     * @param input Input text to embed, encoded as a string or array of tokens. To
     * embed multiple inputs in a single request, pass an array of strings or array of
     * token arrays. The input must not exceed the max input tokens for the model (8192
     * tokens for text-embedding-ada-002), cannot be an empty string, and any array
     * must be 2048 dimensions or less.
     * @param model ID of the model to use.
     * @param encodingFormat The format to return the embeddings in. Can be either
     * float or base64.
     * @param dimensions The number of dimensions the resulting output embeddings
     * should have. Only supported in text-embedding-3 and later models.
     * @param user A unique identifier representing your end-user, which can help
     * OpenAI to monitor and detect abuse.
     */
    @JsonInclude(Include.NON_NULL)
    public record EmbeddingRequest<T>(
    // @formatter:off
            @JsonProperty("input") T input,
            @JsonProperty("model") String model,
            @JsonProperty("encoding_format") String encodingFormat,
            @JsonProperty("dimensions") Integer dimensions,
            @JsonProperty("user") String user) {
    // @formatter:on

        /**
         * Create an embedding request with the given input, model and encoding format
         * set to float.
         * @param input Input text to embed.
         * @param model ID of the model to use.
         */
        public EmbeddingRequest(T input, String model) {
            this(input, model, "float", null, null);
        }

        /**
         * Create an embedding request with the given input. Encoding format is set to
         * float and user is null and the model is set to 'text-embedding-ada-002'.
         * @param input Input text to embed.
         */
        public EmbeddingRequest(T input) {
            this(input, DEFAULT_EMBEDDING_MODEL);
        }

    }

    /**
     * List of multiple embedding responses.
     *
     * @param <T> Type of the entities in the data list.
     * @param object Must have value "list".
     * @param data List of entities.
     * @param model ID of the model to use.
     * @param usage Usage statistics for the completion request.
     */
    @JsonInclude(Include.NON_NULL)
    public record EmbeddingList<T>(
    // @formatter:off
            @JsonProperty("object") String object,
            @JsonProperty("data") List<T> data,
            @JsonProperty("model") String model,
            @JsonProperty("usage") Usage usage) {
    // @formatter:on
    }

    public static class Builder {

        private String baseUrl = OpenAiApiConstants.DEFAULT_BASE_URL;

        private ApiKey apiKey;

        private MultiValueMap<String, String> headers = new LinkedMultiValueMap<>();

        private String completionsPath = "/v1/chat/completions";

        private String embeddingsPath = "/v1/embeddings";

        private RestClient.Builder restClientBuilder = RestClient.builder();

        private WebClient.Builder webClientBuilder = WebClient.builder();

        private ResponseErrorHandler responseErrorHandler = RetryUtils.DEFAULT_RESPONSE_ERROR_HANDLER;

        public Builder baseUrl(String baseUrl) {
            Assert.hasText(baseUrl, "baseUrl cannot be null or empty");
            this.baseUrl = baseUrl;
            return this;
        }

        public Builder apiKey(ApiKey apiKey) {
            Assert.notNull(apiKey, "apiKey cannot be null");
            this.apiKey = apiKey;
            return this;
        }

        public Builder apiKey(String simpleApiKey) {
            Assert.notNull(simpleApiKey, "simpleApiKey cannot be null");
            this.apiKey = new SimpleApiKey(simpleApiKey);
            return this;
        }

        public Builder headers(MultiValueMap<String, String> headers) {
            Assert.notNull(headers, "headers cannot be null");
            this.headers = headers;
            return this;
        }

        public Builder completionsPath(String completionsPath) {
            Assert.hasText(completionsPath, "completionsPath cannot be null or empty");
            this.completionsPath = completionsPath;
            return this;
        }

        public Builder embeddingsPath(String embeddingsPath) {
            Assert.hasText(embeddingsPath, "embeddingsPath cannot be null or empty");
            this.embeddingsPath = embeddingsPath;
            return this;
        }

        public Builder restClientBuilder(RestClient.Builder restClientBuilder) {
            Assert.notNull(restClientBuilder, "restClientBuilder cannot be null");
            this.restClientBuilder = restClientBuilder;
            return this;
        }

        public Builder webClientBuilder(WebClient.Builder webClientBuilder) {
            Assert.notNull(webClientBuilder, "webClientBuilder cannot be null");
            this.webClientBuilder = webClientBuilder;
            return this;
        }

        public Builder responseErrorHandler(ResponseErrorHandler responseErrorHandler) {
            Assert.notNull(responseErrorHandler, "responseErrorHandler cannot be null");
            this.responseErrorHandler = responseErrorHandler;
            return this;
        }

        public OpenAiApi build() {
            Assert.notNull(this.apiKey, "apiKey must be set");
            return new OpenAiApi(this.baseUrl, this.apiKey, this.headers, this.completionsPath, this.embeddingsPath,
                    this.restClientBuilder, this.webClientBuilder, this.responseErrorHandler);
        }

    }

}
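For reference, the Builder shown above can be used to construct the patched client directly against the local Xinference endpoint. This is a minimal sketch; the endpoint and the placeholder key ("not empty") are taken from the packet captures earlier in this article:

    // Minimal sketch: build the patched OpenAiApi against local Xinference.
    // Xinference does not validate the key, but the builder requires a non-null one.
    OpenAiApi openAiApi = new OpenAiApi.Builder()
            .baseUrl("http://192.168.3.100:9997")
            .apiKey("not empty")
            .build();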

The main modification is to this constructor (a hedged sketch of the rewiring follows the signature below):

public OpenAiApi(String baseUrl, ApiKey apiKey, MultiValueMap<String, String> headers, String completionsPath,
        String embeddingsPath, RestClient.Builder restClientBuilder, WebClient.Builder webClientBuilder,
        ResponseErrorHandler responseErrorHandler)
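On JDK 17 the stock constructor ends up on java.net.http.HttpClient, which is what produces the Connection: Upgrade, HTTP2-Settings and Transfer-encoding: chunked request seen in the capture (User-Agent: Java-http-client/17.0.14); without reactor-netty on the classpath, WebClient falls back to the same JDK connector. Below is an illustrative reconstruction of the kind of rewiring needed, not the exact patch, assuming Spring Framework 6.x's OkHttp3ClientHttpRequestFactory (deprecated but still shipped) and Reactor Netty pinned to HTTP/1.1:

    // Inside the constructor shown above. New imports this would need (assumed):
    //   java.util.function.Consumer, org.springframework.http.HttpHeaders,
    //   org.springframework.http.MediaType,
    //   org.springframework.http.client.OkHttp3ClientHttpRequestFactory,
    //   org.springframework.http.client.reactive.ReactorClientHttpConnector,
    //   reactor.netty.http.HttpProtocol, reactor.netty.http.client.HttpClient

    Consumer<HttpHeaders> finalHeaders = h -> {
        h.setContentType(MediaType.APPLICATION_JSON);
        h.addAll(headers);
    };

    // Blocking calls: OkHttp speaks plain HTTP/1.1 over cleartext, so the request
    // no longer carries "Connection: Upgrade, HTTP2-Settings" the way the default
    // JDK HttpClient does.
    this.restClient = restClientBuilder.clone()
            .baseUrl(baseUrl)
            .requestFactory(new OkHttp3ClientHttpRequestFactory())
            .defaultHeaders(finalHeaders)
            .defaultStatusHandler(responseErrorHandler)
            .build();

    // Streaming calls: pin Reactor Netty to HTTP/1.1 for the same reason.
    this.webClient = webClientBuilder.clone()
            .baseUrl(baseUrl)
            .clientConnector(new ReactorClientHttpConnector(
                    HttpClient.create().protocol(HttpProtocol.HTTP11)))
            .defaultHeaders(finalHeaders)
            .build();

    // apiKey handling, path assignments and assertions stay as in the stock
    // 1.0.0-M6 source.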

Add the following dependencies to the spring-ai-openai module's pom file:

<!-- production dependencies -->
<dependency>
    <groupId>com.squareup.okhttp3</groupId>
    <artifactId>okhttp</artifactId>
    <version>4.12.0</version>
</dependency>
<dependency>
    <groupId>io.projectreactor.netty</groupId>
    <artifactId>reactor-netty</artifactId>
    <version>1.3.0-M1</version>
</dependency>

Then build the module with Maven and install it into the local repository, for example:
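The module path below follows the spring-ai source tree layout and is an assumption, as is bumping the module version to 1.0.0-M6-XIN in its pom so the patched jar does not shadow the official artifact:

    # from the spring-ai source root (module path is an assumption)
    mvn -pl models/spring-ai-openai clean install -DskipTests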

To consume the patched build, exclude the stock spring-ai-openai from the starter and pin the custom version:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
    <exclusions>
        <exclusion>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-openai</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai</artifactId>
    <version>1.0.0-M6-XIN</version>
</dependency>
<dependency>
    <groupId>com.squareup.okhttp3</groupId>
    <artifactId>okhttp</artifactId>
    <version>4.12.0</version>
</dependency>
<dependency>
    <groupId>io.projectreactor.netty</groupId>
    <artifactId>reactor-netty</artifactId>
    <version>1.3.0-M1</version>
</dependency>

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.mzph.cn/bicheng/75985.shtml

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

刻意练习:如何从新手到大师

1. 练习方式 练习主要有两类&#xff1a;天真的练习和刻意练习。 所谓“天真的练习”&#xff0c;基本上只是反复地做某些事情&#xff0c;并指望只靠那种反复&#xff0c;就能提高表现和水平。一旦某个人的表现达到了“可接受”的水平&#xff0c;并且可以做到自动化&#x…

基于Java的人脸识别在线考试系统(jsp+springboot+mysql8.x)

基于Java的人脸识别在线考试系统(jspspringbootmysql8.x) 在线考试系统提供全面的考试管理和用户管理功能。登录界面支持管理员、教师和学生三种身份验证&#xff0c;确保不同用户访问相应的功能模块。系统自动组卷功能允许管理员根据不同科目和题型&#xff0c;如单选题、多选…

预测分析(二):基于机器学习的数值预测

文章目录 基于机器学习的数值预测机器学习简介监督学习的任务创建第一个机器学习模型机器学习的目标——泛化过拟合现象评价函数与最优化 建模前的数据处理进一步特征变换 多元线性回归模型LASSO回归kNN算法原理算法步骤k值的选择 基于机器学习的数值预测 机器学习是人工智能的…

批量压缩 jpg/png 等格式照片|批量调整图片的宽高尺寸

图片格式种类非常的多&#xff0c;并且不同的图片由于像素、尺寸不一样&#xff0c;可能占用的空间也会不一样。文件太大会占用较多的磁盘空间&#xff0c;传输及上传系统都非常不方便&#xff0c;可能会收到限制&#xff0c;因此我们经常会碰到需要对图片进行压缩的需求。如何…

生鲜果蔬便利店实体零售门店商城小程序

——线上线下融合赋能社区零售新生态 随着新零售模式的深化和消费者需求的升级&#xff0c;生鲜果蔬便利店亟需通过数字化工具实现经营效率与用户体验的双重提升。结合线下实体门店与线上商城的一体化小程序&#xff0c;成为行业转型的核心工具。以下从功能模块、运营策略及行…

如何开通google Free Tier长期免费云服务器(1C/1G)

Google宣布的一项政策&#xff0c;为标准层级的网络提供每地域200G的免费流量。两项政策结合&#xff0c;于是便可以得到一台1核心、1G内存、30G磁盘、200G流量的小云服务器&#xff0c;可玩性大大提高。这篇文章就分享一下如何正确开机&#xff0c;避免产生额外的费用。 免费…

C# 多线程并发编程基础

1. 线程基础 1.1 线程简介 C# 中的线程是操作系统能够进行运算调度的最小单位&#xff0c;它被包含在进程中&#xff0c;是进程中的实际运作单位。一个进程可以包含多个线程&#xff0c;这些线程可以并发执行不同的任务。 1.2 线程的创建与启动 在 C# 中&#xff0c;可以使…

【Introduction to Reinforcement Learning】翻译解读2

2.2 马尔可夫决策过程&#xff08;MDPs&#xff09; 马尔可夫决策过程&#xff08;MDP&#xff09;为顺序决策提供了框架&#xff0c;其中动作不仅影响即时奖励&#xff0c;还会影响未来结果。与多臂老虎机问题不同&#xff0c;MDP中的即时奖励与延迟奖励相平衡。在多臂老虎机…

STM32单片机入门学习——第22节: [7-2] AD单通道AD多通道

写这个文章是用来学习的,记录一下我的学习过程。希望我能一直坚持下去,我只是一个小白,只是想好好学习,我知道这会很难&#xff0c;但我还是想去做&#xff01; 本文写于&#xff1a;2025.04.07 STM32开发板学习——第22节: [7-2] AD单通道&AD多通道 前言开发板说明引用解…

Python高阶函数-filter

1. 基本概念 filter() 是Python内置的高阶函数&#xff0c;用于过滤序列中的元素。它接收一个函数和一个可迭代对象作为参数&#xff0c;返回一个迭代器&#xff0c;包含使函数返回True的所有元素。 filter(function, iterable)2. 工作原理 惰性计算&#xff1a;filter对象是…

密码学基础——分组密码的运行模式

前面的文章中文我们已经知道了分组密码是一种对称密钥密码体制&#xff0c;其工作原理可以概括为将明文消息分割成固定长度的分组&#xff0c;然后对每个分组分别进行加密处理。 下面介绍分组密码的运行模式 1.电码本模式&#xff08;ECB&#xff09; 2.密码分组链接模式&…

Redlinux(2025.3.29)

1、将你的虚拟机的网卡模式设置为nat模式&#xff0c;给虚拟机网卡配置三个主机位分别为100、200、168的ip地址。(以nmtui命令为例) 2、测试你的虚拟机是否能够ping通网关和dns&#xff0c;如果不能请修改网关和dns的地址。 首先打开虚拟网络编辑器查看NAT设置里的网关IP&…

【PalladiumZ2 使用专栏 1 -- 波形 trigger 抓取详细介绍】

文章目录 Palladium Z2 OverviewPalladium 波形抓取Palladium 波形存放文件创建Palladium Trigger 断点设置Palladium 加探针并 dumpPalladium 波形查看 Palladium Z2 Overview Cadence Palladium Z2 是 Cadence 推出的企业级硬件仿真加速平台&#xff0c;旨在应对复杂 SoC 设…

Redisson分布式锁:原理、使用

1. Redisson简介 Redisson是一个基于Redis的Java客户端库&#xff0c;提供了丰富的分布式对象和服务&#xff08;如分布式锁、信号量、Map等&#xff09;。其核心优势在于​​简化分布式锁的实现​​&#xff0c;并解决了原生Redis分布式锁的常见问题&#xff08;如死锁、误删…

Java大厂面试题 -- JVM 优化进阶之路:从原理到实战的深度剖析(2)

最近佳作推荐&#xff1a; Java大厂面试题 – 深度揭秘 JVM 优化&#xff1a;六道面试题与行业巨头实战解析&#xff08;1&#xff09;&#xff08;New&#xff09; 开源架构与人工智能的融合&#xff1a;开启技术新纪元&#xff08;New&#xff09; 开源架构的自动化测试策略优…

MySQL学习笔记(四)——DML和DQL

目录 1. DML 1.1 添加数据 1.1.1 给指定字段添加数据 1.1.2 给全部字段添加数据 1.1.3 批量添加数据 1.2 修改数据 1.3 删除数据 2. DQL 2.1 基本语法 2.2 基础查询 2.2.1 查询多个字段 2.2.2 字段设置别名 2.2.3 去除重复记录 2.3 条件查询 2.4 聚合函数 2.5 …

DeepSeek-MLA

MLA 结构 需要缓存 KV 向量共用的压缩隐特征K 向量多头共享的带位置编码的向量 为什么带有位置信息的 Q 向量来自于隐特征向量&#xff0c;而带有位置的 K 向量来自于 H 向量且共享呢&#xff1f; 最好的方法肯定是从H向量直接计算并且不共享&#xff0c;但是会大大增加显存使…

检索增强技术RAG和向量数据库技术的优势和劣势,应用范围和价值

RAG 和向量数据库在技术栈中处于不同层级&#xff0c;前者侧重生成任务的准确性与动态性&#xff0c;后者专注检索效率与扩展性。在实际应用中&#xff0c;二者常协同工作&#xff0c;但也可独立服务于不同场景。企业需根据需求选择&#xff1a;若需生成内容&#xff0c;RAG 是…

Python爬虫教程013:使用CrawlSpider爬取读书网数据并保存到mysql数据库

文章目录 3.8 CrawlSpider介绍3.9 CrawlSpider爬取读书网案例3.9.1 创建项目3.9.2 定义要爬取的数据结构3.9.3 获取数据3.9.4 保存数据到本地3.9.5 保存数据到mysql数据库3.9.6 完整项目下载3.8 CrawlSpider介绍 CrawlSpider 是 Scrapy 框架中 最常用的高级爬虫类之一,用于构…

Three.js 系列专题 5:加载外部模型

内容概述 Three.js 支持加载多种 3D 文件格式(如 GLTF、OBJ、FBX),这让开发者可以直接使用专业建模软件(如 Blender、Maya)创建的复杂模型。本专题将重点介绍 GLTF 格式的加载,并调整模型的位置和材质。 学习目标 理解常见 3D 文件格式及其特点。掌握使用 GLTFLoader 加…