Implementing RAG with Spring AI + pgvector + Ollama

        First, install mofanke/dmeta-embedding-zh:latest in Ollama by executing ollama run mofanke/dmeta-embedding-zh. This model converts text into vector data (embeddings).
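
        Once the model is pulled, Spring AI turns text into vectors through its EmbeddingModel abstraction, which the Ollama starter auto-configures from the yml in step 2. A minimal sketch of what that call looks like; the class and field names here are illustrative and not part of the original project:

import org.springframework.ai.embedding.EmbeddingModel;
import org.springframework.stereotype.Component;

import java.util.List;

// Illustrative component: "text in, vector out" is a single call on the injected EmbeddingModel.
@Component
public class EmbeddingSmokeTest {

    private final EmbeddingModel embeddingModel;

    public EmbeddingSmokeTest(EmbeddingModel embeddingModel) {
        this.embeddingModel = embeddingModel;
    }

    public List<Double> embed(String text) {
        // mofanke/dmeta-embedding-zh should return a 768-dimensional vector here
        return embeddingModel.embed(text);
    }
}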

        Next, install pgvector (pgAdmin 4 is recommended as the GUI tool; with Navicat the tables may not show up).

        With the required software installed, we can start coding.

1: Add the following to the pom file (the Spring AI BOM and the Ollama starter, spring-ai-ollama-spring-boot-starter, are assumed to be in the project already, since the yml below configures Ollama chat and embedding models):

<!-- JDBC support for connecting to PostgreSQL -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-jdbc</artifactId>
</dependency>
<!-- PgVectorStore for working with the vector database -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-pgvector-store-spring-boot-starter</artifactId>
</dependency>
<!-- PDF parsing -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-pdf-document-reader</artifactId>
</dependency>
<!-- General document parsing (Tika) -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-tika-document-reader</artifactId>
</dependency>

2: Configure the following in the yml file:

spring:
  datasource:
    url: jdbc:postgresql://127.0.0.1:5432/postgres
    username: postgres
    password: password
  ai:
    vectorstore:
      pgvector:
        dimensions: 768   # set to match the embedding model you use
    ollama:
      base-url: http://127.0.0.1:11434
      chat:
        enabled: true
        options:
          model: qwen2:7b
      embedding:
        model: mofanke/dmeta-embedding-zh

3: Add the following to the controller:

/**
 * Embed a file.
 *
 * @param file the file to embed
 * @return the document chunks that were stored
 */
@SneakyThrows
@PostMapping("embedding")
public List<Document> embedding(@RequestParam MultipartFile file) {
    // Read the file from the input stream
    TikaDocumentReader tikaDocumentReader = new TikaDocumentReader(new InputStreamResource(file.getInputStream()));
    // Split the text content into smaller chunks
    List<Document> splitDocuments = new TokenTextSplitter().apply(tikaDocumentReader.read());
    // Store the chunks in the vector database; this automatically calls the embedding model
    // to turn each chunk into a vector before inserting it.
    vector.add(splitDocuments);
    return splitDocuments;
}

Calling the endpoint above converts a document into vectors and stores them in pgvector.
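
For reference, the controller methods in this post assume roughly the following skeleton. The class name and injection style are my assumptions, and the import paths can differ between Spring AI versions; the two-argument similaritySearch(message, "test_store") call in the chat endpoint below suggests the vector field is actually the ExtendPgVectorStore built later in this post, while the stock PgVectorStore would use the single-argument calls:

import jakarta.annotation.Resource;
import org.springframework.ai.ollama.OllamaChatModel;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.web.bind.annotation.RestController;

// Illustrative skeleton only; names are assumptions, not from the original post.
@RestController
public class RagController {

    @Resource
    private VectorStore vector;              // or the ExtendPgVectorStore introduced later in this post

    @Resource
    private OllamaChatModel ollamaChatModel; // the qwen2:7b chat model configured in the yml

    // the embedding(...) and chatToPgVector(...) endpoints shown in this post live here
}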

4: Handle chat requests: first retrieve the matching data from pgvector based on the chat message, then hand the results to the qwen2 model to analyze and generate the answer.

/**
 * Build the prompt.
 *
 * @param message the user question
 * @param context the retrieved context
 * @return the prompt
 */
private String getChatPrompt2String(String message, String context) {
    String promptText = """
            请用仅用以下内容回答"%s" ,输出结果仅在以下内容中,输出内容仅以下内容,不需要其他描述词:
            %s
            """;
    return String.format(promptText, message, context);
}

@GetMapping("chatToPgVector")
public String chatToPgVector(String message) {
    // 1. Prompt template; question_answer_context is replaced with the documents retrieved from the vector database.
    String promptWithContext = """
            你是一个代码程序,你需要在文本中获取信息并输出成json格式的数据,下面是上下文信息
            ---------------------
            {question_answer_context}
            ---------------------
            给定的上下文和提供的历史信息,而不是事先的知识,回复用户的意见。如果答案不在上下文中,告诉用户你不能回答这个问题。
            """;
    // Query the vector store for matching documents
    List<Document> documents = vector.similaritySearch(message, "test_store");
    // Extract the text content
    String content = documents.stream().map(Document::getContent).collect(Collectors.joining("\n"));
    System.out.println(content);
    // Build the prompt and call the LLM
    String chatResponse = ollamaChatModel.call(getChatPrompt2String(message, content));
    return chatResponse;
    /*
    return ChatClient.create(ollamaChatModel)
            .prompt()
            .user(message)
            // 2. QuestionAnswerAdvisor replaces the `question_answer_context` placeholder at runtime with the documents
            //    retrieved from the vector database; the final query = the user question + the filled-in prompt template.
            .advisors(new QuestionAnswerAdvisor(vectorStore, SearchRequest.defaults(), promptWithContext))
            .call()
            .content();
    */
}
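
The chat endpoint above passes the raw query string to similaritySearch, which uses the default topK and similarity threshold. If you need to tune recall, a SearchRequest can be built explicitly. A sketch assuming the two-argument overload of the extended store introduced later in this post; withTopK/withSimilarityThreshold are the builder methods of the Spring AI milestone releases this post appears to use and may differ in newer versions:

// Hypothetical tuning sketch; adjust the values to your data.
List<Document> documents = vector.similaritySearch(
        SearchRequest.query(message)
                .withTopK(5)                     // return at most 5 chunks
                .withSimilarityThreshold(0.3),   // drop weakly related chunks
        "test_store");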

With that, a simple RAG (retrieval-augmented generation) demo is complete. Next, let's look at what PgVectorStore does for us.

//
// Source code recreated from a .class file by IntelliJ IDEA
// (powered by FernFlower decompiler)
//

package org.springframework.ai.vectorstore;

import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.pgvector.PGvector;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.stream.IntStream;
import org.postgresql.util.PGobject;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.ai.document.Document;
import org.springframework.ai.embedding.EmbeddingModel;
import org.springframework.ai.vectorstore.filter.FilterExpressionConverter;
import org.springframework.ai.vectorstore.filter.converter.PgVectorFilterExpressionConverter;
import org.springframework.beans.factory.InitializingBean;
import org.springframework.jdbc.core.BatchPreparedStatementSetter;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.core.RowMapper;
import org.springframework.jdbc.core.StatementCreatorUtils;
import org.springframework.lang.Nullable;
import org.springframework.util.StringUtils;

public class PgVectorStore implements VectorStore, InitializingBean {

    private static final Logger logger = LoggerFactory.getLogger(PgVectorStore.class);
    public static final int OPENAI_EMBEDDING_DIMENSION_SIZE = 1536;
    public static final int INVALID_EMBEDDING_DIMENSION = -1;
    public static final String VECTOR_TABLE_NAME = "vector_store";
    public static final String VECTOR_INDEX_NAME = "spring_ai_vector_index";
    public final FilterExpressionConverter filterExpressionConverter;
    private final JdbcTemplate jdbcTemplate;
    private final EmbeddingModel embeddingModel;
    private int dimensions;
    private PgDistanceType distanceType;
    private ObjectMapper objectMapper;
    private boolean removeExistingVectorStoreTable;
    private PgIndexType createIndexMethod;
    private final boolean initializeSchema;

    public PgVectorStore(JdbcTemplate jdbcTemplate, EmbeddingModel embeddingModel) {
        this(jdbcTemplate, embeddingModel, -1, PgVectorStore.PgDistanceType.COSINE_DISTANCE, false, PgVectorStore.PgIndexType.NONE, false);
    }

    public PgVectorStore(JdbcTemplate jdbcTemplate, EmbeddingModel embeddingModel, int dimensions) {
        this(jdbcTemplate, embeddingModel, dimensions, PgVectorStore.PgDistanceType.COSINE_DISTANCE, false, PgVectorStore.PgIndexType.NONE, false);
    }

    public PgVectorStore(JdbcTemplate jdbcTemplate, EmbeddingModel embeddingModel, int dimensions, PgDistanceType distanceType, boolean removeExistingVectorStoreTable, PgIndexType createIndexMethod, boolean initializeSchema) {
        this.filterExpressionConverter = new PgVectorFilterExpressionConverter();
        this.objectMapper = new ObjectMapper();
        this.jdbcTemplate = jdbcTemplate;
        this.embeddingModel = embeddingModel;
        this.dimensions = dimensions;
        this.distanceType = distanceType;
        this.removeExistingVectorStoreTable = removeExistingVectorStoreTable;
        this.createIndexMethod = createIndexMethod;
        this.initializeSchema = initializeSchema;
    }

    public PgDistanceType getDistanceType() {
        return this.distanceType;
    }

    public void add(final List<Document> documents) {
        final int size = documents.size();
        this.jdbcTemplate.batchUpdate("INSERT INTO vector_store (id, content, metadata, embedding) VALUES (?, ?, ?::jsonb, ?) ON CONFLICT (id) DO UPDATE SET content = ? , metadata = ?::jsonb , embedding = ? ", new BatchPreparedStatementSetter() {
            public void setValues(PreparedStatement ps, int i) throws SQLException {
                Document document = (Document)documents.get(i);
                String content = document.getContent();
                String json = PgVectorStore.this.toJson(document.getMetadata());
                PGvector pGvector = new PGvector(PgVectorStore.this.toFloatArray(PgVectorStore.this.embeddingModel.embed(document)));
                StatementCreatorUtils.setParameterValue(ps, 1, Integer.MIN_VALUE, UUID.fromString(document.getId()));
                StatementCreatorUtils.setParameterValue(ps, 2, Integer.MIN_VALUE, content);
                StatementCreatorUtils.setParameterValue(ps, 3, Integer.MIN_VALUE, json);
                StatementCreatorUtils.setParameterValue(ps, 4, Integer.MIN_VALUE, pGvector);
                StatementCreatorUtils.setParameterValue(ps, 5, Integer.MIN_VALUE, content);
                StatementCreatorUtils.setParameterValue(ps, 6, Integer.MIN_VALUE, json);
                StatementCreatorUtils.setParameterValue(ps, 7, Integer.MIN_VALUE, pGvector);
            }

            public int getBatchSize() {
                return size;
            }
        });
    }

    private String toJson(Map<String, Object> map) {
        try {
            return this.objectMapper.writeValueAsString(map);
        } catch (JsonProcessingException var3) {
            throw new RuntimeException(var3);
        }
    }

    private float[] toFloatArray(List<Double> embeddingDouble) {
        float[] embeddingFloat = new float[embeddingDouble.size()];
        int i = 0;

        Double d;
        for(Iterator var4 = embeddingDouble.iterator(); var4.hasNext(); embeddingFloat[i++] = d.floatValue()) {
            d = (Double)var4.next();
        }

        return embeddingFloat;
    }

    public Optional<Boolean> delete(List<String> idList) {
        int updateCount = 0;

        int count;
        for(Iterator var3 = idList.iterator(); var3.hasNext(); updateCount += count) {
            String id = (String)var3.next();
            count = this.jdbcTemplate.update("DELETE FROM vector_store WHERE id = ?", new Object[]{UUID.fromString(id)});
        }

        return Optional.of(updateCount == idList.size());
    }

    public List<Document> similaritySearch(SearchRequest request) {
        String nativeFilterExpression = request.getFilterExpression() != null ? this.filterExpressionConverter.convertExpression(request.getFilterExpression()) : "";
        String jsonPathFilter = "";
        if (StringUtils.hasText(nativeFilterExpression)) {
            jsonPathFilter = " AND metadata::jsonb @@ '" + nativeFilterExpression + "'::jsonpath ";
        }

        double distance = 1.0 - request.getSimilarityThreshold();
        PGvector queryEmbedding = this.getQueryEmbedding(request.getQuery());
        return this.jdbcTemplate.query(String.format(this.getDistanceType().similaritySearchSqlTemplate, "vector_store", jsonPathFilter), new DocumentRowMapper(this.objectMapper), new Object[]{queryEmbedding, queryEmbedding, distance, request.getTopK()});
    }

    public List<Double> embeddingDistance(String query) {
        return this.jdbcTemplate.query("SELECT embedding " + this.comparisonOperator() + " ? AS distance FROM vector_store", new RowMapper<Double>() {
            @Nullable
            public Double mapRow(ResultSet rs, int rowNum) throws SQLException {
                return rs.getDouble("distance");
            }
        }, new Object[]{this.getQueryEmbedding(query)});
    }

    private PGvector getQueryEmbedding(String query) {
        List<Double> embedding = this.embeddingModel.embed(query);
        return new PGvector(this.toFloatArray(embedding));
    }

    private String comparisonOperator() {
        return this.getDistanceType().operator;
    }

    public void afterPropertiesSet() throws Exception {
        if (this.initializeSchema) {
            this.jdbcTemplate.execute("CREATE EXTENSION IF NOT EXISTS vector");
            this.jdbcTemplate.execute("CREATE EXTENSION IF NOT EXISTS hstore");
            this.jdbcTemplate.execute("CREATE EXTENSION IF NOT EXISTS \"uuid-ossp\"");
            if (this.removeExistingVectorStoreTable) {
                this.jdbcTemplate.execute("DROP TABLE IF EXISTS vector_store");
            }

            this.jdbcTemplate.execute(String.format("CREATE TABLE IF NOT EXISTS %s (\n\tid uuid DEFAULT uuid_generate_v4() PRIMARY KEY,\n\tcontent text,\n\tmetadata json,\n\tembedding vector(%d)\n)\n", "vector_store", this.embeddingDimensions()));
            if (this.createIndexMethod != PgVectorStore.PgIndexType.NONE) {
                this.jdbcTemplate.execute(String.format("CREATE INDEX IF NOT EXISTS %s ON %s USING %s (embedding %s)\n", "spring_ai_vector_index", "vector_store", this.createIndexMethod, this.getDistanceType().index));
            }
        }
    }

    int embeddingDimensions() {
        if (this.dimensions > 0) {
            return this.dimensions;
        } else {
            try {
                int embeddingDimensions = this.embeddingModel.dimensions();
                if (embeddingDimensions > 0) {
                    return embeddingDimensions;
                }
            } catch (Exception var2) {
                logger.warn("Failed to obtain the embedding dimensions from the embedding model and fall backs to default:1536", var2);
            }

            return 1536;
        }
    }

    public static enum PgDistanceType {
        EUCLIDEAN_DISTANCE("<->", "vector_l2_ops", "SELECT *, embedding <-> ? AS distance FROM %s WHERE embedding <-> ? < ? %s ORDER BY distance LIMIT ? "),
        NEGATIVE_INNER_PRODUCT("<#>", "vector_ip_ops", "SELECT *, (1 + (embedding <#> ?)) AS distance FROM %s WHERE (1 + (embedding <#> ?)) < ? %s ORDER BY distance LIMIT ? "),
        COSINE_DISTANCE("<=>", "vector_cosine_ops", "SELECT *, embedding <=> ? AS distance FROM %s WHERE embedding <=> ? < ? %s ORDER BY distance LIMIT ? ");

        public final String operator;
        public final String index;
        public final String similaritySearchSqlTemplate;

        private PgDistanceType(String operator, String index, String sqlTemplate) {
            this.operator = operator;
            this.index = index;
            this.similaritySearchSqlTemplate = sqlTemplate;
        }
    }

    public static enum PgIndexType {
        NONE,
        IVFFLAT,
        HNSW;

        private PgIndexType() {
        }
    }

    private static class DocumentRowMapper implements RowMapper<Document> {
        private static final String COLUMN_EMBEDDING = "embedding";
        private static final String COLUMN_METADATA = "metadata";
        private static final String COLUMN_ID = "id";
        private static final String COLUMN_CONTENT = "content";
        private static final String COLUMN_DISTANCE = "distance";
        private ObjectMapper objectMapper;

        public DocumentRowMapper(ObjectMapper objectMapper) {
            this.objectMapper = objectMapper;
        }

        public Document mapRow(ResultSet rs, int rowNum) throws SQLException {
            String id = rs.getString("id");
            String content = rs.getString("content");
            PGobject pgMetadata = (PGobject)rs.getObject("metadata", PGobject.class);
            PGobject embedding = (PGobject)rs.getObject("embedding", PGobject.class);
            Float distance = rs.getFloat("distance");
            Map<String, Object> metadata = this.toMap(pgMetadata);
            metadata.put("distance", distance);
            Document document = new Document(id, content, metadata);
            document.setEmbedding(this.toDoubleList(embedding));
            return document;
        }

        private List<Double> toDoubleList(PGobject embedding) throws SQLException {
            float[] floatArray = (new PGvector(embedding.getValue())).toArray();
            return IntStream.range(0, floatArray.length).mapToDouble((i) -> {
                return (double)floatArray[i];
            }).boxed().toList();
        }

        private Map<String, Object> toMap(PGobject pgObject) {
            String source = pgObject.getValue();

            try {
                return (Map)this.objectMapper.readValue(source, Map.class);
            } catch (JsonProcessingException var4) {
                throw new RuntimeException(var4);
            }
        }
    }
}

As we can see, PgVectorStore implements InitializingBean and overrides afterPropertiesSet, which runs once the bean's properties have been set.

public void afterPropertiesSet() throws Exception {
    if (this.initializeSchema) {
        this.jdbcTemplate.execute("CREATE EXTENSION IF NOT EXISTS vector");
        this.jdbcTemplate.execute("CREATE EXTENSION IF NOT EXISTS hstore");
        this.jdbcTemplate.execute("CREATE EXTENSION IF NOT EXISTS \"uuid-ossp\"");
        if (this.removeExistingVectorStoreTable) {
            this.jdbcTemplate.execute("DROP TABLE IF EXISTS vector_store");
        }

        this.jdbcTemplate.execute(String.format("CREATE TABLE IF NOT EXISTS %s (\n\tid uuid DEFAULT uuid_generate_v4() PRIMARY KEY,\n\tcontent text,\n\tmetadata json,\n\tembedding vector(%d)\n)\n", "vector_store", this.embeddingDimensions()));
        if (this.createIndexMethod != PgVectorStore.PgIndexType.NONE) {
            this.jdbcTemplate.execute(String.format("CREATE INDEX IF NOT EXISTS %s ON %s USING %s (embedding %s)\n", "spring_ai_vector_index", "vector_store", this.createIndexMethod, this.getDistanceType().index));
        }
    }
}

Here it uses initializeSchema (defined in PgVectorStoreProperties, default true; it can be disabled by setting spring.ai.vectorstore.pgvector.initialize-schema: false in the yml) to decide whether to create the table for us. It creates a table named vector_store with the columns id (uuid), metadata (json), content (text) and embedding (vector(1536)), where 1536 is the dimensions value. If we store data through pgvector using this default table, we can hit an error like ERROR: expected 1536 dimensions, not 768. It means the embedding model in Ollama outputs 768-dimensional vectors while the embedding column in pgvector is declared as vector(1536); because they don't match, the data cannot be stored. The fix is to change the embedding column's dimension to 768, for example with ALTER TABLE vector_store ALTER COLUMN embedding TYPE vector(768), or to make sure spring.ai.vectorstore.pgvector.dimensions is set to 768 (as in the yml above) before the table is first created. Different models return different dimension values, so adjust according to the error message.
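
If you are not sure which dimension your embedding model produces, you can ask the model itself at startup; EmbeddingModel.dimensions() is the same call PgVectorStore falls back to internally. A small illustrative runner (the class name is mine, not from the original post):

import org.springframework.ai.embedding.EmbeddingModel;
import org.springframework.boot.CommandLineRunner;
import org.springframework.stereotype.Component;

// Illustrative helper: logs the embedding dimension so the vector(N) column size can be set correctly.
@Component
public class EmbeddingDimensionLogger implements CommandLineRunner {

    private final EmbeddingModel embeddingModel;

    public EmbeddingDimensionLogger(EmbeddingModel embeddingModel) {
        this.embeddingModel = embeddingModel;
    }

    @Override
    public void run(String... args) {
        // for mofanke/dmeta-embedding-zh this should print 768
        System.out.println("Embedding dimensions: " + embeddingModel.dimensions());
    }
}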

Next, let's look at the core operation: inserting data into the database.

public void add(final List<Document> documents) {
    final int size = documents.size();
    this.jdbcTemplate.batchUpdate("INSERT INTO vector_store (id, content, metadata, embedding) VALUES (?, ?, ?::jsonb, ?) ON CONFLICT (id) DO UPDATE SET content = ? , metadata = ?::jsonb , embedding = ? ", new BatchPreparedStatementSetter() {
        public void setValues(PreparedStatement ps, int i) throws SQLException {
            Document document = (Document)documents.get(i);
            String content = document.getContent();
            String json = PgVectorStore.this.toJson(document.getMetadata());
            PGvector pGvector = new PGvector(PgVectorStore.this.toFloatArray(PgVectorStore.this.embeddingModel.embed(document)));
            StatementCreatorUtils.setParameterValue(ps, 1, Integer.MIN_VALUE, UUID.fromString(document.getId()));
            StatementCreatorUtils.setParameterValue(ps, 2, Integer.MIN_VALUE, content);
            StatementCreatorUtils.setParameterValue(ps, 3, Integer.MIN_VALUE, json);
            StatementCreatorUtils.setParameterValue(ps, 4, Integer.MIN_VALUE, pGvector);
            StatementCreatorUtils.setParameterValue(ps, 5, Integer.MIN_VALUE, content);
            StatementCreatorUtils.setParameterValue(ps, 6, Integer.MIN_VALUE, json);
            StatementCreatorUtils.setParameterValue(ps, 7, Integer.MIN_VALUE, pGvector);
        }

        public int getBatchSize() {
            return size;
        }
    });
}

Because Spring AI is still new and not yet a stable release, the table name is hard-coded here: with PgVectorStore we can only operate on the vector_store table, which can be limiting in real projects. So we can write our own extended class to replace it, as follows:

package com.lccloud.tenderdocument.vector;

import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.pgvector.PGvector;
import org.postgresql.util.PGobject;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.ai.document.Document;
import org.springframework.ai.embedding.EmbeddingModel;
import org.springframework.ai.vectorstore.PgVectorStore;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.ai.vectorstore.filter.FilterExpressionConverter;
import org.springframework.ai.vectorstore.filter.converter.PgVectorFilterExpressionConverter;
import org.springframework.beans.factory.InitializingBean;
import org.springframework.jdbc.core.BatchPreparedStatementSetter;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.core.RowMapper;
import org.springframework.jdbc.core.StatementCreatorUtils;
import org.springframework.lang.Nullable;
import org.springframework.util.StringUtils;

import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.*;
import java.util.stream.IntStream;

public class ExtendPgVectorStore {

    private static final Logger logger = LoggerFactory.getLogger(ExtendPgVectorStore.class);

    public final FilterExpressionConverter filterExpressionConverter;
    private final JdbcTemplate jdbcTemplate;
    private final EmbeddingModel embeddingModel;
    private int dimensions;
    private PgVectorStore.PgDistanceType distanceType;
    private ObjectMapper objectMapper;
    private boolean removeExistingVectorStoreTable;
    private PgVectorStore.PgIndexType createIndexMethod;
    private final boolean initializeSchema;

    public ExtendPgVectorStore(JdbcTemplate jdbcTemplate, EmbeddingModel embeddingModel) {
        this(jdbcTemplate, embeddingModel, -1, PgVectorStore.PgDistanceType.COSINE_DISTANCE, false, PgVectorStore.PgIndexType.NONE, false);
    }

    public ExtendPgVectorStore(JdbcTemplate jdbcTemplate, EmbeddingModel embeddingModel, int dimensions) {
        this(jdbcTemplate, embeddingModel, dimensions, PgVectorStore.PgDistanceType.COSINE_DISTANCE, false, PgVectorStore.PgIndexType.NONE, false);
    }

    public ExtendPgVectorStore(JdbcTemplate jdbcTemplate, EmbeddingModel embeddingModel, int dimensions, PgVectorStore.PgDistanceType distanceType, boolean removeExistingVectorStoreTable, PgVectorStore.PgIndexType createIndexMethod, boolean initializeSchema) {
        this.filterExpressionConverter = new PgVectorFilterExpressionConverter();
        this.objectMapper = new ObjectMapper();
        this.jdbcTemplate = jdbcTemplate;
        this.embeddingModel = embeddingModel;
        this.dimensions = dimensions;
        this.distanceType = distanceType;
        this.removeExistingVectorStoreTable = removeExistingVectorStoreTable;
        this.createIndexMethod = createIndexMethod;
        this.initializeSchema = initializeSchema;
    }

    public PgVectorStore.PgDistanceType getDistanceType() {
        return this.distanceType;
    }

    public void add(final List<Document> documents, String tableName) {
        final int size = documents.size();
        this.jdbcTemplate.batchUpdate("INSERT INTO " + tableName + " (id, content, metadata, embedding) VALUES (?, ?, ?::jsonb, ?) ON CONFLICT (id) DO UPDATE SET content = ? , metadata = ?::jsonb , embedding = ? ", new BatchPreparedStatementSetter() {
            public void setValues(PreparedStatement ps, int i) throws SQLException {
                Document document = (Document)documents.get(i);
                String content = document.getContent();
                String json = ExtendPgVectorStore.this.toJson(document.getMetadata());
                PGvector pGvector = new PGvector(ExtendPgVectorStore.this.toFloatArray(ExtendPgVectorStore.this.embeddingModel.embed(document)));
                StatementCreatorUtils.setParameterValue(ps, 1, Integer.MIN_VALUE, UUID.fromString(document.getId()));
                StatementCreatorUtils.setParameterValue(ps, 2, Integer.MIN_VALUE, content);
                StatementCreatorUtils.setParameterValue(ps, 3, Integer.MIN_VALUE, json);
                StatementCreatorUtils.setParameterValue(ps, 4, Integer.MIN_VALUE, pGvector);
                StatementCreatorUtils.setParameterValue(ps, 5, Integer.MIN_VALUE, content);
                StatementCreatorUtils.setParameterValue(ps, 6, Integer.MIN_VALUE, json);
                StatementCreatorUtils.setParameterValue(ps, 7, Integer.MIN_VALUE, pGvector);
            }

            public int getBatchSize() {
                return size;
            }
        });
    }

    private String toJson(Map<String, Object> map) {
        try {
            return this.objectMapper.writeValueAsString(map);
        } catch (JsonProcessingException var3) {
            throw new RuntimeException(var3);
        }
    }

    private float[] toFloatArray(List<Double> embeddingDouble) {
        float[] embeddingFloat = new float[embeddingDouble.size()];
        int i = 0;

        Double d;
        for(Iterator var4 = embeddingDouble.iterator(); var4.hasNext(); embeddingFloat[i++] = d.floatValue()) {
            d = (Double)var4.next();
        }

        return embeddingFloat;
    }

    public Optional<Boolean> delete(List<String> idList, String tableName) {
        int updateCount = 0;

        int count;
        for(Iterator var3 = idList.iterator(); var3.hasNext(); updateCount += count) {
            String id = (String)var3.next();
            count = this.jdbcTemplate.update("DELETE FROM " + tableName + " WHERE id = ?", new Object[]{UUID.fromString(id)});
        }

        return Optional.of(updateCount == idList.size());
    }

    public List<Document> similaritySearch(String query, String tableName) {
        return this.similaritySearch(SearchRequest.query(query), tableName);
    }

    public List<Document> similaritySearch(SearchRequest request, String tableName) {
        String nativeFilterExpression = request.getFilterExpression() != null ? this.filterExpressionConverter.convertExpression(request.getFilterExpression()) : "";
        String jsonPathFilter = "";
        if (StringUtils.hasText(nativeFilterExpression)) {
            jsonPathFilter = " AND metadata::jsonb @@ '" + nativeFilterExpression + "'::jsonpath ";
        }

        double distance = 1.0 - request.getSimilarityThreshold();
        PGvector queryEmbedding = this.getQueryEmbedding(request.getQuery());
        return this.jdbcTemplate.query(String.format(this.getDistanceType().similaritySearchSqlTemplate, tableName, jsonPathFilter), new ExtendPgVectorStore.DocumentRowMapper(this.objectMapper), new Object[]{queryEmbedding, queryEmbedding, distance, request.getTopK()});
    }

    public List<Double> embeddingDistance(String query, String tableName) {
        // query the caller-supplied table here as well
        return this.jdbcTemplate.query("SELECT embedding " + this.comparisonOperator() + " ? AS distance FROM " + tableName, new RowMapper<Double>() {
            @Nullable
            public Double mapRow(ResultSet rs, int rowNum) throws SQLException {
                return rs.getDouble("distance");
            }
        }, new Object[]{this.getQueryEmbedding(query)});
    }

    private PGvector getQueryEmbedding(String query) {
        List<Double> embedding = this.embeddingModel.embed(query);
        return new PGvector(this.toFloatArray(embedding));
    }

    private String comparisonOperator() {
        return this.getDistanceType().operator;
    }

    /*
    public void afterPropertiesSet() throws Exception {
        if (this.initializeSchema) {
            this.jdbcTemplate.execute("CREATE EXTENSION IF NOT EXISTS vector");
            this.jdbcTemplate.execute("CREATE EXTENSION IF NOT EXISTS hstore");
            this.jdbcTemplate.execute("CREATE EXTENSION IF NOT EXISTS \"uuid-ossp\"");
            if (this.removeExistingVectorStoreTable) {
                this.jdbcTemplate.execute("DROP TABLE IF EXISTS vector_store");
            }

            this.jdbcTemplate.execute(String.format("CREATE TABLE IF NOT EXISTS %s (\n\tid uuid DEFAULT uuid_generate_v4() PRIMARY KEY,\n\tcontent text,\n\tmetadata json,\n\tembedding vector(%d)\n)\n", "vector_store", this.embeddingDimensions()));
            if (this.createIndexMethod != PgVectorStore.PgIndexType.NONE) {
                this.jdbcTemplate.execute(String.format("CREATE INDEX IF NOT EXISTS %s ON %s USING %s (embedding %s)\n", "spring_ai_vector_index", "vector_store", this.createIndexMethod, this.getDistanceType().index));
            }
        }
    }
    */

    int embeddingDimensions() {
        if (this.dimensions > 0) {
            return this.dimensions;
        } else {
            try {
                int embeddingDimensions = this.embeddingModel.dimensions();
                if (embeddingDimensions > 0) {
                    return embeddingDimensions;
                }
            } catch (Exception var2) {
                logger.warn("Failed to obtain the embedding dimensions from the embedding model and fall backs to default:1536", var2);
            }

            return 1536;
        }
    }

    public static enum PgDistanceType {
        EUCLIDEAN_DISTANCE("<->", "vector_l2_ops", "SELECT *, embedding <-> ? AS distance FROM %s WHERE embedding <-> ? < ? %s ORDER BY distance LIMIT ? "),
        NEGATIVE_INNER_PRODUCT("<#>", "vector_ip_ops", "SELECT *, (1 + (embedding <#> ?)) AS distance FROM %s WHERE (1 + (embedding <#> ?)) < ? %s ORDER BY distance LIMIT ? "),
        COSINE_DISTANCE("<=>", "vector_cosine_ops", "SELECT *, embedding <=> ? AS distance FROM %s WHERE embedding <=> ? < ? %s ORDER BY distance LIMIT ? ");

        public final String operator;
        public final String index;
        public final String similaritySearchSqlTemplate;

        private PgDistanceType(String operator, String index, String sqlTemplate) {
            this.operator = operator;
            this.index = index;
            this.similaritySearchSqlTemplate = sqlTemplate;
        }
    }

    public static enum PgIndexType {
        NONE,
        IVFFLAT,
        HNSW;

        private PgIndexType() {
        }
    }

    private static class DocumentRowMapper implements RowMapper<Document> {
        private static final String COLUMN_EMBEDDING = "embedding";
        private static final String COLUMN_METADATA = "metadata";
        private static final String COLUMN_ID = "id";
        private static final String COLUMN_CONTENT = "content";
        private static final String COLUMN_DISTANCE = "distance";
        private ObjectMapper objectMapper;

        public DocumentRowMapper(ObjectMapper objectMapper) {
            this.objectMapper = objectMapper;
        }

        public Document mapRow(ResultSet rs, int rowNum) throws SQLException {
            String id = rs.getString("id");
            String content = rs.getString("content");
            PGobject pgMetadata = (PGobject)rs.getObject("metadata", PGobject.class);
            PGobject embedding = (PGobject)rs.getObject("embedding", PGobject.class);
            Float distance = rs.getFloat("distance");
            Map<String, Object> metadata = this.toMap(pgMetadata);
            metadata.put("distance", distance);
            Document document = new Document(id, content, metadata);
            document.setEmbedding(this.toDoubleList(embedding));
            return document;
        }

        private List<Double> toDoubleList(PGobject embedding) throws SQLException {
            float[] floatArray = (new PGvector(embedding.getValue())).toArray();
            return IntStream.range(0, floatArray.length).mapToDouble((i) -> {
                return (double)floatArray[i];
            }).boxed().toList();
        }

        private Map<String, Object> toMap(PGobject pgObject) {
            String source = pgObject.getValue();

            try {
                return (Map)this.objectMapper.readValue(source, Map.class);
            } catch (JsonProcessingException var4) {
                throw new RuntimeException(var4);
            }
        }
    }
}

To use this ExtendPgVectorStore, we first have to keep the original PgVectorStore from being registered as a bean.
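
One way to do that is to exclude the pgvector starter's auto-configuration. A minimal sketch, assuming the auto-configuration class is org.springframework.ai.autoconfigure.vectorstore.pgvector.PgVectorStoreAutoConfiguration (the exact class name can differ between Spring AI milestone versions, and the application class name here is illustrative):

import org.springframework.ai.autoconfigure.vectorstore.pgvector.PgVectorStoreAutoConfiguration; // assumed class name
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

// Excluding the starter's auto-configuration keeps it from registering the stock PgVectorStore bean,
// so only the ExtendPgVectorStore registered below gets injected.
@SpringBootApplication(exclude = PgVectorStoreAutoConfiguration.class)
public class RagApplication {
    public static void main(String[] args) {
        SpringApplication.run(RagApplication.class, args);
    }
}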

Then we register our own ExtendPgVectorStore bean:


import com.lccloud.tenderdocument.vector.ExtendPgVectorStore;
import org.springframework.ai.autoconfigure.vectorstore.pgvector.PgVectorStoreProperties;
import org.springframework.ai.embedding.EmbeddingModel;
import org.springframework.ai.transformer.splitter.TokenTextSplitter;
import org.springframework.boot.autoconfigure.AutoConfigureAfter;
import org.springframework.boot.autoconfigure.jdbc.JdbcTemplateAutoConfiguration;
import org.springframework.boot.context.properties.EnableConfigurationProperties;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.core.JdbcTemplate;

@Configuration
@AutoConfigureAfter(JdbcTemplateAutoConfiguration.class)
@EnableConfigurationProperties({PgVectorStoreProperties.class})
public class PgVectorConfig {

    public PgVectorConfig() {
    }

    /**
     * Vector store bean used to query and write the vector database.
     */
    @Bean
    public ExtendPgVectorStore vectorStore(JdbcTemplate jdbcTemplate, EmbeddingModel embeddingModel, PgVectorStoreProperties properties) {
        boolean initializeSchema = properties.isInitializeSchema();
        return new ExtendPgVectorStore(jdbcTemplate, embeddingModel, properties.getDimensions(), properties.getDistanceType(), properties.isRemoveExistingVectorStoreTable(), properties.getIndexType(), initializeSchema);
    }

    /**
     * Text splitter.
     */
    @Bean
    public TokenTextSplitter tokenTextSplitter() {
        return new TokenTextSplitter();
    }
}

The PgVectorStoreProperties used above could also be replaced with a properties class of our own (I was lazy here and just reused the one shipped with the pgvector store). From then on, we can inject ExtendPgVectorStore wherever we need it.
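
Finally, a minimal usage sketch. The service and method names are illustrative; test_store is the table name used in the chat example above, and because the schema-creation code is commented out in ExtendPgVectorStore, the target table has to be created manually with the same columns as vector_store and the correct vector dimension:

import org.springframework.ai.document.Document;
import org.springframework.stereotype.Service;

import java.util.List;

import com.lccloud.tenderdocument.vector.ExtendPgVectorStore;

// Illustrative service showing the table-name overloads added by ExtendPgVectorStore.
@Service
public class VectorStoreUsageExample {

    private final ExtendPgVectorStore extendPgVectorStore;

    public VectorStoreUsageExample(ExtendPgVectorStore extendPgVectorStore) {
        this.extendPgVectorStore = extendPgVectorStore;
    }

    // Store pre-split document chunks into a caller-chosen table.
    public void store(List<Document> splitDocuments) {
        extendPgVectorStore.add(splitDocuments, "test_store");
    }

    // Retrieve the chunks most similar to the question from the same table.
    public List<Document> search(String question) {
        return extendPgVectorStore.similaritySearch(question, "test_store");
    }
}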
