Oracle AI Vector Search: generating vectors and computing vector similarity with SQL
- 0. Prerequisites
- 1. Generate vector data with SQL
- 2. Compute Euclidean distance with SQL
- 3. Compute cosine similarity with SQL
- 4. Compute dot product similarity with SQL
- 5. Compute Manhattan distance with SQL
- 6. Use the VECTOR_DISTANCE() function
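Note: generating vectors requires an embedding model. There are two options: (1) an ONNX-format model loaded into the database, or (2) the OCI Cohere embedding model. The examples here use the second option.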
0. Prerequisites
Set up the OCI Cohere credential:
-- drop the credential if it was created earlier
exec dbms_vector.drop_credential('OCI_CRED');

declare
  jo json_object_t;
begin
  -- create an OCI credential
  jo := json_object_t();
  jo.put('user_ocid', 'user ocid value');
  jo.put('tenancy_ocid', 'tenancy ocid value');
  jo.put('compartment_ocid', 'compartment ocid value');
  jo.put('private_key', 'private key value');
  jo.put('fingerprint', 'fingerprint value');
  dbms_output.put_line(jo.to_string);
  dbms_vector.create_credential(
    credential_name => 'OCI_CRED',
    params          => json(jo.to_string));
end;
/

-- bind variable holding the embedding parameters used by the queries below
var embed_genai_params clob;
exec :embed_genai_params := '{"provider": "ocigenai", "credential_name": "OCI_CRED", "url": "https://inference.generativeai.us-chicago-1.oci.oraclecloud.com/20231130/actions/embedText", "model": "cohere.embed-multilingual-v3.0"}';
1. Generate vector data with SQL
SELECT
    et.embed_id,
    et.embed_data,
    to_vector(et.embed_vector) embed_vector
FROM
    dbms_vector_chain.utl_to_embeddings('hello', JSON(:embed_genai_params)) t,
    JSON_TABLE ( t.column_value, '$[*]'
        COLUMNS (
            embed_id NUMBER PATH '$.embed_id',
            embed_data VARCHAR2 ( 4000 ) PATH '$.embed_data',
            embed_vector CLOB PATH '$.embed_vector'
        )
    ) et;
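The JSON_TABLE call in the query above unpacks each element returned by utl_to_embeddings into three columns. As a rough client-side illustration of the same unpacking, here is a minimal Python sketch. It assumes each element is a JSON object with embed_id, embed_data, and embed_vector keys (the paths used in the query) and that embed_vector arrives as a JSON array serialized to text. The sample row and the parse_embeddings helper are hypothetical, and the numbers are made up.

```python
import json

# Hypothetical sample element; the vector values are invented for illustration.
sample_rows = [
    '{"embed_id": 1, "embed_data": "hello", "embed_vector": "[0.5, -0.25, 0.125]"}',
]


def parse_embeddings(rows):
    """Turn JSON text elements into (embed_id, embed_data, vector) tuples."""
    parsed = []
    for raw in rows:
        obj = json.loads(raw)
        vec = obj["embed_vector"]
        # Assumption: the vector may be a JSON array serialized as a string.
        if isinstance(vec, str):
            vec = json.loads(vec)
        parsed.append((obj["embed_id"], obj["embed_data"], [float(x) for x in vec]))
    return parsed


for embed_id, embed_data, vector in parse_embeddings(sample_rows):
    print(embed_id, embed_data, len(vector))
```

In the SQL version, to_vector() plays the role of the final conversion from text to a numeric vector.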
2. Compute Euclidean distance with SQL
SELECT
    l2_distance(t1.embed_vector, t2.embed_vector)
FROM
    (
        SELECT
            to_vector(et.embed_vector) embed_vector
        FROM
            dbms_vector_chain.utl_to_embeddings('hello', JSON(:embed_genai_params)) t,
            JSON_TABLE ( t.column_value, '$[*]'
                COLUMNS (
                    embed_id NUMBER PATH '$.embed_id',
                    embed_data VARCHAR2 ( 4000 ) PATH '$.embed_data',
                    embed_vector CLOB PATH '$.embed_vector'
                )
            ) et
    ) t1,
    (
        SELECT
            to_vector(et.embed_vector) embed_vector
        FROM
            dbms_vector_chain.utl_to_embeddings('morning', JSON(:embed_genai_params)) t,
            JSON_TABLE ( t.column_value, '$[*]'
                COLUMNS (
                    embed_id NUMBER PATH '$.embed_id',
                    embed_data VARCHAR2 ( 4000 ) PATH '$.embed_data',
                    embed_vector CLOB PATH '$.embed_vector'
                )
            ) et
    ) t2;
Output:
L2_DISTANCE(T1.EMBED_VECTOR,T2.EMBED_VECTOR)
--------------------------------------------
8.01E-001
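To make clear what the L2 (Euclidean) distance measures, here is a minimal Python sketch of the formula: the square root of the sum of squared component differences. It is an illustration of the math only, not Oracle's implementation, and the l2_distance helper and the short sample vectors are hypothetical.

```python
import math


def l2_distance(a, b):
    """Euclidean distance: sqrt of the sum of squared component differences."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Differences are (3, 4, 0), so the distance is sqrt(9 + 16 + 0).
print(l2_distance([1.0, 2.0, 3.0], [4.0, 6.0, 3.0]))  # 5.0
```

A smaller value means the two vectors are closer together; identical vectors give 0.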
3. Compute cosine similarity with SQL
SELECT
    cosine_distance(t1.embed_vector, t2.embed_vector)
FROM
    (
        SELECT
            to_vector(et.embed_vector) embed_vector
        FROM
            dbms_vector_chain.utl_to_embeddings('hello', JSON(:embed_genai_params)) t,
            JSON_TABLE ( t.column_value, '$[*]'
                COLUMNS (
                    embed_id NUMBER PATH '$.embed_id',
                    embed_data VARCHAR2 ( 4000 ) PATH '$.embed_data',
                    embed_vector CLOB PATH '$.embed_vector'
                )
            ) et
    ) t1,
    (
        SELECT
            to_vector(et.embed_vector) embed_vector
        FROM
            dbms_vector_chain.utl_to_embeddings('morning', JSON(:embed_genai_params)) t,
            JSON_TABLE ( t.column_value, '$[*]'
                COLUMNS (
                    embed_id NUMBER PATH '$.embed_id',
                    embed_data VARCHAR2 ( 4000 ) PATH '$.embed_data',
                    embed_vector CLOB PATH '$.embed_vector'
                )
            ) et
    ) t2;
Output (COSINE_DISTANCE returns the cosine distance, that is, 1 minus the cosine similarity):
COSINE_DISTANCE(T1.EMBED_VECTOR,T2.EMBED_VECTOR)
------------------------------------------------
3.208E-001
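The relationship between cosine similarity and cosine distance is easy to mix up, so here is a minimal Python sketch of the math, assuming the usual definition distance = 1 - similarity. The cosine_similarity and cosine_distance helpers and the sample vectors are hypothetical and only illustrate the formula.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """Cosine distance, defined here as 1 - cosine similarity."""
    return 1.0 - cosine_similarity(a, b)


print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors
print(cosine_distance([1.0, 2.0], [2.0, 4.0]))  # same direction, close to 0
```

Under this definition, the 3.208E-001 distance reported for 'hello' and 'morning' corresponds to a cosine similarity of about 0.679.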
4. Compute dot product similarity with SQL
SELECT
    inner_product(t1.embed_vector, t2.embed_vector)
FROM
    (
        SELECT
            to_vector(et.embed_vector) embed_vector
        FROM
            dbms_vector_chain.utl_to_embeddings('hello', JSON(:embed_genai_params)) t,
            JSON_TABLE ( t.column_value, '$[*]'
                COLUMNS (
                    embed_id NUMBER PATH '$.embed_id',
                    embed_data VARCHAR2 ( 4000 ) PATH '$.embed_data',
                    embed_vector CLOB PATH '$.embed_vector'
                )
            ) et
    ) t1,
    (
        SELECT
            to_vector(et.embed_vector) embed_vector
        FROM
            dbms_vector_chain.utl_to_embeddings('morning', JSON(:embed_genai_params)) t,
            JSON_TABLE ( t.column_value, '$[*]'
                COLUMNS (
                    embed_id NUMBER PATH '$.embed_id',
                    embed_data VARCHAR2 ( 4000 ) PATH '$.embed_data',
                    embed_vector CLOB PATH '$.embed_vector'
                )
            ) et
    ) t2;
Output:
INNER_PRODUCT(T1.EMBED_VECTOR,T2.EMBED_VECTOR)
----------------------------------------------
6.791E-001
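The inner product is the sum of the component-wise products. For unit-length vectors it equals the cosine similarity, and the Euclidean distance becomes sqrt(2 * cosine distance). The reported outputs (8.01E-001, 3.208E-001, 6.791E-001) appear consistent with that, which suggests the embeddings are close to unit length; this is an observation from the numbers, not something stated in the source. The following Python sketch checks the relationship on hypothetical unit vectors and on the reported values. The inner_product helper is illustrative only.

```python
import math


def inner_product(a, b):
    """Sum of component-wise products."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


# Two hypothetical unit-length vectors.
a = [0.6, 0.8]
b = [1.0, 0.0]

ip = inner_product(a, b)
cos_dist = 1.0 - ip  # valid only because both vectors have length 1
l2 = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
print(ip, cos_dist, l2, math.sqrt(2.0 * cos_dist))

# Values reported in the sections above for 'hello' vs 'morning'.
reported_l2, reported_cos, reported_ip = 0.801, 0.3208, 0.6791
gap_ip = abs((1.0 - reported_cos) - reported_ip)
gap_l2 = abs(math.sqrt(2.0 * reported_cos) - reported_l2)
print(gap_ip, gap_l2)  # both gaps are small
```

For vectors that are not normalized, the inner product also reflects their lengths, so it cannot be read as a cosine similarity directly.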
5. Compute Manhattan distance with SQL
SELECT
    l1_distance(t1.embed_vector, t2.embed_vector)
FROM
    (
        SELECT
            to_vector(et.embed_vector) embed_vector
        FROM
            dbms_vector_chain.utl_to_embeddings('hello', JSON(:embed_genai_params)) t,
            JSON_TABLE ( t.column_value, '$[*]'
                COLUMNS (
                    embed_id NUMBER PATH '$.embed_id',
                    embed_data VARCHAR2 ( 4000 ) PATH '$.embed_data',
                    embed_vector CLOB PATH '$.embed_vector'
                )
            ) et
    ) t1,
    (
        SELECT
            to_vector(et.embed_vector) embed_vector
        FROM
            dbms_vector_chain.utl_to_embeddings('morning', JSON(:embed_genai_params)) t,
            JSON_TABLE ( t.column_value, '$[*]'
                COLUMNS (
                    embed_id NUMBER PATH '$.embed_id',
                    embed_data VARCHAR2 ( 4000 ) PATH '$.embed_data',
                    embed_vector CLOB PATH '$.embed_vector'
                )
            ) et
    ) t2;
Output:
L1_DISTANCE(T1.EMBED_VECTOR,T2.EMBED_VECTOR)
--------------------------------------------
2.001E+001
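The Manhattan (L1) distance sums the absolute component differences instead of squaring them, which is why it comes out much larger than the Euclidean distance for high-dimensional embeddings. A minimal Python sketch of the formula follows; the l1_distance helper and the sample vectors are hypothetical.

```python
def l1_distance(a, b):
    """Manhattan distance: sum of absolute component differences."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(abs(x - y) for x, y in zip(a, b))


# Differences are (3, 4, 0), so the distance is 3 + 4 + 0.
print(l1_distance([1.0, 2.0, 3.0], [4.0, 6.0, 3.0]))  # 7.0
```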
6. Use the VECTOR_DISTANCE() function
- VECTOR_DISTANCE(expr1, expr2, EUCLIDEAN) is equivalent to L2_DISTANCE(expr1, expr2) and to expr1 <-> expr2
- VECTOR_DISTANCE(expr1, expr2, COSINE) is equivalent to COSINE_DISTANCE(expr1, expr2) and to expr1 <=> expr2
- VECTOR_DISTANCE(expr1, expr2, DOT) is equivalent to -1*INNER_PRODUCT(expr1, expr2) and to expr1 <#> expr2
- VECTOR_DISTANCE(expr1, expr2, MANHATTAN) is equivalent to L1_DISTANCE(expr1, expr2)
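The equivalences listed above can be sketched as a single dispatcher in Python. This is a hypothetical illustration of the mapping between metric names and formulas, not Oracle's implementation; note that the DOT metric returns the negated inner product, so that smaller still means more similar.

```python
import math


def vector_distance(a, b, metric):
    """Dispatch on a metric name, mirroring the equivalences listed above."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    if metric == "EUCLIDEAN":
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    if metric == "COSINE":
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        if norm_a == 0 or norm_b == 0:
            raise ValueError("cosine distance is undefined for a zero vector")
        return 1.0 - dot / (norm_a * norm_b)
    if metric == "DOT":
        return -1.0 * dot  # negated inner product
    if metric == "MANHATTAN":
        return sum(abs(x - y) for x, y in zip(a, b))
    raise ValueError("unknown metric: " + metric)


u = [1.0, 2.0, 3.0]
v = [4.0, 6.0, 3.0]
for name in ("EUCLIDEAN", "COSINE", "DOT", "MANHATTAN"):
    print(name, vector_distance(u, v, name))
```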
That's all!