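Question:
When using LangChain, specifically when defining or calling an embedding function, the signature (the function's parameter list and return type) does not match what is expected.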
Background:
When I try to use LangChain's OpenAIEmbeddings as the embedding function of a Chroma collection, I get a ValueError:
ValueError: Expected EmbeddingFunction.__call__ to have the following signature: odict_keys(['self', 'input']), got odict_keys(['self', 'args', 'kwargs'])
How do I resolve this error?
Because the issue appeared right after upgrading Chroma, the error seems to come from LangChain's embedding class not meeting the new embedding function requirements introduced in Chroma's latest update.
My code:
import chromadb
from langchain_openai import OpenAIEmbeddings

client = chromadb.PersistentClient()
collection = client.get_or_create_collection(name='chroma', embedding_function=OpenAIEmbeddings())
I'm using langchain==0.1.1, langchain-openai==0.0.3 and chromadb==0.4.22. According to the GitHub issues, downgrading chromadb to 0.4.15 seems to fix the problem, but since these libraries will keep upgrading in the coming months, I'd rather find a solution that works with the current version than downgrade Chroma.
Solution:
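Since version 0.4.16, Chroma requires the embedding function to define a __call__() method whose single parameter is named input and which returns a list of embeddings. The error message states the expected signature, (self, input), and the migration guide linked from the full error message says as much.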
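To see why the parameter name matters, here is a simplified, self-contained re-creation of the kind of check the error message describes: it compares the parameter names of the object's __call__ with (self, input). ExpectedEmbeddingFunction, check_call_signature and the three embedder classes are hypothetical names for illustration; this is a sketch, not Chroma's source code.

```python
import inspect
from typing import List


class ExpectedEmbeddingFunction:
    """Hypothetical stand-in for the interface the check compares against."""

    def __call__(self, input: List[str]) -> List[List[float]]:
        raise NotImplementedError


def check_call_signature(embedding_function: object) -> None:
    """Simplified re-creation of the check described by the error message.

    Only the parameter *names* of type(obj).__call__ are compared.
    """
    expected = inspect.signature(ExpectedEmbeddingFunction.__call__).parameters.keys()
    actual = inspect.signature(type(embedding_function).__call__).parameters.keys()
    if list(actual) != list(expected):
        raise ValueError(
            "Expected EmbeddingFunction.__call__ to have the following signature: "
            f"{expected}, got {actual}"
        )


class VarArgsEmbedder:
    # Parameter names are (self, args, kwargs): rejected, as in the question.
    def __call__(self, *args, **kwargs):
        return [[0.0] for _ in args[0]]


class TextsEmbedder:
    # Right shape, wrong name: (self, texts) is rejected too.
    def __call__(self, texts):
        return [[float(len(t))] for t in texts]


class InputEmbedder:
    # Named exactly "input" (shadowing the builtin): accepted.
    def __call__(self, input):
        return [[float(len(t))] for t in input]
```

The takeaway is that a method with the right behaviour but a differently named parameter (texts, or *args) still fails; the name input is part of the contract.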
Given that we need a method that returns a list of embeddings, and OpenAIEmbeddings already defines one (embed_documents()), the easiest solution I found was to create a custom class that inherits from OpenAIEmbeddings and defines a __call__ method that delegates to OpenAIEmbeddings.embed_documents.
A small note: unless your OpenAI API key is available as the OPENAI_API_KEY environment variable (for example, loaded from your .env file), you'll need to pass it as the openai_api_key parameter.
import chromadb
from langchain_openai import OpenAIEmbeddings


class CustomOpenAIEmbeddings(OpenAIEmbeddings):

    def __init__(self, openai_api_key, *args, **kwargs):
        super().__init__(openai_api_key=openai_api_key, *args, **kwargs)

    def _embed_documents(self, texts):
        return super().embed_documents(texts)  # <--- use OpenAIEmbedding's embedding function

    # The parameter must be named "input": Chroma checks that __call__'s parameters are (self, input)
    def __call__(self, input):
        return self._embed_documents(input)  # <--- get the embeddings


client = chromadb.PersistentClient()
collection = client.get_or_create_collection(
    name='chroma',
    embedding_function=CustomOpenAIEmbeddings(openai_api_key="your very secret OpenAI api key"),  # <-- pass the new object instead of OpenAIEmbeddings()
)
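If you'd rather not subclass OpenAIEmbeddings (for example, to reuse one wrapper with other LangChain embedding classes), a composition-based adapter is an alternative. This is a sketch under assumptions: EmbeddingAdapter and FakeEmbeddings are hypothetical names, and FakeEmbeddings only stands in for OpenAIEmbeddings so the example runs offline; in real code you would wrap OpenAIEmbeddings() instead.

```python
from typing import List, Protocol, Sequence


class SupportsEmbedDocuments(Protocol):
    """Anything with an embed_documents(texts) -> list-of-vectors method."""

    def embed_documents(self, texts: List[str]) -> List[List[float]]: ...


class EmbeddingAdapter:
    """Hypothetical adapter exposing __call__(self, input) as Chroma expects."""

    def __init__(self, embeddings: SupportsEmbedDocuments) -> None:
        self._embeddings = embeddings

    # The parameter must be named "input" (shadowing the builtin) so the
    # parameter names of __call__ are exactly (self, input).
    def __call__(self, input: Sequence[str]) -> List[List[float]]:
        return self._embeddings.embed_documents(list(input))


class FakeEmbeddings:
    """Offline stand-in for OpenAIEmbeddings: one 2-d vector per text."""

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # [character count, space count], just to produce deterministic vectors
        return [[float(len(t)), float(t.count(" "))] for t in texts]


# Usage with Chroma (not executed here; needs chromadb and an API key):
# collection = client.get_or_create_collection(
#     name="chroma", embedding_function=EmbeddingAdapter(OpenAIEmbeddings()))
```

The design choice is delegation instead of inheritance: the adapter never touches OpenAIEmbeddings' own constructor or fields, so it keeps working if that class's internals change.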
Calling OpenAI's embeddings client directly also works; OpenAIEmbeddings exposes it as self.client. Basically, we can define CustomOpenAIEmbeddings as below, invoking the client's create() method once per text in a loop, as in this example use case.
class CustomOpenAIEmbeddings(OpenAIEmbeddings):

    def __init__(self, openai_api_key, *args, **kwargs):
        super().__init__(openai_api_key=openai_api_key, *args, **kwargs)

    def _embed_documents(self, texts):
        # One request per text; self.client is the OpenAI client held by OpenAIEmbeddings
        embeddings = [
            self.client.create(input=text, model="text-embedding-ada-002").data[0].embedding
            for text in texts
        ]
        return embeddings

    # Parameter named "input", as Chroma requires
    def __call__(self, input):
        return self._embed_documents(input)
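The loop above sends one request per text. The OpenAI embeddings endpoint also accepts a list of inputs, and each returned item carries an index, so one request per batch can replace many single-text requests while sorting on index keeps the vectors aligned with the input. In this sketch, embed_in_batches, its batch_size default and StubEmbeddingsClient are hypothetical; the stub only mimics the shape of self.client.create(input=..., model=...) (a response whose .data items have .index and .embedding) so the example runs without network access.

```python
from types import SimpleNamespace
from typing import List, Sequence


def embed_in_batches(client, texts: Sequence[str],
                     model: str = "text-embedding-ada-002",
                     batch_size: int = 100) -> List[List[float]]:
    """Hypothetical helper: one create() call per batch instead of per text."""
    embeddings: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        response = client.create(input=batch, model=model)
        # Sort by index so the vectors line up with the order of `batch`.
        for item in sorted(response.data, key=lambda d: d.index):
            embeddings.append(item.embedding)
    return embeddings


class StubEmbeddingsClient:
    """Offline stand-in for self.client; counts how many requests were made."""

    def __init__(self) -> None:
        self.calls = 0

    def create(self, input: List[str], model: str):
        self.calls += 1
        items = [SimpleNamespace(index=i, embedding=[float(len(t))])
                 for i, t in enumerate(input)]
        # Return items in reverse order to show why sorting by index matters.
        return SimpleNamespace(data=list(reversed(items)))
```

Inside CustomOpenAIEmbeddings, _embed_documents could then return embed_in_batches(self.client, texts). Keep batch_size within the API's per-request limits. Note that LangChain's own embed_documents already sends texts in batches (see its chunk_size parameter), which is one reason the first approach, delegating to embed_documents, is the simpler option.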