How does LangChain work?
LangChain是如何工作的?
Let’s consider our initial example where we upload the US Constitution PDF and pose questions to it. In this scenario, LangChain compiles the data from the PDF and organizes it.
让我们考虑我们最初的例子,我们上传美国宪法PDF并向它提出问题。在这个场景中,LangChain编译PDF中的数据并对其进行组织。
Although we’ve mentioned a PDF, the data source could be diverse: a text file, a Microsoft Word document, a YouTube transcript, a website, and more. LangChain collates this data, subsequently dividing it into manageable chunks. Once segmented, these chunks are saved in a vector store.
虽然我们已经提到了PDF,但是数据源可以是多种多样的:文本文件、Microsoft Word文档、YouTube记录、网站等等。LangChain整理这些数据,然后将其分成可管理的块。一旦分割,这些块就保存在矢量存储中。
For illustration, let’s say our PDF contains the text displayed on the left, termed the text corpus. LangChain will segment this text into smaller chunks.
举例来说,假设我们的PDF包含显示在左侧的文本,称为文本语料库。LangChain将此文本分割成更小的块。
For instance, one chunk might read “GPT is a great conversational tool …”, followed by “if you ask ChatGPT about general knowledge …”. These are distinct sections of the chunked text.
例如,一个块可能是“GPT是一个伟大的会话工具……”,然后是“如果你问ChatGPT关于一般知识……”。这些是分块文本的不同部分。
Our large language model (LLM) will then transform this chunked text into embeddings.
然后,我们的大型语言模型(LLM)将把这些分块文本转换为嵌入。
Embeddings are simply numerical representations of the data. The text chunk is rendered into a numerical vector, which is then stored in a vector store/database.
嵌入只是数据的数字表示。文本块被渲染成一个数字向量,然后存储在一个向量存储库/数据库中。
You might question the necessity of this conversion. Data is multifaceted, encompassing text, images, audio, and video. To assign meaningful interpretations to such diverse content forms, they must be translated into numerical vectors.
你可能会质疑这种转换的必要性。数据是多方面的,包括文本、图像、音频和视频。为了给如此多样的内容形式赋予有意义的解释,它们必须被转换成数字载体。
This translation from chunks to embeddings employs various machine learning algorithms. These algorithms categorize the data, and the resultant classifications are saved in the vector database.
这种从块到嵌入的转换使用了各种机器学习算法。这些算法对数据进行分类,并将分类结果保存在向量数据库中。
Querying the Vector Store and Generating a Completion with a LLM.
使用LLM查询Vector Store和生成补全。
When a user poses a question, this inquiry is also converted into an embedding, anumerical vector representation. This vector is juxtaposed with existing vectors in the database to perform a similarity search. The database identifies vectors most akin to the query and retrieves the chunks from which these vectors originated.
当用户提出问题时,该查询也被转换为嵌入的数值向量表示。该向量与数据库中的现有向量并置以执行相似性搜索。数据库识别与查询最相似的向量,并检索这些向量源自的块。
With existing vectors in the database to perform a similarity search. The database identifies vectors most akin to the query and retrieves the chunks from which these vectors originated.
与数据库中已有的向量执行相似度搜索。数据库识别与查询最相似的向量,并检索这些向量源自的块。
After extracting the pertinent chunks taht generated those vectors. We generate a completion with our LLM and subsequently produce a response.
在提取生成这些向量的相关块之后。我们用LLM生成一个完成,然后生成一个响应。
In all, this turns your document into a mini Google search engine, enabling query-based searches.
总之,这将把您的文档变成一个迷你的Google搜索引擎,支持基于查询的搜索。
Overall, LangChain can be harnessed to craft automated apps or workflows.
总的来说,LangChain可以用来制作自动化的应用程序或工作流。
For instance, one could design a YouTube script generator or a medium article script generator, we’ll build one in the next chapter. LangChain can also be employed as a web research tool, capable of summarizing voluminous texts such as documents, articles, research papers, and event books. Moreover, LangChain can be utilized to fashion question-answering systems. By inputting our documents, PDFs, or books, we can solicit answers to our queries.
例如,一个人可以设计一个YouTube脚本生成器或一个中等文章脚本生成器,我们将在下一章中构建一个。LangChain也可以作为一个网络研究工具,能够总结大量的文本,如文件、文章、研究论文和活动记录。此外,还可以利用LangChain来设计问答系统。通过输入我们的文档、pdf文件或书籍,我们可以请求对我们问题的答案。
Envision a scenario where one is perusing legal documents, typically necessitating a lawyer’s expertise. Now, one can input a legal document or even a medical research paper into a LangChain app and pose questions to derive insights.
设想这样一个场景:一个人正在仔细阅读法律文件,通常需要律师的专业知识。现在,人们可以在LangChain应用程序中输入一份法律文件,甚至是一篇医学研究论文,并提出问题以获得见解。