AI Series: RAG (Retrieval-Augmented Generation) for Large Language Models, Part 2: Using LlamaIndex

Preface

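This article follows on from AI Series: RAG (Retrieval-Augmented Generation) for Large Language Models, Part 1, and uses LlamaIndex as a worked example of implementing RAG. For the background concepts, see Part 1.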

What is LlamaIndex?

The LlamaIndex website introduces it as follows:

LlamaIndex is a framework for building context-augmented LLM applications.
LlamaIndex provides tooling to enable context augmentation. A popular example is Retrieval-Augmented Generation (RAG) which combines context with LLMs at inference time. Another is finetuning.

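LlamaIndex provides many tools for implementing RAG; see the official website for details. This article walks through one such implementation, which matches the code examples below and is illustrated in the following diagram:

[Figure: the RAG implementation used in this article]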

LlamaIndex code

The code in this section is based on the RAG Starter Tutorial (OpenAI), the Starter Tutorial (Local Models), and other documents on the LlamaIndex website.

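The RAG Starter Tutorial (OpenAI) gives an example based on the OpenAI service that implements RAG in just 5 lines of code.
If you use the OpenAI service, you can skip the embedding model and LLM setup below and simply set the OPENAI_API_KEY environment variable.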

Setting the embedding model

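Because I do not have an OpenAI token, I use bge-small-en-v1.5 from BAAI (Beijing Academy of Artificial Intelligence), hosted on Hugging Face, as the embedding model.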

from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

Setting the LLM

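For the large language model (LLM), I use the gemma:2b model served by a locally running Ollama server. My personal laptop is low-spec (M1 chip, 8 GB of RAM), so I can only run the model with the fewest parameters. Even so, this is enough to demonstrate the basic principles of the RAG workflow: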

from llama_index.llms.ollama import Ollama
gemma_2b = Ollama(model="gemma:2b", request_timeout=30.0)
Settings.llm = gemma_2b

The LlamaIndex website also provides code examples for Hugging Face models (local and remote) and other types of models; see Hugging Face LLMs.

Indexing

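This code is based on the example in the official RAG Starter Tutorial (OpenAI). The difference is that I use a PDF document about Llama 2 stored on my local disk, which I will ask questions about later.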

This covers both creating the embedding vector index and persisting it. Persistence is optional; without it, only two lines of code are needed.

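import os

from llama_index.core import (
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
    load_index_from_storage,
)

PERSIST_DIR = "./storage"
if not os.path.exists(PERSIST_DIR):
    documents = SimpleDirectoryReader("./articles").load_data()
    # store docs into vector DB: chunk the documents, compute embeddings, and index them
    index = VectorStoreIndex.from_documents(documents)
    # store it for later: persist the data to local disk
    index.storage_context.persist(persist_dir=PERSIST_DIR)
else:
    # load the existing index: read the data on local disk straight into the index
    storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
    index = load_index_from_storage(storage_context)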

Query engine

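LlamaIndex provides a query engine. It uses a retriever to fetch the documents in the index that are semantically close to the question, then passes them to the large language model together with the original question.
Again, this takes only two lines of code.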

# Build a query engine on top of the index and ask a question
query_engine = index.as_query_engine()
response = query_engine.query("Is there Llama 2 70B?")
print(response)

If you want more flexibility, you can also define the retriever explicitly.
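
To make the retrieve-then-augment flow described in this section concrete, here is a minimal sketch in plain Python that does not use LlamaIndex at all. Everything in it is an assumption made for illustration: the function names (embed, cosine, retrieve, build_prompt), the bag-of-words "embedding" that stands in for a real model such as bge-small-en-v1.5, the three sample chunks, and the prompt wording. It is not how LlamaIndex is implemented internally; it only shows the general shape of the idea.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (a stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Merge the retrieved context with the original question (hypothetical wording)."""
    context = "\n".join(context_chunks)
    return (
        "Context information is below.\n"
        f"{context}\n"
        "Given the context information, answer the query.\n"
        f"Query: {question}\n"
    )


# Sample chunks written for this illustration only.
chunks = [
    "Llama 2 is released in 7B, 13B and 70B parameter sizes.",
    "The weather in Paris is mild in spring.",
    "Bananas are rich in potassium.",
]
question = "Is there Llama 2 70B?"
top = retrieve(question, chunks, top_k=1)
prompt = build_prompt(question, top)
print(prompt)  # this augmented prompt is what would be sent to the LLM
```

In a real pipeline the count vectors would be replaced by dense vectors from the embedding model, and the ranking would be done by the vector index rather than by sorting every chunk.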

Verification

Question:

Is there Llama 2 70B?

The gemma:2b model really is a toy: it can only handle simple questions like this one, and its answers to anything more complex are a mess.

Running the program, gemma:2b answers correctly, based on the augmented context obtained through retrieval:

Yes, Llama2 70B is mentioned in the context information. It is a large language model that outperforms all open-source models.

If I skip the program above and ask the model the same question directly, it replies that it cannot answer:

I am unable to access external sources or provide real-time information, therefore I cannot answer this question.

Impressions

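LlamaIndex makes RAG simple to implement. Its structure looks clean and elegant.
In real production use, however, there are many details to deal with, such as chunking granularity, the various retrieval features, and prompt customization. LlamaIndex supports many of these, but how well they work in practice remains to be verified.
LlamaIndex's design philosophy and its evolution are worth following.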
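
As a small illustration of the chunking-granularity point above, the sketch below splits a text into word-based chunks with an optional overlap and shows how the two settings change the number of chunks that would need to be embedded and indexed. The function name chunk_text, its parameters, and the placeholder text are assumptions made for this example; LlamaIndex has its own splitters, which this sketch does not use or reproduce.

```python
def chunk_text(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Split text into chunks of chunk_size words; consecutive chunks share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


text = " ".join(f"w{i}" for i in range(10))  # 10 placeholder words: w0 .. w9
coarse = chunk_text(text, chunk_size=5)           # larger chunks, no overlap
fine = chunk_text(text, chunk_size=4, overlap=2)  # smaller chunks, overlapping
print(len(coarse), coarse)
print(len(fine), fine)
```

Smaller chunks give the retriever more precise targets but less context per hit, while overlap reduces the chance that a relevant sentence is cut in half at a chunk boundary. Which trade-off works best is one of the things that would need to be verified on real documents.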

References

What is LlamaIndex? https://docs.llamaindex.ai/en/stable/
LlamaIndex RAG Starter Tutorial (OpenAI) https://docs.llamaindex.ai/en/stable/getting_started/starter_example/
LlamaIndex RAG Starter Tutorial (Local Models) https://docs.llamaindex.ai/en/stable/getting_started/starter_example_local/
LlamaIndex query engine https://docs.llamaindex.ai/en/stable/module_guides/deploying/query_engine/
LlamaIndex Hugging Face LLMs https://docs.llamaindex.ai/en/stable/examples/llm/huggingface/
