An In-Depth Look at Reflexion, the Agent Reflection Workflow Framework: Part 2

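The previous article, "[LLM-Agents] An In-Depth Look at Reflexion, the Agent Reflection Workflow Framework, Part 1: Installation and Running", covered the background of the Reflexion framework, its datasets, and how to install and run it. This article looks closely at how the Agent actually runs.

Part 1 ended at agent.run(reflect_strategy=strategy). There, agent is an instance of the ReactReflectAgent class, which inherits from ReactAgent. This article therefore starts with ReactAgent, then works its way into ReactReflectAgent, and finally ties the whole flow together.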


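1. The ReAct Paper

ReAct comes from the paper "ReAct: Synergizing Reasoning and Acting in Language Models", which proposes combining reasoning and acting in a language model to solve a variety of language reasoning and decision-making tasks. The authors evaluate ReAct on question answering (HotpotQA), fact verification (Fever), text-based games (ALFWorld), and web navigation (WebShop), and show its advantage over existing methods in few-shot settings. A series of ablation experiments and analyses examines how much acting matters in reasoning tasks and how much reasoning matters in interactive tasks. ReAct yields a decision and reasoning process that is easier for humans to understand, diagnose, and control. Its typical flow, shown in the figure below, can be described as a loop: Thought → Action → Observation, or the TAO loop for short.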

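Thought: Faced with a question, we first think it through. This step is about defining the problem and identifying the key information and reasoning steps needed to solve it.
Action: Once the direction is set, it is time to act. Based on the thought, we take a measure or execute a specific task that should move the problem toward a solution.
Observation: After acting, we examine the result carefully. This step checks whether the action worked and whether it brought us closer to the answer.
Iteration: If the observed result does not match the expected answer, we go back to the Thought step and re-examine the problem and the plan. This starts a new round of the TAO loop, which repeats until a solution is found.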
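
The loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Reflexion code: `tao_loop`, `scripted_think`, and `fake_tool` are hypothetical names, and the "model" and the "tool" are replaced by canned functions so the control flow is easy to follow.

```python
from typing import Callable, List, Tuple

def tao_loop(
    think: Callable[[List[str]], Tuple[str, str]],
    act: Callable[[str], str],
    max_steps: int = 6,
) -> List[str]:
    """Run Thought -> Action -> Observation until the action is a Finish or steps run out."""
    trace: List[str] = []
    for step in range(1, max_steps + 1):
        thought, action = think(trace)       # reason over everything seen so far
        trace.append(f"Thought {step}: {thought}")
        trace.append(f"Action {step}: {action}")
        if action.startswith("Finish"):      # the agent believes it has the answer
            break
        observation = act(action)            # run the tool and record what came back
        trace.append(f"Observation {step}: {observation}")
    return trace

# Hypothetical stand-ins for the LLM and the search tool.
def scripted_think(trace: List[str]) -> Tuple[str, str]:
    if not trace:
        return "I need to look up the topic first.", "Search[topic]"
    return "The observation contains the answer.", "Finish[answer]"

def fake_tool(action: str) -> str:
    return f"result for {action}"

trace = tao_loop(scripted_think, fake_tool)
print("\n".join(trace))
```

The trace grows with every step, which is the same role the scratchpad plays later in this article.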

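The typical flow is shown in the figure below: the agent iterates through the loop until it reasons its way to the final answer.

[Figure: ReAct demonstration]

2. Designing a ReAct Agent

Judging from the demonstration above, what would an implementation of ReAct look like? First, it needs an iterative loop. Second, how do we get the LLM to think first and then propose an action based on that thought? We need a well-designed prompt with few-shot examples. Third, how do we tell the LLM how far the iteration has progressed, so that it does not keep producing the same thought? One option is to feed the entire conversation back to the LLM. That can work, but the prompt already carries a lot of example data. This is where the ScratchPad comes in: think of it as a piece of scratch paper that records the results of the LLM's thoughts, actions, and observations as the reasoning proceeds.

2.1 Designing the Prompt

In my view, a good prompt needs a clear task description, complete descriptions of the input and output, format requirements, and examples. For ReAct it also needs the scratchpad. Taking the question-answering task above as an example, the prompt is designed as follows. In each example, the Thought should name the entity to search for, the Action should then extract that entity directly, and the Observation should give the observed result. About 4 to 5 examples are enough.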

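python
Solve a question answering task with interleaving Thought, Action, Observation steps. Thought can reason about the current situation, and Action must be one of three types:
(1) Search[entity], which searches the exact entity on Wikipedia and returns the first paragraph if it exists. If not, it will return some similar entities to search.
(2) Lookup[keyword], which returns the next sentence containing keyword in the last passage successfully found by Search.
(3) Finish[answer], which returns the answer and finishes the task.
You may take as many steps as necessary. Ensure that your responses strictly follow the above formats; in particular, the Action must be one of the three types above.
Here are some examples: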
Question: What is the elevation range for the area that the eastern sector of the Colorado orogeny extends into?
Thought 1: I need to search Colorado orogeny, find the area that the eastern sector of the Colorado orogeny extends into, then find the elevation range of the area.
Action 1: Search[Colorado orogeny]
Observation 1: The Colorado orogeny was an episode of mountain building (an orogeny) in Colorado and surrounding areas.
Thought 2: It does not mention the eastern sector. So I need to look up eastern sector.
...
(END OF EXAMPLES)
Question: {question}
{scratchpad}

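Note that if you expect the LLM to return output in the format you have in mind, the prompt should use forceful wording such as "must". GPT-3.5 may be fine without it, but my locally deployed model often failed to reply with Finish[answer] once it had found the result. After I revised the prompt and leaned on the model hard, it behaved much better.

As for the ScratchPad, we fill it in by hand: first we append "Thought 1:" followed by the thought the LLM returns, then we append "Action 1:" followed by the action the LLM returns. Repeating this step by step reproduces the process shown in the figure above.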
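
To make the hand-filling concrete, here is a minimal sketch that uses plain `str.format` instead of LangChain. The shortened `TEMPLATE`, the helper `build_prompt`, the sample question, and the hard-coded model reply are all assumptions made for illustration; the real prompt also carries the instructions and the few-shot examples.

```python
# Simplified template; the real prompt also carries instructions and few-shot examples.
TEMPLATE = "{examples}\n(END OF EXAMPLES)\nQuestion: {question}{scratchpad}"

def build_prompt(examples: str, question: str, scratchpad: str) -> str:
    """Fill the three placeholders of the simplified template."""
    return TEMPLATE.format(examples=examples, question=question, scratchpad=scratchpad)

scratchpad = ""
step_n = 1

# 1) Open a Thought slot; the model is expected to complete this line.
scratchpad += f"\nThought {step_n}:"
prompt_for_thought = build_prompt("<few-shot examples>", "Who created X?", scratchpad)

# 2) Append the model's reply (hard-coded here), then open the Action slot.
scratchpad += " " + "I need to search X."
scratchpad += f"\nAction {step_n}:"
prompt_for_action = build_prompt("<few-shot examples>", "Who created X?", scratchpad)

print(prompt_for_action)
```

Each prompt ends with an open slot ("Thought 1:" or "Action 1:"), so the model only has to complete the current line rather than re-derive the whole history.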

2.2 Flow Diagram

[Figure: ReAct flow diagram]

Next, we turn to the Reflexion framework and read the ReactAgent code to explore the implementation details.

3. ReactAgent Implementation

3.1 Initialization

python
def __init__(self,
             question: str,
             key: str,
             max_steps: int = 6,
             agent_prompt: PromptTemplate = react_agent_prompt,
             docstore: Docstore = Wikipedia(),
             react_llm: AnyOpenAILLM = AnyOpenAILLM(
                 temperature=0,
                 max_tokens=100,
                 model_name="gpt-3.5-turbo",
                 model_kwargs={"stop": "\n"},
                 openai_api_key="sk"),
             ) -> None:
    self.question = question
    self.answer = ''
    self.key = key
    self.max_steps = max_steps
    self.agent_prompt = agent_prompt
    self.react_examples = WEBTHINK_SIMPLE6

    self.docstore = DocstoreExplorer(docstore)  # Search, Lookup
    self.llm = react_llm

    self.enc = tiktoken.encoding_for_model("text-davinci-003")

    self.__reset_agent()

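question, key, and answer: question and key come from HotpotQA, where key is the reference answer. The key is passed in only to evaluate whether the agent's result is correct, not to tell the agent the answer. answer starts out empty and holds the agent's final answer.
max_steps: set to 6, so ReactAgent runs at most 6 steps. When the agent finishes, it checks whether the answer it obtained matches key.
agent_prompt: the prompt, defined with LangChain's PromptTemplate, which specifies the template and the fields to fill in.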

python
REACT_INSTRUCTION = """Solve a question answering task with interleaving Thought, Action, Observation steps. Thought can reason about the current situation, and Action must be three types: 
(1) Search[entity], which searches the exact entity on Wikipedia and returns the first paragraph if it exists. If not, it will return some similar entities to search.
(2) Lookup[keyword], which returns the next sentence containing keyword in the last passage successfully found by Search.
(3) Finish[answer], which returns the answer and finishes the task.
You may take as many steps as necessary. Ensure that your responses MUST strictly adhere to the above formats, especially Action must be one of the three types.
Here are some examples:
{examples}
(END OF EXAMPLES)
Question: {question}{scratchpad}"""
react_agent_prompt = PromptTemplate(
    input_variables=["examples", "question", "scratchpad"],
    template=REACT_INSTRUCTION,
)

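Note that I made this prompt a bit more forceful than the one in the repo: I stress that the output Action must be one of these three types. Without that, many unexpected things happen at run time.

react_examples is set to WEBTHINK_SIMPLE6: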

python
WEBTHINK_SIMPLE6 = """Question: What is the elevation range for the area that the eastern sector of the Colorado orogeny extends into?
Thought 1: I need to search Colorado orogeny, find the area that the eastern sector of the Colorado orogeny extends into, then find the elevation range of the area.
Action 1: Search[Colorado orogeny]
Observation 1: The Colorado orogeny was an episode of mountain building (an orogeny) in Colorado and surrounding areas.
....
"""

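The example template guides the LLM through the reasoning steps. It is a typical ReAct prompt made of Thought, Action, and Observation. The code contains 6 examples; they are trimmed here for readability. The ReactAgent method _build_agent_prompt fills in the fields the prompt is still missing: examples, question, and scratchpad.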

python
def _build_agent_prompt(self) -> str:
    return self.agent_prompt.format(
        examples=self.react_examples,
        question=self.question,
        scratchpad=self.scratchpad)

The final prompt therefore looks like this (with the examples trimmed):

python
Solve a question answering task with interleaving Thought, Action, Observation steps. Thought can reason about the current situation, and Action must be three types: 
...
(END OF EXAMPLES)
Question: The creator of "Wallace and Gromit" also created what animation comedy that matched animated zoo animals with a soundtrack of people talking about their homes?
Thought 1:

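docstore is initialized as DocstoreExplorer(docstore), where docstore is LangChain's built-in tool for accessing Wikipedia.
llm is set to react_llm, an instance of AnyOpenAILLM, which we changed to a local LLM in the previous part. AnyOpenAILLM has two methods, __init__ and __call__. __init__ decides whether the LLM runs in chat mode or completion mode. __call__ is a magic method: implementing it in a class lets instances of the class be called like functions, so we can call the model directly with llm(prompt).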

python
class AnyOpenAILLM:
    def __init__(self, *args, **kwargs):
        # Determine model type from the kwargs
        model_name = kwargs.get('model_name', 'gpt-3.5-turbo')
        kwargs['openai_api_base'] = "http://localhost:8080/v1"
        if model_name.split('-')[0] == 'text':
            self.model = OpenAI(*args, **kwargs)
            self.model_type = 'completion'
        else:
            self.model = ChatOpenAI(*args, **kwargs)
            self.model_type = 'chat'

    def __call__(self, prompt: str):
        if self.model_type == 'completion':
            return self.model(prompt)
        else:
            return self.model([HumanMessage(content=prompt,)]).content
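
The effect of `__call__` is easy to try without any network access. The class below is a hypothetical stand-in (the name `EchoLLM` and its echo behavior are assumptions for illustration); it only reuses the same rule of treating model names that start with "text-" as completion models.

```python
class EchoLLM:
    """Hypothetical stand-in for AnyOpenAILLM: no network call, it just echoes the prompt."""

    def __init__(self, model_name: str = "gpt-3.5-turbo") -> None:
        # Same dispatch rule as above: names starting with "text-" are completion models.
        self.model_type = "completion" if model_name.split("-")[0] == "text" else "chat"

    def __call__(self, prompt: str) -> str:
        # Defining __call__ is what makes `llm(prompt)` work on an instance.
        return f"[{self.model_type}] {prompt}"

llm = EchoLLM()
legacy = EchoLLM(model_name="text-davinci-003")
print(llm("Thought 1:"))
print(legacy("Thought 1:"))
print(callable(llm))
```

Because the agent only ever calls `self.llm(prompt)`, any object with a compatible `__call__` can be swapped in, which is convenient for testing the loop offline.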

Summary: initializing ReactAgent mainly means passing in the inputs the prompt needs, namely the question and the template, and initializing the LLM to use.

3.2 The run Method

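python
def run(self, reset = True) -> None:
    if reset:
        self.__reset_agent()

    while not self.is_halted() and not self.is_finished():
        self.step()

The helper functions called here are simple: one resets the state variables that affect a run, and the others check whether the current run has ended.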

python
def __reset_agent(self) -> None:
    self.step_n = 1
    self.finished = False
    self.scratchpad: str = ''

def is_halted(self) -> bool:
    return ((self.step_n > self.max_steps) or (len(self.enc.encode(self._build_agent_prompt())) > 3896)) and not self.finished

def is_finished(self) -> bool:
    return self.finished

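As long as the agent has not exceeded the maximum of 6 steps, the prompt does not exceed 3896 tokens (presumably a guard against overflowing a 4K context window), and the finished flag is not true, step is called. So step runs at most 6 times, and each call produces a Thought, an Action, and an Observation.

3.3 The step Method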
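
The halting rule can be checked in isolation. In the sketch below, `count_tokens` is an assumption that replaces the tiktoken encoder with whitespace splitting, and `should_continue` is a hypothetical helper that mirrors the `while` condition in run; only the step limit of 6 and the threshold of 3896 come from the code above.

```python
def count_tokens(text: str) -> int:
    # Assumption: whitespace splitting stands in for the tiktoken encoder.
    return len(text.split())

def is_halted(step_n: int, prompt: str, finished: bool,
              max_steps: int = 6, max_tokens: int = 3896) -> bool:
    """Halted means: out of steps or out of prompt budget, and not already finished."""
    over_steps = step_n > max_steps
    over_budget = count_tokens(prompt) > max_tokens
    return (over_steps or over_budget) and not finished

def should_continue(step_n: int, prompt: str, finished: bool) -> bool:
    # Mirrors `while not self.is_halted() and not self.is_finished()`.
    return not is_halted(step_n, prompt, finished) and not finished

print(should_continue(1, "short prompt", False))    # within both limits
print(should_continue(7, "short prompt", False))    # step limit exceeded
print(should_continue(3, "word " * 4000, False))    # prompt budget exceeded
print(should_continue(3, "short prompt", True))     # already finished
```

Note that a finished agent is never reported as halted, so the two flags let a caller tell "answered" apart from "gave up".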

python
def step(self) -> None:
    # Think
    self.scratchpad += f'\nThought {self.step_n}:'
    self.scratchpad += ' ' + self.prompt_agent()
    print(self.scratchpad.split('\n')[-1])

    # Act
    self.scratchpad += f'\nAction {self.step_n}:'
    action = self.prompt_agent()
    self.scratchpad += ' ' + action
    action_type, argument = parse_action(action)
    print(self.scratchpad.split('\n')[-1])

    # Observe
    self.scratchpad += f'\nObservation {self.step_n}: '

    if action_type == 'Finish':
        self.answer = argument
        if self.is_correct():
            self.scratchpad += 'Answer is CORRECT'
        else:
            self.scratchpad += 'Answer is INCORRECT'
        self.finished = True
        self.step_n += 1
        return

    if action_type == 'Search':
        try:
            self.scratchpad += format_step(self.docstore.search(argument))
        except Exception as e:
            print(e)
            self.scratchpad += f'Could not find that page, please try again.'

    elif action_type == 'Lookup':
        try:
            self.scratchpad += format_step(self.docstore.lookup(argument))
        except ValueError:
            self.scratchpad += f'The last page Searched was not found, so you cannot Lookup a keyword in it. Please try one of the similar pages given.'

    else:
        self.scratchpad += 'Invalid Action. Valid Actions are Lookup[<topic>] Search[<topic>] and Finish[<answer>].'

    print(self.scratchpad.split('\n')[-1])

    self.step_n += 1

The method has three stages: Think, Act, and Observe.

3.4 Thought

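First, "Thought 1:" is appended to the scratchpad, and then prompt_agent() is called. _build_agent_prompt, introduced in section 3.1, builds the prompt and fills in the required fields (examples, question, and scratchpad); llm is the __call__ interface of AnyOpenAILLM.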

python
def prompt_agent(self) -> str:
    return format_step(self.llm(self._build_agent_prompt()))

def format_step(step: str) -> str:
    return step.strip('\n').strip().replace('\n', '')

In the Thought stage, the LLM input is the prompt built above. At this point it should look like this:
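
A quick check of what format_step does to raw model output. The function body is copied from above; the sample strings are made up for illustration.

```python
def format_step(step: str) -> str:
    # Same body as the function shown above.
    return step.strip('\n').strip().replace('\n', '')

raw = "\n  I need to search Nick Park.\nAction 1: Search[Nick Park]  \n"
print(repr(format_step(raw)))
print(repr(format_step("  Search[Nick Park]\n")))
```

Surrounding whitespace is removed, and any inner newline is deleted without inserting a space, so a multi-line reply is glued into a single line. That keeps every scratchpad entry on one line, which the line-oriented "Thought n: / Action n: / Observation n:" format depends on.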

plain
Solve a question answering task with interleaving Thought, Action, Observation steps. Thought can reason about the current situation, and Action must be three types: 
...
(END OF EXAMPLES)
Question: The creator of "Wallace and Gromit" also created what animation comedy that matched animated zoo animals with a soundtrack of people talking about their homes?
Thought 1:

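Note that we set a few key parameters when initializing AnyOpenAILLM. temperature is 0, a strict setting that keeps the model from improvising. The stop condition is a newline, and max_tokens is 100. Why? Without stop set to \n, the LLM would follow the examples and output the Thought, Action, and Observation steps all at once. Everything would then be produced by the model with no tool involved, even though the model cannot actually search the web. So we make it stop at the first \n. You can copy the prompt into Postman and try it yourself.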

python
AnyOpenAILLM(temperature=0, max_tokens=100,model_name="gpt-3.5-turbo",model_kwargs={"stop": "\n"},openai_api_key="sk")
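
The effect of the stop setting can be simulated locally. The helper `apply_stop` below is hypothetical and does not call any API; it only imitates the visible result of a stop sequence by cutting the text at its first occurrence. The sample completion is invented to show what a model imitating the few-shot examples might produce.

```python
def apply_stop(completion: str, stop: str = "\n") -> str:
    """Cut the text at the first occurrence of `stop`, excluding the stop string itself."""
    return completion.split(stop, 1)[0]

# What an unconstrained model might produce when it imitates the few-shot examples.
unconstrained = (
    " I need to search Nick Park.\n"
    "Action 1: Search[Nick Park]\n"
    "Observation 1: (made up by the model, no real search happened)"
)
print(repr(apply_stop(unconstrained)))
```

Only the Thought line survives, so the agent code stays in control of when an Action is requested and where the Observation comes from.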

3.5 Action

After the Thought step comes the Action stage. The scratchpad now holds:

python
Thought 1: The creator of "Wallace and Gromit" is Nick Park. I need to search for other animation comedies by Nick Park that match this description.
Action 1:

Calling action = self.prompt_agent() assigns action the value:

plain
Search[Nick Park zoo animals talking about their homes]

The scratchpad is updated to:

plain
Thought 1: The creator of "Wallace and Gromit" is Nick Park. I need to search for other animation comedies by Nick Park that match this description.
Action 1: Search[Nick Park zoo animals talking about their homes]

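Next, the regular expression pattern = r'^(\w+)\[(.+)\]$' extracts the action name (here Search) and the query string inside the square brackets. Following the step method, a Search action triggers a Wikipedia lookup. The implementation of the Wikipedia tool is not covered here; see the official LangChain documentation.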

python
self.scratchpad += f'\nObservation {self.step_n}: '
self.scratchpad += format_step(self.docstore.search(argument))
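
The action parsing step can be sketched with the standard `re` module. This is a minimal version written for this article, assuming the pattern `^(\w+)\[(.+)\]$`; returning None for malformed input is a choice made in this sketch, and the name `ACTION_PATTERN` is hypothetical.

```python
import re
from typing import Optional, Tuple

ACTION_PATTERN = re.compile(r'^(\w+)\[(.+)\]$')

def parse_action(action: str) -> Optional[Tuple[str, str]]:
    """Split 'Search[Nick Park]' into ('Search', 'Nick Park'); return None if malformed."""
    match = ACTION_PATTERN.match(action)
    if match is None:
        return None
    return match.group(1), match.group(2)

print(parse_action("Search[Nick Park zoo animals talking about their homes]"))
print(parse_action("Finish[Creature Comforts]"))
print(parse_action("I will search for Nick Park"))
```

In this sketch a reply that does not follow the Name[argument] shape yields None, so a caller that unpacks the result into two variables would fail on it. That is one practical reason the prompt insists so strongly on the output format.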

3.6 Observe

The Wikipedia search returns:

plain
Nicholas Wulstan Park  (born 6 December 1958) is an  English filmm...

The step is now complete, and the final scratchpad is:

python
Thought 1: The creator of "Wallace and Gromit" is Nick Park. I need to search for other animation comedies by Nick Park that match this description.
Action 1: Search[Nick Park zoo animals talking about their homes]
Observation 1: Nicholas Wulstan Park  (born 6 December 1958) is an  English filmmaker and ...

3.7 Iterating ReAct

step is called in a loop until an exit condition is met. The final scratchpad is:

plain
Thought 1: The creator of "Wallace and Gromit" is Nick Park. I need to search for other animation comedies by Nick Park that match this description.
Action 1: Search[Nick Park zoo animals talking about their homes]
Observation 1: Nicholas Wulstan Park  (born 6 December 1958) is an  English filmmaker ...
Thought 2: Nick Park also created Creature Comforts, which is the animation comedy that matched animated zoo animals with a soundtrack of people talking about their homes.
Action 2: Finish[Creature Comforts]

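4. Summary

From the analysis above, the core design idea of ReactAgent is to design an effective prompt and to improve the quality of the answer by iterating over Thought, Action, and Observation. The prompt should state concisely the steps for answering the question, including TAO (Thought, Action, Observation), and provide four to five examples for the LLM to follow. The Thought reasons about the current situation, while the Action must be one of Search, Lookup, or Finish and must spell out the extracted entity. Note that the Actions here are not generic; they are designed for the specific task at hand. If your Actions are, say, names of functions to call, the prompt should state what the Actions can be. For this kind of tool use you can also consider the LLM's Function Calling directly, if the model supports it.

Guided by the prompt, the LLM reasons about the question, infers what action to take, and returns that reasoning as the Thought. In this design the LLM is required to stop at the first newline so that it does not go on to emit the Action and Observation lines in imitation of the examples; this relies on the format of the examples. Next, the Thought is appended to the prompt and sent to the LLM again. The LLM reasons further on that basis, distills the idea in the Thought, and decides whether to Search, Lookup, or Finish. We then call the tool to search Wikipedia for the entity. In the Observation stage we take the result the tool returns and enter Thought again to decide whether the answer has been found. If it has, the LLM gives the Action Finish; if not, it reasons over the Observation result or the similar entries returned, and searches again in the Action stage. Iteration after iteration, the answer is finally obtained.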
