11｜代理（下）：结构化工具对话、Self-Ask with Search以及 Plan and execute代理

在上一讲中，我们深入LangChain程序内部机制，探索了AgentExecutor究竟是如何思考（Thought）、执行（Execute/Act）和观察（Observe）的，这些步骤之间的紧密联系就是代理在推理（Reasoning）和工具调用过程中的“生死因果”。

什么是结构化工具
2023年初，LangChain 引入了“多操作”代理框架，允许代理计划执行多个操作。在此基础上，LangChain 推出了结构化工具对话代理，允许更复杂、多方面的交互。通过指定AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION 这个代理类型，代理能够调用包含一系列复杂工具的“结构化工具箱”，组合调用其中的多个工具，完成批次相关的任务集合。

什么是 Playwright

Playwright是一个开源的自动化框架，它可以让你模拟真实用户操作网页，帮助开发者和测试者自动化网页交互和测试。
pip install playwright 安装Playwright工具。
还需要通过 playwright install 命令来安装三种常用的浏览器工具。
通过Playwright浏览器工具来访问一个测试网页

from playwright.sync_api import sync_playwrightdef run():# 使用Playwright上下文管理器with sync_playwright() as p:# 使用Chromium，但你也可以选择firefox或webkitbrowser = p.chromium.launch()# 创建一个新的页面page = browser.new_page()# 导航到指定的URLpage.goto('https://langchain.com/')# 获取并打印页面标题title = page.title()print(f"Page title is: {title}")# 关闭浏览器browser.close()if __name__ == "__main__":run()

这个脚本展示了Playwright的工作方式，一切都是在命令行里面直接完成。

使用结构化工具对话代理

在这里，我们要使用的Agent类型是STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION。要使用的工具则是PlayWrightBrowserToolkit，这是LangChain中基于PlayWrightBrowser包封装的工具箱，它继承自 BaseToolkit类。

工具名称	功能描述	应用场景
ClickTool	模拟浏览器中的点击操作	自动化某个动作，如提交表单或导航到另一个页面
NavigateBrowserTool	导航到指定的URL或网页	打开一个新的网页或进行网站间跳转
NavigateBackTool	允许浏览器回退到上一个页面	浏览了一个页面后需要返回到之前的页面
ExtractTextTool	从网页上提取文本内容	获取和分析网页上的文字信息，例如文章、评论或任何其他文本内容
ExtractHyperlinksTool	提取网页上的所有超链接	收集或分析一个页面上所有的外部链接或内部链接
GetElementsTool	根据给定的条件或选择器获取网页上的元素	需要找到并与特定的网页元素交互
CurrentWebPageTool	提供当前浏览器所在网页的信息，如 URL、标题等	了解当前页面的状态，或者需要记录浏览历史

from langchain.agents.agent_toolkits import PlayWrightBrowserToolkit
from langchain.tools.playwright.utils import create_async_playwright_browserasync_browser = create_async_playwright_browser()
toolkit = PlayWrightBrowserToolkit.from_browser(async_browser=async_browser)
tools = toolkit.get_tools()
print(tools)from langchain.agents import initialize_agent, AgentType
from langchain.chat_models import ChatAnthropic, ChatOpenAI# LLM不稳定，对于这个任务，可能要多跑几次才能得到正确结果
llm = ChatOpenAI(temperature=0.5)  agent_chain = initialize_agent(tools,llm,agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION,verbose=True,
)async def main():response = await agent_chain.arun("What are the headers on python.langchain.com?")print(response)import asyncio
loop = asyncio.get_event_loop()
loop.run_until_complete(main())

在这个示例中，我们询问大模型，网页http://python.langchain.com中有哪些标题目录？
因为大模型的数据只在21年9月，所以不存在他的数据，需要通过PlayWrightBrowser工具来解决问题。

第一轮思考

I can use the “navigate_browser” tool to visit the website and then use the “get_elements” tool to retrieve the headers. Let me do that.

这是第一轮思考，大模型知道自己没有相关信息，决定使用PlayWrightBrowserToolkit工具箱中的 navigate_browser 工具。

Action:{"action": "navigate_browser", "action_input": {"url": "[https://python.langchain.com](https://link.zhihu.com/?target=https%3A//python.langchain.com)"}}

行动：通过Playwright浏览器访问这个网站。

Observation: Navigating to https://python.langchain.com returned status code 200

观察：成功得到浏览器访问的返回结果。
在第一轮思考过程中，模型决定使用PlayWrightBrowserToolkit中的navigate_browser工具。

第二轮思考

Thought:Now that I have successfully navigated to the website, I can use the “get_elements” tool to retrieve the headers. I will specify the CSS selector for the headers and retrieve their text.

第二轮思考：模型决定使用PlayWrightBrowserToolkit工具箱中的另一个工具 get_elements，并且指定CSS selector只拿标题的文字。

Action:{"action": "get_elements", "action_input": {"selector": "h1, h2, h3, h4, h5, h6", "attributes": ["innerText"]}}

行动：用Playwright的 get_elements 工具去拿网页中各级标题的文字。

Observation: [{“innerText”: “Introduction”}, {“innerText”: “Get started”}, {“innerText”: “Modules”}, {“innerText”: “Model I/O”}, {“innerText”: “Data connection”}, {“innerText”: “Chains”}, {“innerText”: “Agents”}, {“innerText”: “Memory”}, {“innerText”: “Callbacks”}, {“innerText”: “Examples, ecosystem, and resources”}, {“innerText”: “Use cases”}, {“innerText”: “Guides”}, {“innerText”: “Ecosystem”}, {“innerText”: “Additional resources”}, {“innerText”: “Support”}, {“innerText”: “API reference”}]

观察：成功地拿到了标题文本。
在第二轮思考过程中，模型决定使用PlayWrightBrowserToolkit中的get_elements工具。

第三轮思考

对上述思考做一个具体说明。第三轮思考：模型已经找到了网页中的所有标题。
行动：给出最终答案。

使用 Self-Ask with Search 代理

Self-Ask with Search 也是 LangChain 中的一个有用的代理类型（SELF_ASK_WITH_SEARCH）。它利用一种叫做 “Follow-up Question（追问）”加 “Intermediate Answer（中间答案）”的技巧，来辅助大模型寻找事实性问题的过渡性答案，从而引出最终答案。

from langchain import OpenAI, SerpAPIWrapper
from langchain.agents import initialize_agent, Tool
from langchain.agents import AgentType
llm = OpenAI(temperature=0)
search = SerpAPIWrapper()
tools = [
Tool(
name="Intermediate Answer",
func=search.run,
description="useful for when you need to ask with search",
)
]
self_ask_with_search = initialize_agent(
tools, llm, agent=AgentType.SELF_ASK_WITH_SEARCH, verbose=True
)
self_ask_with_search.run(
"使用玫瑰作为国花的国家的首都是哪里?"
)

工具集合：代理包含解决问题所必须的搜索工具，可以用来查询和验证多个信息点。这里我
们在程序中为代理武装了 SerpAPIWrapper 工具。
逐步逼近：代理可以根据第一个问题的答案，提出进一步的问题，直到得到最终答案。这种
逐步逼近的方式可以确保答案的准确性。
自我提问与搜索：代理可以自己提问并搜索答案。例如，首先确定哪个国家使用玫瑰作为国
花，然后确定该国家的首都是什么。
决策链：代理通过一个决策链来执行任务，使其可以跟踪和处理复杂的多跳问题，这对于解决需要多步推理的问题尤为重要。

使用 Plan and execute 代理

计划和执行代理通过首先计划要做什么，然后执行子任务来实现目标。这个想法是受到 Plan-and-Solve 论文的启发。论文中提出了计划与解决（Plan-and-Solve）提示。它由两部分组成：首先，制定一个计划，并将整个任务划分为更小的子任务；然后按照该计划执行子任务。
这种代理的独特之处在于，它的计划和执行不再是由同一个代理所完成，而是：

计划由一个大语言模型代理（负责推理）完成。
执行由另一个大语言模型代理（负责调用工具）完成

pip install -U langchain langchain_experimental

from langchain.chat_models import ChatOpenAI
from langchain_experimental.plan_and_execute import PlanAndExecute, load_agent_ex
from langchain.llms import OpenAI
from langchain import SerpAPIWrapper
from langchain.agents.tools import Tool
from langchain import LLMMathChain
search = SerpAPIWrapper()
llm = OpenAI(temperature=0)
llm_math_chain = LLMMathChain.from_llm(llm=llm, verbose=True)
tools = [
Tool(
name = "Search",
func=search.run,
description="useful for when you need to answer questions about current e
),
Tool(
name="Calculator",
func=llm_math_chain.run,
description="useful for when you need to answer questions about math"
),
]
model = ChatOpenAI(temperature=0)
planner = load_chat_planner(model)
executor = load_agent_executor(model, tools, verbose=True)
agent = PlanAndExecute(planner=planner, executor=executor, verbose=True)
agent.run("在纽约，100美元能买几束玫瑰?")

14｜工具和工具箱：LangChain中的Tool和Toolkits一览

未来的 AI Agent，应该就是以 LLM 为核心控制器的代理系统。而工具，则是代理身上延展出的三头六臂，是代理的武器，代理通过工具来与世界进行交互，控制并改造世界。

工具是代理的武器

当代理接收到一个任务时，它会根据任务的类型和需求，通过大模型的推理，来选择合适的工具处理这个任务。这个选择过程可以基于各种策略，例如基于工具的性能，或者基于工具处理特定类型任务的能力。
一旦选择了合适的工具，LangChain 就会将任务的输入传递给这个工具，然后工具会处理这些输入并生成输出。这个输出又经过大模型的推理，可以被用作其他工具的输入，或者作为最终结果，被返回给用户。

如何加载工具

from langchain.agents import load_tools
tool_names = [...]
tools = load_tools(tool_names某些工具（例如链、代理）可能需要 LLM 来初始化它们。
from langchain.agents import load_tools
tool_names = [...]
llm = ...
tools = load_tools(tool_names, llm=llm)

LangChain 支持的工具一览

使用 arXiv 工具开发科研助理

arXiv 是一个提供免费访问的预印本库，供研究者在正式出版前上传和分享其研究工作。

# 设置OpenAI API的密钥
import os
os.environ["OPENAI_API_KEY"] = 'Your Key'# 导入库
from langchain.chat_models import ChatOpenAI
from langchain.agents import load_tools, initialize_agent, AgentType# 初始化模型和工具
llm = ChatOpenAI(temperature=0.0)
tools = load_tools(["arxiv"],
)# 初始化链
agent_chain = initialize_agent(tools,llm,agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,verbose=True,
)# 运行链
agent_chain.run("介绍一下2005.14165这篇论文的创新点?")

研究一下 ZERO_SHOT_REACT_DESCRIPTION 这个 Agent 是怎么通过提示来引导模型调用工具的

“prompts”: [ "Answer the following questions as best you can. You have access to the following tools:\n\n

首先告诉模型，要尽力回答问题，但是可以访问下面的工具。

arxiv: A wrapper around Arxiv.org Useful for when you need to answer questions about 。。。。

arxiv 工具：一个围绕 Arxiv.org 的封装工具。

Use the following format:\n\n

指导模型输出下面的内容。

Question: the input question you must answer\n （问题：需要回答的问题）
Thought: you should always think about what to do\n （思考：应该总是思考下一步做什么）
Action: the action to take, should be one of [arxiv]\n （行动：从具体工具列表中选择行动——这里只有 arxiv 一个工具）
Action Input: the input to the action\n （行动的输入：输入工具的内容）
Observation: the result of the action\n… （观察：工具返回的结果）
(this Thought/Action/Action Input/Observation can repeat N times)\n （上面 Thought/Action/Action Input/Observation 的过程将重复 N 次）
Thought: I now know the final answer\n （现在我知道最终答案了）
Final Answer: the final answer to the original input question\n\n （原始问题的最终答案）

Begin!\n\n

现在开始！

Question: 'Chain-of-Thought Prompting Elicits Reasoning in Large Language Models 这篇论文的创新点\n

真正的问题在此。

Thought:"

开始思考吧！
然后，我们来看看 Chain 的运行过程

“text”: " I need to read the paper to understand the innovation\n （思考：我需要阅读文章才能理解创新点）
Action: arxiv\n （行动：arxiv 工具）
Action Input: ‘Chain-of-Thought Prompting Elicits Reasoning in Large Language Models’", （行动的输入：论文的标题）

因为在之前的提示中，LangChain 告诉大模型，对于 Arxiv 工具的输入总是以搜索的形式出现，因此尽管我指明了论文的 ID，Arxiv 还是根据这篇论文的关键词搜索到了 3 篇相关论文的信息。模型对这些信息进行了总结，认为信息已经完善，并给出了最终答案。

模型对这些信息进行了总结，认为信息已经完善，并给出了最终答案。

Thought: I now know the final answer

想法：我现在知道了最终答案。

Final Answer: The innovation of the paper ‘Chain-of-Thought Prompting Elicits Reasoning in Large Language Models’ is the introduction of a simple method called chain of thought prompting, where a few chain of thought demonstrations are provided as exemplars in prompting, which significantly improves the ability of large language models to perform complex reasoning."

最终答案：这篇名为《链式思考提示促使大型语言模型进行推理》的论文的创新之处在于，引入了一种简单的方法，即链式思考提示，在提示中提供了一些链式思考的示例，这大大提高了大型语言模型执行复杂推理的能力。

LangChain 中的工具箱一览

使用 Gmail 工具箱开发个人助理

通过 Gmail 工具箱，你可以通过 LangChain 应用检查邮件、删除垃圾邮件，甚至让它帮你撰写邮件草稿。
通过 Office365 工具箱，你可以让 LangChain 应用帮你读写文档、总结文档，甚至做 PPT。
通过 GitHub 工具箱，你可以指示 LangChain 应用来检查最新的代码，Commit Changes、Merge Branches，甚至尝试让大模型自动回答 Issues 中的问题——反正大模型解决代码问题的能力本来就更强。

目标：我要让 AI 应用来访问我的 Gmail 邮件，让他每天早晨检查一次我的邮箱，看看“易速鲜花”的客服有没有给我发信息。

第一步：在 Google Cloud 中设置你的应用程序接口

Gmail API 的官方配置链接完成，然后就获取一个开发秘钥。
。。。。。。。。

第二步：根据密钥生成开发 Token

pip install --upgrade google-api-python-client
pip install --upgrade google-auth-oauthlib
pip install --upgrade google-auth-httplib2from __future__ import print_functionimport os.pathfrom google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError# If modifying these scopes, delete the file token.json.
SCOPES = ['https://www.googleapis.com/auth/gmail.readonly']def main():"""Shows basic usage of the Gmail API.Lists the user's Gmail labels."""creds = None# The file token.json stores the user's access and refresh tokens, and is# created automatically when the authorization flow completes for the first# time.if os.path.exists('token.json'):creds = Credentials.from_authorized_user_file('token.json', SCOPES)# If there are no (valid) credentials available, let the user log in.if not creds or not creds.valid:if creds and creds.expired and creds.refresh_token:creds.refresh(Request())else:flow = InstalledAppFlow.from_client_secrets_file('credentials.json', SCOPES)creds = flow.run_local_server(port=8088)# Save the credentials for the next runwith open('token.json', 'w') as token:token.write(creds.to_json())try:# Call the Gmail APIservice = build('gmail', 'v1', credentials=creds)results = service.users().labels().list(userId='me').execute()labels = results.get('labels', [])if not labels:print('No labels found.')returnprint('Labels:')for label in labels:print(label['name'])except HttpError as error:# TODO(developer) - Handle errors from gmail API.print(f'An error occurred: {error}')if __name__ == '__main__':main()

这是 Google API 网站提供的标准示例代码，里面给了读取权限（gmail.readonly）的 Token，如果你要编写邮件，甚至发送邮件，需要根据需求来调整权限。更多细节可以参阅 Google API 的文档。
这个程序会生成一个 token.json 文件，是有相关权限的开发令牌。这个文件在 LangChain 应用中需要和密钥一起使用。

第三步：用 LangChain 框架开发 Gmail App

这段代码的核心目的是连接到 Gmail API，查询用户的邮件，并通过 LangChain 的 Agent 框架智能化地调用 API（用语言而不是具体 API），与邮件进行互动。

# 设置OpenAI API的密钥
import os
os.environ["OPENAI_API_KEY"] = 'Your Key'# 导入与Gmail交互所需的工具包
from langchain.agents.agent_toolkits import GmailToolkit# 初始化Gmail工具包
toolkit = GmailToolkit()# 从gmail工具中导入一些有用的功能
from langchain.tools.gmail.utils import build_resource_service, get_gmail_credentials# 获取Gmail API的凭证，并指定相关的权限范围
credentials = get_gmail_credentials(token_file="token.json",  # Token文件路径scopes=["https://mail.google.com/"],  # 具有完全的邮件访问权限client_secrets_file="credentials.json",  # 客户端的秘密文件路径
)
# 使用凭证构建API资源服务
api_resource = build_resource_service(credentials=credentials)
toolkit = GmailToolkit(api_resource=api_resource)# 获取工具
tools = toolkit.get_tools()
print(tools)# 导入与聊天模型相关的包
from langchain.chat_models import ChatOpenAI
from langchain.agents import initialize_agent, AgentType# 初始化聊天模型
llm = ChatOpenAI(temperature=0, model='gpt-4')# 通过指定的工具和聊天模型初始化agent
agent = initialize_agent(tools=toolkit.get_tools(),llm=llm,agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION,
)# 使用agent运行一些查询或指令
result = agent.run("今天易速鲜花客服给我发邮件了么？最新的邮件是谁发给我的？"
)# 打印结果
print(result)

你的请求是查询今天是否收到了来自“易速鲜花客服”的邮件，以及最新邮件的发送者是谁。 这个请求是模糊的，是自然语言格式，具体调用什么 API，由 Agent、Tool 也就是 Gmail API 它俩商量着来。

总结

**集成多模型和多策略：**LangChain 提供了一种方法，使得多个模型或策略能够在一个统一的框架下工作。例如，arXiv 是一个单独的工具，它负责处理特定的任务。
**更易于交互和维护：**通过 LangChain，你可以更方便地管理和维护你的工具和模型。
适应性：LangChain 提供的架构允许你轻松地添加新的工具或模型，或者替换现有的工具或模型。
可解释性：LangChain 还提供了对模型决策的可解释性。