LangChain教程 | langchain 文件加载器使用教程 | Document Loaders全集

提示:

        想要了解更多有关内置文档加载器与第三方工具集成的文档,甚至包括了:哔哩哔哩网站加载器、区块链加载器、汇编音频文本、Datadog日志加载器等。

        本文主要收集与讲解日常使用的加载器,足够咱们平时开发人工智能的工作使用,大概有:csv加载器、text加载器、word加载器、html加载器、pdf加载器、文件目录加载器、json加载器等。

概述

        使用文档加载器将数据从源加载为 Document Document是一段文本和相关的元数据。例如,有一些文档加载器用于加载简单的 .txt 文件,用于加载任何网页的文本内容,甚至用于加载 YouTube视频的副本

        文档加载器提供了一种“加载”方法,用于从配置的源中将数据作为文档加载。它们还可选地实现“延迟加载”,用于将数据延迟加载到内存中。

一、CSV 加载器

        CSV 文件是使用逗号分隔值的分隔文本文件。文件的每一行都是一条数据记录。每个记录由一个或多个用逗号分隔的字段组成。

        每个文档加载一行CSV数据。

from langchain_community.document_loaders.csv_loader import CSVLoaderloader = CSVLoader(file_path='./example_data/mlb_teams_2012.csv')
data = loader.load()print(data)

          打印结果:

    [Document(page_content='Team: Nationals\n"Payroll (millions)": 81.34\n"Wins": 98', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 0}, lookup_index=0), Document(page_content='Team: Reds\n"Payroll (millions)": 82.20\n"Wins": 97', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 1}, lookup_index=0), Document(page_content='Team: Yankees\n"Payroll (millions)": 197.96\n"Wins": 95', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 2}, lookup_index=0), Document(page_content='Team: Giants\n"Payroll (millions)": 117.62\n"Wins": 94', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 3}, lookup_index=0), Document(page_content='Team: Braves\n"Payroll (millions)": 83.31\n"Wins": 94', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 4}, lookup_index=0), Document(page_content='Team: Athletics\n"Payroll (millions)": 55.37\n"Wins": 94', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 5}, lookup_index=0), Document(page_content='Team: Rangers\n"Payroll (millions)": 120.51\n"Wins": 93', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 6}, lookup_index=0), Document(page_content='Team: Orioles\n"Payroll (millions)": 81.43\n"Wins": 93', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 7}, lookup_index=0), Document(page_content='Team: Rays\n"Payroll (millions)": 64.17\n"Wins": 90', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 8}, lookup_index=0), Document(page_content='Team: Angels\n"Payroll (millions)": 154.49\n"Wins": 89', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 9}, lookup_index=0), Document(page_content='Team: Tigers\n"Payroll (millions)": 132.30\n"Wins": 88', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 10}, lookup_index=0), Document(page_content='Team: Cardinals\n"Payroll (millions)": 110.30\n"Wins": 88', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 11}, lookup_index=0), Document(page_content='Team: Dodgers\n"Payroll (millions)": 95.14\n"Wins": 86', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 12}, lookup_index=0), Document(page_content='Team: White Sox\n"Payroll (millions)": 96.92\n"Wins": 85', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 13}, lookup_index=0), Document(page_content='Team: Brewers\n"Payroll (millions)": 97.65\n"Wins": 83', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 14}, lookup_index=0), Document(page_content='Team: Phillies\n"Payroll (millions)": 174.54\n"Wins": 81', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 15}, lookup_index=0), Document(page_content='Team: Diamondbacks\n"Payroll (millions)": 74.28\n"Wins": 81', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 16}, lookup_index=0), Document(page_content='Team: Pirates\n"Payroll (millions)": 63.43\n"Wins": 79', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 17}, lookup_index=0), Document(page_content='Team: Padres\n"Payroll (millions)": 55.24\n"Wins": 76', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 18}, lookup_index=0), Document(page_content='Team: Mariners\n"Payroll (millions)": 81.97\n"Wins": 75', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 19}, lookup_index=0), Document(page_content='Team: Mets\n"Payroll (millions)": 93.35\n"Wins": 74', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 20}, lookup_index=0), Document(page_content='Team: Blue Jays\n"Payroll (millions)": 75.48\n"Wins": 73', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 21}, lookup_index=0), Document(page_content='Team: Royals\n"Payroll (millions)": 60.91\n"Wins": 72', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 22}, lookup_index=0), Document(page_content='Team: Marlins\n"Payroll (millions)": 118.07\n"Wins": 69', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 23}, lookup_index=0), Document(page_content='Team: Red Sox\n"Payroll (millions)": 173.18\n"Wins": 69', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 24}, lookup_index=0), Document(page_content='Team: Indians\n"Payroll (millions)": 78.43\n"Wins": 68', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 25}, lookup_index=0), Document(page_content='Team: Twins\n"Payroll (millions)": 94.08\n"Wins": 66', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 26}, lookup_index=0), Document(page_content='Team: Rockies\n"Payroll (millions)": 78.06\n"Wins": 64', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 27}, lookup_index=0), Document(page_content='Team: Cubs\n"Payroll (millions)": 88.19\n"Wins": 61', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 28}, lookup_index=0), Document(page_content='Team: Astros\n"Payroll (millions)": 60.65\n"Wins": 55', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 29}, lookup_index=0)]

① 定制CSV解析和加载

        参见csv模块文档,了解支持哪些csv参数的更多信息。

        下面直接在注释里面讲解几个比较常用的参数:

loader = CSVLoader(file_path='./example_data/mlb_teams_2012.csv', csv_args={# 定界符:用于分隔字段的单字符字符串。它默认为',''delimiter': ',',# 引号字符:用于引用包含特殊字符的字段的单字符字符串,如定界符或者quotechar,或者包含换行符。它默认为'"'.'quotechar': '"',# 字段名称:如果在创建对象时没有作为参数传递,则在第一次访问或从文件中读取第一条记录时初始化该属性。'fieldnames': ['MLB Team', 'Payroll in millions', 'Wins']
})data = loader.load()print(data)

        打印结果:

    [Document(page_content='MLB Team: Team\nPayroll in millions: "Payroll (millions)"\nWins: "Wins"', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 0}, lookup_index=0), Document(page_content='MLB Team: Nationals\nPayroll in millions: 81.34\nWins: 98', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 1}, lookup_index=0), Document(page_content='MLB Team: Reds\nPayroll in millions: 82.20\nWins: 97', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 2}, lookup_index=0), Document(page_content='MLB Team: Yankees\nPayroll in millions: 197.96\nWins: 95', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 3}, lookup_index=0), Document(page_content='MLB Team: Giants\nPayroll in millions: 117.62\nWins: 94', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 4}, lookup_index=0), Document(page_content='MLB Team: Braves\nPayroll in millions: 83.31\nWins: 94', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 5}, lookup_index=0), Document(page_content='MLB Team: Athletics\nPayroll in millions: 55.37\nWins: 94', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 6}, lookup_index=0), Document(page_content='MLB Team: Rangers\nPayroll in millions: 120.51\nWins: 93', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 7}, lookup_index=0), Document(page_content='MLB Team: Orioles\nPayroll in millions: 81.43\nWins: 93', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 8}, lookup_index=0), Document(page_content='MLB Team: Rays\nPayroll in millions: 64.17\nWins: 90', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 9}, lookup_index=0), Document(page_content='MLB Team: Angels\nPayroll in millions: 154.49\nWins: 89', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 10}, lookup_index=0), Document(page_content='MLB Team: Tigers\nPayroll in millions: 132.30\nWins: 88', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 11}, lookup_index=0), Document(page_content='MLB Team: Cardinals\nPayroll in millions: 110.30\nWins: 88', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 12}, lookup_index=0), Document(page_content='MLB Team: Dodgers\nPayroll in millions: 95.14\nWins: 86', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 13}, lookup_index=0), Document(page_content='MLB Team: White Sox\nPayroll in millions: 96.92\nWins: 85', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 14}, lookup_index=0), Document(page_content='MLB Team: Brewers\nPayroll in millions: 97.65\nWins: 83', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 15}, lookup_index=0), Document(page_content='MLB Team: Phillies\nPayroll in millions: 174.54\nWins: 81', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 16}, lookup_index=0), Document(page_content='MLB Team: Diamondbacks\nPayroll in millions: 74.28\nWins: 81', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 17}, lookup_index=0), Document(page_content='MLB Team: Pirates\nPayroll in millions: 63.43\nWins: 79', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 18}, lookup_index=0), Document(page_content='MLB Team: Padres\nPayroll in millions: 55.24\nWins: 76', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 19}, lookup_index=0), Document(page_content='MLB Team: Mariners\nPayroll in millions: 81.97\nWins: 75', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 20}, lookup_index=0), Document(page_content='MLB Team: Mets\nPayroll in millions: 93.35\nWins: 74', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 21}, lookup_index=0), Document(page_content='MLB Team: Blue Jays\nPayroll in millions: 75.48\nWins: 73', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 22}, lookup_index=0), Document(page_content='MLB Team: Royals\nPayroll in millions: 60.91\nWins: 72', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 23}, lookup_index=0), Document(page_content='MLB Team: Marlins\nPayroll in millions: 118.07\nWins: 69', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 24}, lookup_index=0), Document(page_content='MLB Team: Red Sox\nPayroll in millions: 173.18\nWins: 69', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 25}, lookup_index=0), Document(page_content='MLB Team: Indians\nPayroll in millions: 78.43\nWins: 68', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 26}, lookup_index=0), Document(page_content='MLB Team: Twins\nPayroll in millions: 94.08\nWins: 66', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 27}, lookup_index=0), Document(page_content='MLB Team: Rockies\nPayroll in millions: 78.06\nWins: 64', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 28}, lookup_index=0), Document(page_content='MLB Team: Cubs\nPayroll in millions: 88.19\nWins: 61', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 29}, lookup_index=0), Document(page_content='MLB Team: Astros\nPayroll in millions: 60.65\nWins: 55', lookup_str='', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 30}, lookup_index=0)]

② 指定一个列来标识文档源

        使用 source_column 参数为从每一行创建的文档指定一个源。否则 file_path 将用作从CSV文件创建的所有文档的源。

        当将从CSV文件加载的文档用于使用源回答问题的链时,这很有用。

loader = CSVLoader(file_path='./example_data/mlb_teams_2012.csv', source_column="Team")data = loader.load()print(data)
    [Document(page_content='Team: Nationals\n"Payroll (millions)": 81.34\n"Wins": 98', lookup_str='', metadata={'source': 'Nationals', 'row': 0}, lookup_index=0), Document(page_content='Team: Reds\n"Payroll (millions)": 82.20\n"Wins": 97', lookup_str='', metadata={'source': 'Reds', 'row': 1}, lookup_index=0), Document(page_content='Team: Yankees\n"Payroll (millions)": 197.96\n"Wins": 95', lookup_str='', metadata={'source': 'Yankees', 'row': 2}, lookup_index=0), Document(page_content='Team: Giants\n"Payroll (millions)": 117.62\n"Wins": 94', lookup_str='', metadata={'source': 'Giants', 'row': 3}, lookup_index=0), Document(page_content='Team: Braves\n"Payroll (millions)": 83.31\n"Wins": 94', lookup_str='', metadata={'source': 'Braves', 'row': 4}, lookup_index=0), Document(page_content='Team: Athletics\n"Payroll (millions)": 55.37\n"Wins": 94', lookup_str='', metadata={'source': 'Athletics', 'row': 5}, lookup_index=0), Document(page_content='Team: Rangers\n"Payroll (millions)": 120.51\n"Wins": 93', lookup_str='', metadata={'source': 'Rangers', 'row': 6}, lookup_index=0), Document(page_content='Team: Orioles\n"Payroll (millions)": 81.43\n"Wins": 93', lookup_str='', metadata={'source': 'Orioles', 'row': 7}, lookup_index=0), Document(page_content='Team: Rays\n"Payroll (millions)": 64.17\n"Wins": 90', lookup_str='', metadata={'source': 'Rays', 'row': 8}, lookup_index=0), Document(page_content='Team: Angels\n"Payroll (millions)": 154.49\n"Wins": 89', lookup_str='', metadata={'source': 'Angels', 'row': 9}, lookup_index=0), Document(page_content='Team: Tigers\n"Payroll (millions)": 132.30\n"Wins": 88', lookup_str='', metadata={'source': 'Tigers', 'row': 10}, lookup_index=0), Document(page_content='Team: Cardinals\n"Payroll (millions)": 110.30\n"Wins": 88', lookup_str='', metadata={'source': 'Cardinals', 'row': 11}, lookup_index=0), Document(page_content='Team: Dodgers\n"Payroll (millions)": 95.14\n"Wins": 86', lookup_str='', metadata={'source': 'Dodgers', 'row': 12}, lookup_index=0), Document(page_content='Team: White Sox\n"Payroll (millions)": 96.92\n"Wins": 85', lookup_str='', metadata={'source': 'White Sox', 'row': 13}, lookup_index=0), Document(page_content='Team: Brewers\n"Payroll (millions)": 97.65\n"Wins": 83', lookup_str='', metadata={'source': 'Brewers', 'row': 14}, lookup_index=0), Document(page_content='Team: Phillies\n"Payroll (millions)": 174.54\n"Wins": 81', lookup_str='', metadata={'source': 'Phillies', 'row': 15}, lookup_index=0), Document(page_content='Team: Diamondbacks\n"Payroll (millions)": 74.28\n"Wins": 81', lookup_str='', metadata={'source': 'Diamondbacks', 'row': 16}, lookup_index=0), Document(page_content='Team: Pirates\n"Payroll (millions)": 63.43\n"Wins": 79', lookup_str='', metadata={'source': 'Pirates', 'row': 17}, lookup_index=0), Document(page_content='Team: Padres\n"Payroll (millions)": 55.24\n"Wins": 76', lookup_str='', metadata={'source': 'Padres', 'row': 18}, lookup_index=0), Document(page_content='Team: Mariners\n"Payroll (millions)": 81.97\n"Wins": 75', lookup_str='', metadata={'source': 'Mariners', 'row': 19}, lookup_index=0), Document(page_content='Team: Mets\n"Payroll (millions)": 93.35\n"Wins": 74', lookup_str='', metadata={'source': 'Mets', 'row': 20}, lookup_index=0), Document(page_content='Team: Blue Jays\n"Payroll (millions)": 75.48\n"Wins": 73', lookup_str='', metadata={'source': 'Blue Jays', 'row': 21}, lookup_index=0), Document(page_content='Team: Royals\n"Payroll (millions)": 60.91\n"Wins": 72', lookup_str='', metadata={'source': 'Royals', 'row': 22}, lookup_index=0), Document(page_content='Team: Marlins\n"Payroll (millions)": 118.07\n"Wins": 69', lookup_str='', metadata={'source': 'Marlins', 'row': 23}, lookup_index=0), Document(page_content='Team: Red Sox\n"Payroll (millions)": 173.18\n"Wins": 69', lookup_str='', metadata={'source': 'Red Sox', 'row': 24}, lookup_index=0), Document(page_content='Team: Indians\n"Payroll (millions)": 78.43\n"Wins": 68', lookup_str='', metadata={'source': 'Indians', 'row': 25}, lookup_index=0), Document(page_content='Team: Twins\n"Payroll (millions)": 94.08\n"Wins": 66', lookup_str='', metadata={'source': 'Twins', 'row': 26}, lookup_index=0), Document(page_content='Team: Rockies\n"Payroll (millions)": 78.06\n"Wins": 64', lookup_str='', metadata={'source': 'Rockies', 'row': 27}, lookup_index=0), Document(page_content='Team: Cubs\n"Payroll (millions)": 88.19\n"Wins": 61', lookup_str='', metadata={'source': 'Cubs', 'row': 28}, lookup_index=0), Document(page_content='Team: Astros\n"Payroll (millions)": 60.65\n"Wins": 55', lookup_str='', metadata={'source': 'Astros', 'row': 29}, lookup_index=0)]

二、文件目录 File Directory 加载器

        这包括如何加载目录中的所有文档。

        默认情况下,它使用非结构化加载程序.

from langchain_community.document_loaders import DirectoryLoader

        我们可以使用 glob 参数来控制要加载的文件。请注意,这里它不加载 .rst 文件或 .html 文件。

loader = DirectoryLoader('../', glob="**/*.md")docs = loader.load()print(len(docs))

        打印结果:

    1

① 显示进度条

        默认情况下,不会显示进度条。要显示进度条,请安装 tqdm library(例如),并设置show_progress 参数到 True .

pip install tqdm
loader = DirectoryLoader('../', glob="**/*.md", show_progress=True)
docs = loader.load()

        演示效果:

    Requirement already satisfied: tqdm in /Users/jon/.pyenv/versions/3.9.16/envs/microbiome-app/lib/python3.9/site-packages (4.65.0)0it [00:00, ?it/s]

② 使用多线程

        默认情况下,加载发生在一个线程中。为了利用几个线程,请将 use_multithreading 标志为 true

loader = DirectoryLoader('../', glob="**/*.md", use_multithreading=True)
docs = loader.load()

 ③ 更改加载程序类

        默认情况下,使用 UnstructuredLoader 。然而,您可以非常容易地改变加载程序的类型。只需要指定参数 loader_cls 的类型。

from langchain_community.document_loaders import TextLoaderloader = DirectoryLoader('../', glob="**/*.md", loader_cls=TextLoader)docs = loader.load()len(docs)

        打印结果:

    1

        如果需要加载 Python源代码文件,请使用 PythonLoader .

from langchain_community.document_loaders import PythonLoaderloader = DirectoryLoader('../../../../../', glob="**/*.py", loader_cls=PythonLoader)docs = loader.load()len(docs)

         打印结果:

    691

④ 使用文本加载器自动检测文件编码

        在本例中,我们将看到一些有用的策略,特别是这些策略在加载大量随机文件时使用 TextLoader

        首先,为了说明这个问题,让我们尝试用任意编码加载多个文本。

path = '../../../../../tests/integration_tests/examples'
loader = DirectoryLoader(path, glob="**/*.txt", loader_cls=TextLoader)

        A.默认行为

loader.load()

        文件 example-non-utf8.txt 使用不同的编码,因此load()函数失败,并显示一条有用的消息,指出哪个文件解码失败。

        默认行为TextLoader加载任何文档失败都将导致整个加载过程失败,并且不会加载任何文档。

        B.无声失败

        我们可以传递参数silent_errorsDirectoryLoader跳过无法加载的文件并继续加载过程。

loader = DirectoryLoader(path, glob="**/*.txt", loader_cls=TextLoader, silent_errors=True)
docs = loader.load()

        C.自动检测编码

        我们也可以使用 TextLoader 自动检测文件编码失败前,通过autodetect_encoding加载相关的加载器类。

text_loader_kwargs={'autodetect_encoding': True}
loader = DirectoryLoader(path, glob="**/*.txt", loader_cls=TextLoader, loader_kwargs=text_loader_kwargs)
docs = loader.load()doc_sources = [doc.metadata['source']  for doc in docs]
print(doc_sources)

        打印结果:

    ['../../../../../tests/integration_tests/examples/example-non-utf8.txt','../../../../../tests/integration_tests/examples/whatsapp_chat.txt','../../../../../tests/integration_tests/examples/example-utf8.txt']

三、HTML 加载器

超文本标记语言或HTML是设计用于在web浏览器中显示的文档的标准标记语言。

        这包括如何加载 HTML文档 转换成我们可以在下游使用的文档格式。

from langchain_community.document_loaders import UnstructuredHTMLLoaderloader = UnstructuredHTMLLoader("example_data/fake-content.html")data = loader.load()print(data)

         打印结果:

    [Document(page_content='My First Heading\n\nMy first paragraph.', lookup_str='', metadata={'source': 'example_data/fake-content.html'}, lookup_index=0)]

① 用BeautifulSoup4加载HTML

        我们也可以使用 BeautifulSoup4 使用加载HTML文档 BSHTMLLoader 。这将把文本从HTML提取到page_content,页面标题为title到…里面metadata.

from langchain_community.document_loaders import BSHTMLLoader
loader = BSHTMLLoader("example_data/fake-content.html")
data = loader.load()
print(data)
    [Document(page_content='\n\nTest Title\n\n\nMy First Heading\nMy first paragraph.\n\n\n', metadata={'source': 'example_data/fake-content.html', 'title': 'Test Title'})]

四、JSON 加载器

JSON 是一种开放的标准文件格式和数据交换格式,它使用人类可读的文本来存储和传输由属性值对和数组(或其他可序列化的值)组成的数据对象。

JSON行是一种文件格式,其中每一行都是有效的JSON值。

JSONLoader使用指定的jq模式解析JSON文件。它使用jq python包。详情看这个指南的详细文档 jq 语法。 

pip install jq
from langchain_community.document_loaders import JSONLoader
import json
from pathlib import Path
from pprint import pprintfile_path='./example_data/facebook_chat.json'
data = json.loads(Path(file_path).read_text())print(data)
    {'image': {'creation_timestamp': 1675549016, 'uri': 'image_of_the_chat.jpg'},'is_still_participant': True,'joinable_mode': {'link': '', 'mode': 1},'magic_words': [],'messages': [{'content': 'Bye!','sender_name': 'User 2','timestamp_ms': 1675597571851},{'content': 'Oh no worries! Bye','sender_name': 'User 1','timestamp_ms': 1675597435669},{'content': 'No Im sorry it was my mistake, the blue one is not ''for sale','sender_name': 'User 2','timestamp_ms': 1675596277579},{'content': 'I thought you were selling the blue one!','sender_name': 'User 1','timestamp_ms': 1675595140251},{'content': 'Im not interested in this bag. Im interested in the ''blue one!','sender_name': 'User 1','timestamp_ms': 1675595109305},{'content': 'Here is $129','sender_name': 'User 2','timestamp_ms': 1675595068468},{'photos': [{'creation_timestamp': 1675595059,'uri': 'url_of_some_picture.jpg'}],'sender_name': 'User 2','timestamp_ms': 1675595060730},{'content': 'Online is at least $100','sender_name': 'User 2','timestamp_ms': 1675595045152},{'content': 'How much do you want?','sender_name': 'User 1','timestamp_ms': 1675594799696},{'content': 'Goodmorning! $50 is too low.','sender_name': 'User 2','timestamp_ms': 1675577876645},{'content': 'Hi! Im interested in your bag. Im offering $50. Let ''me know if you are interested. Thanks!','sender_name': 'User 1','timestamp_ms': 1675549022673}],'participants': [{'name': 'User 1'}, {'name': 'User 2'}],'thread_path': 'inbox/User 1 and User 2 chat','title': 'User 1 and User 2 chat'}

① 使用JSONLoader

        假设我们对提取content中的字段messagesJSON数据的键。这可以通过JSONLoader如下图。

JSON文件

loader = JSONLoader(file_path='./example_data/facebook_chat.json',jq_schema='.messages[].content',text_content=False)data = loader.load()print(data)
    [Document(page_content='Bye!', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/indexes/document_loaders/examples/example_data/facebook_chat.json', 'seq_num': 1}),Document(page_content='Oh no worries! Bye', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/indexes/document_loaders/examples/example_data/facebook_chat.json', 'seq_num': 2}),Document(page_content='No Im sorry it was my mistake, the blue one is not for sale', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/indexes/document_loaders/examples/example_data/facebook_chat.json', 'seq_num': 3}),Document(page_content='I thought you were selling the blue one!', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/indexes/document_loaders/examples/example_data/facebook_chat.json', 'seq_num': 4}),Document(page_content='Im not interested in this bag. Im interested in the blue one!', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/indexes/document_loaders/examples/example_data/facebook_chat.json', 'seq_num': 5}),Document(page_content='Here is $129', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/indexes/document_loaders/examples/example_data/facebook_chat.json', 'seq_num': 6}),Document(page_content='', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/indexes/document_loaders/examples/example_data/facebook_chat.json', 'seq_num': 7}),Document(page_content='Online is at least $100', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/indexes/document_loaders/examples/example_data/facebook_chat.json', 'seq_num': 8}),Document(page_content='How much do you want?', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/indexes/document_loaders/examples/example_data/facebook_chat.json', 'seq_num': 9}),Document(page_content='Goodmorning! $50 is too low.', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/indexes/document_loaders/examples/example_data/facebook_chat.json', 'seq_num': 10}),Document(page_content='Hi! Im interested in your bag. Im offering $50. Let me know if you are interested. Thanks!', metadata={'source': '/Users/avsolatorio/WBG/langchain/docs/modules/indexes/document_loaders/examples/example_data/facebook_chat.json', 'seq_num': 11})]

JSON行文件

        如果您想从JSON Lines文件中加载文档,您需要传递json_lines=True并详细说明jq_schema提取page_content来自单个JSON对象。

file_path = './example_data/facebook_chat_messages.jsonl'
print(Path(file_path).read_text())
    ('{"sender_name": "User 2", "timestamp_ms": 1675597571851, "content": "Bye!"}\n''{"sender_name": "User 1", "timestamp_ms": 1675597435669, "content": "Oh no ''worries! Bye"}\n''{"sender_name": "User 2", "timestamp_ms": 1675596277579, "content": "No Im ''sorry it was my mistake, the blue one is not for sale"}\n')
loader = JSONLoader(file_path='./example_data/facebook_chat_messages.jsonl',jq_schema='.content',text_content=False,json_lines=True)data = loader.load()print(data)
    [Document(page_content='Bye!', metadata={'source': 'langchain/docs/modules/indexes/document_loaders/examples/example_data/facebook_chat_messages.jsonl', 'seq_num': 1}),Document(page_content='Oh no worries! Bye', metadata={'source': 'langchain/docs/modules/indexes/document_loaders/examples/example_data/facebook_chat_messages.jsonl', 'seq_num': 2}),Document(page_content='No Im sorry it was my mistake, the blue one is not for sale', metadata={'source': 'langchain/docs/modules/indexes/document_loaders/examples/example_data/facebook_chat_messages.jsonl', 'seq_num': 3})]

设置了另一个选项jq_schema='.'并提供content_key:

loader = JSONLoader(file_path='./example_data/facebook_chat_messages.jsonl',jq_schema='.',content_key='sender_name',json_lines=True)data = loader.load()print(data)
    [Document(page_content='User 2', metadata={'source': 'langchain/docs/modules/indexes/document_loaders/examples/example_data/facebook_chat_messages.jsonl', 'seq_num': 1}),Document(page_content='User 1', metadata={'source': 'langchain/docs/modules/indexes/document_loaders/examples/example_data/facebook_chat_messages.jsonl', 'seq_num': 2}),Document(page_content='User 2', metadata={'source': 'langchain/docs/modules/indexes/document_loaders/examples/example_data/facebook_chat_messages.jsonl', 'seq_num': 3})]

带有jq模式的JSON文件content_key

        要使用jq模式中的content_key从JSON文件加载文档,请设置is _ content _ key _ jq _ pars able = True。请确保content_key是兼容的,并且可以使用jq模式进行解析。

file_path = './sample.json'
pprint(Path(file_path).read_text())
    {"data": [{"attributes": {"message": "message1","tags": ["tag1"]},"id": "1"},{"attributes": {"message": "message2","tags": ["tag2"]},"id": "2"}]}
loader = JSONLoader(file_path=file_path,jq_schema=".data[]",content_key=".attributes.message",is_content_key_jq_parsable=True,
)data = loader.load()print(data)
    [Document(page_content='message1', metadata={'source': '/path/to/sample.json', 'seq_num': 1}),Document(page_content='message2', metadata={'source': '/path/to/sample.json', 'seq_num': 2})]

五、PDF Loader 加载器

目前市面上有很多的pdf加载器,下面会挑选几款受欢迎的展示,具体要使用哪种,自行选择。

① 使用 PyPDF

        PyPDF是一个功能全面的库,它允许用户进行PDF的读取、分割、合并以及转换等操作。这个库的优点在于其轻量且纯Python编写,没有庞大的依赖,因此安装和使用相对简单。此外,PyPDF跨平台性好,能够在Windows、macOS和Linux上良好运行。然而,它可能不支持PDF 1.7及以上版本的某些特性,对于处理带有复杂特性的最新PDF文件可能会存在限制。

加载PDF使用pypdf文档数组,其中每个文档都包含页面内容和元数据page号码。

pip install pypdf
from langchain_community.document_loaders import PyPDFLoaderloader = PyPDFLoader("example_data/layout-parser-paper.pdf")
pages = loader.load_and_split()
print(pages[0])
    Document(page_content='LayoutParser : A Uni\x0ced Toolkit for Deep\nLearning Based Document Image Analysis\nZejiang Shen1( \x00), Ruochen Zhang2, Melissa Dell3, Benjamin Charles Germain\nLee4, Jacob Carlson3, and Weining Li5\n1Allen Institute for AI\nshannons@allenai.org\n2Brown University\nruochen zhang@brown.edu\n3Harvard University\nfmelissadell,jacob carlson g@fas.harvard.edu\n4University of Washington\nbcgl@cs.washington.edu\n5University of Waterloo\nw422li@uwaterloo.ca\nAbstract. Recent advances in document image analysis (DIA) have been\nprimarily driven by the application of neural networks. Ideally, research\noutcomes could be easily deployed in production and extended for further\ninvestigation. However, various factors like loosely organized codebases\nand sophisticated model con\x0cgurations complicate the easy reuse of im-\nportant innovations by a wide audience. Though there have been on-going\ne\x0borts to improve reusability and simplify deep learning (DL) model\ndevelopment in disciplines like natural language processing and computer\nvision, none of them are optimized for challenges in the domain of DIA.\nThis represents a major gap in the existing toolkit, as DIA is central to\nacademic research across a wide range of disciplines in the social sciences\nand humanities. This paper introduces LayoutParser , an open-source\nlibrary for streamlining the usage of DL in DIA research and applica-\ntions. The core LayoutParser library comes with a set of simple and\nintuitive interfaces for applying and customizing DL models for layout de-\ntection, character recognition, and many other document processing tasks.\nTo promote extensibility, LayoutParser also incorporates a community\nplatform for sharing both pre-trained models and full document digiti-\nzation pipelines. We demonstrate that LayoutParser is helpful for both\nlightweight and large-scale digitization pipelines in real-word use cases.\nThe library is publicly available at https://layout-parser.github.io .\nKeywords: Document Image Analysis ·Deep Learning ·Layout Analysis\n·Character Recognition ·Open Source library ·Toolkit.\n1 Introduction\nDeep Learning(DL)-based approaches are the state-of-the-art for a wide range of\ndocument image analysis (DIA) tasks including document image classi\x0ccation [ 11,arXiv:2103.15348v2  [cs.CV]  21 Jun 2021', metadata={'source': 'example_data/layout-parser-paper.pdf', 'page': 0})

这种方法的一个优点是可以通过页码检索文档。

应用实例:

import os
import getpassos.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key:')from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddingsfaiss_index = FAISS.from_documents(pages, OpenAIEmbeddings())
docs = faiss_index.similarity_search("How will the community be engaged?", k=2)
for doc in docs:print(str(doc.metadata["page"]) + ":", doc.page_content[:300])
    9: 10 Z. Shen et al.Fig. 4: Illustration of (a) the original historical Japanese document with layoutdetection results and (b) a recreated version of the document image that achievesmuch better character recognition recall. The reorganization algorithm rearrangesthe tokens based on the their detect3: 4 Z. Shen et al.Efficient Data AnnotationC u s t o m i z e d  M o d e l  T r a i n i n gModel Cust omizationDI A Model HubDI A Pipeline SharingCommunity PlatformLa y out Detection ModelsDocument ImagesT h e  C o r e  L a y o u t P a r s e r  L i b r a r yOCR ModuleSt or age & VisualizationLa y ou

② 使用PyPDFium2

        PyPDFium2是基于PDFium的Python绑定。PDFium是一个由Google开发的快速且功能丰富的PDF渲染引擎,因此PyPDFium2在PDF渲染和文本提取方面可能具有出色的性能。对于那些主要关注PDF的渲染和文本提取的用户来说,PyPDFium2可能是一个理想的选择。

from langchain_community.document_loaders import PyPDFium2Loaderloader = PyPDFium2Loader("example_data/layout-parser-paper.pdf")data = loader.load()

③ 使用PDFMiner

        PDFMiner是一个专注于从PDF文档中提取文本和元数据的库。它不仅支持基本的文本提取功能,还提供了许多高级特性,如表格分析和图像处理。PDFMiner的API设计简洁易懂,方便开发者快速上手,并且具有良好的跨平台兼容性。无论是简单的文本提取还是复杂的页面布局分析,PDFMiner都能满足各种需求。

from langchain_community.document_loaders import PDFMinerLoaderloader = PDFMinerLoader("example_data/layout-parser-paper.pdf")data = loader.load()

④ 特殊的 PyPDF Directory

        从目录加载pdf

from langchain_community.document_loaders import PyPDFDirectoryLoaderloader = PyPDFDirectoryLoader("example_data/")docs = loader.load()

⑤ 特殊的 使用非结构化

        非结构化的PDF指的是PDF文件中的信息没有按照一定的结构或格式进行组织,而是以原始的、未加工的形式呈现。这类PDF文件中的数据没有预定义的数据模型,不方便用数据库二维逻辑表来表现,也不便于提取和解析。因此,非结构化的PDF文件可能看起来比较杂乱,缺乏统一的结构和格式,使得用户难以直接获取所需的信息。

from langchain_community.document_loaders import UnstructuredPDFLoaderloader = UnstructuredPDFLoader("example_data/layout-parser-paper.pdf")data = loader.load()

六、Word 加载器(含.doc 和 .docx)

在langchain里面word只有一个非结构化的word加载器UnstructuredWordDocumentLoader。

环境准备:

pip install unstructuredpip install python-docpip install python-docx

 示例代码:

from langchain_community.document_loaders import UnstructuredWordDocumentLoaderloader = UnstructuredWordDocumentLoader("example_data/layout-parser-paper.doc")data = loader.load()

七、Text 加载器(.txt 加载器)

在langchain里面.txt只有一个text加载器TextLoader。

from langchain_community.document_loaders import TextLoaderloader = TextLoader("example_data/layout-parser-paper.txt")data = loader.load()

八、完整代码

下面是通过经验总结的常用文件加载器的函数,可直接使用。

from langchain_community.document_loaders import (UnstructuredWordDocumentLoader,CSVLoader,PyPDFLoader,TextLoader,DirectoryLoader,
)
import os
from langchain_community.document_loaders.unstructured import UnstructuredFileLoader
from langchain_community.document_loaders.pdf import PyPDFDirectoryLoader# load PDF files from directory
# def load_pdf_from_dir_2(directory_path):
#     data = []
#     for filename in os.listdir(directory_path):
#         if filename.endswith(".pdf"):
#             print(filename)
#             # print the file name
#             loader = PyPDFLoader(f'{directory_path}/{filename}')
#             print(loader)
#             data.append(loader.load())
#     return data# load PDF files from directory
def load_pdf_from_dir(directory_path):loader = PyPDFDirectoryLoader(directory_path)data = loader.load()return data# load PDF files from a pdf file
def load_pdf_from_one(filepath):data = ''if filepath.endswith(".pdf"):print(filepath)# print the file nameloader = PyPDFLoader(f'{filepath}')print(loader)data = loader.load()return data# load Word files(.doc/.docx) from directory
def load_word_from_dir(directory_path):data = []for filename in os.listdir(directory_path):# check if the file is a doc or docx file# 检查所有doc以及docx后缀的文件if filename.endswith(".doc") or filename.endswith(".docx"):# langchain自带功能,加载word文档loader = UnstructuredWordDocumentLoader(f'{directory_path}/{filename}')data.append(loader.load())return data# load Word files(.doc/.docx) from a filename
def load_word_from_one(filename):data = ''if filename.endswith(".doc") or filename.endswith(".docx"):print(filename)# print the file nameloader = UnstructuredWordDocumentLoader(f'{filename}')print(loader)data = loader.load()return data# load Text files(.txt) from directory
def load_txt_from_dir(directory_path):data = []for filename in os.listdir(directory_path):if filename.endswith(".txt"):print(filename)loader = TextLoader(f'{directory_path}/{filename}')print(loader)data.append(loader.load())return data# load Text files(.doc/.docx) from a filename
def load_text_from_one(filename):data = ''if filename.endswith(".txt"):print(filename)# print the file nameloader = TextLoader(f'{filename}')print(loader)data = loader.load()return data# load CSV files(.txt) from directory
def load_csv_from_dir(directory_path):data = []for filename in os.listdir(directory_path):if filename.endswith(".csv"):print(filename)loader = CSVLoader(f'{directory_path}/{filename}')print(loader)data.append(loader.load())return data# load CSV files(.doc/.docx) from a filename
def load_csv_from_one(filename):data = ''if filename.endswith(".csv"):print(filename)# print the file nameloader = CSVLoader(f'{filename}')print(loader)data = loader.load()return data# load all files from directory
# param glob = "**/*.文件后缀"  控制要加载的文件
# param show_progress = true 显示进度条
# param use_multithreading = true 利用多线程
# param loader_cls = CSVLoader  指定加载器 | UnstructuredFileLoader
def load_all_from_dir(directory_path, glob, show_progress=False, use_multithreading=False, loader_cls=UnstructuredFileLoader):loader = DirectoryLoader(directory_path, glob=glob, show_progress=show_progress, use_multithreading=use_multithreading, loader_cls=loader_cls)data = loader.load()return dataif __name__ == '__main__':res = load_pdf_from_dir("./testdir")print(res)

创作不易,高抬贵手三连(点赞、收藏、关注),同学们的满意是我(H-大叔)的动力。

 代码运行有问题或其他建议,请在留言区评论,看到就会回复,不用私聊。

专栏人工智能 | 大模型 | 实战与教程里面还有其他人工智能|大数据方面的文章,可继续食用,持续更新。

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.mzph.cn/news/762935.shtml

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

【前端性能】前端性能指标和测量方法总结

文章目录 前端性能指标和测量方法总结重要指标名词概念指标测量方法performance APIChrome PerformanceChrome Lighthouseweb-vitals 前端性能指标和测量方法总结 重要指标名词概念 图源 https://dev.to/xnimorz/hitchhiker-s-guide-to-frontend-performance-optimization-460…

计算机+任何行业都等于王炸!

最近笔者刷到一则消息,一位测试员在某乎上分享,从月薪5K到如今的20K,他总共跳了10次槽,其中还经历过两次劳动申诉,拿到了大几万的赔偿,被同事们称为“职场碰瓷人”。 虽说这种依靠跳槽式的挣钱法相当奇葩&a…

Java微服务分布式事务框架seata的TCC模式

🌹作者主页:青花锁 🌹简介:Java领域优质创作者🏆、Java微服务架构公号作者😄 🌹简历模板、学习资料、面试题库、技术互助 🌹文末获取联系方式 📝 往期热门专栏回顾 专栏…

鸿蒙ArkTS实战开发-Native XComponent组件的使用

介绍 本篇Codelab主要介绍如何使用XComponent组件调用NAPI来创建EGL/GLES环境,实现在主页面绘制一个正方形,并可以改变正方形的颜色。本篇CodeLab使用Native C模板创建。 如图所示,点击绘制矩形按钮,XComponent组件绘制区域中渲…

Sketch软件:重塑UI/UX设计流程的革命性工具

Sketch是一款在Mac操作系统上运行的矢量图形设计软件,其功能特色丰富多样,深受设计师们的喜爱。以下是Sketch软件的主要功能特色介绍: 专业矢量图形设计:Sketch为UI设计、移动应用设计和Web设计等领域提供了强大的支持。它支持线条…

探索ChatGPT时代下的下一代信息检索系统:机遇与挑战

1 Introduction 2022 年 11 月 30 日,OpenAI 推出了 ChatGPT,这是一款由先进的 GPT3.5 和更高版本的 GPT-4 生成语言模型提供支持的 AI 聊天机器人应用程序。该应用迅速吸引了全球超亿用户,创下了产品快速传播的新纪录。 它能够以对话的方式…

ElasticSearch 用法

首先讲下 ES的倒排序索引 创建倒排索引是对正向索引的一种特殊处理, 将每一个文档的数据利用算法分词,得到一个个词条 创建表,每行数据包括词条、词条所在文档id、位置等信息 因为词条唯一性,可以给词条创建索引,例如…

旅游小程序开发的费用及功能

随着科技的发展和智能手机的普及,越来越多的行业开始利用小程序来进行线上服务。旅游业作为一个重要的服务业,也纷纷推出了自己的旅游小程序,以方便游客在线预订、查询景点信息等。那么,旅游小程序开发的费用是多少?功…

Google研究者们提出了VLOGGER模型

每周跟踪AI热点新闻动向和震撼发展 想要探索生成式人工智能的前沿进展吗?订阅我们的简报,深入解析最新的技术突破、实际应用案例和未来的趋势。与全球数同行一同,从行业内部的深度分析和实用指南中受益。不要错过这个机会,成为AI领…

【探索Linux】—— 强大的命令行工具 P.29(网络编程套接字 —— 简单的TCP网络程序模拟实现)

阅读导航 引言一、TCP协议二、TCP网络程序模拟实现1. 预备代码⭕ThreadPool.hpp(线程池)⭕makefile文件⭕打印日志文件⭕将当前进程转变为守护进程 2. TCP 服务器端实现(TcpServer.hpp)3. TCP 客户端实现(main函数&…

babyos 学习记录

宏定义头文件 将一个宏定义取不同的数据到不同的数组中; 侵入式链表 struct list_head { struct list_head *next, *prev; }; // 添加(list_add_tail/list_add)、删除、查找 xx.h // 定义一个用于链表管理的结构体 typedef sturct{ xxx …

matlab矩形薄板小挠度弯曲有限元编程 |【Matlab源码+理论文本】|板单元| Kirchoff薄板 | 板壳单元

专栏导读 作者简介:工学博士,高级工程师,专注于工业软件算法研究本文已收录于专栏:《有限元编程从入门到精通》本专栏旨在提供 1.以案例的形式讲解各类有限元问题的程序实现,并提供所有案例完整源码;2.单元…

分库分表场景下多维查询解决方案(用户+商户)

在采用分库分表设计时,通过一个PartitionKey根据散列策略将数据分散到不同的库表中,从而有效降低海量数据下C端访问数据库的压力。这种方式可以缓解单一数据库的压力,提升了吞吐量,但同时也带来了新的问题。对于B端商户而言&#…

Layui实现删除及修改后停留在当前页

1、功能概述? 我们在使用layui框架的table显示数据的时候,会经常的使用分页技术,这个我们期望能够期望修改数据能停留在当前页,或者删除数据的时候也能够停留在当前页,这样的用户体验会更好一些,但往往事与…

硬核分享|如何将文字转成语音对视频进行配音或旁白解说

硬核分享|如何将文字转成语音对视频进行配音或旁白解说_哔哩哔哩_bilibili 文字转语音工具成为了一种便利而实用的技术应用,它能够将文字内容转化为声音,为我们提供全新的听觉体验。 不论是在阅读、学习、娱乐还是无障碍辅助等方面,文字转语…

【QT入门】 Qt槽函数五种常用写法介绍

声明:该专栏为本人学习Qt知识点时候的笔记汇总,希望能给初学的朋友们一点帮助(加油!) 往期回顾: 【QT入门】实现一个简单的图片查看软件-CSDN博客 【QT入门】图片查看软件(优化)-CSDN博客 【QT入门】 lambda表达式(函数)详解-CSDN…

苹果手机更换国内IP地址的方法

在网络世界中,IP地址扮演着极为重要的角色,是互联网通信的基础。很多人在使用苹果手机时,有时候需要更换国内IP地址以获取更多网络资源或保护隐私。那么,是否可以更换国内ip地址?苹果手机更换国内ip地址的方法是怎样的…

Redis学习二--常见问题及处理

基本概念 Redis基本概念数据结构 机制 持久化机制: RDB(内存快照):某一时刻的内存快照以二进制的方式写入磁盘,可以手动触发和自动触发。 优点:生成文件小,恢复速度快,适用于灾难恢复。 缺点&#xff1a…

Linux docker4--本地jar包生成镜像和docker部署运行

一、通过springboot创建一个java项目,打成出jar包。 二、将jar包生成docker镜像 (1)、创建Dockerfile文件 创建Dockerfile文件,将如下的代码内容粘贴进去即可。 注意:本例中我打出的jar包是boot.jar。如果你打出的jar…

百度小程序入口在哪里找到怎么打开百度词令关键词口令直达小程序?

百度小程序入口在哪里找到怎么打开百度词令关键词口令直达小程序? 一、百度搜索找到百度词令小程序 打开手机百度搜索「词令」即可找到百度词令关键词口令直达小程序; 二、百度小程序中心找到百度小程序 2.1、打开手机百度,点击底部我的&a…