LLM（十）| Tiny-Vicuna-1B：Tiny Models轻量化系列Top One

在过去的一年里，见证了LLM的蓬勃发展，而模型的参数量也不断刷新记录，在2023年下半年，外界传言GPT-4是一个专家混合模型。因此，如果你想用人工智能做点什么，你需要IBM或NASA类似的计算能力：你怎么能运行8个2200亿参数的模型，每个模型的有效参数大小达到1.76万亿？

然而，nano Models（比如新诞生的GeminiNano）、Tiny Models（就像TinyLlama家族）和Microsoft Phi1和2等另辟蹊径，希望一些较小的大模型也可以部署到生产环境中，给更多的企业和个人带来福音。

一、Tiny-Vicuna-1B介绍

Tiny Vicuna是一个Llama模型（Vicuna是使用从ShareGPT收集的用户共享对话进行微调Llama的大模型）。这个小模型是TinyLlama项目的一部分，该项目旨在通过适当的优化，在3万亿tokens上预训练的1.1B Llama模型，但由于Tiny Vicuna 1B是用WizardVicuna数据集微调的TinyLLama 1.1B，因此被称为Tiny Vicuna！运行Tiny Vicuna 1B量化版本，只需要不超过700 Mb的RAM！

二、运行Tiny-Vicuna-1B-GGUF

我们将使用Huggingface上Jiayi-Pan的Tiny-Vicuna-1B的量化GGUF模型文件。因为即使它是一个只有11亿个参数的模型，在CPU上全精度运行也需要将近10 GB的RAM。

Step1：在新目录中创建虚拟环境并将其激活：

mkdir TinyVicunacd TinyVicunapython3.10 -m venv venv #I am using python 3.10python -m venv venv  #if you are on Windows#to activate the Virtual Environmentsource venv/bin/activate  #for macvenv\Scripts\activate     #for windows users

Step2：安装所需的包

pip install llama-cpp-pythonpip install gradiopip install psutilpip install plotly

最后两个依赖项仅用于提供推断时间期间的CPU/RAM使用统计信息。我认为亲眼看到这一切是件好事😉。

Step3：接下来是在同一目录中下载GGUF文件。你可以选择量化方法，但不要低于q4。本演示使用q5版本：有点重，但质量损失很小。下载链接：https://huggingface.co/afrideva/Tiny-Vicuna-1B-GGUF/tree/main

Step4：运行python文件中模型核心代码的两个不同部分，我将在下面对它们进行解释，python文件下载地址：https://github.com/fabiomatricardi/KingOfTheTiny/raw/main/40-vicuna1B_PG_MEM.py。

from llama_cpp import Llamamodelfile = "./tiny-vicuna-1b.q5_k_m.gguf"contextlength=2048stoptoken = '<s>'################ LOADING THE MODELS  ################################ Set gpu_layers to the number of layers to offload to GPU. # Set to 0 if no GPU acceleration is available on your system.####################################################################llm = Llama(  model_path=modelfile,  # Download the model file first  n_ctx=contextlength,  # The max sequence length to use - note that longer sequence lengths require much more resources  #n_threads=2,            # The number of CPU threads to use, tailor to your system and the resulting performance)######### INFERENCE #######################response = llm(prompt,                 max_tokens=max_new_tokens,                 stop=['Q:', stoptoken],                 temperature = temperature,                repeat_penalty = repeat_penalty,                top_p = top_p,                echo=False)print(response)

我们从Llama.cpp导入Llama类，并将其实例化到llm变量中。正如您所看到的，我们需要在这里传递很少的参数：模型路径（包括GGUF文件名）和上下文窗口。

注意1：如果在Windows上运行，modelfile不需要./，它应该简化为modelfile = “tiny-vicuna-1b.q5_k_m.gguf”

注意2：每个模型都使用特定的上下文窗口进行训练。如果模型卡中没有提到，可以在第一次加载模型时查看终端控制台

|这篇博客（https://blog.stackademic.com/model-cards-is-all-you-need-e8c9a3450372?source=post_page-----f05e459b8018--------------------------------）介绍了Huggingface模型卡的使用

Step5：然后，我们使用额外的参数运行调用llm的推理。实际上，唯一需要的是Prompt：其他都是可选的。

请注意，Prompt必须遵循模型的特定格式。如果你不确定，并且模型卡中没有提到任何内容（就像本例中一样），可以查看：https://www.hardware-corner.net/llm-database/

我们搜索Vicuna 1.5版本，因为原始模型表明该模型是WizardVicuna数据集上TinyLlama的微调版本。它应该与Vicuna-v1.5系列完全兼容。

选择模型可以验证预期的提示格式。

因此，我们的提示模板如下（请注意，没有新行，我创建它们是为了可读性）：

A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: {prompt} ASSISTANT:

因此，我们的提示可以是这样的，在代码中：

prompt = f"USER: What is Science? ASSISTANT:"

三、Tiny-Vicuna-1B案例分析

让我们使用3个主流用例来测试一下Tiny-Vicuna-1B的效果：

通用问题；
从上下文中提取信息（RAG）；
格式化输出。

3.1 通用问题

Prompt：什么是科学？

Answer：科学是通过观察、实验、分析和解释来研究知识。它包括使用逻辑推理、批判性思维和经验证据来理解自然世界或人类经验中的现象。

Prompt：为忙碌的专业人士写一份每日日程表。描述一个从早上到睡觉的典型日常生活。想想他是一个有妻子和孩子的男人。

Answer：我必须更好地说明我想要什么，因为只询问日常事务会给我正确但不一致的答案（永无止境的列表）

 Morning Routine: - Wake up at around 6am, shower and get dressed.- Have breakfast (breakfast can be anything from eggs, bacon or pancakes) and coffee/tea.- Check emails and work on important tasks for the day.- Take a quick walk outside to clear his mind before starting work.- Start working on his daily routine:    - Opening email and answering any urgent messages.   - Creating to-do lists for the day's activities.   - Checking social media accounts for updates or breaking news.   - Reviewing important documents/files.   - Making phone calls if necessary.   - Attending meetings or appointments as needed.   - Taking breaks and stretching to avoid burnout.- Lunch:    - Eating a light lunch, such as sandwiches or salads with hummus or guacamole.   - Drinking water or tea/coffee.- Afternoon Routine:    - Working on any important tasks from the morning's work.   - Checking emails and answering urgent messages.   - Taking breaks to stretch, walk around, or do some light exercise.   - Spending time with family/friends if possible.- Evening Routine:    - Watching TV or Netflix before going to bed.   - Reading a book or watching an episode of his favorite show.   - Taking a warm bath or reading a book while relaxing in bed.- Bedtime Routine:    - Winding down by taking a hot bath, reading a book, and listening to soothing music.   - Going back to sleep with the lights off.

3.2 从上下文中提取信息（RAG）

从youtube视频中提取了摘录，并将其作为上下文来回答用户的问题。

此方法也适用于必须仅回复特定数据而不使用外部知识的聊天机器人

Reply to the question only using the provided context.[context]WHAT IS HIERARCHY 4.0? whether you own build manage maintain or operate an oil plant inevitably issues arise that require immediate action and resolution. With big data flowing in constantly from all sectors making sense of everything while troubleshootingissues without wasting time can be a huge challenge.  So what's the solution?introducing hierarchy 4.0 and Innovative software solution for control Safety Systems . Hierarchy 4.0 presents an interactive diagram of the entire plant revealing cause and effect Behavior with readings provided in a hierarchical view allowing for a deep understanding of the system's strategyAll data is collected from multiple sources visualized as a diagram and optimized through a customized dashboard allowing users to run a logic simulation from live data or pick a moment from their history.  Your simulation is based on actual safety Logics not just on a math modelNow every users can prepare an RCA report 90 percent faster in just a few minutes. Hierarchy can be used for any project phase starting from engineering to commissioning and up to operation and maintenance while supporting hazop Hazard analysis by reducing human error and avoiding incorrect documentation. Hierarchy 4.0 supports plant operators in decision making taking into account both the safety and the operability of their assets.  Hierarchy 4.0 Embraces a block log approach: it automatically calculates all Logics affected by anoverride and gives a full understanding of constraints.  Not convinced let's look at the data!During its first project hierarchy 4.0 prevented a revenue loss of 45 million dollars.  Plants that utilize hierarchy 4.0 save up to 95 of their time and deliver a return on investment up to five times in value from day one and experience a Personnel utilization and plant efficiency increase by 20 percent per year.Try our demo and make the move to hierarchy 4.0 today[end of context]Question: what is Hierarchy 4.0?

该模型在40秒内处理了Prompt中的500个tokens👍🥳:

Answer：Hierarchy 4.0是一种用于控制安全系统的创新软件解决方案，通过交互式图表和优化的仪表板，可以深入了解系统的策略。它允许用户从实时数据中运行逻辑模拟，或从历史记录中选择一个时刻，从而减少人为错误并避免错误的文档。RCA报告可以使用层次结构4.0的块日志方法在几分钟内准备好。这支持工厂运营商在考虑其资产的安全性和可操作性的情况下进行决策。

3.3 格式化输出

在这里，我们使用两个任务来评估一下模型的格式化输出效果：1）从文本中获得摘要，并要求LLM提供文本的主要主题，2）将其格式化为列表。

Prompt很简单，但不得不尝试了几个版本，才找到一个只给摘要而没有任何介绍词的版本…

[text]... insert here your text...... the move to hierarchy 4.0 today[end of text]- SUMMARY:

Answer：层次4.0是一种用于控制安全系统的创新软件解决方案，可深入了解系统策略，在层次视图中可视化数据，并支持工厂操作员在考虑其资产的安全性和可操作性的情况下进行决策。它还通过防止由于错误或被忽视的危险而造成的收入损失，节省了时间并带来了投资回报。

对于第二项任务，必须找到好的措辞和顺序，才能获得一个降价列表作为输出。

Write the main points of the following text.[text]WHAT IS HIERARCHY 4.0? whether you ... the move to hierarchy 4.0 today[end of text]format the output as a markdown list.

不出所料，结果很好！

- Hierarchy 4.0 is an innovative software solution for control safety systems that provides a deep understanding of the system's strategy through interactive diagrams, optimized with a customized dashboard and supported by RCA reports.- All data is collected from multiple sources visualized as a hierarchical view allowing users to run a logic simulation from live data or pick a moment from their history.- Your simulation is based on actual safety logics not just on a math model, making it faster and more accurate for any project phase starting from engineering to commissioning and up to operation and maintenance.- Hierarchy can be used for any project phase starting from engineering to commissioning and up to operation and maintenance while supporting hazop Hazard analysis by reducing human error and avoiding incorrect documentation.- Hierarchy 4.0 supports plant operators in decision making taking into account both the safety and the operability of their assets, with a block log approach that calculates all Logics affected by an override automatically.- Not convinced let's look at the data! During its first project hierarchy 4.0 prevented a revenue loss of 45 million dollars, saving up to 95 of their time and delivering a return on investment up to five times in value from day one and experiencing a Personnel utilization and plant efficiency increase by 20 percent per year.- Try our demo and make the move to hierarchy 4.0 today!

令人惊讶的是，列表生成是准确的，而且总是在主题上。

四、实验的大致步骤如下：

使用的Gradio用户界面来运行推理结果，部署的大致步骤如下：

准备虚拟环境并将其激活（可选）；
根据上面第二小节的Step2，安装依赖项；
下载q5 GGUF模型文件；
下载python文件；
在第39行代码中指定模型文件

#only "tiny-vicuna-1b.q5_k_m.gguf" if you are on Windows39|  modelfile = "./tiny-vicuna-1b.q5_k_m.gguf"

保存文件并在终端上从项目目录运行此命令

python 40-vicuna1B_PG_MEM.py

默认浏览器将打开一个带有Gradio界面的新选项卡

要关闭应用程序：

关闭“浏览器”选项卡；
在“终端”窗口上键入^C

参考文献：

[1] https://blog.stackademic.com/tiny-vicuna-1b-is-the-lightweight-champion-of-the-tiny-models-f05e459b8018

[2] https://github.com/fabiomatricardi/KingOfTheTiny