经过大规模数据预训练后的语言模型已经具备较强的模型能力,能够编码丰富的世界知识,但是由于预训练任务形式所限,这些模型更擅长于文本补全,并不适合直接解决具体的任务。
指令微调是相对“预训练”来讲的,预训练的时候是使用大量无标注的文本,让模型每次都预测下一个字符的方式来训练。而指令微调也可以叫“监督微调”,也就是对于输入文本有标准答案,用标准答案来训练模型。一般预训练完毕后就要进行指令微调,经过指令微调后,大语言模型能够展现出较强的指令遵循能力,可以通过零样本学习的方式解决多种下游任务。
指令微调数据集
在预训练之后,指令微调(也称为有监督微调)是增强或激活大语言模型特定能力的重要方法之一(例如指令遵循能力)。本小节将介绍几个常用的指令微调数据集,并根据格式化指令实例的构建方法将它们分为三种主要类型,即自然语言处理任务数据集、日常对话数据集和合成数据集。
 
 上面的那段文字都是来自《大语言模型》 这本电子书,该书是人民大学出的综述类电子书,非常非常详细。
一些指令微调数据集的格式
指令微调数据集有很多格式,alpaca是其中一种使用较多的方式
[{"instruction": "人类指令(必填)","input": "人类输入(选填)","output": "模型回答(必填)","system": "系统提示词(选填)","history": [["第一轮指令(选填)", "第一轮回答(选填)"],["第二轮指令(选填)", "第二轮回答(选填)"]]}
]
我从llama-factory项目中截取一段中文的指令微调数据集:
[{"instruction": "识别并解释给定列表中的两个科学理论:细胞理论和日心说。","input": "","output": "细胞理论是生物科学的一个理论,它认为所有生命体都是由微小的基本单元——细胞所构成。这是生物学的一个基础理论,认为细胞是所有生物的基本结构和功能单位,所有的生物都是由一个或多个细胞组成,细胞只能通过细胞分裂产生新的细胞。这一理论由薛定谔、施瓦内和雪莱于1839年首次提出。\n\n日心说是指太阳是太阳系的中心,也就是说,行星围绕太阳旋转的理论。这个理论打破了传统的地心说观点,认为地球并不是宇宙的中心。日心说的提出者是尼古拉·哥白尼,他在16世纪初发表了他的著作《天体运行论》,阐述了太阳系行星围绕太阳运行的模型,为天文学的发展做出了巨大贡献。"},{"instruction": "输入三支篮球队的名称并生成一个适当的口号。","input": "输入:俄克拉荷马城雷霆队,芝加哥公牛队,布鲁克林网队。","output": "输出: \"雷霆,公牛和网队:各显神通,角逐群雄!\""},{"instruction": "比较以下两位著名人物,使用特征列表并以句子形式提供输出。","input": "人物1:阿尔伯特·爱因斯坦\n人物2:斯蒂芬·霍金\n特征:智力、教育、对世界的影响","output": "阿尔伯特·爱因斯坦和斯蒂芬·霍金都是拥有极其出色智力的人物。两人都取得过非常高的教育成就,他们推进了科学发展并在世界范围内产生了深远的影响。爱因斯坦以其相对论和质能关系公式而闻名,而霍金以其关于黑洞和宇宙的发现而著称。两位科学家都以其深厚的学识和非凡的贡献影响了世界。"},{"instruction": "请根据以下文章构建一个需要回答的问题。","input": "","output": "我不好意思,您没有提供任何信息或文章来供我根据它构建问题。 请提供相应的文章或信息,以便我能够帮助您构建问题。"},{"instruction": "写一个关于上周末你做了什么的简短叙述。","input": "","output": "由于我是一个人工智能助手,所以我并不能真正的“做”什么事情。我的“工作”时间是24/7无休,确保我随时可供用户提供帮助和服务。所以在上周末,我一直都在“工作”,在响应用户的指令,帮助他们查询信息,执行操作,解决问题。"},......
]
再来一段英文的
[{"instruction": "Describe a process of making crepes.","input": "","output": "Making crepes is an easy and delicious process! Here are step-by-step instructions on how to make them:\n\n1. Assemble your ingredients. For basic crepes, you'll need: 1 cup all-purpose flour, 2 eggs, 1/2 cup milk, 1/2 cup water, 1/4 teaspoon salt, and 2 tablespoons melted butter.\n\n2. Mix the batter: In a large mixing bowl, whisk together the flour and the eggs. Gradually add the milk and water, stirring constantly to ensure that there are no lumps. Add salt and melted butter, and mix well.\n\n3. Let the batter rest: If you can, let the batter sit for an hour or so. This will help the flour to absorb the liquid and make the crepes more tender.\n\n4. Heat your pan: Preheat a non-stick pan over medium heat. Lightly butter the pan or use cooking spray to prevent the crepes from sticking.\n\n5. Pour the batter: Using a ladle or a measuring cup, pour a small amount of batter (about 1/4 cup) onto the center of the pan. Immediately tilt the pan in a circular motion to spread the batter evenly and thinly over the bottom of the pan.\n\n6. Cook the crepe: Cook the crepe for 1-2 minutes until the bottom is lightly golden. Carefully loosen the edges with a spatula and flip the crepe over to cook the other side for another minute.\n\n7. Remove and repeat: Gently slide the crepe onto a plate, and then repeat the process with the remaining batter. Remember to re-butter the pan between each crepe if necessary.\n\n8. Fill and serve: Fill your cooked crepes with your desired filling, such as fresh fruit, whipped cream, Nutella, or ham and cheese. Roll or fold, and serve immediately. Enjoy!"},{"instruction": "Transform the following sentence using a synonym: The car sped quickly.","input": "","output": "The car accelerated rapidly."},{"instruction": "Make a persuasive argument to promote recycling.","input": "","output": "Recycling is an incredibly important action that we must all take to help preserve our planet for future generations. By recycling, we can reduce the amount of waste that goes to landfills, conserve natural resources and energy, and reduce pollution from production of new materials.\n\nWhen we don't recycle, our garbage ends up in landfills where it can take hundreds of years to decompose. Landfills can release harmful chemicals and greenhouse gases, polluting the soil, air and water. However, by recycling and reusing materials, we minimize the amount of garbage that ends up in landfills, and therefore reduce the potential harm it can cause to the environment.\n\nRecycling also helps conserve our natural resources by reducing the demand for new materials. For example, recycling paper means fewer trees need to be cut down, and reusing metal and plastic means less mining and oil extraction. This, in turn, conserves energy, minimizes deforestation and reduces the environmental impact associated with producing new materials.\n\nAdditionally, recycling also has a positive impact on the economy. It creates jobs in the recycling and manufacturing industries, and reduces the costs of waste disposal for companies and taxpayers. Recycling is good for the environment and the economy.\n\nIn conclusion, recycling is an easy and effective way to take action and make a difference for the future of our planet. By simply separating our recyclable materials and disposing of them properly, we can help reduce waste, conserve resources, and protect the environment. We all have a part to play and every little action counts. So, let's make the smart choice and recycle today for a better tomorrow."},{"instruction": "Invent a new word by combining two existing words.","input": "","output": "\"Chillax\" - a combination of the words \"chill\" and \"relax\". It means to calm down and unwind, enjoying oneself in a laidback manner."},{"instruction": "Give an example of a job that a computer can do better than a human being.","input": "","output": "An example of a job that a computer can do better than a human being would be analyzing vast amounts of data accurately and quickly. For instance, a computer can analyze and find patterns in large datasets within milliseconds, far surpassing the capabilities of a human being. This makes a computer ideal for jobs such as financial analysis, medical diagnosis or data-driven decision making, where speed and accuracy are crucial."},{"instruction": "Given the parameters of a triangle, find out its perimeter.","input": "Side 1 = 4\nSide 2 = 6\nSide 3 = 8","output": "The perimeter of a triangle is the sum of the lengths of its sides. In this case, the perimeter would be the sum of side 1, side 2, and side 3. So, \nPerimeter = Side 1 + Side 2 + Side 3\n          = 4 + 6 + 8\n          = 18\n\nTherefore, the perimeter of the triangle is 18 units."},
开源大模型的命名
很多开源大模型除了开源基础模型,还会提供指令微调后的版本。其中带“Instruct ”后缀的就是指令微调后的模型。不带的就是预训练完的基础模型。一般来说基础版本的NLP模型有更强的NLP能力,在CEval等项目的跑分更高,但是指令微调后的模型能更好的遵循指令执行文本描述的各种任务。
 
 但是指令微调数据也有很多种,如机器翻译、文本摘要和文本分类和对话等。有些开源模型是为了跟人对话,这种开源大模型的名字往往包含“chat”,比如百川的大语言模型,带“base”的就是预训练完毕的大语言模型,而带“chat”的就是使用会话指令微调后的模型。当然chat模型微调数据肯定不会只有会话,可能有很多种,不同的大模型混合数据集的策略各有侧重。

指令微调的框架
可以使用LLaMA Factory来对大语言模型进行指令微调。
参数高效的模型微调
通过指令微调,大语言模型能够更好地学习遵循和执行人类指令。然而,由于大语言模型的参数量巨大,进行全参数微调需要较多的算力资源开销,对普通学习或者小团队的开发者仍然很不友好。
参数高效微调 是一个重要的研究方向,旨在减少需要训练的模型参数量,同时保证微调后的模型性能能够与全量微调的表现相媲美。
这些方法包括:LoRA(低秩适配微调方法)、前缀微调(Prefix Tuning)等
参考资料
《大语言模型》 赵鑫 李军毅 周昆 唐天一 文继荣 著