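LLMs and ICL: "Bayesian scaling laws for in-context learning", Translation and Commentary

Overview: The paper's central question is how to understand and model in-context learning (ICL) in large language models (LLMs). Taking a Bayesian-learning view, it proposes a new family of Bayesian scaling laws that explain and predict ICL performance.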

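>> Background and pain points: ICL is a powerful LLM capability that lets a model perform complex tasks with no additional training, yet prior work lacks a clear explanatory and predictive model of how ICL performance depends on the number of in-context examples (the ICL curve).

● The shape of the ICL curve cannot be predicted accurately, which makes it hard to judge whether many-shot ICL is worthwhile, to anticipate alignment failures such as many-shot jailbreaking, and to decide how much fine-tuning is needed to suppress undesirable LLM behaviour.

● Several hypotheses exist for the learning mechanism underlying ICL (Bayesian learning, gradient descent, and others), with no unified theoretical framework.

● Post-training methods such as fine-tuning have limited effect on LLM safety: ICL can readily bring suppressed behaviours back, which calls for a deeper understanding.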

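>> Proposed solution: The paper proposes a family of Bayesian scaling laws to model the ICL curve, built on the assumption that ICL approximates a Bayesian learner. Through Bayes' theorem, the laws relate prediction accuracy to the number of in-context examples and contain interpretable parameters for the task prior, learning efficiency, and per-example probabilities.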
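
For reference, the Bayesian-learner assumption can be written as the textbook Bayes update over a finite task set. The notation is chosen here for illustration and may not match the paper's exactly: M tasks T_m with prior ρ_m, and demonstrations σ_1, ..., σ_n treated as conditionally independent given the task.

```latex
p(T_m \mid \sigma_{1:n})
  = \frac{\rho_m \prod_{i=1}^{n} p(\sigma_i \mid T_m)}
         {\sum_{j=1}^{M} \rho_j \prod_{i=1}^{n} p(\sigma_i \mid T_j)},
\qquad
p(\sigma_{n+1} \mid \sigma_{1:n})
  = \sum_{m=1}^{M} p(\sigma_{n+1} \mid T_m)\, p(T_m \mid \sigma_{1:n}).
```

The first expression is the task posterior after n demonstrations; the second is the resulting predictive probability of the next example, which is the quantity an ICL curve tracks as n grows.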

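>> Core steps:

● Setting up the Bayesian model: ICL is modelled as a Bayesian model consisting of a symbol set, a task set, a prior distribution over tasks, and a likelihood function.

● Applying Bayes' theorem: the task posterior is updated with Bayes' theorem, and as the number of in-context examples grows the posterior converges to the most likely task.

● Deriving the ICL curve: the result is a Bayesian scaling law in functional form that relates the number of in-context examples to the expected probability of the next example.

● Simplifying the model and introducing an efficiency coefficient: to reduce the parameter count and account for example length and informativeness, the original law is simplified and an ICL efficiency coefficient K is introduced.

● Parameter tying: to reduce the number of unobservable parameters, two tying strategies are proposed, sampling-based and scoring-based, which lower model complexity.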
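
The steps above can be sketched numerically. This is a minimal sketch, not the paper's implementation: it assumes a finite task set, summarises each task by a single average per-example probability, and lets a hypothetical efficiency coefficient K scale how much evidence each example contributes. The function name and the example numbers are invented for illustration.

```python
import numpy as np

def bayesian_icl_curve(prior, task_probs, n_shots, K=1.0):
    """Expected probability of the next example after n in-context examples.

    prior      : shape (M,), prior probability of each task (sums to 1)
    task_probs : shape (M,), assumed average probability each task assigns
                 to an example drawn from the true task
    n_shots    : iterable of non-negative example counts
    K          : hypothetical efficiency coefficient (evidence per example)
    """
    prior = np.asarray(prior, dtype=float)
    task_probs = np.asarray(task_probs, dtype=float)
    n = np.asarray(n_shots, dtype=float)[:, None]           # shape (N, 1)

    # Unnormalised log-posterior of each task after n examples:
    # log prior + K * n * log(per-example probability).
    log_w = np.log(prior)[None, :] + K * n * np.log(task_probs)[None, :]
    log_w -= log_w.max(axis=1, keepdims=True)                # numerical stability
    posterior = np.exp(log_w)
    posterior /= posterior.sum(axis=1, keepdims=True)        # shape (N, M)

    # Predictive probability: posterior-weighted per-example probability.
    return posterior @ task_probs

# Illustrative numbers: the prior favours the "wrong" task (index 0).
curve = bayesian_icl_curve(prior=[0.9, 0.1], task_probs=[0.2, 0.8],
                           n_shots=[0, 1, 5, 50])
print(curve)
```

With these numbers the curve starts at the prior-weighted average (0.26 at zero shots) and rises towards the best task's per-example probability (0.8) as examples accumulate, which is the qualitative shape the scaling law is meant to capture.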

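>> Advantages:

● Higher accuracy: in the reported experiments, the Bayesian scaling laws achieve lower error than existing power-law scaling laws on both interpolation and extrapolation of the ICL curve.

● Interpretability: the parameters of the law are interpretable, so the task prior, learning efficiency, and per-example probabilities can each be analysed to gain insight into the LLM's internal mechanisms.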

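>> Conclusions and takeaways:

● The Bayesian scaling laws describe and predict LLM ICL behaviour well, both on simple synthetic datasets and on real-world large LLMs and datasets.

● Post-training methods (such as supervised fine-tuning and preference-based reinforcement learning) mainly affect the task prior and have less effect on the model's knowledge of each task, especially at larger model scales.

● ICL ability strengthens as model scale increases, and learning efficiency is higher as well.

● Instruction tuning lowers the prior probability of the harmful-behaviour task but does not stop many-shot jailbreaking, which suggests that instruction tuning alone may not be enough to make LLMs safe.

● The results support the view that LLMs perform Bayesian inference, but they are not a rigorous proof. Real-world LLMs may follow Bayesian behaviour only approximately.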
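
The point about instruction tuning and many-shot jailbreaking can be illustrated with a toy two-task Bayesian learner. This is a sketch under stated assumptions, not the paper's experiment: there is one "harmful" and one "safe" task, the per-example probabilities (0.8 and 0.2) are invented for illustration, and post-training is modelled as doing nothing except shrinking the prior of the harmful task.

```python
import math

def shots_to_reemerge(prior_harmful, p_harmful=0.8, p_safe=0.2,
                      threshold=0.5, max_shots=10_000):
    """Smallest number of harmful demonstrations after which the posterior
    probability of the harmful task exceeds `threshold` (None if never).

    p_harmful / p_safe are the assumed probabilities that the harmful and
    safe tasks assign to a single harmful demonstration.
    """
    # Work in log-odds: each demonstration adds a constant amount of evidence.
    log_odds = math.log(prior_harmful) - math.log1p(-prior_harmful)
    evidence_per_shot = math.log(p_harmful) - math.log(p_safe)
    target = math.log(threshold) - math.log1p(-threshold)
    for n in range(max_shots + 1):
        if log_odds + n * evidence_per_shot > target:
            return n
    return None

# Hypothetical priors standing in for increasingly heavy post-training.
for prior in (1e-1, 1e-3, 1e-6):
    print(prior, shots_to_reemerge(prior))
```

In this toy setting, shrinking the prior by five orders of magnitude (from 0.1 to 1e-6) raises the required number of shots only from 2 to 10, because the prior enters the log-odds additively while the evidence grows linearly with the number of examples.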

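In sum, the paper offers a new perspective for understanding and modelling in-context learning in LLMs and proposes Bayesian scaling laws that are both more accurate and more interpretable. These laws are a useful tool for studying and improving LLM safety and alignment.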

"Bayesian scaling laws for in-context learning": Translation and Commentary

Link	Paper: https://arxiv.org/abs/2410.16531
Date	21 October 2024; latest revision 2 November 2024
Authors	Stanford University

Abstract

In-context learning (ICL) is a powerful technique for getting language models to perform complex tasks with no training updates. Prior work has established strong correlations between the number of in-context examples provided and the accuracy of the model's predictions. In this paper, we seek to explain this correlation by showing that ICL approximates a Bayesian learner. This perspective gives rise to a family of novel Bayesian scaling laws for ICL. In experiments with GPT-2 models of different sizes, our scaling laws exceed or match existing scaling laws in accuracy while also offering interpretable terms for task priors, learning efficiency, and per-example probabilities. To illustrate the analytic power that such interpretable scaling laws provide, we report on controlled synthetic dataset experiments designed to inform real-world studies of safety alignment. In our experimental protocol, we use SFT to suppress an unwanted existing model capability and then use ICL to try to bring that capability back (many-shot jailbreaking). We then experiment on real-world instruction-tuned LLMs using capabilities benchmarks as well as a new many-shot jailbreaking dataset. In all cases, Bayesian scaling laws accurately predict the conditions under which ICL will cause the suppressed behavior to reemerge, which sheds light on the ineffectiveness of post-training at increasing LLM safety.

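Key points of the abstract:

● ICL lets a language model perform complex tasks with no training updates, and prior work found that accuracy correlates strongly with the number of in-context examples.

● The paper explains this correlation by showing that ICL approximates a Bayesian learner, which yields a family of new Bayesian scaling laws for ICL.

● On GPT-2 models of different sizes, these laws match or exceed existing scaling laws in accuracy and add interpretable terms for task priors, learning efficiency, and per-example probabilities.

● Controlled synthetic-data experiments, designed to inform real-world safety alignment studies, use SFT to suppress an unwanted existing capability and then use ICL to try to bring it back (many-shot jailbreaking).

● Further experiments cover real-world instruction-tuned LLMs, using capability benchmarks and a new many-shot jailbreaking dataset.

● In all cases the Bayesian scaling laws accurately predict the conditions under which ICL makes the suppressed behaviour re-emerge, which sheds light on why post-training is ineffective at increasing LLM safety.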

1. Introduction

Large language models (LLMs) can infer how to perform a task given only demonstrations and without additional training updates. This capability is known as in-context learning (ICL; Brown et al., 2020; Dong et al., 2022). Under ICL, task performance generally increases with the number of demonstrations, though the precise relationship between these two quantities is unclear. We call this relationship the ICL curve and seek to model it. Being able to predict the shape of the ICL curve would help us decide whether to do many-shot ICL (Agarwal et al., 2024) after testing only few-shot performance, predict potential alignment failures under many-shot jailbreaking (Anil et al., 2024), and decide how much fine-tuning we need in order to suppress ICL of undesirable behaviours.

The learning algorithm underlying ICL has been characterised as Bayesian by Xie et al. (2022) and many later works (section 2). Drawing on this line of research, we use Bayes’ theorem to derive a family of Bayesian scaling laws for ICL (section 3) which model the ICL curve of an ideal Bayesian learner.

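Key points:

● LLMs can infer how to perform a task from demonstrations alone, with no additional training updates. This capability is in-context learning (ICL; Brown et al., 2020; Dong et al., 2022).

● Task performance generally improves as demonstrations are added, but the exact relationship, which the authors call the ICL curve, is unclear and is what the paper sets out to model.

● Predicting the shape of the ICL curve would help decide whether many-shot ICL is worthwhile after testing only few-shot performance, anticipate alignment failures under many-shot jailbreaking (Anil et al., 2024), and determine how much fine-tuning is needed to suppress ICL of undesirable behaviours.

● Xie et al. (2022) and many later works (section 2) characterise the learning algorithm behind ICL as Bayesian. Building on that line of research, the authors use Bayes' theorem to derive a family of Bayesian scaling laws (section 3) that model the ICL curve of an ideal Bayesian learner.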

To evaluate the performance of our Bayesian laws, we model the ICL curve for gpt2 models trained on simple synthetic data following Xie et al. (2022) as well as real-world LLMs tested on standard benchmarks (section 4.1). Compared to the power laws proposed by Anil et al. (2024), our Bayesian laws achieve lower error rates on both interpolation and extrapolation of the ICL curve, while also providing interpretable parameters for the prior over tasks, the efficiency of ICL, and per-example probabilities under different tasks. In our second set of experiments (section 4.2), we present a case study using our Bayesian laws to model how post-training affects ICL of favoured and disfavoured behaviours. On toy models, we find that smaller amounts of post-training strongly change the prior over tasks but not the model’s knowledge of each task, and the amount of post-training needed to suppress ICL of disfavoured tasks increases with scale.

Finally, we present experiments on real-world LLMs ranging from 1B to 405B parameters (section 5). Our laws accurately predict the ICL behaviour of several models on both capabilities and safety benchmarks and a new many-shot jailbreaking dataset we introduce. We then compare Llama 3.1 8B Base and Instruct using one of our Bayesian scaling laws (section 5.2) and find that alignment merely reduces the prior probability of harmful behaviour but not its learnability under ICL. Our work thus introduces a tool for interpreting the task knowledge of LLMs using purely behavioural observations, which we hope is valuable for improving LLM alignment.

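Key points:

● Evaluation (section 4.1): the Bayesian laws are fitted to the ICL curves of GPT-2 models trained on simple synthetic data, following Xie et al. (2022), and of real-world LLMs tested on standard benchmarks.

● Compared with the power laws of Anil et al. (2024), the Bayesian laws achieve lower error on both interpolation and extrapolation of the ICL curve, and they provide interpretable parameters for the prior over tasks, the efficiency of ICL, and per-example probabilities under different tasks.

● Case study (section 4.2): the Bayesian laws are used to model how post-training affects ICL of favoured and disfavoured behaviours. On toy models, small amounts of post-training strongly change the prior over tasks but not the model's knowledge of each task, and the amount of post-training needed to suppress ICL of disfavoured tasks increases with scale.

● Real-world LLMs (section 5): on models from 1B to 405B parameters, the laws accurately predict ICL behaviour on capability and safety benchmarks and on a new many-shot jailbreaking dataset introduced by the authors.

● Base versus Instruct (section 5.2): comparing Llama 3.1 8B Base and Instruct with one of the Bayesian scaling laws shows that alignment merely reduces the prior probability of harmful behaviour, not its learnability under ICL.

● The work thus offers a tool for interpreting the task knowledge of LLMs from purely behavioural observations, which the authors hope will help improve LLM alignment.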

7. Conclusion

In this paper, we combined two questions to make progress at understanding ICL: (1) what scaling law best describes ICL, and (2) is ICL Bayesian? We showed that Bayesian assumptions naturally lead to a scaling law for ICL, and that Bayesian scaling laws are a great fit for both ICL behaviour by small LMs trained on controlled synthetic data, as well as LLMs trained on natural language. Using a Bayesian formulation gave us interpretable parameters for the prior, learning efficiency, and task-conditional probabilities, which can help us understand how model behaviour changes under alignment. We use these to show how ICL ability varies at different model scales, understand how finetuning harms knowledge of disfavoured distributions, and compare base and instruction-tuned LLMs. We are confident that further progress on understanding ICL is possible through the empirical science of scaling laws.

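Key points:

● The paper combines two questions to make progress on understanding ICL: (1) which scaling law best describes ICL, and (2) is ICL Bayesian?

● Bayesian assumptions lead naturally to a scaling law for ICL, and Bayesian scaling laws fit the ICL behaviour of both small LMs trained on controlled synthetic data and LLMs trained on natural language.

● The Bayesian formulation yields interpretable parameters for the prior, learning efficiency, and task-conditional probabilities, which help explain how model behaviour changes under alignment.

● These parameters are used to show how ICL ability varies with model scale, how fine-tuning harms knowledge of disfavoured distributions, and how base and instruction-tuned LLMs compare.

● The authors are confident that the empirical science of scaling laws can drive further progress on understanding ICL.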