Large Language Model Knowledge Summary
Key Papers
Paper list: https://zhuanlan.zhihu.com/p/622541777
- GPT-2: Language Models are Unsupervised Multitask Learners
- GPT-3: Language Models are Few-Shot Learners
- OpenAI RLHF (GPT-3.5): Training language models to follow instructions with human feedback
  - Mainly discusses how to fine-tune a large model through three steps: SFT, reward modeling (RM), and PPO.

To be updated
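The RM step of the SFT/RM/PPO pipeline above trains a reward model on human preference pairs: for a given prompt, the human-preferred ("chosen") response should score higher than the "rejected" one. A minimal sketch of that pairwise objective, with made-up scalar scores standing in for reward-model outputs (the function name and values are illustrative, not from the paper's code):

```python
import math

def pairwise_rm_loss(r_chosen, r_rejected):
    # Pairwise logistic loss: -log(sigmoid(r_chosen - r_rejected)).
    # Minimizing it pushes the chosen response's score above the rejected one's.
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

# The loss shrinks as the margin between chosen and rejected grows.
assert pairwise_rm_loss(2.0, 0.0) < pairwise_rm_loss(0.5, 0.0)
# With no margin the loss is exactly log(2).
assert abs(pairwise_rm_loss(0.0, 0.0) - math.log(2.0)) < 1e-9
```

The PPO stage then optimizes the policy against this learned reward (plus a KL penalty to the SFT model), which this sketch does not cover.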
Pretraining
Fine-tuning
- LoRA (freezes the base model's weights during fine-tuning and updates only small low-rank adapter matrices injected into specified layers, making fine-tuning highly parameter-efficient): https://zhuanlan.zhihu.com/p/646831196?utm_id=0
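A minimal NumPy sketch of the LoRA idea: the pretrained weight W stays frozen, and only the low-rank factors A and B (rank r, scaled by alpha/r) are trained. Shapes and values here are arbitrary illustrations:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical frozen pretrained weight (never updated during fine-tuning).
d_out, d_in = 8, 8
W = rng.standard_normal((d_out, d_in))

# LoRA adapters with rank r << d; only A and B receive gradients.
r, alpha = 2, 4
A = rng.standard_normal((r, d_in)) * 0.01  # small random init
B = np.zeros((d_out, r))                   # B starts at zero, so the update starts at zero

def lora_forward(x):
    # y = W x + (alpha / r) * B A x : base output plus the low-rank correction.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B = 0 the adapter is a no-op, so the model initially matches the frozen one.
assert np.allclose(lora_forward(x), W @ x)
```

This is why LoRA is cheap: instead of d_out * d_in trainable parameters per layer, only r * (d_out + d_in) adapter parameters are updated.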
RLHF
- https://learn.microsoft.com/en-us/semantic-kernel/agents/
- https://lilianweng.github.io/posts/2023-06-23-agent/