LLM-Intro to Large Language Models

LLM

some LLMs are closed: the model and its weights are not released to users

what is an LLM?

Llama 2 70B model

2 files
- parameters file
  - the parameters (weights) of the neural network
  - each parameter is stored as a 2-byte float (float16)
- code that runs the parameters (inference)
  - C, Python, etc.
  - in C, about 500 lines of code with no dependencies
  - self-contained package (no network needed)
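The size of the parameters file follows directly from the two numbers above. A minimal back-of-the-envelope sketch, assuming roughly 70 billion parameters for the 70B model and 1 GB = 1e9 bytes (the function name is made up for illustration):

```python
# Rough size of the parameters file for a 70B-parameter model.
NUM_PARAMS = 70_000_000_000   # about 70 billion parameters (assumed round number)
BYTES_PER_PARAM = 2           # each parameter is a 2-byte float

def params_file_size_gb(num_params: int, bytes_per_param: int) -> float:
    """Return the raw parameter size in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

size_gb = params_file_size_gb(NUM_PARAMS, BYTES_PER_PARAM)
print(f"{size_gb:.0f} GB")  # 140 GB
```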
how to get the parameters?
- lossily compress a large chunk of internet text (~10 TB) using ~6,000 GPUs for ~12 days (cost about $2M) into a ~140 GB file, something like a zip file of the text (a gestalt of the text, stored in the weights)
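To get a feel for these figures, here is a small arithmetic sketch. It only uses the approximate numbers from the note above (10 TB of text, a 140 GB result, 6,000 GPUs, 12 days); the variable names are illustrative.

```python
# Approximate figures from the notes; all values are rough.
TEXT_BYTES = 10e12     # about 10 TB of training text
PARAM_BYTES = 140e9    # about 140 GB of parameters
NUM_GPUS = 6000
DAYS = 12

compression_ratio = TEXT_BYTES / PARAM_BYTES   # how much smaller the "zip" is
gpu_days = NUM_GPUS * DAYS                     # total compute in GPU-days

print(f"compression ratio: about {compression_ratio:.0f}x")  # about 71x
print(f"compute: {gpu_days} GPU-days")                        # 72000 GPU-days
```

Unlike a real zip file, this compression is lossy: the original text cannot be reconstructed exactly from the parameters.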
what the neural network does is try to predict the next word in a sequence. the parameters are dispersed throughout the network; the neurons are connected to each other and fire in a certain way
prediction is closely related to compression: the better you can predict the next word, the better you can compress the text
the LLM produces text in the correct form and fills it in with its knowledge. it does not reproduce a verbatim copy of the text it was trained on.
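The link between next-word prediction and compression can be illustrated with a toy model. The sketch below is not how an LLM is implemented: it swaps the neural network for a simple bigram count table, and the corpus and function names are made up for illustration. It shows that a word the model predicts with high probability needs fewer bits to encode (ideal code length is -log2(p)).

```python
import math
from collections import Counter, defaultdict

def train_bigram(text: str) -> dict:
    """Count, for each word, which words follow it in the training text."""
    words = text.split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def next_word_probs(counts: dict, prev: str) -> dict:
    """Turn the follower counts of `prev` into a probability distribution."""
    followers = counts[prev]
    total = sum(followers.values())
    return {word: n / total for word, n in followers.items()}

def predict_next(counts: dict, prev: str) -> str:
    """Return the most probable next word (assumes `prev` was seen in training)."""
    probs = next_word_probs(counts, prev)
    return max(probs, key=probs.get)

def code_length_bits(p: float) -> float:
    """Ideal code length, in bits, for an event of probability p."""
    return -math.log2(p)

corpus = "the cat sat on the mat the cat ate the fish"
model = train_bigram(corpus)
probs = next_word_probs(model, "the")

print(predict_next(model, "the"))        # cat
print(code_length_bits(probs["cat"]))    # 1.0 (p = 0.5)
print(code_length_bits(probs["mat"]))    # 2.0 (p = 0.25)
```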
how does it work?

training stage

pre-training
- expensive
- produces the base model, a document generator
- it’s about knowledge
- trained on internet documents
fine tuning
- cheaper
- produces the assistant model
- it’s about alignment
- trained on Q&A documents
- train on high-quality conversations (questions and answers); write labeling instructions that specify how the assistant should behave
- focus on quality, not quantity
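As a rough illustration of what a Q&A training document might look like, the sketch below renders a question and its ideal answer into a single text document. The `<USER>` and `<ASSISTANT>` markers, the `Example` class, and the sample data are all hypothetical; real assistants use their own templates.

```python
from dataclasses import dataclass

@dataclass
class Example:
    """One hand-written conversation: a question and the ideal answer."""
    question: str
    answer: str

# Hypothetical role markers, chosen only for this sketch.
USER_TAG = "<USER>"
ASSISTANT_TAG = "<ASSISTANT>"

def to_training_document(example: Example) -> str:
    """Render one Q&A pair as a single text document for fine-tuning."""
    return f"{USER_TAG} {example.question}\n{ASSISTANT_TAG} {example.answer}"

# A tiny, made-up dataset: a few high-quality examples rather than many noisy ones.
dataset = [
    Example("What is a parameter?", "A parameter is one weight of the neural network."),
    Example("What does inference mean?", "Running the trained parameters to generate text."),
]

documents = [to_training_document(e) for e in dataset]
print(documents[0])
```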
stage 3 (optional)
- uses comparison labels
- reinforcement learning from human feedback (RLHF)
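The notes do not say how the comparison labels are used. One common approach, assumed here, is to train a reward model with a pairwise loss: given a scalar score for the answer the labeler preferred and one for the answer the labeler rejected, the loss is -log(sigmoid(score_chosen - score_rejected)). The function name and the example scores are illustrative.

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise loss: -log(sigmoid(reward_chosen - reward_rejected)).

    Small when the preferred answer scores higher, large when it scores lower.
    Written as log(1 + exp(-diff)), which is the same quantity.
    """
    diff = reward_chosen - reward_rejected
    return math.log1p(math.exp(-diff))

# The reward model agrees with the human comparison: low loss.
print(preference_loss(2.0, 0.0))
# The reward model disagrees with the human comparison: high loss.
print(preference_loss(0.0, 2.0))
```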

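LLM scaling laws: next-word prediction accuracy is a smooth, predictable function of N (the number of parameters) and D (the amount of training text)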

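A scaling law can be sketched as a formula that predicts loss from model size and data size. One commonly used parametric form is a sum of power laws, loss = E + A / N^alpha + B / D^beta. The constants below are illustrative placeholders, not fitted values, and the function name is made up for this sketch.

```python
def predicted_loss(n_params: float, n_tokens: float,
                   e: float = 1.7, a: float = 400.0, b: float = 400.0,
                   alpha: float = 0.34, beta: float = 0.28) -> float:
    """Power-law loss estimate: e + a / N**alpha + b / D**beta.

    All default constants are placeholders chosen for illustration only.
    """
    return e + a / n_params**alpha + b / n_tokens**beta

# More parameters (same data) gives a lower predicted loss.
print(predicted_loss(7e9, 2e12))
print(predicted_loss(70e9, 2e12))
# More data (same parameters) also gives a lower predicted loss.
print(predicted_loss(7e9, 2e11))
```

The point of such a formula is predictability: loss keeps falling smoothly as N and D grow, so the payoff of a larger training run can be estimated in advance.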

tool use and multimodality: some LLMs, such as GPT, can now use tools (a browser, a calculator, a Python interpreter) to help answer questions.
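Tool use can be pictured as the model emitting a special request that surrounding code executes on its behalf. The sketch below is a hypothetical dispatcher, not any real product's API: the `tool_name: argument` call format, the `TOOLS` table, and the function names are all assumptions, and the model's output is hard-coded as a string.

```python
import ast
import operator

# Arithmetic operators the toy calculator supports.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def calculator(expression: str):
    """Evaluate a +, -, *, / expression on numbers without using eval()."""
    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expression, mode="eval").body)

# Hypothetical tool table; a browser or Python interpreter would be added here.
TOOLS = {"calculator": calculator}

def run_tool_call(call: str):
    """Dispatch a call written as 'tool_name: argument' (made-up format)."""
    name, _, argument = call.partition(":")
    return TOOLS[name.strip()](argument.strip())

# Pretend the model emitted this instead of doing the arithmetic "in its head".
print(run_tool_call("calculator: 6000 * 12"))  # 72000
```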
future directions of development in LLMs