LLM
Some LLMs' models and weights are not open to users.
What is an LLM?
Example: the Llama 2 70B model
-
2 files
- parameters file
- the parameters are the weights of the neural network
- each parameter is 2 bytes (a float16 number); size check sketched below
- run code (for inference)
- C or Python, etc.
- in C, ~500 lines of code with no dependencies suffice to run the model
- together: a self-contained package (no network access needed)
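A quick sanity check of the parameters-file size, assuming 70B parameters at 2 bytes each (float16):

```python
# Back-of-envelope check: 70B parameters at 2 bytes each (float16)
# gives roughly the 140 GB parameters file mentioned above.
n_params = 70_000_000_000
bytes_per_param = 2  # float16
size_gb = n_params * bytes_per_param / 1e9
print(f"parameters file: ~{size_gb:.0f} GB")  # -> ~140 GB
```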
-
how to get the parameters?
- lossy-compress a large chunk of text (~10TB of internet text) with ~6,000 GPUs for ~12 days (roughly $2M) down to a ~140GB file: a gestalt of the text, stored as the weights/parameters
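The implied compression ratio, as a rough check using the figures in the note above:

```python
# Rough compression ratio implied above: ~10 TB of text distilled
# into a ~140 GB parameters file (a lossy "zip" of the data).
raw_tb = 10       # training text, terabytes
model_gb = 140    # parameters file, gigabytes
ratio = raw_tb * 1000 / model_gb
print(f"~{ratio:.0f}x lossy compression")  # -> ~71x
```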
-
What the neural network does is try to predict the next word in a sequence. The parameters are dispersed throughout the network; neurons are connected to one another and fire in particular patterns.
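A toy sketch of what "predict the next word" means: the network maps a context to a probability for every token in its vocabulary. The logits below are made up (a real model computes them from the context with its billions of parameters); only the softmax step is faithful:

```python
import math
import random

# Toy sketch of next-token prediction (not a real LLM): the "network"
# here is a fake logit generator; a real model derives logits from
# the context via its learned parameters.
vocab = ["the", "cat", "sat", "mat", "on"]

def toy_logits(context):
    # Hypothetical scores, seeded by the context for repeatability.
    random.seed(" ".join(context))
    return [random.uniform(-2, 2) for _ in vocab]

def next_token_probs(context):
    logits = toy_logits(context)
    exps = [math.exp(l) for l in logits]
    z = sum(exps)
    return {tok: e / z for tok, e in zip(vocab, exps)}  # softmax

probs = next_token_probs(["the", "cat", "sat", "on"])
print(max(probs, key=probs.get), probs)  # most likely next token
```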
-
prediction has a strong relationship with compression (see the sketch below)
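Why the two are linked: a token predicted with probability p can be encoded in about -log2(p) bits (e.g., by an arithmetic coder), so better prediction means fewer bits, i.e., better compression:

```python
import math

# Good prediction == good compression: a token predicted with
# probability p costs about -log2(p) bits to encode.
for p in (0.5, 0.9, 0.99):
    print(f"p={p}: ~{-math.log2(p):.3f} bits per token")
```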
-
The LLM produces text of the correct form and fills it in with its knowledge; it does not emit a copy of the text it was trained on.
-
how does it work?
training stages
-
pre-training
- expensive
- produces the base model: a document generator
- it's about knowledge
- trained on internet documents
-
fine-tuning
- cheaper
- produces the assistant model
- it's about alignment
- trained on Q&A documents
- training on high-quality conversations (questions and answers); labeling instructions are written that specify how the assistant should behave (example record sketched below)
- focus on quality, not quantity
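A hypothetical shape for one fine-tuning example; the field names are illustrative, not any specific dataset's schema:

```python
# One fine-tuning example: a conversation written (or vetted) by a
# human labeler following the labeling instructions. Schema is a
# made-up illustration of the general idea.
example = {
    "messages": [
        {"role": "user", "content": "Can you explain what an LLM is?"},
        {"role": "assistant",
         "content": "An LLM is a neural network trained to predict the next word..."},
    ]
}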
-
stage 3 (optional)
- uses comparison labels (sketched below)
- reinforcement learning from human feedback (RLHF)
- labeling is a human-machine collaboration
- LLMs are ranked against each other on leaderboards
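A hypothetical comparison label: rather than writing an answer from scratch, the labeler ranks candidate answers from the model, which is often easier for humans. The schema is illustrative:

```python
# A comparison label for RLHF: the labeler orders candidate answers
# by preference instead of authoring one. Field names are made up.
comparison = {
    "prompt": "Write a haiku about the sea.",
    "candidates": ["haiku A ...", "haiku B ...", "haiku C ..."],
    "ranking": [1, 0, 2],  # preference order: candidate B, then A, then C
}
```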
LLM scaling laws:
- more data (D) and more parameters (N) predictably give a better model (an illustrative formula is sketched below)
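One published form of such a law, shown purely for illustration (the constants are fitted values from the Chinchilla paper, an outside reference, not from these notes): loss falls smoothly as N and D grow.

```python
# Illustrative Chinchilla-style scaling law: loss as a function of
# parameter count N and training tokens D. Constants are the fitted
# values reported by Hoffmann et al. (2022), used here only to show
# the shape of the curve.
def loss(n_params, n_tokens, E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    return E + A / n_params**alpha + B / n_tokens**beta

print(loss(7e9, 1.4e12))   # smaller model, less data -> higher loss
print(loss(70e9, 2e12))    # bigger model, more data  -> lower loss
```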
-
Multimodality and tool use: some LLMs, like GPT-4, can now use different tools to help answer questions: a browser, a calculator, a Python interpreter.
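A toy sketch of the tool-use loop: the model emits a tool request, the harness runs the tool and feeds the result back. `fake_model` is a stand-in for a real LLM call (hypothetical; no real API is shown):

```python
# Toy tool-use loop: model asks for a calculator, harness runs it,
# result goes back into the transcript, model answers.
def fake_model(transcript):
    if "TOOL_RESULT" not in transcript:
        return "CALL_TOOL: calculator: 127 * 49"
    return "The answer is 6223."

def run_tool(request):
    _, _, expr = request.partition("calculator: ")
    return str(eval(expr))  # toy calculator; never eval untrusted input

transcript = "User: what is 127 * 49?"
reply = fake_model(transcript)
if reply.startswith("CALL_TOOL:"):
    transcript += f"\nTOOL_RESULT: {run_tool(reply)}"
    reply = fake_model(transcript)
print(reply)  # -> "The answer is 6223."
```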
-
future directions of LLM development
give LLMs System 2 ability
- LLMs currently only have System 1 (fast, instinctive) thinking
- goal: convert time into accuracy (one simple approach is sketched at the end of this section)
self-improvement
- in narrow domains (where there is a clear reward signal), self-improvement is possible
customization
- expert models for specific domains
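One simple way to convert time into accuracy, as mentioned under System 2 above: sample the model many times and take a majority vote (self-consistency; the technique name and the stand-in model are assumptions, not from these notes):

```python
import random
from collections import Counter

# Spend more compute/time to get a more reliable answer: sample a
# stochastic "model" repeatedly and take the majority vote.
# `noisy_answer` is a stand-in for sampling a real LLM.
def noisy_answer():
    return random.choices(["6223", "6123"], weights=[0.7, 0.3])[0]

random.seed(0)
votes = Counter(noisy_answer() for _ in range(25))
print(votes.most_common(1)[0][0])  # more samples -> more reliable answer
```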