LLM: Concepts and Terminology You Need to Know to Build a Private GPT

Table of Contents

Large Model Storage Formats
- GGML
- GGUF
Embedding
- Concepts
- Categories
Terminology
- LlamaIndex
- LlamaCPP
- Poetry
- ASGI
- FastAPI
- Chroma
- Qdrant
- gradio
- MRL

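Large Model Storage Formats

A major problem when storing large models is that the model files are huge, and the model's structure and parameters also affect inference quality and performance. Different model file formats were created so that large models can be stored and exchanged more efficiently.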

GGML

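GGML: Developed by Georgi Gerganov (the name combines his initials with "ML"), GGML is a tensor library designed for machine learning, facilitating large models and high performance on various hardware, including Apple Silicon. The name is also used for the early model file format that the library loaded, which is what the points below refer to.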
Pros
- Early Innovation: GGML represented an early attempt to create a file format for GPT models.
- Single File Sharing: It enabled sharing models in a single file, enhancing convenience.
- CPU Compatibility: GGML models could run on CPUs, broadening accessibility.
Cons
- Limited Flexibility: GGML struggled with adding extra information about the model.
- Compatibility Issues: Introduction of new features often led to compatibility problems with older models.
- Manual Adjustments Required: Users frequently had to modify settings like rope-freq-base, rope-freq-scale, gqa, and rms-norm-eps, which could be complex.

GGUF

GGUF (often expanded as GPT-Generated Unified Format) was introduced as the successor to the GGML file format and was released on the 21st of August, 2023. It is a binary format that stores a model's tensors together with key-value metadata in a single file, so a loader such as llama.cpp can read the hyperparameters from the file itself instead of relying on manually supplied settings.

Pros
- Addresses GGML Limitations: GGUF is designed to overcome GGML’s shortcomings and enhance user experience.
- Extensibility: It allows for the addition of new features while maintaining compatibility with older models.
- Stability: GGUF focuses on eliminating breaking changes, easing the transition to newer versions.
- Versatility: Supports various models, extending beyond the scope of llama models.
Cons
- Transition Time: Converting existing models to GGUF may require significant time.
- Adaptation Required: Users and developers must become accustomed to this new format.
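
To make the single-file layout concrete, here is a minimal sketch that reads the fixed-size header at the start of a GGUF file. It assumes the layout used by GGUF version 2 and later (a 4-byte magic `GGUF`, a little-endian 32-bit version, then 64-bit tensor and metadata key-value counts). The names `GGUFHeader` and `read_gguf_header` are hypothetical helpers for illustration, not part of any library, and the sample bytes are synthetic rather than taken from a real model.

```python
import io
import struct
from typing import BinaryIO, NamedTuple

GGUF_MAGIC = b"GGUF"


class GGUFHeader(NamedTuple):
    """Hypothetical container for the fixed-size GGUF header fields."""
    version: int
    tensor_count: int
    metadata_kv_count: int


def read_gguf_header(stream: BinaryIO) -> GGUFHeader:
    """Read the fixed header, assuming the GGUF v2+ little-endian layout."""
    magic = stream.read(4)
    if magic != GGUF_MAGIC:
        raise ValueError(f"not a GGUF file (magic was {magic!r})")
    (version,) = struct.unpack("<I", stream.read(4))
    if version < 2:
        # Version 1 used 32-bit counts; this sketch does not handle it.
        raise ValueError("GGUF version 1 is not supported by this sketch")
    tensor_count, kv_count = struct.unpack("<QQ", stream.read(16))
    return GGUFHeader(version, tensor_count, kv_count)


# Synthetic header: version 3, 2 tensors, 5 metadata key-value pairs.
sample = io.BytesIO(GGUF_MAGIC + struct.pack("<IQQ", 3, 2, 5))
header = read_gguf_header(sample)
print(header)
```

For a real model you would pass `open(path, "rb")` instead of the in-memory buffer. The metadata key-value pairs and tensor descriptions that follow the header are what let a loader pick up settings that GGML users had to supply by hand.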

Embedding

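Concepts

An embedding is a machine learning concept: data is mapped into a high-dimensional vector space in which items with similar meaning are placed close together.

Embedding Model

- Usually a deep neural network from BERT or another member of the Transformer family.
- Represents the semantics of text, images, and other data types as a sequence of numbers called a vector.
- Its key property is that the mathematical distance between vectors in the high-dimensional space reflects the semantic similarity of the original text or images.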
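
The idea that vector distance stands in for semantic similarity can be sketched with cosine similarity. The three-dimensional vectors below are made-up toy values chosen for illustration, not the output of a real embedding model (real models typically produce hundreds or thousands of dimensions), and `most_similar` is a hypothetical helper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy embeddings (assumed values, for illustration only).
embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}


def most_similar(query, vectors):
    """Return the key whose vector is closest to the query's vector."""
    candidates = (key for key in vectors if key != query)
    return max(
        candidates,
        key=lambda key: cosine_similarity(vectors[query], vectors[key]),
    )


print(most_similar("cat", embeddings))
```

With these toy values "kitten" scores far higher against "cat" than "car" does, which is the behavior a vector database relies on when it retrieves the nearest neighbors of a query embedding.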

Categories

Embedding Function	Type	API or Open-sourced
openai	Dense	API
sentence-transformer	Dense	Open-sourced
bm25	Sparse	Open-sourced
Splade	Sparse	Open-sourced
bge-m3	Hybrid	Open-sourced
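
The Dense and Sparse types in the table differ in how the vector is stored and compared. A dense embedding has a value in every dimension, while a sparse embedding (as produced by BM25 or SPLADE) is zero almost everywhere and is usually stored as a mapping from term index to weight. The sketch below illustrates both, plus one possible way to combine them. The weighted sum in `hybrid_score` and its `alpha` parameter are assumptions made for illustration, not how any particular model or database scores hybrid results.

```python
def dense_dot(a, b):
    """Dot product of two dense vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def sparse_dot(a, b):
    """Dot product of two sparse vectors stored as {index: weight} dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller mapping
    return sum(weight * b[index] for index, weight in a.items() if index in b)


def hybrid_score(dense_q, dense_d, sparse_q, sparse_d, alpha=0.5):
    """Illustrative weighted sum of dense and sparse scores (an assumption)."""
    dense_part = dense_dot(dense_q, dense_d)
    sparse_part = sparse_dot(sparse_q, sparse_d)
    return alpha * dense_part + (1 - alpha) * sparse_part


# Sparse vectors only list the non-zero dimensions (toy values).
query_sparse = {1: 2.0, 5: 1.0}
doc_sparse = {5: 3.0, 7: 4.0}
print(sparse_dot(query_sparse, doc_sparse))
```

Only index 5 appears in both sparse vectors, so it is the only dimension that contributes to the sparse score.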

Terminology

LlamaIndex

LlamaIndex is a data framework for building LLM applications: it helps ingest, index, and query your own data so that an LLM can answer questions over it.

LlamaCPP

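LlamaCPP (llama.cpp): "Inference of Meta's LLaMA model (and others) in pure C/C++". It is an inference framework written in plain C/C++ that began as a port of Meta's LLaMA model, and it is used mainly for model inference.

It started with Meta's LLaMA family (such as LLaMA 2 and Code Llama) and also supports other architectures, including Falcon and Baichuan. Not all of these are LLaMA-based models. A model must first be converted to a supported format (such as GGUF) before llama.cpp can run inference on it.

To use llama.cpp for inference, make sure the model you choose has an architecture that llama.cpp supports and has already been converted to a format it can load.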

The main goal of llama.cpp is to enable LLM inference with minimal setup and state-of-the-art performance on a wide variety of hardware - locally and in the cloud.

Poetry

Poetry: a Python packaging and dependency management tool.

Poetry helps you declare, manage, and install the dependencies of Python projects, ensuring you have the right stack everywhere.

ASGI

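ASGI (Asynchronous Server Gateway Interface) is an interface standard for communication between asynchronous Python web servers and applications. Compared with the traditional, synchronous WSGI (Web Server Gateway Interface), ASGI is better suited to applications with high concurrency and real-time requirements, such as chat applications, real-time notifications, and online games.

Django (3.0 and later) can be deployed as an ASGI application, which lets a Django project handle requests and responses asynchronously.
Uvicorn: an ASGI web server implementation for Python.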
Hypercorn is an ASGI web server based on the sans-io hyper, h11, h2, and wsproto libraries and inspired by Gunicorn. Hypercorn supports HTTP/1, HTTP/2, WebSockets (over HTTP/1 and HTTP/2), ASGI/2, and ASGI/3 specifications. Hypercorn can utilise asyncio, uvloop, or trio worker types.
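
To show what the interface between server and application looks like, here is a minimal sketch of an ASGI 3 application: a single async callable that takes `scope`, `receive`, and `send`. The helper `call_app` is hypothetical and only stands in for a real server by building a simplified scope by hand. An actual server such as Uvicorn or Hypercorn would supply a fuller scope and real network I/O.

```python
import asyncio


async def app(scope, receive, send):
    """Minimal ASGI 3 application: answers every HTTP request with plain text."""
    if scope["type"] != "http":
        return  # this sketch ignores lifespan and websocket scopes
    body = f"Hello from {scope['path']}".encode()
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [
            (b"content-type", b"text/plain; charset=utf-8"),
            (b"content-length", str(len(body)).encode()),
        ],
    })
    await send({"type": "http.response.body", "body": body})


async def call_app(path):
    """Drive the app by hand, playing the role a server would (hypothetical helper)."""
    sent = []
    scope = {
        "type": "http",
        "asgi": {"version": "3.0"},
        "http_version": "1.1",
        "method": "GET",
        "path": path,
        "query_string": b"",
        "headers": [],
    }

    async def receive():
        # A request with an empty body.
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    await app(scope, receive, send)
    return sent


messages = asyncio.run(call_app("/ping"))
print(messages[1]["body"])
```

If only the `app` function is saved as `main.py`, it can be served with `uvicorn main:app`. Frameworks such as FastAPI expose the same callable interface, which is why they run on any ASGI server.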