Running Gemma-2B Offline with transformers

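Downloading the model

Models and their weights can generally be downloaded from one of three model hubs:
HuggingFace | ModelScope | WiseModel
This example downloads from HuggingFace, at the following address: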
https://huggingface.co/google/gemma-2b-it
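
For an offline run, every file has to be on disk before the model is loaded. The sketch below shows one possible way to fetch the repository with `snapshot_download` from the `huggingface_hub` package and then confirm that the download looks complete. The helper names (`download_model`, `missing_model_files`) and the `EXPECTED_FILES` list are illustrative assumptions, not part of the original walkthrough; compare the list against the repository's actual file listing before relying on it.

```python
from pathlib import Path

# Assumed file names for illustration; the repository's file list is authoritative.
EXPECTED_FILES = ("config.json", "tokenizer_config.json")


def download_model(repo_id: str, local_dir: str, token: str) -> str:
    """Download all files of a Hub repository into local_dir and return the path."""
    # Imported lazily so the check below also works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir, token=token)


def missing_model_files(model_dir: str) -> list[str]:
    """Return the expected files that are absent from model_dir.

    A "*.safetensors" entry is appended when no weight shard is found at all.
    """
    root = Path(model_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    if not any(root.glob("*.safetensors")):
        missing.append("*.safetensors")
    return missing
```

A call such as `download_model("google/gemma-2b-it", some_directory, your_token)` would populate the directory, and an empty list from `missing_model_files(some_directory)` suggests the download is usable. The token is whatever access token your HuggingFace account provides.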

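Environment setup

This project uses transformers. Because transformers relies on PyTorch internally, CUDA and PyTorch must be installed first. The installation steps are covered in an earlier article:
Installing Torch on Windows
Once that is done, install the transformers package the project needs.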

pip install transformers

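We recommend version 4.37.2 or later; this project uses 4.37.2. Note that the Gemma model classes first shipped in transformers 4.38.0, so if loading fails because the model type is not recognized, upgrade to a newer release.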
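
The version requirement can be checked before any model code runs. The following is a minimal sketch using only the standard library; the helper names are hypothetical, and the `MINIMUM` value simply mirrors the version floor quoted in the text, so raise it if your setup needs a newer release.

```python
import re
from importlib import metadata

# Version floor quoted in the text; adjust it if a newer release is required.
MINIMUM = "4.37.2"


def parse_version(text: str) -> tuple[int, ...]:
    """Keep the leading numeric components, e.g. '4.38.0.dev0' -> (4, 38, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"not a version string: {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two versions numerically, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_version(package: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


found = installed_version("transformers")
if found is None:
    print("transformers is not installed")
else:
    print(found, "ok" if meets_minimum(found, MINIMUM) else "older than " + MINIMUM)
```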

Using the model

The official calling code provided by Google is all that is needed:

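from transformers import AutoTokenizer, AutoModelForCausalLM

# AutoTokenizer loads a pretrained tokenizer.
# AutoModelForCausalLM loads a pretrained causal language model,
# the kind of model typically used for text generation.

# Local directory where the model files are stored
MODEL_PATH = r"C:\VM\Chatbot\gemma-2b"

# Load the pretrained gemma-2b tokenizer
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, token='。。。')
# Load the pretrained gemma-2b language model onto a specific GPU for inference
# (passing device_map requires the accelerate package)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, device_map="cuda:0")

# The other Gemma checkpoints can be used for text continuation in the same way;
# only the pretrained model loaded here differs:
#   "google/gemma-2b-it"
#   "google/gemma-7b"
#   "google/gemma-7b-it"

# The input question: the initial text that generation starts from
input_text = "Write me a poem about Machine Learning."
# Convert input_text into the numeric form the model understands (token ids);
# return_tensors="pt" returns PyTorch tensors
input_ids = tokenizer(input_text, return_tensors="pt").to(model.device)
# Generate text from the tokenized input; max_length=100 caps the total
# sequence length, prompt tokens included
outputs = model.generate(**input_ids, max_length=100)
# Decode the generated token ids into human-readable text and print it
print(tokenizer.decode(outputs[0]))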
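
Since the goal is an offline run, it can help to make the "no network" expectation explicit rather than relying on the local path alone. The sketch below is one possible approach, not the author's: it sets the `HF_HUB_OFFLINE` and `TRANSFORMERS_OFFLINE` environment variables and passes `local_files_only=True` to `from_pretrained`. The helper names `enable_offline_mode` and `load_local` are hypothetical.

```python
import os


def enable_offline_mode() -> dict:
    """Set the environment switches that tell the Hugging Face libraries not to use the network."""
    flags = {"HF_HUB_OFFLINE": "1", "TRANSFORMERS_OFFLINE": "1"}
    os.environ.update(flags)
    return flags


def load_local(model_path: str, device_map: str = "cuda:0"):
    """Load the tokenizer and model strictly from files already on disk."""
    # Imported here, after the flags are set, because the libraries read
    # these variables when they are first imported.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_path, local_files_only=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_path, local_files_only=True, device_map=device_map
    )
    return tokenizer, model
```

Calling `enable_offline_mode()` at the very top of the script, before transformers is imported anywhere, and then `load_local(MODEL_PATH)` should fail fast with a clear error if a file is missing, instead of attempting a download.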

Troubleshooting

After running the command, the following message is reported:

Gemma's activation function should be approximate GeLU and not exact GeLU.
Changing the activation function to `gelu_pytorch_tanh`.if you want to use the legacy `gelu`, edit the `model.config` to set `hidden_activation=gelu`   instead of `hidden_act`. See https://github.com/huggingface/transformers/pull/29402 for more details.
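
The message itself states that the library already switches the activation to `gelu_pytorch_tanh`, so it reads as a notice about the configuration rather than a failure. Assuming it is triggered because the downloaded `config.json` has no `hidden_activation` entry (which the wording about `hidden_activation` versus `hidden_act` suggests), one possible way to make the choice explicit is to add that key to the local file. The helper below is a hypothetical sketch that edits the file in place, so keep a backup of the original.

```python
import json
from pathlib import Path


def set_hidden_activation(model_dir: str, value: str = "gelu_pytorch_tanh") -> bool:
    """Add hidden_activation to config.json if it is absent.

    Returns True when the file was changed, False when the key was already set.
    """
    config_path = Path(model_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    if config.get("hidden_activation") is not None:
        return False  # already explicit, leave the file alone
    config["hidden_activation"] = value
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return True
```

Passing `value="gelu"` instead would select the legacy activation that the message mentions.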