A Summary of Local API Launch Methods for Open-Source Large Language Models

Table of Contents

    • CodeGeex2
    • ChatGLM2_6B
    • Baichuan2_13B
    • sqlcoder
    • Testing after startup

CodeGeex2

```python
from fastapi import FastAPI, Request
from transformers import AutoTokenizer, AutoModel
import uvicorn, json, datetime
import torch
import argparse

try:
    import chatglm_cpp
    enable_chatglm_cpp = True
except ImportError:
    print("[WARN] chatglm-cpp not found. Install it by `pip install chatglm-cpp` for better performance. "
          "Check out https://github.com/li-plus/chatglm.cpp for more details.")
    enable_chatglm_cpp = False

LANGUAGE_TAG = {
    "Abap"         : "* language: Abap",
    "ActionScript" : "// language: ActionScript",
    "Ada"          : "-- language: Ada",
    "Agda"         : "-- language: Agda",
    "ANTLR"        : "// language: ANTLR",
    "AppleScript"  : "-- language: AppleScript",
    "Assembly"     : "; language: Assembly",
    "Augeas"       : "// language: Augeas",
    "AWK"          : "// language: AWK",
    "Basic"        : "' language: Basic",
    "C"            : "// language: C",
    "C#"           : "// language: C#",
    "C++"          : "// language: C++",
    "CMake"        : "# language: CMake",
    "Cobol"        : "// language: Cobol",
    "CSS"          : "/* language: CSS */",
    "CUDA"         : "// language: Cuda",
    "Dart"         : "// language: Dart",
    "Delphi"       : "{language: Delphi}",
    "Dockerfile"   : "# language: Dockerfile",
    "Elixir"       : "# language: Elixir",
    "Erlang"       : "% language: Erlang",
    "Excel"        : "' language: Excel",
    "F#"           : "// language: F#",
    "Fortran"      : "!language: Fortran",
    "GDScript"     : "# language: GDScript",
    "GLSL"         : "// language: GLSL",
    "Go"           : "// language: Go",
    "Groovy"       : "// language: Groovy",
    "Haskell"      : "-- language: Haskell",
    "HTML"         : "<!--language: HTML-->",
    "Isabelle"     : "(*language: Isabelle*)",
    "Java"         : "// language: Java",
    "JavaScript"   : "// language: JavaScript",
    "Julia"        : "# language: Julia",
    "Kotlin"       : "// language: Kotlin",
    "Lean"         : "-- language: Lean",
    "Lisp"         : "; language: Lisp",
    "Lua"          : "// language: Lua",
    "Markdown"     : "<!--language: Markdown-->",
    "Matlab"       : "% language: Matlab",
    "Objective-C"  : "// language: Objective-C",
    "Objective-C++": "// language: Objective-C++",
    "Pascal"       : "// language: Pascal",
    "Perl"         : "# language: Perl",
    "PHP"          : "// language: PHP",
    "PowerShell"   : "# language: PowerShell",
    "Prolog"       : "% language: Prolog",
    "Python"       : "# language: Python",
    "R"            : "# language: R",
    "Racket"       : "; language: Racket",
    "RMarkdown"    : "# language: RMarkdown",
    "Ruby"         : "# language: Ruby",
    "Rust"         : "// language: Rust",
    "Scala"        : "// language: Scala",
    "Scheme"       : "; language: Scheme",
    "Shell"        : "# language: Shell",
    "Solidity"     : "// language: Solidity",
    "SPARQL"       : "# language: SPARQL",
    "SQL"          : "-- language: SQL",
    "Swift"        : "// language: swift",
    "TeX"          : "% language: TeX",
    "Thrift"       : "/* language: Thrift */",
    "TypeScript"   : "// language: TypeScript",
    "Vue"          : "<!--language: Vue-->",
    "Verilog"      : "// language: Verilog",
    "Visual Basic" : "' language: Visual Basic",
}

app = FastAPI()

def device(config, model_path):
    if enable_chatglm_cpp and config.use_chatglm_cpp:
        print("Using chatglm-cpp to improve performance")
        dtype = "f16" if config.half else "f32"
        if config.quantize in [4, 5, 8]:
            dtype = f"q{config.quantize}_0"
        model = chatglm_cpp.Pipeline(model_path, dtype=dtype)
        return model
    print("chatglm-cpp not enabled, falling back to transformers")
    if config.device != "cpu":
        if not config.half:
            model = AutoModel.from_pretrained(model_path, trust_remote_code=True).cuda(int(config.device))
        else:
            model = AutoModel.from_pretrained(model_path, trust_remote_code=True).cuda(int(config.device)).half()
        if config.quantize in [4, 8]:
            print(f"Model is quantized to INT{config.quantize} format.")
            model = model.half().quantize(config.quantize)
    else:
        model = AutoModel.from_pretrained(model_path, trust_remote_code=True)
    return model.eval()

@app.post("/")
async def create_item(request: Request):
    global model, tokenizer
    json_post_raw = await request.json()
    json_post = json.dumps(json_post_raw)
    json_post_list = json.loads(json_post)
    lang = json_post_list.get('lang')
    prompt = json_post_list.get('prompt')
    max_length = json_post_list.get('max_length', 128)
    top_p = json_post_list.get('top_p', 0.95)
    temperature = json_post_list.get('temperature', 0.2)
    top_k = json_post_list.get('top_k', 0)
    if lang != "None":
        prompt = LANGUAGE_TAG[lang] + "\n" + prompt
    if enable_chatglm_cpp and use_chatglm_cpp:
        response = model.generate(prompt,
                                  max_length=max_length,
                                  do_sample=temperature > 0,
                                  top_p=top_p,
                                  top_k=top_k,
                                  temperature=temperature)
    else:
        response = model.chat(tokenizer,
                              prompt,
                              max_length=max_length,
                              top_p=top_p,
                              top_k=top_k,
                              temperature=temperature)
    now = datetime.datetime.now()
    time = now.strftime("%Y-%m-%d %H:%M:%S")
    answer = {"response": response, "lang": lang, "status": 200, "time": time}
    return answer

def api_start(config):
    global use_chatglm_cpp
    use_chatglm_cpp = config.use_chatglm_cpp
    model_path = "CodeModels/CodeGeex2"
    global tokenizer, model
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
    model = device(config, model_path)
    uvicorn.run(app, host="0.0.0.0", port=7861, workers=1)
```
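The endpoint above prepends the matching LANGUAGE_TAG entry to the prompt whenever a `lang` field is supplied. A minimal standalone sketch of that prefixing logic (using a two-entry subset of the table; no server or model needed):

```python
# Standalone sketch of the prompt-building step in the CodeGeex2 endpoint:
# when "lang" is given, the language tag is prepended so the model knows
# which language to generate. Two-entry subset of LANGUAGE_TAG for brevity.
LANGUAGE_TAG = {
    "Python": "# language: Python",
    "C++":    "// language: C++",
}

def build_prompt(lang: str, prompt: str) -> str:
    # Mirrors the server logic: prefix the tag unless lang is the string "None".
    if lang != "None":
        return LANGUAGE_TAG[lang] + "\n" + prompt
    return prompt

print(build_prompt("Python", "def quick_sort(arr):"))
```

Note that the server compares `lang` against the string `"None"`, not Python's `None`: if the field is omitted entirely, `json_post_list.get('lang')` returns `None` and the `LANGUAGE_TAG` lookup raises `KeyError`, so clients should always send `"lang"` (use `"None"` to skip the prefix).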

ChatGLM2_6B

```python
from fastapi import FastAPI, Request
from transformers import AutoTokenizer, AutoModel
import uvicorn, json, datetime
import torch

def torch_gc(mydevice):
    if torch.cuda.is_available():
        with torch.cuda.device(mydevice):
            torch.cuda.empty_cache()
            torch.cuda.ipc_collect()

app = FastAPI()

def device(config, model_path):
    if config.device != "cpu":
        if not config.half:
            model = AutoModel.from_pretrained(model_path, trust_remote_code=True).cuda(int(config.device))
        else:
            model = AutoModel.from_pretrained(model_path, trust_remote_code=True).cuda(int(config.device)).half()
        if config.quantize in [4, 8]:
            print(f"Model is quantized to INT{config.quantize} format.")
            model = model.half().quantize(config.quantize)
    else:
        model = AutoModel.from_pretrained(model_path, trust_remote_code=True)
    return model.eval()

@app.post("/")
async def create_item(request: Request):
    global model, tokenizer
    json_post_raw = await request.json()
    json_post = json.dumps(json_post_raw)
    json_post_list = json.loads(json_post)
    prompt = json_post_list.get('prompt')
    history = json_post_list.get('history', [])
    max_length = json_post_list.get('max_length', 2048)
    top_p = json_post_list.get('top_p', 0.7)
    temperature = json_post_list.get('temperature', 0.95)
    top_k = json_post_list.get('top_k', 0)
    response, history = model.chat(tokenizer,
                                   prompt,
                                   history=history,
                                   max_length=max_length,
                                   top_p=top_p,
                                   temperature=temperature)
    now = datetime.datetime.now()
    time = now.strftime("%Y-%m-%d %H:%M:%S")
    answer = {"response": response, "history": history, "status": 200, "time": time}
    torch_gc(model.device)
    return answer

def api_start(config):
    model_path = "LanguageModels/ChatGLM2_6B/"
    global tokenizer, model
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
    model = device(config, model_path)
    uvicorn.run(app, host="0.0.0.0", port=7862, workers=1)
```
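The ChatGLM2_6B endpoint returns the updated `history` alongside the reply, so multi-turn conversations work by echoing that history back on the next request. A hedged client sketch using only the standard library (the URL and port match the server above; the network call at the bottom assumes the server is already running):

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:7862/"  # port used by the ChatGLM2_6B server above

def build_payload(prompt: str, history: list) -> dict:
    # The endpoint also accepts optional max_length/top_p/temperature;
    # only the two fields needed for a multi-turn exchange are sent here.
    return {"prompt": prompt, "history": history}

def chat(prompt: str, history: list):
    data = json.dumps(build_payload(prompt, history)).encode("utf-8")
    req = urllib.request.Request(API_URL, data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        answer = json.loads(resp.read())
    # The server returns the updated history; reuse it for the next turn.
    return answer["response"], answer["history"]

if __name__ == "__main__":
    history = []
    reply, history = chat("你好", history)
    reply, history = chat("请把上一句翻译成英文", history)  # second turn reuses history
    print(reply)
```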

Baichuan2_13B

```python
from fastapi import FastAPI, Request
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers.generation.utils import GenerationConfig
import uvicorn, json, datetime
import torch

def torch_gc(mydevice):
    if torch.cuda.is_available():
        with torch.cuda.device(mydevice):
            torch.cuda.empty_cache()
            torch.cuda.ipc_collect()

app = FastAPI()

def device(config, model_path):
    model = AutoModelForCausalLM.from_pretrained(model_path,
                                                 device_map="auto",
                                                 torch_dtype=torch.bfloat16,
                                                 trust_remote_code=True)
    model.generation_config = GenerationConfig.from_pretrained(model_path)
    return model.eval()

@app.post("/")
async def create_item(request: Request):
    global model, tokenizer
    json_post_raw = await request.json()
    json_post = json.dumps(json_post_raw)
    json_post_list = json.loads(json_post)
    prompt = json_post_list.get('prompt')
    messages = [{"role": "user", "content": prompt}]
    response = model.chat(tokenizer, messages)
    now = datetime.datetime.now()
    time = now.strftime("%Y-%m-%d %H:%M:%S")
    answer = {"response": response, "status": 200, "time": time}
    torch_gc(model.device)
    return answer

def api_start(config):
    model_path = "LanguageModels/Baichuan2_13B_Chat/"
    global tokenizer, model
    tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False, trust_remote_code=True)
    model = device(config, model_path)
    uvicorn.run(app, host="0.0.0.0", port=7863, workers=1)
```

sqlcoder

```python
from fastapi import FastAPI, Request
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers.generation.utils import GenerationConfig
import uvicorn, json, datetime
import torch

def torch_gc(mydevice):
    if torch.cuda.is_available():
        with torch.cuda.device(mydevice):
            torch.cuda.empty_cache()
            torch.cuda.ipc_collect()

app = FastAPI()

def device(config, model_path):
    model = AutoModelForCausalLM.from_pretrained(model_path,
                                                 device_map="auto",
                                                 load_in_8bit=True,
                                                 use_cache=True,
                                                 trust_remote_code=True)
    return model.eval()

@app.post("/")
async def create_item(request: Request):
    global model, tokenizer
    json_post_raw = await request.json()
    json_post = json.dumps(json_post_raw)
    json_post_list = json.loads(json_post)
    prompt = json_post_list.get('prompt')
    # Stop generation at the closing code fence of the SQL block.
    eos_token_id = tokenizer.convert_tokens_to_ids(["```"])[0]
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    generated_ids = model.generate(**inputs,
                                   num_return_sequences=1,
                                   eos_token_id=eos_token_id,
                                   pad_token_id=eos_token_id,
                                   max_new_tokens=400,
                                   do_sample=False,
                                   num_beams=5)
    outputs = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
    response = outputs[0].split("```sql")[-1].split("```")[0].split(";")[0].strip() + ";"
    now = datetime.datetime.now()
    time = now.strftime("%Y-%m-%d %H:%M:%S")
    answer = {"response": response, "status": 200, "time": time}
    torch_gc(model.device)
    return answer

def api_start(config):
    model_path = "CodeModels/sqlcoder/"
    global tokenizer, model
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
    model = device(config, model_path)
    uvicorn.run(app, host="0.0.0.0", port=7864, workers=1)
```
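The last step of the sqlcoder endpoint is pure string surgery: the raw generation is expected to contain a fenced SQL code block, and the first statement is cut out of it. This can be verified locally without loading the model:

```python
def extract_sql(generated: str) -> str:
    # Same chain of splits as in the endpoint above: take the text after
    # the last "```sql" fence, cut at the closing fence, keep the first
    # statement, and normalize the trailing semicolon.
    return generated.split("```sql")[-1].split("```")[0].split(";")[0].strip() + ";"

raw = "Here is the query:\n```sql\nSELECT name FROM users WHERE id = 1;\n```"
print(extract_sql(raw))  # SELECT name FROM users WHERE id = 1;
```

Since the server generates with `do_sample=False` and `num_beams=5`, the output is deterministic for a given prompt; the extraction still works if the model never emits a closing fence, because the split on the closing fence is then a no-op and the first `;` still bounds the statement.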

Testing after startup

Once a service is up, verify it with a POST request to its port (7861 for CodeGeex2, 7862 for ChatGLM2_6B, 7863 for Baichuan2_13B, 7864 for sqlcoder):

curl -X POST "http://127.0.0.1:7864" -H 'Content-Type: application/json' -d '{"prompt": "你的名字是"}'
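The same smoke test can be written in Python with only the standard library. Port 7864 below targets sqlcoder; swap in 7861–7863 for the other services (the network call assumes a server is running):

```python
import json
import urllib.request

def build_request(port: int, prompt: str) -> urllib.request.Request:
    # All four services accept a JSON body with at least a "prompt" field.
    data = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"http://127.0.0.1:{port}/",
        data=data,
        headers={"Content-Type": "application/json"},
    )

def post_prompt(port: int, prompt: str) -> dict:
    with urllib.request.urlopen(build_request(port, prompt)) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    print(post_prompt(7864, "你的名字是"))
```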

