Notes on Setting Up Qwen1.5-0.5B on Ubuntu 20.04

Environment

Ubuntu 20.04
NVIDIA-SMI 545.29.06
CUDA 11.4
Python 3.10
PyTorch 1.11.0

Setup

Python environment

Create a virtual environment:

conda create --name qwen python=3.10

Install modelscope and transformers:

pip install modelscope
pip install transformers

Install PyTorch:

conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch
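Before moving on, it can be worth confirming that the environment is usable. The snippet below is a minimal sanity check; it assumes only that the conda environment above is active, and reports a missing PyTorch instead of crashing:

```python
# Hedged sanity check for the environment above. If PyTorch is not
# installed (e.g. the conda env is not active), it says so instead of failing.
import importlib.util

def torch_available() -> bool:
    """Return True if the `torch` package can be imported."""
    return importlib.util.find_spec("torch") is not None

if torch_available():
    import torch
    # CUDA should report True if the driver/toolkit setup above worked.
    print("torch", torch.__version__, "CUDA available:", torch.cuda.is_available())
else:
    print("PyTorch not found - activate the conda environment first")
```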
Downloading the model

Create a Python file:

gedit download.py

and paste in the following:

from modelscope.hub.file_download import model_file_download

model_dir = model_file_download(
    model_id='qwen/Qwen1.5-0.5B-Chat-GGUF',
    file_path='qwen1_5-0_5b-chat-q5_k_m.gguf',
    revision='master',
    cache_dir='path/to/local/dir'
)

Run the script to download the model:

python download.py
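Once download.py finishes, a quick check that the file really is a GGUF model can save a confusing failure later when llama.cpp tries to load it. GGUF files begin with the 4-byte magic b"GGUF"; the helper below is a hedged sketch, and the commented example path is just the cache_dir assumed from download.py:

```python
# Hedged check that a downloaded file is really a GGUF model.
# GGUF files begin with the 4-byte magic b"GGUF".
from pathlib import Path

def is_gguf(path: str) -> bool:
    """Return True if `path` exists and starts with the GGUF magic bytes."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as f:
        return f.read(4) == b"GGUF"

# Example (path is an assumption - adjust to your cache_dir):
# print(is_gguf("path/to/local/dir/qwen/Qwen1.5-0.5B-Chat-GGUF/qwen1_5-0_5b-chat-q5_k_m.gguf"))
```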
Getting llama.cpp

Clone the llama.cpp repository with git:

git clone https://github.com/ggerganov/llama.cpp

After cloning, enter the llama.cpp directory and build the project:

cd llama.cpp
make -j
Running the model

The model can also be downloaded directly from ModelScope:
https://www.modelscope.cn/models/qwen/Qwen1.5-0.5B-Chat-GGUF/files
I used qwen1_5-0_5b-chat-q5_k_m.gguf.

Run the command below in a terminal, substituting your own model path. (The -cml flag in the official command does not appear anywhere in the help output and broke the run, so I removed it.)

Official command:

./main -m /path/to/local/dir/qwen/Qwen1.5-0.5B-Chat-GGUF/qwen1_5-0_5b-chat-q5_k_m.gguf -n 512 --color -i -cml -f prompts/chat-with-qwen.txt

What I ran:

./main -m /path/to/local/dir/qwen/Qwen1.5-0.5B-Chat-GGUF/qwen1_5-0_5b-chat-q5_k_m.gguf -n 512 --color -i -f prompts/chat-with-qwen.txt
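The removed -cml flag corresponds to ChatML prompt formatting, which Qwen1.5 chat models expect. Without the flag, that formatting can be placed in the prompt file itself. Below is a minimal sketch of the ChatML template; the default system message is an assumption, not something the model requires verbatim:

```python
# Minimal sketch of the ChatML template used by Qwen1.5 chat models.
# The default system message is an assumption - replace it as needed.
def chatml_prompt(user_msg: str,
                  system_msg: str = "You are a helpful assistant.") -> str:
    """Wrap a user message in the ChatML format Qwen1.5 chat models expect."""
    return (
        f"<|im_start|>system\n{system_msg}<|im_end|>\n"
        f"<|im_start|>user\n{user_msg}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

# Writing this string to a file and passing it via -f reproduces roughly
# what -cml used to do automatically.
print(chatml_prompt("Hello!"))
```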

Help output (abridged to the options relevant here; the full listing is much longer):

usage: ./main [options]

-m,    --model FNAME            model path (default: models/$filename with filename from --hf-file
                                or --model-url if set, otherwise models/7B/ggml-model-f16.gguf)
-n,    --predict N              number of tokens to predict (default: -1, -1 = infinity, -2 = until context filled)
-co,   --color                  colorise output to distinguish prompt and user input from generations (default: false)
-i,    --interactive            run in interactive mode (default: false)
-f,    --file FNAME             a file containing the prompt (default: none)
-cnv,  --conversation           run in conversation mode (does not print special tokens and suffix/prefix) (default: false)

Note that no -cml option appears anywhere in the listing.

References

(Qwen) Tongyi Qianwen Large Model Installation and Deployment Tutorial (2024)

