webassembly003 TTS BARK.CPP

TTS task

  • A TTS (Text-to-Speech) task is a natural language processing (NLP) task in which the model's goal is to convert input text into audio, i.e., automatic speech synthesis. Concretely, the model has to understand the input text and generate the corresponding speech output, so that the synthesized voice sounds natural and fluent, close to how a human would say it.

Bark

  • Bark (https://github.com/suno-ai/bark) is a transformer-based text-to-audio model created by Suno. Bark can generate highly realistic, multilingual speech as well as other audio, including music, background noise and simple sound effects. The model can also produce non-verbal sounds such as laughing, sighing and crying. To support the research community, Suno provides access to pretrained model checkpoints that are ready for inference and available for commercial use.
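For comparison with the C++ port below, the upstream Python package can be driven in just a few lines. The sketch below follows the usage pattern documented in the Bark README; the preload_models / generate_audio / SAMPLE_RATE names come from that README and may change between releases, so treat this as an illustrative sketch rather than a pinned API:

# Minimal sketch of the upstream (Python) Bark pipeline, assuming the bark package
# (pip install git+https://github.com/suno-ai/bark.git) and scipy are installed.
from bark import SAMPLE_RATE, generate_audio, preload_models
from scipy.io.wavfile import write as write_wav

preload_models()                                  # download/load the text, coarse, fine and codec checkpoints
audio_array = generate_audio("this is an audio")  # same prompt as used with bark.cpp below
write_wav("bark_python.wav", SAMPLE_RATE, audio_array)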

bark.cpp

  • https://github.com/PABannier/bark.cpp

Build

$mkdir build
$cd build
$cmake ..
-- The C compiler identification is GNU 9.5.0
-- The CXX compiler identification is GNU 9.5.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- Linux detected
-- Configuring done
-- Generating done
-- Build files have been written to: /home/pdd/le/bark.cpp/build
$cmake --build . --config Release
[  7%] Building C object ggml/src/CMakeFiles/ggml.dir/ggml.c.o
[ 14%] Building C object ggml/src/CMakeFiles/ggml.dir/ggml-alloc.c.o
[ 21%] Linking C static library libggml.a
[ 21%] Built target ggml
[ 28%] Building CXX object CMakeFiles/bark.cpp.dir/bark.cpp.o
[ 42%] Linking CXX static library libbark.cpp.a
[ 42%] Built target bark.cpp
[ 50%] Building CXX object examples/main/CMakeFiles/main.dir/main.cpp.o
[ 57%] Linking CXX executable ../../bin/main
[ 57%] Built target main
[ 64%] Building CXX object examples/server/CMakeFiles/server.dir/server.cpp.o
[ 71%] Linking CXX executable ../../bin/server
[ 71%] Built target server
[ 78%] Building CXX object examples/quantize/CMakeFiles/quantize.dir/quantize.cpp.o
[ 85%] Linking CXX executable ../../bin/quantize
[ 85%] Built target quantize
[ 92%] Building CXX object tests/CMakeFiles/test-tokenizer.dir/test-tokenizer.cpp.o
[100%] Linking CXX executable ../bin/test-tokenizer
[100%] Built target test-tokenizer

Weight download and conversion

$cd ../
# downloads text_2.pt, coarse_2.pt, fine_2.pt and https://dl.fbaipublicfiles.com/encodec/v0/encodec_24khz-d7cc33bc.th
$python3 download_weights.py --download-dir ./models
# convert the model to ggml format
$python3 convert.py   --dir-model ./models --codec-path ./models --vocab-path ./ggml_weights/ --out-dir ./ggml_weights/
$ ls -ahl ./models/
total 13G
drwxrwxr-x  2 pdd pdd 4.0K Jan 29 08:22 .
drwxrwxr-x 13 pdd pdd 4.0K Jan 29 06:50 ..
-rwxrwxrwx  1 pdd pdd 3.7G Jan 29 07:34 coarse_2.pt
-rw-rw-r--  1 pdd pdd  89M Jan 29 07:29 encodec_24khz-d7cc33bc.th
-rwxrwxrwx  1 pdd pdd 3.5G Jan 29 07:53 fine_2.pt
-rwxrwxrwx  1 pdd pdd 5.0G Jan 29 07:22 text_2.pt
$ ls -ahl ./ggml_weights/
total 4.2G
drwxrwxr-x  2 pdd pdd 4.0K Jan 29 08:34 .
drwxrwxr-x 13 pdd pdd 4.0K Jan 29 06:50 ..
-rw-rw-r--  1 pdd pdd 1.3M Jan 29 08:33 ggml_vocab.bin
-rw-rw-r--  1 pdd pdd 1.3G Jan 29 08:34 ggml_weights_coarse.bin
-rw-rw-r--  1 pdd pdd  45M Jan 29 08:34 ggml_weights_codec.bin
-rw-rw-r--  1 pdd pdd 1.2G Jan 29 08:34 ggml_weights_fine.bin
-rw-rw-r--  1 pdd pdd 1.7G Jan 29 08:33 ggml_weights_text.bin
-rw-rw-r--  1 pdd pdd 973K Jan 29 05:23 vocab.txt
$ ./main -m ./ggml_weights/ -p "this is an audio"
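Before running inference it is worth sanity-checking that convert.py produced every expected ggml file. The helper below is only a convenience sketch (the file names are taken from the directory listing above; the script itself is not part of bark.cpp):

# check_weights.py -- hypothetical helper, not part of the bark.cpp repo.
# Verifies that every ggml file produced by convert.py is present and prints its size.
from pathlib import Path

EXPECTED = [
    "ggml_vocab.bin",
    "ggml_weights_text.bin",
    "ggml_weights_coarse.bin",
    "ggml_weights_fine.bin",
    "ggml_weights_codec.bin",
]

weights_dir = Path("./ggml_weights")
for name in EXPECTED:
    path = weights_dir / name
    if not path.exists():
        raise SystemExit(f"missing {path} -- rerun convert.py")
    print(f"{name:28s} {path.stat().st_size / 2**20:8.1f} MiB")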

Run

$ ./build/bin/main -h
usage: ./build/bin/main [options]

options:
  -h, --help            show this help message and exit
  -t N, --threads N     number of threads to use during computation (default: 4)
  -s N, --seed N        seed for random number generator (default: 0)
  -p PROMPT, --prompt PROMPT
                        prompt to start generation with (default: random)
  -m FNAME, --model FNAME
                        model path (default: /home/pdd/le/bark.cpp/ggml_weights)
  -o FNAME, --outwav FNAME
                        output generated wav (default: output.wav)
$ ./build/bin/main -m ./ggml_weights/ -p "this is an audio"
bark_load_model_from_file: loading model from './ggml_weights/'
bark_load_model_from_file: reading bark text model
gpt_model_load: n_in_vocab  = 129600
gpt_model_load: n_out_vocab = 10048
gpt_model_load: block_size  = 1024
gpt_model_load: n_embd      = 1024
gpt_model_load: n_head      = 16
gpt_model_load: n_layer     = 24
gpt_model_load: n_lm_heads  = 1
gpt_model_load: n_wtes      = 1
gpt_model_load: ftype       = 0
gpt_model_load: qntvr       = 0
gpt_model_load: ggml tensor size = 304 bytes
gpt_model_load: ggml ctx size = 1894.87 MB
gpt_model_load: memory size =   192.00 MB, n_mem = 24576
gpt_model_load: model size  =  1701.69 MB
bark_load_model_from_file: reading bark vocab
bark_load_model_from_file: reading bark coarse model
gpt_model_load: n_in_vocab  = 12096
gpt_model_load: n_out_vocab = 12096
gpt_model_load: block_size  = 1024
gpt_model_load: n_embd      = 1024
gpt_model_load: n_head      = 16
gpt_model_load: n_layer     = 24
gpt_model_load: n_lm_heads  = 1
gpt_model_load: n_wtes      = 1
gpt_model_load: ftype       = 0
gpt_model_load: qntvr       = 0
gpt_model_load: ggml tensor size = 304 bytes
gpt_model_load: ggml ctx size = 1443.87 MB
gpt_model_load: memory size =   192.00 MB, n_mem = 24576
gpt_model_load: model size  =  1250.69 MB
bark_load_model_from_file: reading bark fine model
gpt_model_load: n_in_vocab  = 1056
gpt_model_load: n_out_vocab = 1056
gpt_model_load: block_size  = 1024
gpt_model_load: n_embd      = 1024
gpt_model_load: n_head      = 16
gpt_model_load: n_layer     = 24
gpt_model_load: n_lm_heads  = 7
gpt_model_load: n_wtes      = 8
gpt_model_load: ftype       = 0
gpt_model_load: qntvr       = 0
gpt_model_load: ggml tensor size = 304 bytes
gpt_model_load: ggml ctx size = 1411.25 MB
gpt_model_load: memory size =   192.00 MB, n_mem = 24576
gpt_model_load: model size  =  1218.26 MB
bark_load_model_from_file: reading bark codec model
encodec_model_load: model size    =   44.32 MB
bark_load_model_from_file: total model size  =  4170.64 MB
bark_tokenize_input: prompt: 'this is an audio'
bark_tokenize_input: number of tokens in prompt = 513, first 8 tokens: 20579 20172 20199 33733 129595 129595 129595 129595 
bark_forward_text_encoder: ...........................................................................................................
bark_print_statistics: mem per token =     4.81 MB
bark_print_statistics:   sample time =    16.03 ms / 109 tokens
bark_print_statistics:  predict time =  9644.73 ms / 87.68 ms per token
bark_print_statistics:    total time =  9663.29 ms
bark_forward_coarse_encoder: ...................................................................................................................................................................................................................................................................................................................................
bark_print_statistics: mem per token =     8.53 MB
bark_print_statistics:   sample time =     4.43 ms / 324 tokens
bark_print_statistics:  predict time = 52071.64 ms / 160.22 ms per token
bark_print_statistics:    total time = 52080.24 ms
ggml_new_object: not enough space in the context's memory pool (needed 4115076720, available 4112941056)
Segmentation fault (core dumped)
  • At first I assumed the process was simply running out of memory, so I added more swap space, but the same error was still reported.
$ sudo dd if=/dev/zero of=swapfile bs=1024 count=10000000
10000000+0 records in
10000000+0 records out
10240000000 bytes (10 GB, 9.5 GiB) copied, 55.3595 s, 185 MB/s
$ sudo chmod 600 ./swapfile  # delete the swapfile if you don't need it
$ sudo mkswap -f ./swapfile
Setting up swapspace version 1, size = 9.5 GiB (10239995904 bytes)
no label, UUID=f3e2a0be-b880-48da-b598-950b7d69f94f
$ sudo swapon ./swapfile
$ free -m
               total        used        free      shared  buff/cache   available
Mem:           15731        6441         307        1242        8982        7713
Swap:          11813        2047        9765
$ ./build/bin/main -m ./ggml_weights/ -p "this is an audio"
ggml_new_object: not enough space in the context's memory pool (needed 4115076720, available 4112941056)
  • Looking at the function that prints the error shows this is not a system-memory problem: ggml_new_object fails when the new object would no longer fit into ctx->mem_size, the fixed-size memory pool pre-allocated for the ggml context, so adding swap cannot help.
static struct ggml_object * ggml_new_object(struct ggml_context * ctx, enum ggml_object_type type, size_t size) {
    // always insert objects at the end of the context's memory pool
    struct ggml_object * obj_cur = ctx->objects_end;

    const size_t cur_offs = obj_cur == NULL ? 0 : obj_cur->offs;
    const size_t cur_size = obj_cur == NULL ? 0 : obj_cur->size;
    const size_t cur_end  = cur_offs + cur_size;

    // align to GGML_MEM_ALIGN
    size_t size_needed = GGML_PAD(size, GGML_MEM_ALIGN);

    char * const mem_buffer = ctx->mem_buffer;
    struct ggml_object * const obj_new = (struct ggml_object *)(mem_buffer + cur_end);

    if (cur_end + size_needed + GGML_OBJECT_SIZE > ctx->mem_size) {
        GGML_PRINT("%s: not enough space in the context's memory pool (needed %zu, available %zu)\n",
                __func__, cur_end + size_needed, ctx->mem_size);
        assert(false);
        return NULL;
    }

    *obj_new = (struct ggml_object) {
        .offs = cur_end + GGML_OBJECT_SIZE,
        .size = size_needed,
        .next = NULL,
        .type = type,
    };

    ggml_assert_aligned(mem_buffer + obj_new->offs);

    if (obj_cur != NULL) {
        obj_cur->next = obj_new;
    } else {
        // this is the first object in this context
        ctx->objects_begin = obj_new;
    }

    ctx->objects_end = obj_new;

    //printf("%s: inserted new object at %zu, size = %zu\n", __func__, cur_end, obj_new->size);

    return obj_new;
}
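The numbers in the error message back this up: the allocation overruns the context's memory pool by only about 2 MiB, which points to an under-estimated ctx->mem_size in the graph-building code rather than to the machine running out of RAM. A quick check of the arithmetic:

# Shortfall reported by ggml_new_object, taken from the log above.
needed, available = 4_115_076_720, 4_112_941_056
print(f"pool size: {available / 2**30:.2f} GiB")             # ~3.83 GiB
print(f"shortfall: {(needed - available) / 2**20:.2f} MiB")  # ~2.04 MiB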
  • I then came across https://github.com/PABannier/bark.cpp/issues/122 and rolled back to an earlier commit:
$ cd bark.cpp/
$ git checkout -f 07e651618b3a8a27de3bfa7f733cdb0aa8f46b8a
HEAD is now at 07e6516 ENH Decorrelate fine GPT graph (#111)
  • After rebuilding at this older commit, the run succeeds:
/home/pdd/le/bark.cpp/cmake-build-debug/bin/main
bark_load_model_from_file: loading model from '/home/pdd/le/bark.cpp/ggml_weights'
bark_load_model_from_file: reading bark text model
gpt_model_load: n_in_vocab  = 129600
gpt_model_load: n_out_vocab = 10048
gpt_model_load: block_size  = 1024
gpt_model_load: n_embd      = 1024
gpt_model_load: n_head      = 16
gpt_model_load: n_layer     = 24
gpt_model_load: n_lm_heads  = 1
gpt_model_load: n_wtes      = 1
gpt_model_load: ftype       = 0
gpt_model_load: qntvr       = 0
gpt_model_load: ggml tensor size = 272 bytes
gpt_model_load: ggml ctx size = 1894.87 MB
gpt_model_load: memory size =   192.00 MB, n_mem = 24576
gpt_model_load: model size  =  1701.69 MB
bark_load_model_from_file: reading bark vocab
bark_load_model_from_file: reading bark coarse model
gpt_model_load: n_in_vocab  = 12096
gpt_model_load: n_out_vocab = 12096
gpt_model_load: block_size  = 1024
gpt_model_load: n_embd      = 1024
gpt_model_load: n_head      = 16
gpt_model_load: n_layer     = 24
gpt_model_load: n_lm_heads  = 1
gpt_model_load: n_wtes      = 1
gpt_model_load: ftype       = 0
gpt_model_load: qntvr       = 0
gpt_model_load: ggml tensor size = 272 bytes
gpt_model_load: ggml ctx size = 1443.87 MB
gpt_model_load: memory size =   192.00 MB, n_mem = 24576
gpt_model_load: model size  =  1250.69 MB
bark_load_model_from_file: reading bark fine model
gpt_model_load: n_in_vocab  = 1056
gpt_model_load: n_out_vocab = 1056
gpt_model_load: block_size  = 1024
gpt_model_load: n_embd      = 1024
gpt_model_load: n_head      = 16
gpt_model_load: n_layer     = 24
gpt_model_load: n_lm_heads  = 7
gpt_model_load: n_wtes      = 8
gpt_model_load: ftype       = 0
gpt_model_load: qntvr       = 0
gpt_model_load: ggml tensor size = 272 bytes
gpt_model_load: ggml ctx size = 1411.25 MB
gpt_model_load: memory size =   192.00 MB, n_mem = 24576
gpt_model_load: model size  =  1218.26 MB
bark_load_model_from_file: reading bark codec model
bark_load_model_from_file: total model size  =  4170.64 MB
bark_tokenize_input: prompt: 'this is an audio'
bark_tokenize_input: number of tokens in prompt = 513, first 8 tokens: 20579 20172 20199 33733 129595 129595 129595 129595 
encodec_model_load: model size    =   44.32 MB
bark_forward_text_encoder: ...........................................................................................................
bark_print_statistics: mem per token =     4.80 MB
bark_print_statistics:   sample time =    59.49 ms / 109 tokens
bark_print_statistics:  predict time = 24761.95 ms / 225.11 ms per token
bark_print_statistics:    total time = 24826.76 ms
bark_forward_coarse_encoder: ...................................................................................................................................................................................................................................................................................................................................
bark_print_statistics: mem per token =     8.51 MB
bark_print_statistics:   sample time =    19.74 ms / 324 tokens
bark_print_statistics:  predict time = 178366.69 ms / 548.82 ms per token
bark_print_statistics:    total time = 178396.22 ms
bark_forward_fine_encoder: .....
bark_print_statistics: mem per token =     0.66 MB
bark_print_statistics:   sample time =   304.20 ms / 6144 tokens
bark_print_statistics:  predict time = 407086.19 ms / 58155.17 ms per token
bark_print_statistics:    total time = 407399.91 ms
bark_forward_encodec: mem per token = 760209 bytes
bark_forward_encodec:  predict time =  4349.03 ms
bark_forward_encodec:    total time =  4349.07 ms
Number of frames written = 51840.
main:     load time = 11441.58 ms
main:     eval time = 614987.69 ms
main:    total time = 626429.31 ms

Process finished with exit code 0
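The log reports 51840 frames written; at EnCodec's 24 kHz sample rate that works out to roughly 2.16 seconds of audio. The generated file can be checked with nothing but the Python standard library (output.wav is the default output path shown by the help text above):

# Quick inspection of the generated WAV file, standard library only.
import wave

with wave.open("output.wav", "rb") as wav:
    frames      = wav.getnframes()
    sample_rate = wav.getframerate()
    channels    = wav.getnchannels()
    width_bits  = 8 * wav.getsampwidth()

print(f"{frames} frames, {sample_rate} Hz, {channels} channel(s), {width_bits}-bit")
print(f"duration: {frames / sample_rate:.2f} s")  # e.g. 51840 / 24000 = 2.16 s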

CG

  • iFLYTEK (科大讯飞) semantic understanding, AIUI wrapper
  • https://github.com/iboB/pytorch-ggml-plugin
