离线语音识别 sherpa-ncnn 尝鲜体验

文章目录

    • 1、ubuntu 编译运行
      • 依赖安装
      • 下载与编译
      • 模型下载
      • 运行
    • 2、树莓派 4B 编译运行
      • 确认树莓派 4B 环境
      • 交叉编译
      • 交叉编译
      • 模型下载与运行
      • 模型对比测试
      • 树莓派 4B 运行大模型

Sherpa-NCNN 是一个基于 C++ 的轻量级神经网络推理框架,是 kaldi 下的一个子项目,它专门针对移动设备和嵌入式系统进行了优化。 Sherpa-NCNN 的目标是提供高性能、低延迟的推理能力,适用于移动设备和嵌入式系统,可以以满足实时应用需求。仓库地址: https://github.com/k2-fsa/sherpa-ncnn

主要功能:语音识别、流式语音识别。即边说话,边识别。不需要访问网络,不需要数据传输,完全本地识别。

识别效果:识别速度很快,效果比较好,但是只支持wav格式的音频,其他格式的需要转换后才能识别。

官方使用说明:https://k2-fsa.github.io/sherpa/ncnn/index.html

1、ubuntu 编译运行

依赖安装

$ sudo apt-get install -y git cmake

下载与编译

$ git clone https://github.com/k2-fsa/sherpa-ncnn
$ cd sherpa-ncnn
$ mkdir build && cd build
$ cmake -DCMAKE_BUILD_TYPE=Release ..
$ make -j8

编译脚本会自动下载编译需要的相关文件,编译完成后可执行文件在 build/bin 目录下。

$ cd bin
$ ll
total 36M
-rwxrwxr-x 1 cys cys 7.2M Dec 31 08:26 decode-file-c-api
-rwxrwxr-x 1 cys cys 7.1M Dec 31 08:26 generate-int8-scale-table
-rwxrwxr-x 1 cys cys 7.2M Dec 31 08:26 sherpa-ncnn
-rwxrwxr-x 1 cys cys 7.2M Dec 31 08:26 sherpa-ncnn-alsa
-rwxrwxr-x 1 cys cys 7.3M Dec 31 08:26 sherpa-ncnn-microphone

其中:

  • sherpa-ncnn:用于识别单个 wav 文件
  • sherpa-ncnn-microphone:用于识别麦克风的实时语音

模型下载

sherpa-ncnn 已经提供了一些预先训练好的模型可以下载,可在 https://k2-fsa.github.io/sherpa/ncnn/pretrained_models/index.html 页面下查看。有专门的小模型可应用于 Raspberry Pi 4 之类的嵌入式板卡上,在 PC 上可以选择大一点的模型,做一些对比,选择了 csukuangfj/sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13 (Bilingual, Chinese + English) 这个模型在 PC 上运行。

$ wget https://github.com/k2-fsa/sherpa-ncnn/releases/download/models/sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13.tar.bz2
$ tar xvf sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13.tar.bz2

运行

  1. 识别单个 wav 文件

模型文件自带了 5 个 wav 文件,在模型的 test_wavs 目录下。在 sherpa-ncnn 目录下执行以下命令:

$ for method in greedy_search modified_beam_search; do./build/bin/sherpa-ncnn \./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/tokens.txt \./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.param \./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.bin \./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.param \./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.bin \./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.param \./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.bin \./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/test_wavs/1.wav \2 \$method
done

运行成功后会显示如下日志:

$ RecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=ModelConfig(encoder_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.bin", tokens="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/tokens.txt", encoder num_threads=2, decoder num_threads=2, joiner num_threads=2), decoder_config=DecoderConfig(method="greedy_search", num_active_paths=4), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.4, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), enable_endpoint=False, hotwords_file="", hotwrods_score=1.5)
wav filename: ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/test_wavs/1.wav
wav duration (s): 5.1
Started!
Done!
Recognition result for ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/test_wavs/1.wav
text: 这是第一种第二种叫呃与 ALWAYS ALWAYS什么意思啊
timestamps: 0.96 1.04 1.28 1.4 1.48 1.72 1.84 2.04 2.44 3.24 3.6 3.84 4.36 4.72 4.76 4.92 5.04 5.08 
Elapsed seconds: 0.726 s
Real time factor (RTF): 0.726 / 5.100 = 0.142
RecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=ModelConfig(encoder_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.bin", tokens="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/tokens.txt", encoder num_threads=2, decoder num_threads=2, joiner num_threads=2), decoder_config=DecoderConfig(method="modified_beam_search", num_active_paths=4), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.4, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), enable_endpoint=False, hotwords_file="", hotwrods_score=1.5)
wav filename: ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/test_wavs/1.wav
wav duration (s): 5.1
Started!
Done!
Recognition result for ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/test_wavs/1.wav
text: 这是第一种第二种叫呃与 ALWAYS ALWAYS什么意思啊
timestamps: 0.96 1.04 1.28 1.4 1.48 1.72 1.84 2.04 2.44 3.2 3.52 3.84 4.36 4.72 4.76 4.92 5.04 5.08 
Elapsed seconds: 1.172 s
Real time factor (RTF): 1.172 / 5.100 = 0.230

与 1.wav 文件播放比较,识别的文字 100% 一致。波形文件长度为 5.2s 识别花了大约 0.726 s/1.172 s

$ for method in greedy_search modified_beam_search; do./build/bin/sherpa-ncnn \./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/tokens.txt \./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.param \./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.bin \./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.param \./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.bin \./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.param \./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.bin \./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/test_wavs/0.wav \2 \$method
doneRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=ModelConfig(encoder_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.bin", tokens="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/tokens.txt", encoder num_threads=2, decoder num_threads=2, joiner num_threads=2), decoder_config=DecoderConfig(method="greedy_search", num_active_paths=4), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.4, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), enable_endpoint=False, hotwords_file="", hotwrods_score=1.5)
wav filename: ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/test_wavs/0.wav
wav duration (s): 10.0531
Started!
Done!
Recognition result for ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/test_wavs/0.wav
text: 昨天是 MONDAY TODAY IS LIBR THE DAY AFTER TOMORROW是星期三
timestamps: 0.64 1.04 1.6 2.12 2.2 2.44 4.2 4.4 5.12 5.56 5.8 6.16 6.84 7.12 7.44 8.04 8.16 8.24 8.28 9.08 9.4 9.64 9.88 
Elapsed seconds: 1.440 s
Real time factor (RTF): 1.440 / 10.053 = 0.143
RecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=ModelConfig(encoder_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.bin", tokens="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/tokens.txt", encoder num_threads=2, decoder num_threads=2, joiner num_threads=2), decoder_config=DecoderConfig(method="modified_beam_search", num_active_paths=4), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.4, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), enable_endpoint=False, hotwords_file="", hotwrods_score=1.5)
wav filename: ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/test_wavs/0.wav
wav duration (s): 10.0531
Started!
Done!
Recognition result for ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/test_wavs/0.wav
text: 昨天是 MONDAY TODAY IS礼拜二 THE DAY AFTER TOMORROW是星期三
timestamps: 0.64 1.04 1.6 2.12 2.2 2.44 4.2 4.4 4.96 5.56 5.8 6 6.84 7.12 7.44 8.04 8.16 8.24 8.28 9.08 9.4 9.64 9.88 
Elapsed seconds: 1.955 s
Real time factor (RTF): 1.955 / 10.053 = 0.194

波形文件长度为 10.05s 识别花了大约 1.440 s/1.955 s

  1. 识别麦克风实时语音

在 sherpa-ncnn 目录下执行以下命令:

$ ./build/bin/sherpa-ncnn-microphone \./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/tokens.txt \./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.param \./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.bin \./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.param \./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.bin \./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.param \./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.bin \2 \greedy_search

在日志中提示 started 之后,对着 PC 的麦克风说话,日志中会实时显示对应的文字:

$ ALSA lib pcm_oss.c:377:(_snd_pcm_oss_open) Unknown field port
ALSA lib pcm_oss.c:377:(_snd_pcm_oss_open) Unknown field port
xcb_connection_has_error() returned true
xcb_connection_has_error() returned true
ALSA lib pcm_usb_stream.c:486:(_snd_pcm_usb_stream_open) Invalid type for card
ALSA lib pcm_usb_stream.c:486:(_snd_pcm_usb_stream_open) Invalid type for card
xcb_connection_has_error() returned true
xcb_connection_has_error() returned true
Num devices: 11
Use default device: 10Name: defaultMax input channels: 32
xcb_connection_has_error() returned true
Started
0:轻轻的我走了正如我轻轻的来
1:我轻轻的招手

效果还是相当不错的。

2、树莓派 4B 编译运行

确认树莓派 4B 环境

在树莓派终端上运行以下命令,确认当前操作系统是 32 位还是 64 位

$ getconf LONG_BIT 
32

当前安装的是32位操作系统。

可参考说明 https://k2-fsa.github.io/sherpa/ncnn/install/arm-embedded-linux.html 完成交叉编译

交叉编译

交叉编译器可以使我们在主机上编译出可以在嵌入式设备上运行的程序

下载 gcc-arm-8.3-2019.03-x86_64-arm-linux-gnueabihf.tar.xz 并安装

$ wget https://media.githubusercontent.com/media/Vicent1992/prebuilts/master/gcc/arm/gcc-arm-8.3-2019.03-x86_64-arm-linux-gnueabihf.tar.xz
$ tar xvf gcc-arm-8.3-2019.03-x86_64-arm-linux-gnueabihf.tar.xz -C /opt
$ export PATH=/opt/gcc-arm-8.3-2019.03-x86_64-arm-linux-gnueabihf/bin:$PATH
$ arm-linux-gnueabihf-gcc --version
arm-linux-gnueabihf-gcc (GNU Toolchain for the A-profile Architecture 8.3-2019.03 (arm-rel-8.36)) 8.3.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

交叉编译

$ sudo apt-get install -y libtool
$ git clone https://github.com/k2-fsa/sherpa-ncnn
$ cd sherpa-ncnn
$ ./build-arm-linux-gnueabihf.sh

编译完成后可执行文件位于 build-arm-linux-gnueabihf/install/bin 目录,将这 2 个文件复制至树莓派 4b 内。

$ ll build-arm-linux-gnueabihf/install/bin
total 4.6M
-rwxr-xr-x 1 cys cys 2.3M Dec 31 10:40 sherpa-ncnn
-rwxr-xr-x 1 cys cys 2.3M Dec 31 10:40 sherpa-ncnn-alsa

模型下载与运行

  1. 下载
    在树莓派 4b 上下载模式,可在 https://k2-fsa.github.io/sherpa/ncnn/pretrained_models/small-models.html 页面下载对应模型。
$ wget https://github.com/k2-fsa/sherpa-ncnn/releases/download/models/sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23.tar.bz2
$ tar xvf sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23.tar.bz2
  1. 运行

在终端中运行以下命令:

$ for method in greedy_search modified_beam_search; do./sherpa-ncnn \./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/tokens.txt \./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/encoder_jit_trace-pnnx.ncnn.param \./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/encoder_jit_trace-pnnx.ncnn.bin \./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/decoder_jit_trace-pnnx.ncnn.param \./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/decoder_jit_trace-pnnx.ncnn.bin \./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/joiner_jit_trace-pnnx.ncnn.param \./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/joiner_jit_trace-pnnx.ncnn.bin \./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/test_wavs/0.wav \2 \$method
done

运行日志如下:

RecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=ModelConfig(encoder_param="./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/joiner_jit_trace-pnnx.ncnn.bin", tokens="./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/tokens.txt", encoder num_threads=2, decoder num_threads=2, joiner num_threads=2), decoder_config=DecoderConfig(method="greedy_search", num_active_paths=4), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.4, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), enable_endpoint=False, hotwords_file="", hotwrods_score=1.5)
wav filename: ./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/test_wavs/0.wav
wav duration (s): 5.6115
Started!
Done!
Recognition result for ./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/test_wavs/0.wav
text: 对我做了介绍那么我想说的是大家如果对我的研究感兴趣
timestamps: 0.32 0.64 0.76 0.96 1.08 1.16 1.96 2.04 2.24 2.36 2.56 2.68 2.8 3.36 3.52 3.64 3.72 3.84 3.92 4 4.08 4.24 4.48 4.56 4.72 
Elapsed seconds: 1.092 s
Real time factor (RTF): 1.092 / 5.611 = 0.195
RecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=ModelConfig(encoder_param="./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/joiner_jit_trace-pnnx.ncnn.bin", tokens="./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/tokens.txt", encoder num_threads=2, decoder num_threads=2, joiner num_threads=2), decoder_config=DecoderConfig(method="modified_beam_search", num_active_paths=4), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.4, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), enable_endpoint=False, hotwords_file="", hotwrods_score=1.5)
wav filename: ./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/test_wavs/0.wav
wav duration (s): 5.6115
Started!
Done!
Recognition result for ./sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23/test_wavs/0.wav
text: 对我做了介绍那么我想说的是大家如果对我的研究感兴趣
timestamps: 0.32 0.64 0.72 0.96 1.08 1.16 1.96 2.04 2.24 2.36 2.56 2.68 2.8 3.36 3.52 3.64 3.72 3.84 3.92 4 4.08 4.24 4.48 4.56 4.72 
Elapsed seconds: 1.596 s

模型对比测试

$ wget https://github.com/k2-fsa/sherpa-ncnn/releases/download/models/sherpa-ncnn-streaming-zipformer-20M-2023-02-17.tar.bz2
$ tar xvf sherpa-ncnn-streaming-zipformer-20M-2023-02-17.tar.bz2

运行:

$ for method in greedy_search modified_beam_search; do./sherpa-ncnn \./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/tokens.txt \./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/encoder_jit_trace-pnnx.ncnn.param \./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/encoder_jit_trace-pnnx.ncnn.bin \./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/decoder_jit_trace-pnnx.ncnn.param \./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/decoder_jit_trace-pnnx.ncnn.bin \./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/joiner_jit_trace-pnnx.ncnn.param \./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/joiner_jit_trace-pnnx.ncnn.bin \./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/test_wavs/0.wav \2 \$method
doneRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=ModelConfig(encoder_param="./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/joiner_jit_trace-pnnx.ncnn.bin", tokens="./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/tokens.txt", encoder num_threads=2, decoder num_threads=2, joiner num_threads=2), decoder_config=DecoderConfig(method="greedy_search", num_active_paths=4), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.4, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), enable_endpoint=False, hotwords_file="", hotwrods_score=1.5)
wav filename: ./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/test_wavs/0.wav
wav duration (s): 6.625
Started!
Done!
Recognition result for ./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/test_wavs/0.wav
text:  AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BRAFFLELS
timestamps: 0.68 1 1.16 1.2 1.48 1.72 1.76 1.92 2 2.16 2.28 2.36 2.52 2.64 2.68 2.76 2.92 3.08 3.4 3.6 3.72 3.88 4.12 4.48 4.64 4.68 4.84 4.96 5.16 5.2 5.32 5.36 5.6 5.72 5.92 5.96 6.08 6.24 6.36 6.52 
Elapsed seconds: 1.892 s
Real time factor (RTF): 1.892 / 6.625 = 0.286
RecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=ModelConfig(encoder_param="./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/joiner_jit_trace-pnnx.ncnn.bin", tokens="./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/tokens.txt", encoder num_threads=2, decoder num_threads=2, joiner num_threads=2), decoder_config=DecoderConfig(method="modified_beam_search", num_active_paths=4), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.4, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), enable_endpoint=False, hotwords_file="", hotwrods_score=1.5)
wav filename: ./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/test_wavs/0.wav
wav duration (s): 6.625
Started!
Done!
Recognition result for ./sherpa-ncnn-streaming-zipformer-20M-2023-02-17/test_wavs/0.wav
text:  AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BRAFFLELS
timestamps: 0.68 1 1.16 1.2 1.48 1.72 1.8 1.92 2 2.16 2.28 2.36 2.52 2.64 2.68 2.76 2.92 3.08 3.4 3.6 3.72 3.88 4.12 4.48 4.64 4.68 4.84 4.96 5.16 5.2 5.32 5.36 5.6 5.72 5.92 5.96 6.08 6.24 6.36 6.52 
Elapsed seconds: 2.051 s
Real time factor (RTF): 2.051 / 6.625 = 0.310
$ wget https://github.com/k2-fsa/sherpa-ncnn/releases/download/models/sherpa-ncnn-conv-emformer-transducer-small-2023-01-09.tar.bz2
$ tar xvf sherpa-ncnn-conv-emformer-transducer-small-2023-01-09.tar.bz2

运行

$ ./sherpa-ncnn \./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/tokens.txt \./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/encoder_jit_trace-pnnx.ncnn.param \./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/encoder_jit_trace-pnnx.ncnn.bin \./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/decoder_jit_trace-pnnx.ncnn.param \./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/decoder_jit_trace-pnnx.ncnn.bin \./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/joiner_jit_trace-pnnx.ncnn.param \./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/joiner_jit_trace-pnnx.ncnn.bin \./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/test_wavs/1089-134686-0001.wavRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=ModelConfig(encoder_param="./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/joiner_jit_trace-pnnx.ncnn.bin", tokens="./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/tokens.txt", encoder num_threads=4, decoder num_threads=4, joiner num_threads=4), decoder_config=DecoderConfig(method="greedy_search", num_active_paths=4), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.4, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), enable_endpoint=False, hotwords_file="", hotwrods_score=1.5)
wav filename: ./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/test_wavs/1089-134686-0001.wav
wav duration (s): 6.625
Started!
Done!
Recognition result for ./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/test_wavs/1089-134686-0001.wav
text:  AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BRAFFLES
timestamps: 0.32 0.92 1 1.12 1.4 1.56 1.6 1.76 1.96 2.08 2.2 2.24 2.4 2.52 2.56 2.64 2.8 3 3.24 3.48 3.6 3.72 3.92 4.36 4.48 4.52 4.76 4.84 5 5.04 5.16 5.2 5.44 5.6 5.8 5.84 5.96 6.04 6.24 
Elapsed seconds: 0.951 s
Real time factor (RTF): 0.951 / 6.625 = 0.144

使用 int8 量化运行:

$ /sherpa-ncnn \./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/tokens.txt \./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/encoder_jit_trace-pnnx.ncnn.int8.param \./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/encoder_jit_trace-pnnx.ncnn.int8.bin \./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/decoder_jit_trace-pnnx.ncnn.param \./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/decoder_jit_trace-pnnx.ncnn.bin \./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/joiner_jit_trace-pnnx.ncnn.int8.param \./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/joiner_jit_trace-pnnx.ncnn.int8.bin \./sherpa-ncnn-conv-emformer-transducer-small-2023-01-09/test_wavs/1089-134686-0001.wav
$ wget https://github.com/k2-fsa/sherpa-ncnn/releases/download/models/sherpa-ncnn-lstm-transducer-small-2023-02-13.tar.bz2
$ tar xvf sherpa-ncnn-lstm-transducer-small-2023-02-13.tar.bz2

运行:

./sherpa-ncnn \./sherpa-ncnn-lstm-transducer-small-2023-02-13/tokens.txt \./sherpa-ncnn-lstm-transducer-small-2023-02-13/encoder_jit_trace-pnnx.ncnn.param \./sherpa-ncnn-lstm-transducer-small-2023-02-13/encoder_jit_trace-pnnx.ncnn.bin \./sherpa-ncnn-lstm-transducer-small-2023-02-13/decoder_jit_trace-pnnx.ncnn.param \./sherpa-ncnn-lstm-transducer-small-2023-02-13/decoder_jit_trace-pnnx.ncnn.bin \./sherpa-ncnn-lstm-transducer-small-2023-02-13/joiner_jit_trace-pnnx.ncnn.param \./sherpa-ncnn-lstm-transducer-small-2023-02-13/joiner_jit_trace-pnnx.ncnn.bin \./sherpa-ncnn-lstm-transducer-small-2023-02-13/test_wavs/0.wavRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=ModelConfig(encoder_param="./sherpa-ncnn-lstm-transducer-small-2023-02-13/encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./sherpa-ncnn-lstm-transducer-small-2023-02-13/encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./sherpa-ncnn-lstm-transducer-small-2023-02-13/decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./sherpa-ncnn-lstm-transducer-small-2023-02-13/decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./sherpa-ncnn-lstm-transducer-small-2023-02-13/joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./sherpa-ncnn-lstm-transducer-small-2023-02-13/joiner_jit_trace-pnnx.ncnn.bin", tokens="./sherpa-ncnn-lstm-transducer-small-2023-02-13/tokens.txt", encoder num_threads=4, decoder num_threads=4, joiner num_threads=4), decoder_config=DecoderConfig(method="greedy_search", num_active_paths=4), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.4, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), enable_endpoint=False, hotwords_file="", hotwrods_score=1.5)
wav filename: ./sherpa-ncnn-lstm-transducer-small-2023-02-13/test_wavs/0.wav
wav duration (s): 16.73
Started!
Done!
Recognition result for ./sherpa-ncnn-lstm-transducer-small-2023-02-13/test_wavs/0.wav
text:  IF UP它也是一个它这里是一个介词那么它后面加的是什么呢刚才老师说的它加了是什么呢对吧它也是加 I N G对吧它为什么也就 I N G呢
timestamps: 4.04 4.36 4.64 4.84 4.96 5.08 5.16 5.72 5.84 5.96 6.2 6.32 6.4 6.6 6.68 6.84 6.92 7 7.08 7.2 7.36 7.44 7.52 7.6 7.64 7.8 7.96 8.04 8.16 8.28 8.4 8.48 8.64 8.76 8.84 8.96 9.04 9.08 9.12 14.2 14.24 14.36 14.44 14.56 14.72 14.92 14.96 15.04 15.2 15.28 15.6 15.72 15.84 15.88 16 16.16 16.36 16.4 16.44 16.52 
Elapsed seconds: 3.441 s
Real time factor (RTF): 3.441 / 16.730 = 0.206
$ wget https://github.com/k2-fsa/sherpa-ncnn/releases/download/models/sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16.tar.bz2
$ tar xvf sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16.tar.bz2

运行:

$ for method in greedy_search modified_beam_search; do./sherpa-ncnn \./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/tokens.txt \./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/encoder_jit_trace-pnnx.ncnn.param \./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/encoder_jit_trace-pnnx.ncnn.bin \./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/decoder_jit_trace-pnnx.ncnn.param \./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/decoder_jit_trace-pnnx.ncnn.bin \./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/joiner_jit_trace-pnnx.ncnn.param \./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/joiner_jit_trace-pnnx.ncnn.bin \./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/test_wavs/1.wav \2 \$method
doneRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=ModelConfig(encoder_param="./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/joiner_jit_trace-pnnx.ncnn.bin", tokens="./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/tokens.txt", encoder num_threads=2, decoder num_threads=2, joiner num_threads=2), decoder_config=DecoderConfig(method="greedy_search", num_active_paths=4), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.4, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), enable_endpoint=False, hotwords_file="", hotwrods_score=1.5)
wav filename: ./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/test_wavs/1.wav
wav duration (s): 5.1
Started!
Done!
Recognition result for ./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/test_wavs/1.wav
text: 这是第一种第二种叫呃与 ALWAYS什么意思啊
timestamps: 0.96 1.08 1.28 1.4 1.52 1.76 1.84 2.08 2.56 3.04 3.6 3.84 4.68 4.76 4.92 5.04 5.08 
Elapsed seconds: 1.638 s
Real time factor (RTF): 1.638 / 5.100 = 0.321
RecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=ModelConfig(encoder_param="./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/joiner_jit_trace-pnnx.ncnn.bin", tokens="./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/tokens.txt", encoder num_threads=2, decoder num_threads=2, joiner num_threads=2), decoder_config=DecoderConfig(method="modified_beam_search", num_active_paths=4), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.4, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), enable_endpoint=False, hotwords_file="", hotwrods_score=1.5)
wav filename: ./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/test_wavs/1.wav
wav duration (s): 5.1
Started!
Done!
Recognition result for ./sherpa-ncnn-streaming-zipformer-small-bilingual-zh-en-2023-02-16/test_wavs/1.wav
text: 这是第一种第二种叫呃与 ALWAYS什么意思啊
timestamps: 0.96 1.08 1.28 1.4 1.52 1.76 1.84 2.08 2.56 3.04 3.6 3.84 4.68 4.76 4.92 5.04 5.08 
Elapsed seconds: 2.411 s
Real time factor (RTF): 2.411 / 5.100 = 0.473

树莓派 4B 运行大模型

  1. 下载模型
$ wget https://github.com/k2-fsa/sherpa-ncnn/releases/download/models/sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13.tar.bz2
$ tar xvf sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13.tar.bz2
  1. 运行
$ for method in greedy_search modified_beam_search; do./sherpa-ncnn \./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/tokens.txt \./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.param \./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.bin \./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.param \./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.bin \./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.param \./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.bin \./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/test_wavs/1.wav \2 \$method
doneRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=ModelConfig(encoder_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.bin", tokens="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/tokens.txt", encoder num_threads=2, decoder num_threads=2, joiner num_threads=2), decoder_config=DecoderConfig(method="greedy_search", num_active_paths=4), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.4, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), enable_endpoint=False, hotwords_file="", hotwrods_score=1.5)
wav filename: ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/test_wavs/1.wav
wav duration (s): 5.1
Started!
Done!
Recognition result for ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/test_wavs/1.wav
text: 这是第一种第二种叫呃与 ALWAYS ALWAYS什么意思啊
timestamps: 0.96 1.04 1.28 1.4 1.48 1.72 1.84 2.04 2.44 3.24 3.6 3.84 4.36 4.72 4.76 4.92 5.04 5.08 
Elapsed seconds: 4.076 s
Real time factor (RTF): 4.076 / 5.100 = 0.799
RecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80), model_config=ModelConfig(encoder_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.param", encoder_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/encoder_jit_trace-pnnx.ncnn.bin", decoder_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.param", decoder_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/decoder_jit_trace-pnnx.ncnn.bin", joiner_param="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.param", joiner_bin="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/joiner_jit_trace-pnnx.ncnn.bin", tokens="./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/tokens.txt", encoder num_threads=2, decoder num_threads=2, joiner num_threads=2), decoder_config=DecoderConfig(method="modified_beam_search", num_active_paths=4), endpoint_config=EndpointConfig(rule1=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=2.4, min_utterance_length=0), rule2=EndpointRule(must_contain_nonsilence=True, min_trailing_silence=1.4, min_utterance_length=0), rule3=EndpointRule(must_contain_nonsilence=False, min_trailing_silence=0, min_utterance_length=20)), enable_endpoint=False, hotwords_file="", hotwrods_score=1.5)
wav filename: ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/test_wavs/1.wav
wav duration (s): 5.1
Started!
Done!
Recognition result for ./sherpa-ncnn-streaming-zipformer-bilingual-zh-en-2023-02-13/test_wavs/1.wav
text: 这是第一种第二种叫呃与 ALWAYS ALWAYS什么意思啊
timestamps: 0.96 1.04 1.28 1.4 1.48 1.72 1.84 2.04 2.44 3.2 3.52 3.84 4.36 4.72 4.76 4.92 5.04 5.08 
Elapsed seconds: 4.935 s
Real time factor (RTF): 4.935 / 5.100 = 0.968

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.mzph.cn/news/588494.shtml

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

Python高级并发编程的实例详解

更多Python学习内容:ipengtao.com Python中的高效并发编程,有几个重要的概念和工具可以帮助大家充分利用多核处理器和提高程序性能。本文将介绍一些关键的概念和示例代码,以帮助大家更好地理解Python中的高效并发编程。 多线程 vs. 多进程 在…

计算机网络【HTTP 面试题】

HTTP的请求报文结构和响应报文结构 HTTP请求报文主要由请求行、请求头、空行、请求正文(Get请求没有请求正文)4部分组成。 1、请求行 由三部分组成,分别为:请求方法、URL以及协议版本,之间由空格分隔;请…

使用 HarperDB SDK for Java 简化数据库操作

在现代应用程序开发的动态环境中,与数据库的高效、无缝交互至关重要。HarperDB 凭借其 NoSQL 功能,为开发人员提供了强大的解决方案。为了简化这种交互,HarperDB SDK for Java提供了一个方便的接口,用于将 Java 应用程序与 Harper…

YHZ010 Python 的类型转换

🐶 类型转换 资源编号:YHZ010 配套视频:https://www.bilibili.com/video/BV1zy4y1Z7nk?p11 🐹 检查变量类型 在 Python 中可以使用type函数对变量的类型进行检查。程序设计中函数的概念跟数学上函数的概念是一致的,数…

全栈架构:从0开始,Vue的搭建与开发

尼恩说在前面 在40岁老架构师 尼恩的读者交流群(50)中,很多小伙伴拿到一线互联网企业、上市企业如阿里、网易、有赞、希音、百度、滴滴的面试资格。 然后,很多小伙伴平时聚焦CRUD,没有亮点项目, 黄金项目。 简历也写得是非常lo…

九台虚拟机网站流量分析项目启动步骤

文章目录 零、操作概述一、服务器分配二、9台虚拟机相互免密登录三、Nginx(反向代理服务器)四、Tomcat(Web服务器)五、测试Nginx反向代理是否成功六、Flume集群配置七、修改LogDemo项目八、项目1703FluxStorm九、Hadoop集群十、整个集群的启动十一、部署项目十二、测试项目…

骑砍战团MOD开发(26)-系统定制UI资源替换

一.TaleWorlds开机动画删除 骑砍1战团:taleworlds_intro.bik 重命名为 taleworlds_intro_bak.bik 骑砍2霸主: Modules\Native\Videos\TWLogo_and_Partners.ivf 重命名为 TWLogo_and_Partners_bak.ivf Modules\Native\Videos\TWLogo_and_Partners.ogg 重命名为TWLogo_and_Partne…

Generalized Focal Loss V1论文解读

摘要 单级检测器基本上将物体检测表述为密集分类和定位(即边界框回归)。分类通常通过Focal Loss进行优化,而边界框的定位通常根据Dirac delta分布进行学习。单级检测器的最新趋势是引入一个单独的预测分支来估计定位质量,预测质量…

【Web2D/3D】SVG(第二篇)

1. 前言 SVG(Scalable Vector Graphics,可缩放矢量图形)是一种使用XML描述2D图形的语言,由于SVG是基于XML(HTML也是基于XML的),因为SVG DOM中每个元素都是可以操作的,包含修改元素属…

【javaSE】代理并不难

代理: 代理模式最主要的就是在不改变原来代码(就是目标对象)的情况下实现功能的增强 在学习AOP之前先了解代理,代理有两种:一种是动态代理,一类是静态代理。 静态代理 相当于是自己写了一个代理类&#…

SAP-PP:Phantom Assembly 虚拟装配 概念理解

文章目录 前言一、Phantom assembly 是什么?二、举例总结 前言 最近在学习PP的过程中遇到一个校验增强,系统中将对生产订单的组件预留进行校验,是否消耗正常。 那么就遇到BOM中设定了虚拟装配件的标识预留,其是不消耗数量的&…

力扣(leetcode)第257题二叉树的所有路径(Python)

257.二叉树的所有路径 题目链接:257.二叉树的所有路径 给你一个二叉树的根节点 root ,按 任意顺序 ,返回所有从根节点到叶子节点的路径。 叶子节点 是指没有子节点的节点。 示例 1: 输入:root [1,2,3,null,5] 输出…

win10和win11上解决乱码的一个优点偏门的方法,不算很完美

左下角搜索控制面板,进入控制面板之后,点击时钟和区域下面的更改日期、时间或数字格式 点击顶上的“管理”选项,然后找到更改系统区域设置,把下方的Beta版:使用Unicode UTF-8提供全球语言支持(U&#xff…

视频批量转码:一键转换多个视频mp4格式到FLV视频

在数字媒体时代,视频格式的多样性给处理工作带来了诸多不便。满足不同的播放需求,经常要视频从一种格式转换为另一种格式。其中,将mp4格式转换为FLV格式的需求很常见。现在一起来看下云炫AI智剪如何高效的将视频批量转码方法,一键…

[Angular] 笔记 21:@ViewChild

chatgpt: 在 Angular 中,ViewChild 是一个装饰器,用于在组件类中获取对模板中子元素、指令或组件的引用。它允许你在组件类中访问模板中的特定元素,以便可以直接操作或与其交互。 例如,如果你在模板中有一个子组件或一个具有本地…

\r\n和缓冲区/进度条小程序

一 前置知识 带有\n就会立马刷新缓冲区(因为显示器是行刷新),\r不会刷新缓冲区 刷新的2个场景: 1 ~fflush 缓冲区中存在\r或\n --> \r fflush --> 不换行的\n) 2 ~ 文件关闭自动刷新缓冲区 倒计时小程序0-9 %-d是左对齐,%d是右对齐 倒计时小程序0-99 …

复试 || 就业day05(2023.12.31)算法篇

文章目录 前言找不同最长回文串找到所有数组中消失的数字下一个更大元素 I键盘行 前言 💫你好,我是辰chen,本文旨在准备考研复试或就业 💫文章题目大多来自于 leetcode,当然也可能来自洛谷或其他刷题平台 &#x1f4ab…

在Go中使用Goroutines和Channels发送电子邮件

学习如何使用Goroutines和Channels在Go中发送电子邮件 在现代软件开发的世界中,通信是一个关键元素。发送电子邮件是各种目的的常见实践,例如用户通知、报告等。Go是一种静态类型和编译语言,为处理此类任务提供了高效和并发的方式。在本文中&…

pycharm配置pyqt5的ui文件转py文件的小工具

在PyCharm中配置 PyQt5 的 .ui 文件转 .py 文件的小工具其实是配置一个外部工具,以便可以直接在 IDE 中通过单击按钮来完成这个转换。你需要使用 pyuic5 命令,它是 PyQt5 的工具集之一,用于将 .ui 文件(用 Qt Designer 创建的&…

BED 文件格式 chip-seq m6a数据可视化会用到

General usage — bedtools 2.31.0 documentationhttps://bedtools.readthedocs.io/en/latest/content/general-usage.html BED格式(Browser Extensible Data format)是一种在生物信息学中广泛使用的文本文件格式,用于描述基因组上的特征和…