First, download the .exe installer from GitHub. Either the GTK or the Qt5 build works; GTK is recommended.
https://github.com/manisandro/gImageReader/releases
Direct download link:
https://github.com/manisandro/gImageReader/releases/download/master/gImageReader_latest_gtk_x86_64.exe
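Then run the installer. During installation, select the localization option so the interface is available in Chinese or another local language.
After installation, open the default installation directory: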
C:\Program Files\gImageReader\
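Next, install the additional OCR language packs and the spelling dictionaries.
OCR language packs: folder location and instructions: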
C:\Program Files\gImageReader\share\tessdata
The README in that folder says:
This folder contains tesseract language definitions.
To add additional language definitions:
- Use the tessdata manager from the language selection menu in gImageReader
- Or install the languages manually:
* In the gImageReader about dialog, check which version of tesseract is used
* If using tesseract 4.x, go to https://github.com/tesseract-ocr/tessdata_fast
* If using tesseract 3.x or older, go to https://github.com/tesseract-ocr/tessdata
* In the branch selection button, under tags, select the version which is *less or equal* the tesseract version in use
* Download the desired language definitions (*.traineddata along with any supplementary files which certain languages need) and save them inside this folder
* If gImageReader is running, select "Redetect Languages" from the application menu, or restart the application
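The README's rule for picking a repository and tag can be written down as a small function. This is a minimal sketch: `pick_tessdata_source` and `parse_version` are hypothetical helpers. The tag list is something you would copy by hand from the repository's branch/tag selector, because the function does no network access. The README only covers 3.x and 4.x, so this sketch also sends tesseract versions newer than 4.x to tessdata_fast. That part is an assumption.

```python
import re


def parse_version(v):
    """Turn a version string like '4.1.1' or '3.04.00' into a tuple of ints."""
    return tuple(int(p) for p in re.findall(r"\d+", v))


def pick_tessdata_source(tesseract_version, available_tags):
    """Return (repo URL, tag) following the README rules (hypothetical helper).

    Assumption: versions newer than 4.x are treated like 4.x.
    Returns tag None when no tag is <= the installed version.
    """
    installed = parse_version(tesseract_version)
    if installed[0] >= 4:
        repo = "https://github.com/tesseract-ocr/tessdata_fast"
    else:
        repo = "https://github.com/tesseract-ocr/tessdata"
    # Keep only tags that are less than or equal to the installed version.
    candidates = [t for t in available_tags if parse_version(t) <= installed]
    if not candidates:
        return repo, None
    # Among the compatible tags, take the newest one.
    return repo, max(candidates, key=parse_version)
```

Suppose tesseract 4.1.1 is installed and the tags on offer are 3.04.00, 4.0.0 and 4.1.0. In that case the function picks tessdata_fast with tag 4.1.0, the newest tag that does not exceed the installed version.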
Language pack download location:
https://github.com/tesseract-ocr/tessdata
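Click Code, then Download ZIP. The archive is about 634 MB and takes roughly 1.2 GB once extracted.
For example, to install Russian, find rus.traineddata in the extracted files and copy it into the default language data folder (for other languages, copy the corresponding file):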
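You may only need one or two languages. In that case you can avoid the 1.2 GB extraction by pulling just the file you want out of the downloaded ZIP. The sketch below uses Python's standard `zipfile` module. `extract_traineddata` is a hypothetical helper, and the ZIP and output paths are placeholders. The sketch makes no assumption about the name of the top-level folder inside the archive, since it depends on the branch (for example something like `tessdata-main/`). Instead, it matches the member by file name.

```python
import shutil
import zipfile
from pathlib import Path


def extract_traineddata(zip_path, lang, out_dir):
    """Extract only <lang>.traineddata from a downloaded tessdata ZIP (hypothetical helper)."""
    wanted = f"{lang}.traineddata"
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        for member in zf.namelist():
            # Match by file name, whatever the top-level folder is called.
            if Path(member).name == wanted:
                target = out_dir / wanted
                # Stream this one member to disk; nothing else is extracted.
                with zf.open(member) as src, open(target, "wb") as dst:
                    shutil.copyfileobj(src, dst)
                return target
    raise FileNotFoundError(f"{wanted} not found in {zip_path}")
```

Usage might look like `extract_traineddata("tessdata-main.zip", "rus", "extracted")`, where both paths are examples. Some languages need supplementary files as well, as the README notes, and those would have to be extracted the same way.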
C:\Program Files\gImageReader\share\tessdata
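The copy step can also be scripted. Here is a minimal sketch, assuming the default install path from this guide. `install_traineddata` is a hypothetical helper, and `src_dir` is wherever you extracted the language files. Writing under `C:\Program Files` normally requires running the script from an administrator shell.

```python
import shutil
from pathlib import Path

# Default install location used in this guide; change it if you installed elsewhere.
DEFAULT_TESSDATA = Path(r"C:\Program Files\gImageReader\share\tessdata")


def install_traineddata(src_dir, lang, dest_dir=DEFAULT_TESSDATA):
    """Copy <lang>.traineddata into gImageReader's tessdata folder (hypothetical helper)."""
    src = Path(src_dir) / f"{lang}.traineddata"
    if not src.is_file():
        raise FileNotFoundError(src)
    dest = Path(dest_dir) / src.name
    # copy2 also keeps the file's timestamps; overwrites an existing copy.
    shutil.copy2(src, dest)
    return dest
```

After copying, choose "Redetect Languages" in gImageReader or restart it, as the README says.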
In my own tests, tessdata_best gave the best results:
https://github.com/tesseract-ocr/tessdata_best
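I recommend downloading the whole repository rather than individual files; otherwise you may end up with an incomplete set of files.
Finally, add the spelling dictionaries. gImageReader uses Hunspell for spell checking, and its dictionaries live in the hunspell folder under share: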
C:\Program Files\gImageReader\share\hunspell
Then open the README in that folder:
This folder contains spelling dictionaries.
To add additional spelling dictionaries:
* Visit https://cgit.freedesktop.org/libreoffice/dictionaries/tree/
* Download the *.dic and *.aff files for the desired language and place them inside this folder
* If gImageReader is running, select "Redetect Languages" from the application menu, or restart the application
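The README asks for both the .dic and the .aff file. A quick way to see which dictionaries in the folder are complete is to pair the two extensions by file stem. `hunspell_languages` is a hypothetical helper in this minimal sketch, and the folder path passed to it is up to you.

```python
from pathlib import Path


def hunspell_languages(folder):
    """Return (complete, incomplete) dictionary names in a hunspell folder (hypothetical helper).

    A name counts as complete only when both <name>.dic and <name>.aff exist.
    """
    folder = Path(folder)
    dics = {p.stem for p in folder.glob("*.dic")}
    affs = {p.stem for p in folder.glob("*.aff")}
    # Names with both files are complete; names with only one of them are not.
    return sorted(dics & affs), sorted(dics ^ affs)
```

For example, `hunspell_languages(r"C:\Program Files\gImageReader\share\hunspell")` lists `ru_RU` as complete only after both Russian files have been copied in.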
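Here we need the Russian spelling files (*.dic and *.aff). Only English is included by default, so we have to find the Russian dictionary.
Following the instructions, open the download page: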
https://cgit.freedesktop.org/libreoffice/dictionaries/tree/
Russian dictionary download page:
https://cgit.freedesktop.org/libreoffice/dictionaries/tree/ru_RU
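The manual "(plain)" link described below points to the raw file instead of the HTML page. The sketch below builds those raw URLs and downloads both files. It rests on two assumptions. First, that this cgit instance serves raw files under `/plain/` in place of `/tree/`; check one URL in a browser first. Second, that the files are named after the folder, as with ru_RU. Other language folders may use different file names. `plain_urls` and `download_dictionary` are hypothetical helpers.

```python
import urllib.request
from pathlib import Path

# Assumption: cgit serves raw file contents under /plain/ instead of /tree/.
CGIT_BASE = "https://cgit.freedesktop.org/libreoffice/dictionaries"


def plain_urls(locale):
    """Raw-file URLs for <locale>.aff and <locale>.dic (hypothetical helper)."""
    return [f"{CGIT_BASE}/plain/{locale}/{locale}.{ext}" for ext in ("aff", "dic")]


def download_dictionary(locale, out_dir):
    """Download both files into out_dir and return the saved paths (needs network)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in plain_urls(locale):
        target = out_dir / url.rsplit("/", 1)[1]
        urllib.request.urlretrieve(url, target)
        saved.append(target)
    return saved
```

A call such as `download_dictionary("ru_RU", "downloads")` would save ru_RU.aff and ru_RU.dic into a local downloads folder. Check the results before copying them into gImageReader.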
Then download the following files:
File name    Size (bytes)
ru_RU.aff    71236
ru_RU.dic    3473191
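Finally, copy the files into the gImageReader installation folder.
Important: do not save the HTML page. Open each file, right-click the (plain) link, and choose "Save Link As...". Do this for both files.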
ru_RU.aff
Size after download: 69.5 KB (71,236 bytes)
ru_RU.dic
Size after download: 3.31 MB (3,473,191 bytes)
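Before copying, you can confirm that a saved file is the raw dictionary rather than an HTML page and that its size matches. The expected sizes below are the ones listed above. They may change if the upstream dictionary is updated, so treat a size mismatch as a prompt to look more closely rather than as proof of a bad download. `check_download` is a hypothetical helper in this minimal sketch.

```python
from pathlib import Path

# Expected sizes in bytes, taken from the listing above (may change upstream).
EXPECTED_SIZES = {"ru_RU.aff": 71236, "ru_RU.dic": 3473191}


def check_download(path, expected_size=None):
    """Return a list of problems with a downloaded dictionary file (empty if none)."""
    path = Path(path)
    problems = []
    with open(path, "rb") as f:
        head = f.read(512).lstrip().lower()
    # A saved web page usually starts with '<!doctype html' or '<html'.
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        problems.append("looks like an HTML page, not a plain dictionary file")
    size = path.stat().st_size
    if expected_size is not None and size != expected_size:
        problems.append(f"size is {size} bytes, expected {expected_size}")
    return problems
```

For example, `check_download("ru_RU.dic", EXPECTED_SIZES["ru_RU.dic"])` should return an empty list for a correct download.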
Save the files, then copy them into the spelling dictionary folder on drive C:
C:\Program Files\gImageReader\share\hunspell