Detecting the language of text in Python: cld2-cffi
Install: pip install cld2-cffi
Code
import cld2

# Sample inputs: French, English, Chinese mixed with English, Japanese, Korean
t = [
    'A accès aux chiens et aux frontaux qui lui ont été il peut consulter et modifier ses collections et exporter Cet article concerne le pays européen aujourd’hui appelé République française.',
    'The present disclosure relates to a method for extracting lithium from a lithium-containing material.',
    '面向生产环境的多语种自然语言处理工具包,基于PyTorch和TensorFlow 2.x双引擎',
    'Transformer模型',
    '富士フイルム和光純薬株式会社',
    '인사말 안녕하세요',
    'Javigator:Java代码导读及分析管理工具的设计',
]

for s in t:
    # Common language_name values: 'Unknown', 'Chinese', 'ENGLISH'
    print(s)
    isReliable, textBytesFound, details = cld2.detect(s)
    # print('reliable: %s' % (isReliable != 0))  # whether the result is reliable
    # print('textBytes: %s' % textBytesFound)
    # The label below means "detection details"; it is kept as-is so it matches the output shown
    print(f"检测结果详情:{str(details)}")
    print(details[0].language_code)  # code of the top-ranked language
    print('--------------------------')

if __name__ == "__main__":
    run_code = 0
Output:
A accès aux chiens et aux frontaux qui lui ont été il peut consulter et modifier ses collections et exporter Cet article concerne le pays européen aujourd’hui appelé République française.
检测结果详情:(Detection(language_name='FRENCH', language_code='fr', percent=99, score=1345.0), Detection(language_name='Unknown', language_code='un', percent=0, score=0.0), Detection(language_name='Unknown', language_code='un', percent=0, score=0.0))
fr
--------------------------
The present disclosure relates to a method for extracting lithium from a lithium-containing material.
检测结果详情:(Detection(language_name='ENGLISH', language_code='en', percent=99, score=882.0), Detection(language_name='Unknown', language_code='un', percent=0, score=0.0), Detection(language_name='Unknown', language_code='un', percent=0, score=0.0))
en
--------------------------
面向生产环境的多语种自然语言处理工具包,基于PyTorch和TensorFlow 2.x双引擎
检测结果详情:(Detection(language_name='Chinese', language_code='zh', percent=74, score=1952.0), Detection(language_name='ENGLISH', language_code='en', percent=23, score=445.0), Detection(language_name='Unknown', language_code='un', percent=0, score=0.0))
zh
--------------------------
Transformer模型
检测结果详情:(Detection(language_name='Unknown', language_code='un', percent=0, score=0.0), Detection(language_name='Unknown', language_code='un', percent=0, score=0.0), Detection(language_name='Unknown', language_code='un', percent=0, score=0.0))
un
--------------------------
富士フイルム和光純薬株式会社
检测结果详情:(Detection(language_name='Japanese', language_code='ja', percent=97, score=2619.0), Detection(language_name='Unknown', language_code='un', percent=0, score=0.0), Detection(language_name='Unknown', language_code='un', percent=0, score=0.0))
ja
--------------------------
인사말 안녕하세요
检测结果详情:(Detection(language_name='Korean', language_code='ko', percent=96, score=3780.0), Detection(language_name='Unknown', language_code='un', percent=0, score=0.0), Detection(language_name='Unknown', language_code='un', percent=0, score=0.0))
ko
--------------------------
Javigator:Java代码导读及分析管理工具的设计
检测结果详情:(Detection(language_name='Chinese', language_code='zh', percent=71, score=2143.0), Detection(language_name='ENGLISH', language_code='en', percent=25, score=819.0), Detection(language_name='Unknown', language_code='un', percent=0, score=0.0))
zh
--------------------------
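
The output above shows two cases worth handling in calling code: short or mixed input can come back as 'un' (the "Transformer模型" sample), and mixed text reports more than one language with a percentage each. Below is a minimal sketch of post-processing the detection tuple. The helper names top_language and language_mix are hypothetical, and the Detection namedtuple is only a stand-in that mirrors the fields printed above so the snippet runs without cld2 installed. In real use you would pass the values returned by cld2.detect directly. The is_reliable values in the usage lines are assumed for illustration, since the source does not print them.

```python
from collections import namedtuple

# Stand-in with the same fields as the Detection tuples printed above
# (assumption: used here only so the sketch runs without cld2 installed).
Detection = namedtuple(
    "Detection", ["language_name", "language_code", "percent", "score"]
)


def top_language(is_reliable, details, default="un"):
    """Return the top-ranked language code, or `default` when the
    result is flagged unreliable, empty, or reported as unknown."""
    if not is_reliable or not details:
        return default
    best = details[0]
    return default if best.language_code == "un" else best.language_code


def language_mix(details):
    """Map language code -> percent, skipping 'un' placeholder entries."""
    return {d.language_code: d.percent for d in details if d.language_code != "un"}


# Values copied from the output above for the mixed Chinese/English sample.
mixed = (
    Detection("Chinese", "zh", 74, 1952.0),
    Detection("ENGLISH", "en", 23, 445.0),
    Detection("Unknown", "un", 0, 0.0),
)
# Values copied from the output above for the "Transformer模型" sample.
unknown = (Detection("Unknown", "un", 0, 0.0),) * 3

print(top_language(True, mixed))      # is_reliable=True is assumed here
print(language_mix(mixed))
print(top_language(False, unknown))   # is_reliable=False is assumed here
print(language_mix(unknown))
```

With cld2 installed, the same helpers could be called as top_language(isReliable, details) right after cld2.detect(s) in the loop above.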