Why Your Company Should Care About Named Entity Recognition

Named entity recognition is the task of finding the spans of text that refer to entities and classifying them into categories such as people, locations, and dates. For example, the sentence "On April 30, 1789, George Washington was inaugurated as the first president of the United States" may be tagged with the following entities:

[Image from Zach Monge]
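
To make the idea concrete, here is a minimal sketch of tagging that sentence. It assumes a tiny hand-written lookup list and a date pattern, and the label names (DATE, PERSON, LOCATION) are my own choice for illustration. A real system would more likely rely on a trained model than on lists like these.

```python
import re

# Hypothetical, hand-written lookup lists used only for this illustration.
PEOPLE = ["George Washington"]
LOCATIONS = ["United States"]

# Matches dates written like "April 30, 1789".
DATE_PATTERN = re.compile(
    r"(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December) \d{1,2}, \d{4}"
)

def tag_entities(text):
    """Return (entity text, label) pairs ordered by position in the text."""
    found = []
    for match in DATE_PATTERN.finditer(text):
        found.append((match.start(), match.group(), "DATE"))
    for label, names in (("PERSON", PEOPLE), ("LOCATION", LOCATIONS)):
        for name in names:
            position = text.find(name)
            if position != -1:
                found.append((position, name, label))
    return [(entity, label) for _, entity, label in sorted(found)]

sentence = ("On April 30, 1789, George Washington was inaugurated "
            "as the first president of the United States")
print(tag_entities(sentence))
# [('April 30, 1789', 'DATE'), ('George Washington', 'PERSON'), ('United States', 'LOCATION')]
```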

You might be thinking, okay, how exactly is this useful? Well, there are many potential uses of named entity recognition, but one is making a database easily searchable. You might then ask, why would I need to tag entities to make a database easily searchable? Can’t I just use a simple dictionary lookup to match terms exactly? Well, yes, you can, but this is far from ideal. To show you how ineffective searches can be without named entity recognition, let’s walk through a real-life example.

Example

Recently I was ordering food online from my local grocery store, Weis Markets, and was trying to add Perdue frozen chicken fingers to my cart. So I typed this into the search bar:

[Image: Weis Markets]

To my disappointment, my search did not yield any results:

[Image: Weis Markets]

At first I thought they may have been out of stock, but after searching for several other items, I kept getting no results. After a while, I started to suspect that Weis’s search engine could only handle search terms that almost exactly matched the product label (Note: I do not actually know the machinery behind Weis’s search engine). So I looked up on Google exactly what the chicken fingers I wanted were called, and I realized they are called chicken tenders, not fingers (of course!). So I typed "perdue chicken tenders" into the search box and it worked! I was then able to add the chicken fingers to my cart.
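
As a rough sketch of the behavior I suspected, the snippet below matches a query against product labels word for word. The product labels and the matching rule are assumptions made up for illustration. As noted above, I do not know how Weis’s search engine actually works.

```python
# Hypothetical product labels; the real catalog and search logic are unknown.
PRODUCT_LABELS = [
    "Perdue Chicken Tenders",
    "Perdue Chicken Nuggets",
]

def literal_search(query, labels):
    """Return labels that contain every word of the query, ignoring case."""
    words = query.lower().split()
    return [
        label for label in labels
        if all(word in label.lower().split() for word in words)
    ]

print(literal_search("perdue chicken fingers", PRODUCT_LABELS))  # []
print(literal_search("perdue chicken tenders", PRODUCT_LABELS))  # ['Perdue Chicken Tenders']
```

The word "fingers" appears in no label, so the first query returns nothing even though the product exists under a different name.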

[Images: Weis Markets]

I was happy that I was able to add the chicken fingers to my cart, but this was a lot of work just to find one item, and I had the same issue with several other items. This made Weis’s online shopping almost unusable! I have not purchased groceries online from this store since. It’s just too much work.

The Solution

Fortunately for Weis Markets, there is a fairly easy fix for their search engine issue: named entity recognition. With named entity recognition, the search engine would automatically tag each entity in the query. For example, when I typed in "perdue chicken fingers", it should have tagged "Perdue" as the brand and "chicken fingers" as chicken tender (I am not an expert in food categories, so I do not actually know whether chicken tender would be a useful category).

[Image from Zach Monge]
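
Here is a minimal sketch of that tagging step. The brand list and the synonym table are hypothetical, and a simple lookup stands in for whatever machine learning model a real system would use.

```python
# Hypothetical vocabularies used only for illustration.
BRANDS = {"perdue": "Perdue"}
FOOD_SYNONYMS = {
    "chicken fingers": "chicken tender",
    "chicken tenders": "chicken tender",
    "chicken strips": "chicken tender",
}

def tag_query(query):
    """Map a free-text query to a dict of entity tags."""
    text = query.lower()
    tags = {}
    for surface, brand in BRANDS.items():
        if surface in text.split():
            tags["brand"] = brand
    for surface, food in FOOD_SYNONYMS.items():
        if surface in text:
            tags["food"] = food
    return tags

print(tag_query("perdue chicken fingers"))
# {'brand': 'Perdue', 'food': 'chicken tender'}
```

Because "chicken fingers" and "chicken tenders" map to the same category, either wording produces the same tags.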

Then the search engine would look up these tags in a database in which each item has already been tagged. So the actual chicken fingers I wanted may have been previously tagged with the following categories: brand = Perdue; food = chicken tender; frozen, fresh, or canned = frozen.

[Image from Zach Monge]

With these entities and a structured database, my search for "perdue chicken fingers" would have matched Perdue as the brand and chicken tender as the food, and the results would likely have included the chicken fingers I wanted.
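
The lookup described above can be sketched as follows. The catalog records and field names (including "storage" for frozen, fresh, or canned) are assumptions for illustration, and the query tags are written out by hand as the output an entity tagger might produce.

```python
# Hypothetical catalog records, each tagged ahead of time.
CATALOG = [
    {"name": "Perdue Chicken Tenders", "brand": "Perdue",
     "food": "chicken tender", "storage": "frozen"},
    {"name": "Perdue Chicken Nuggets", "brand": "Perdue",
     "food": "chicken nugget", "storage": "frozen"},
    {"name": "Store Brand Chicken Tenders", "brand": "Store Brand",
     "food": "chicken tender", "storage": "fresh"},
]

def search_by_tags(query_tags, catalog):
    """Return names of items whose tags agree with every tag found in the query."""
    return [
        item["name"] for item in catalog
        if all(item.get(field) == value for field, value in query_tags.items())
    ]

# Tags that an entity tagger might produce for "perdue chicken fingers".
query_tags = {"brand": "Perdue", "food": "chicken tender"}
print(search_by_tags(query_tags, CATALOG))  # ['Perdue Chicken Tenders']
```

The match happens on the tags, not on the product label, so the search succeeds even though the word "fingers" never appears in the catalog.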

Conclusions

So as you can see, named entity recognition can be extremely useful and is almost essential for some products. You can imagine many other uses besides building a search engine for a grocery store (e.g., recommending similar online articles based on tagged entities, creating an easily searchable database of interview transcripts, etc.). Something I have not covered in this post is the machine learning approaches that may be used to actually perform the named entity recognition task (in the example, the task of tagging entities in the search "perdue chicken fingers"). This is the first installment of a series of blog posts about named entity recognition, and the next post will go further into the technical details. Lastly, if you think your company may benefit from named entity recognition, feel free to reach out to me. My contact information may be found on my website.

Translated from: https://towardsdatascience.com/why-your-company-should-care-about-named-entity-recognition-e00de2f45700
