What Is Natural Language Processing, and How Does It Work?

Talking to a chat bot on a smartphone. (Image: NicoElNino/Shutterstock.com)

Natural language processing enables computers to turn what we’re saying into commands they can execute. Find out the basics of how it works, and how it’s being used to improve our lives.

What Is Natural Language Processing?

Whether it’s Alexa, Siri, Google Assistant, Bixby, or Cortana, everyone with a smartphone or smart speaker has a voice-activated assistant nowadays. Every year, these voice assistants seem to get better at recognizing and executing the things we tell them to do. But have you ever wondered how these assistants process the things we’re saying? They manage to do this thanks to Natural Language Processing, or NLP.

Historically, most software has only been able to respond to a fixed set of specific commands. A file will open because you clicked Open, or a spreadsheet will compute a formula based on certain symbols and formula names. A program communicates using the programming language that it was coded in, and will thus produce an output when it is given input that it recognizes. In this context, words are like a set of different mechanical levers that always provide the desired output.

This is in contrast to human languages, which are complex, unstructured, and carry a multitude of meanings depending on sentence structure, tone, accent, timing, punctuation, and context. Natural Language Processing is a branch of artificial intelligence that attempts to bridge the gap between what a machine recognizes as input and human language, so that when we speak or type naturally, the machine produces an output in line with what we said.

This is done by analyzing vast amounts of data to derive meaning from the various elements of human language, on top of the meanings of the words themselves. The process is closely tied to machine learning, which enables computers to improve as they are given more data. That is why most of the NLP systems we interact with frequently seem to get better over time.

To illustrate the concept, let’s look at two of the most fundamental techniques NLP uses to process language and information.

Tokenization

Tokenization means splitting text (for a voice assistant, the transcription of your speech) into smaller units such as words or sentences. Each of these units is called a token, and the tokens are what the rest of the system works with. It sounds simple, but in practice, it’s a tricky process.

Let’s say that you are using speech-to-text software, such as the Google Keyboard, to send a message to a friend. You want to say, “Meet me at the park.” When your phone takes that recording and processes it through Google’s speech recognition algorithm, Google must then split what you just said into tokens. These tokens would be “meet,” “me,” “at,” “the,” and “park.”
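Once the speech has been transcribed, the simplest way to tokenize English text is to pull out runs of letters and ignore punctuation. The sketch below is a hypothetical `tokenize` helper, not Google’s actual method; real tokenizers handle many more cases (numbers, hyphens, emoji, abbreviations).

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Split text into lowercase word tokens, dropping punctuation."""
    # \w+ matches runs of letters, digits and underscores; the optional
    # (?:'\w+)* group keeps contractions such as "don't" as one token.
    return re.findall(r"\w+(?:'\w+)*", text.lower())


print(tokenize("Meet me at the park."))
# ['meet', 'me', 'at', 'the', 'park']
```

Lowercasing first means “Meet” and “meet” become the same token, which keeps later processing simpler.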

People pause for different lengths of time between words, and some languages have very little in the way of an audible pause between words at all. As a result, the tokenization process varies drastically between languages and dialects.
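The same problem shows up in writing: languages such as Chinese, Japanese, and Thai are written without spaces between words, so splitting on whitespace doesn’t work. One classic fallback is greedy “maximum matching” against a word list. The sketch below uses a tiny hypothetical vocabulary and the spaceless string "meetmeatthepark" to stand in for such text; it illustrates the idea rather than how any particular assistant does it.

```python
from typing import List, Set


def max_match(text: str, vocabulary: Set[str], max_len: int = 4) -> List[str]:
    """Greedy forward maximum matching: at each position, take the longest
    vocabulary word that fits; otherwise fall back to a single character."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until something matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocabulary:
                tokens.append(piece)
                i += size
                break
    return tokens


vocab = {"meet", "me", "at", "the", "park"}  # hypothetical word list
print(max_match("meetmeatthepark", vocab))
# ['meet', 'me', 'at', 'the', 'park']
```

Being greedy, this approach can pick the wrong split when a longer word happens to swallow the start of the next one, which is part of why tokenization gets harder from language to language.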

Stemming and Lemmatization

Stemming and lemmatization both reduce a word to a root form that the machine can recognize by removing additions or variations. This makes interpretation consistent across different words that all mean essentially the same thing, which makes processing faster.

Stemming is a crude but fast process that removes affixes, the additions attached before or after a word’s root. It turns the word into its simplest base form by chopping off letters. For example:

  • “Walking” turns into “walk”

  • “Faster” turns into “fast”

  • “Severity” turns into “sever”

As you can see, stemming may have the adverse effect of changing the meaning of a word entirely. “Severity” and “sever” do not mean the same thing, but the suffix “ity” was removed in the process of stemming.
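To see how blunt this is, here is a toy suffix-stripping stemmer. It is a hypothetical sketch with a hand-picked suffix list, not the Porter stemmer or any other published algorithm, but it reproduces the three examples above, including the misleading “sever.”

```python
# Suffixes to strip, checked in order; the first one that matches is removed.
SUFFIXES = ("ity", "ing", "er", "ed", "ly", "s")


def crude_stem(word: str, min_stem: int = 3) -> str:
    """Strip the first matching suffix, keeping at least `min_stem` letters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= min_stem:
            return w[: -len(suffix)]
    return w


for word in ("Walking", "Faster", "Severity"):
    print(word, "->", crude_stem(word))
# Walking -> walk
# Faster -> fast
# Severity -> sever
```

Nothing here checks whether the result is a real word or keeps the original meaning; the code just chops letters, which is exactly why stemming is fast but crude.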

Lemmatization, on the other hand, is a more sophisticated process that reduces a word to its base form, known as the lemma. It takes into account the word’s context and how it’s used in a sentence, and it involves looking up the term in a database of words and their respective lemmas. For example:

  • “Are” turns into “be”

  • “Operation” turns into “operate”

  • “Severity” turns into “severe”

In this example, lemmatization managed to turn the term “severity” into “severe,” which is its lemma form and root word.
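The dictionary-lookup side of lemmatization can be sketched as a table keyed by both the word and its part of speech, which is one simple way context comes into play. The `LEMMAS` table and `lemmatize` function below are hypothetical: the entries mirror the examples above plus “saw,” and the part-of-speech tag is assumed to come from a separate tagger that isn’t shown.

```python
from typing import Dict, Tuple

# Tiny, hypothetical lemma dictionary keyed by (word, part of speech).
# Real lemmatizers rely on large lexical databases instead.
LEMMAS: Dict[Tuple[str, str], str] = {
    ("are", "VERB"): "be",
    ("operation", "NOUN"): "operate",
    ("severity", "NOUN"): "severe",
    ("saw", "VERB"): "see",  # "I saw the park"
    ("saw", "NOUN"): "saw",  # "a saw for cutting wood"
}


def lemmatize(word: str, pos: str) -> str:
    """Look up the lemma for a word used as a given part of speech.

    Words missing from the table are returned unchanged (lowercased)."""
    return LEMMAS.get((word.lower(), pos), word.lower())


print(lemmatize("Are", "VERB"))  # be
print(lemmatize("saw", "VERB"))  # see
print(lemmatize("saw", "NOUN"))  # saw
```

Unlike the stemmer, the lookup never produces a made-up form like “sever”; the trade-off is that it only knows the words in its database.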

NLP Use Cases and the Future

The previous examples only begin to scratch the surface of what Natural Language Processing is. It encompasses a wide range of practices and usage scenarios, many of which we use in our daily lives. These are a few examples of where NLP is currently in use:

  • Predictive Text: When you type a message on your smartphone, it automatically suggests words that fit the sentence or that you’ve used before.


  • Machine Translation: Widely used consumer translation services, such as Google Translate, incorporate a high-level form of NLP to process language and translate it.


  • Chatbots: NLP is the foundation of intelligent chatbots, especially in customer service, where they can assist customers and process their requests before a real person takes over.

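Predictive text is easy to sketch in its simplest form: count which word tends to follow which in messages you’ve already typed, then suggest the most frequent followers. This is a minimal bigram model under that assumption, trained on a hypothetical message history; the keyboards on real phones use far more sophisticated models.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Iterable, List


def train_bigrams(sentences: Iterable[str]) -> DefaultDict[str, Counter]:
    """Count how often each word is followed by each other word."""
    model: DefaultDict[str, Counter] = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            model[prev][nxt] += 1
    return model


def suggest(model: DefaultDict[str, Counter], prev_word: str, k: int = 3) -> List[str]:
    """Return up to k words most often seen right after prev_word."""
    followers = model.get(prev_word.lower(), Counter())
    return [word for word, _count in followers.most_common(k)]


history = ["meet me at the park", "meet me at the cafe", "see you at the park"]
model = train_bigrams(history)
print(suggest(model, "the"))  # ['park', 'cafe']
```

Because the counts come from your own history, the suggestions drift toward words you use often, which matches the “words you’ve used before” behavior described above.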
There’s more to come. NLP uses are currently being developed and deployed in fields such as news media, medical technology, workplace management, and finance. There’s a chance we may be able to have a full-fledged sophisticated conversation with a robot in the future.

If you’re interested in learning more about NLP, there are a lot of fantastic resources you can check out on the Towards Data Science blog or from the Stanford Natural Language Processing Group.

Original article: https://www.howtogeek.com/665702/what-is-natural-language-processing-and-how-does-it-work/
