Data Science Tools

Harvard Business Review called data scientist the "sexiest job of the 21st century". But what specifically does a data scientist do, and what tools do they use?

Data science as a profession can be described as working and experimenting with data to answer data-related questions, and building and deploying scalable models backed by that data. Data scientists use many technical tools to investigate data, build models, and report their observations.

Here is a list of them:

1. Git / GitHub

Git, a version control system, and GitHub are widely used across domains that involve open source projects, collaboration, and code maintenance. Data scientists rely on them heavily to preserve their findings and code. GitHub has also been called your "digital resume", because recruiters assess a candidate's skills by looking at their GitHub profile.

2. Programming Language Interfaces

Python and R, together with editors and IDEs such as Spyder, Sublime Text, Jupyter Notebooks (including for Julia), RStudio, PyCharm, Notepad++, Google Colab, and several other code-writing platforms, are very widely used by data scientists.

3. Orange and IBM Watson

Orange, IBM Watson, and many other automated machine learning model-building frameworks are handy tools for data scientists and machine learning engineers who want to experiment with different models and create highly scalable, reusable machine learning architectures.

4. D3.js and Tableau

Analytics is an integral part of the data science process, and understanding data through visualization lets a data scientist answer most data-driven questions from observation alone. For this, D3.js and Tableau have proven to be excellent catalysts, particularly in the field of business analytics. Honorable mentions also go to Excel and Power BI.

5. Hadoop, Mahout, Apache, Hive, and Pig

Since the advent of big data, many frameworks have been developed to handle vast streams of data, analyze them, and build models on top of them. Hadoop is extremely popular for its distributed file system, HDFS (Hadoop Distributed File System); Apache Mahout is used to incorporate machine learning; and Hive and Pig are used for faster integration of very large data sets. These are extremely powerful tools favored by data scientists.

6. NoSQL, MongoDB, Cassandra, MySQL

SQL, short for Structured Query Language, is an integral part of the early stages of data science work. While MySQL has long been the choice of veterans, MongoDB has picked up serious pace and has proven to be a heavily used tool among data scientists.
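To make the role of SQL concrete, here is a minimal sketch of the kind of aggregation query a data scientist might run. It uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs without a server; the `orders` table, its columns, and the sample rows are hypothetical and exist only for illustration.

```python
import sqlite3


def totals_by_customer(rows):
    """Sum order amounts per customer using plain SQL.

    `rows` is an iterable of (customer, amount) pairs. The table and
    column names are made up for this sketch.
    """
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT customer, SUM(amount) "
            "FROM orders GROUP BY customer ORDER BY customer"
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [("ana", 10.0), ("bo", 5.0), ("ana", 2.5)]  # toy data
    print(totals_by_customer(sample))
```

The same `SELECT ... GROUP BY` statement should carry over to MySQL largely unchanged; only the connection setup would differ.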

7. Packages/Modules of Programming Languages

Packages are a crucial part of writing simple, reusable, and efficient code in many programming languages. In Python, packages such as pandas, NumPy, SciPy, Matplotlib, Bokeh, seaborn, statsmodels, collections, scikit-learn, urllib, Beautiful Soup, and many more are very commonly used by data scientists. Similarly, in R, tidyr, ggplot2, etc., are notable mentions.
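As a small illustration of how a package keeps code short and reusable, the sketch below uses `collections.Counter` from the Python standard library (one of the modules listed above) to count word frequencies. The function name `top_words` and the sample sentence are assumptions made for this example.

```python
import re
from collections import Counter


def top_words(text, n=3):
    """Return the n most frequent words in `text` as (word, count) pairs."""
    # Lowercase the text and keep runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(n)


if __name__ == "__main__":
    sentence = "data science uses data tools and data models"
    print(top_words(sentence, 1))  # [('data', 3)]
```

Without `Counter`, the same task would need a hand-written dictionary loop plus a sort, which is the kind of repetition that packages help avoid.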

Translated from: https://www.includehelp.com/data-science/tools-for-data-science.aspx
