Text Analysis Tools: Data Science
Harvard Business Review called data scientist "the sexiest job of the 21st century." But what exactly does a data scientist do, and what tools do they use?
As a profession, data science can be described as working and experimenting with data to answer relevant data-related questions, and building and deploying scalable models based on that data. Data scientists use a wide range of technical tools to explore data, build models, and report their observations.
Here is a list of some of them:
1. Git / GitHub
Git is a version control system, and Git and GitHub (a hosting platform for Git repositories) are widely used across all domains involving open-source projects, collaboration, and code maintenance. They are very popular among data scientists for preserving their findings and code. GitHub has also been called your "digital resume", because recruiters assess a person's skills by looking at their GitHub profile.
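To make the idea of "versioning" a little more concrete, here is a minimal sketch of how Git identifies a file's contents. It assumes a default SHA-1 repository (newer SHA-256 repositories hash differently) and covers only blob objects, not trees or commits; the function name `git_blob_id` is hypothetical. For the same bytes, the result should match what `git hash-object` prints.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the object ID Git gives a file's contents in a SHA-1 repository.

    Git stores each file version as a "blob" whose ID is the SHA-1 hash of a
    header ("blob", a space, the size in bytes, then a NUL byte) followed by
    the raw file bytes.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # Hypothetical contents of a results file before and after an experiment.
    v1 = b"accuracy = 0.91\n"
    v2 = b"accuracy = 0.93\n"
    print(git_blob_id(v1))
    print(git_blob_id(v2))
    # Identical bytes always map to the same ID.
    print(git_blob_id(v1) == git_blob_id(b"accuracy = 0.91\n"))  # True
```

Because the ID depends only on the bytes, committing an unchanged file again reuses the existing object, while any edit to a script or notebook produces a new object that Git can track.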
2. Programming Language Interfaces
IDEs and code editors such as Spyder and PyCharm (for Python), RStudio (for R), Sublime Text, and Notepad++, along with Jupyter Notebooks (which support Python, Julia, R, and other languages), Google Colab, and several other code-writing platforms, are very widely used by data scientists.
3. Orange and IBM Watson
Orange, IBM Watson, and many other automated machine learning and model-building frameworks are handy tools for data scientists and machine learning engineers to experiment with different models and to build highly scalable, reusable machine learning architectures.
4. D3.js and Tableau
Analytics is an integral part of data science, and understanding data through visualization lets a data scientist answer most data-driven questions from direct observation alone. D3.js and Tableau have proven to be excellent tools for this, particularly in business analytics. Excel and Power BI also deserve an honorable mention.
5. Hadoop, Mahout, Apache, Hive, and Pig
After the emergence of big data, many frameworks were developed to handle vast streams of data, analyze them, and build models on top of them. Hadoop is extremely popular for its distributed file system, HDFS (Hadoop Distributed File System); Apache Mahout is used for integrating machine learning; and Hive and Pig enable faster large-scale data integration. These are extremely powerful and popular tools among data scientists.
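Hadoop's classic processing engine, MapReduce, works on data stored in HDFS by splitting a job into a map step, a shuffle that groups intermediate results by key, and a reduce step. The sketch below imitates that flow for the standard word-count job in a single Python process. It does not use Hadoop's API, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key before they reach the reducers."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values for one key (here, sum the counts)."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # On a Hadoop cluster, mappers and reducers run in parallel on many nodes
    # and read their input splits from HDFS; here everything runs in one process.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    docs = ["big data needs big tools", "data tools for data science"]
    print(word_count(docs))
    # {'big': 2, 'data': 3, 'for': 1, 'needs': 1, 'science': 1, 'tools': 2}
```

Splitting the work this way is what lets Hadoop scale out: each mapper only sees its own slice of the input, and each reducer only sees the values for its own keys.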
6. NoSQL, MongoDB, Cassandra, MySQL
SQL, short for Structured Query Language, is an integral part of data management, which falls under the first phase of the data science workflow. While MySQL has long been the choice of veterans, MongoDB, a NoSQL document database, has picked up serious momentum and has proven to be a heavily used tool among data scientists.
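As a small, hedged illustration of the kind of query this involves, the sketch below uses SQLite through Python's built-in `sqlite3` module as a lightweight stand-in for MySQL; the `GROUP BY` query itself is standard SQL and should run on MySQL as well. The table `results`, its columns, and the sample rows are hypothetical.

```python
import sqlite3


def average_score_by_team(rows):
    """Load (team, score) rows into an in-memory table and aggregate them with SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE results (team TEXT NOT NULL, score REAL NOT NULL)")
        # Parameterized inserts (the ? placeholders) avoid SQL injection.
        conn.executemany("INSERT INTO results (team, score) VALUES (?, ?)", rows)
        query = """
            SELECT team, AVG(score) AS avg_score, COUNT(*) AS n
            FROM results
            GROUP BY team
            ORDER BY avg_score DESC
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    data = [("red", 80.0), ("blue", 90.0), ("red", 70.0), ("blue", 70.0)]
    for team, avg_score, n in average_score_by_team(data):
        print(team, avg_score, n)
```

A document store such as MongoDB expresses the same grouping through its own query language rather than SQL, so this sketch applies only to the relational side of the list.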
7. Packages/Modules of Programming Languages
Packages in many programming languages are crucial for writing simple, reusable, and efficient code. In Python, packages such as pandas, NumPy, SciPy, Matplotlib, Bokeh, seaborn, statsmodels, collections, scikit-learn, urllib, Beautiful Soup, and many more are very commonly used by data scientists. Similarly, in R, the tidyverse, ggplot2, and others are notable mentions.
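As a small example of leaning on existing modules instead of writing everything from scratch, this sketch combines two standard-library modules from the Python list, `collections` (`Counter`) and `urllib` (`urllib.parse.urlparse`), to count the most common hosts in a list of URLs. The function `top_domains` and the sample URLs are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def top_domains(urls, n=2):
    """Return the n most common hosts in a list of URLs, as (host, count) pairs."""
    # Host names are case-insensitive, so normalize them before counting.
    hosts = (urlparse(url).netloc.lower() for url in urls)
    return Counter(hosts).most_common(n)


if __name__ == "__main__":
    visits = [
        "https://github.com/pandas-dev/pandas",
        "https://www.includehelp.com/data-science/",
        "https://GitHub.com/numpy/numpy",
        "https://github.com/scikit-learn/scikit-learn",
    ]
    print(top_domains(visits, n=1))  # [('github.com', 3)]
```

With pandas, a comparable count could be written with `Series.value_counts()` on a column of host names.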
Translated from: https://www.includehelp.com/data-science/tools-for-data-science.aspx