Top R Libraries for Data Science

Data science is the discipline of making data useful

When people discuss the top programming language for Data Science, Python usually comes up first. Python is undoubtedly an excellent choice for the vast majority of Data Science tasks, but there is another language that was built specifically for statistical computing and number crunching: R.

In addition to robust statistical computing, R offers a huge collection of libraries, over 16,000 of them, catering to the needs of Data Scientists, Data Miners, and Statisticians alike. In this article, we shed some light on a handful of the top R libraries for Data Science.

Best R Libraries for Data Science

R is extremely popular among Data Miners and Statisticians, and part of the reason is the extensive range of libraries available for it. These tools and functions greatly simplify tasks such as data manipulation, visualization, web crawling, and Machine Learning. Some of these libraries are briefly described below:

1. dplyr

The dplyr package, described as a grammar of data manipulation, provides a consistent set of frequently used functions for data manipulation, including the following:

  • filter(): keeps the rows that match a condition
  • mutate(): adds new variables that are functions of existing variables
  • select(): picks variables (columns) by name
  • summarise(): reduces multiple values to a single summary
  • arrange(): changes the ordering of the rows

Additionally, all of these functions can be combined with group_by(), which makes each operation work separately on every group. If you're keen on trying dplyr, you can either get it as part of the tidyverse or install the package directly with the command install.packages("dplyr").
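The functions above can be chained into a single pipeline. The following is a minimal sketch on the built-in mtcars dataset; the 100 hp threshold, the kpl column, and the conversion factor are illustrative assumptions, not part of the original article.

```r
library(dplyr)

# mtcars ships with base R, so no external data is assumed.
summary_by_cyl <- mtcars %>%
  filter(hp > 100) %>%            # keep cars with more than 100 horsepower
  mutate(kpl = mpg * 0.425) %>%   # new variable: approximate km per litre
  select(cyl, hp, kpl) %>%        # keep only the columns of interest
  group_by(cyl) %>%               # one group per cylinder count
  summarise(mean_kpl = mean(kpl), n = n()) %>%
  arrange(desc(mean_kpl))         # highest average first

print(summary_by_cyl)
```

Each step takes a data frame and returns a data frame, which is what makes the pipeline easy to read from top to bottom.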

2. tidyr

tidyr is one of the core packages in the Tidyverse ecosystem and, as the name suggests, it is used to tidy up messy data. If you're wondering what tidy data is, let me clear it up: in tidy data, every column is a variable, every row is an observation, and every cell is a single value.

According to the tidyr documentation, tidy data is a standard way of storing data that is used throughout the tidyverse, and it can help you save time and be more productive with your analysis. You can get the package as part of the tidyverse or with the command install.packages("tidyr").
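To make the definition of tidy data concrete, here is a small sketch that reshapes a "wide" table (one column per year) into tidy form with pivot_longer(). The table and its values are made up for illustration.

```r
library(tidyr)

# A hypothetical wide table: the year is stored in the column names.
wide <- data.frame(
  country = c("A", "B"),
  `2019` = c(10, 20),
  `2020` = c(12, 25),
  check.names = FALSE
)

# Tidy form: one row per (country, year) observation, one column per variable.
long <- pivot_longer(wide, cols = -country,
                     names_to = "year", values_to = "value")

print(long)
```

After reshaping, year becomes a variable of its own instead of being hidden in the column headers.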

3. ggplot2

ggplot2 is among the top R libraries for data visualization and is actively used by thousands of users around the world to create compelling charts, graphs, and plots. The reason behind this popularity is that ggplot2 was created to simplify the visualization process by taking minimal input from the developer, such as the data to visualize, how variables map to aesthetics, and the graphical primitives to use, while leaving the rest to the library.

The result is a graph that presents complex statistics at a glance. If you want more granular control over your charts, you can work in an IDE such as RStudio. You can get ggplot2 via the tidyverse collection or as a standalone library with the command install.packages("ggplot2").
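As a rough illustration of that division of labour, the sketch below supplies only the data, the aesthetic mappings, and one primitive (points), and lets ggplot2 handle scales, legend, and layout. The choice of the mtcars dataset and of the labels is an assumption made for the example.

```r
library(ggplot2)

# Data + aesthetic mappings + a geometric primitive; ggplot2 does the rest.
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point(size = 2) +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders") +
  theme_minimal()

print(p)
```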

Read the R documentation to learn more about the ggplot2 functions.

4. lubridate

R is an excellent programming language for Data Science, but there are certain areas where it may feel incomplete. One such area is the handling of dates and times: anyone working extensively with them in R may find its built-in capabilities cumbersome.

To overcome this, we have a handy package called lubridate. The package not only handles the standard date and time classes in R but also offers enhancements such as time periods, daylight saving time, leap days, support for various time zones, fast date-time parsing, and many helper functions. Should your project require you to work with dates and times, you can get lubridate as part of the tidyverse or install just the package with the command install.packages("lubridate").
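A few of these features can be sketched in a handful of lines. The dates and time zones below are arbitrary examples chosen for illustration.

```r
library(lubridate)

start <- ymd("2020-01-31")            # parse a date from a string
is_leap <- leap_year(2020)            # leap-year helper

# Adding a month with %m+% rolls back to the last valid day of the month.
end_of_feb <- start %m+% months(1)

# Parse a date-time in one zone and view the same instant in another.
meeting <- ymd_hms("2020-03-08 01:30:00", tz = "America/New_York")
in_london <- with_tz(meeting, "Europe/London")

print(end_of_feb)
print(in_london)
```

Here with_tz() changes only how the instant is displayed, not the instant itself.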

Read the lubridate documentation for more details.

5. lattice

lattice is another elegant yet powerful data visualization library focused on multivariate data. What makes this library special is that, apart from handling regular visualizations, it also supports nonstandard situations and requirements. As an implementation of Trellis graphics for R, it allows you to create Trellis graphs and offers options to tune them according to your requirements. lattice comes with R by default, and there is a companion package called latticeExtra, which might come in handy if you want to extend the core features that lattice provides.
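A basic Trellis display conditions one relationship on a third variable and draws one panel per level. The sketch below assumes the built-in mtcars dataset and uses the number of cylinders as the conditioning variable.

```r
library(lattice)

# mpg against weight, with one panel per cylinder count.
trellis_plot <- xyplot(mpg ~ wt | factor(cyl), data = mtcars,
                       layout = c(3, 1),
                       xlab = "Weight (1000 lbs)",
                       ylab = "Miles per gallon")

print(trellis_plot)
```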

6. mlr

Machine Learning in R (mlr) is a library that was released in 2013; its successor, mlr3, followed in 2019 with newer techniques, a better architecture, and a reworked core design. The library provides a framework for classification, regression, and many other Machine Learning tasks, including methods such as support vector machines.

mlr3 is targeted at Machine Learning practitioners and researchers and facilitates the benchmarking and deployment of various Machine Learning algorithms without much hassle. Those looking to extend or combine existing learners and fine-tune the best technique for a task will find mlr3 a good option. mlr3 can be installed using the command install.packages("mlr3").
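As a minimal sketch of the mlr3 workflow, the example below combines a task, a learner, and a resampling strategy. It assumes the built-in "iris" task and the decision-tree learner "classif.rpart"; the fold count and seed are arbitrary.

```r
library(mlr3)

task <- tsk("iris")                    # built-in classification task
learner <- lrn("classif.rpart")        # decision tree (uses the rpart package)
resampling <- rsmp("cv", folds = 3)    # 3-fold cross-validation

set.seed(1)
rr <- resample(task, learner, resampling)

# Average classification accuracy across the folds.
acc <- rr$aggregate(msr("classif.acc"))
print(acc)
```

Swapping in a different learner or measure only changes the key passed to lrn() or msr(), which is what makes benchmarking several algorithms convenient.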

The full range of functions is described in the mlr3 documentation.

7. caret

Short for Classification And REgression Training, the caret library provides several functions that streamline model training for tricky regression and classification problems. caret comes with additional tools for tasks such as data splitting, variable importance estimation, feature selection, pre-processing, and more. With caret, you can also measure model performance and fine-tune model behavior through parameters such as tuneLength or tuneGrid. The package itself is easy to use and loads the components it needs only as they are required. The library can be installed with the command install.packages("caret").
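The sketch below strings together data splitting, cross-validated training with tuneLength, and performance measurement. The iris dataset, the "rpart" method, the 80/20 split, and the seed are assumptions chosen for illustration.

```r
library(caret)

set.seed(42)
# Stratified 80/20 split on the outcome.
idx <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
train_set <- iris[idx, ]
test_set  <- iris[-idx, ]

# 5-fold cross-validation; tuneLength asks caret to try up to 5 candidate values.
ctrl <- trainControl(method = "cv", number = 5)
fit <- train(Species ~ ., data = train_set, method = "rpart",
             trControl = ctrl, tuneLength = 5)

# Evaluate on the held-out rows.
pred <- predict(fit, newdata = test_set)
cm <- confusionMatrix(pred, test_set$Species)
print(cm$overall["Accuracy"])
```

For explicit control over the candidate values, tuneGrid accepts a data frame of parameter settings in place of tuneLength.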

8. esquisse

esquisse is not a plotting library per se, but an add-in for the powerful data visualization library ggplot2. You might be wondering why you would need it alongside ggplot2, so let me clear that up. ggplot2 is already smart enough, but if you want an additional layer of intuitiveness for your visualizations, esquisse is the way to go. esquisse lets you drag and drop the required data, choose the desired customization options, and end up with a tailored plot that is built in a short time and ready to export to your application of choice. With esquisse, you can create visualizations such as bar plots, histograms, and scatter plots, and it also supports sf objects. You can add esquisse to your environment using install.packages("esquisse").
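Launching the drag-and-drop interface takes a single call. This is a minimal sketch that assumes an interactive session (for example, inside RStudio) and uses the built-in iris dataset as a stand-in for your own data.

```r
library(esquisse)

# The add-in is a graphical tool, so only open it in an interactive session.
if (interactive()) {
  esquisser(iris)
}
```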

9. shiny

shiny is a web application framework from RStudio that allows developers to create interactive web applications in R with minimal web development background. With shiny, you can build web pages, interactive visualizations, and dashboards, and even embed widgets in R Markdown documents. shiny can also be extended with CSS themes, JavaScript actions, and htmlwidgets for added customization. It comes with a host of attractive built-in widgets for presenting plots, tables, and printed output of R objects, and outputs update the instant their inputs change, eliminating those annoying frequent page refreshes. If you're sold on the features and want to give it a shot, you can get shiny using the command install.packages("shiny").
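A complete shiny app needs only a UI definition and a server function. The sketch below is a hypothetical example: a slider that controls the number of bins in a histogram of mtcars$mpg. The input and output IDs ("bins", "hist") are names chosen for illustration.

```r
library(shiny)

# UI: a slider input and a placeholder for the plot.
ui <- fluidPage(
  titlePanel("Histogram of mpg"),
  sliderInput("bins", "Number of bins:", min = 5, max = 30, value = 10),
  plotOutput("hist")
)

# Server: the plot re-renders whenever input$bins changes.
server <- function(input, output) {
  output$hist <- renderPlot({
    hist(mtcars$mpg, breaks = input$bins,
         main = NULL, xlab = "Miles per gallon")
  })
}

app <- shinyApp(ui, server)

# Only start the web server when run interactively.
if (interactive()) runApp(app)
```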

10. Rcrawler

If you're looking for a tool that scrapes data off websites and returns it in an understandable format, Rcrawler is a good option. With Rcrawler's web crawling, data scraping, and data mining capabilities, you can not only crawl websites and scrape data but also analyze the network structure of a website, including its internal and external hyperlinks. In case you're wondering why not use rvest: Rcrawler is a step up in that it traverses all the pages of a website and extracts the data, which can be extremely helpful when you are trying to gather all the information from one source in one go. The package can be installed with the command install.packages("Rcrawler").
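A crawl is started with the package's main function, Rcrawler(). The sketch below wraps it in a hypothetical helper, crawl_site(), with deliberately small settings; the URL is a placeholder, and you should only crawl sites whose terms allow it.

```r
library(Rcrawler)

# Hypothetical helper: crawl a site with 2 worker processes, 2 parallel
# connections, and links followed only one level deep.
crawl_site <- function(url) {
  Rcrawler(Website = url, no_cores = 2, no_conn = 2, MaxDepth = 1)
}

# Crawling needs network access, so only run it interactively.
if (interactive()) {
  crawl_site("https://example.com")
}
```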

11. DT

The DT package is an R wrapper around the JavaScript library DataTables. DT allows you to turn the data in an R matrix or data frame into an interactive table on an HTML page, which makes searching, sorting, and filtering the data easy. The package works by having its main function, datatable(), create an HTML widget for the R object. DT allows further fine-tuning via the options argument, along with some additional customization of your tables, all without going deep into the coding. The DT package can be installed using the command install.packages("DT").
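As a short sketch, the call below turns the first rows of mtcars into an interactive table. The page length and the column filters are arbitrary choices for the example.

```r
library(DT)

# Build the HTML widget: 5 rows per page, with a filter box above each column.
tbl <- datatable(head(mtcars, 20),
                 options = list(pageLength = 5),
                 filter = "top")

# Printing the widget opens it in the viewer or browser.
if (interactive()) print(tbl)
```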

12. plotly

If you want to create interactive visualizations that steal the show, plotly is a great fit. With plotly, you can create stunning, publication-worthy visualizations from a diverse collection of charts and graphs, such as scatter and line plots, bar charts, pie charts, histograms, heatmaps, contour plots, and time series. Built on top of the plotly.js library, plotly visualizations can also be displayed in web applications via Dash, shown in Jupyter Notebooks, or saved as HTML files. If you're interested in trying out the package, you can install it using the command install.packages("plotly").
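The sketch below builds an interactive scatter plot and saves it as an HTML file. The dataset, axis titles, and output file name are assumptions for illustration; saveWidget() comes from the htmlwidgets package, which plotly depends on.

```r
library(plotly)

# Interactive scatter plot: hover, zoom, and legend toggling come for free.
fig <- plot_ly(mtcars, x = ~wt, y = ~mpg, color = ~factor(cyl),
               type = "scatter", mode = "markers")
fig <- layout(fig,
              xaxis = list(title = "Weight (1000 lbs)"),
              yaxis = list(title = "Miles per gallon"))

# Save to a temporary HTML file; selfcontained = FALSE avoids needing pandoc.
out <- file.path(tempdir(), "mtcars_scatter.html")
htmlwidgets::saveWidget(fig, out, selfcontained = FALSE)
```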

Other Noteworthy R Libraries

  • Bioconductor
  • knitr
  • janitor
  • randomForest
  • e1071
  • stringr
  • data.table
  • rmarkdown
  • rvest

Conclusion

Throughout this article, we covered some of the top R libraries for common Data Science tasks, such as data manipulation, visualization, Machine Learning model training, and optimization. This is not an exhaustive list, and it by no means covers the entirety of R's vast ecosystem of libraries. CRAN, the repository for all things R, has thousands of equally capable and resourceful libraries for your specific needs, with detailed information and documentation. Should you ever need to find a library, we highly recommend you give CRAN a shot.

Note: This article represents just my personal opinion, and you have every right to disagree with it. If I've missed any important library, do let me know in the comments section.

I hope you've found this article useful!

About the Author

Claire D. is a Content Crafter and Marketer at Digitalogy, a tech sourcing and custom matchmaking marketplace that connects people across the globe with pre-screened, top-notch developers and designers based on their specific needs. Connect with Digitalogy on LinkedIn, Twitter, and Instagram.

Original article: https://towardsdatascience.com/top-r-libraries-for-data-science-29b4e9f4907c
