Data Science: Python
Choosing the right programming language for a new project is perhaps one of the most daunting decisions programmers make.
Python and R are without doubt among the top options when picking a programming language for a Data Science project. Over the years, both have garnered a lot of positive feedback from developers and users for a variety of modern tasks. It might seem hard at first to decide which of the two is better. Although they are similar in certain areas, such as being free and open-source, each offers some unique and game-changing features.
Some examples of sub-communities that use Python or R:
- Deep Learning
- Machine Learning
- Advanced Analytics
- Predictive Analytics
- Statistics
- Exploration and Data Analysis
- Academic Scientific Research
With this article, we would like to shed some light on the features that separate Python from R.
Introduction to Python and R
● Python
Python is an experiment in how much freedom programmers need. Too much freedom and nobody can read another’s code; too little and expressiveness is endangered.
- Guido van Rossum
Python is a high-level, general-purpose programming language built to emphasize code readability. Guido van Rossum began implementing it in 1989, and the first public release followed in 1991. Python encourages developers to write clear and logical code for projects of all scales. Built to be extremely extensible, Python comes with hundreds of libraries that extend its core functionality, while its open-source nature allows developers to freely build and share custom libraries.
Python also serves as an exceptional tool for Data Science, Machine Learning, and Deep Learning due to the availability of several packages and libraries, such as TensorFlow, Pandas, Keras, NumPy, PyTorch, and more.
Advantages
● Hugely popular among developers because it is easy to use.
● Supports multiple programming paradigms, such as object-oriented and procedural.
● Often takes comparatively less execution time than R, although results depend on the workload and the libraries used.
● Has a vast collection of third-party libraries.
Disadvantages
● Python may lack alternatives to some of the popular libraries in R.
● Dynamic typing can sometimes make it difficult to track down faults.
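The dynamic typing drawback listed above can be illustrated with a small sketch. The function names and the pricing scenario are hypothetical and chosen only for illustration. The first function accepts any types, so a value that arrives as text produces a wrong result without raising an error. The second shows one possible way to make the fault surface early.

```python
def total_price(quantity, unit_price):
    """Multiply without any type checks; relies entirely on dynamic typing."""
    return quantity * unit_price


# Correct call: both arguments are numbers.
print(total_price(3, 2.5))  # 7.5

# Faulty call: the quantity arrives as text (for example, read from a file).
# No error is raised; Python repeats the string instead of multiplying.
print(total_price("3", 2))  # 33


def checked_total_price(quantity: int, unit_price: float) -> float:
    """Same calculation, but a wrong type fails loudly at the call site."""
    if not isinstance(quantity, int):
        raise TypeError(f"quantity must be an int, got {type(quantity).__name__}")
    return quantity * unit_price
```

Type hints alone are not enforced at runtime, which is why this sketch adds an explicit `isinstance` check. A static type checker is another common way to catch the same mistake before the code runs.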
● R
First launched in 1993 by Ross Ihaka and Robert Gentleman, R was built to put powerful statistical computing and graphical capabilities in the hands of developers, statisticians, analysts, and data miners. It comes with a command-line interface.
When it comes to Data Science, many researchers still prefer R over Python for its powerful statistics-oriented nature and interactive visualization capabilities. Using R's frameworks, you can also create dashboards and interactive visualizations for actionable insights.
R supports a procedural style of programming, which allows developers to break complex portions of a problem into smaller chunks and so makes problem-solving easier.
Advantages
● Comes equipped with a robust set of analysis tools.
● Has a wide range of packages for enhancing its core behavior and capabilities.
● GUIs such as the RStudio IDE and Jupyter add a graphical interface to an already powerful tool, along with features such as integrated help, a code debugger, and code completion.
● Offers powerful data import options, including files from applications such as Microsoft Excel.
● Supports various third-party packages for extensibility.
Disadvantages
● R is difficult to learn and can cause problems if not used carefully.
● Lack of proper documentation for some libraries can waste the developer's efforts.
● Performs relatively slower than Python.
Python vs R: Detailed Comparison
Choosing one language over another for your next Data Science project can be challenging, especially when both languages can carry out the same tasks. Now that the introduction is out of the way, the following sections compare the two languages across a set of notable features that most developers will find helpful.
1. Differences in Data Collection
To facilitate data collection, Python supports a variety of commonly used data formats, such as CSV files, JSON files, and even SQL tables. Ready-made datasets are another data source widely used by data scientists working in Python. With the help of suitable libraries, Python also allows you to extract data directly from the internet.
Although not as versatile as Python, R allows you to import data from Excel, CSV, and text files. Files built with packages such as Minitab or SPSS can also be turned into data frames for use in R. Packages such as rvest and magrittr can help you scrape and clean data from the web.
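As a minimal sketch of the Python side of this comparison, the snippet below reads records from CSV, JSON, and SQL sources using only the standard library. The sample data, table name, and column names are hypothetical, and in-memory strings stand in for real files so that the example is self-contained.

```python
import csv
import io
import json
import sqlite3

# In-memory stand-ins for files on disk (hypothetical sample data).
csv_text = "name,score\nAda,91\nLinus,78\n"
json_text = '[{"name": "Grace", "score": 88}]'

# CSV: DictReader yields one dict per row; every value arrives as a string.
csv_rows = [
    {"name": row["name"], "score": int(row["score"])}
    for row in csv.DictReader(io.StringIO(csv_text))
]

# JSON: json.loads turns the text into Python lists and dicts.
json_rows = json.loads(json_text)

# SQL: an in-memory SQLite database queried with plain SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
conn.execute("INSERT INTO scores VALUES (?, ?)", ("Guido", 95))
sql_rows = [
    {"name": name, "score": score}
    for name, score in conn.execute("SELECT name, score FROM scores")
]
conn.close()

# All three sources end up as the same list-of-dicts shape.
records = csv_rows + json_rows + sql_rows
for record in records:
    print(record["name"], record["score"])
```

In practice, many data scientists would reach for a third-party library such as pandas for the same job. The standard library is used here only to keep the sketch dependency-free.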
2. Differences in Data Exploration
Python's various libraries can help you analyze structured and unstructured data very easily. Libraries such as pandas and NumPy, both available from PyPI, are undoubtedly among the best for data exploration. Pandas, for example, allows you to organize data into data frames and makes cleaning simpler. Moreover, pandas can handle large amounts of data while offering additional benefits.
Built specifically for statisticians and data miners, R delivers exceptional results in data exploration. With R, you can apply a range of tests and techniques to your data, such as probability distributions and data mining. R can perform data optimization, random number generation, and signal processing, and it also supports third-party libraries.
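The pandas workflow mentioned in this section (organizing data into a data frame, cleaning it, then exploring it) might look roughly like the sketch below. It assumes pandas is installed, and the city and temperature values are invented purely for illustration.

```python
import pandas as pd

# Hypothetical measurements with one missing value and one duplicate row.
df = pd.DataFrame(
    {
        "city": ["Oslo", "Oslo", "Lima", "Lima", "Lima"],
        "temp_c": [4.0, None, 19.0, 21.0, 21.0],
        "day": [1, 2, 1, 2, 2],
    }
)

# Cleaning: drop exact duplicate rows, then rows with a missing temperature.
clean = df.drop_duplicates().dropna(subset=["temp_c"])

# Exploration: summary statistics and a per-city average.
print(clean["temp_c"].describe())
mean_by_city = clean.groupby("city")["temp_c"].mean()
print(mean_by_city)
```

Dropping incomplete rows is only one cleaning strategy. Depending on the data, filling missing values may be the better choice.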
3. Differences in Data Visualization
With Python, you can create effective and customizable visualizations in the form of graphs and charts. Tools such as IPython and matplotlib help developers and researchers create powerful, interactive visualizations. Although the Python ecosystem includes many more visualization libraries, matplotlib is the most commonly used.
R, on the other hand, offers advanced visualizations because plotting is among the core functions of the language. R comes with built-in support for many standard graphs. For more complex visualizations, you can use libraries such as ggplot2, Plotly, and lattice.
4. Differences in Data Modeling
For data modeling, Python provides several libraries that cater to the desired modeling type. For numerical modeling, for example, there is NumPy, and for scientific computing there is SciPy. Various other libraries and techniques open up further data modeling options in Python.
In R, you can do statistical modeling efficiently thanks to the robust statistical capabilities of the language. It comes with plenty of supporting packages for statistical modeling, including packages for specific analyses such as the Poisson distribution and linear and logistic regression.
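To make the linear regression mentioned above concrete, here is a minimal ordinary least squares fit written in plain Python. It is an illustrative sketch, not the routine that NumPy, SciPy, or R actually use, and the function name `fit_line` and the sample data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on y = 2x + 1.
hours = [1, 2, 3, 4, 5]
scores = [3, 5, 7, 9, 11]

slope, intercept = fit_line(hours, scores)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")  # slope=2.00, intercept=1.00
```

For real projects, a library routine is usually preferable because it also handles multiple predictors, numerical stability, and diagnostics.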
5. Performance
Performance is a critical aspect of any programming language, and it often becomes the prime reason for picking one language over another. One of the key reasons why many programmers and data scientists are beginning to prefer Python over R is its ability to perform most data science tasks quickly and with relative ease. Python can also run comparatively faster than R, although the gap depends on the workload and the libraries used. Other factors often cited against R include weaker support for practices such as unit testing and less readable code.
Python Performance Tips:
https://wiki.python.org/moin/PythonSpeed/PerformanceTips
https://stackify.com/20-simple-python-performance-tuning-tips/
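Performance claims are easiest to trust when they are measured. As a small sketch in the spirit of the performance tips linked above, the snippet below uses the standard `timeit` module to compare two ways of building the same list. The function names are hypothetical, and the timings will vary between machines and Python versions, so no particular result is assumed.

```python
import timeit


def squares_loop(n):
    """Build the list with an explicit loop and repeated append calls."""
    result = []
    for i in range(n):
        result.append(i * i)
    return result


def squares_comprehension(n):
    """Build the same list with a list comprehension."""
    return [i * i for i in range(n)]


# Both versions must agree before their speed is worth comparing.
assert squares_loop(1000) == squares_comprehension(1000)

# Time 200 calls of each version and report the totals in seconds.
loop_time = timeit.timeit(lambda: squares_loop(1000), number=200)
comp_time = timeit.timeit(lambda: squares_comprehension(1000), number=200)
print(f"loop: {loop_time:.4f}s, comprehension: {comp_time:.4f}s")
```

The comprehension is often the faster of the two in CPython, but the point of the sketch is the measuring habit rather than any fixed ranking.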
6. Libraries
When it comes to packages and libraries, both programming languages offer thousands of useful options for almost every situation.
PyPI hosts and manages Python's packages, whereas R's packages are handled by CRAN. If you're interested in the numbers, at the time of writing PyPI has over 257 thousand packages, while CRAN has a little over 16 thousand. That's a lot!
Although Python offers more than 10 times as many packages as R, not all of them are useful for Data Science. When reading those numbers, one shouldn't forget that Python is a general-purpose programming language, whereas R isn't.
7. Popularity
Both programming languages are fairly popular among developers and data scientists, and both are good additions to their skill sets. Python seems to be taking the lead here due to its general-purpose nature and the availability of several libraries focused on Data Science, but R is not far behind.
According to Stack Overflow, Python is the fastest-growing major programming language.
Many statisticians and data miners still prefer R for its powerful number-crunching and visualization capabilities. Moreover, R provides finer control over data analysis thanks to its focus on statistical and numerical computing and its collection of libraries, which yield more advanced and in-depth results.
The programming language R continues to rise and is on schedule to become TIOBE’s programming language of the year 2020.
8. Job Opportunities
Job opportunities in Data Science are on the rise, and statistics show that more jobs demand Python than R. Both programming languages are needed more now than ever because of the pace at which Data Science is growing.
Python, being an all-round programming language, can be a solid overall choice, since it lets you do software engineering and provides a reputable entry point into Data Science. R will be a much better option if you want to focus on extracting valuable statistics within a short period, making beautiful visualizations that speak for the numbers, and creating graphical interfaces for web applications.
9. Community
A community offers support and guidance to developers, and one could say it is the place a developer visits second most often, after the project code. It is of significant value in quickly finding the root cause of, and the solution to, the problem at hand, while also offering dozens of useful tips.
When we talk about a programming language's community, the first thing that comes to mind is its target users. Usually these are developers, but in our case they include statisticians and data miners as well. Python is used by a diverse audience building applications of all sorts. R, on the other hand, is used mainly by enterprises and researchers whose primary focus is statistics.
Needless to say, both programming languages have an active community of developers and contributors who regularly provide invaluable insight to others and to the language itself.
Python Community —
RStudio Community —
Conclusion
The competing nature of the two languages might help us produce the simplest and the most efficient code for our purposes.
Throughout this article, we discussed a handful of deciding factors that play a leading role in picking between Python and R. We can conclude that even though both languages are respectable choices for Data Science, each has its pros and cons. Learning Python gives you the versatility to work on the majority of Data Science-centric projects, while learning R gives you a stronger hold on the statistics in Data Science. Learning both will undoubtedly give you an upper hand in your upcoming Data Science projects, but we'd like to leave the final decision up to you.
Note: To avoid any misunderstanding, please be aware that this article represents only my personal opinion, and you have every right to disagree with it.
About the Author
Claire D. is a Content Crafter and Marketer at Digitalogy, a tech sourcing and custom matchmaking marketplace that connects people across the globe with pre-screened, top-notch developers and designers based on their specific needs. Connect with Digitalogy on LinkedIn, Twitter, and Instagram.
Translated from: https://towardsdatascience.com/python-vs-and-r-for-data-science-4a32580846a4