How to Boost Your Data Analysis Skills with Python

If you're learning Python, you've likely heard about scikit-learn, NumPy, and pandas. These are all important libraries to learn, but there is more to them than you might initially realize.

Python has many tips and tricks that can help you speed up data science tasks, improve your code, and write it more efficiently.

So I decided to compile some of the most valuable data analysis tips in this article for you.

Profile dataframes in Pandas

The main purpose of profiling is to get a clear understanding of the data, and that is what the Python package pandas-profiling does. It is a simple, fast way to run exploratory analysis on a pandas dataframe.

Exploratory data analysis usually starts with the pandas df.info() and df.describe() functions. But these give only a basic overview of the data, which might not be very helpful if you're dealing with a large data set.
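
As a minimal sketch of those two first steps, the snippet below builds a small dataframe and calls both functions on it. The column names and values are hypothetical and serve only as illustration.

```python
import io

import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame(
    {
        "store": ["A", "B", "C", "D"],
        "revenue": [120.0, 80.5, 210.0, 95.25],
        "orders": [3, 1, 4, 2],
    }
)

# df.info() prints to stdout by default; capture it in a buffer instead.
buffer = io.StringIO()
df.info(buf=buffer)
info_text = buffer.getvalue()

# df.describe() returns a dataframe of summary statistics for numeric columns.
summary = df.describe()

print(info_text)
print(summary)
```

The info output lists each column with its dtype and non-null count, while describe covers only the numeric columns here, which is why neither goes very far on a wide or messy data set.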

The pandas-profiling package extends the pandas dataframe with df.profile_report(), which helps you analyze data quickly. With a single line of code it produces an interactive HTML report containing plenty of information.
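
The sketch below shows one way to build such a report. It assumes the profiling package is installed under one of its two published names (pandas_profiling, or ydata_profiling in newer releases) and returns None otherwise. The helper name build_profile and the sample data are hypothetical. It uses the ProfileReport class directly; df.profile_report() becomes available once the package is imported.

```python
import pandas as pd


def build_profile(df, title="Profile report"):
    """Return a ProfileReport for df, or None if no profiling package is installed."""
    try:
        # Newer releases of the package are published as ydata-profiling.
        from ydata_profiling import ProfileReport
    except ImportError:
        try:
            from pandas_profiling import ProfileReport
        except ImportError:
            return None
    # minimal=True turns off the more expensive computations.
    return ProfileReport(df, title=title, minimal=True)


# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"orders": [3, 1, 4, 2], "revenue": [120.0, 80.5, 210.0, 95.25]})
report = build_profile(df)
```

If a report comes back, report.to_file("report.html") writes the interactive HTML page to disk.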

For a given data set, pandas-profiling computes a range of summary statistics.

Make pandas plots more interactive

The built-in plot() function is a method of the pandas DataFrame class. However, the visualizations it produces are static rather than interactive, so they do not appeal much to a data science audience.

On the other hand, it is easy to plot a chart with the pandas.DataFrame.plot() function. The question then is, how do we plot interactive, Plotly-style charts using pandas without making significant changes to the code?

You can do this with the Cufflinks library, which combines Plotly's power with pandas' flexibility for quick plotting.
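
A minimal sketch of the idea follows, assuming the cufflinks package is installed and compatible with the installed pandas and Plotly versions. The helper name interactive_line and the sample data are hypothetical. Importing cufflinks adds an iplot() method to dataframes, and asFigure=True asks for the Plotly figure object instead of displaying it.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"month": [1, 2, 3, 4], "orders": [3, 1, 4, 2]}).set_index("month")


def interactive_line(frame):
    """Return a Plotly figure for frame, or None if cufflinks is not installed."""
    try:
        import cufflinks  # noqa: F401  (importing it attaches DataFrame.iplot)
    except ImportError:
        return None
    # The static equivalent would be frame.plot(); only the method name changes.
    return frame.iplot(asFigure=True)


fig = interactive_line(df)
```

Inside a notebook, calling cufflinks.go_offline() once and then df.iplot() displays the interactive chart directly.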

You can see the result in the images below.

Both visualizations show the same data. The first is a static chart, while the second is interactive and provides more detail than the first. Yet we got this without making any significant changes to the syntax.

Magic commands

The term 'magic commands' refers to a set of convenience functions in Jupyter Notebooks. They were created to solve many of the common problems experienced in standard data analysis.

There are two kinds of magic commands. First, there are the line magics, which are prefixed with a single % character and operate on one line of input.

The second kind are the cell magics, denoted by the double %% prefix. They work on more than one input line. If the automagic setting is set to 1, you can call line magics without typing the initial %.

Some of these commands come in handy for everyday data analysis tasks. Here are a few of them:

%pastebin

This magic uploads code to a pastebin and returns the URL. A pastebin is an online content-hosting service where you can store plain text (such as source code snippets) and then share the URL with other people.

In fact, a GitHub gist is very similar to a pastebin, but adds version control.

%matplotlib notebook

The %matplotlib inline magic renders static Matplotlib plots within Jupyter notebooks. Replace inline with notebook to get resizable, zoomable plots instead.

But make sure you call the magic before you import the Matplotlib library.

%run

You can use this function to run a Python script in a notebook.

%%writefile

This magic writes the contents of the cell to a file. For example, a cell that begins with %%writefile foo.py saves the code below that line to a file named foo.py in the current directory.
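
Outside a notebook, the combined effect of %%writefile followed by %run can be approximated in plain Python. This is a sketch of the behavior, not how IPython implements the magics. The script contents are hypothetical, and a temporary directory stands in for the current directory.

```python
import runpy
import tempfile
from pathlib import Path

# Source that would sit below the %%writefile foo.py line in a notebook cell.
cell_source = '''\
def greet(name):
    return f"Hello, {name}!"

message = greet("notebook")
'''

with tempfile.TemporaryDirectory() as workdir:
    script = Path(workdir) / "foo.py"
    # Rough equivalent of %%writefile: save the cell text to a file.
    script.write_text(cell_source, encoding="utf-8")
    # Rough equivalent of %run: execute the script and collect its globals.
    namespace = runpy.run_path(str(script))

print(namespace["message"])
```

As with %run, the names defined by the script (here greet and message) are available afterwards, in this case through the returned dictionary.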

%%latex

This magic renders the cell content as LaTeX. It comes in handy when writing mathematical equations and formulae in a cell.
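
A cell like the following should render as a typeset equation. The formula, a sample mean, is an arbitrary choice for illustration.

```latex
%%latex
\begin{equation}
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
\end{equation}
```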

Find and remove errors

The interactive debugger is another magic function. However, in this article it gets a category all its own.

If you run a code cell and get an exception, type %debug in a new cell and run it. This opens an interactive debugging environment that takes you back to the point where the exception happened.

You can also check the values of the variables assigned in the program and perform operations there. When you want to exit the debugger, press q.

Use the -i option when running Python scripts

The usual way to run a Python script from the command line is python hello.py. But if you add -i and run the same script (python -i hello.py), you get more benefits. How?

First of all, when the program ends, Python does not close the interpreter. This means that we can check the values of the different variables and the correctness of the functions defined in the program.
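
To see what -i leaves behind, the sketch below writes a hypothetical hello.py to a temporary directory, runs it with -i, and pipes in one statement in place of typing it at the prompt. The script contents and variable names are assumptions made for the example.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for hello.py.
script_source = "values = [1, 2, 3]\ntotal = sum(values)\n"

with tempfile.TemporaryDirectory() as workdir:
    script = Path(workdir) / "hello.py"
    script.write_text(script_source, encoding="utf-8")
    # -i keeps the interpreter alive after the script ends; the piped input
    # stands in for a statement typed at the >>> prompt.
    result = subprocess.run(
        [sys.executable, "-i", str(script)],
        input="print(total)\n",
        capture_output=True,
        text=True,
        timeout=30,
    )

print(result.stdout)
```

The variable total was defined by the script, yet it can still be printed after the script has finished, because the interpreter session stays open.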

Second, because the interpreter is still available, it is easy to invoke the Python debugger with:

  • import pdb
  • pdb.pm()

From here, we can quickly get to the point where the exception happened and then work on the code.
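
The post-mortem debugger is interactive, so it is hard to show in a static listing. As a non-interactive sketch of what it gives you access to, the snippet below walks a traceback to the frame where the exception was raised and reads that frame's local variables. The average function is a hypothetical example.

```python
import sys


def average(values):
    count = len(values)
    return sum(values) / count  # raises ZeroDivisionError when values is empty


try:
    average([])
except ZeroDivisionError:
    tb = sys.exc_info()[2]
    # Walk to the innermost frame, where the exception was actually raised.
    while tb.tb_next is not None:
        tb = tb.tb_next
    failing_function = tb.tb_frame.f_code.co_name
    failing_locals = dict(tb.tb_frame.f_locals)

print(failing_function, failing_locals)
```

In a real session, pdb.pm() places you in that same frame, where you can print values and count at the debugger prompt.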

Delete and restore

So what do you do when you mistakenly delete one cell within your Jupyter Notebook? Luckily there is a shortcut for you to undo that action.

To recover content that you deleted inside a cell, press Ctrl/Cmd+Z.

If you have deleted an entire cell that you want to recover, press Esc and then Z, or choose Edit > Undo Delete Cells.

Conclusion

This article shared some tips to boost your data analysis skills with Python. These hacks should come in handy for you at some point in your Python data analysis journey.

Translated from: https://www.freecodecamp.org/news/how-to-boost-your-data-analysis-skills-with-python/
