Google Colab
pandas-profiling is an open-source library built on top of pandas. Exploratory data analysis (EDA) generally starts with df.describe(), df.info() and similar calls, each of which has to be run separately. pandas_profiling extends the general data frame report with a single line of code, df.profile_report(), which describes the statistics interactively; you can read more about it in the project's documentation.
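As a rough illustration of the separate calls mentioned above, here is a minimal sketch of the usual starting point for EDA. The toy data frame and its column names are made up for the example; only describe(), info() and profile_report() come from the article.

```python
import io

import pandas as pd

# Hypothetical toy data standing in for a real data set.
df = pd.DataFrame(
    {
        "age": [25, 32, 47],
        "city": ["Paris", "Lima", "Oslo"],
    }
)

# First call: summary statistics for the numeric columns.
summary = df.describe()

# Second call: column dtypes and non-null counts. info() prints to stdout
# by default, so capture it in a buffer to keep it as a string.
buffer = io.StringIO()
df.info(buf=buffer)
info_text = buffer.getvalue()

print(summary)
print(info_text)

# With pandas-profiling installed, the article's single-line alternative is:
#   df.profile_report()
```

Each call answers one question about the data, which is why the single combined report is convenient.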
However, pandas_profiling cannot be used directly on Colab. The code results in the following error:
"concat() got an unexpected keyword argument 'join_axes'"
This is because Google Colab comes with an older version of pandas-profiling (v1) pre-installed, and that version calls pandas.concat() with the join_axes argument, which the pandas version installed on Google Colab no longer accepts.
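One way to check for this kind of version mismatch is to print the versions that are installed in the runtime. The sketch below uses only the standard library; the helper name installed_version is made up for this example, and the distribution names are assumed to be "pandas" and "pandas-profiling".

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Assumed distribution names; adjust them if your environment differs.
for name in ("pandas", "pandas-profiling"):
    print(name, "->", installed_version(name))
```

Running this before and after the install step shows whether the pre-installed version was actually replaced.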
The two main commands for Google Colab are:
! pip install https://github.com/pandas-profiling/pandas-profiling/archive/master.zip
profile.to_notebook_iframe()
Steps: Install Pandas Profiling on Google Colab
1. Run the command below; you can also visit the link on GitHub.
! pip install https://github.com/pandas-profiling/pandas-profiling/archive/master.zip
2. Restart the kernel
3. Re-import the libraries
4. Import and read your data set
5. Define your profile report:
6. However, profile.to_widgets() does not work properly, because it is not yet fully supported on Google Colab, as the snapshot below shows:
7. Instead, use profile.to_notebook_iframe(), as in the snapshot below:
8. And here’s your output:
9. Save your output file in HTML format so you can share it as a web page.
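The steps after the install and restart can be sketched roughly as follows. This is an illustration, not the author's exact notebook: the in-memory CSV, the function name build_and_save_report and the file name report.html are made up for the example, and the import name pandas_profiling follows the article, so treat it as an assumption about what your runtime provides.

```python
import io

import pandas as pd

# Step 3: re-import the libraries after the kernel restart. The import is
# guarded so this sketch still loads where pandas-profiling is missing.
try:
    from pandas_profiling import ProfileReport
except ImportError:
    ProfileReport = None

# Step 4: read the data set. A small in-memory CSV (hypothetical data)
# stands in for something like pd.read_csv("your_file.csv").
csv_text = "age,city\n25,Paris\n32,Lima\n47,Oslo\n"
df = pd.read_csv(io.StringIO(csv_text))


def build_and_save_report(frame, title="Profiling Report", path="report.html"):
    """Steps 5, 7 and 9: define the report, show it inline, save it as HTML."""
    if ProfileReport is None:
        raise RuntimeError("pandas-profiling is not installed")
    profile = ProfileReport(frame, title=title)  # step 5: define the report
    profile.to_notebook_iframe()  # step 7: used instead of to_widgets()
    profile.to_file(path)  # step 9: write a shareable HTML file
    return profile


# In a Colab cell, after the install and restart steps:
#   profile = build_and_save_report(df)
```

Keeping the report steps in one function makes it easy to rerun them on another data frame after the kernel restart.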
pandas_profiling displays a descriptive overview of the data set, showing the number of variables, observations, total missing cells, duplicate rows, memory used and the variable types. It then generates a detailed analysis of each variable, class distributions, interactions, correlations, missing values, samples and duplicated rows, which you can explore by clicking each tab.
I hope this helps you play around with pandas profiling.
Happy exploring!
Translated from: https://medium.com/python-in-plain-english/how-to-use-pandas-profiling-on-google-colab-e34f34ff1c9f