How to Use Pandas Profiling on Google Colab


Pandas-profiling is an open-source library that builds on pandas. Exploratory data analysis (EDA) usually starts with df.describe(), df.info() and similar calls, each of which has to be run separately. Pandas_profiling extends the general data frame report with a single line of code, df.profile_report(), which produces an interactive report of the statistics; you can read more about it on the project's GitHub page.
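As a rough illustration of the separate calls that EDA usually starts with, the sketch below runs them on a small made-up data frame (the column names and values are invented for this example). The profile_report() call appears only as a comment, because it needs pandas_profiling to be installed and imported first.

```python
import io

import pandas as pd

# A tiny, made-up data set used only for illustration.
df = pd.DataFrame(
    {
        "age": [23, 35, 41, 35],
        "city": ["Oslo", "Lima", "Oslo", None],
    }
)

# Call 1: summary statistics for the numeric columns.
summary = df.describe()

# Call 2: column types and non-null counts. info() prints by default,
# so the text is captured in a buffer here.
buffer = io.StringIO()
df.info(buf=buffer)
info_text = buffer.getvalue()

print(summary)
print(info_text)

# With pandas_profiling installed and imported, a single call produces
# the combined, interactive report instead:
#   import pandas_profiling
#   df.profile_report()
```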


However, pandas_profiling cannot be used on Colab straight away. The code results in an error like the one below:

“concat() got an unexpected keyword argument 'join_axes'”

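This is because Google Colab comes with an older version of pandas-profiling (v1) pre-installed, and that version still passes the join_axes argument to pandas.concat(). The argument was deprecated and later removed from pandas, so the call fails with the pandas version installed on Google Colab.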
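The failure can be reproduced without pandas-profiling at all. The sketch below assumes pandas 1.0 or newer, where join_axes is no longer accepted; the two small data frames are invented for the example. It also shows the reindex-based replacement that gives the alignment join_axes used to provide.

```python
import pandas as pd

# Two made-up frames with partly overlapping indexes.
left = pd.DataFrame({"a": [1, 2]}, index=[0, 1])
right = pd.DataFrame({"b": [3, 4]}, index=[1, 2])


def try_join_axes():
    """Call concat() the old way and return the error text, if any."""
    try:
        pd.concat([left, right], axis=1, join_axes=[left.index])
    except TypeError as exc:
        return str(exc)
    return None  # only reached on old pandas versions that still accept it


error_message = try_join_axes()
print(error_message)

# Replacement on current pandas: concatenate first, then reindex the
# result to the index that join_axes used to select.
aligned = pd.concat([left, right], axis=1).reindex(left.index)
print(aligned)
```

Upgrading pandas-profiling, as in the steps that follow, avoids the problem because newer releases no longer pass this argument.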


The two main commands for Google Colab are:

! pip install https://github.com/pandas-profiling/pandas-profiling/archive/master.zip
profile.to_notebook_iframe()

STEPS: Install Pandas Profiling on Google Colab

1. Run the command below; the URL points to the project's repository on GitHub.

! pip install https://github.com/pandas-profiling/pandas-profiling/archive/master.zip 

2. Restart the kernel


3. Re-import the libraries


[Image by author]
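After the restart and re-import, it can help to confirm which versions the runtime actually picked up. The sketch below is one way to do that with the standard library only; the helper name installed_version is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Print what the current runtime sees for the two packages involved.
for name in ("pandas", "pandas-profiling"):
    print(name, installed_version(name) or "not installed")
```

If pandas-profiling still reports a 1.x version here, the runtime was probably not restarted after the install.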

4. Import and read your data set


5. Define your profile report:


[Image by author]
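Since the screenshots are not reproduced here, steps 4 and 5 might look roughly like the sketch below. The inline CSV text, the helper names load_data and build_profile, and the report title are all assumptions made for illustration; in a real notebook the source would be an uploaded file or a URL.

```python
import io

import pandas as pd

# Stand-in for a real file: three made-up rows, one with a missing score.
CSV_TEXT = """name,score,passed
Ana,81,True
Ben,67,False
Cy,,True
"""


def load_data(source):
    """Step 4: read the data set into a data frame."""
    return pd.read_csv(source)


def build_profile(df, title="Pandas Profiling Report"):
    """Step 5: define the profile report for a data frame."""
    # Imported here so the snippet still loads before the upgraded
    # package has been installed and the runtime restarted.
    from pandas_profiling import ProfileReport

    return ProfileReport(df, title=title)


df = load_data(io.StringIO(CSV_TEXT))
print(df.head())

# In the notebook, once the install and restart steps are done:
#   profile = build_profile(df)
```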

6. However, profile.to_widgets() does not work properly, because it is not yet fully supported on Google Colab, as the snapshot below shows:

[Image by author]

7. Instead, switch to profile.to_notebook_iframe(), as in the snapshot below:

[Image by author]
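If the same notebook is meant to run both on Colab and elsewhere, the choice between the two display methods could be wrapped in a small helper. This is only a sketch: the function names are made up, and checking sys.modules for google.colab is a common convention for detecting Colab rather than something the original article uses.

```python
import sys


def running_in_colab():
    """Best-effort check: Colab runtimes have the google.colab module loaded."""
    return "google.colab" in sys.modules


def show_report(profile, in_colab=None):
    """Display a profile report with the method that suits the environment.

    `profile` is expected to offer to_notebook_iframe() and to_widgets(),
    as the report object from step 5 does.
    """
    if in_colab is None:
        in_colab = running_in_colab()
    if in_colab:
        # Widgets are not fully supported on Colab, so use the iframe view.
        profile.to_notebook_iframe()
        return "iframe"
    profile.to_widgets()
    return "widgets"
```

In a Colab cell, show_report(profile) would then take the iframe path on its own.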

8. And here’s your output:


[GIF by author]

9. Save your output as an HTML file so that you can share it as a web page:

[Image by author]
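The screenshot for step 9 is not reproduced here, so the sketch below shows one plausible form of it. It relies on the report object's to_file() method; the helper names, the default file name report.html and the suffix check are assumptions added for this example, as is the optional download helper built on google.colab.files.

```python
from pathlib import Path


def save_report(profile, path="report.html"):
    """Write the report to an HTML file and return the path that was used."""
    target = Path(path)
    if target.suffix.lower() != ".html":
        raise ValueError("expected a file name ending in .html")
    # The path is passed positionally, as a plain string.
    profile.to_file(str(target))
    return target


def download_from_colab(path):
    """Optional: offer the saved file as a browser download (Colab only)."""
    from google.colab import files  # only importable inside Colab

    files.download(str(path))
```

In the notebook this would be called as save_report(profile), after which the HTML file shows up in Colab's file browser.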

Pandas_profiling displays a descriptive overview of the data set, showing the number of variables, observations, total missing cells, duplicate rows, memory used and the variable types. It then generates a detailed analysis of each variable, along with class distributions, interactions, correlations, missing values, samples and duplicated rows, which you can inspect by clicking each tab.
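To make the overview figures concrete, the sketch below computes comparable numbers with plain pandas. It is not how pandas_profiling computes them internally, only an approximation for comparison; the function name dataset_overview and the sample data are made up.

```python
import pandas as pd


def dataset_overview(df):
    """Rough, hand-rolled counterpart of the report's overview section."""
    return {
        "variables": df.shape[1],
        "observations": df.shape[0],
        "missing_cells": int(df.isna().sum().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
        "memory_bytes": int(df.memory_usage(deep=True).sum()),
        "variable_types": df.dtypes.astype(str).value_counts().to_dict(),
    }


# Made-up sample: the second row repeats the first, the third has a gap.
sample = pd.DataFrame({"a": [1, 1, None], "b": ["x", "x", "y"]})
overview = dataset_overview(sample)
for key, value in overview.items():
    print(f"{key}: {value}")
```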


I hope this will help you to play around with Pandas profiling.


Happy exploring!


Translated from: https://medium.com/python-in-plain-english/how-to-use-pandas-profiling-on-google-colab-e34f34ff1c9f
