Data Visualization and Its Importance: Python

Data visualization is an important skill to possess for anyone trying to extract and communicate insights from data. In the field of machine learning, visualization plays a key role throughout the entire process of analysis.

Why do we need to visualize the data?

Let’s say we have a data set of car sales across four continents for the first 11 months of the year.

[Figure: Car Sales from Jan to Nov]

It is cumbersome to analyze each column separately and draw conclusions from the raw data above. What we generally do instead is summarize the data and deduce insights from the summary. To see how sales performed in each continent compared to the others, we’ll calculate the average of Discount and Sales for each continent.

[Figure: Average of Discount and Sales]

It looks like average sales were nearly equal across the continents for the first 11 months. Let’s inspect the data further by also looking at the standard deviation of each column.

[Figure: Standard Deviation across the continents]

From the data above, we might infer that sales performance was the same across the continents. This is exactly where summary statistics tend to mislead.
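The article's own table is not reproduced in the text, so the following is only a minimal sketch of the summary step. It assumes the original Anscombe's quartet values stand in for the car sales data (the article states later that its data is a modified version of the quartet), with x playing the role of Discount, y the role of Sales, and the four sets playing the role of the four continents. The names `quartet` and `summary` are illustrative.

```python
from statistics import mean, stdev

# Assumption: Anscombe's original quartet stands in for the article's table.
# x ~ Discount, y ~ Sales; sets I-IV ~ the four continents (11 rows each).
x_shared = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x_shared, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                     7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x_shared, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                      6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x_shared, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                       6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}

# Compute the mean and sample standard deviation of each column per set.
summary = {}
for name, (x, y) in quartet.items():
    summary[name] = {
        "mean_x": mean(x),
        "mean_y": mean(y),
        "std_x": stdev(x),
        "std_y": stdev(y),
    }

for name, stats in summary.items():
    print(
        f"{name:>3}: mean_x={stats['mean_x']:.2f} mean_y={stats['mean_y']:.2f} "
        f"std_x={stats['std_x']:.2f} std_y={stats['std_y']:.2f}"
    )
```

For all four sets the printed means and standard deviations agree to about two decimal places, which is why a table of summary statistics alone cannot tell the sets apart.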

If we plot Sales against the Discount rate from the above data as a scatter plot in Python, we get the following graphs.

[Figure: Scatter Plot]

Each of the continents employed a different strategy around sales and discount rates, and the sales numbers were also quite different across them. It is difficult to understand the pattern or strategy of each continent from the numbers alone. That is why it is important to visualize the data instead of drawing conclusions based only on numbers.

The above data set is a modified version of Anscombe’s quartet, which was constructed in 1973 by the statistician Francis Anscombe to counter the impression among statisticians that “numerical calculations are exact, but graphs are rough.”
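As a rough illustration of the scatter plots discussed above, the sketch below draws the original Anscombe's quartet with Matplotlib. It is an assumption that these values resemble the article's modified data; the axis labels and the output file name `anscombe_scatter.png` are illustrative choices, not taken from the article.

```python
import matplotlib.pyplot as plt

# Assumption: the original quartet stands in for the article's modified data.
x_shared = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x_shared, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                     7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x_shared, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                      6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x_shared, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                       6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}

# One panel per data set, with shared axes so the shapes are comparable.
fig, axes = plt.subplots(2, 2, figsize=(8, 6), sharex=True, sharey=True)
for ax, (name, (x, y)) in zip(axes.flat, quartet.items()):
    ax.scatter(x, y)
    ax.set_title(f"Data set {name}")
    ax.set_xlabel("x (Discount in the article)")
    ax.set_ylabel("y (Sales in the article)")

fig.tight_layout()
fig.savefig("anscombe_scatter.png")  # use plt.show() for an interactive window
```

The four panels look nothing alike even though their means and standard deviations nearly coincide, which is the point the quartet was built to make.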

You can find more about Anscombe’s quartet here.

So, now comes the million-dollar question:

Which Python Library Should We Use for Data Visualization?

Python has some of the most interactive data visualization tools. The most basic plot types are shared between multiple libraries, but others are only available in certain libraries.

Three of the main data visualization libraries used by data scientists are:

  1. Matplotlib
  2. Seaborn
  3. Plotly

1. Matplotlib

Matplotlib is the most popular data visualization library for Python. It is used to generate simple yet powerful visualizations. From beginners to seasoned data science professionals, it is the most widely used library for plotting.

Advantages:

  1. Matplotlib supports many types of graphical representation, such as bar graphs, histograms, line graphs, scatter plots, and stem plots.
  2. Matplotlib can be used in multiple ways, including Python scripts, the Python and IPython shells, and Jupyter Notebooks.
  3. Matplotlib is primarily a 2-D plotting library, but extensions such as its mplot3d toolkit let us create advanced visualizations like 3-dimensional plots.

[Figure: 3D representation using matplotlib]
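A 3-dimensional plot of the kind mentioned above can be sketched as follows. The surface function, grid size, and output file name `surface_3d.png` are arbitrary choices for illustration and do not come from the article's figure.

```python
import numpy as np
import matplotlib.pyplot as plt

# Build a regular grid and evaluate an example surface on it.
x = np.linspace(-3, 3, 50)
y = np.linspace(-3, 3, 50)
X, Y = np.meshgrid(x, y)
Z = np.sin(np.sqrt(X**2 + Y**2))  # illustrative function, not the article's data

# projection="3d" selects the 3-D axes provided by Matplotlib's mplot3d toolkit.
fig = plt.figure(figsize=(6, 5))
ax = fig.add_subplot(projection="3d")
surface = ax.plot_surface(X, Y, Z, cmap="viridis")
fig.colorbar(surface, ax=ax, shrink=0.6)

ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_zlabel("z")
fig.savefig("surface_3d.png")  # use plt.show() for an interactive window
```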

2. Seaborn

Seaborn is a Python data visualization library based on Matplotlib. It provides a variety of visualization patterns and integrates more closely with pandas DataFrames than Matplotlib does. Seaborn is widely used for statistical visualization because many statistical plotting tasks are built in.

Advantages:

  1. Seaborn requires less syntax, so we write less code to achieve high-quality visualizations.
  2. Compared to Matplotlib, Seaborn graphs are much more visually appealing by default.

[Figure: Matplotlib vs Seaborn using the same dataset]

  3. Seaborn functions operate on the dataset as a whole, whereas with Matplotlib we typically pass individual DataFrame columns or arrays.
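To illustrate the dataset-oriented style described in the list above, here is a minimal sketch that hands a whole pandas DataFrame to Seaborn and lets it split the panels by a column. The data is the original Anscombe's quartet, used here as an assumed stand-in for the article's data; the column names and the file name `anscombe_seaborn.png` are illustrative.

```python
import pandas as pd
import seaborn as sns

# Assumption: the original quartet stands in for the article's modified data.
x_shared = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x_shared, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                     7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x_shared, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                      6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x_shared, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                       6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}

# Long-form table: one row per observation, with a column naming its data set.
rows = [
    (name, x, y)
    for name, (xs, ys) in quartet.items()
    for x, y in zip(xs, ys)
]
df = pd.DataFrame(rows, columns=["dataset", "x", "y"])

# Apply Seaborn's default theme, then let relplot draw one panel per data set.
sns.set_theme()
grid = sns.relplot(data=df, x="x", y="y", col="dataset", col_wrap=2, height=3)
grid.savefig("anscombe_seaborn.png")
```

The single `relplot` call replaces the explicit loop over axes that the equivalent Matplotlib version needs, which is the trade-off the first and third points refer to.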

3. Plotly

Plotly provides interactive plots that are easy to read, even for an audience without much experience in reading plots. Plotly is mostly used for handling geographical, scientific, statistical, and financial data.

Advantages:

  1. Plotly is highly compatible with Jupyter Notebook and web browsers, which makes it easy to share graphs with end users.
  2. Plotly offers interactive contour plots. Matplotlib can also draw contour plots with contour and contourf, but these are static by default.

[Figure: Basic Contour Plot using Plotly]

  3. When we hover the mouse over a Plotly graph, it shows the axis values at that particular point.

Other data visualization libraries are available in Python, such as Bokeh, Altair, and ggplot, but the ones mentioned above are among the most common and widely used.

Conclusion

In this article, we first learned why it is important to visualize data instead of drawing inferences from data sheets alone. We then looked at three data visualization libraries in Python. Many other data visualization tools are available in Python besides the ones discussed above, so it is worth familiarizing yourself with the libraries before committing to a particular approach.

Thank you for reading and Happy Coding!!!

Check out my previous articles about Python here

  • Pandas: Python
  • Matplotlib: Python
  • NumPy: Python
  • Time Complexity and Its Importance in Python
  • Python Recursion or Recursive Function in Python
  • Python Programs to check for Armstrong Number (n digit) and Fenced Matrix
  • Python: Problems for Basics Reference - Swapping, Factorial, Reverse Digits, Pattern Print
Original article: https://levelup.gitconnected.com/data-visualization-and-its-importance-python-7599c1092a09
