Data Visualization and Its Importance: Python

Data visualization is an essential skill for anyone who wants to extract insights from data and communicate them. In machine learning, visualization plays a key role throughout the entire analysis process.

Why do we need to visualize the data?

Let’s say we have a dataset of car sales across four continents for the first 11 months of the year.

Figure: Car Sales from Jan to Nov

Analyzing each column separately and drawing conclusions from the raw data above is cumbersome. What we generally do instead is summarize the data and deduce insights from the summary. To see how sales in each continent compare with the others, we’ll calculate the average Discount and Sales for each continent:

Figure: Average of Discount and Sales

It looks like Sales have been roughly equal across the continents for the first 11 months. Let’s inspect the data further by looking at the standard deviation of each column:

Figure: Standard Deviation across the continents

From these numbers alone, we might infer that sales performance has been the same across all continents. This is exactly where summary statistics can mislead.
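
The article’s car-sales table is only available as an image. As a stand-in, here is a minimal pandas sketch that applies the same summary step to the original Anscombe’s quartet values (four sets of 11 x/y points). The column names dataset, x and y, and the helper quartet_frame, are our own choices for illustration.

```python
import pandas as pd

# Original Anscombe's quartet (1973), not the article's car-sales data.
# Sets I-III share the same x values.
X_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
X_4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
QUARTET = {
    "I": (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (X_4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def quartet_frame() -> pd.DataFrame:
    """Return the quartet in long format: one row per (dataset, x, y) point."""
    rows = [
        {"dataset": name, "x": x, "y": y}
        for name, (xs, ys) in QUARTET.items()
        for x, y in zip(xs, ys)
    ]
    return pd.DataFrame(rows)


df = quartet_frame()

# Mean and sample standard deviation of x and y for each dataset,
# analogous to the per-continent averages and standard deviations above.
summary = df.groupby("dataset")[["x", "y"]].agg(["mean", "std"]).round(2)
print(summary)
```

All four datasets have a mean x of 9.0, a mean y of 7.5, and a y standard deviation of about 2.03. The summary alone would suggest that they behave the same.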

If we plot Sales against the Discount rate for each continent on a scatter plot in Python, we get the following graphs.

Figure: Scatter Plot (Sales vs. Discount rate for each continent)

Each continent used a different discount strategy to boost its sales, and the resulting relationships between discount and sales are quite different. It is difficult to see each continent’s pattern or strategy from the numbers alone. That is why it is important to visualize the data instead of drawing conclusions from numbers only.
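
Plotting the same four sets makes the differences obvious. Below is a minimal Matplotlib sketch. It again assumes the original Anscombe’s quartet values rather than the car-sales data, and the output file name anscombe_scatter.png is arbitrary.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend so the script runs without a display
import matplotlib.pyplot as plt

# Original Anscombe's quartet values (not the article's car-sales data).
X_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
X_4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
QUARTET = {
    "I": (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (X_4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# One scatter panel per dataset, with shared axes so the panels are comparable.
fig, axes = plt.subplots(2, 2, figsize=(8, 6), sharex=True, sharey=True)
for ax, (name, (xs, ys)) in zip(axes.flat, QUARTET.items()):
    ax.scatter(xs, ys)
    ax.set_title(f"Dataset {name}")
for ax in axes[1, :]:
    ax.set_xlabel("x")
for ax in axes[:, 0]:
    ax.set_ylabel("y")

fig.tight_layout()
fig.savefig("anscombe_scatter.png")
```

In the resulting figure, set I looks roughly linear and set II follows a curve. Set III is linear apart from one outlier, and in set IV a single point at x = 19 drives the apparent trend.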

The dataset above is a modified version of Anscombe’s quartet. The quartet is a set of four datasets constructed in 1973 by the statistician Francis Anscombe to counter the impression among statisticians that “numerical calculations are exact, but graphs are rough.”

You can find more about Anscombe’s quartet here.

So now comes the million-dollar question:

Which Python Library Should We Use for Data Visualization?

Python has a rich set of data visualization tools, several of them interactive. The most basic plot types are available in multiple libraries, while others are only available in certain libraries.

The three main data visualization libraries used by data scientists are:

  1. Matplotlib
  2. Seaborn
  3. Plotly

1. Matplotlib

Matplotlib is the most popular data visualization library in Python. It generates simple yet powerful visualizations. Everyone from beginners to seasoned data science professionals uses it for plotting.

Advantages:

  1. Matplotlib supports many kinds of plots, such as bar graphs, histograms, line graphs, scatter plots, and stem plots.
  2. Matplotlib can be used in many environments, including Python scripts, the Python and IPython shells, and Jupyter notebooks.
  3. Matplotlib is primarily a 2-D plotting library, but toolkits such as mplot3d (bundled with Matplotlib) let us create more advanced visualizations like 3-D plots.

Figure: 3D representation using Matplotlib
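
Below is a minimal sketch of the plot types listed above, including a 3-D surface drawn with the mplot3d toolkit that ships with Matplotlib. The data is random or synthetic and for illustration only. The non-interactive Agg backend is selected so the script runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend; no display needed
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers "3d" on older Matplotlib versions)

rng = np.random.default_rng(seed=0)  # illustrative random data only
x = np.arange(1, 8)
y = rng.integers(1, 10, size=x.size)

fig = plt.figure(figsize=(12, 7))

ax = fig.add_subplot(2, 3, 1)
ax.bar(x, y)
ax.set_title("Bar graph")

ax = fig.add_subplot(2, 3, 2)
ax.hist(rng.normal(size=500), bins=20)
ax.set_title("Histogram")

ax = fig.add_subplot(2, 3, 3)
ax.plot(x, y, marker="o")
ax.set_title("Line graph")

ax = fig.add_subplot(2, 3, 4)
ax.scatter(x, y)
ax.set_title("Scatter plot")

ax = fig.add_subplot(2, 3, 5)
ax.stem(x, y)
ax.set_title("Stem plot")

# 3-D surface z = sin(sqrt(x^2 + y^2)) drawn with the mplot3d toolkit.
ax = fig.add_subplot(2, 3, 6, projection="3d")
gx, gy = np.meshgrid(np.linspace(-4, 4, 40), np.linspace(-4, 4, 40))
ax.plot_surface(gx, gy, np.sin(np.sqrt(gx**2 + gy**2)), cmap="viridis")
ax.set_title("3-D surface")

fig.subplots_adjust(hspace=0.4, wspace=0.3)
fig.savefig("matplotlib_plot_types.png")
```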

2. Seaborn

Seaborn is a Python data visualization library built on Matplotlib. It provides a variety of visualization patterns and integrates more closely with pandas DataFrames than Matplotlib does. Seaborn is widely used for statistical visualization because it has many statistical routines built in, such as regression fits and kernel density estimates.

Advantages:

  1. Seaborn’s syntax is more concise, so we write less code to produce high-quality visualizations.
  2. Compared to Matplotlib, Seaborn graphs are more visually appealing by default.

Figure: Matplotlib vs Seaborn using the same dataset

  3. Seaborn works with the dataset as a whole: we pass it an entire DataFrame and refer to columns by name. Matplotlib, by contrast, works with the individual arrays or columns we hand to it.
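
Below is a minimal sketch of the whole-dataset style described above. It assumes seaborn 0.11 or later, which provides set_theme. The relplot function receives the entire DataFrame plus column names and handles faceting, coloring, and the legend in one call. The region, discount, and sales values are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend; no display needed
import pandas as pd
import seaborn as sns

# Hypothetical example data: two regions, six discount levels each.
df = pd.DataFrame(
    {
        "region": ["North"] * 6 + ["South"] * 6,
        "discount": [5, 10, 15, 20, 25, 30] * 2,
        "sales": [110, 125, 138, 150, 158, 163, 140, 139, 141, 138, 142, 140],
    }
)

sns.set_theme()  # switch to seaborn's default styling for all later plots

# A single call: hand over the whole DataFrame and refer to columns by name.
# col= draws one panel per region; hue= colors the points by region.
grid = sns.relplot(data=df, x="discount", y="sales", col="region", hue="region")
grid.savefig("seaborn_relplot.png")
```

Reproducing this in plain Matplotlib would mean filtering the DataFrame once per region, creating the subplots, and building the legend by hand.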

3. Plotly

Plotly provides interactive plots that are easy to read, even for an audience without much experience reading plots. Plotly is mostly used for handling geographical, scientific, statistical, and financial data.

Advantages:

  1. Plotly works well in Jupyter notebooks and web browsers, which makes it easy to share graphs with end users.
  2. Plotly offers interactive contour plots. Many libraries, including Matplotlib, can draw contour plots, but usually only as static images.

Figure: Basic Contour Plot using Plotly

  3. When we hover the mouse over a Plotly graph, it shows the axis values at that particular point.

Python has other data visualization libraries as well, such as Bokeh, Altair, and ggplot. However, the three covered above are the most common and widely used.

Conclusion

In this article, we first learned why it is important to visualize data instead of drawing inferences solely from tables of numbers. We then looked at the main data visualization libraries in Python. Python offers many more visualization tools beyond the ones discussed above, so it is worth familiarizing yourself with the libraries before committing to a particular approach.

Thank you for reading, and happy coding!

Check out my previous articles about Python:

  • Pandas: Python

  • Matplotlib: Python

  • NumPy: Python

  • Time Complexity and Its Importance in Python

  • Python Recursion or Recursive Function in Python

  • Python Programs to check for Armstrong Number (n digit) and Fenced Matrix

  • Python: Problems for Basics Reference — Swapping, Factorial, Reverse Digits, Pattern Print

Original article: https://levelup.gitconnected.com/data-visualization-and-its-importance-python-7599c1092a09
