
Machine learning is a complex discipline. Implementing machine learning models is now far easier than it used to be, thanks in part to Python data libraries such as pandas. Wait, isn't a panda an animal? As I recall, a panda is an animal. That was my reaction in a data science class, but by the end of the class I had completely grasped the concept of pandas.

Pandas is an open-source library, free to use under the BSD license. Wes McKinney started work on it in 2008, and it was open-sourced in 2009. Today we look at the pandas library, an entirely different kind of panda that is not only powerful but also one of the most widely used libraries for data munging and wrangling.

This article is for others like me who might be confused about the connection between the animal and the data library. Note: there is no connection between the panda and the pandas library.

What is pandas?

Pandas is a fast, powerful, flexible, and easy-to-use open-source data analysis and manipulation tool. It is one of the most common tools used by data analysts and data scientists who work with data in Python.

According to Wikipedia, the name is derived from the term “panel data”, an econometrics term for data sets that include observations over multiple time periods for the same individuals. Wikipedia describes pandas as “a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series.”

Before you work with pandas, you have to install it on your system, and the installation steps differ depending on the type of system. The easiest way to install pandas is as part of the Anaconda distribution, a cross-platform distribution for data analysis and scientific computing. It is the recommended installation method for most users. Anaconda is widely used for data work and comes integrated with a number of tools for working with data.
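
Once pandas is installed, a quick way to confirm that Python can find it is to import it and print its version. This is only a minimal sketch: the alias pd is the usual convention rather than a requirement, and the version printed depends on your own installation.

```python
import pandas as pd

# The version string depends on what is installed on your system.
installed_version = pd.__version__
print("pandas version:", installed_version)
```

If the import fails with a ModuleNotFoundError, pandas is not installed in the Python environment you are running.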

Why pandas?

Have you ever tried working with data in Python without the pandas library? If you have not, know that it is hard work, unless you are using a language like R, where the case is different. If you have, then you already understand the need for the library.

The reason pandas is so widely used is that, when you work with tabular data, exploring, cleaning, and processing your data are the very first and most important steps. These steps help you understand the structure of the data: for example, the missing values, the size of the data frame, and the type of data in each column. With pandas, you get a general view of the kind of data that you are working with.
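
As a rough illustration of that first look at a data set, the sketch below checks the size, the column types, and the missing values of a small data frame. The data and the column names (name, age, city) are made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, made up for illustration only.
df = pd.DataFrame(
    {
        "name": ["Ann", "Ben", "Cara", "Dan"],
        "age": [28, np.nan, 35, 41],
        "city": ["Nairobi", "Lagos", None, "Accra"],
    }
)

n_rows, n_cols = df.shape              # size of the data frame
column_types = df.dtypes               # data type of each column
missing_per_column = df.isna().sum()   # number of missing values per column

print(n_rows, n_cols)
print(column_types)
print(missing_per_column)
df.info()                              # compact summary of columns, types, and non-null counts
```

Here shape, dtypes, isna, and info are standard pandas calls; which ones you reach for first is a matter of habit.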

Pandas is suited for many different kinds of data:

- Arbitrary matrix data with row and column labels.
- Ordered and unordered time-series data.
- Tabular data with heterogeneously typed columns, as in an SQL table or Excel spreadsheet. If you work with tabular data, such as data stored in spreadsheets or databases, pandas is the right tool for you.
- Any other form of observational or statistical data sets.
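
To make the kinds of data listed above a little more concrete, here is a small sketch that builds one example of each of the first three. All values, labels, and names (matrix, series, table) are invented for illustration.

```python
import numpy as np
import pandas as pd

# Matrix data with row and column labels (values are made up).
matrix = pd.DataFrame(
    np.arange(6).reshape(2, 3),
    index=["row_a", "row_b"],
    columns=["x", "y", "z"],
)

# Time-series data: one value per day, indexed by date.
dates = pd.date_range("2021-01-01", periods=3, freq="D")
series = pd.Series([1.5, 2.0, 2.5], index=dates)

# Tabular data with heterogeneously typed columns.
table = pd.DataFrame(
    {
        "product": ["pen", "book"],
        "price": [1.2, 8.5],
        "in_stock": [True, False],
    }
)

print(matrix.loc["row_b", "y"])   # look up a cell by its row and column labels
print(series.loc["2021-01-02"])   # look up a value by its date
print(table.dtypes)               # each column keeps its own type
```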


Pandas supports many file formats and data sources out of the box (CSV, Excel, SQL, JSON, Parquet, ...), which adds to its standing as one of the most popular Python libraries. Pandas is commonly used for data analysis. The library supports various data manipulation operations such as merging, reshaping, and selecting, as well as data cleaning and data wrangling.
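
The sketch below strings a few of these operations together: reading CSV, merging, selecting, reshaping, and writing JSON. The CSV text, the table contents, and the variable names are hypothetical, and the CSV is read from an in-memory string only so that the example runs without any files; in practice you would usually pass a file path to read_csv.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice this would usually be a file on disk.
csv_text = """order_id,customer_id,amount
1,10,25.0
2,11,40.0
3,10,15.0
"""
orders = pd.read_csv(io.StringIO(csv_text))

customers = pd.DataFrame(
    {"customer_id": [10, 11], "name": ["Ann", "Ben"]}
)

# Merging: attach the customer name to each order.
merged = orders.merge(customers, on="customer_id", how="left")

# Selecting: keep only the orders with an amount of at least 20.
large_orders = merged[merged["amount"] >= 20]

# Reshaping: pivot the orders into one row per customer name with the summed amount.
totals = merged.pivot_table(index="name", values="amount", aggfunc="sum")

# Writing the result back out, here to a JSON string rather than a file.
json_text = totals.to_json()

print(merged)
print(large_orders)
print(totals)
print(json_text)
```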

Pandas also provides a way to visualize your data, which lets you draw conclusions based on the relationships shown in the plots. Plots are a useful tool for understanding the relationships in the data. You can choose the plot type (scatter, bar, boxplot, ...) that corresponds to your data.
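
As a small, hedged example of choosing a plot type, the sketch below draws a scatter plot and a box plot from a made-up data frame. It assumes Matplotlib is installed, since pandas plotting relies on it by default. The column names are invented, and the image is written to an in-memory buffer only so that the example runs without a display or a file.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed

import pandas as pd

# Hypothetical measurements, made up for illustration only.
df = pd.DataFrame(
    {
        "height_cm": [150, 160, 170, 180],
        "weight_kg": [50, 58, 69, 80],
    }
)

# Scatter plot: how one column relates to another.
scatter_ax = df.plot(
    kind="scatter", x="height_cm", y="weight_kg", title="Height vs. weight"
)

# Box plot: the spread of each numeric column.
box_ax = df.plot(kind="box", title="Spread of each column")

# Save the scatter plot as PNG bytes; pass a file name instead to write a file.
buffer = io.BytesIO()
scatter_ax.figure.savefig(buffer, format="png")
```

In a notebook you would normally skip the backend line and the buffer and just let the plot display inline.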

Summary

Pandas is a package that provides fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. Its goal is to be a fundamental high-level building block for practical, real-world data analysis in Python.

With pandas, you can work with a variety of data, including arbitrary matrix data with row and column labels, ordered and unordered time-series data, tabular data with heterogeneously typed columns (as in an SQL table or Excel spreadsheet), and any other form of observational or statistical data sets.


