5 Fundamental Operations on a Pandas DataFrame

Tips and Tricks for Data Science

Pandas is a powerful and easy-to-use software library written in the Python programming language, and is used for data manipulation and analysis.

Installing pandas: https://pypi.org/project/pandas/

pip install pandas

What is a Pandas DataFrame?

A pandas DataFrame is a two-dimensional data structure that stores data in tabular form. Every row and column is labeled, and the columns can hold data of any type.

Here is an example:

[Image: First 3 rows of the Titanic: Machine Learning from Disaster dataset]

1. Creating a pandas DataFrame

The pandas.DataFrame constructor:

pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)

data: The input used to build the DataFrame. It can be a NumPy ndarray, an iterable, a dict or another DataFrame. An ndarray is a multidimensional container of items of the same type and size. An iterable is any Python object capable of returning its members one at a time, which allows it to be iterated over in a for-loop; lists, tuples and sets are examples. A dict can contain pandas Series, arrays, constants or list-like objects as its values.

index: An Index or array-like object that supplies the row labels of the resulting DataFrame. If the input data carries no index information and no index is provided, it defaults to a RangeIndex.

columns: An Index or array-like object that supplies the column labels of the resulting DataFrame. If no column labels are provided, it defaults to a RangeIndex.

dtype: Each column in a DataFrame has a single data type. This parameter forces a given data type, and only one dtype is allowed. By default, the data type is inferred from the data.

copy: When this parameter is set to True and the input data is a DataFrame or a 2D ndarray, the data is copied into the resulting DataFrame. The signature shown here uses the older default of False; in recent pandas versions the default is None, which behaves like False for DataFrame or 2D ndarray input.
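
The constructor parameters described above can be tried out together in a minimal sketch. The array values and the labels "a", "b", "c", "Age" and "Height" are made up for illustration and do not come from the article.

```python
import numpy as np
import pandas as pd

# A 2D ndarray as input data (illustrative values only).
values = np.array([[25, 170], [18, 165], [30, 180]])

df = pd.DataFrame(
    data=values,
    index=["a", "b", "c"],      # row labels
    columns=["Age", "Height"],  # column labels
    dtype="float64",            # force one dtype for the whole DataFrame
    copy=True,                  # copy the ndarray rather than reuse its memory
)

# With no index or columns given, both default to a RangeIndex.
default_df = pd.DataFrame(values)

print(df)
print(default_df.index, default_df.columns)
```

Leaving out dtype lets pandas infer the type from the array, which here would give integer columns.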

Creating a Pandas DataFrame from a Python Dictionary

import pandas as pd

d = {'Name': ['John', 'Adam', 'Jane'], 'Age': [25, 18, 30]}
pd.DataFrame(d)

The index parameter can be used to change the default row index, and the columns parameter can be used to select dictionary keys and change their order:

d = {'Name': ['John', 'Adam', 'Jane'], 'Age': [25, 18, 30]}
pd.DataFrame(d, index=[10, 20, 30], columns=['Age', 'Name'])

Names passed to columns that are not keys of the dictionary, such as 'First Name' or 'Current Age', do not rename anything; they produce columns filled with missing values.

Creating a Pandas DataFrame from a List

l = [['John', 25], ['Adam', 18], ['Jane', 30]]
pd.DataFrame(l, columns=['Name', 'Age'])

Creating a Pandas DataFrame from a File

In data science work, datasets are commonly stored in files with formats such as CSV (Comma Separated Values). pandas can load data, along with its labels, from a CSV file using the pandas.read_csv() function.

[Image: Example1.csv]
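
As a rough sketch of pandas.read_csv(), the snippet below reads CSV text from an in-memory buffer so that it runs without a file on disk. The CSV content is an assumption; the actual contents of Example1.csv are not shown in the article.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file such as Example1.csv.
csv_text = """Name,Age
John,25
Adam,18
Jane,30
"""

# read_csv accepts a file path or a file-like object.
# By default the first line supplies the column labels.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```

With a real file, the call would take the path instead, for example pd.read_csv('Example1.csv').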

2. Selecting Rows and Columns from a Pandas DataFrame

Selecting Columns from a Pandas DataFrame

Columns can be selected using their column names. Passing a single name returns a Series; passing a list of names returns a DataFrame.

df[[column_1, column_2]]

[Image: Selecting column 'Name' from DataFrame df]
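
A short sketch of column selection, reusing the Name/Age data from the earlier examples (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Adam', 'Jane'], 'Age': [25, 18, 30]})

# A single label returns a Series.
names = df['Name']

# A list of labels returns a DataFrame with those columns, in that order.
subset = df[['Age', 'Name']]

print(names)
print(subset)
```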

Selecting Rows from a Pandas DataFrame

Pandas provides two attributes for selecting rows from a DataFrame: loc and iloc.

loc is label-based, which means that the row label has to be specified, and iloc is integer-based, which means that the integer position of the row has to be specified.

[Image: Using loc and iloc for selecting rows from DataFrame df]
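
The difference between the two is easiest to see when the row labels are not 0, 1, 2. The sketch below assumes a custom index of 10, 20 and 30, chosen only for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['John', 'Adam', 'Jane'], 'Age': [25, 18, 30]},
    index=[10, 20, 30],
)

by_label = df.loc[20]      # the row whose label is 20
by_position = df.iloc[1]   # the second row, whatever its label

# Label slices include the end label; position slices exclude the end position.
label_slice = df.loc[10:20]
position_slice = df.iloc[0:1]

print(by_label)
print(label_slice)
```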

3. Inserting Rows and Columns into a Pandas DataFrame

Inserting Rows into a Pandas DataFrame

One method of inserting a row into a DataFrame is to create a pandas.Series object and add it to the end of the DataFrame using the pandas.DataFrame.append() method. The column labels of the DataFrame serve as the index of the Series object. Note that DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0; on current versions, pandas.concat() does the same job.

[Image: Inserting new row to DataFrame df]
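
Because DataFrame.append() is no longer available in pandas 2.0 and later, the sketch below uses pandas.concat() as a substitute for the approach described above; it is not the article's original code. The row values are made up.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Adam', 'Jane'], 'Age': [25, 18, 30]})

# The Series index matches the DataFrame's column labels.
new_row = pd.Series(['Kelly', 22], index=df.columns)

# Wrap the Series in a one-row DataFrame and concatenate it at the end.
# ignore_index=True renumbers the rows 0..n-1.
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)

print(df)
```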

Inserting Columns into a Pandas DataFrame

One easy way to add a column to a DataFrame is to refer to the new column by name and assign values to it.

[Image: Inserting columns ID, Score and Country to DataFrame df]
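
A minimal sketch of column assignment follows. The column names ID, Score and Country come from the figure caption, while the values are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Adam', 'Jane'], 'Age': [25, 18, 30]})

# A list must have one value per row.
df['ID'] = [101, 102, 103]
df['Score'] = [88.5, 92.0, 79.5]

# A scalar is broadcast to every row.
df['Country'] = 'USA'

print(df)
```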

4. Deleting Rows and Columns from a Pandas DataFrame

Deleting Rows from a Pandas DataFrame

A row can be deleted using the pandas.DataFrame.drop() method with its row label.

[Image: Deleting row with label 1 from DataFrame df]
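
As a small sketch, dropping the row labeled 1 looks like this; the data is the Name/Age example used earlier:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Adam', 'Jane'], 'Age': [25, 18, 30]})

# drop() returns a new DataFrame; df itself is left unchanged.
remaining = df.drop(1)

print(remaining)
```

The remaining rows keep their original labels, so the index is no longer contiguous unless it is reset with reset_index(drop=True).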

To delete a row based on a column value, first obtain the index labels of the matching rows by filtering the DataFrame and reading the DataFrame.index attribute, then pass those labels to the pandas.DataFrame.drop() method.

[Image: Deleting row with Name Kelly from DataFrame df]
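
The two steps can be sketched as follows, assuming a DataFrame that contains a row named Kelly (the other values are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Adam', 'Jane', 'Kelly'],
    'Age': [25, 18, 30, 22],
})

# Step 1: find the index labels of the rows where Name is 'Kelly'.
kelly_labels = df[df['Name'] == 'Kelly'].index

# Step 2: drop the rows with those labels.
df = df.drop(kelly_labels)

print(df)
```

Filtering with df[df['Name'] != 'Kelly'] gives the same rows in one step.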

Deleting Columns from a Pandas DataFrame

A column can be deleted from a DataFrame by its label using the pandas.DataFrame.drop() method. To delete a column by position, look up its label through DataFrame.columns and pass that label to drop().

[Image: Deleting column with label 'Country' from DataFrame df]
[Image: Deleting column with position 2 from DataFrame df]

The axis argument is set to 1 when dropping columns, and 0 when dropping rows.
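
The sketch below shows both ways of dropping a column, plus a row drop for comparison of the axis values. The Country values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Adam', 'Jane'],
    'Age': [25, 18, 30],
    'Country': ['USA', 'UK', 'Canada'],
})

# Drop a column by label (axis=1 means columns).
without_country = df.drop('Country', axis=1)

# Drop a column by position: look up the label at position 2 first.
without_third = df.drop(df.columns[2], axis=1)

# Drop a row by label (axis=0 means rows, which is the default).
without_first_row = df.drop(0, axis=0)

print(without_country)
```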

5. Sorting a Pandas DataFrame

A Pandas DataFrame can be sorted using the pandas.DataFrame.sort_values() method. The by parameter takes the label of the column (or a list of labels) to sort by, and the ascending parameter is set to True to sort in ascending order or False to sort in descending order.

[Image: Sorting DataFrame df by Name in ascending order]
[Image: Sorting DataFrame df by Age in descending order]
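
Both sorts named in the captions can be sketched with the Name/Age example data:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Adam', 'Jane'], 'Age': [25, 18, 30]})

# Sort by Name in ascending order (ascending=True is the default).
by_name = df.sort_values(by='Name', ascending=True)

# Sort by Age in descending order.
by_age = df.sort_values(by='Age', ascending=False)

print(by_name)
print(by_age)
```

sort_values() returns a new DataFrame and keeps the original row labels, so df itself stays in its original order.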

Translated from: https://medium.com/ml-course-microsoft-udacity/5-fundamental-operations-on-a-pandas-dataframe-93b4384dff9d
