Data Analysis: A Complete Introduction to Pandas (Part 1)
Of all the technologies used in data analysis, pandas is one of the most popular and widely used libraries.
Here is what we are going to cover:
- Installation of pandas
- Key components of pandas
- Read/import data from a CSV file
- Write/export data to a CSV file
- Viewing and selecting data
1. Installation of pandas
Let's take care of the boring but important stuff first: setting up a space to work with pandas.
If you are using conda as your environment manager, with Miniconda or Anaconda, then:
- Activate your environment
conda activate ./env
- Install the pandas package
conda install pandas
If you are using a virtual environment with virtualenv, then:
- Activate your environment
source ./env/bin/activate
- Install the pandas package
pip install pandas
If you are using a virtual environment with pipenv, then:
- Create an environment and install pandas in it
pipenv install pandas
- Activate the environment
pipenv shell
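Whichever route you take, a quick way to confirm that the install worked is to import the package from inside the activated environment and print its version. This is just a minimal check, assuming the commands above finished without errors; the variable name is made up for illustration.

```python
import pandas as pd

# If this import succeeds, pandas is available in the active environment.
installed_version = pd.__version__
print("pandas version:", installed_version)
```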
2. Key components of pandas
Pandas provides two compound data types. They are its key components, and they give us a great deal of flexibility in selecting, viewing and manipulating data:
- Pandas Series
- Pandas Data Frame
Pandas Series
It is a one-dimensional labeled array offered by pandas. It can store values of different types (int, string, float, boolean, etc.).
A pandas Series can be created as:
import pandas as pd
# Pass percentages are stored as numbers so they can be used in calculations
student_pass_percentage_in_country = pd.Series([90, 67, 85])
countries = pd.Series(["India", "USA", "China"])
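To make the Series idea a bit more concrete, here is a small sketch that builds two Series and inspects them. The variable names and values are made up for illustration; the point is only to show how the element types affect the dtype and what the default index looks like.

```python
import pandas as pd

# Hypothetical values, for illustration only.
pass_percentage = pd.Series([90, 67, 85])
mixed_values = pd.Series([1, "two", 3.0, True])

print(pass_percentage.dtype)  # an integer dtype, since every value is an int
print(mixed_values.dtype)     # object, since the values have different types
print(pass_percentage.index)  # default labels 0, 1, 2

# Numeric Series support arithmetic helpers such as mean().
average_pass_percentage = pass_percentage.mean()
print(average_pass_percentage)
```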
Pandas Data Frame
This is where most of the magic happens. It is a two-dimensional table; you can think of it as an Excel sheet.
- By default, the index in pandas starts from 0.
- Rows run along axis=0 and columns run along axis=1.
- When a data frame is displayed, the leftmost column shows the index.
- More than one row can share the same index label. So there are two ways of looking up data: by index label and by position. Position also starts from 0.
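The axis convention is easy to mix up, so here is a small sketch of it. The data frame and its labels are hypothetical; it only uses sum() and the index's is_unique attribute to show which direction each axis refers to and that index labels may repeat.

```python
import pandas as pd

# Hypothetical scores; note that the label "a" is used for two rows.
scores = pd.DataFrame(
    {"math": [10, 20, 30], "physics": [1, 2, 3]},
    index=["a", "a", "b"],
)

# axis=0 works down the rows: one result per column.
column_totals = scores.sum(axis=0)
# axis=1 works across the columns: one result per row.
row_totals = scores.sum(axis=1)

print(column_totals)
print(row_totals)

# Index labels do not have to be unique.
labels_are_unique = scores.index.is_unique
print(labels_are_unique)
```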
A pandas data frame can be created as:
student_pass_percent_by_country = pd.DataFrame({'Country': countries, 'Pass Percent': student_pass_percentage_in_country})
3. Read/import data from a CSV file
First, let's see what CSV file data looks like.
A CSV file stores tabular data as plain text: one record per line, with the values in each record separated by commas.
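As an illustration, the sketch below holds a few lines of made-up CSV text in a string and parses them. The names, ages and cities are hypothetical sample data; io.StringIO simply stands in for a file on disk so that the example is self-contained.

```python
import io

import pandas as pd

# Hypothetical CSV content: a header row, then one record per line.
csv_text = """name,age,city
Arun,12,Chennai
Shiva,34,Mumbai
Rafah,45,Delhi
"""

# read_csv accepts file-like objects, so StringIO can replace a real file here.
people = pd.read_csv(io.StringIO(csv_text))
print(people)
```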
Reading CSV data is very straightforward in pandas. The read_csv function accepts either a local file path, read_csv('file_path'), or a URL, read_csv('file_url'), and returns the data as a data frame.
I have taken this data from curran's public repository, so that you can use it as well.
# Use the raw file URL; the github.com/.../blob/... address serves an HTML page, not CSV
csv_data = pd.read_csv('https://raw.githubusercontent.com/curran/data/gh-pages/indiaGovOpenData/All_India_Index-February2016.csv')
Displaying csv_data tells us right away how many rows and columns there are in the data.
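If you would rather get those numbers in code than read them off the display, the shape attribute returns them as a (rows, columns) tuple. The sketch below uses a small made-up data frame in place of csv_data so that it runs without network access; its column names and values are assumptions, not the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for csv_data.
data = pd.DataFrame(
    {
        "state": ["A", "B", "C", "D"],
        "index_value": [101.2, 99.8, 100.5, 102.1],
    }
)

# shape is a (number_of_rows, number_of_columns) tuple.
row_count, column_count = data.shape
print(f"{row_count} rows, {column_count} columns")

# info() prints a per-column summary: dtype and non-null count.
data.info()
```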
4. Write/export data to CSV files
Exporting data to a CSV file is as simple as importing it. Data frames have a method called to_csv('file_name'), which writes the data from the data frame to a CSV file.
csv_data.to_csv('new_exported_data.csv')
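One detail worth knowing: by default to_csv also writes the index as an unnamed first column, which then shows up as an extra "Unnamed: 0" column when the file is read back. Passing index=False avoids that. Here is a small round-trip sketch, using a made-up data frame and a temporary directory so that nothing is left behind on disk.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data, for illustration only.
frame = pd.DataFrame({"name": ["Arun", "Shiva"], "age": [12, 34]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "exported.csv")
    # index=False leaves the index out of the file.
    frame.to_csv(path, index=False)
    # Read the file back while the temporary directory still exists.
    restored = pd.read_csv(path)

print(restored)
```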
5. Viewing and selecting data
We usually work with a lot of data, so being able to view and select it the way we want gives us useful insight into the data right from the start.
To view a snippet of the data (5 rows by default):
csv_data.head()
To view more than 5 records, say 23 records from the top:
csv_data.head(23)
To view a snippet of the data from the bottom:
csv_data.tail()
To view more than 5 records from the bottom, say 11 records:
csv_data.tail(11)
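To see exactly which rows head() and tail() return, it helps to try them on data where the row numbers are obvious. The sketch below uses a hypothetical 100-row data frame whose single column just counts from 0 to 99.

```python
import pandas as pd

# Hypothetical data frame: 100 rows, values 0..99.
numbers = pd.DataFrame({"n": range(100)})

first_five = numbers.head()    # rows 0 to 4
first_23 = numbers.head(23)    # rows 0 to 22
last_five = numbers.tail()     # rows 95 to 99
last_11 = numbers.tail(11)     # rows 89 to 99

print(len(first_five), len(first_23), len(last_five), len(last_11))
```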
To list all the columns in the data:
csv_data.columns
In a pandas data frame, we can assign more than one row to the same index label. By default the index starts from 0, but we can also supply our own labels:
sample_data = pd.DataFrame({'name': ['Arun', 'Shiva', 'Rafah'], 'age': [12, 34, 45]}, index=[1, 1, 2])
One thing you may have noticed above is that a data frame can also be created from plain Python lists.
View the data at index 1:
sample_data.loc[1]
View the data at position 1:
sample_data.iloc[1]
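The difference between the two lookups is easiest to see on sample_data itself, because the label 1 appears twice there. A minimal sketch; the result variable names are made up for illustration.

```python
import pandas as pd

sample_data = pd.DataFrame(
    {"name": ["Arun", "Shiva", "Rafah"], "age": [12, 34, 45]},
    index=[1, 1, 2],
)

# Label lookup: every row whose index label is 1, so a two-row DataFrame.
by_label = sample_data.loc[1]
# Position lookup: only the second row (positions count from 0), so a Series.
by_position = sample_data.iloc[1]

print(by_label)
print(by_position)
```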
You can select a column in two ways:
a. Dot notation:
sample_data.age
b. Bracket (index) notation:
sample_data['age']
The first option (a) will not work if the column name contains spaces or clashes with an existing data frame attribute or method. So pick one notation and stick to it.
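Here is a short sketch of that limitation, using a hypothetical data frame with one column name that is a valid Python identifier and one that contains a space.

```python
import pandas as pd

# Hypothetical data, for illustration only.
results = pd.DataFrame({"Country": ["India", "USA"], "Pass Percent": [90, 67]})

# Dot notation works because "Country" is a valid Python identifier.
countries = results.Country
# "Pass Percent" contains a space, so only bracket notation can reach it.
pass_percent = results["Pass Percent"]

# Selecting several columns takes a list inside the brackets.
subset = results[["Country", "Pass Percent"]]
print(subset)
```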
Selecting only those rows where age is greater than 20:
sample_data[sample_data['age'] > 20]
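Under the hood, the comparison produces a Series of True/False values, and the data frame keeps only the rows marked True. The sketch below spells that out and, as an extra that goes slightly beyond the example above, combines two conditions; the variable names are made up for illustration.

```python
import pandas as pd

sample_data = pd.DataFrame(
    {"name": ["Arun", "Shiva", "Rafah"], "age": [12, 34, 45]},
    index=[1, 1, 2],
)

# The comparison yields one Boolean per row.
mask = sample_data["age"] > 20
older_than_20 = sample_data[mask]

# Combine conditions with & (and) or | (or); each condition needs parentheses.
between_20_and_40 = sample_data[(sample_data["age"] > 20) & (sample_data["age"] < 40)]

print(older_than_20)
print(between_20_and_40)
```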
I have listed only the most used functions here. I plan to keep updating the article, since I will refer to it myself whenever I forget something. If you have any questions or want to discuss a project, feel free to comment here.
Thank you for reading :)
Translated from: https://medium.com/@lax_17478/data-analysis-a-complete-introduction-to-pandas-part-1-3dd06922144a