Pandas for Newbies: An Introduction Part I

For those just starting out in data science, Python is a prerequisite, so if you aren't familiar with Python, go get comfortable with it and then come back here to start on pandas.

You can start learning Python with a series of articles I just started called Minimal Python Required for Data Science.

One of the most important tools in the data science toolbox is pandas, a data analysis library for Python that Wes McKinney developed during his tenure at a hedge fund.

For this entire series of articles, we're going to be using Anaconda, a Python distribution and package manager geared toward data science and machine learning. If you aren't familiar with what I just talked about, go ahead and check out this video, which will teach you about Anaconda and Jupyter Notebook, both of which are central to data science work.

You can activate your conda environment (virtual environment) with:

$ conda activate [name of environment]
# my environment is named `datascience` so
$ conda activate datascience

Once you activate your conda virtual environment you should see this on your Terminal:

(datascience)$

Assuming you have Miniconda or Anaconda installed on your system, you can easily install pandas with:

$ conda install pandas

We’re also going to be using Jupyter Notebook to do our coding so go ahead and

$ 

Then start up Jupyter Notebook with:

$ jupyter notebook

Pandas is the glue that holds it all together

[Image: photo by Juhasz Imre from Pexels]

Pandas becomes more important as we venture further up the data science hierarchy into machine learning, because it lets us "clean" and "wrangle" data before feeding it to algorithms such as random forests and neural networks. If ML algorithms are Doc, then pandas is Marty.

A Guided Bus Tour

[Image: "My favorite." Photo by Venkat Ragavan from Pexels]

One of my favorite places to visit, ever since childhood, is the San Diego Zoo. One thing I always do there is take the guided bus tour while drinking a Blue Moon.

We’re going to do something similar in that I’m going to give a brief tour of just some of the things you can do with Pandas. You’re on your own with the Blue Moon.

Both the data and the inspiration for this Medium series come from Ted Petrou's excellent courses on Dunder Data.

Pandas essentially deals with tabular data: rows and columns. In this respect it’s very much like an Excel spreadsheet.

The two primary objects you'll interface with in pandas are the Series and the DataFrame. A DataFrame is two-dimensional data, complete with rows and columns.

It's okay if you don't know what the code below does; we will go over it in detail later. The data we use here concerns bicycle riders in the city of Chicago, Illinois.

[Image: DataFrame: tabular data]
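
To make the idea concrete, here is a minimal sketch of a DataFrame built by hand. It is not the Chicago dataset: the values are made up for illustration, and apart from gender and wind_speed (both used later in this tour) the column names are my own assumptions.

```python
import pandas as pd

# Hypothetical stand-in for the bike-ride data; values are invented.
# Only `gender` and `wind_speed` are column names taken from the article.
df = pd.DataFrame(
    {
        "trip_id": [1, 2, 3, 4],
        "gender": ["Male", "Female", "Male", "Female"],
        "trip_minutes": [16.5, 10.4, 17.3, 11.1],
        "wind_speed": [12.7, 6.9, 43.7, 42.0],
    }
)

print(df.shape)          # (number of rows, number of columns)
print(list(df.columns))  # the column labels
print(df)                # in a notebook, just `df` renders a table
```

Each key of the dictionary becomes a column and each position in the lists becomes a row, which is the rows-and-columns picture described above.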

A Series is one-dimensional data; with respect to a DataFrame, it is a single column of data:

[Image: Series: a single column of data]
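
As a rough sketch of how the two objects relate, selecting one column from a DataFrame with square brackets gives back a Series. The data below is made up for illustration.

```python
import pandas as pd

# Hypothetical data; only the column names come from the article.
df = pd.DataFrame(
    {
        "gender": ["Male", "Female", "Male"],
        "wind_speed": [12.7, 6.9, 43.7],
    }
)

wind = df["wind_speed"]   # a single column pulled out of the DataFrame

print(type(wind))         # a pandas Series
print(wind.ndim, df.ndim) # 1 dimension versus 2
print(wind.name)          # the Series remembers its column label
```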

As shown above, one of the highlights of pandas is that it can load data into a Jupyter Notebook session from many kinds of sources, whether that's CSV (comma-delimited), XLSX (Excel), SQL, or JSON.
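
Here is a small sketch of that idea using two of those formats. I'm reading from in-memory text so the example runs anywhere; the records are made up. pandas also provides read_excel and read_sql, but those need an Excel engine such as openpyxl and a database connection respectively, so I've left them out here.

```python
import io
import pandas as pd

# The same two hypothetical records, once as CSV text and once as JSON text.
csv_text = "gender,wind_speed\nMale,12.5\nFemale,6.5\n"
json_text = '[{"gender": "Male", "wind_speed": 12.5}, {"gender": "Female", "wind_speed": 6.5}]'

from_csv = pd.read_csv(io.StringIO(csv_text))
from_json = pd.read_json(io.StringIO(json_text))

# Different source formats, same kind of object with the same columns.
print(type(from_csv), type(from_json))
print(list(from_csv.columns), list(from_json.columns))
```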

One of the first things we always do is take a peek at the dataset we’re studying by using the head method. By default head will present the first five rows of the data. We can pass an integer to control how many rows we want to see:

df.head(7)
[Image: first seven rows]

If we want to see the last five rows:

df.tail()
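
If you want to try head and tail without the real dataset, here is a minimal sketch on a made-up ten-row DataFrame (the ride column is hypothetical).

```python
import pandas as pd

# Ten hypothetical rows, numbered 1 through 10.
df = pd.DataFrame({"ride": range(1, 11)})

first_five = df.head()    # default: first five rows
first_seven = df.head(7)  # pass an integer to choose how many rows
last_five = df.tail()     # default: last five rows

print(first_seven["ride"].tolist())
print(last_five["ride"].tolist())
```

Both methods return a new, smaller DataFrame, so they are safe to call as often as you like while exploring.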

Read In Data

We use the read_csv function to load CSV formatted data.

We pass the path to the file containing our data, as a string, to the pandas read_csv function. In my case, I'm using the URL of my GitHub repo, which holds all the data that I will be using. I highly recommend reading the documentation for read_csv, as it's one of the most important and flexible functions in the whole library.
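
Since I can't assume you have my data file, here is a minimal sketch of read_csv that reads made-up CSV text from memory instead of a path or URL. The column names other than gender and wind_speed are assumptions for illustration. The usecols and nrows parameters are two of the many options the function accepts.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk or at a URL.
csv_text = """trip_id,gender,wind_speed
1,Male,12.7
2,Female,6.9
3,Male,43.7
"""

# Read everything. With a real file you would pass its path or URL as a string.
df = pd.read_csv(io.StringIO(csv_text))

# Read only some columns and only the first two rows.
subset = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["gender", "wind_speed"],
    nrows=2,
)

print(df.shape)
print(subset)
```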

Filter Data

We can filter rows of a pandas DataFrame with conditional logic. For programmers familiar with SQL this would be like using the WHERE clause.

To retrieve only the rows where wind_speed is greater than 42.0 we can do this:

[Image: the filt variable stands for 'filter']
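
A minimal sketch of this kind of filter, on made-up data, might look like the following. The filt name and the wind_speed column come from the article; the values are hypothetical.

```python
import pandas as pd

# Hypothetical rides; values are invented for illustration.
df = pd.DataFrame(
    {
        "gender": ["Male", "Female", "Male", "Female"],
        "wind_speed": [12.7, 6.9, 43.7, 42.0],
    }
)

# The comparison produces a boolean Series: True where the condition holds.
filt = df["wind_speed"] > 42.0

# Indexing the DataFrame with that boolean Series keeps only the True rows.
windy = df[filt]

print(filt.tolist())
print(windy)
```

Note that the comparison is strict, so a row with a wind speed of exactly 42.0 is not kept.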

We can filter for more than one condition like this:

Here we filter for rows where the wind speed is greater than 42.0 (I'm assuming miles per hour) and the gender of the bicyclist is female. As we can see, it returns an empty dataset.

To verify that the empty result isn't caused by an error on our part, we can apply the same combined filter for male riders.
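
Here is a rough sketch of combining two conditions, again on made-up data. I built the toy data so that the female filter comes back empty and the male filter does not, mirroring the outcome described above; it says nothing about the real dataset. Each condition goes in its own parentheses and they are joined with &.

```python
import pandas as pd

# Hypothetical rides; values are invented for illustration.
df = pd.DataFrame(
    {
        "gender": ["Male", "Female", "Male", "Female"],
        "wind_speed": [12.7, 6.9, 43.7, 42.0],
    }
)

# & means "and" for boolean Series; the parentheses are required
# because & binds more tightly than > and ==.
filt_female = (df["wind_speed"] > 42.0) & (df["gender"] == "Female")
filt_male = (df["wind_speed"] > 42.0) & (df["gender"] == "Male")

windy_female = df[filt_female]
windy_male = df[filt_male]

print(len(windy_female), len(windy_male))
```

Getting rows back for one group with the same pattern is a quick sanity check that the empty result for the other group reflects the data and not a typo in the filter.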

We can also do something like this:

Query: A Simpler Alternative to Filtering

Pandas also has a query method which is somewhat limited in its abilities, but allows for simpler and more readable code. Just as before, programmers familiar with SQL should feel comfortable with this method.
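
As a minimal sketch on made-up data, the same kind of two-condition filter can be written as a single string passed to query. The threshold variable is my own addition, there to show how a Python variable can be referenced with @.

```python
import pandas as pd

# Hypothetical rides; values are invented for illustration.
df = pd.DataFrame(
    {
        "gender": ["Male", "Female", "Male", "Female"],
        "wind_speed": [12.7, 6.9, 43.7, 42.0],
    }
)

# Column names are written bare inside the query string.
windy_male = df.query("wind_speed > 42.0 and gender == 'Male'")

# Python variables can be pulled in with the @ prefix.
threshold = 42.0
same_result = df.query("wind_speed > @threshold and gender == 'Male'")

print(windy_male)
```

The result is the same DataFrame you would get from the boolean filter, just written in a form that reads closer to a SQL WHERE clause.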

To Be Continued

Pandas for Newbies is meant to be a Medium series, so watch for the next tutorial, Pandas for Newbies: An Introduction Part II, which will be posted soon.

What I do

I help people find Mentors, Code in Python, and Write about Life. If you’re thinking about switching careers into the tech industry or just want to talk you can sign up for my Slack Channel via VegasBlu.

Translated from: https://towardsdatascience.com/pandas-for-newbies-an-introduction-part-i-8246f14efcca
