6 Pandas Operations You Should Not Miss

Data Science

Pandas is used mainly for reading, cleaning, and extracting insights from data. We will look at advanced uses of Pandas that are very important to a Data Scientist. These operations are used to analyze data and manipulate it when required, and they are part of the steps performed before building any machine learning model.

  1. Summarizing Data
  2. Concatenation
  3. Merge and Join
  4. Grouping
  5. Pivot Table
  6. Reshaping multi-index DataFrame

We will be using the very famous Titanic dataset to explore the functionalities of Pandas. Let’s quickly import NumPy and Pandas, and load the Titanic dataset from Seaborn.

import numpy as np
import pandas as pd
import seaborn as sns
df = sns.load_dataset('titanic')
df.head()

Summarizing Data

The very first thing any data scientist would like to know is the statistics of the entire data. With the help of the Pandas .describe() method, we can see the summary stats of each feature. Notice that the stats are given only for numerical columns, which is the expected default behavior; we can also ask the describe function to include categorical columns with the parameter ‘include’ set to ‘all’ (include=‘all’).

df.describe()
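As a quick sketch of the variant mentioned above (not shown in the original output), the same call with the ‘include’ parameter also summarizes the categorical columns; for non-numeric columns, describe() reports count, unique, top, and freq instead of the numeric statistics.

# summary stats for numeric and categorical columns alike
df.describe(include='all')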

Another method is .info(). It gives the metadata of a dataset: the size of the dataset, the dtype of each column, and the count of non-null values in each column.

df.info()

Concatenation

Concatenation of two DataFrames is very straightforward, thanks to the Pandas method concat(). Let us take a small section of our Titanic data with the help of vector indexing. Vector indexing is a way to specify the row and column names/integers we would like to index, in any order, as lists.

smallData = df.loc[[1,7,21,10], ['sex','age','fare','who','class']]
smallData

Also, I have created a dataset with matching columns to explain concatenation.

newData = pd.DataFrame({'sex': ['female', 'male', 'male'],
                        'age': [25, 49, 35],
                        'fare': [89.22, 70.653, 30.666],
                        'who': ['child', 'women', 'man'],
                        'class': ['First', 'First', 'First']})

By default, the concatenation happens row-wise. Let’s see how the new dataset looks when we concat the two DataFrames.

pd.concat([smallData, newData])

What if we want to concatenate ignoring the index? Just set the ignore_index parameter to True.

pd.concat([ newData,smallData], ignore_index=True)

If we wish to concatenate along the columns, we just have to change the axis parameter to 1.

pd.concat([ newData,smallData], axis=1)
Left table: smallData, right table: newData

Notice the changes? As soon as we concatenated column-wise, Pandas arranged the data in the order of the row indices. Rows 0 and 2 are missing from smallData but present in newData, so they were inserted in sequential order. Row 1, however, exists in both datasets, and Pandas retained the data of the first dataset first because that was the first DataFrame we passed to concat. Also, the missing data is represented as NaN.

We can also perform concatenation in SQL-join fashion. Let’s create a new DataFrame ‘newData’ that shares some, but not all, of its columns with smallData.

newData = pd.DataFrame({'fare':[89.22,70.653,30.666,100],
'who':['child', 'women', 'man', 'women'],
'class':['First','First','First','Second'],
'adult_male': [True, False, True, False]})
newData

If you are familiar with SQL join operations, you will notice that .concat() performs an outer join by default. Missing values for unmatched columns are filled with NaN.

pd.concat([smallData, newData])

We can control the type of join operation with the ‘join’ parameter. Let’s perform an inner join, which keeps only the columns common to the two DataFrames.

pd.concat([smallData, newData], join='inner')

Merge and Join

Pandas provides us a dedicated and more efficient method, .merge(), to perform in-memory join operations. The merge method implements a subset of relational algebra, the same foundation that underlies SQL joins.

I will move away from our Titanic dataset for this section only, to make the join operations easier to understand with less complex data.

There are different types of join operations:

  1. One-to-one
  2. Many-to-one
  3. Many-to-many

The classic data used to explain joins in SQL is the employee dataset. Let’s create the DataFrames.

df1 = pd.DataFrame({'employee_name': ['Tasha', 'Linda', 'Olliver', 'Jack'],
                    'department': ['Engineering', 'Accounting', 'HR', 'HR']})
df2 = pd.DataFrame({'employee_name': ['Linda', 'Tasha', 'Jack', 'Olliver'],
                    'salary': [35000, 20500, 90000, 68000]})
Left table: df1, right table: df2

One-to-one

A one-to-one merge is very similar to column-wise concatenation. To combine ‘df1’ and ‘df2’ we use the .merge() method. Merge is capable of recognizing the common column in the two datasets and uses it as the key, in our case the column ‘employee_name’. Also, the names are not in the same order. Let’s see how merge does the work for us while ignoring the indices.

df3 = pd.merge(df1,df2)
df3
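If we prefer to name the shared key explicitly instead of letting merge infer it, the ‘on’ parameter does that; a minimal sketch equivalent to the call above:

# same one-to-one merge, with the join key spelled out explicitly
pd.merge(df1, df2, on='employee_name')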

Many-to-one

Many-to-one is a type of join in which one of the two key columns has duplicate values. Suppose we have a supervisor for each department, and there are many employees in each department: many employees map to one supervisor.

df4 = pd.DataFrame({'department': ['Engineering', 'Accounting', 'HR'],
                    'supervisor': ['Jonas', 'Martha', 'Martin']})
print('--------df4---------\n', df4)
print('-------merged--------')
pd.merge(df3, df4)

Many-to-many

This is the case where the key column in both datasets has duplicate values. If many skills are mapped to each department, the resulting DataFrame will have duplicate entries.

df5 = pd.DataFrame({'department': ['Engineering', 'Engineering', 'Accounting',
                                   'Accounting', 'HR', 'HR'],
                    'skills': ['Coding', 'Soft skills', 'Math', 'Excel',
                               'Organizing', 'Decision making']})
print('--------df5---------\n', df5)
print('\n-------merged--------')
pd.merge(df3, df5)

Merge on uncommon column names and values

Uncommon column names

Often merging is not that simple, since the data we receive will not be so clean. We saw how merge does all the work provided we have one common column. What if we have no common column name at all, or there is more than one common column? Pandas gives us the flexibility to explicitly specify the columns that act as the key in each DataFrame.

Suppose we rename our ‘employee_name’ column to ‘name’ in ‘df2’. Let’s see how the datasets look and how to tell merge the key columns explicitly.

df2 = pd.DataFrame({'name':['Linda', 'Tasha', 'Jack', 'Olliver'],
'salary':[35000, 20500, 90000, 68000]})
print('--------df1---------\n', df1)
print('--------df2---------\n', df2)
print('\n-------merged--------')
pd.merge(df1, df2, left_on='employee_name', right_on='name')

The parameter ‘left_on’ specifies the key column of the first DataFrame and ‘right_on’ the key column of the second. Remember, the value of ‘left_on’ should match a column of the first DataFrame you passed, and ‘right_on’ a column of the second. Notice that we get a redundant column ‘name’; we can drop it if it is not needed, as in the sketch below.

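A minimal sketch of that clean-up step (the variable name ‘merged’ is only for illustration):

merged = pd.merge(df1, df2, left_on='employee_name', right_on='name')
merged = merged.drop(columns='name')   # discard the duplicate key column
merged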

Uncommon values

Previously we saw that all the employee names present in one dataset were also present in the other. What if some names are missing?

df1 = pd.DataFrame({'employee_name': ['Tasha', 'Linda', 'Olliver', 'Jack'],
                    'department': ['Engineering', 'Accounting', 'HR', 'HR']})
df2 = pd.DataFrame({'employee_name': ['Linda', 'Mary'], 'salary': [35000, 20500]})
print('--------df1---------\n', df1)
print('--------df2---------\n', df2)
print('\n-------merged--------\n')
pd.merge(df1, df2)

By default merge applies an inner join, meaning the join is performed only on common values, which is not always the preferred way since there will be data loss. The type of join can be controlled with the parameter ‘how’. We can perform a left join or a right join to overcome the data loss. The missing values will be represented as NaN by Pandas.

print('-------left join--------\n',pd.merge(df1, df2, how='left'))
print('\n-------right join--------\n',pd.merge(df1,df2,how='right'))

GroupBy

GroupBy is a very flexible abstraction; we can think of it as a collection of DataFrames. It allows us to perform many different powerful operations. In simple words, it groups the entire dataset by the values of the column we specify and lets us perform operations on each group to extract insights.

Let’s come back to our Titanic dataset.

Suppose we would like to see how many male and female passengers survived.

print(df.groupby('sex'))
df.groupby('sex').sum()

Notice that printing only the groupby, without performing any operation, gives a GroupBy object. Since there are only two unique values in the column ‘sex’, we see a summation of every other column grouped by male and female. More insightful would be the percentages. We will take only the ‘survived’ column of the groupby result above, sum it, and calculate the percentages.

data = df.groupby('sex')['survived'].sum()
print('% of male survivors', (data['male'] / (data['male'] + data['female'])) * 100)
print('% of female survivors', (data['female'] / (data['male'] + data['female'])) * 100)

Output:
% of male survivors 31.87134502923976
% of female survivors 68.12865497076024

Under the hood, the GroupBy function performs three operations: split-apply-combine.

  1. Split - breaking the DataFrame into groups based on the specified key.
  2. Apply - computing the function we wish, such as an aggregation, transformation, or filter.
  3. Combine - merging the output back into a single DataFrame.
Figure: split-apply-combine (courtesy of the Python Data Science Handbook by Jake VanderPlas)

Perhaps the most powerful operations that can be performed on a groupby are:

  1. Aggregate
  2. Filter
  3. Transform
  4. Apply

Let’s see each one with an example.

Aggregate

The aggregate function allows us to perform more than one aggregation at a time. We just need to pass the list of required aggregations as a parameter to the .aggregate() function.

df.groupby('sex')['survived'].aggregate(['sum', np.mean,'median'])
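.aggregate() also accepts a dictionary that maps column names to the functions to apply, so different columns can receive different statistics; a brief sketch (the particular columns chosen here are just an example):

# sum of 'survived' but mean and max of 'fare', per sex
df.groupby('sex').aggregate({'survived': 'sum', 'fare': ['mean', 'max']})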

Filter

The filter function allows us to drop entire groups based on a group property. Suppose, grouping by ‘survived’, we want to keep only the data where the standard deviation of ‘fare’ is greater than a threshold value, say 50.

df.groupby('survived').filter(lambda x: x['fare'].std() > 50)

Since the standard deviation of ‘fare’ is greater than 50 only for values of ‘survived’ equal to 1, we can see data only where ‘survived’ is 1.

Transform

Transform returns a transformed version of the entire data. The best example to explain this is centering the dataset. Centering the data is nothing but subtracting from each value of a column the mean of that column (computed within each group).

df.groupby('survived').transform(lambda x: x - x.mean())

Apply

Apply is very flexible, unlike filter and transform; the only criterion is that it takes a DataFrame and returns a Pandas object or a scalar. We have the flexibility to do anything we wish inside the function.

def func(x):
    # normalize the fares within each group so that they sum to 1
    x['fare'] = x['fare'] / x['fare'].sum()
    return x

df.groupby('survived').apply(func)

Pivot Tables

Previously, with GroupBy, we saw how ‘sex’ affected survival: the survival rate of females is much higher than that of males. Suppose we would also like to see how ‘pclass’ affected survival, with both ‘sex’ and ‘pclass’ side by side. Using GroupBy we would do something like this.

df.groupby(['sex', 'pclass'])['survived'].aggregate('mean').unstack()

This is more insightful; we can easily make out that passengers in the third-class section of the Titanic were less likely to survive.

This type of operation is very common in analysis. Hence, Pandas provides the function .pivot_table(), which performs the same task with more flexibility and less complexity.

df.pivot_table('survived', index='sex', columns='pclass')

The result of the pivot_table function is a DataFrame, unlike groupby, which returned a GroupBy object. We can perform all the usual DataFrame operations on it.

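For example, ordinary indexing and arithmetic work directly on the result; a small sketch (the variable name ‘pivot’ is only for illustration):

pivot = df.pivot_table('survived', index='sex', columns='pclass')
print(pivot.loc['female'])     # survival rate of women in each class
print((pivot * 100).round(1))  # the same table expressed as percentages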

We can also add a third dimension to our result. Suppose we want to see how ‘age’ has affected the survival rate, along with ‘sex’ and ‘pclass’. Let’s divide ‘age’ into groups: 0–18 child/teenager, 18–40 adult, and 41–80 older.

age = pd.cut(df['age'], [0, 18, 40, 80])
pivotTable = df.pivot_table('survived', ['sex', age], 'class')
pivotTable

Interestingly, female children and teenagers in the second class have a 100% survival rate. This is the kind of power the Pandas pivot table has.

Reshaping a Multi-index DataFrame

To see a multi-index DataFrame from a different view, we reshape it. Stack and unstack are the two methods to accomplish this.

unstack( )

Unstacking is the process of converting a row index level into a column index level. The pivot table we created previously is multi-indexed row-wise. We can move the innermost row index (the age groups) into the innermost column index.

pivotTable = pivotTable.unstack()
pivotTable

We can also convert the outermost row index (sex) into the innermost column index by using the ‘level’ parameter.

pivotTable = pivotTable.unstack(level=0)
pivotTable

stack( )

Stacking is the exact inverse of unstacking. We can convert a column index level of a multi-index DataFrame into a row index level. Here the innermost column index, ‘sex’, is converted to the innermost row index. The result is slightly different from the original DataFrame because we unstacked with level 0 previously.

pivotTable.stack()

These functions and methods are very helpful for understanding data, manipulating it further, or building a predictive model. We can also plot graphs to get visual insights.

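As a closing sketch, the pivot table above can be plotted straight from Pandas (this assumes matplotlib is available, which it is in any environment where seaborn loads):

import matplotlib.pyplot as plt

# bar chart of survival rate by sex and passenger class, built from the pivot table
df.pivot_table('survived', index='sex', columns='pclass').plot(kind='bar')
plt.ylabel('survival rate')
plt.show()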

Translated from: https://medium.com/towards-artificial-intelligence/6-pandas-operations-you-should-not-miss-d531736c6574
