Beyond “Classic” PCA: Functional Principal Components Analysis (FPCA) Applied to Time-Series with Python

FPCA is traditionally implemented in R, but the “FDASRSF” package by J. Derek Tucker can achieve similar (and even better) results in Python.

If you have reached this page, you are probably familiar with PCA.

Principal Components Analysis is part of the Data Science exploration toolkit, as it provides many benefits: reducing the dimensionality of a large dataset, preventing multicollinearity, etc.

There are many articles out there that explain the benefits of PCA. If needed, I suggest having a look at this one, which summarizes my understanding of this methodology:

The intuition behind the “Functional” PCA

In a standard PCA process, we compute the eigenvectors of the dataset's covariance matrix and project the original dataset onto the leading ones. This converts it into a smaller dataset with fewer dimensions that still preserves most of the initial variance (usually 90 or 95%).
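Here is a minimal NumPy sketch of that standard process, not tied to any particular library. The function centers the data, sorts the eigenvectors of the covariance matrix by decreasing eigenvalue, and keeps just enough of them to reach a chosen share of the variance. The helper name `pca_keep_variance` and the 95% threshold are illustrative assumptions.

```python
import numpy as np


def pca_keep_variance(X, threshold=0.95):
    """Project X (n_samples x n_features) onto the fewest principal axes whose
    cumulative explained variance reaches `threshold`. Returns (scores, fraction kept)."""
    # Center each feature: PCA works on deviations from the mean
    Xc = X - X.mean(axis=0)
    # Covariance matrix of the features (n_features x n_features)
    cov = np.cov(Xc, rowvar=False)
    # eigh is meant for symmetric matrices; it returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Cumulative share of the total variance explained by the first k eigenvectors
    explained = np.cumsum(eigvals) / eigvals.sum()
    k = min(int(np.searchsorted(explained, threshold)) + 1, len(eigvals))
    return Xc @ eigvecs[:, :k], explained[k - 1]


# Usage: two nearly collinear features plus a small noise feature
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 1))
X = np.hstack([z, 2 * z + 0.05 * rng.normal(size=(500, 1)), 0.05 * rng.normal(size=(500, 1))])
scores, kept = pca_keep_variance(X, threshold=0.95)
print(scores.shape, kept)
```

The first two features are almost collinear, so a single component is enough to pass the 95% mark in this example.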

[Figure: Initial dataset (blue crosses) and the corresponding first two Eigenvectors]

Now let’s imagine that the patterns of the time-series matter more than their absolute variance. For example, you would like to compare physical phenomena such as signals, temperature variations, production batches, etc. Functional Principal Components Analysis works this way, by determining the corresponding underlying functions!
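The idea can be sketched on discretized curves, assuming every curve is sampled on the same evenly spaced grid. The covariance function replaces the covariance matrix, and its eigenfunctions play the role of the eigenvectors. The grid spacing acts as a quadrature weight, so each eigenfunction has unit L2 norm. This is a textbook-style illustration with a hypothetical helper `discretized_fpca`. It is not the FDASRSF implementation, which among other things aligns the curves first.

```python
import numpy as np


def discretized_fpca(curves, t, n_components=2):
    """Minimal FPCA for curves sampled on a common, evenly spaced grid.

    curves: array (n_curves, n_points); t: array (n_points,).
    Returns the mean function, eigenfunctions (n_components, n_points),
    eigenvalues, and scores (n_curves, n_components)."""
    h = t[1] - t[0]  # grid spacing, used as a quadrature weight
    mean_fn = curves.mean(axis=0)
    centered = curves - mean_fn
    # Discretized covariance function C(s, t)
    cov = centered.T @ centered / (len(curves) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    # The covariance operator is approximated by h * C; rescale to unit L2 norm
    eigenfunctions = (eigvecs[:, order] / np.sqrt(h)).T
    lambdas = h * eigvals[order]
    # Scores: integral of (x_i - mean) * phi_k, approximated by a Riemann sum
    scores = h * centered @ eigenfunctions.T
    return mean_fn, eigenfunctions, lambdas, scores


# Usage: simulated curves varying mostly along sin(2*pi*t), slightly along cos(2*pi*t)
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 101)
a = rng.normal(0, 3, size=40)[:, None]    # large variation along sin
b = rng.normal(0, 0.5, size=40)[:, None]  # small variation along cos
curves = a * np.sin(2 * np.pi * t) + b * np.cos(2 * np.pi * t)
mean_fn, phis, lambdas, scores = discretized_fpca(curves, t)
```

Because most of the simulated variation is along sin(2πt), the first eigenfunction comes out proportional to it (up to sign). The variance of the first column of `scores` equals the first eigenvalue.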

[Figure: Initial dataset (blue crosses) and the corresponding first Function]

Let’s take the example of temperature variations over a year across different locations in a four-season country: we can assume that there is a global trend from cold in winter to hot in summer.

We can also assume that the regions close to the ocean will follow a different pattern than those close to mountains (e.g., smoother temperature variations by the sea vs. extremely low winter temperatures in the mountains).

We will now use this methodology to identify such differences between French regions in 2019. This example is directly inspired by the traditional “Canadian weather” FPCA example developed in R.

Dataset creation with French temperatures by region in 2019

We start by retrieving the daily temperature records by region* in France (available since 2018) and preparing the corresponding dataset.

(*The temperatures are recorded at the “department” level, which is a finer scale than regions in France (96 departments vs. 13 regions). However, we rename “Department” to “Region” to make it easier for readers to follow.)

We select 7 regions spread across France that correspond to different weather patterns (they will be disclosed later on): 06, 25, 59, 62, 83, 85, 75.

import numpy as np
import pandas as pd
# Import the CSV file with only the useful columns
# source: https://www.data.gouv.fr/fr/datasets/temperature-quotidienne-departementale-depuis-janvier-2018/
df = pd.read_csv("temperature-quotidienne-departementale.csv", sep=";", usecols=[0, 1, 4])
# Rename columns to simplify syntax
df = df.rename(columns={"Code INSEE département": "Region", "TMax (°C)": "Temp"})
# Select 2019 records only
df = df[(df["Date"] >= "2019-01-01") & (df["Date"] <= "2019-12-31")]
# Pivot table to get "Date" as index and regions as columns
df = df.pivot(index="Date", columns="Region", values="Temp")
# Select a set of regions across France
df = df[["06", "25", "59", "62", "83", "85", "75"]]
print(df)  # or display(df) in a Jupyter notebook
# Convert the Pandas dataframe to a Numpy array with time-series only (days x regions)
f = df.to_numpy().astype(float)
# Create a float vector between 0 and 1 for time index
time = np.linspace(0, 1, len(f))
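The pivot step above can be checked on a tiny made-up frame. The values below are invented for illustration and are not taken from the data.gouv.fr file. A pivot leaves NaN wherever a department has no record for a given day. The alignment code expects a complete (days x regions) array, so this sketch adds a missing-value check and a linear interpolation along the date axis. That gap-filling step is an assumption of the sketch, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format records: one row per (Date, Region) pair
records = pd.DataFrame({
    "Date": ["2019-01-01", "2019-01-01", "2019-01-02", "2019-01-03", "2019-01-03"],
    "Region": ["06", "75", "06", "06", "75"],
    "Temp": [13.0, 7.0, 14.0, 12.0, 9.0],
})

# Same pivot as above: dates as rows, regions as columns
wide = records.pivot(index="Date", columns="Region", values="Temp")
print(wide.isna().sum())  # number of missing days per region

# Fill gaps by linear interpolation between neighbouring days (assumption, not in the original)
wide = wide.interpolate(method="linear", limit_direction="both")
f = wide.to_numpy().astype(float)
time = np.linspace(0, 1, len(f))
```

Here region "75" has no value for 2019-01-02, and the interpolation fills it with the midpoint of its two neighbours.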

FDASRSF package installation and use on the dataset

To install the FDASRSF package in your current environment, you simply need to run:

pip install fdasrsf

(Note: in my experience, you might need to manually install one or two additional packages to complete the installation. If it fails, check the Anaconda logs to identify them.)

The FDASRSF package by J. Derek Tucker provides a number of interesting functions, and we will use two of them: Functional Alignment and Functional Principal Components Analysis (see the corresponding documentation).

Functional Alignment synchronizes time-series that are not perfectly aligned. The illustration below provides a relatively simple example of this mechanism. The time-series are processed from both the phase and the amplitude perspectives (i.e., the x- and y-axes).

[Figure: Extract from J.D. Tucker et al. / Computational Statistics and Data Analysis 61 (2013) 50–66]
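To make the phase/amplitude distinction concrete, here is a toy sketch. It is not FDASRSF's algorithm, which estimates the warping functions itself. A warping function γ is an increasing map of [0, 1] onto itself with γ(0) = 0 and γ(1) = 1. Composing a curve with it moves features along the x-axis (phase) without changing their height (amplitude). The warping function below is hand-picked and hypothetical, and `warp` is an illustrative helper.

```python
import numpy as np


def warp(f_values, t, gamma_values):
    """Return f composed with gamma on the grid t, i.e. f evaluated at gamma(t), by linear interpolation."""
    return np.interp(gamma_values, t, f_values)


t = np.linspace(0, 1, 201)
f1 = np.exp(-((t - 0.5) / 0.1) ** 2)  # reference curve, peak at t = 0.5
f2 = np.exp(-((t - 0.6) / 0.1) ** 2)  # same shape, peak shifted to t = 0.6 (phase difference)

# Hypothetical piecewise-linear warping: increasing, gamma(0) = 0, gamma(1) = 1, gamma(0.5) = 0.6
gamma = np.interp(t, [0.0, 0.5, 1.0], [0.0, 0.6, 1.0])
f2_aligned = warp(f2, t, gamma)

# Peak location before and after warping
print(t[np.argmax(f2)], t[np.argmax(f2_aligned)])
```

After warping, the peak of `f2` sits at t = 0.5 like the reference curve, and its height is still 1.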

To understand the algorithms involved more precisely, I highly recommend reading “Generative models for functional data using phase and amplitude separation” by J. Derek Tucker, Wei Wu, and Anuj Srivastava.
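The package name refers to the square-root slope function (SRSF), q(t) = sign(f'(t)) sqrt(|f'(t)|), on which the elastic alignment is computed. The sketch below uses hypothetical helpers `srsf` and `srsf_to_f`. It approximates the derivative with `np.gradient` and recovers the original curve from q and its starting value f(0), since q|q| = f'. It illustrates the transform only and is not FDASRSF's own code.

```python
import numpy as np


def srsf(f, t):
    """Square-root slope function q = sign(f') * sqrt(|f'|), using finite differences."""
    df = np.gradient(f, t)
    return np.sign(df) * np.sqrt(np.abs(df))


def srsf_to_f(q, t, f0):
    """Invert the SRSF: f(t) = f(0) + integral of q|q| (which equals f'), by the cumulative trapezoid rule."""
    integrand = q * np.abs(q)
    increments = 0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t)
    return f0 + np.concatenate(([0.0], np.cumsum(increments)))


t = np.linspace(0, 1, 501)
curve = np.sin(2 * np.pi * t) + t
q = srsf(curve, t)
curve_rec = srsf_to_f(q, t, curve[0])
print(np.max(np.abs(curve - curve_rec)))  # small discretization error only
```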

Even though this is quite hard to notice by simply looking at the Original and Warped Data, we can observe that the Warping functions do have some small inflections (see the yellow curve lagging slightly below the y = x diagonal). This means that these functions have synchronized the time-series where needed. (As you might have guessed, temperature records are well aligned by design, since they are captured simultaneously.)
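These small inflections can also be quantified instead of judged by eye. One way is to compute, for each time-series, the maximum of |γ(t) − t|, its distance from the y = x diagonal. The sketch assumes the warping functions are stored column-wise in an (M, N) array, one column per time-series, like the input `f`. After `srsf_align()`, the `fdawarp` object appears to expose them as `warp_f.gam`, but treat that attribute name as an assumption and check it against the package documentation. The helper `max_warp_deviation` and the example warping functions are hypothetical.

```python
import numpy as np


def max_warp_deviation(gam, time):
    """Largest |gamma_i(t) - t| for each warping function (columns of gam, shape (M, N))."""
    return np.max(np.abs(gam - time[:, None]), axis=0)


# Hypothetical example: 3 warping functions on a 5-point grid
time = np.linspace(0, 1, 5)
gam = np.column_stack([
    time,           # identity: no re-alignment
    time ** 1.1,    # slightly below the diagonal (small lag)
    np.sqrt(time),  # strongly warped
])
print(max_warp_deviation(gam, time))
```

Values close to 0, as expected for simultaneously captured temperatures, indicate that alignment changed very little.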

Functional Principal Components Analysis

Now that our dataset is “warped” (aligned), we can run a Functional Principal Components Analysis. The FDASRSF package supports horizontal (phase), vertical (amplitude), or joint analysis. We will use the vertical one and plot the corresponding functions, as well as the coefficients for PC1 & PC2.

import plotly.graph_objects as go
from fdasrsf import fPCA, time_warping
# Functional Alignment: align the time-series (columns of f) on the time grid
warp_f = time_warping.fdawarp(f, time)
warp_f.srsf_align()
warp_f.plot()
# Functional Principal Components Analysis, defined as a vertical analysis
fPCA_analysis = fPCA.fdavpca(warp_f)
# Run the FPCA on a 3-component basis
fPCA_analysis.calc_fpca(no=3)
fPCA_analysis.plot()
# Plot of the 3 principal component functions
fig = go.Figure()
for k in range(3):
    fig.add_trace(go.Scatter(y=fPCA_analysis.f_pca[:, 0, k], mode="lines", name=f"PC{k + 1}"))
fig.update_layout(title_text="<b>Principal Components Analysis Functions</b>", title_x=0.5)
fig.show()
# Coefficients of the PCs for each region (one row per region)
fPCA_coef = fPCA_analysis.coef
# Plot of PC1 vs PC2 coefficients, labelled by region
fig = go.Figure(data=go.Scatter(x=fPCA_coef[:, 0], y=fPCA_coef[:, 1], mode="markers+text", text=df.columns))
fig.update_traces(textposition="top center")
fig.update_layout(autosize=False, width=800, height=700,
                  title_text="<b>Functional Principal Components Analysis on 2019 French Temperatures</b>",
                  title_x=0.5, xaxis_title="PC1", yaxis_title="PC2")
fig.show()
Now we can add the different weather patterns to the plot, according to the weather observed in France:

It is easy to see how well the clustering matches the weather patterns observed in France.

It is also worth mentioning that I chose the departments arbitrarily, based on the places where I live, work, and frequently travel; they were not selected because they gave good results for this demo. I would expect results of similar quality with other regions.

You may be wondering whether a standard PCA would also provide an interesting result.

A plot of the standard PC1 and PC2 extracted from the original dataset shows that standard PCA does not perform as well as FPCA here.
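For comparison, here is one way to compute standard PCA scores for each region with plain NumPy. Each region is treated as an observation and each day as a feature. The author's exact standard-PCA code is not shown, so this is only a sketch. It assumes `f` is the (days x regions) array built earlier. A random stand-in is used so the snippet runs on its own, and the helper `pca_scores` is hypothetical.

```python
import numpy as np


def pca_scores(f, n_components=2):
    """Standard PCA scores for each column of f (shape: n_days x n_regions).

    Each region is an observation and each day a feature, so f is transposed."""
    X = f.T
    Xc = X - X.mean(axis=0)  # center each day across regions
    # SVD of the centered data: rows of Vt are the principal axes
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T  # equals U[:, :k] * S[:k]


# Stand-in for the (365 x 7) temperature array; replace it with the real f
rng = np.random.default_rng(0)
f = rng.normal(15, 5, size=(365, 7))
scores = pca_scores(f)
print(scores.shape)  # one (PC1, PC2) pair per region
```

Plotting `scores[:, 0]` against `scores[:, 1]` with the region codes as labels gives the standard-PCA counterpart of the FPCA scatter plot.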

I hope this article has given you a better understanding of Functional Principal Components Analysis.

I would also like to warmly thank J. Derek Tucker who has been kind enough to patiently guide me through the use of the FDASRSF package.

The complete notebook is stored here.

Original article: https://towardsdatascience.com/beyond-classic-pca-functional-principal-components-analysis-fpca-applied-to-time-series-with-python-914c058f47a0
