Beyond “Classic” PCA: Functional Principal Components Analysis (FPCA) Applied to Time Series with Python

FPCA is traditionally implemented in R, but the “FDASRSF” package from J. Derek Tucker achieves similar (and even better) results in Python.

If you have reached this page, you are probably familiar with PCA.

Principal Components Analysis is part of the data science exploration toolkit because it provides many benefits: reducing the dimensionality of a large dataset, avoiding multicollinearity, etc.

Many articles explain the benefits of PCA and, if needed, I suggest having a look at this one, which summarizes my understanding of the methodology:

The intuition behind “Functional” PCA

In a standard PCA process, we compute eigenvectors to convert the original dataset into a smaller one with fewer dimensions, in which most of the initial dataset variance is preserved (usually 90 or 95%).

Figure: Initial dataset (blue crosses) and the corresponding first two eigenvectors
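The standard PCA step described above can be sketched with plain NumPy. This is a minimal illustration on synthetic two-dimensional data, not the article's dataset; the 95% threshold and all variable names are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, strongly correlated 2-D data (illustrative only)
x = rng.normal(size=200)
data = np.column_stack([x, 0.5 * x + 0.1 * rng.normal(size=200)])

# Center the data and compute the covariance matrix
centered = data - data.mean(axis=0)
cov = np.cov(centered, rowvar=False)

# Eigen-decomposition; eigh returns eigenvalues in ascending order
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Keep the smallest number of components reaching 95% of the variance
explained = np.cumsum(eigvals) / eigvals.sum()
n_components = int(np.searchsorted(explained, 0.95) + 1)

# Project the centered data onto the retained eigenvectors
reduced = centered @ eigvecs[:, :n_components]
print(n_components, reduced.shape)
```

Because the second coordinate is almost a linear function of the first, a single component is enough here to pass the threshold.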

Now let’s imagine that the patterns of the time series matter more than their absolute variance. For example, you might want to compare physical phenomena such as signals, temperature variations, production batches, etc. Functional Principal Components Analysis handles this case by determining the corresponding underlying functions!

Figure: Initial dataset (blue crosses) and the corresponding first function

Let’s take the example of temperature variation over a year across different locations in a four-season country: we can assume that there is a global trend from cold in winter to hot in summer.

We can also assume that regions close to the ocean will follow a different pattern than those close to the mountains (e.g., smoother temperature variations by the sea vs. extremely low winter temperatures in the mountains).

We will now use this methodology to identify such differences between French regions in 2019. This example is directly inspired by the traditional “Canadian weather” FPCA example developed in R.

Dataset creation with French temperatures by region in 2019

We start by getting the daily temperature records available since 2018 for France by region* and preparing the corresponding dataset.

(*The temperatures are recorded at the “department” level, which is a smaller scale than regions in France (96 departments vs. 13 regions). However, we rename “Department” to “Region” to make it easier for readers to follow.)

We select 7 regions spread across France that correspond to different weather patterns (they will be disclosed later on): 06, 25, 59, 62, 83, 85, 75.

import pandas as pd
import numpy as np

# Import the CSV file with only useful columns
# source: https://www.data.gouv.fr/fr/datasets/temperature-quotidienne-departementale-depuis-janvier-2018/
df = pd.read_csv("temperature-quotidienne-departementale.csv", sep=";", usecols=[0,1,4])

# Rename columns to simplify syntax
df = df.rename(columns={"Code INSEE département": "Region", "TMax (°C)": "Temp"})

# Select 2019 records only
df = df[(df["Date"]>="2019-01-01") & (df["Date"]<="2019-12-31")]

# Pivot table to get "Date" as index and regions as columns
df = df.pivot(index='Date', columns='Region', values='Temp')

# Select a set of regions across France
df = df[["06","25","59","62","83","85","75"]]

display(df)

# Convert the Pandas dataframe to a Numpy array with time-series only
f = df.to_numpy().astype(float)

# Create a float vector between 0 and 1 for time index
time = np.linspace(0,1,len(f))
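To see the shape of the data handed to the next steps without downloading the CSV, the reshaping can be tried on a tiny made-up table. The records below are hypothetical values, and the names `records`, `wide`, `f_demo` and `time_demo` are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format records: one row per (date, region) pair
dates = pd.date_range("2019-01-01", periods=4, freq="D").strftime("%Y-%m-%d")
records = pd.DataFrame({
    "Date": np.repeat(dates, 2),
    "Region": ["06", "75"] * 4,
    "Temp": [14.0, 6.0, 15.0, 7.5, 13.5, 5.0, 16.0, 8.0],
})

# Same reshaping as above: dates as rows, one column per region
wide = records.pivot(index="Date", columns="Region", values="Temp")

# One time series per column, plus a normalized time index in [0, 1]
f_demo = wide.to_numpy().astype(float)
time_demo = np.linspace(0, 1, len(f_demo))

print(f_demo.shape)  # (number of days, number of regions)
```

Each column of the resulting array is one region's time series, which is the layout passed to the alignment step later in the article.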

FDASRSF package installation and use on the dataset

To install the FDASRSF package in your current environment, you simply need to run:

pip install fdasrsf

(Note: in my experience, you might need to manually install one or two additional packages to complete the installation properly. If the installation fails, check the Anaconda logs to identify them.)

The FDASRSF package from J. Derek Tucker provides a number of interesting functions, and we will use two of them: Functional Alignment and Functional Principal Components Analysis (see the corresponding documentation below):

Functional Alignment synchronizes time series that are not perfectly aligned. The illustration below provides a relatively simple example of this mechanism. The time series are processed from both the phase and amplitude perspectives (i.e., the x and y axes).

Figure: Extract from J.D. Tucker et al., Computational Statistics and Data Analysis 61 (2013) 50–66
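The idea of a warping function can be illustrated with a few lines of NumPy. In this sketch the warping function `gamma` is chosen by hand so that two bumps line up; the package estimates it automatically, so this shows only what a warp does, not how FDASRSF computes it. All names are illustrative.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 201)

# Two bumps with the same amplitude but different peak times (phase)
f1 = np.exp(-((t - 0.5) ** 2) / 0.01)
f2 = np.exp(-((t - 0.6) ** 2) / 0.01)

# Hand-picked warping function: increasing, gamma(0)=0, gamma(1)=1,
# and gamma(0.5)=0.6 so that the peak of f2 is moved onto the peak of f1
gamma = np.interp(t, [0.0, 0.5, 1.0], [0.0, 0.6, 1.0])

# Warped curve: f2 evaluated at gamma(t)
f2_warped = np.interp(gamma, t, f2)

print(t[np.argmax(f2)], t[np.argmax(f2_warped)])
```

After warping, the peak of the second curve sits at the same time as the peak of the first one, while its height (the amplitude) is unchanged.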

To understand the algorithms involved more precisely, I highly recommend having a look at “Generative models for functional data using phase and amplitude separation” by J. Derek Tucker, Wei Wu, and Anuj Srivastava.

Even though this is hard to notice by simply looking at the original and warped data, we can observe that the warping functions do have some small inflections (see the yellow curve lagging slightly below the x=y line), which means that these functions have synchronized the time series where needed. (As you might have guessed, temperature records are, by design, well aligned since they are captured simultaneously.)

Functional Principal Components Analysis

Now that our dataset is “warped”, we can run a Functional Principal Components Analysis. The FDASRSF package allows horizontal, vertical, or joint analysis. We will use the vertical one and plot the corresponding functions for the first three components, along with the coefficients for PC1 and PC2.

from fdasrsf import fPCA, time_warping, fdawarp, fdahpca

# Functional Alignment
# Align time-series
warp_f = time_warping.fdawarp(f, time)
warp_f.srsf_align()

warp_f.plot()

# Functional Principal Components Analysis

# Define the FPCA as a vertical analysis
fPCA_analysis = fPCA.fdavpca(warp_f)

# Run the FPCA on a 3 components basis
fPCA_analysis.calc_fpca(no=3)
fPCA_analysis.plot()

import plotly.graph_objects as go

# Plot of the 3 functions
fig = go.Figure()

# Add traces
fig.add_trace(go.Scatter(y=fPCA_analysis.f_pca[:,0,0], mode='lines', name="PC1"))
fig.add_trace(go.Scatter(y=fPCA_analysis.f_pca[:,0,1], mode='lines', name="PC2"))
fig.add_trace(go.Scatter(y=fPCA_analysis.f_pca[:,0,2], mode='lines', name="PC3"))

fig.update_layout(title_text='<b>Principal Components Analysis Functions</b>', title_x=0.5)

fig.show()

# Coefficients of PCs against regions
fPCA_coef = fPCA_analysis.coef

# Plot of PCs against regions
fig = go.Figure(data=go.Scatter(x=fPCA_coef[:,0], y=fPCA_coef[:,1], mode='markers+text', text=df.columns))

fig.update_traces(textposition='top center')

# Title refers to 2019, the year selected when building the dataset
fig.update_layout(
    autosize=False, width=800, height=700,
    title_text='<b>Functional Principal Components Analysis on 2019 French Temperatures</b>', title_x=0.5,
    xaxis_title="PC1", yaxis_title="PC2",
)
fig.show()
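As the package name and the `srsf_align` call suggest, the alignment relies on the square-root slope function (SRSF) of each curve, defined in the paper cited above as q(t) = sign(f'(t)) * sqrt(|f'(t)|). The sketch below computes this transform numerically and inverts it. The helpers `srsf` and `srsf_inverse` are hypothetical names written for this illustration and are not the package's API.

```python
import numpy as np

def srsf(f, t):
    """Square-root slope function of a sampled curve f(t)."""
    slope = np.gradient(f, t)
    return np.sign(slope) * np.sqrt(np.abs(slope))

def srsf_inverse(q, t, f0):
    """Recover f from its SRSF: f(t) = f(0) + integral of q*|q|."""
    integrand = q * np.abs(q)
    # Cumulative trapezoidal integration
    steps = 0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t)
    return f0 + np.concatenate([[0.0], np.cumsum(steps)])

t = np.linspace(0.0, 1.0, 1001)
f = np.sin(2 * np.pi * t)

q = srsf(f, t)
f_back = srsf_inverse(q, t, f[0])

max_err = float(np.max(np.abs(f_back - f)))
print(max_err)
```

The round trip recovers the original curve up to a small numerical error, which shows that the SRSF keeps all the information of the curve apart from its starting value.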

Now we can add the different weather patterns to the plot, according to the climates observed in France:

It is easy to see how well the clustering fits the weather observed in France.

It is also important to mention that I chose the departments arbitrarily, based on the places where I live, work, and travel frequently; they were not selected because they gave good results for this demo. I would expect the same quality of results with other regions.

Maybe you are wondering if a standard PCA would also provide an interesting result?

The plot below of the standard PC1 and PC2 extracted from the original dataset shows that standard PCA does not perform as well as FPCA:

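The article does not show how the standard PCA comparison was computed. One plausible way, sketched here under that assumption, is to treat each region's yearly curve as one observation and take the first two principal component scores from an SVD. The data below is synthetic and stands in for the temperature array `f`; every name is illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n_days, n_regions = 365, 7
day = np.linspace(0.0, 1.0, n_days)

# Synthetic stand-in for f: a seasonal cycle with a region-specific
# offset and amplitude, plus daily noise (shape: days x regions)
amplitude = rng.uniform(5, 12, size=n_regions)
offset = rng.uniform(10, 18, size=n_regions)
f_demo = (offset
          - amplitude * np.cos(2 * np.pi * day)[:, None]
          + rng.normal(scale=1.0, size=(n_days, n_regions)))

# Regions as observations, days as features
X = f_demo.T
Xc = X - X.mean(axis=0)

# PCA through the singular value decomposition of the centered matrix
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U[:, :2] * S[:2]            # PC1 and PC2 coordinates per region
explained = S ** 2 / np.sum(S ** 2)  # share of variance per component

print(scores.shape)
```

Plotting `scores[:, 0]` against `scores[:, 1]` gives one point per region, which is the kind of plot compared with the FPCA coefficients above.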

I hope this article has given you a better understanding of Functional Principal Components Analysis.

I would also like to warmly thank J. Derek Tucker who has been kind enough to patiently guide me through the use of the FDASRSF package.

The complete notebook is stored here.

Translated from: https://towardsdatascience.com/beyond-classic-pca-functional-principal-components-analysis-fpca-applied-to-time-series-with-python-914c058f47a0
