7 Statistical Tests to Validate and Help Fit ARIMA Models

What is ARIMA?

ARIMA models are among the most classic and widely used statistical forecasting techniques for univariate time series. They use lagged values of the series and lagged forecast errors to predict future values.

Full form of ARIMA (Image created by Pratik Gandhi)
  • AR (AutoRegressive): uses lagged (previous) values of the series

  • I (Integrated): differences the series to remove non-stationarity

  • MA (Moving Average): uses lagged forecast errors, i.e. a moving average of the error term

Some terms come up very often when working with time-series data, and ARIMA models can be fitted more accurately if we understand these terms, or components of the data, in depth. The following are a few of them:

Trend:

Data is considered to have a trend when there is a long-term increase or decrease in its level, e.g. the steady year-over-year growth in the number of airline passengers.

Photo by Chris Liverani on Unsplash

Seasonality:

Data is considered to have a seasonal pattern if it is influenced by seasonal factors, such as the time of year or the day of the week, so that the pattern repeats over a fixed, known period. For instance, the growth and fall of leaves is driven by the seasons of mother nature.

Photo by Chris Lawton on Unsplash

Cyclicity:

Data is considered to have a cyclic component if there are repeated but non-periodic fluctuations. In simple words, if the pattern is caused by certain circumstances and does not recur after a fixed amount of time, it can be considered cyclicity. For instance, the stock market exhibits cyclic behavior, with highs and lows caused by specific events, and the time between such peaks is never precise.

White Noise:

This is the random and irregular component of the time series. In other words, the residuals left after extracting trend + seasonality + cyclicity from the signal are usually considered white noise. The best example of white noise is the static you got when your TV lost its antenna connection in the 90s (yes, I am a 90s kid!).

Photo by Fran Jacquier on Unsplash
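To make these components concrete, here is a small sketch that builds a hypothetical monthly series as trend + seasonality + white noise. The function name `make_monthly_series` and all the numbers are illustrative assumptions, and a cyclic component is left out because by definition it has no fixed period:

```python
import numpy as np
import pandas as pd

def make_monthly_series(n_months=144, seed=0):
    """Hypothetical monthly series built as trend + seasonality + white noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_months)
    trend = 100 + 2.0 * t                        # steady long-term increase
    seasonal = 20 * np.sin(2 * np.pi * t / 12)   # same shape every 12 months
    noise = rng.normal(0, 5, size=n_months)      # white noise: independent, mean 0, constant variance
    index = pd.date_range("2000-01-01", periods=n_months, freq="MS")
    return pd.Series(trend + seasonal + noise, index=index)

y = make_monthly_series()
# Subtracting the value from 12 months earlier cancels the seasonal term,
# leaving roughly the 12-month trend increase (2.0 * 12 = 24) plus noise.
print(y.diff(12).mean())
```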

Stationarity:

A time series with a constant mean and a constant variance (and an autocovariance that depends only on the lag, not on time) is considered to be stationary. A well-known image that always comes to mind when I think about stationarity is:

(Image source: https://beingdatum.com/time-series-forecasting/)
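One way to see the difference between a stationary and a non-stationary series is to simulate many paths and compare their variance at an early and a late time step. This sketch uses simulated data and a hypothetical helper `variance_ratio` to contrast white noise, whose variance stays constant, with a random walk, whose variance grows with time:

```python
import numpy as np

rng = np.random.default_rng(42)
n_paths, n_steps = 2000, 100
shocks = rng.normal(0, 1, size=(n_paths, n_steps))

white_noise = shocks                     # stationary: mean 0 and variance 1 at every step
random_walk = np.cumsum(shocks, axis=1)  # non-stationary: variance after k steps is k

def variance_ratio(paths, early=9, late=99):
    """Variance across paths at a late step divided by the variance at an early step."""
    return paths[:, late].var() / paths[:, early].var()

wn_ratio = variance_ratio(white_noise)  # close to 1
rw_ratio = variance_ratio(random_walk)  # close to 100 / 10 = 10
print(round(wn_ratio, 2), round(rw_ratio, 2))
```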

The main packages I have used to demonstrate these tests are:

  • statsmodels: https://www.statsmodels.org/stable/index.html

  • pmdarima: http://alkaline-ml.com/pmdarima/index.html

There are a lot of tests, but I am going to talk about a few that I have used and that helped me in my battles with time-series problems:

1. Augmented Dickey-Fuller (ADF) test:

Before applying ARIMA models, the time series should be made stationary using transformation techniques (log, moving average, etc.). The ADF test is one of the most widely used ways to confirm whether a series is stationary: its null hypothesis is that the series has a unit root (is non-stationary), so a significant result indicates stationarity. The data can be found on Kaggle. Below is the code:

Difference between Non-Stationary and Stationary Data with their T-statistic value

To make the data stationary, we applied some transformations to it (shown in the code above). The test statistic of the transformed data is significant, i.e. more negative than the critical value, so we reject the unit-root null hypothesis and conclude that the data is now stationary!

2. PP test:

PP stands for the Phillips-Perron test. In some cases the I in ARIMA, which stands for Integrated, is needed, and differencing with d = 1 or 2 usually does the job. The PP test is a unit root test whose null hypothesis is that the time series is integrated of order 1 (has a unit root). It is also an alternative to the ADF test if we want to check stationarity. Such tests have become quite popular in the analysis of financial time series [3]. Below is the code:

This returns a boolean value (1 or 0) indicating whether the series is stationary or not.

3. KPSS Test:

A widely used test in econometrics is the Kwiatkowski–Phillips–Schmidt–Shin test, abbreviated as KPSS. It is similar to the ADF test but with the hypotheses reversed: its null hypothesis is that an observable time series is stationary around a deterministic trend (or a constant level). A major disadvantage, though, is its high rate of Type I errors, so it is often recommended to combine it with the ADF test and check whether both return the same result [4]. The code is similar to the ADF test, as shown below:

Difference between Non-Stationary and Stationary Data

We can see from the image above that before applying the transformation (figure A), the p-value is < 0.05, so we reject the null hypothesis of stationarity: the data is not stationary. After the transformation (figure B), the p-value rises to 0.1, so the null hypothesis cannot be rejected, which confirms the stationarity of the data.

Before we dive into the next tests, it is important to know that ARIMA models may need to handle a seasonal component, which can be done by adding a few more parameters (P, D, Q, m) to our ARIMA equation. We can broadly divide ARIMA-type models into two types:

  1. ARIMA: handles the non-seasonal components explained at the beginning

  2. SARIMA: Seasonal component + ARIMA

4. CH Test:

The Canova-Hansen (CH) test is mainly used to test for seasonal differencing. Its null hypothesis is that the seasonal pattern is stable over the sample period, against the alternative that it changes over time. It is mostly helpful for economic or meteorological data [5], and it is already implemented in Python in the pmdarima library.

5. OCSB Test:

The Osborn, Chui, Smith, and Birchenhall (OCSB) test is used to determine whether the data needs seasonal differencing (the D component of P, D, Q, m). The pmdarima package has a predefined function that can be used as follows:

Here, we have set m = 12 because the data is monthly. 'aic' is the default lag_method, the metric used to assess the fitted lag models (lower is better); refer to the pmdarima documentation for the other accepted metrics. The output for this data is 1, meaning one seasonal difference is needed, which matches the clearly visible seasonal component.

6. Decompose Plot:

This is one of the tools that can really help when you encounter a time series problem. I think of it as similar to a doctor taking your vitals on your first visit: just as vital signs can reveal some obvious things about a patient, the decompose plot breaks the data down and shows whether there is any clear trend or seasonality, and what pattern the residuals follow. Below is the snippet of the code and the output result:

Decomposition Plot: Subplots showing the original data(top), trend, seasonal and residuals(bottom)

7. ACF and PACF Plot:

ACF and PACF stand for the Autocorrelation Function and the Partial Autocorrelation Function, respectively. Once the time series has been made stationary, their plots help to determine the AR and MA terms needed in a systematic way. Below is the code for the ACF and PACF plots:

Autocorrelation Plot for Airline Passengers data
Partial Autocorrelation Plot for Airline Passengers data

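The lags that fall inside the blue shaded region are not considered significant. The PACF plot is used to choose the AR order: it shows 2 significant lags, which suggests an AR(2) term (AutoRegression with 2 lags). The ACF plot is used to choose the MA order: it shows significant lags up to lag 13, which would point to an MA term with up to 13 lags, although if the ACF decays slowly while the PACF cuts off sharply, a pure AR model is usually the better reading. There are established methods for reading these plots and getting a good estimate of the order of the ARIMA model.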

Conclusion:

There are many other statistical tests that can be used besides those listed above. However, the tests and tools I mentioned here can be really powerful for understanding the data and fitting accurate ARIMA models.

This is my first attempt at writing an article on Medium. I have learned a lot from my fellow writers and the community, and I think this is the best way to share some of my experiences and give back to them.

Original article: https://medium.com/@pratikkgandhi/7-statistical-tests-to-validate-and-help-to-fit-arima-model-33c5853e2e93
