How to fit an ARIMA model: 7 statistical tests to validate and help fit ARIMA models

What is ARIMA?

ARIMA models are among the most classic and widely used statistical forecasting techniques for univariate time series. An ARIMA model uses lagged values of the series and lagged forecast errors to predict future values.

Full form of ARIMA (Image created by Pratik Gandhi)
  • AR (AutoRegressive): uses the lags of previous values

  • I (Integrated): differencing to make a non-stationary series stationary

  • MA (Moving Average): a moving average of past forecast errors (the error term)

Some terms come up very often when working with time-series data. ARIMA models can be fitted more accurately when we understand these terms, or components of the data, well. The following are a few of them:

Trend:

Data is considered to have a trend when it shows a long-term increase or decrease. E.g., the steady growth in airline passengers over the years, or a sustained decline in the number of customers of a business.

Photo by Chris Liverani on Unsplash

Seasonality:

Data is considered to have a seasonal pattern when it is influenced by seasonal factors that repeat over a fixed and known period, such as the time of year or the day of the week. For instance, the growth and fall of leaves are driven by the seasons of Mother Nature.

Photo by Chris Lawton on Unsplash

Cyclicity:

Data is considered to have a cyclic component if there are repeated but non-periodic fluctuations. In simple words, if the pattern is caused by certain circumstances and does not repeat over a fixed period of time, it can be considered cyclic. For instance, the stock market exhibits cyclic behavior, with highs and lows caused by specific events, and the time between such peaks is never fixed.

White Noise:

This is the random and irregular component of the time series. In other words, the residuals left after extracting trend, seasonality, and cyclicity from the signal are usually treated as white noise. The best example of white noise is the static you saw when your TV lost its antenna connection in the 90s (yes, I am a 90s kid!).

Photo by Fran Jacquier on Unsplash
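To make the idea concrete, here is a minimal numpy sketch, assuming Gaussian white noise: it draws independent samples and checks that their sample autocorrelation at small lags is close to zero, which is what separates white noise from a series that still has structure in it. The helper name `sample_autocorr` is hypothetical.

```python
import numpy as np


def sample_autocorr(x, lag: int) -> float:
    """Sample autocorrelation of x at the given lag (lag >= 1)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    # Normalise by the lag-0 sum of squares, as in the usual ACF estimator.
    return float(np.dot(d[:-lag], d[lag:]) / np.dot(d, d))


rng = np.random.default_rng(42)
noise = rng.normal(loc=0.0, scale=1.0, size=2000)  # i.i.d. draws: white noise

# For white noise, autocorrelations should mostly fall within +/- 1.96 / sqrt(n),
# i.e. roughly +/- 0.044 for n = 2000.
for lag in (1, 2, 3):
    print(lag, round(sample_autocorr(noise, lag), 3))
```

By contrast, a series with a repeating pattern, such as a sine wave with period 12, has a large autocorrelation at lag 12.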

Stationarity:

A time series with a constant mean and a constant variance over time (and an autocorrelation that depends only on the lag) is considered to be stationary. A well-known image that always comes to mind when I think about stationarity is:

Image source: https://beingdatum.com/time-series-forecasting/
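A quick way to see what "constant mean and variance" rules out is to compare a random walk, a classic non-stationary series, with its first difference. The numpy sketch below (Gaussian shocks assumed, variable names hypothetical) simulates many random-walk paths and measures the spread across paths at an early and a late time point: the random walk's variance grows with time, while the differenced series, which is just the white-noise shocks, keeps a constant variance.

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths, n_steps = 2000, 1000

# Each row is one path; shocks are i.i.d. N(0, 1).
shocks = rng.normal(size=(n_paths, n_steps))
random_walk = shocks.cumsum(axis=1)          # y_t = y_{t-1} + e_t  (non-stationary)
differenced = np.diff(random_walk, axis=1)   # y_t - y_{t-1} = e_t  (stationary)

early, late = 99, 899  # time indices to compare

# Cross-sectional variance at a fixed time: grows ~t for the random walk, stays ~1 after differencing.
rw_var = (random_walk[:, early].var(), random_walk[:, late].var())
diff_var = (differenced[:, early].var(), differenced[:, late].var())
print("random walk variance (early, late):", [round(float(v), 1) for v in rw_var])
print("differenced variance (early, late):", [round(float(v), 2) for v in diff_var])
```

This is also why differencing (the I in ARIMA) is often enough to make a trending series stationary.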

The main packages I used to demonstrate these tests are:

  • statsmodels: https://www.statsmodels.org/stable/index.html

  • pmdarima: http://alkaline-ml.com/pmdarima/index.html

There are many tests, but I am going to talk about a few that I have used and that have helped me in my battles with time-series problems:

1. Augmented Dickey-Fuller (ADF) test:

Time series should be made stationary using transformation techniques (log, moving average, etc.) before applying ARIMA models. The ADF test is one of the most widely used techniques to confirm whether a series is stationary. Its null hypothesis is that the series has a unit root, i.e. that it is non-stationary. The data used here (Airline Passengers) can be found on Kaggle.

Difference between Non-Stationary and Stationary Data with their T-statistic value

To make the data stationary, we applied a transformation to it. After the transformation, the t-statistic is significant, so we can reject the unit-root null hypothesis and conclude that the data is now stationary.

2. PP test:

PP stands for the Phillips-Perron test. In some cases, the I in ARIMA, which stands for Integrated, is needed; differencing of order 1 or 2 (d = 1 or 2) usually does the job. The PP test is a unit root test that can confirm whether the time series is integrated of order 1. It is also an alternative to the ADF test if you want to check stationarity. Phillips-Perron tests have become quite popular in the analysis of financial time series [3].

The test returns a boolean value (1 or 0) indicating whether or not the series is stationary.

3. KPSS Test:

A widely used test in econometrics is the Kwiatkowski–Phillips–Schmidt–Shin test, abbreviated as the KPSS test. It is similar in purpose to the ADF test, but with the hypotheses reversed: its null hypothesis is that an observable time series is stationary around a mean or a deterministic trend. A major disadvantage, though, is its high rate of Type I errors. It is therefore often recommended to combine it with the ADF test and check whether both return the same result [4]. The code is similar to that of the ADF test.

Difference between Non-Stationary and Stationary Data

We can see from the image above that before the transformation (figure A) the p-value is < 0.05, so we reject the KPSS null hypothesis of stationarity: the data is not stationary. After the transformation (figure B) the p-value rises to 0.1, so stationarity can no longer be rejected and the data can be treated as stationary.

Before we dive into the next tests, it is important to know that ARIMA models may need to handle a seasonal component, which is done by adding a few more parameters (P, D, Q, m) to the ARIMA equation: the seasonal AR order P, the seasonal differencing order D, the seasonal MA order Q, and the number of periods per season m (e.g., 12 for monthly data). ARIMA-type models can broadly be divided into two types:

  1. ARIMA: handles non-seasonal components, as explained at the beginning

  2. SARIMA: seasonal component + ARIMA

4. CH Test:

The Canova-Hansen (CH) test is mainly used to test for seasonal differences: its null hypothesis is that the seasonal pattern is stable over the sample period, against the alternative that it changes over time. This is particularly helpful for economic or meteorological data [5]. The test is already implemented in Python in the pmdarima library.

5. OCSB Test:

The Osborn, Chui, Smith, and Birchenhall (OCSB) test is used to determine whether the data needs seasonal differencing (the D component of P, D, Q, m). The pmdarima package has a predefined function for it that one can leverage.

We set m = 12 because the data is monthly. 'aic' is the default lag_method, i.e. the metric used to assess the candidate models when choosing the number of lags (lower is better); the pmdarima documentation lists the other accepted metrics. The output for this data is 1, which is expected since the seasonal component is clearly visible.

6. Decompose Plot:

This is one of the tools that can really help when you encounter a time series problem. I think of this function as similar to a doctor taking your vitals on your first visit. Just as the vitals can reveal obvious things about a patient, the decompose plot breaks the data down and shows whether there is any clear trend or seasonality, and what pattern the residuals follow. The output for the data is shown below:

Decomposition Plot: Subplots showing the original data (top), trend, seasonal and residuals (bottom)

7. ACF and PACF Plots:

ACF and PACF stand for autocorrelation function and partial autocorrelation function, respectively. Once the time series has been made stationary, the ACF and PACF plots help to determine, in a systematic way, which AR and MA terms are needed. The plots for the data are shown below:

Autocorrelation Plot for Airline Passengers data
Partial Autocorrelation Plot for Airline Passengers data

The lags that fall inside the blue shaded region (the confidence band) are not considered significant. As a rule of thumb, the PACF indicates the order of the AR term and the ACF indicates the order of the MA term. The PACF plot shows 2 significant lags, which points to an AR term with 2 lags, while the ACF plot shows significant autocorrelation up to lag 13, which points to an MA term with up to 13 lags. These rules give a good first estimate of the order of the ARIMA model, which can then be refined.

Conclusion:

There are many other statistical tests that can be used besides those listed above. However, the tests and tools mentioned here are powerful aids for understanding the data and fitting accurate ARIMA models.

This is my first attempt at writing an article on Medium. I have learned a lot from fellow writers and the community, and I think this is the best way to share some of my experience and give back to them.

Translated from: https://medium.com/@pratikkgandhi/7-statistical-tests-to-validate-and-help-to-fit-arima-model-33c5853e2e93
