使用TensorFlow概率预测航空乘客人数

TensorFlow Probability uses structural time series models to conduct time series forecasting. In particular, this library allows for a “scenario analysis” form of modelling — whereby various forecasts regarding the future are made.

TensorFlow概率使用结构时间序列模型进行时间序列预测。 尤其是,该库允许进行“情景分析”形式的建模,从而做出有关未来的各种预测。

Structural time series modelling takes the inherent characteristics of the time series into account when making forecasts. This includes factors such as the local linear trend, seasonal, residual and autoregressive components. The greater the variation surrounding these components — the more uncertain the forecast.

结构时间序列建模在进行预测时会考虑时间序列的固有特征。 这包括局部线性趋势季节残差自回归成分等因素。 这些组件之间的差异越大,预测就越不确定。

The examples illustrated in this article use the template from the Structural Time Series modeling in TensorFlow Probability tutorial, of which the original authors (Copyright 2019 The TensorFlow Authors) have made available under the Apache 2.0 license.

本文中说明的示例使用TensorFlow概率教程中的结构时间序列建模中的模板,该模板的原始作者(Copyright 2019 The TensorFlow Authors)已获得Apache 2.0许可。

联合航空旅客数据 (United Airlines Passenger Data)

For this example, a structural time series model is built in TensorFlow Probability to forecast air passenger data. The data is sourced from San Francisco Open Data: Air Traffic Passenger Statistics.

对于此示例,在TensorFlow概率中构建了一个结构时间序列模型来预测航空乘客数据。 该数据来自“旧金山开放数据:空中交通旅客统计” 。

In particular, passenger numbers for United Airlines from February 2014 — June 2020 are analysed. The specific segment of passengers analysed are enplaned, domestic, departing from Terminal 3 at Boarding Area E.

特别是分析了2014年2月至2020年6月联合航空的乘客数量。 从3号航站楼E登机区出发的经过分析的特定旅客是国内旅客。

Here is a visual overview of the time series:

这是时间序列的直观概述:

We can see that passenger numbers have traditionally ranged between 200,000 to 350,000 — before plummeting to a low of 7,115 in May 2020.

我们可以看到,旅客人数传统上介于200,000至350,000之间,然后在2020年5月跌至7,115的低点。

It is wishful thinking to expect that any time series model would have been able to forecast this — such a drop was very sudden and completely out of line with the overall trend.

一厢情愿的期望是,任何时间序列模型都能够预测到这一点-这种下降是非常突然的,并且与总体趋势完全不符。

However, could TensorFlow Probability have potentially identified a drop of a similar scale? Let’s find out.

但是,TensorFlow概率是否有可能识别出类似规模的下降? 让我们找出答案。

TensorFlow概率模型 (TensorFlow Probability Model)

The model is fitted with a local linear trend, along with a monthly seasonal effect.

该模型符合局部线性趋势以及每月的季节性影响。

def build_model(observed_time_series):
trend = sts.LocalLinearTrend(observed_time_series=observed_time_series)
seasonal = tfp.sts.Seasonal(
num_seasons=12, observed_time_series=observed_time_series)
residual_level = tfp.sts.Autoregressive(
order=1,
observed_time_series=observed_time_series, name='residual')
autoregressive = sts.Autoregressive(
order=1,
observed_time_series=observed_time_series,
name='autoregressive')
model = sts.Sum([trend, seasonal, residual_level, autoregressive], observed_time_series=observed_time_series)
return model

Note that since autocorrelation is detected as being present in the series — an autoregressive component is also added to the model.

请注意,由于检测到序列中存在自相关,因此还将自回归分量添加到模型中。

Here is a plot of the autocorrelation function for the series:

这是该系列的自相关函数的图:

Image for post
Source: Jupyter Notebook Output
资料来源:Jupyter Notebook输出

The time series is split into training and test data for the purposes of comparing the forecasts with the actual values.

时间序列分为训练和测试数据,目的是将预测值与实际值进行比较。

The forecast is made using the assumption of a posterior distribution — that is, a distribution comprised of the prior distribution (prior data) and a likelihood function.

预测是使用后验分布 (即由先验分布(先验数据)和似然函数组成的分布)的假设进行的。

Image for post
Source: Image Created by Author
资料来源:作者创作的图片

In order to effect this forecast, the TensorFlow Probability model minimises the loss in the variational posterior as follows:

为了实现此预测,TensorFlow概率模型将变后验中的损失最小化,如下所示:

#@title Minimize the variational loss.# Allow external control of optimization to reduce test runtimes.
num_variational_steps = 200 # @param { isTemplate: true}
num_variational_steps = int(num_variational_steps)optimizer = tf.optimizers.Adam(learning_rate=.1)
# Using fit_surrogate_posterior to build and optimize the variational loss function.@tf.function(experimental_compile=True)
def train():
elbo_loss_curve = tfp.vi.fit_surrogate_posterior(
target_log_prob_fn=tseries_model.joint_log_prob(
observed_time_series=tseries_training_data),
surrogate_posterior=variational_posteriors,
optimizer=optimizer,
num_steps=num_variational_steps)
return elbo_loss_curveelbo_loss_curve = train()plt.plot(elbo_loss_curve)
plt.title("Loss curve")
plt.show()# Draw samples from the variational posterior.
q_samples_tseries_ = variational_posteriors.sample(50)

Here is a visual of the loss curve:

这是损耗曲线的外观:

Image for post
Source: TensorFlow Probability
资料来源:TensorFlow概率

预报 (Forecasts)

20 samples (or 20 separate forecasts) are made using the model:

使用该模型制作了20个样本(或20个单独的预测):

# Number of scenarios
num_samples=20tseries_forecast_mean, tseries_forecast_scale, tseries_forecast_samples = (
tseries_forecast_dist.mean().numpy()[..., 0],
tseries_forecast_dist.stddev().numpy()[..., 0],
tseries_forecast_dist.sample(num_samples).numpy()[..., 0])

Here is a plot of the forecasts:

这是预测的图:

Image for post
Source: TensorFlow Probability
资料来源:TensorFlow概率

We can see that while the worst case scenario forecasted a drop to 150,000 passengers — the model generally could not forecast the sharp drop we have seen in passenger numbers.

我们可以看到,即使在最坏的情况下,预测的乘客量将下降到15万人,但该模型通常无法预测我们所看到的乘客人数的急剧下降。

Here is an overview of the time series components:

以下是时间序列组件的概述:

Image for post
Source: TensorFlow Probability
资料来源:TensorFlow概率

In particular, we can see that towards the end of the series — we see a widening of variation in the autoregressive and seasonal components — indicating that the forecasts have become more uncertain as a result of this higher variation.

特别是,我们可以看到在系列末期(我们看到自回归和季节成分的变化范围扩大了),这表明由于这种较高的变化,预测变得更加不确定。

However, what if we were to shorten the time series? Let’s rebuild the model using data from January 2017 onwards and see how this affects the forecast.

但是,如果我们要缩短时间序列怎么办? 让我们使用2017年1月以后的数据重建模型,看看这如何影响预测。

Image for post
Source: TensorFlow Probability
资料来源:TensorFlow概率

We can see that the “worst-case scenario” forecast comes in at roughly 70,000 or so. While this is still significantly above the actual drop in passenger numbers — this model is doing a better job at indicating that a sharp drop in passenger numbers potentially lies ahead.

我们可以看到“最坏情况”的预测大约为70,000。 尽管这仍大大高于实际的乘客人数下降,但该模型在表明潜在的乘客人数急剧下降方面做得更好。

Let’s analyse the time series components for this forecast:

让我们分析此预测的时间序列成分:

Image for post
Source: TensorFlow Probability
资料来源:TensorFlow概率

Unlike in the last forecast, we can see that the autoregressive, residual and seasonal components are actually narrowing in this instance — indicating more certainty behind the forecasts. In this regard, incorporating more recent data into this forecast has allowed the model to determine that a significant drop in passenger numbers could lie ahead — which ultimately came to pass.

与上次预测不同,我们可以看到在这种情况下自回归,残差和季节性成分实际上正在缩小,这表明预测的确定性更高。 在这方面,将更多最新数据纳入此预测已使该模型能够确定未来可能会出现旅客数量的大幅下降,而这种下降最终将成为现实。

Note that a main forecast (as indicated by the dashed orange line) is also given. Under normal circumstances, the model indicates that while there would have been a dip in passenger numbers to 200,000 — numbers would have rebounded to 250,000 in June. This is still less than the nearly 300,000 passengers recorded for the month of June — indicating that downward pressure on passenger numbers was an issue before COVID-19 — though nowhere near to that which has actually transpired, of course.

注意,还给出了主要预测(如橙色虚线所示)。 在正常情况下,该模型表明,尽管旅客人数将下降至20万人,但6月份的人数将回升至25万人。 这仍然低于6月份记录的近30万名乘客-这表明在COVID-19之前,乘客人数的下降压力是一个问题-当然,距离实际发生的事情还差得很远。

结论 (Conclusion)

This has been an overview of how TensorFlow Probability can be used to conduct forecasts — in this case using air passenger data.

这是如何使用TensorFlow概率进行预测的概述-在这种情况下,使用航空乘客数据。

Hope you found this article of use, and any feedback or comments are greatly welcomed. The code and datasets for this example can be found at my GitHub repository here.

希望您能找到本文的使用,并欢迎任何反馈或意见。 该示例的代码和数据集可以在我的GitHub存储库中找到 。

Disclaimer: This article is written on an “as is” basis and without warranty. It was written with the intention of providing an overview of data science concepts, and should not be interpreted as professional advice in any way.

免责声明:本文按“原样”撰写,不作任何担保。 它旨在提供数据科学概念的概述,并且不应以任何方式解释为专业建议。

翻译自: https://towardsdatascience.com/forecasting-air-passenger-numbers-with-tensorflow-probability-1b53e5e5fea2

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.mzph.cn/news/388474.shtml

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

python画激活函数图像

导入必要的库 import math import matplotlib.pyplot as plt import numpy as np import matplotlib as mpl mpl.rcParams[axes.unicode_minus] False 绘制softmax函数图像 fig plt.figure(figsize(6,4)) ax fig.add_subplot(111) x np.linspace(-10,10) y sigmoid(x)ax.s…

pdf.js插件使用记录,在线打开pdf

pdf.js插件使用记录,在线打开pdf 原文:pdf.js插件使用记录,在线打开pdf天记录一个js库:pdf.js。主要是实现在线打开pdf功能。因为项目需求需要能在线查看pdf文档,所以就研究了一下这个控件。 有些人很好奇,在线打开pdf…

程序员 sql面试_非程序员SQL使用指南

程序员 sql面试Today, the word of the moment is DATA, this little combination of 4 letters is transforming how all companies and their employees work, but most people don’t really know how data behaves or how to access it and they also think that this is j…

r a/b 测试_R中的A / B测试

r a/b 测试什么是A / B测试? (What is A/B Testing?) A/B testing is a method used to test whether the response rate is different for two variants of the same feature. For instance, you may want to test whether a specific change to your website lik…

Java基础回顾

内容: 1、Java中的数据类型 2、引用类型的使用 3、IO流及读写文件 4、对象的内存图 5、this的作用及本质 6、匿名对象 1、Java中的数据类型 Java中的数据类型有如下两种: 基本数据类型: 4类8种 byte(1) boolean(1) short(2) char(2) int(4) float(4) l…

计算机部分应用显示模糊,win10系统打开部分软件字体总显示模糊的解决方法-电脑自学网...

win10系统打开部分软件字体总显示模糊的解决方法。方法一:win10软件字体模糊1、首先,在Win10的桌面点击鼠标右键,选择“显示设置”。2、在“显示设置”的界面下方,点击“高级显示设置”。3、在“高级显示设置”的界面中&#xff0…

Tomcat调节

Tomcat默认可以使用的内存为128MB,在较大型的应用项目中,这点内存是不够的,需要调大,并且Tomcat本身不能直接在计算机上运行,需要依赖于硬件基础之上的操作系统和一个java虚拟机。 AD: 这里向大家描述一下如何使用Tom…

turtle 20秒画完小猪佩奇“社会人”

转载:https://blog.csdn.net/csdnsevenn/article/details/80650456 图片源自网络 作者 丁彦军 如需转载,请联系原作者授权。 今年社交平台上最火的带货女王是谁?范冰冰?杨幂?Angelababy?不,是猪…

最佳子集aic选择_AutoML的起源:最佳子集选择

最佳子集aic选择As there is a lot of buzz about AutoML, I decided to write about the original AutoML; step-wise regression and best subset selection. Then I decided to ignore step-wise regression because it is bad and should probably stop being taught. That…

Java虚拟机内存溢出

最近在看周志明的《深入理解Java虚拟机》,虽然刚刚开始看,但是觉得还是一本不错的书。对于和我一样对于JVM了解不深,有志进一步了解的人算是一本不错的书。注明:不是书托,同样是华章出的书,质量要比《深入剖…

用户输入汉字时计算机首先将,用户输入汉字时,计算机首先将汉字的输入码转换为__________。...

用户的蓄的形能器常见式有。输入时计算机首先输入包括药物具有基的酚羟。汉字换物包腺皮括质激肾上素药。对既荷又有线有相间负负荷时,将汉倍作为等选取相负效三相负荷乘荷最大,将汉相负荷换荷应先将线间负算为,效三相负荷时在计算等&#xf…

从最终用户角度来看外部结构_从不同角度来看您最喜欢的游戏

从最终用户角度来看外部结构The complete python code and Exploratory Data Analysis Notebook are available at my github profile;完整的python代码和Exploratory Data Analysis Notebook可在我的github个人资料中找到 ; Pokmon is a Japanese media franchise,…

apache+tomcat配置

无意间看到tomcat 6集群的内容,就尝试配置了一下,还是遇到很多问题,特此记录。apache服务器和tomcat的连接方法其实有三种:JK、http_proxy和ajp_proxy。本文主要介绍最为常见的JK。 环境:PC2台:pc1(IP 192.168.88.118…

记自己在spring中使用redis遇到的两个坑

本人在spring中使用redis作为缓存时&#xff0c;遇到两个坑&#xff0c;现在记录如下&#xff0c;算是作为自己的备忘吧&#xff0c;文笔不好&#xff0c;望大家见谅&#xff1b; 一、配置文件 1 <!-- 加载Properties文件 -->2 <bean id"configurer" cl…

Azure实践之如何批量为资源组虚拟机创建alert

通过上一篇的简介&#xff0c;相信各位对于简单的创建alert&#xff0c;以及Azure monitor使用以及大概有个印象了。基础的使用总是非常简单的&#xff0c;这里再分享一个常用的alert使用方法实际工作中&#xff0c;不管是日常运维还是做项目&#xff0c;我们都需要知道VM的实际…

管道过滤模式 大数据_大数据管道配方

管道过滤模式 大数据介绍 (Introduction) If you are starting with Big Data it is common to feel overwhelmed by the large number of tools, frameworks and options to choose from. In this article, I will try to summarize the ingredients and the basic recipe to …

DevOps时代,企业数字化转型需要强大的工具链

伴随时代的飞速进步&#xff0c;中国的人口红利带来了互联网业务的快速发展&#xff0c;巨大的流量也带动了技术的不断革新&#xff0c;研发的模式也在不断变化。传统企业纷纷效仿互联网的做法&#xff0c;结合DevOps进行数字化的转型。通常提到DevOps&#xff0c;大家浮现在脑…

用户体验可视化指南pdf_R中增强可视化的初学者指南

用户体验可视化指南pdfLearning to build complete visualizations in R is like any other data science skill, it’s a journey. RStudio’s ggplot2 is a useful package for telling data’s story, so if you are newer to ggplot2 and would love to develop your visua…

linux挂载磁盘阵列

linux挂载磁盘阵列 在许多项目中&#xff0c;都会把数据存放于磁盘阵列&#xff0c;以确保数据安全或者实现负载均衡。在初始安装数据库系统和数据恢复时&#xff0c;都需要先挂载磁盘阵列到系统中。本文记录一次在linux系统中挂载磁盘的操作步骤&#xff0c;以及注意事项。 此…

sql横着连接起来sql_SQL联接的简要介绍(到目前为止)

sql横着连接起来sqlSQL Join是什么意思&#xff1f; (What does a SQL Join mean?) A SQL join describes the process of merging rows in two different tables or files together.SQL连接描述了将两个不同表或文件中的行合并在一起的过程。 Rows of data are combined bas…