Forecasting Air Passenger Numbers with TensorFlow Probability

TensorFlow Probability supports time series forecasting through structural time series models. In particular, the library allows for a "scenario analysis" form of modelling, in which several alternative forecasts of the future are produced.

Structural time series modelling takes the inherent characteristics of the time series into account when making forecasts. These include components such as a local linear trend, a seasonal effect, a residual term and an autoregressive term. The greater the variation surrounding these components, the more uncertain the forecast.
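
To make the idea of components concrete, here is a minimal NumPy sketch that simulates a series as the sum of a local linear trend, a monthly seasonal effect and an AR(1) residual. It is not the TensorFlow Probability model used in this article; the function name and every parameter value (starting level, noise scales, AR coefficient) are illustrative assumptions.

```python
import numpy as np


def simulate_structural_series(num_steps=60, num_seasons=12, seed=0):
    """Simulate trend + seasonal + AR(1) residual (all values illustrative)."""
    rng = np.random.default_rng(seed)

    # Local linear trend: the level accumulates a slope that itself drifts slowly.
    slope = 1.0 + np.cumsum(rng.normal(0.0, 0.05, num_steps))
    level = 200.0 + np.cumsum(slope)

    # Seasonal effect: one offset per month, centred on zero and repeated each year.
    effects = rng.normal(0.0, 10.0, num_seasons)
    effects -= effects.mean()
    seasonal = effects[np.arange(num_steps) % num_seasons]

    # AR(1) residual: each value is a damped copy of the previous one plus noise.
    residual = np.zeros(num_steps)
    for t in range(1, num_steps):
        residual[t] = 0.7 * residual[t - 1] + rng.normal(0.0, 2.0)

    return level, seasonal, residual, level + seasonal + residual


level, seasonal, residual, series = simulate_structural_series()
print(series[:5])
```

Increasing the noise scales in this sketch widens the spread of each component, which is the sense in which more variation in the components means a more uncertain forecast.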

The examples in this article use the template from the Structural Time Series modeling in TensorFlow Probability tutorial, which the original authors (Copyright 2019 The TensorFlow Authors) have made available under the Apache 2.0 license.

United Airlines Passenger Data

For this example, a structural time series model is built in TensorFlow Probability to forecast air passenger data. The data is sourced from San Francisco Open Data: Air Traffic Passenger Statistics.

In particular, passenger numbers for United Airlines from February 2014 to June 2020 are analysed. The specific segment analysed is enplaned domestic passengers departing from Terminal 3 at Boarding Area E.

Here is a visual overview of the time series:

We can see that passenger numbers have traditionally ranged between 200,000 and 350,000, before plummeting to a low of 7,115 in May 2020.

It would be wishful thinking to expect any time series model to have forecast this: the drop was very sudden and completely out of line with the overall trend.

However, could TensorFlow Probability have identified a drop of a similar scale? Let's find out.

TensorFlow Probability Model

The model is fitted with a local linear trend, along with a monthly seasonal effect.

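import tensorflow_probability as tfp
from tensorflow_probability import sts


def build_model(observed_time_series):
    # Local linear trend: a level plus a slowly varying slope.
    trend = sts.LocalLinearTrend(observed_time_series=observed_time_series)
    # Monthly seasonal effect: one effect per calendar month.
    seasonal = sts.Seasonal(
        num_seasons=12, observed_time_series=observed_time_series)
    # First-order autoregressive term, labelled as the residual.
    residual_level = sts.Autoregressive(
        order=1,
        observed_time_series=observed_time_series, name='residual')
    # Second AR(1) term, added because of the autocorrelation in the series.
    autoregressive = sts.Autoregressive(
        order=1,
        observed_time_series=observed_time_series,
        name='autoregressive')
    # Sum the components into a single structural time series model.
    model = sts.Sum([trend, seasonal, residual_level, autoregressive],
                    observed_time_series=observed_time_series)
    return model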

Note that since autocorrelation is present in the series, an autoregressive component is also added to the model.

Here is a plot of the autocorrelation function for the series:

Figure: autocorrelation function of the series. Source: Jupyter Notebook Output
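
For readers who want to check for autocorrelation themselves, the sample autocorrelation function can be computed directly with NumPy. This is a minimal sketch rather than the code behind the plot above; the function name and the toy seasonal series are assumptions made for illustration.

```python
import numpy as np


def sample_autocorrelation(series, max_lag):
    """Return the sample autocorrelation for lags 0..max_lag."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    # For lag k, correlate the series with a copy of itself shifted by k steps.
    return np.array(
        [np.dot(x[:len(x) - k], x[k:]) / denom for k in range(max_lag + 1)])


# Hypothetical monthly series with a 12-month cycle (not the real passenger data).
t = np.arange(72)
toy_series = 250_000 + 40_000 * np.sin(2 * np.pi * t / 12)

acf = sample_autocorrelation(toy_series, max_lag=12)
print(np.round(acf, 2))
```

Values well away from zero at non-zero lags are the kind of evidence that motivates adding an autoregressive component.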

The time series is split into training and test data so that the forecasts can be compared with the actual values.
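
A chronological split of this kind can be sketched as follows. The helper name and the choice of holding out the last six months are assumptions for illustration; the article does not state the exact split point. The placeholder array stands in for the 77 monthly observations between February 2014 and June 2020.

```python
import numpy as np


def split_series(series, num_forecast_steps):
    """Hold out the last num_forecast_steps observations as test data."""
    series = np.asarray(series)
    if not 0 < num_forecast_steps < len(series):
        raise ValueError("num_forecast_steps must be between 1 and len(series) - 1")
    return series[:-num_forecast_steps], series[-num_forecast_steps:]


# Placeholder values standing in for the monthly passenger counts.
monthly_counts = np.arange(77)
train, test = split_series(monthly_counts, num_forecast_steps=6)
print(len(train), len(test))
```

The split is by position rather than at random, because shuffling would leak future observations into the training data.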

The forecast is based on a posterior distribution, that is, a distribution that combines the prior distribution (prior beliefs about the model parameters) with a likelihood function.

Figure. Source: Image Created by Author
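
Written out, the relationship between posterior, prior and likelihood is Bayes' rule. The symbols are notation introduced for this note rather than taken from the original: θ stands for the model parameters and y for the observed series.

```latex
p(\theta \mid y) \;=\; \frac{p(y \mid \theta)\, p(\theta)}{p(y)} \;\propto\; p(y \mid \theta)\, p(\theta)
```

Here p(θ) is the prior, p(y | θ) is the likelihood and p(θ | y) is the posterior from which forecasts are drawn.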

To produce this forecast, the TensorFlow Probability model is fitted by minimising the variational loss of a surrogate posterior, as follows:

# Minimize the variational loss.
import matplotlib.pyplot as plt
import tensorflow as tf
import tensorflow_probability as tfp

# Allow external control of optimization to reduce test runtimes.
num_variational_steps = 200
num_variational_steps = int(num_variational_steps)
optimizer = tf.optimizers.Adam(learning_rate=.1)

# Using fit_surrogate_posterior to build and optimize the variational loss function.
# Note: newer TensorFlow releases name this argument jit_compile.
@tf.function(experimental_compile=True)
def train():
    elbo_loss_curve = tfp.vi.fit_surrogate_posterior(
        target_log_prob_fn=tseries_model.joint_log_prob(
            observed_time_series=tseries_training_data),
        surrogate_posterior=variational_posteriors,
        optimizer=optimizer,
        num_steps=num_variational_steps)
    return elbo_loss_curve

elbo_loss_curve = train()
plt.plot(elbo_loss_curve)
plt.title("Loss curve")
plt.show()

# Draw samples from the variational posterior.
q_samples_tseries_ = variational_posteriors.sample(50)

Here is a visual of the loss curve:

Figure: loss curve. Source: TensorFlow Probability
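
For reference, the loss usually minimised in this kind of variational fit is the negative evidence lower bound (ELBO). The formula below assumes the default loss is being used and introduces its own notation: q is the surrogate posterior, θ the model parameters and y the observed training series.

```latex
-\mathrm{ELBO}(q) \;=\; -\,\mathbb{E}_{q(\theta)}\!\left[\log p(y, \theta) - \log q(\theta)\right]
```

Minimising this quantity is equivalent to minimising the Kullback-Leibler divergence from q(θ) to the true posterior p(θ | y), which is why a flattening loss curve suggests the surrogate has stopped improving.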

Forecasts

Using the model, 20 samples (that is, 20 separate forecasts) are drawn:

# Number of scenarios
num_samples = 20

tseries_forecast_mean, tseries_forecast_scale, tseries_forecast_samples = (
    tseries_forecast_dist.mean().numpy()[..., 0],
    tseries_forecast_dist.stddev().numpy()[..., 0],
    tseries_forecast_dist.sample(num_samples).numpy()[..., 0])

Here is a plot of the forecasts:

Figure: forecast plot. Source: TensorFlow Probability

We can see that while the worst-case scenario forecast a drop to 150,000 passengers, the model generally could not forecast the sharp drop we have seen in passenger numbers.
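
One way to pull a worst-case scenario out of a set of forecast samples is to pick the sample path that reaches the lowest value. The sketch below uses randomly generated paths as a stand-in for the model's forecast samples; the array shape, the helper name and all numbers are assumptions for illustration, not output from the model in this article.

```python
import numpy as np

rng = np.random.default_rng(42)
num_samples, num_forecast_steps = 20, 6

# Hypothetical forecast samples with shape [num_samples, num_forecast_steps].
forecast_samples = 250_000 + np.cumsum(
    rng.normal(0.0, 15_000, size=(num_samples, num_forecast_steps)), axis=1)


def worst_case_scenario(samples):
    """Return the index and path of the sample that reaches the lowest value."""
    samples = np.asarray(samples)
    idx = int(np.argmin(samples.min(axis=1)))
    return idx, samples[idx]


idx, path = worst_case_scenario(forecast_samples)

# A 5th to 95th percentile band summarises the spread across scenarios.
lower, upper = np.percentile(forecast_samples, [5, 95], axis=0)
print(idx, path.min())
```

With only 20 samples, the lowest path is a rough indication of downside risk rather than a formal bound; drawing more samples would give a steadier picture of the tails.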

Here is an overview of the time series components:

Figure: time series components. Source: TensorFlow Probability

In particular, towards the end of the series we see a widening of variation in the autoregressive and seasonal components, indicating that the forecasts have become more uncertain as a result.

However, what if we were to shorten the time series? Let's rebuild the model using only data from January 2017 onwards and see how this affects the forecast.
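
Restricting the series to a later start date can be sketched with NumPy's monthly datetime type. The month range matches the period stated earlier (February 2014 to June 2020), but the passenger values here are placeholders and the variable names are assumptions.

```python
import numpy as np

# One entry per month from February 2014 to June 2020 (the end bound is exclusive).
months = np.arange("2014-02", "2020-07", dtype="datetime64[M]")
passengers = np.arange(len(months), dtype=float)  # placeholder values

# Keep only observations from January 2017 onwards.
mask = months >= np.datetime64("2017-01")
recent_months, recent_passengers = months[mask], passengers[mask]
print(len(months), len(recent_months))
```

The shortened series would then be split and passed to the same model-building and fitting steps as before.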

Figure: forecast plot for the model rebuilt on data from January 2017 onwards. Source: TensorFlow Probability

We can see that the worst-case scenario forecast comes in at roughly 70,000. While this is still significantly above the actual low in passenger numbers, this model does a better job of indicating that a sharp drop in passenger numbers potentially lies ahead.

Let’s analyse the time series components for this forecast:

Figure: time series components for the shortened series. Source: TensorFlow Probability

Unlike in the last forecast, the autoregressive, residual and seasonal components are actually narrowing in this instance, indicating more certainty behind the forecasts. In this regard, restricting the model to more recent data has allowed it to indicate that a significant drop in passenger numbers could lie ahead, which ultimately came to pass.

Note that a main forecast (indicated by the dashed orange line) is also given. Under normal circumstances, the model indicates that while passenger numbers would have dipped to 200,000, they would have rebounded to 250,000 in June. This is still less than the nearly 300,000 passengers recorded for the month of June, indicating that downward pressure on passenger numbers was an issue before COVID-19, though nowhere near the scale of what actually transpired, of course.

Conclusion

This has been an overview of how TensorFlow Probability can be used to produce forecasts, in this case using air passenger data.

I hope you found this article useful; any feedback or comments are greatly welcomed. The code and datasets for this example can be found in my GitHub repository.

Disclaimer: This article is written on an “as is” basis and without warranty. It was written with the intention of providing an overview of data science concepts, and should not be interpreted as professional advice in any way.

Translated from: https://towardsdatascience.com/forecasting-air-passenger-numbers-with-tensorflow-probability-1b53e5e5fea2
