Understanding Aircraft Accident Trends with PyMC3

Visually exploring historic airline accidents, applying frequentist interpretations and validating changing trends with PyMC3.

Preface

On the 7th of August this year, an Air India Express flight on a repatriation mission from Dubai (United Arab Emirates) to Kozhikode (Kerala, India) skidded off the runway under heavy rainfall and fell into a valley [1].

The ensuing 35-foot drop broke the aircraft in two. The flight was carrying a total of 190 people, and 18 of them lost their lives as an immediate consequence of the accident. The remaining 172 were injured to varying degrees and underwent treatment [2].

The official probe into this horrifying accident will naturally be a fact-finding mission that tries to establish what went wrong and who is to blame.

Motivation

Following this story, I started Googling about recent aircraft accidents, to understand the context and to look at these events from a global perspective.

This search led me to numerous webpages that had photos and videos of plane crashes, tables of crash statistics, accident investigation reports and sound bites from different aviation industry experts following such catastrophic accidents.

The bottom line of this search was that we are in the midst of an increasingly safe flying environment. Regulatory, design, mechanical and electronic safety measures are more stringent than ever before, making flying a comparatively safe means of transport.

But I wanted to play with these numbers myself to validate this conclusion.

The motivating question for this exercise was —

Has flying become relatively safer in recent times than in the past?

Data Source

I looked at publicly available air crash data on Wikipedia and the National Transportation Safety Board (NTSB) and created a dataset that suited the needs of this exercise.

The entire exercise and dataset can be found on my GitHub repository.

Switching over to the first person plural... now.

Tools for the Job

To answer the motivating question, we divide the task into two parts —

  1. Exploratory Data Analysis (EDA) in Python.
  2. Probabilistic programming (PyMC3) in Python.

Exploratory Data Analysis

In this part, we look at past aircraft crashes, which form our time series for analysis. A few things to remember:

  1. The Convention on International Civil Aviation differentiates an aircraft accident from an aircraft incident. The difference is essentially whether fatalities occurred or not.

  2. Our focus in this exercise is restricted to the occurrence of the accident, rather than its cause.

  3. We look at commercial aircraft accidents from 1975 to 2019.

Trend of accidents and fatalities

Fig. 1: Number of accidents and fatalities per year from 1975 to 2019.

Looking at the historic time series, we visually sense a decline in the number of accidents per year from 1978 onwards. There appears to be a minor rise in the number of accidents between 1987 and 1989, after which the numbers steadily decrease. The lowest number of accidents was observed in 2017, which is considered the safest year in aviation history. After 2017, the numbers seem to increase marginally.

Another clearly observable trend is the drop in the number of fatalities over time. The 1970s and 80s were dangerous times to fly, with aircraft accidents causing, on average, nearly 2200 fatalities a year. Over time, this number has reduced dramatically.

When this declining trend is viewed in the context of the rising number of air travellers (green shaded region in Fig. 1), we get a better picture of airline safety.

Fatalities per million passengers

Fig. 2: Fatalities per million passengers travelling each year.

When the declining number of fatalities is viewed against the rising number of air travellers, the downward trend is clear. The number of fatalities per million passengers travelling by air each year has dropped drastically, from 5 in a million to less than 1 in a million.
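
The normalisation behind this measure is simple arithmetic, sketched below. The function name and the passenger and fatality figures are hypothetical, chosen only so that the numbers are easy to follow; they are not values from the dataset.

```python
def fatalities_per_million(fatalities, passengers):
    """Return the number of fatalities per one million passengers carried."""
    if passengers <= 0:
        raise ValueError("passengers must be positive")
    return fatalities / passengers * 1_000_000


# Hypothetical figures, for illustration only
early = fatalities_per_million(fatalities=2_200, passengers=440_000_000)
recent = fatalities_per_million(fatalities=500, passengers=4_000_000_000)

print(f"early period:  {early:.3f} fatalities per million passengers")
print(f"recent period: {recent:.3f} fatalities per million passengers")
```

Dividing by passenger volume is what lets years with very different traffic be compared on the same scale.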

(Disclaimer: Bayesians, keep that pinch of salt ready)

Fatalities per accident

Fig. 3: Variation in the number of fatalities per aircraft accident.

Another measure of aircraft safety is the number of fatalities per accident. Although there may be a number of exogenous factors (external factors) that influence the number of fatalities in a given accident — weather, nature of crash, time of day etc. — we still look at this measure as a rough estimate of aircraft safety.

There seems to be a slight downward trend beyond 1995, but it is not immediately observable from the graph. We also see that 1985, 1996, 2014 and 2018 were years involving major crashes, as shown by the large average number of fatalities per crash in those years.

Rate of change

Fig. 4: Yearly percentage change in number of accidents.

A final piece of evidence, before we begin the probabilistic testing of the motivating question, is the yearly rate of change of accidents.

If we are truly living in safe times, then we expect the graph to show runs of successive green bars, each marking a year-on-year drop in accidents. Such windows were observed only in 1979–80, 1980–84, 1999–2000, 2006–07 and 2013–14. Extended periods of relatively safe travel can be seen in 1980–84 and 1996–2000.

If we look at the rate of change beyond 1995, we see that accidents have largely declined year on year (very few red bars and many more green bars).
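
A minimal sketch of how such a year-on-year percentage change can be computed is given below. The counts are hypothetical, and the colour rule (green for a decrease, red otherwise) is our assumption about how the bars in Fig. 4 are coloured.

```python
def yearly_pct_change(counts):
    """Percentage change of each yearly count relative to the previous year."""
    changes = []
    for prev, curr in zip(counts, counts[1:]):
        changes.append((curr - prev) / prev * 100.0)
    return changes


# Hypothetical yearly accident counts, for illustration only
counts = [200, 180, 189, 170]
changes = yearly_pct_change(counts)

for pct in changes:
    # Assumed colour convention: a drop in accidents is drawn in green
    colour = "green" if pct < 0 else "red"
    print(f"{pct:+.1f}% -> {colour} bar")
```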

It appears that some external factor (like change in aircraft design, civil aviation regulations, better ATC technology etc.) may have caused this decline beyond 1995.

Probabilistic Programming

From our data exploration we saw that there is a continued decline in number of aircraft accidents every decade and we validated this trend with a couple of statistical measures.

We also saw that 1995 was, presumably, a turning point for the aviation industry. How can we validate this assumption?

Given the limited data and the non-repeatability of events (let us assume that we can't simulate these accidents a million times), one interesting way to do so is to use probabilistic techniques like Markov Chain Monte Carlo (MCMC).

And one of the ways of implementing these techniques is by means of the PyMC3 library in Python.

A quick primer

PyMC3 is a library in Python that helps us carry out probabilistic programming. This does not mean that the programming is probabilistic (it is still very much a deterministic process!); rather, we employ probability distributions and Bayesian methods.

This technique is built on top of a Bayesian outlook of the world. We start with a belief about a certain process or parameter (called the prior probability) and, after several thousand runs (a.k.a. random sampling), arrive at an updated belief (called the posterior probability). This is the opposite of the frequentist way of looking at things (like we did in the EDA).
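
To make the step from prior to posterior concrete, here is a minimal sketch of a case where the update can be done by hand: a Gamma prior on a Poisson rate (an Exponential prior is a Gamma with shape 1). The function name, the prior mean of 230 and the three yearly counts are hypothetical and are not taken from the dataset.

```python
def update_rate_belief(prior_shape, prior_rate, counts):
    """Gamma(shape, rate) prior plus Poisson counts gives a Gamma posterior."""
    post_shape = prior_shape + sum(counts)
    post_rate = prior_rate + len(counts)
    return post_shape, post_rate


# Hypothetical prior belief: roughly 230 accidents per year, held loosely
prior_shape, prior_rate = 1.0, 1.0 / 230.0
prior_mean = prior_shape / prior_rate

# Hypothetical observed counts for three years
observed = [300, 310, 295]
post_shape, post_rate = update_rate_belief(prior_shape, prior_rate, observed)
post_mean = post_shape / post_rate

print(f"prior mean: {prior_mean:.1f}, posterior mean: {post_mean:.1f}")
```

After only three observations the belief about the rate moves from the prior mean towards the observed counts, which is the updating described above.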

The second foundation for this process is the random sampling methods of Markov Chain Monte Carlo (MCMC). This is a set of algorithms that lets us draw samples from the posterior distribution, that is, our prior beliefs updated in light of the observed data, even when that distribution cannot be written down in closed form.
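
For intuition, here is a small sketch of one of the simplest MCMC algorithms, random-walk Metropolis, applied to a single Poisson rate with an Exponential prior. It is not the sampler PyMC3 uses for this kind of parameter, and the counts, the value of alpha and the step size are hypothetical. Because this particular posterior is known in closed form, the sampled mean can be compared with the exact one.

```python
import math
import random


def log_posterior(lam, counts, alpha):
    """Unnormalised log posterior: Exponential(alpha) prior, Poisson likelihood."""
    if lam <= 0:
        return -math.inf
    log_prior = math.log(alpha) - alpha * lam
    log_lik = sum(c * math.log(lam) - lam - math.lgamma(c + 1) for c in counts)
    return log_prior + log_lik


def metropolis(counts, alpha, n_samples=20_000, step=2.0, seed=0):
    """Random-walk Metropolis sampler for the Poisson rate."""
    rng = random.Random(seed)
    current = sum(counts) / len(counts)  # start at the sample mean
    current_lp = log_posterior(current, counts, alpha)
    samples = []
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_posterior(proposal, counts, alpha)
        # Accept with probability min(1, posterior ratio)
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


# Hypothetical yearly counts and prior parameter, for illustration only
counts = [30, 28, 33, 31, 29]
alpha = 1.0 / 30.0

samples = metropolis(counts, alpha)
kept = samples[2_000:]  # discard the first draws as burn-in
estimate = sum(kept) / len(kept)
exact = (sum(counts) + 1) / (len(counts) + alpha)  # mean of the exact Gamma posterior

print(f"MCMC estimate: {estimate:.2f}, exact posterior mean: {exact:.2f}")
```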

The documentation provided on the PyMC3 webpage and this hands-on approach by Susan Li are excellent for a high-level understanding of the library and the techniques. The book Bayesian Methods for Hackers by Cam Davidson-Pilon is really helpful if you are thinking of getting your hands dirty.

Alright, so let's test

We begin by establishing our prior beliefs about the accidents —

What kind of distribution do aircraft accidents follow?

Here we assume that the accidents follow a Poisson Distribution.

P(x | lambda) = (lambda^x) * exp(-lambda) / x!
x: number of accidents
lambda: rate of occurrence of accidents
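
As a worked example of the Poisson probability mass function, the sketch below evaluates it in log space (to avoid overflow for large counts) and checks that the probabilities sum to one. The rate of 3.0 is a hypothetical value used only to keep the numbers small.

```python
import math


def poisson_pmf(x, lam):
    """P(x | lam) = lam**x * exp(-lam) / x!, evaluated in log space."""
    if x < 0 or lam <= 0:
        raise ValueError("x must be non-negative and lam must be positive")
    log_p = x * math.log(lam) - lam - math.lgamma(x + 1)
    return math.exp(log_p)


lam = 3.0  # hypothetical rate, for illustration only
for x in range(6):
    print(f"P({x} | lambda={lam}) = {poisson_pmf(x, lam):.4f}")

# The probabilities over all counts should add up to (very nearly) one
total = sum(poisson_pmf(x, lam) for x in range(60))
print(f"sum over x = 0..59: {total:.6f}")
```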

What would be the rate of occurrence?

Given our initial assumption, we further presume that this rate of occurrence is drawn from an Exponential distribution whose parameter, alpha, is the reciprocal of the average number of accidents per year over the whole dataset. The prior expectation of the rate is then that average.

In other words,

alpha = 1/(mean of number of accidents from 1975 to 2019)
lambda ~ Exponential(alpha), so that E[lambda] = 1/alpha
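
The reason for using the reciprocal of the mean as the parameter of an Exponential prior is that such a prior then has the observed mean as its expected value. A quick sketch, with a hypothetical mean of 230 accidents per year, confirms this by simulation:

```python
import random

mean_accidents = 230.0  # hypothetical yearly average, for illustration only
alpha = 1.0 / mean_accidents

# Draw from an Exponential distribution with rate parameter alpha
rng = random.Random(0)
draws = [rng.expovariate(alpha) for _ in range(100_000)]
sample_mean = sum(draws) / len(draws)

print(f"theoretical prior mean (1/alpha): {1.0 / alpha:.1f}")
print(f"mean of simulated prior draws:    {sample_mean:.1f}")
```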

What would be the initial turning point?

The turning point is the year before which the rate of occurrence was high and after which it became low. We initially assume that every year from 1975 to 2019 has an equal probability (drawn from a discrete uniform distribution) of being the turning point.

With this set of prior beliefs, we instantiate the model:

import numpy as np
import pymc3 as pm
import arviz as az

# df is assumed to hold one row per year, with columns 'year' and 'num_crashes'
crashes = df.loc[df.year >= 1975].num_crashes
years = np.arange(1975, 2020)

with pm.Model() as accident_model:
    # Parameter of the Exponential priors: reciprocal of the mean yearly count
    alpha = 1 / crashes.mean()

    # Prior belief for the turning point: every year from 1975 to 2019 is equally likely
    change_point = pm.DiscreteUniform('change_point', lower=1975, upper=2019)

    # Prior belief for the rate of occurrence (lambda) before and after the turning point
    rate_before = pm.Exponential('before_change', alpha)
    rate_after = pm.Exponential('after_change', alpha)

    # Assign the appropriate Poisson rate to each year
    rate = pm.math.switch(change_point >= years, rate_before, rate_after)

    accidents = pm.Poisson('accidents', rate, observed=crashes)
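
The switch in the model picks one of the two rates for every year, depending on where that year falls relative to the change point. The plain-Python sketch below mimics that selection for a single fixed change point; the helper name is hypothetical, and the values 1997, 300 and 165 are simply the rounded results reported later in this article.

```python
def rate_per_year(years, change_point, rate_before, rate_after):
    """Mimic switch(change_point >= years, rate_before, rate_after) elementwise."""
    return [rate_before if change_point >= year else rate_after for year in years]


years = list(range(1975, 2020))
rates = rate_per_year(years, change_point=1997, rate_before=300.0, rate_after=165.0)

for year in (1996, 1997, 1998):
    print(year, rates[years.index(year)])
```

Because the comparison is `>=`, the change-point year itself still receives the "before" rate; the lower rate applies from the following year onwards.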

We then draw 10,000 samples per chain. Because the change point is discrete, PyMC3 assigns it a Metropolis step, while the continuous rates are sampled with the No-U-Turn Sampler (NUTS):

with accident_model:
    trace = pm.sample(10000, return_inferencedata=False)
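
As a sanity check on a model of this shape, the posterior over the change point can also be computed exactly, without MCMC, because an Exponential prior on a Poisson rate can be integrated out analytically. The sketch below does this on synthetic counts. It is a swapped-in technique for illustration, not what PyMC3 does internally, and the function names and data are hypothetical.

```python
import math


def segment_log_marginal(counts, alpha):
    """Log marginal likelihood of one segment with the Poisson rate integrated out.

    The product of the factorials of the counts is dropped: it is the same for
    every candidate change point and cancels when normalising.
    """
    n, s = len(counts), sum(counts)
    return math.log(alpha) + math.lgamma(s + 1) - (s + 1) * math.log(n + alpha)


def change_point_posterior(years, counts, alpha):
    """Exact posterior over the change point under a uniform prior on the years."""
    log_post = []
    for tau in years:
        # Same convention as the model: years up to and including tau use the first rate
        before = [c for y, c in zip(years, counts) if y <= tau]
        after = [c for y, c in zip(years, counts) if y > tau]
        log_post.append(segment_log_marginal(before, alpha)
                        + segment_log_marginal(after, alpha))
    peak = max(log_post)
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return {tau: w / total for tau, w in zip(years, weights)}


# Synthetic data with an obvious drop after 2004, for illustration only
years = list(range(2000, 2010))
counts = [30, 28, 33, 31, 29, 15, 14, 17, 16, 13]
alpha = 1.0 / (sum(counts) / len(counts))

posterior = change_point_posterior(years, counts, alpha)
best = max(posterior, key=posterior.get)
print(f"most probable change point: {best} (p = {posterior[best]:.2f})")
```

On real data, a result like this can be compared with the sampled distribution of the change point to confirm that the sampler has converged to something sensible.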

Results of the test

Fig. 5: Results of updating our prior beliefs on the possible change point and rates of occurrence.

After sampling 10,000 times, our initial belief that all years have an equal chance of being the turning point has been updated. The result indicates that 1997 (and not 1995) is the most likely candidate for the turning point in aviation accident history.

Our initial belief about the rate of occurrence, which was set from the 45-year average, has also been updated. 1997 is considered the turning point because the rate of occurrence of accidents changed from about 300 per year to about 165 per year!

So how sure are these predictions?

Fig. 6: Uncertainties in the predicted values.

The unique selling point (USP) of probabilistic programming is that predictions are made with a pinch of salt! Unlike frequentist point estimates, the predictions from Bayesian methods come with uncertainties attached (which is more realistic).

Our model shows that the 94% Highest Density Interval (HDI) for the turning point runs from 1996 to 1999, with a mean of 1997. In other words, 1997 is the most probable turning point.
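
An HDI is the narrowest interval that contains a given share of the posterior samples. The sketch below computes it directly from a list of draws. It is a simplified stand-in written for illustration, not ArviZ's implementation, and the sample values are hypothetical draws shaped to resemble the result described above.

```python
import math


def hdi(samples, prob=0.94):
    """Narrowest interval containing at least `prob` of the samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    n = len(ordered)
    # Number of samples that must fall inside the interval
    window = min(n, max(1, math.ceil(prob * n - 1e-9)))
    best_lo, best_hi = ordered[0], ordered[window - 1]
    for i in range(n - window + 1):
        lo, hi = ordered[i], ordered[i + window - 1]
        if hi - lo < best_hi - best_lo:
            best_lo, best_hi = lo, hi
    return best_lo, best_hi


# Hypothetical posterior draws of the turning point, for illustration only
samples = [1990] * 2 + [1996] * 5 + [1997] * 60 + [1998] * 25 + [1999] * 6 + [2005] * 2

interval = hdi(samples)
print(f"94% HDI: {interval[0]} to {interval[1]}")
```

The few stray draws at 1990 and 2005 fall outside the interval, which is why the HDI is a compact way of saying where most of the posterior mass sits.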

Similarly, the 94% HDI for the rate of occurrence before this turning point is 295 to 312 accidents per year, and for the rate after 1997 it is 158 to 172 accidents per year.

Recent Past

Since our motivating question was restricted to ‘recent times’, we apply this model to data from 2000 to 2019 (assuming the last 20 years to be recent enough).

Fig. 7: Model results for accidents between 2000 and 2019.

We observe that 2012 is a strong candidate for a turning point (with the 94% HDI being 2010 to 2013), with the rate of accidents being nearly 180 per year before 2012 and about 120 per year after 2012.

Fig. 8: Uncertainties of model results for accidents between 2000 and 2019.

Verdict

So by carrying out this small exercise, I was able to satisfy my curiosity and answer the motivating question —

If a low rate of aviation accidents per year is the sole indicator of aviation safety, then the rate dropped significantly after 1997, and within the last 20 years it dropped further after 2012.

Even though accidents still occur every year, it is relatively safer to fly right now than it was 20 years ago.

Translated from: https://towardsdatascience.com/understanding-aircraft-accidents-trends-with-pymc3-b1ca0e4c5d33
