Birthday Paradox Analysis in Python

If you have a group of people in a room, how many do you need for it to be more likely than not that two or more will share a birthday?

Theoretically, the chances of two people having the same birthday are 1 in 365 (not accounting for leap years and the uneven distribution of birthdays across the year), and so odds are you’ll only meet a handful of people in your life who enjoy the same birthday as you. This leads many people to intuitively guess around 180.

The correct answer is just 23.

That means in each of your classes at school, amongst the fellow commuters on the bus to work and amongst the players on a soccer field, there are more than likely at least two people with the same birthday.

Humans have a notoriously poor intuition when it comes to probability. The multi-billion dollar gambling industry is proof of this.

The source of confusion within the Birthday Paradox is that the probability grows relative to the number of possible pairings of people, not just the group’s size. The number of pairings grows with respect to the square of the number of participants, such that a group of 23 people contains 253 (23 x 22 / 2) unique pairs of people.

In each of these pairings, there is a 364/365 chance of the two people having different birthdays, but this needs to happen for every pair for there to be no matching birthdays across the entire group. Treating the 253 pairs as independent (a close approximation rather than an exact result), the probability of at least two people sharing a birthday in a group of 23 is:

1 - (364/365)^253 ≈ 50.05%
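
As a rough check on the figure above, the following Python sketch computes the pairwise estimate next to the exact product over successive birthdays. The function names are illustrative and not from the original article, and both versions assume 365 equally likely birthdays.

```python
from math import comb


def p_match_pairwise(n, days=365):
    """Pairwise estimate: treat each of the C(n, 2) pairs as independent."""
    pairs = comb(n, 2)  # 253 pairs for n = 23
    return 1 - ((days - 1) / days) ** pairs


def p_match_exact(n, days=365):
    """Exact value: person i must avoid the i birthdays already taken."""
    p_none = 1.0
    for i in range(n):
        p_none *= (days - i) / days
    return 1 - p_none


if __name__ == "__main__":
    print(f"pairwise estimate: {p_match_pairwise(23):.4f}")
    print(f"exact product:     {p_match_exact(23):.4f}")
```

The two values differ slightly because the pairs are not truly independent, but both sit just above 50% for a group of 23.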

If we plot the probability vs different group sizes, we see how the probability grows as the group size increases.

Figure: Probability of at least one matching birthday vs size of group

The line crosses 50% just before a group size of 23. Our previous guess of 180 has a probability so close to 100% that it's not worth showing. In fact, under the same pairwise approximation, the chance of choosing a group of 180 people at random and having none of them share a birthday is roughly 6x10^-20: 100 times less likely than two people picking the same grain of sand out of all the sand on Earth!
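
The curve can be reproduced without a plotting library. The sketch below (helper names are assumptions, not from the original) tabulates the exact probability for a range of group sizes, finds where it first exceeds 50%, and evaluates the group of 180. The 6x10^-20 figure corresponds to the pairwise estimate; the exact product comes out smaller still.

```python
def p_no_match(n, days=365):
    """Exact probability that n people all have different birthdays."""
    p = 1.0
    for i in range(n):
        p *= (days - i) / days
    return p


def p_no_match_pairwise(n, days=365):
    """Pairwise estimate of the same quantity: (364/365)^(n(n-1)/2)."""
    return ((days - 1) / days) ** (n * (n - 1) // 2)


# Probability of at least one match for group sizes 1..60.
curve = {n: 1 - p_no_match(n) for n in range(1, 61)}
first_over_half = min(n for n, p in curve.items() if p > 0.5)

if __name__ == "__main__":
    for n in range(5, 61, 5):
        bar = "#" * round(curve[n] * 40)  # crude text plot
        print(f"{n:3d} {curve[n]:6.1%} {bar}")
    print("first group size above 50%:", first_over_half)
    print(f"180 people, no match (pairwise): {p_no_match_pairwise(180):.1e}")
    print(f"180 people, no match (exact):    {p_no_match(180):.1e}")
```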

Less likely coincidences

We can generalise the Birthday Paradox to look at other phenomena with a similar structure.

The probability of two people having the same PIN on their bank card is 1 in 10,000, or 0.01%. However, it would only take a group of 119 people to have odds in favour of two of them sharing a PIN.

Of course, these numbers assume a randomly sampled, uniform distribution of birthdays and PINs. In reality, birthdays peak at certain times of year and people are more likely to pick certain numbers than others for their PIN. But the lack of a uniform distribution in fact reduces the size of group that you need.
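
To illustrate why a skewed distribution makes matches more likely, here is a minimal sketch that computes the match probability exactly for an arbitrary distribution. It uses the fact that the probability of n distinct draws equals n! times the degree-n elementary symmetric polynomial of the probabilities. The skewed weights below are purely hypothetical (the first 90 days of the year are made 1.5x as likely) and are not real birth data.

```python
from math import factorial


def p_match(probs, n):
    """Exact P(at least one match) among n draws from distribution `probs`.

    e[k] accumulates the elementary symmetric polynomial of degree k;
    P(all n draws distinct) = n! * e[n].
    """
    e = [1.0] + [0.0] * n
    for p in probs:
        for k in range(n, 0, -1):  # descending so each p is used at most once
            e[k] += p * e[k - 1]
    return 1 - factorial(n) * e[n]


uniform = [1 / 365] * 365

# Hypothetical skew, for illustration only.
weights = [1.5] * 90 + [1.0] * 275
skewed = [w / sum(weights) for w in weights]

if __name__ == "__main__":
    print(f"uniform, 23 people: {p_match(uniform, 23):.4f}")
    print(f"skewed,  23 people: {p_match(skewed, 23):.4f}")
```

Any departure from uniformity raises the match probability for a given group size, which is why the uniform case gives an upper bound on the group size needed.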

If we decrease the probability of a coincidence occurring, the size of group required to get an even chance of a collision obviously increases. However, it increases much more slowly than the inverse of the probability: roughly in proportion to its square root.

For example, with a probability of 1 in 10,000, the minimum group size is 119. For a coincidence 10x less likely, the minimum group is 373, or only about 3.13 times bigger. Therefore, even for incredibly tiny probabilities, the group size doesn't grow particularly large. For odds of one in a million, the group required is only 1178.
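
These group sizes can be recovered with a short search. The sketch below uses the same pairwise estimate as the birthday calculation; the function name is an assumption made for illustration. It also prints the square-root rule of thumb, sqrt(2 ln 2 x N), which shows why the group size grows so slowly.

```python
import math


def min_group_size(outcomes, target=0.5):
    """Smallest n whose pairwise-estimated match probability reaches target."""
    n = 1
    while 1 - (1 - 1 / outcomes) ** (n * (n - 1) // 2) < target:
        n += 1
    return n


if __name__ == "__main__":
    for outcomes in (365, 10_000, 100_000, 1_000_000):
        rule_of_thumb = math.sqrt(2 * math.log(2) * outcomes)
        print(f"1 in {outcomes:>9,}: group of {min_group_size(outcomes):>5} "
              f"(sqrt rule gives about {rule_of_thumb:.0f})")
```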

Space junk

This has implications in the area of satellite collisions and space junk. The odds of two particular orbiting objects colliding with each other over the course of a year are almost infinitesimally small. However, given that there are around 5,500 satellites and approximately 900,000 objects of greater than 1 cm in size whizzing above our heads, collisions occur more regularly than you might expect.

Various governments are able to track the larger pieces of space junk. This allows avoidance manoeuvres to take place to shift active satellites and the space station out of harm’s way. But with around 20,000 close approaches per week and growing, this could become an increasingly difficult and costly procedure.

In 2009, two satellites, a defunct 16-year-old Russian military satellite and a still-active Iridium communications satellite, collided at a relative velocity of almost 12 km/s. Both satellites shattered into clouds of debris, producing over 1,000 fragments larger than a grapefruit.

More space junk means a higher chance of collisions occurring. And each collision increases the number of pieces of space junk. This positive feedback loop, if it exceeds the rate at which objects fall into the atmosphere and burn up, could lead to something called the Kessler Syndrome. This is a chain reaction in which collisions become increasingly common, spraying out more and more debris, until placing a satellite in low earth orbit becomes too dangerous to be feasible.

DNA evidence

Over the past forty years, DNA evidence has revolutionised the field of forensic investigation. As we go about our daily business, we leave behind us a trail of genetic material, mostly via skin cells and hair. Governments compile huge databases of DNA “profiles”, recording a series of uncorrelated genetic markers.

For some systems, the probability of two people matching on all recorded genetic markers is estimated at one in one trillion (excluding identical twins). Given this number is over 100x the number of people on the planet, if a person’s DNA is found at the scene, you can be pretty sure they were there, right?

Well, not necessarily. Following on from the previous examples, a tiny probability can inflate into something tangible when you have a large enough group of people.

In a country the size of the US (328 million people), a match rate of one in a trillion converts to a 1 in 3,000 chance of you having a genetic profile ‘twin’, somewhere out there. In 2019, there were 16k murders in the US. This means there are likely around 5 murders per year, for which the perpetrator’s DNA matches perfectly with that of another American (again, excluding identical twins). Even with the incredibly low probabilities involved, the power of the Birthday Paradox means that you shouldn’t convict based on DNA evidence alone, and other circumstantial evidence needs to be taken into consideration as well.

It's also worth considering that DNA profiling systems have improved greatly in the last thirty years. Early in the application of the technology, probabilities of 1 in a billion were often quoted. This would have given around 5,000 murders per year with a DNA ambiguity.
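
The arithmetic behind these DNA figures can be sketched as follows. The population and murder count are the ones quoted in the text; the function name is illustrative, and the model assumes every other person matches independently at the stated rate.

```python
import math

US_POPULATION = 328_000_000  # figure quoted in the text
MURDERS_2019 = 16_000        # figure quoted in the text


def p_profile_twin(population, match_rate):
    """P(at least one other person matches) = 1 - (1 - rate)^(population - 1).

    log1p/expm1 keep precision when match_rate is tiny.
    """
    return -math.expm1((population - 1) * math.log1p(-match_rate))


if __name__ == "__main__":
    for label, rate in (("1 in a trillion", 1e-12), ("1 in a billion", 1e-9)):
        p = p_profile_twin(US_POPULATION, rate)
        print(f"{label}: about 1 in {1 / p:,.0f} chance of a profile twin, "
              f"about {MURDERS_2019 * p:,.0f} ambiguous murders per year")
```

For the 1-in-a-billion case this formula gives a somewhat lower count than simply multiplying population by match rate, because the chance of having at least one twin saturates as it grows; both land in the region of the "around 5,000" quoted above.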

Birthday Attack

The Birthday Paradox can be leveraged in a cryptographic attack on digital signatures. Digital signatures rely on something called a hash function f(x), which transforms a message or document into a very large number (hash value). This number is then combined with the signer’s secret key to create a signature. Someone reading the document could then “de-crypt” the signature using the signer’s public key, and this would prove that the signer had digitally signed the document.

These signatures can be used to verify the authenticity of a document. By reading this article on Medium.com, you’re using a digital signature right now, via the HTTPS protocol. The security relies on the difficulty of finding another document with the same hash value as the signed original.

However, the Birthday Paradox lets us potentially abuse this system by attacking this hash function.

Let’s say Bob is an authority that digitally signs contracts. We want to trick Bob into signing a fraudulent contract, without knowing, so that we can later suggest that he approved it. What we need to find are two contracts, one legitimate and one fraudulent, which produce the same hash value when passed through f(x).

For each contract, we can identify many ways of subtly changing it, without altering its meaning. For example, you could add differing amounts of white-space at the end of each line, slightly alter the pixels in a logo, or make small changes to the formatting. In combination this gives us millions of technically different but semantically identical documents, which in Bob’s eyes would all get the stamp of approval. It also gives us millions of variations on the fraudulent document. If we find a pair of documents, one legitimate, one fraudulent, that produce the same hash, then we can pass the legitimate one to Bob for signing, and then use that signature to “prove” the authenticity of the fraudulent contract.

Thanks to the Birthday Paradox, the likelihood of at least one hash value collision between one of the legitimate and one of the fraudulent documents is much higher than might be expected, given the huge range of the hash function. In fact, the number of documents you need to produce is around the square root of the number of possible outputs of the hash function. The attack is made even easier by the fact that no hash function is perfectly uniformly distributed, which has led to many popular hashing algorithms becoming insecure.
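
A toy version of the attack can be sketched in a few lines. Everything here is an assumption made for illustration: the "hash function" is SHA-256 truncated to 16 bits (65,536 possible outputs) so that collisions are cheap to find, the variation scheme just appends trailing spaces, and the contract texts are invented. Real hash functions have vastly larger output ranges.

```python
import hashlib


def tiny_hash(text, bits=16):
    """Deliberately weak toy hash: the top `bits` bits of SHA-256."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") >> (32 - bits)


def variant(contract, i):
    """Hypothetical variation scheme: same wording plus i trailing spaces."""
    return contract + " " * i


def find_colliding_pair(legit, fraud, bits=16, max_variants=100_000):
    """Generate variants of both contracts until a legit/fraud pair collides."""
    legit_by_hash = {}
    fraud_by_hash = {}
    for i in range(max_variants):
        legit_doc, fraud_doc = variant(legit, i), variant(fraud, i)
        h_legit, h_fraud = tiny_hash(legit_doc, bits), tiny_hash(fraud_doc, bits)
        legit_by_hash.setdefault(h_legit, legit_doc)
        fraud_by_hash.setdefault(h_fraud, fraud_doc)
        if h_legit in fraud_by_hash:
            return legit_doc, fraud_by_hash[h_legit], i + 1
        if h_fraud in legit_by_hash:
            return legit_by_hash[h_fraud], fraud_doc, i + 1
    return None  # no collision within the variant budget


if __name__ == "__main__":
    found = find_colliding_pair("Bob agrees to pay us $100.",
                                "Bob agrees to pay us $1,000,000.")
    if found:
        legit_doc, fraud_doc, tried = found
        print(f"collision after {tried} variants of each contract")
        print("shared hash:", tiny_hash(legit_doc))
```

With 65,536 possible outputs, a few hundred variants of each contract are typically enough, in line with the square-root estimate above.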

Translated from: https://towardsdatascience.com/the-birthday-paradox-ec71357d45f3
