The Saddest Equation in Data Science
Prepare a box of tissues! I’m about to drop a truth bomb about statistics and data science that’ll bring tears to your eyes.
INFERENCE = DATA + ASSUMPTIONS. In other words, statistics does not give you truth.
Common myths
Here are some standard misconceptions:
“If I find the right equations, I can know the unknown.”
“If I math at my data hard enough, I can reduce my uncertainty.”
“Statistics can transform data into truth!”
They sound like fairytales, don’t they? That’s because they are!
Painful truths
There is no magic in the world that lets you make something out of nothing, so abandon that hope now. That’s not what statistics is about. Take it from a statistician. (As a bonus, this article might save you from wasting a decade of your life studying the dark arts of statistics to chase that elusive dream.)
Unfortunately, there are plenty of charlatans out there who may try to convince you otherwise. They’ll pull a classic bullying move on you: “You don’t understand the equations I’m clobbering you with, so bow before my superiority and do what I say!”
Resist those posers.
Don’t land with a splat, Icarus!
Think of statistical inference (“statistics” for short) as an Icarus-like leap from what we know (our sample data) to what we don’t (our population parameter).
In statistics, what you know is not what you wish you knew.
Perhaps you want tomorrow’s facts, but you only have the past to inform you. (It’s so annoying when we can’t remember the future, right?) Perhaps you want to know what all your potential users think of your product, but you can only ask a hundred of them. Then you’re dealing with uncertainty!
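To make the “ask a hundred of them” situation concrete, here is a minimal Python sketch. Everything in it is an assumption made up for illustration: the population of 10,000 users, the 30% who like the product, the seed, and the helper name `survey`. In real life the full population is exactly the thing you cannot see.

```python
import random

def survey(population, sample_size, rng):
    """Ask a random sample of users; return the share who like the product."""
    sample = rng.sample(population, sample_size)
    return sum(sample) / sample_size

# Hypothetical population: 1 = likes the product, 0 = does not.
# The 30% figure is invented for this illustration.
rng = random.Random(42)
population = [1] * 3000 + [0] * 7000
true_share = sum(population) / len(population)

# Five independent surveys of 100 users each.
estimates = [survey(population, 100, rng) for _ in range(5)]

print(f"true share (normally unknown): {true_share:.2f}")
for i, estimate in enumerate(estimates, start=1):
    print(f"survey {i}: {estimate:.2f}")
```

Each survey is an honest look at the same population, yet the estimates typically differ from one another and from the true share. That gap between sample and population is the uncertainty you are dealing with.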
It’s not magic, it’s assumptions
How can you possibly leap from what you know to what you don’t? You need a bridge to cross that chasm… and that bridge is assumptions. Which brings me back to the most painful equation in all of data science: DATA + ASSUMPTIONS = PREDICTION.
DATA + ASSUMPTIONS = PREDICTION
(Feel free to replace the word “prediction” with “inference” or “forecast” if you like. They’re all the same thing here: a statement about something you can’t know for sure.)
What’s an assumption?
If we knew all the facts (and we knew that our facts were actually true facts), we wouldn’t need assumptions (or statisticians). Assumptions are the ugly patches you use to bridge the gap between what you know and what you wish you knew. They’re hacks you have to use to make the math work out when you’re missing the facts.
Assumptions are ugly band-aids you put over the parts where information is missing.
Should I put it more bluntly? An assumption is not a fact, it’s some nonsense you make up precisely because you’ve got gaping holes in your knowledge. If you’re in the habit of bullying people with your overconfidence intervals, take a moment to remind yourself that it’s a stretch to refer to anything based on assumptions as truth. It’s best to start treating the whole thing as a personal decision-making tool that is imperfect but better than nothing (in specific situations).
Statistics is your attempt to do your best in an uncertain world.
There are always assumptions.
Assumptions are part of decision-making
Show me an “assumption-free” real-world decision and I’ll rattle off a host of implicit assumptions you’re not even aware you’re making.
Examples: When you read a newspaper, did you assume all the facts were checked? When you made your plans for 2020, did you assume there would be no global pandemic? If you analyzed data, did you assume the information was captured without errors? Did you assume that your random number generator is random? (They usually aren’t.) When you chose to make an online purchase, did you assume the right amount would be withdrawn from your bank account? What about the last snack you had, did you assume it wouldn’t poison you? When you took medicine, did you *know* anything about its long-term safety and efficacy… or did you assume?
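The random number generator question is one you can check for yourself. Here is a small sketch using Python’s standard `random` module, which is a deterministic pseudorandom generator: give it the same seed and it replays the same “random” numbers. The helper name `draw` and the seed value are made up for this illustration.

```python
import random

def draw(seed, n=5):
    """Return n 'random' integers from a generator started at the given seed."""
    rng = random.Random(seed)
    return [rng.randint(1, 100) for _ in range(n)]

first = draw(seed=2020)
second = draw(seed=2020)

print(first)
print(second)
print(first == second)  # True: same seed, same sequence
```

That repeatability is useful for reproducible analyses, but it also means “random” here is an assumption you are choosing to live with, not a fact.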
Like it or not, assumptions are always part of decision-making. A proper foray into real-world data should contain a host of written-down assumptions where the data scientist comes clean about corners they had to cut.
Even if you choose to steer clear of statistics, you’re probably using assumptions to guide your actions. To stay safe, it’s crucial that you keep track of the assumptions that your decisions are based on.
How the statistical “magic” happens
The field of statistics gives you a whole arsenal of tools for formalizing your assumptions and combining them with evidence to make reasonable decisions. (Catch my 8-minute intro to stats here.)
It’s preposterous to expect an analysis involving uncertainty and probability to be a source of truth-with-a-capital-T.
Yep, that’s how the statistical “magic” happens. You choose which assumptions you’re willing to live with, then you combine them with data to take reasonable actions on the basis of that unholy union. That’s all statistics is.
That’s why an analysis involving uncertainty and probability could never be a source of truth-with-a-capital-T. There is no secret dark art that can do that for you.
It’s also why two people can come to completely different valid conclusions from the same data! All it takes is using different assumptions. Statistics gives you a tool for making decisions more thoughtfully, but there’s no single right way to use it. It’s a personal decision-making tool.
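Here is one way that can play out, as a hedged sketch rather than the only way to do it. Two hypothetical analysts see the same survey result (7 of 10 users liked the product) and both use a standard Beta-Binomial update, where a Beta(a, b) prior combined with k successes in n trials has posterior mean (a + k) / (a + b + n). The survey numbers, the two priors, the 0.5 launch threshold, and the function name `posterior_mean` are all assumptions invented for this example.

```python
def posterior_mean(successes, trials, prior_a, prior_b):
    """Posterior mean of a success rate under a Beta(prior_a, prior_b) prior."""
    return (prior_a + successes) / (prior_a + prior_b + trials)

# The same data for both analysts (made-up numbers).
likes, asked = 7, 10

# Analyst A assumes nothing in particular: a uniform Beta(1, 1) prior.
analyst_a = posterior_mean(likes, asked, prior_a=1, prior_b=1)  # 8 / 12

# Analyst B assumes most products flop: a skeptical Beta(2, 8) prior.
analyst_b = posterior_mean(likes, asked, prior_a=2, prior_b=8)  # 9 / 20

# Hypothetical decision rule: launch if the estimated share exceeds 0.5.
for name, estimate in [("A", analyst_a), ("B", analyst_b)]:
    decision = "launch" if estimate > 0.5 else "hold off"
    print(f"analyst {name}: estimate {estimate:.2f} -> {decision}")
```

Analyst A’s estimate lands at about 0.67 and analyst B’s at 0.45, so they reach opposite decisions from identical data. Neither calculation is wrong; the analysts simply started from different assumptions.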
A study is only as good as the assumptions you’ll make about it.
What about science?
What does it mean when a scientist uses statistics to come to a conclusion? Simply that they’ve formed an opinion and have made the decision to share it with the world. That’s not a bad thing: it’s a scientist’s job to form opinions reluctantly, which makes me feel better about assuming that they’re worth listening to.
I’m a huge fan of taking advice from those who have more expertise and information than I do, but I never let myself confuse their opinions with facts. While many scientists are well-versed in working with probability, I’ve seen other scientists make enough statistical mess to last several lifetimes. Opinions could not (and should not) convince someone who’s not willing to make the assumption that those opinions were arrived at competently from a blend of evidence and mutually palatable untested assumptions.
If you’d like to hear more of my musings on science and scientists, read this.
In summary
It’s best to think of statistics as the science of changing your mind under uncertainty. It’s a framework to help you make thoughtful decisions when you lack information… and there’s no single right way to use it.
And no, it doesn’t give you the facts you need; it gives you what you need to cope with not having those facts in the first place. The entire point is to help you do your best in an uncertain world.
To do that, you’ll have to start making assumptions.
Next up
In follow-up articles, I’ll write about where assumptions come from, how to pick “good” assumptions, and what it means to test an assumption. If these topics intrigue you, your retweets are my favorite motivation for writing.
In the meantime, most of the links in this article take you to my other musings.
Translated from: https://towardsdatascience.com/the-saddest-equation-in-data-science-e60e7819b63f