The saddest equation in data science

Prepare a box of tissues! I’m about to drop a truth bomb about statistics and data science that’ll bring tears to your eyes.

INFERENCE = DATA + ASSUMPTIONS. In other words, statistics does not give you truth.

Common myths

Here are some standard misconceptions:

  • “If I find the right equations, I can know the unknown.”

  • “If I math at my data hard enough, I can reduce my uncertainty.”

  • “Statistics can transform data into truth!”

They sound like fairytales, don’t they? That’s because they are!

Painful truths

There is no magic in the world that lets you make something out of nothing, so abandon that hope now. That’s not what statistics is about. Take it from a statistician. (As a bonus, this article might save you from wasting a decade of your life studying the dark arts of statistics to chase that elusive dream.)

Unfortunately, there are plenty of charlatans out there who may try to convince you otherwise. They’ll pull a classic bullying move on you, “You don’t understand the equations I’m clobbering you with, so bow before my superiority and do what I say!”

Resist those posers.

Don’t land with a splat, Icarus!

Think of statistical inference (“statistics” for short) as an Icarus-like leap from what we know (our sample data) to what we don’t (our population parameter).

In statistics, what you know is not what you wish you knew.

Perhaps you want tomorrow’s facts, but you only have the past to inform you. (It’s so annoying when we can’t remember the future, right?) Perhaps you want to know what all your potential users think of your product, but you can only ask a hundred of them. Then you’re dealing with uncertainty!
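
To make that concrete, here is a minimal Python sketch. Everything in it is invented for illustration: the population of 100,000 hypothetical users, the 60% "true" approval rate, and the `survey` helper. In real life you never get to see the population list, which is the whole problem.

```python
import random

def survey(population, n, rng):
    """Ask n randomly chosen users; return the fraction who like the product."""
    sample = rng.sample(population, n)
    return sum(sample) / n

rng = random.Random(0)  # seeded only so the sketch is repeatable

# Hypothetical population: 1 = likes the product, 0 = does not (true rate 60%).
population = [1] * 60_000 + [0] * 40_000

# Five separate surveys of a hundred users each.
estimates = [survey(population, 100, rng) for _ in range(5)]
print(estimates)
```

Each survey asks a different hundred users, so the estimates wobble around the true rate with no guarantee of hitting it. Notice the assumption hiding in plain sight: the sketch takes for granted that the hundred users were picked at random.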

It’s not magic, it’s assumptions

How can you possibly leap from what you know to what you don’t? You need a bridge to cross that chasm… and that bridge is assumptions. Which brings me back to the most painful equation in all of data science: DATA + ASSUMPTIONS = PREDICTION.

DATA + ASSUMPTIONS = PREDICTION

(Feel free to replace the word “prediction” with “inference” or “forecast” if you like — they’re all the same thing here: a statement about something you can’t know for sure.)
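
Here is one way the equation might look in code, as a rough sketch rather than a recommended forecasting method. The sales figures and both helper functions are made up for illustration. The data stay the same; only the assumption changes.

```python
def predict_stable(history):
    """Assumption: tomorrow looks like the average of the past."""
    return sum(history) / len(history)

def predict_trend(history):
    """Assumption: the average past change continues for one more step."""
    step = (history[-1] - history[0]) / (len(history) - 1)
    return history[-1] + step

# Hypothetical daily sales, oldest first.
data = [10, 12, 14, 16]

print(predict_stable(data))  # 13.0
print(predict_trend(data))   # 18.0
```

Neither number is "the truth" about tomorrow. Each is what the same four data points say once you add a particular assumption to them.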

What’s an assumption?

If we knew all the facts (and we knew that our facts were actually true facts), we wouldn’t need assumptions (or statisticians). Assumptions are the ugly patches you use to bridge the gap between what you know and what you wish you knew. They’re hacks you have to use to make the math work out when you’re missing the facts.

Assumptions are ugly band-aids you put over the parts where information is missing.

Should I put it more bluntly? An assumption is not a fact, it’s some nonsense you make up precisely because you’ve got gaping holes in your knowledge. If you’re in the habit of bullying people with your overconfidence intervals, take a moment to remind yourself that it’s a stretch to refer to anything based on assumptions as truth. It’s best to start treating the whole thing as a personal decision-making tool that is imperfect but better than nothing (in specific situations).

Statistics is your attempt to do your best in an uncertain world.

There are always assumptions.

Assumptions are part of decision-making

Show me an “assumption-free” real-world decision and I’ll rattle off a host of implicit assumptions you’re not even aware you’re making.

Examples: When you read a newspaper, did you assume all the facts were checked? When you made your plans for 2020, did you assume there would be no global pandemic? If you analyzed data, did you assume the information was captured without errors? Did you assume that your random number generator is random? (They usually aren’t.) When you chose to make an online purchase, did you assume the right amount would be withdrawn from your bank account? What about the last snack you had, did you assume it wouldn’t poison you? When you took medicine, did you *know* anything about its long-term safety and efficacy… or did you assume?
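
The random number generator example is easy to check for yourself. The sketch below uses Python’s standard `random` module; the `draws` helper and the seed value 42 are just illustrative choices. A seeded pseudo-random generator is a deterministic recipe, so the same seed replays the same "random" dice rolls.

```python
import random

def draws(seed, k=5):
    """Roll a six-sided die k times using a generator started from `seed`."""
    rng = random.Random(seed)
    return [rng.randint(1, 6) for _ in range(k)]

first = draws(42)
second = draws(42)

# Same seed, same sequence: the rolls are reproducible, not truly random.
print(first == second)  # True
```

That reproducibility is handy for debugging, but it is also a reminder that "random" in your analysis usually means "random enough, I assume".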

Like it or not, assumptions are part of decision-making.

Like it or not, assumptions are always part of decision-making. A proper foray into real-world data should contain a host of written-down assumptions where the data scientist comes clean about corners they had to cut.

Even if you choose to steer clear of statistics, you’re probably using assumptions to guide your actions. To stay safe, it’s crucial that you keep track of the assumptions that your decisions are based on.

How the statistical “magic” happens

The field of statistics gives you a whole arsenal of tools for formalizing your assumptions and combining them with evidence to make reasonable decisions. (Catch my 8 minute intro to stats here.)

It’s preposterous to expect an analysis involving uncertainty and probability to be a source of truth-with-a-capital-T.

Yep, that’s how the statistical “magic” happens. You choose which assumptions you’re willing to live with, then you combine them with data to take reasonable actions on the basis of that unholy union. That’s all statistics is.

That’s why an analysis involving uncertainty and probability could never be a source of truth-with-a-capital-T. There is no secret dark art that can do that for you.

Two people can come to completely different valid conclusions from the same data! All it takes is using different assumptions.

It’s also why two people can come to completely different valid conclusions from the same data! All it takes is using different assumptions. Statistics gives you a tool for making decisions more thoughtfully, but there’s no single right way to use it. It’s a personal decision-making tool.
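
A small sketch of how that can play out. The technique here is my own pick for illustration (a textbook Beta-Binomial update), not something the argument depends on, and the survey numbers, the two priors, and the 50% launch threshold are all hypothetical. Two analysts see the same ten survey answers but start from different prior assumptions.

```python
def posterior_mean(successes, trials, prior_a, prior_b):
    """Mean of the Beta posterior when a Beta(prior_a, prior_b) prior meets binomial data."""
    return (prior_a + successes) / (prior_a + prior_b + trials)

def should_launch(successes, trials, prior_a, prior_b, threshold=0.5):
    """Decide to launch if the posterior mean approval rate beats the threshold."""
    return posterior_mean(successes, trials, prior_a, prior_b) > threshold

# Same data for both analysts: 7 of 10 surveyed users liked the product.
likes, asked = 7, 10

# Analyst A assumes nothing in particular: a flat Beta(1, 1) prior.
open_minded = should_launch(likes, asked, prior_a=1, prior_b=1)

# Analyst B assumes products like this rarely land: a Beta(2, 18) prior.
skeptic = should_launch(likes, asked, prior_a=2, prior_b=18)

print(open_minded, skeptic)  # True False
```

Analyst A lands on a posterior mean of 8/12 and launches; Analyst B lands on 9/30 and doesn’t. Both did the math correctly. They simply brought different assumptions to the same data.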

A study is only as good as the assumptions you’ll make about it.

What about science?

What does it mean when a scientist uses statistics to come to a conclusion? Simply that they’ve formed an opinion and have made the decision to share it with the world. That’s not a bad thing — it’s a scientist’s job to form opinions reluctantly, which makes me feel better about assuming that they’re worth listening to.

It’s a scientist’s job to form opinions reluctantly.

I’m a huge fan of taking advice from those who have more expertise and information than I do, but I never let myself confuse their opinions with facts. While many scientists are well-versed in working with probability, I’ve seen others make enough statistical mess to last several lifetimes. Opinions cannot (and should not) convince someone who is unwilling to assume that those opinions were arrived at competently, from a blend of evidence and mutually palatable untested assumptions.

If you’d like to hear more of my musings on science and scientists, read this.

In summary

It’s best to think of statistics as the science of changing your mind under uncertainty. It’s a framework to help you make thoughtful decisions when you lack information… and there’s no single right way to use it.

And no, it doesn’t give you the facts you need; it gives you what you need to cope with not having those facts in the first place. The entire point is to help you do your best in an uncertain world.

To do that, you’ll have to start making assumptions.

Next up

In follow-up articles, I’ll write about where assumptions come from, how to pick “good” assumptions, and what it means to test an assumption. If these topics intrigue you, your retweets are my favorite motivation for writing.

In the meantime, most of the links in this article take you to my other musings. Can’t choose? Try one of these:

Translated from: https://towardsdatascience.com/the-saddest-equation-in-data-science-e60e7819b63f
