The Saddest Equation in Data Science

Prepare a box of tissues! I’m about to drop a truth bomb about statistics and data science that’ll bring tears to your eyes.

INFERENCE = DATA + ASSUMPTIONS. In other words, statistics does not give you truth.

Common myths

Here are some standard misconceptions:

  • “If I find the right equations, I can know the unknown.”

  • “If I math at my data hard enough, I can reduce my uncertainty.”

  • “Statistics can transform data into truth!”

They sound like fairytales, don’t they? That’s because they are!

Painful truths

There is no magic in the world that lets you make something out of nothing, so abandon that hope now. That’s not what statistics is about. Take it from a statistician. (As a bonus, this article might save you from wasting a decade of your life studying the dark arts of statistics to chase that elusive dream.)

Unfortunately, there are plenty of charlatans out there who may try to convince you otherwise. They’ll pull a classic bullying move on you: “You don’t understand the equations I’m clobbering you with, so bow before my superiority and do what I say!”

Resist those posers.

Don’t land with a splat, Icarus!

Think of statistical inference (“statistics” for short) as an Icarus-like leap from what we know (our sample data) to what we don’t (our population parameter).

In statistics, what you know is not what you wish you knew.

Perhaps you want tomorrow’s facts, but you only have the past to inform you. (It’s so annoying when we can’t remember the future, right?) Perhaps you want to know what all your potential users think of your product, but you can only ask a hundred of them. Then you’re dealing with uncertainty!
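
To make the hundred-users situation concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the population of 10,000 users, the 60% who like the product, and the helper name `survey`. In real life, that true rate is exactly the thing you don't get to see.

```python
import random

def survey(population, sample_size, rng):
    """Ask a random subset of users and return the share who like the product."""
    sample = rng.sample(population, sample_size)
    return sum(sample) / sample_size

# Hypothetical population: 10,000 users, 60% of whom like the product (1 = likes it).
population = [1] * 6000 + [0] * 4000

rng = random.Random(0)  # fixed seed so the sketch is repeatable
estimates = [survey(population, 100, rng) for _ in range(5)]
print(estimates)  # five surveys of 100 users each, usually five different answers
```

Each run of `survey` looks at only a hundred users, so the estimates wobble around the true rate instead of landing on it. That wobble is the uncertainty.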

It’s not magic, it’s assumptions

How can you possibly leap from what you know to what you don’t? You need a bridge to cross that chasm… and that bridge is assumptions. Which brings me back to the most painful equation in all of data science: DATA + ASSUMPTIONS = PREDICTION.

DATA + ASSUMPTIONS = PREDICTION

(Feel free to replace the word “prediction” with “inference” or “forecast” if you like — they’re all the same thing here: a statement about something you can’t know for sure.)
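
Here is one hedged way to see the equation in action: a textbook normal-approximation interval for a proportion, written in Python. The data (62 of 100 surveyed users saying yes) and the function name `proportion_interval` are made up for this sketch. The point is the docstring: the arithmetic only means something if you are willing to live with the assumptions listed there.

```python
import math

def proportion_interval(successes, n, z=1.96):
    """Normal-approximation interval for a proportion.

    Assumptions baked in (none of them are facts):
      - the n respondents are a random sample of the population,
      - responses are independent of each other,
      - n is large enough for the normal approximation to be reasonable.
    """
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hypothetical data: 62 of 100 surveyed users said they like the product.
low, high = proportion_interval(successes=62, n=100)
print(f"{low:.3f} to {high:.3f}")  # roughly 0.525 to 0.715
```

Drop any one of those assumptions (say, the hundred users were all your friends) and the same data no longer supports the same interval.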

What’s an assumption?

If we knew all the facts (and we knew that our facts were actually true facts), we wouldn’t need assumptions (or statisticians). Assumptions are the ugly patches you use to bridge the gap between what you know and what you wish you knew. They’re hacks you have to use to make the math work out when you’re missing the facts.

Assumptions are ugly band-aids you put over the parts where information is missing.

Should I put it more bluntly? An assumption is not a fact, it’s some nonsense you make up precisely because you’ve got gaping holes in your knowledge. If you’re in the habit of bullying people with your overconfidence intervals, take a moment to remind yourself that it’s a stretch to refer to anything based on assumptions as truth. It’s best to start treating the whole thing as a personal decision-making tool that is imperfect but better than nothing (in specific situations).

Statistics is your attempt to do your best in an uncertain world.

There are always assumptions.

Assumptions are part of decision-making

Show me an “assumption-free” real-world decision and I’ll rattle off a host of implicit assumptions you’re not even aware you’re making.

Examples: When you read a newspaper, did you assume all the facts were checked? When you made your plans for 2020, did you assume there would be no global pandemic? If you analyzed data, did you assume the information was captured without errors? Did you assume that your random number generator is random? (They usually aren’t.) When you chose to make an online purchase, did you assume the right amount would be withdrawn from your bank account? What about the last snack you had, did you assume it wouldn’t poison you? When you took medicine, did you *know* anything about its long-term safety and efficacy… or did you assume?
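
The random number generator example is easy to check for yourself. A small sketch using Python's standard `random` module (the seed value 2020 is arbitrary): two generators started from the same seed hand back exactly the same "random" numbers, because they are pseudorandom.

```python
import random

# Two generators started from the same seed (the seed value 2020 is arbitrary).
first = random.Random(2020)
second = random.Random(2020)

draws_a = [first.random() for _ in range(3)]
draws_b = [second.random() for _ in range(3)]

print(draws_a == draws_b)  # True: same seed, same "random" numbers
```

For most analyses that is random enough, but "random enough" is itself an assumption.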

Like it or not, assumptions are part of decision-making.

Like it or not, assumptions are always part of decision-making. A proper foray into real-world data should contain a host of written-down assumptions where the data scientist comes clean about corners they had to cut.

Even if you choose to steer clear of statistics, you’re probably using assumptions to guide your actions. To stay safe, it’s crucial that you keep track of the assumptions that your decisions are based on.

How the statistical “magic” happens

The field of statistics gives you a whole arsenal of tools for formalizing your assumptions and combining them with evidence to make reasonable decisions. (Catch my 8 minute intro to stats here.)

It’s preposterous to expect an analysis involving uncertainty and probability to be a source of truth-with-a-capital-T.

Yep, that’s how the statistical “magic” happens. You choose which assumptions you’re willing to live with, then you combine them with data to take reasonable actions on the basis of that unholy union. That’s all statistics is.

That’s why an analysis involving uncertainty and probability could never be a source of truth-with-a-capital-T. There is no secret dark art that can do that for you.

Two people can come to completely different valid conclusions from the same data! All it takes is using different assumptions.

It’s also why two people can come to completely different valid conclusions from the same data! All it takes is using different assumptions. Statistics gives you a tool for making decisions more thoughtfully, but there’s no single right way to use it. It’s a personal decision-making tool.
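
One way to watch this happen, sketched in Python with a Bayesian setup that is my choice of illustration and only one of many possible ones: two analysts see the same hypothetical data (7 of 10 users liked the product) but start from different prior assumptions, expressed as Beta distributions. The function name `posterior_mean` and both priors are invented for the example.

```python
def posterior_mean(successes, n, prior_a, prior_b):
    """Posterior mean of a rate under a Beta(prior_a, prior_b) prior and binomial data."""
    return (prior_a + successes) / (prior_a + prior_b + n)

# Same hypothetical data for both analysts: 7 of 10 users liked the product.
successes, n = 7, 10

# Analyst A assumes nothing in particular up front: a flat Beta(1, 1) prior.
analyst_a = posterior_mean(successes, n, prior_a=1, prior_b=1)

# Analyst B assumes most products flop: a skeptical Beta(2, 18) prior.
analyst_b = posterior_mean(successes, n, prior_a=2, prior_b=18)

print(round(analyst_a, 3), round(analyst_b, 3))  # 0.667 0.3
```

If the rule is "launch when the estimated rate is above one half", analyst A launches and analyst B doesn't. Neither made an arithmetic mistake; they brought different assumptions to the same ten data points.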

A study is only as good as the assumptions you’ll make about it.

What about science?

What does it mean when a scientist uses statistics to come to a conclusion? Simply that they’ve formed an opinion and have made the decision to share it with the world. That’s not a bad thing — it’s a scientist’s job to form opinions reluctantly, which makes me feel better about assuming that they’re worth listening to.

It’s a scientist’s job to form opinions reluctantly.

I’m a huge fan of taking advice from those who have more expertise and information than I do, but I never let myself confuse their opinions with facts. But while many scientists are well-versed in working with probability, I’ve seen other scientists make enough statistical mess to last several lifetimes. Opinions could not (and should not) convince someone who’s not willing to make the assumption that those opinions were arrived at competently from a blend of evidence and mutually-palatable untested assumptions.

If you’d like to hear more of my musings on science and scientists, read this.

In summary

It’s best to think of statistics as the science of changing your mind under uncertainty. It’s a framework to help you make thoughtful decisions when you lack information… and there’s no single right way to use it.

And no, it doesn’t give you the facts you need; it gives you what you need to cope with not having those facts in the first place. The entire point is to help you do your best in an uncertain world.

To do that, you’ll have to start making assumptions.

Next up

In follow-up articles, I’ll write about where assumptions come from, how to pick “good” assumptions, and what it means to test an assumption. If these topics intrigue you, your retweets are my favorite motivation for writing.

In the meantime, most of the links in this article take you to my other musings. Can’t choose? Try one of these:

Translated from: https://towardsdatascience.com/the-saddest-equation-in-data-science-e60e7819b63f
