The Arbiters of Truth

Coming out of college with a background in mathematics, I fell upward into the rapidly growing field of data analytics. It wasn’t until years later that I realized the incredible power that comes with the position. As Uncle Ben told Peter Parker (aka Spider-Man), “With great power comes great responsibility.” That proverb perfectly sums up an unspoken reality for data professionals of all levels and types. You have to wonder if Peter Parker’s real superpower was data expertise. Unlike Spider-Man’s, our enemies are not quite as obvious as a flying green monster. As data professionals, we must remain vigilant on topics such as data privacy, algorithmic bias, and the objective presentation of information.

Data Ethics in the Government

My first encounter with sensitive data came at the U.S. Census Bureau back in 2016. My team was responsible for compiling and disseminating the U.S. International Trade in Goods and Services report each month. The report shows how much of each commodity the U.S. imports from and exports to other countries. To the average person, this information might not matter much, but to an investor, it is incredibly valuable.

Being an ambitious employee, I wanted to add a little pizzazz to their webpage. My plan was to display a fancy Tableau chart (yes, they were fancy back then) relating to the Trans-Pacific Partnership. This would be the equivalent of a news agency reporting the relevant facts for any major economic event. Sadly, I was shut down. I was told that the Census Bureau could not appear biased on the new free trade agreement. At the time, I did not quite understand. Looking back, however, I can fully appreciate the sensitivity. The Census Bureau controls incredibly valuable information that could have wide implications for the economy and its people. In order to be effective, it must remain nonpartisan. Otherwise, the numbers become politicized and the truth becomes questionable.

Algorithmic Biases

“When a measure becomes a target, it ceases to be a good measure.” - Goodhart’s Law

I see the above statement quoted often, yet KPIs remain incredibly common in organizations. One of my previous digital transformation projects required my department to adopt a new CRM (Contact Relationship Management) software. With this new system, leadership requested KPIs to measure participation in the tool. Anyone who has installed a new system knows the challenges of culture change and adoption. The software and the process must go hand-in-hand to be successful. Therefore, we needed the best method for measuring and incentivizing user activity in the CRM.

In our system, users were expected to enter and update potential public policies that would impact the organization. We had users responsible for different regions around the globe. Some regions, such as Europe, had more policy activity than other regions. Some regions had more users to help keep the records up to date. Each region could vary in its importance from a financial perspective. In our CRM, you could measure logins, views, edits, added records, deleted records, and more. Each metric had an inherent bias in the calculation. To simplify things, we will assume that we can only calculate metrics at the region level and this will be on a biweekly basis. Let’s take a look at some of the options and their implications.
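
To make the bias concrete, here is a minimal Python sketch comparing two candidate region-level metrics over one biweekly period. The regions, user counts, and edit counts are invented for illustration and are not figures from the actual CRM. The `rank` helper is likewise hypothetical.

```python
# Hypothetical biweekly CRM activity per region.
# All numbers are made up for illustration; they are not real CRM data.
regions = {
    "Europe": {"users": 10, "edits": 120},
    "Asia": {"users": 3, "edits": 60},
    "Americas": {"users": 5, "edits": 50},
}


def rank(metric):
    """Return region names ordered from highest to lowest by the given metric."""
    return sorted(regions, key=lambda name: metric(regions[name]), reverse=True)


# Metric 1: total edits. Favors regions with more users and more policy activity.
by_total_edits = rank(lambda r: r["edits"])

# Metric 2: edits per user. Favors small teams and ignores regional workload.
by_edits_per_user = rank(lambda r: r["edits"] / r["users"])

print(by_total_edits)      # ['Europe', 'Asia', 'Americas']
print(by_edits_per_user)   # ['Asia', 'Europe', 'Americas']
```

The same activity data produces a different "top" region depending on which metric is chosen, and neither metric accounts for how much policy activity a region actually has or how much it matters financially.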

When designing the appropriate KPIs for this new system, there were biases, assumptions, and incentives at play no matter which metric we chose. While mindlessly scrolling through Twitter, I recently came upon a quote that perfectly sums up the above process.

“The very act of turning something into a number is an assumption.” - Kareem Carr

Integrity is a Must

A few months back, I was working with a colleague who needed some assistance with the analysis and presentation of information that would be available to the public. On hearing the words “public data,” any data professional’s mind immediately gravitates toward data security. Fortunately, this was not an issue.

My colleague proceeded to explain what data we had (i.e., very little) and the purpose of the presentation. After some exploration, I realized that we could not provide any summary statistics at the requested level of detail. We could only provide an estimate of the overall total. This was insufficient for their project. There was pressure to “make some magic happen,” especially if I wanted to impress a few senior-level colleagues. In the short term, giving in would have boosted my reputation, but over the long term it risked significant reputational damage for the organization (and for me).

Final Thoughts

As data becomes seamlessly woven into every process, it brings ethical risks that aren’t talked about enough. By the time data professionals start building black-box algorithms into your decision-making processes, it will be too late. Organizations need to instill a culture of ethical, data-driven decision making from the top.

As a data professional, you will frequently find yourself at the center of difficult decisions, especially if you work with colleagues who struggle with data and numbers. Your job is to bridge the gap between their subject matter expertise and the appropriate analysis or presentation of the information. In that gap lies an opportunistic, invisible enemy who wants you to take the shortcut. Follow in Spider-Man’s footsteps and proceed with integrity.

~ The Data Generalist

Translated from: https://towardsdatascience.com/the-arbiters-of-truth-d97ce1a4e4a6
