The Arbiters of Truth

Coming out of college with a background in mathematics, I fell upward into the rapidly growing field of data analytics. It wasn’t until years later that I realized the incredible power that comes with the position. As Uncle Ben told Peter Parker (aka Spider-Man), “With great power comes great responsibility.” That proverb perfectly sums up an unspoken reality for data professionals of all levels and types. You have to wonder if Peter Parker’s real superpower was data expertise. Unlike Spider-Man’s, our enemies are not quite as obvious as a flying green monster. As data professionals, we must remain vigilant about data privacy, algorithmic bias, and presenting information objectively.

Data Ethics in the Government

My first encounter with sensitive data came at the U.S. Census Bureau back in 2016. My team was responsible for compiling and disseminating the U.S. International Trade in Goods and Services report each month. The report shows how much of each commodity the U.S. imports from and exports to other countries. This might not affect the average person’s life, but to an investor, the information is incredibly valuable.

Being an ambitious employee, I wanted to add a little pizzazz to the Bureau’s webpage. My plan was to display a fancy Tableau chart (yes, they were fancy back then) relating to the Trans-Pacific Partnership. This would be the equivalent of a news agency reporting the relevant facts for any major economic event. Sadly, I was shut down. I was told that the Census Bureau could not appear biased about the new free trade agreement. At the time, I did not quite understand. Looking back on it, however, I can fully appreciate the sensitivity. The Census Bureau controls incredibly valuable information that could have wide implications for the economy and its people. To be effective, it must remain non-partisan. Otherwise, the numbers become politicized and the truth becomes questionable.

Algorithmic Biases

“When a measure becomes a target, it ceases to be a good measure.” - Goodhart’s Law

I see the above statement quoted often, yet KPIs remain incredibly common in organizations. One of my previous digital transformation projects required my department to adopt a new CRM (Contact Relationship Management) system. With this new system, leadership requested KPIs to measure participation in the tool. Anyone who has rolled out a new system knows the challenges of culture change and adoption. The software and the process must go hand in hand to be successful. Therefore, we needed the best method for measuring and incentivizing user activity in the CRM.

In our system, users were expected to enter and update records of potential public policies that could affect the organization. We had users responsible for different regions around the globe. Some regions, such as Europe, had more policy activity than others. Some regions had more users to help keep the records up to date. Each region could also vary in its importance from a financial perspective. In our CRM, you could measure logins, views, edits, added records, deleted records, and more. Each metric carried an inherent bias in its calculation. To simplify things, we will assume that we can only calculate metrics at the region level, on a biweekly basis. Let’s take a look at some of the options and their implications.
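
To make the bias concrete, here is a minimal Python sketch of three region-level metrics computed for one biweekly period. Everything in it is an assumption made for illustration: the region names other than Europe, the user counts, the edit counts, and the number of open policy records are invented, and the three metrics are just plausible candidates, not the KPIs we actually adopted.

```python
from dataclasses import dataclass


@dataclass
class RegionActivity:
    """One region's CRM activity for a single biweekly period (hypothetical)."""
    region: str
    users: int          # people responsible for the region
    edits: int          # record edits logged in the period
    open_policies: int  # policy records the region has to maintain


# Invented numbers, for illustration only.
ACTIVITY = [
    RegionActivity("Europe", users=8, edits=120, open_policies=60),
    RegionActivity("Asia-Pacific", users=3, edits=60, open_policies=20),
    RegionActivity("Latin America", users=2, edits=36, open_policies=10),
]


def rank(rows, score):
    """Return region names ordered from highest to lowest score."""
    return [r.region for r in sorted(rows, key=score, reverse=True)]


# Raw volume rewards regions with more staff and more policy activity.
by_total_edits = rank(ACTIVITY, lambda r: r.edits)

# Normalizing by headcount rewards small teams.
by_edits_per_user = rank(ACTIVITY, lambda r: r.edits / r.users)

# Normalizing by workload rewards regions with few records to maintain.
by_edits_per_policy = rank(ACTIVITY, lambda r: r.edits / r.open_policies)

print(by_total_edits)
print(by_edits_per_user)
print(by_edits_per_policy)
```

With these made-up inputs, the same activity log produces three different "best" regions depending on the denominator. None of the rankings is wrong; each one simply encodes a different assumption about what good participation looks like.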

When we designed the KPIs for this new system, biases, assumptions, and incentives were at play no matter which metric we chose. While mindlessly scrolling through Twitter, I recently came upon a quote that perfectly sums up the above process.

“The very act of turning something into a number is an assumption.” - Kareem Carr

Integrity is a Must

A few months back, I was working with a colleague who needed some assistance with the analysis and presentation of information that would be available to the public. As soon as any data professional hears the words “public data”, their mind immediately gravitates towards data security. Fortunately, this was not an issue.

My colleague proceeded to explain what data we had (i.e., very little) and the purpose of the presentation. After some exploration, I realized that we could not provide any summary statistics at the requested level of detail. We could only provide an estimate of the overall total. This was insufficient for their project. There was pressure to “make some magic happen”, especially if I wanted to impress a few senior-level colleagues. In the short term, giving in would have boosted my reputation, but over the long term, it risked significant reputational damage to the organization (and to me).

Final Thoughts

As data becomes seamlessly woven into every process, it brings ethical risks that aren’t talked about enough. By the time data professionals start building black-box algorithms into your decision-making processes, it will be too late. Organizations need to instill a culture of ethical, data-driven decision making from the top.

As a data professional, you will frequently find yourself at the center of difficult decisions, especially if you work with colleagues who struggle with data and numbers. Your job is to bridge the gap between their subject matter expertise and the appropriate analysis or presentation of the information. In that gap lies an opportunistic, invisible enemy who wants you to take the shortcut. Follow in Spider-Man’s footsteps and proceed with integrity.

~ The Data Generalist

Translated from: https://towardsdatascience.com/the-arbiters-of-truth-d97ce1a4e4a6
