What is algorithmic bias?
In the last article, we showed what happens when data strips emotions out of an action
In Part 1 of this series, we argued that data can turn anyone into a psychopath, and though that’s an extreme way of looking at things, it holds a certain amount of truth.
It’s natural to cheer at a newspaper headline proclaiming the downfall of a distant enemy stronghold, but is it OK to cheer while actually watching thousands of civilians inside that city die gruesome deaths?
No, it’s not.
But at the same time―if you cheer the headline showing a distant military victory, it means you’re a human, and not necessarily a psychopath.
The abstracted data of that headline strips the emotional currency of the event, and induces a psychopathic response from you.
That’s what headlines do, and they can induce a callous response from most anyone.
So if data can induce a state of momentary psychopathy, what happens when you combine data and algorithms?
Data can’t feel, and algorithms can’t feel either.
Is that a state of unfeeling multiplied by two?
Or is it a state of unfeeling squared?
Whatever the case, let’s not talk about the momentary psychopathy abetted by these unfeeling elements.
Let’s talk about bias.
Because if left unchecked, unfeeling algorithms can and will lead anyone into a state of bias, including you.
But before we try to understand algorithmic bias, we must take a moment to recognize how much we don’t understand our own algorithms.
Yes, humanity makes algorithms, and humanity relies upon them countless times every day, but we don’t understand them.
We no longer understand our own algorithms, no matter how much we think we do
At a high, high level, we could conceive of an algorithmic process as having three parts―an Input, the algorithm itself, and an Outcome.
But we are now far, far away from human-understandable algorithms like the Sieve of Eratosthenes, and though the above image might be great for an Introduction to Algorithms class―today’s algorithms can no longer be adequately described by the above three parts alone.
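For contrast, here is a minimal Python sketch of a fully human-understandable algorithm, the Sieve of Eratosthenes mentioned above. Every step from the Input (a limit n) to the Outcome (the primes up to n) can be read, traced, and verified by a person.

```python
def sieve_of_eratosthenes(n):
    """Return all prime numbers up to and including n."""
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False  # 0 and 1 are not prime
    for candidate in range(2, int(n ** 0.5) + 1):
        if is_prime[candidate]:
            # Cross out every multiple of this prime, starting at its square
            for multiple in range(candidate * candidate, n + 1, candidate):
                is_prime[multiple] = False
    return [number for number, prime in enumerate(is_prime) if prime]

print(sieve_of_eratosthenes(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

That transparency, with the Input, the logic, and the Outcome all visible at once, is exactly what today’s million-line, machine-trained systems no longer offer.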
The tech writer Franklin Foer describes one of the reasons for this in his book World Without Mind: The Existential Threat of Big Tech―
Perhaps Facebook no longer fully understands its own tangle of algorithms — the code, all sixty million lines of it, is a palimpsest, where engineers add layer upon layer of new commands. (This is hardly a condition unique to Facebook. The Cornell University computer scientist Jon Kleinberg cowrote an essay that argued, “We have, perhaps for the first time ever, built machines we do not understand. . . . At some deep level we don’t even really understand how they’re producing the behavior we observe. This is the essence of their incomprehensibility.” What’s striking is that the “we” in that sentence refers to the creators of code.)
At the very least, the algorithmic codes that run our lives are palimpsests―documents originally written by one group of people, then written over by another group, then a third, then a fourth―until there is no single expert on the code itself, and perhaps not even one person who fully understands it.
And these algorithmic palimpsests are millions of lines of code long, or even billions.
Remember Mark Zuckerberg’s 2018 testimony before Congress?
That was the testimony of an individual who didn’t have the faintest understanding of about 99% of Facebook’s inner workings.
Because no one does.
Larry Page and Sergey Brin don’t understand Google as a whole.
Because no one does.
And the algorithms that define our daily lives?
No one understands them completely, nor does anyone understand the massive amounts of data that they take in.
So let’s update our algorithm diagram. We need to recognize that there are more Inputs than we can comprehend, and that the algorithms themselves are black boxes.
So here is a slightly more accurate, yet still high-level view of what is happening with our algorithms.
Again―there are more Inputs than we can understand, going into a black-box algorithm we do not fully understand.
And this can lead to many things, including bias.
A case study in algorithmic bias―a company is told to favor lacrosse players named Jared
A company recently ran a hiring algorithm, and the intent of the algorithm was to eliminate bias from the hiring process.
The algorithm’s purpose was to find the best candidates.
The company fed the algorithm training data based on past successful candidates, and then ran it on a current group of candidates.
The algorithm, among other things, favored candidates named Jared who played lacrosse.
The algorithmic Output was biased, but not in the way anyone expected.
How could this have happened?
Algorithms are not compassionate, let alone sentient―but they are really good at finding patterns
In the above case, the algorithm found a pattern within the training data suggesting that lacrosse players named Jared tend to be good hires.
That’s a biased recommendation of course, and a faulty one.
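The article doesn’t tell us what kind of model the company used, so here is a hypothetical sketch, in plain Python with invented toy data, of how easily a pattern-finder latches onto a spurious signal. If the past hires happen to skew toward lacrosse-playing Jareds, even a naive frequency comparison “learns” that combination as a marker of success.

```python
# Toy training data, invented purely for illustration:
# (first_name, played_lacrosse, was_successful_hire)
past_candidates = [
    ("Jared", True,  True), ("Jared", True,  True), ("Jared", True, True),
    ("Maria", False, True), ("Priya", False, False), ("Jared", True, True),
    ("Chen",  False, False), ("Aisha", False, True), ("Jared", True, True),
    ("Tom",   False, False),
]

def hire_rate(records):
    """Fraction of the given records that were successful hires."""
    return sum(1 for name, lax, success in records if success) / len(records)

jared_lax = [r for r in past_candidates if r[0] == "Jared" and r[1]]
everyone_else = [r for r in past_candidates if not (r[0] == "Jared" and r[1])]

print(f"Lacrosse-playing Jareds: {hire_rate(jared_lax):.0%} successful")
print(f"Everyone else:           {hire_rate(everyone_else):.0%} successful")
# The name/lacrosse pattern really is in the data; it just has nothing
# to do with being a good hire.
```

A real machine-learning model would find the same correlation, only faster and across far more columns.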
Why did it occur?
Well, beyond recognizing that we don’t understand the algorithm itself, we can cite thinkers like Dr. Nicol Turner Lee of Brookings, who explained on Noah Feldman’s Deep Background podcast that the external sources of algorithmic bias are often manifold.
There might be bias in the training data, and quite often the data scientists who built the algorithm belong to a homogeneous group, which can in turn encourage the algorithm to suggest hiring more candidates like themselves.
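Dr. Lee’s first point, bias in the training data, is something that can at least be probed before anything is trained. Below is a hypothetical sketch of that kind of audit; the records and group labels are invented purely for illustration.

```python
from collections import Counter

# Invented past-hire records: (group, was_labeled_successful)
training_data = [
    ("male", True), ("male", True), ("male", True), ("male", True),
    ("female", True), ("male", False), ("female", False), ("female", False),
    ("male", True), ("female", False),
]

successes = Counter(group for group, success in training_data if success)
totals = Counter(group for group, _ in training_data)

for group in totals:
    rate = successes[group] / totals[group]
    print(f"{group}: {successes[group]}/{totals[group]} labeled successful ({rate:.0%})")
# If one group dominates the "successful" label, a model trained on this
# data will tend to recommend more candidates from that group, no matter
# what anyone intended.
```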
And of course, there is societal and systemic bias, which will inevitably work its way into an unfeeling, pattern-recognizing algorithm.
So to update our algorithm chart once again―
There are faint echoes of Jared and lacrosse somewhere in the Inputs, and we certainly see them in the Outputs.
Of course, both the full scope of the Inputs and the algorithm itself remain a mystery.
The only thing we know for sure is that if your name is Jared, and you played lacrosse, you will have an advantage.
This was a humorous example―but what happens when the stakes are higher?
Hiring algorithms are relatively low stakes in the grand scheme of things, especially considering that virtually any rational company would take steps to eliminate a penchant for lacrosse-playing Jareds from its hiring process as soon as it could.
But what if the algorithm is meant to set credit rates?
What if the algorithm is meant to determine a bail amount?
What if this algorithm leads to a jail term for someone who should have been sent home instead?
If you are spending the night in jail only because your name isn’t Jared and you didn’t play lacrosse, your plight is no longer a humorous cautionary tale.
And when considering the Outcome of a single unwarranted night in jail, there is one conclusion―
An Outcome like that cannot be.
Even if a robotic algorithm leads to 100 just verdicts in a row, if the 101st leads to an unjust jail sentence, that cannot be.
There are protections against this of course―the legal system understands, in theory at least, that an unjust sentence cannot be.
But we’re dealing with algorithms here, and they often operate at a level far beyond our understanding of what can and cannot be.
A brief aside — Algorithms cannot technically show bias based on Constitutionally protected classes, but they often find ways to do this
It’s not just morality prohibiting bias in high-stakes algorithmic decisions, it’s the Constitution.
Algorithms are prohibited from showing bias―or preferences―based on ethnicity, gender, sexual orientation and many other things.
Those cannot be factors, because they are Constitutionally protected classes.
But what about secondary characteristics that imply any of the above?
Again, algorithms are great at finding patterns, and even if they are told to ignore certain categories, they can―and will―find patterns that act as substitutes for those categories, as the sketch after the questions below illustrates.
Consider these questions―
考虑这些问题-
- What gender has a name like Jared? 什么性别的人都喜欢Jared?
- What kind of background suggests that a person played lacrosse in high school? 什么样的背景表明一个人在高中打曲棍球?
And going a bit further―
再往前走-
- What is implied by the zip code of the subject’s home address? 主题的家庭住址的邮政编码意味着什么?
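To make the proxy problem concrete, here is a hypothetical sketch. The names, zip codes, and approval history are all invented; the point is only that when a protected attribute correlates with an allowed field, dropping the protected column does not remove the signal, because the allowed field still carries it.

```python
from collections import defaultdict

# Invented applicant records. The protected attribute (ethnicity) is never
# shown to the scoring function, but in this toy world it correlates
# strongly with zip code.
applicants = [
    {"name": "Jared",  "zip": "06830", "ethnicity": "white",  "approved": True},
    {"name": "Tom",    "zip": "06830", "ethnicity": "white",  "approved": True},
    {"name": "Maria",  "zip": "06830", "ethnicity": "latina", "approved": True},
    {"name": "Aisha",  "zip": "60624", "ethnicity": "black",  "approved": False},
    {"name": "Dwayne", "zip": "60624", "ethnicity": "black",  "approved": False},
    {"name": "Rosa",   "zip": "60624", "ethnicity": "latina", "approved": False},
]

# "Learn" an approval rate per zip code, using only the allowed fields.
history = defaultdict(list)
for person in applicants:
    history[person["zip"]].append(person["approved"])
zip_score = {z: sum(outcomes) / len(outcomes) for z, outcomes in history.items()}

def score(applicant):
    # The protected attribute is never consulted...
    return zip_score[applicant["zip"]]

for person in applicants:
    print(person["name"], score(person))
# ...yet the scores reproduce the historical pattern exactly, because
# zip code is standing in for the protected attribute.
```

Swap ethnicity for any protected class and zip code for any correlated field, and the mechanics are the same.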
So no, an algorithm―particularly one born of a public institution like a courthouse―cannot show bias against Constitutionally-protected classes.
But it might, and probably will if we are not vigilant.
Can algorithms make you biased? Considering algorithms are everywhere―the answer may be yes.
You don’t have to be an HR person at a Tech company or a bail-setting judge to become biased by algorithms.
If you live in the modern world and―
Engage in Social Media, read a news feed, go onto dating apps, or do just about anything online―that bias will be sent down to you.
Bias will influence the friends you choose, the beliefs you have, the people you date and everything else.
The average smartphone user engages with 9 apps per day, and spends about 2 hours and 15 minutes per day interacting with them.
And what are the inner workings of these apps?
That’s a mystery to the user.
What are the inner workings of the algorithms inside these apps?
They are a black box to both the user and the company that designed the apps.
And of course, the constant stream of algorithmic data can lead to the perpetuation of insidious, and often unseen systemic bias
Dr. Lee gave this example on the podcast―
One thing for example I think we say in the paper which I think is just profound is that as an African-American who may be served more higher-interest credit card rates, what if I see that ad come through, and I click it just because I’m interested to see why I’m getting this ad, automatically I will be served similar ads, right? So it automatically places me in that high credit risk category. The challenge that we’re having now, Noah, is that as an individual consumer I have no way of recurating what my identity is.
Dr. Lee has a Doctorate and is a Senior Fellow at a prestigious institute, and has accomplished countless other things.
But if an algorithm sends her an ad for a high-interest credit card because of her profile, and she inadvertently clicks the ad, or even just hovers her mouse over it, that action is registered and added to her profile.
And then her credit is dinged, because another algorithm sees her as the type of person who clicks or hovers on ads for high-interest credit card rates.
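No one outside the ad platform knows exactly how these profiles are updated, so the following is a deliberately simplified, hypothetical simulation of the loop Dr. Lee describes: one curious click nudges the profile, the nudged profile attracts more of the same ads, and the category becomes self-reinforcing.

```python
import random

random.seed(0)

# Hypothetical interest weights an ad system might keep about one user.
profile = {"high_interest_credit": 1.0, "retirement_accounts": 1.0}

def serve_ad(profile):
    """Pick an ad category with probability proportional to its weight."""
    categories = list(profile)
    return random.choices(categories, weights=[profile[c] for c in categories])[0]

def register_interaction(profile, category):
    """Any click or hover is read as interest and boosts that category."""
    profile[category] += 1.0

# The user clicks a single high-interest-credit ad out of curiosity.
register_interaction(profile, "high_interest_credit")

# From then on the loop feeds on itself: similar ads are served more often,
# and each stray interaction pushes the weight higher still.
for _ in range(20):
    shown = serve_ad(profile)
    if shown == "high_interest_credit" and random.random() < 0.3:
        register_interaction(profile, shown)

print(profile)
# The profile now labels this user the "high-interest credit" type, even
# though nothing about their actual creditworthiness changed.
```

The user never asked for high-interest credit; the profile drifted there on its own, and as Dr. Lee notes, there is no interface for recurating it.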
And of course, if an algorithm sees that lacrosse-playing Jareds should be served ads for Individual Retirement Accounts, that may lead to a different Outcome.
Dr. Lee makes the point that this is no one’s fault per se, but systemic bias can certainly show up.
Every response you make to a biased algorithm is added to your profile, even if the addition is antithetical to your true profile.
And of course there is no way that any of us can know what our profile is, let alone recurate it.
So individuals and the system are unintentionally biased by algorithms―what do we do?
First of all, we don’t scrap the whole system.
Algorithms can make you biased, and as I showed in Part 1, data can lead you to a form of psychopathy.
But algorithms and data also improve our lives in countless other ways. They can cure diseases and control epidemics. They can improve the test scores of children from underserved communities.
Rockford, Illinois, employed data and algorithms to end homelessness in their city.
They solved homelessness, and that is incredible.
So what do we do?
We tweak the system, and we tweak our own approach to it.
And we’ll do that in Part 3.
Stay tuned!
This article is Part 2 of a 3-part series — The Perils and Promise of Data
Part 1 of this series is here — 3 ways data can turn anyone into a psychopath, including you
Part 3 of this series — Coming Soon!
Jonathan Maas has a few books on Amazon, and you can contact him through Medium, or Goodreads.com/JMaas.
Translated from: https://medium.com/predict/algorithms-can-leave-anyone-biased-including-you-f39cb6abd127