Learning English with TED Talks: A new way to build AI, openly by Percy Liang

A new way to build AI, openly

Link: https://www.ted.com/talks/percy_liang_a_new_way_to_build_ai_openly?

Speaker: Percy Liang

Date: October 2023

Contents

  • A new way to build AI, openly
    • Introduction
    • Vocabulary
    • Transcript
    • Summary
    • Afterword

Introduction

Today’s AI is trained on the work of artists and writers without attribution, its core values decided by a privileged few. What if the future of AI was more open and democratic? Researcher Percy Liang offers a vision of a transparent, participatory future for emerging technology, one that credits contributors and gives everyone a voice.

Vocabulary

participatory: US [pɑːrˈtɪsəpətɔːri] involving or allowing participation

core value: a fundamental guiding principle

intrigue: US [ɪnˈtriːɡ] to arouse someone's curiosity; to plot or scheme

I was intrigued, I wanted to understand it, I wanted to see how far we could go with this.

enter the mainstream: to become widely accepted; to go mainstream

Language models and more generally, foundation models, have taken off and entered the mainstream.

ensemble: US [ɑːnˈsɑːmbl] a group of musicians or actors who perform together, as in "jazz ensemble"; note the pronunciation

It was like a jazz ensemble where everyone was riffing off of each other, developing the technology that we have today.

not released openly: not made publicly available (here, not open-sourced)

recipe: US [ˈresəpi] a set of cooking instructions; note the pronunciation

And then today, the most advanced foundation models in the world are not released openly. They are instead guarded closely behind black box APIs with little to no information about how they’re built. So it’s like we have these castles which house the world’s most advanced AIs and the secret recipes for creating them.

asymmetry: a lack of symmetry or balance between two sides

stark: obvious; sharply clear

but the resource and information asymmetry is stark.

opacity: US [oʊˈpæsədi] lack of transparency; obscurity

This opacity and centralization of power is concerning.

tenet: US [ˈtenɪt] a principle or belief

The most basic tenet of machine learning is that the training data and the test data have to be independent for evaluation to be meaningful. So if we don’t know what’s in the training data, then that 95 percent number is meaningless.
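
To make this tenet concrete, here is a minimal sketch of what checking train/test independence could look like when the training data is visible. Everything in it is an assumption for illustration: the function names `normalize` and `contamination_rate` are hypothetical, the questions are invented toy data, and exact-match comparison is only the crudest form of overlap detection. None of it comes from the talk.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences do not hide a match."""
    return " ".join(text.lower().split())


def contamination_rate(train_examples: list[str], test_examples: list[str]) -> float:
    """Return the fraction of test examples that also appear in the training data."""
    if not test_examples:
        raise ValueError("test_examples must not be empty")
    seen = {normalize(example) for example in train_examples}
    leaked = sum(1 for example in test_examples if normalize(example) in seen)
    return leaked / len(test_examples)


# Hypothetical toy data, invented for illustration only.
train = [
    "What is a common symptom of anemia?",
    "Which vitamin deficiency causes scurvy?",
    "What organ produces insulin?",
]
test = [
    "which vitamin deficiency causes scurvy?",   # also in train (differs only in case)
    "What organ  produces insulin?",             # also in train (extra space)
    "What is the normal resting heart rate?",    # not in train
    "Which blood type is the universal donor?",  # not in train
]

rate = contamination_rate(train, test)
print(f"{rate:.0%} of the test questions also appear in the training data")
```

In this toy case half of the test questions were already seen during training, so a high accuracy score would say little about how the model handles new questions. With a closed model, this check cannot be run at all, which is the speaker's point.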

we are flying blind.

accountability: being responsible and answerable for one's actions

And with all the enthusiasm to deploying these models in the real world without meaningful evaluation, we are flying blind. And transparency isn’t just about the training data or evaluation. It’s also about environmental impact, labor practices, release processes, risk mitigation strategies. Without transparency, we lose accountability.

affirmative action

Affirmative action (also sometimes called reservations, alternative access, positive discrimination or positive action in various countries’ laws and policies) refers to a set of policies and practices within a government or organization seeking to benefit marginalized groups. Historically and internationally, support for affirmative action has been justified by the idea that it may help with bridging inequalities in employment and pay, increasing access to education, and promoting diversity, social equity and redressing alleged wrongs, harms, or hindrances, also called substantive equality.

subjective, controversial, contested questions

These are highly subjective, controversial, contested questions, and any decision on how to answer them is necessarily value-laden.

without attribution or consent: without crediting the source or obtaining permission

The data here is a result of human labor, and currently this data is being scraped, often without attribution or consent.

status quo: US [ˌsteɪtəs ˈkwoʊ] the existing state of affairs

So how can we change the status quo?

bleak: US [bliːk] dreary; offering little hope

situation seems pretty bleak: things look rather hopeless

With these castles, the situation might seem pretty bleak. But let me try to give you some hope.

encyclopedia: US [ɪnˌsaɪkləˈpiːdiə] a comprehensive reference work; note the pronunciation

against all odds: despite great difficulty or a very low chance of success

But against all odds, Wikipedia prevailed.

hobbyist: US [ˈhɑbiɪst] an amateur enthusiast

peer production: production by self-organizing communities of peers

Peer production (also known as mass collaboration) is a way of producing goods and services that relies on self-organizing communities of individuals. In such communities, the labor of many people is coordinated towards a shared outcome.

embark on: to begin or set out on an undertaking

I feel the same excitement about this vision as I did 19 years ago as that master’s student, embarking on his first NLP research project.

Transcript

I was a young master’s student

about to start my first
NLP research project,

and my task was to train a language model.

Now that language model was a little bit
smaller than the ones we have today.

It was trained on millions
rather than trillions of words.

I used a hidden Markov model
as opposed to a transformer,

but that little language model I trained

did something I thought was amazing.

It took all this raw text

and somehow it organized it into concepts.

A concept for months,

male first names,

words related to the law,

countries and continents and so on.

But no one taught
these concepts to this model.

It discovered them all by itself,
just by analyzing the raw text.

But how?

I was intrigued,
I wanted to understand it,

I wanted to see how far
we could go with this.

So I became an AI researcher.

In the last 19 years,

we have come a long way
as a research community.

Language models and more generally,
foundation models, have taken off

and entered the mainstream.

But, it is important to realize
that all of these achievements

are based on decades of research.

Research on model architectures,

research on optimization algorithms,
training objectives, data sets.

For a while,

we had an incredible free culture,

a culture of open innovation,

a culture where researchers published,

researchers released data sets, code,

so that others can go further.

It was like a jazz ensemble where everyone
was riffing off of each other,

developing the technology
that we have today.

But then in 2020,

things started changing.

Innovation became less open.

And then today, the most advanced
foundation models in the world

are not released openly.

They are instead guarded closely
behind black box APIs

with little to no information
about how they’re built.

So it’s like we have these castles

which house the world’s most advanced AIs

and the secret recipes for creating them.

Meanwhile, the open community
still continues to innovate,

but the resource and information
asymmetry is stark.

This opacity and centralization
of power is concerning.

Let me give you three reasons why.

First, transparency.

With closed foundation models,
we lose the ability to see,

to evaluate, to audit these models

which are going to impact
billions of people.

Say we evaluate a model through an API
on medical question answering

and it gets 95 percent accuracy.

What does that 95 percent mean?

The most basic tenet of machine learning

is that the training data
and the test data

have to be independent
for evaluation to be meaningful.

So if we don’t know
what’s in the training data,

then that 95 percent
number is meaningless.

And with all the enthusiasm
to deploying these models

in the real world
without meaningful evaluation,

we are flying blind.

And transparency isn’t just
about the training data or evaluation.

It’s also about environmental impact,

labor practices, release processes,

risk mitigation strategies.

Without transparency,
we lose accountability.

It’s like not having nutrition labels
on the food you eat,

or not having safety ratings
on the cars you drive.

Fortunately, the food and auto industries
have matured over time,

but AI still has a long way to go.

Second, values.

So model developers like to talk
about aligning foundation models

to human values,
which sounds wonderful.

But whose values
are we talking about here?

If we were just building a model
to answer math questions,

maybe we wouldn’t care,

because as long as the model
produces the right answer,

we would be happy,
just as we’re happy with calculators.

But these models are not calculators.

These models will attempt to answer
any question you throw at it.

Who is the best basketball
player of all time?

Should we build nuclear reactors?

What do you think of affirmative action?

These are highly subjective,
controversial, contested questions,

and any decision on how to answer them
is necessarily value-laden.

And currently, these values
are unilaterally decided

by the rulers of the castles.

So can we imagine
a more democratic process

for determining these values
based on the input from everybody?

So foundation models will be the primary
way that we interact with information.

And so determining these values
and how we set them

will have a sweeping impact

on how we see the world and how we think.

Third, attribution.

So why are these foundation
models so powerful?

It’s because they’re trained
on massive amounts of data.

See, what machine-learning
researchers call data

is what artists call art

or writers call books

or programmers call software.

The data here is a result of human labor,

and currently this data is being scraped,

often without attribution or consent.

So understandably, some people are upset,

filing lawsuits, going on strike.

But this is just an indication
that the incentive system is broken.

And in order to fix it,
we need to center the creators.

We need to figure out
how to compensate them

for the value of the content
they produced,

and how to incentivize them
to continue innovating.

Figuring this out
will be critical to sustaining

the long-term development of AI.

So here we are.

We don’t have transparency
about how the models are being built.

We have to live with fixed values
set by the rulers of the castles,

and we have no means of attributing

the creators who make
foundation models possible.

So how can we change the status quo?

With these castles,

the situation might seem pretty bleak.

But let me try to give you some hope.

In 2001,

Encyclopedia Britannica was a castle.

Wikipedia was an open experiment.

It was a website
where anyone could edit it,

and all the resulting knowledge
would be made freely available

to everyone on the planet.

It was a radical idea.

In fact, it was a ridiculous idea.

But against all odds, Wikipedia prevailed.

In the '90s, Microsoft
Windows was a castle.

Linux was an open experiment.

Anyone could read its source code,
anyone could contribute.

And over the last two decades,

Linux went from being a hobbyist toy

to the dominant operating system
on mobile and in the data center.

So let us not underestimate
the power of open source

and peer production.

These examples show us a different way
that the world could work.

A world in which everyone can participate

and development is transparent.

So how can we do the same for AI?

Let me end with a picture.

The world is filled
with incredible people:

artists, musicians, writers, scientists.

Each person has unique skills,
knowledge and values.

Collectively, this defines
the culture of our civilization.

And the purpose of AI, as I see it,

should be to organize
and augment this culture.

So we need to enable people to create,
to invent, to discover.

And we want everyone to have a voice.

The research community has focused
so much on the technical progress

that is necessary to build these models,

because for so long,
that was the bottleneck.

But now we need to consider
the social context

in which these models are built.

Instead of castles,

let us imagine a more transparent
and participatory process for building AI.

I feel the same excitement
about this vision

as I did 19 years ago
as that master’s student,

embarking on his first
NLP research project.

But realizing this vision will be hard.

It will require innovation.

It will require participation
of researchers, companies, policymakers,

and all of you

to not accept the status quo as inevitable

and demand a more participatory
and transparent future for AI.

Thank you.

(Applause)

Summary

The speaker traces his journey from a young master’s student starting his first NLP research project in 2004 to becoming an AI researcher. He highlights the significant advances the research community has made over the last 19 years, particularly in language models and foundation models. However, he is concerned about the recent shift away from open innovation, with the most advanced models now guarded behind black-box APIs. This shift raises issues of transparency, values, and attribution in AI development.

The speaker emphasizes the importance of transparency in evaluating and auditing models, as well as the need to consider whose values are embedded in these models. He also discusses the lack of attribution and consent in the data used to train these models, calling attention to the broken incentive system in AI development.

To address these challenges, the speaker advocates for a more open and participatory approach to AI development, citing the success of projects like Wikipedia and Linux. He believes that by embracing open source and peer production principles, the AI community can create a more transparent and inclusive future for AI development.

Afterword

Written in Shanghai at 19:17 on April 10, 2024.
