跟TED演讲学英文:The next grand challenge for AI by Jim Fan

The next grand challenge for AI

Link: https://www.ted.com/talks/jim_fan_the_next_grand_challenge_for_ai?

Speaker: Jim Fan

Date: October 2023

文章目录

  • The next grand challenge for AI
    • Introduction
    • Vocabulary
    • Transcript
    • Summary
    • 后记

Introduction

Researcher Jim Fan presents the next grand challenge in the quest for AI: the “foundation agent,” which would seamlessly operate across both the virtual and physical worlds. He explains how this technology could fundamentally change our lives — permeating everything from video games and metaverses to drones and humanoid robots — and explores how a single model could master skills across these different realities.

研究员Jim Fan介绍了人工智能探索中的下一个重大挑战:"基础智能体(foundation agent)",它将在虚拟世界和现实世界中无缝运行。他解释了这项技术将如何从根本上改变我们的生活——渗透到从视频游戏、元宇宙到无人机和人形机器人的方方面面——并探讨了单个模型如何掌握跨越这些不同现实的各种技能。

Vocabulary

grand challenge:重大挑战

permeate:美 [ˈpɜːrmieɪt] 渗透,弥漫

in the quest for:在寻求/追求……的过程中

It is still early days in the quest for global power of the Chinese carmaker.

中国汽车制造商对全球影响力的争夺目前仍处于初期阶段.

In the quest for self-expression, composers were imbued with a concern for detail.

为追求自我表现,本时期的作曲家极为注重细节。

Perhaps his greatest joy is in the chance encounters during the quest for the perfect image.

也许,在现实中寻找令他耿耿于心的心像并终于与之邂逅的过程,才是他最大的喜悦。

She does gymnastic exercises four times a week in the quest for achieving the perfect figure.

为练成完美的体型,她每周做四次健身操.

humanoid:英 [ˈhjuːmənɔɪd] 类人的,人形的

humanoid robot:人形机器人

metaverse:元宇宙

adrenaline:美 [əˈdrenəlɪn] 肾上腺素

I still remember the adrenaline of seeing history unfold that day. 我仍然记得那天目睹历史在眼前展开时肾上腺素飙升的感觉。

embodiment:美 [ɪmˈbɑːdimənt] 具体形象;化身;具体表现;体现

It was in Germany alone that his hope seemed capable of embodiment.

似乎只有在德国他的希望才能得到体现。

Kofi is the embodiment of possibility.

Kofi身上体现了一种可能性。

But it is perfect embodiment of modern styles.

是现代时尚风格的完美体现。

A circle was the embodiment of his concept of life.

圈子是他生活理念的具体体现.

terrain:美 [təˈreɪn] 地形,地带

tree of skills: 技能树

It can explore the terrains, mine all kinds of materials, fight monsters, craft hundreds of recipes, and unlock an ever-expanding tree of skills. 它可以探索地形,开采各种材料,与怪物战斗,按数百种配方合成物品,并解锁不断扩展的技能树。

indefinitely: 美 [ɪnˈdefɪnətli] 无限期地

how does Voyager keep exploring indefinitely? Voyager是如何无限期地持续探索下去的?

kinematic:美 [ˌkɪnəˈmætɪk] 运动学的;运动学上的

MetaMorph is able to handle extremely varied kinematic characteristics from different robot bodies. MetaMorph能够处理来自不同机器人形体的极其多样的运动学特征。

envision:美 [ɪnˈvɪʒn] 想象,设想

take a big stride:迈出一大步

The speaker envisions that MetaMorph 2.0 will be able to generalize to robot hands, humanoids, dogs, drones, and even beyond. Compared to Voyager, MetaMorph takes a big stride towards multi-body control. 演讲者设想MetaMorph 2.0将能够泛化到机械手、人形机器人、机器狗、无人机,甚至更多形态。与Voyager相比,MetaMorph向多形体控制迈出了一大步。

uncanny:美 [ʌnˈkæni] 奇怪的;神秘的;怪异的

uncanny valley:恐怖谷

And this car racing scene is where simulation has crossed the uncanny valley. 而这个赛车场景正是模拟跨越恐怖谷的地方。

hardware accelerated ray tracing:硬件加速光线追踪

render extremely complex scenes: 渲染极其复杂的场景

photorealism: 美 [ˌfoʊtoʊˈriːəlɪzəm] 照片级真实感;摄影写实主义;照相写实主义

Thanks to hardware accelerated ray tracing, we're able to render extremely complex scenes with breathtaking levels of details. And this photorealism you see here will help us train computer vision models that will become the eyes of every AI agent. 得益于硬件加速光线追踪,我们能够以惊人的细节水平渲染极其复杂的场景。而你在这里看到的这种照片级真实感,将帮助我们训练计算机视觉模型,这些模型将成为每个AI智能体的眼睛。

be it xxx, or xxx:无论xxx还是xxx

All language tasks can be expressed as text in and text out. Be it writing poetry, translating English to Spanish, or coding Python, it’s all the same. 所有的语言任务都可以表示为文本输入和文本输出。无论是写诗、将英语翻译成西班牙语,还是编写Python代码,都是一样的。

Transcript

In spring of 2016, I was sitting in a classroom at Columbia University but wasn't paying attention to the lecture. Instead, I was watching a board game tournament on my laptop. And it wasn't just any tournament, but a very, very special one. The match was between AlphaGo and Lee Sedol. The AI had just won three out of five games and became the first ever to beat a human champion at a game of Go. I still remember the adrenaline of seeing history unfold that day. The [glorious] moment when AI agents finally entered the mainstream.

But when the excitement fades, I realized that as mighty as AlphaGo was, it could only do one thing and one thing alone. It isn't able to play any other games, like Super Mario or Minecraft, and it certainly cannot do dirty laundry or cook a nice dinner for you tonight.

But what we truly want are AI agents as versatile as Wall-E, as diverse as all the robot body forms or embodiments in Star Wars and works across infinite realities, virtual or physical, as in Ready Player One.

So how can we achieve these science fictions in possibly the near future? This is a practitioner's guide towards generally capable AI agents.

Most of the ongoing research efforts can be laid out nicely across three axes: the number of skills an agent can do; the body forms or embodiments it can control; and the realities it can master. AlphaGo is somewhere here, but the upper right corner is where we need to go. So let's take it one axis at a time.

Earlier this year, I led the Voyager project, which is an agent that scales up massively on a number of skills. And there's no game better than Minecraft for the infinite creative things it supports.

And here's a fun fact for all of you. Minecraft has 140 million active players. And just to put that number in perspective, it's more than twice the population of the UK. And Minecraft is so insanely popular because it's open-ended: it does not have a fixed storyline for you to follow, and you can do whatever your heart desires in the game.

And when we set Voyager free in Minecraft, we see that it's able to play the game for hours on end without any human intervention. The video here shows snippets from a single episode of Voyager where it just keeps going. It can explore the terrains, mine all kinds of materials, fight monsters, craft hundreds of recipes and unlock an ever-expanding tree of skills.

So what's the magic? The core insight is coding as action. First, we convert the 3D world into a textual representation using a Minecraft JavaScript API made by the enthusiastic community. Voyager invokes GPT4 to write code snippets in JavaScript that become executable skills in the game.

Yet, just like human engineers, Voyager makes mistakes. It isn't always able to get a program correct on the first try. So we add a self-reflection mechanism for it to improve. There are three sources of feedback for the self-reflection: the JavaScript code execution error; the agent state, like health and hunger; and a world state, like terrains and enemies nearby. So Voyager takes an action, observes the consequences of its action on the world and on itself, reflects on how it can possibly do better, [tries] out some new action plans and rinse and repeat.

And once the skill becomes mature, Voyager saves it to a skill library as a persistent memory. You can think of the skill library as a code repository written entirely by a language model. And in this way, Voyager is able to bootstrap its own capabilities recursively as it explores and experiments in Minecraft.
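
To make this loop concrete, here is a minimal Python sketch of the propose–execute–reflect cycle described above: an LLM writes a skill as code, the game returns the three kinds of feedback, and a skill that works is saved to the library. The helpers (`llm_generate_skill`, `execute_in_minecraft`) and the feedback fields are illustrative assumptions, not the actual Voyager implementation.

```python
# Minimal sketch of the "coding as action" loop described above.
# llm_generate_skill and execute_in_minecraft are hypothetical helpers,
# standing in for a GPT-4 call and the Minecraft JavaScript API bridge.

def learn_skill(task, skill_library, max_retries=4):
    """Iteratively prompt an LLM to write a skill (as code) until it runs successfully."""
    feedback = None
    for _ in range(max_retries):
        # 1. Ask the LLM to write or revise a code snippet for the task,
        #    conditioned on related skills already in the library and the last feedback.
        code = llm_generate_skill(task,
                                  examples=skill_library.retrieve(task),
                                  feedback=feedback)

        # 2. Execute it in the game and collect the three feedback sources from the talk.
        result = execute_in_minecraft(code)
        feedback = {
            "execution_error": result.error,       # e.g. a code execution stack trace
            "agent_state": result.agent_state,     # e.g. health, hunger, inventory
            "world_state": result.world_state,     # e.g. nearby terrain and enemies
        }

        # 3. Self-reflection: if the task was achieved, the skill is mature -- save it.
        if result.success:
            skill_library.add(task, code)          # persistent memory
            return code
    return None                                    # still failing after max_retries tries
```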

So let's work through an example together. Voyager finds itself hungry and needs to get food as soon as possible. It senses four entities nearby: a cat, a villager, a pig and some wheat seeds. Voyager starts an inner monologue. "Do I kill the cat or villager for food? Horrible idea. How about a wheat seed? I can grow a farm out of the seeds, but that's going to take a long time. So sorry, piggy, you are the chosen one."

(Laughter)

And Voyager finds a piece of iron in its inventory. So it recalls an old skill from the library to craft an iron sword and starts to learn a new skill called "hunt pig." And now we also know that, unfortunately, Voyager isn't vegetarian.

(Laughter)
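
The "recalls an old skill from the library" step is essentially retrieval. The talk does not spell out the mechanism, but a common way to build such a library is to index each LLM-written skill by an embedding of its description and look up the most similar entries for a new task — a toy sketch, assuming some generic `embed` text-embedding function:

```python
import numpy as np

class SkillLibrary:
    """Toy skill library: stores LLM-written code indexed by an embedding of its description.

    `embed` is an assumed text-embedding function (any sentence-embedding model would do);
    it is not part of the talk and is used here only for illustration.
    """

    def __init__(self, embed):
        self.embed = embed
        self.skills = []            # list of (description, code, embedding) tuples

    def add(self, description, code):
        self.skills.append((description, code, self.embed(description)))

    def retrieve(self, task, k=3):
        """Return the k stored skills whose descriptions are most similar to the task."""
        if not self.skills:
            return []
        query = self.embed(task)
        scores = [
            float(np.dot(query, emb) / (np.linalg.norm(query) * np.linalg.norm(emb)))
            for _, _, emb in self.skills
        ]
        top = np.argsort(scores)[::-1][:k]
        return [self.skills[i][:2] for i in top]   # (description, code) pairs
```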

One question still remains: how does Voyager keep exploring indefinitely? We only give it a high-level directive, that is, to obtain as many unique items as possible. And Voyager implements a curriculum to find progressively harder and more novel challenges to solve all by itself.
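
One plausible way to picture this automatic curriculum (a sketch under assumptions, since the talk only gives the high-level directive) is a loop that repeatedly asks the LLM for the next task, conditioned on what has already been completed or failed, so the proposals naturally trend harder and more novel:

```python
# Sketch of the automatic curriculum driven by the single directive
# "obtain as many unique items as possible". llm_propose_task is a hypothetical
# helper; learn_skill is the loop sketched earlier.

def explore_forever(get_agent_state, skill_library):
    completed, failed = [], []
    while True:  # lifelong learning: the loop never terminates on its own
        # Because the prompt includes everything already accomplished (and failed),
        # the proposed tasks naturally become progressively harder and more novel.
        task = llm_propose_task(
            directive="Obtain as many unique items as possible.",
            completed_tasks=completed,
            failed_tasks=failed,
            agent_state=get_agent_state(),
        )
        code = learn_skill(task, skill_library)
        (completed if code else failed).append(task)
```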

And putting all of these together, Voyager is able to not only master but also discover new skills along the way.

And we did not pre-program any of this. It's all Voyager's idea. And this, what you see here, is what we call lifelong learning. When an agent is forever curious and forever pursuing new adventures.

Compared to AlphaGo, Voyager scales up massively on a number of things he can do, but still controls only one body in Minecraft. So the question is: can we have an algorithm that works across many different bodies?

Enter MetaMorph. It is an initiative I co-developed at Stanford. We created a foundation model that can control not just one but thousands of robots with very different arm and leg configurations. MetaMorph is able to handle extremely varied kinematic characteristics from different robot bodies.

And this is the intuition on how we create a MetaMorph. First, we design a special vocabulary to describe the body parts so that every robot body is basically a sentence written in the language of this vocabulary. And then we just apply a transformer to it, much like ChatGPT, but instead of writing out text, MetaMorph writes out motor controls.

We show that MetaMorph is able to control thousands of robots to go upstairs, cross difficult terrains and avoid obstacles.
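
A rough way to picture "every robot body is a sentence" in code: each joint becomes a token of kinematic features plus its current sensor reading, the token sequence goes through a transformer, and one motor command is read out per joint. This is a simplified illustration of the idea, not the MetaMorph architecture; the feature layout and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class MorphologyPolicy(nn.Module):
    """Toy MetaMorph-style policy: tokenize the body, run a transformer, emit per-joint commands.

    Each token packs assumed per-joint features (joint type, limb length, gear ratio, ...)
    together with that joint's current proprioceptive reading.
    """

    def __init__(self, token_dim=32, d_model=128, n_layers=4, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(token_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.motor_head = nn.Linear(d_model, 1)        # one motor command per joint token

    def forward(self, body_tokens):
        # body_tokens: (batch, num_joints, token_dim); num_joints varies per robot,
        # which is what lets a single model control many different morphologies.
        h = self.encoder(self.embed(body_tokens))
        return self.motor_head(h).squeeze(-1)          # (batch, num_joints) motor commands

# Two robots with different numbers of joints share the same weights:
policy = MorphologyPolicy()
quadruped = torch.randn(1, 12, 32)    # e.g. a 12-joint robot "sentence"
humanoid = torch.randn(1, 21, 32)     # e.g. a 21-joint robot "sentence"
print(policy(quadruped).shape, policy(humanoid).shape)  # torch.Size([1, 12]) torch.Size([1, 21])
```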

Extrapolating into the future, if we can greatly expand this robot vocabulary, I envision MetaMorph 2.0 will be able to generalize to robot hands, humanoids, dogs, drones and even beyond. Compared to Voyager, MetaMorph takes a big stride towards multi-body control.

And now, let's take everything one level further and transfer the skills and embodiments across realities. Enter IsaacSim, Nvidia's simulation effort. The biggest strength of IsaacSim is to accelerate physics simulation to 1,000x faster than real time. For example, this character here learns some impressive martial arts by going through ten years of intense training in only three days of simulation time. So it's very much like the virtual sparring dojo in the movie "Matrix."

And this car racing scene is where simulation has crossed the uncanny valley. Thanks to hardware accelerated ray tracing, we're able to render extremely complex scenes with breathtaking levels of details. And this photorealism you see here will help us train computer vision models that will become the eyes of every AI agent.

And what's more, IsaacSim can procedurally generate worlds with infinite variations so that no two look the same.

So here's an interesting idea. If an agent is able to master 10,000 simulations, then it may very well just generalize to our real physical world, which is simply the 10,001st reality. And let that sink in.
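
In practice, "master 10,000 simulations and treat reality as the 10,001st" is the logic of domain randomization: sample each training world's parameters from wide distributions so that the real world looks like just another sample. A minimal, simulator-agnostic sketch of that sampling step (the parameter names are illustrative, not IsaacSim's API):

```python
import random

def sample_world_config(seed=None):
    """Randomly sample one simulated world's parameters (domain randomization sketch).

    The parameter names and ranges are illustrative; a real setup would randomize
    whatever the simulator exposes (lighting, materials, physics, layout, ...).
    """
    rng = random.Random(seed)
    return {
        "gravity": rng.uniform(9.0, 10.6),          # m/s^2, perturbed around Earth's value
        "friction": rng.uniform(0.4, 1.2),
        "light_intensity": rng.uniform(0.2, 2.0),
        "texture_id": rng.randrange(10_000),        # pick one of many generated textures
        "obstacle_count": rng.randint(0, 50),
        "camera_noise_std": rng.uniform(0.0, 0.05),
    }

# Train across many sampled realities; the physical world is then "just the 10,001st".
configs = [sample_world_config(seed=i) for i in range(10_000)]
```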

As we progress through this map, we will eventually get to the upper right corner, which is a single agent that generalizes across all three axes, and that is the "Foundation Agent."

I believe training Foundation Agent will be very similar to ChatGPT. All language tasks can be expressed as text in and text out. Be it writing poetry, translating English to Spanish or coding Python, it's all the same. And ChatGPT simply scales this up massively across lots and lots of data.

It's the same principle. The Foundation Agent takes as input an embodiment prompt and a task prompt and output actions, and we train it by simply scaling it up massively across lots and lots of realities.
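
Read literally, that sentence defines an interface: actions = agent(embodiment prompt, task prompt, observation). A hedged sketch of what that signature could look like (the class and field names here are illustrative, not an existing API):

```python
from dataclasses import dataclass
from typing import Any, Sequence

@dataclass
class EmbodimentPrompt:
    """Describes the body being controlled (cf. MetaMorph's robot 'sentence')."""
    body_tokens: Sequence[Any]        # joints/limbs and their kinematic properties

@dataclass
class TaskPrompt:
    """Describes what to do, as natural language or a structured goal."""
    instruction: str

class FoundationAgent:
    """Illustrative interface only: one model, conditioned on embodiment + task, emitting actions."""

    def act(self, embodiment: EmbodimentPrompt, task: TaskPrompt, observation: Any) -> Sequence[float]:
        # A trained model would map (embodiment, task, observation) -> motor/API actions here.
        raise NotImplementedError

# The same agent, prompted differently, would drive a game character, a drone, or a humanoid:
# agent.act(humanoid_embodiment, TaskPrompt("fold the laundry"), camera_frame)
```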

I believe in a future where everything that moves will eventually be autonomous. And one day we will realize that all the AI agents, across Wall-E, Star Wars, Ready Player One, no matter if they are in the physical or virtual spaces, will all just be different prompts to the same Foundation Agent. And that, my friends, will be the next grand challenge in our quest for AI.

(Applause)

Summary

In spring of 2016, the speaker was sitting in a classroom at Columbia University but wasn’t paying attention to the lecture. Instead, he was watching a board game tournament on his laptop, a very special one between AlphaGo and Lee Sedol. AlphaGo had just made history by beating a human champion at the game of Go, winning three out of five games. The adrenaline of witnessing this historic moment marked the entry of AI agents into the mainstream.

After the excitement faded, the speaker realized that as mighty as AlphaGo was, it could only play Go and nothing else. What we truly want are AI agents as versatile as Wall-E, capable of diverse actions across infinite realities. To achieve this, research efforts are focused on three axes: the number of skills an agent can perform, the body forms or embodiments it can control, and the realities it can master.

Progress is being made along one axis at a time. The Voyager project, which the speaker led, demonstrated an agent's ability to scale up massively in the number of skills it can perform, using Minecraft as a platform for its diverse actions. Voyager's core insight is "coding as action": it converts the 3D world into a textual representation, uses GPT-4 to write executable skills in JavaScript, and employs a self-reflection mechanism for improvement.

MetaMorph, another initiative, aims to control thousands of robots with varied configurations. By designing a special vocabulary to describe robot body parts and applying a transformer model to generate motor controls, MetaMorph demonstrates the potential for multi-body control. IsaacSim, Nvidia’s simulation effort, accelerates physics simulations to enable rapid skill acquisition in virtual environments, bridging the gap between virtual and physical realities.

The ultimate goal is to develop a Foundation Agent that can generalize across all three axes, mastering diverse skills, controlling various bodies, and understanding multiple realities. This agent, trained on massive amounts of data and across numerous realities, represents the next grand challenge in the quest for AI.

后记

2024年4月11日20点19分于上海。
