arXiv Academic Digest Notes, 12.8

Contents

  • 1. GSGFormer: Generative Social Graph Transformer for Multimodal Pedestrian Trajectory Prediction
  • 2. AnimateZero: Video Diffusion Models are Zero-Shot Image Animators
  • 3. Camera Height Doesn't Change: Unsupervised Monocular Scale-Aware Road-Scene Depth Estimation
  • 4. On the Robustness of Large Multimodal Models Against Image Adversarial Attacks
  • 5. Towards Knowledge-driven Autonomous Driving
  • 6. Receding Horizon Re-ordering of Multi-Agent Execution Schedules
  • References

1. GSGFormer: Generative Social Graph Transformer for Multimodal Pedestrian Trajectory Prediction

Title: GSGFormer: Generative Social Graph Transformer for Multimodal Pedestrian Trajectory Prediction
Link: https://arxiv.org/abs/2312.04479
Authors: Zhongchang Luo, Marion Robin, Pavan Vasishta
Abstract: Pedestrian trajectory prediction, vital for self-driving cars and socially-aware robots, is complicated due to intricate interactions between pedestrians, their environment, and other Vulnerable Road Users. This paper presents GSGFormer, an innovative generative model adept at predicting pedestrian trajectories by considering these complex interactions and offering a plethora of potential modal behaviors. We incorporate a heterogeneous graph neural network to capture interactions between pedestrians, semantic maps, and potential destinations. The Transformer module extracts temporal features, while our novel CVAE-Residual-GMM module promotes diverse behavioral modality generation. Through evaluations on multiple public datasets, GSGFormer not only outperforms leading methods with ample data but also remains competitive when data is limited.

2. AnimateZero: Video Diffusion Models are Zero-Shot Image Animators

Title: AnimateZero: Video Diffusion Models are Zero-Shot Image Animators
Link: https://arxiv.org/abs/2312.03793
Authors: Jiwen Yu, Xiaodong Cun, Chenyang Qi, Yong Zhang, Xintao Wang, Ying Shan, Jian Zhang
Comments: Project Page: this https URL
Abstract: Large-scale text-to-video (T2V) diffusion models have made great progress in recent years in terms of visual quality, motion and temporal consistency. However, the generation process is still a black box, where all attributes (e.g., appearance, motion) are learned and generated jointly without precise control ability other than rough text descriptions. Inspired by image animation which decouples the video as one specific appearance with the corresponding motion, we propose AnimateZero to unveil the pre-trained text-to-video diffusion model, i.e., AnimateDiff, and provide more precise appearance and motion control abilities for it. For appearance control, we borrow intermediate latents and their features from the text-to-image (T2I) generation for ensuring the generated first frame is equal to the given generated image. For temporal control, we replace the global temporal attention of the original T2V model with our proposed positional-corrected window attention to ensure other frames align with the first frame well. Empowered by the proposed methods, AnimateZero can successfully control the generating progress without further training. As a zero-shot image animator for given images, AnimateZero also enables multiple new applications, including interactive video generation and real image animation. The detailed experiments demonstrate the effectiveness of the proposed method in both T2V and related applications.

3. Camera Height Doesn’t Change: Unsupervised Monocular Scale-Aware Road-Scene Depth Estimation

Title: Camera Height Doesn’t Change: Unsupervised Monocular Scale-Aware Road-Scene Depth Estimation
Link: https://arxiv.org/abs/2312.04530
Authors: Genki Kinoshita, Ko Nishino
Abstract: Monocular depth estimators either require explicit scale supervision through auxiliary sensors or suffer from scale ambiguity, which renders them difficult to deploy in downstream applications. A possible source of scale is the sizes of objects found in the scene, but inaccurate localization makes them difficult to exploit. In this paper, we introduce a novel scale-aware monocular depth estimation method called StableCamH that does not require any auxiliary sensor or supervision. The key idea is to exploit prior knowledge of object heights in the scene but aggregate the height cues into a single invariant measure common to all frames in a road video sequence, namely the camera height. By formulating monocular depth estimation as camera height optimization, we achieve robust and accurate unsupervised end-to-end training. To realize StableCamH, we devise a novel learning-based size prior that can directly convert car appearance into its dimensions. Extensive experiments on KITTI and Cityscapes show the effectiveness of StableCamH, its state-of-the-art accuracy compared with related methods, and its generalizability. The training framework of StableCamH can be used for any monocular depth estimation method and will hopefully become a fundamental building block for further work.
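The idea of folding many noisy object-height cues into one camera-height value can be illustrated with a small sketch. This is not the StableCamH training procedure, which the abstract does not spell out; it only assumes that each detected object gives a pair (prior height in metres, height measured in the up-to-scale reconstruction) and that a median is an acceptable aggregate. The function names `metric_camera_height` and `rescale_depth` and all numbers are hypothetical.

```python
from statistics import median


def metric_camera_height(unscaled_cam_height, object_observations):
    """Estimate the camera height in metres from object-height cues.

    unscaled_cam_height: camera height measured in the up-to-scale
        reconstruction (arbitrary units).
    object_observations: iterable of (prior_height_m, unscaled_height)
        pairs, one per detected object (hypothetical input format).
    """
    # Each object votes for a metres-per-unit scale factor.
    scales = [prior / measured
              for prior, measured in object_observations
              if measured > 0]
    if not scales:
        raise ValueError("no usable object observations")
    # The median keeps a few badly localized objects from dominating.
    return unscaled_cam_height * median(scales)


def rescale_depth(depth_map, unscaled_cam_height, metric_cam_height):
    """Convert an up-to-scale depth map (list of rows) to metres."""
    scale = metric_cam_height / unscaled_cam_height
    return [[d * scale for d in row] for row in depth_map]


# Illustrative values only: three cars whose prior heights are about
# twice their heights in the unscaled reconstruction.
observations = [(1.5, 0.75), (1.6, 0.8), (1.4, 0.7)]
cam_h = metric_camera_height(0.8, observations)
metric_depth = rescale_depth([[5.0, 10.0]], 0.8, cam_h)
```

Because the camera height is shared by every frame of a sequence, a single aggregated value constrains the scale of all frames at once, which is the property the abstract highlights.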

4. On the Robustness of Large Multimodal Models Against Image Adversarial Attacks

Title: On the Robustness of Large Multimodal Models Against Image Adversarial Attacks
Link: https://arxiv.org/abs/2312.03777
Authors: Xuanming Cui, Alejandro Aparcedo, Young Kyun Jang, Ser-Nam Lim
Abstract: Recent advances in instruction tuning have led to the development of State-of-the-Art Large Multimodal Models (LMMs). Given the novelty of these models, the impact of visual adversarial attacks on LMMs has not been thoroughly examined. We conduct a comprehensive study of the robustness of various LMMs against different adversarial attacks, evaluated across tasks including image classification, image captioning, and Visual Question Answering (VQA). We find that in general LMMs are not robust to visual adversarial inputs. However, our findings suggest that context provided to the model via prompts, such as questions in a QA pair, helps to mitigate the effects of visual adversarial inputs. Notably, the LMMs evaluated demonstrated remarkable resilience to such attacks on the ScienceQA task with only an 8.10% drop in performance compared to their visual counterparts which dropped 99.73%. We also propose a new approach to real-world image classification which we term query decomposition. By incorporating existence queries into our input prompt we observe diminished attack effectiveness and improvements in image classification accuracy. This research highlights a previously under-explored facet of LMM robustness and sets the stage for future work aimed at strengthening the resilience of multimodal systems in adversarial environments.

5. Towards Knowledge-driven Autonomous Driving

Title: Towards Knowledge-driven Autonomous Driving
Link: https://arxiv.org/abs/2312.04316
Authors: Xin Li, Yeqi Bai, Pinlong Cai, Licheng Wen, Daocheng Fu, Bo Zhang, Xuemeng Yang, Xinyu Cai, Tao Ma, Jianfei Guo, Xing Gao, Min Dou, Botian Shi, Yong Liu, Liang He, Yu Qiao
Abstract: This paper explores the emerging knowledge-driven autonomous driving technologies. Our investigation highlights the limitations of current autonomous driving systems, in particular their sensitivity to data bias, difficulty in handling long-tail scenarios, and lack of interpretability. Conversely, knowledge-driven methods with the abilities of cognition, generalization and life-long learning emerge as a promising way to overcome these challenges. This paper delves into the essence of knowledge-driven autonomous driving and examines its core components: dataset & benchmark, environment, and driver agent. By leveraging large language models, world models, neural rendering, and other advanced artificial intelligence techniques, these components collectively contribute to a more holistic, adaptive, and intelligent autonomous driving system. The paper systematically organizes and reviews previous research efforts in this area, and provides insights and guidance for future research and practical applications of autonomous driving. We will continually share the latest updates on cutting-edge developments in knowledge-driven autonomous driving along with the relevant valuable open-source resources at: https://github.com/PJLab-ADG/awesome-knowledge-driven-AD.

6. Receding Horizon Re-ordering of Multi-Agent Execution Schedules

Title: Receding Horizon Re-ordering of Multi-Agent Execution Schedules
Link: https://arxiv.org/abs/2312.04190
Authors: Alexander Berndt, Niels van Duijkeren, Luigi Palmieri, Alexander Kleiner, Tamás Keviczky
Comments: IEEE Transactions on Robotics (T-RO) preprint, 17 pages, 32 figures
Abstract: The trajectory planning for a fleet of Automated Guided Vehicles (AGVs) on a roadmap is commonly referred to as the Multi-Agent Path Finding (MAPF) problem, the solution to which dictates each AGV’s spatial and temporal location until it reaches its goal without collision. When executing MAPF plans in dynamic workspaces, AGVs can be frequently delayed, e.g., due to encounters with humans or third-party vehicles. If the remainder of the AGVs keeps following their individual plans, synchrony of the fleet is lost and some AGVs may pass through roadmap intersections in a different order than originally planned. Although this could reduce the cumulative route completion time of the AGVs, generally, a change in the original ordering can cause conflicts such as deadlocks. In practice, synchrony is therefore often enforced by using a MAPF execution policy employing, e.g., an Action Dependency Graph (ADG) to maintain ordering. To safely re-order without introducing deadlocks, we present the concept of the Switchable Action Dependency Graph (SADG). Using the SADG, we formulate a comparatively low-dimensional Mixed-Integer Linear Program (MILP) that repeatedly re-orders AGVs in a recursively feasible manner, thus maintaining deadlock-free guarantees, while dynamically minimizing the cumulative route completion time of all AGVs. Various simulations validate the efficiency of our approach when compared to the original ADG method as well as robust MAPF solution approaches.
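How a dependency graph preserves the planned crossing order can be shown with a toy sketch. It is not the paper's ADG or SADG formulation, and it does not include the MILP re-ordering step; it only assumes that each action has a set of prerequisite actions and may start once all of them are complete. The action labels, the `ready_actions` helper, and the two-AGV scenario are hypothetical.

```python
def ready_actions(dependencies, completed):
    """Return, in sorted order, the actions that may start now.

    dependencies: dict mapping each action to the set of actions that
        must finish before it may begin.
    completed: set of actions that have already finished.
    """
    return sorted(
        action
        for action, prerequisites in dependencies.items()
        if action not in completed and prerequisites <= completed
    )


# Hypothetical plan: both AGVs cross intersection X, and the plan
# orders agv1 through X before agv2.
dependencies = {
    "agv1:A->X": set(),
    "agv1:X->B": {"agv1:A->X"},    # same-agent ordering
    "agv2:C->X": {"agv1:X->B"},    # agv2 waits until agv1 has left X
    "agv2:X->D": {"agv2:C->X"},
}

completed = set()
while len(completed) < len(dependencies):
    step = ready_actions(dependencies, completed)
    print(step)
    completed.update(step)
```

In this toy plan, any delay of agv1 also holds agv2 at C, even when agv2 could safely go first. That waiting cost is what motivates re-ordering, and the abstract's SADG is presented as a way to switch such dependencies without introducing deadlocks.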

References

  • Computer Vision and Pattern Recognition Academic Digest [12.8]
