ECCV 2024 | AIGC Paper Roundup (Image Generation, Video Generation, 3D Generation, etc.) with Paper Links and Open-Source Code [Continuously Updated]


Awesome-ECCV2024-AIGC

A Collection of Papers and Codes for ECCV2024 AIGC

A collection of AIGC-related papers and code from ECCV 2024, organized below.

Stars, forks, and PRs are welcome!
Updated first on GitHub: Awesome-ECCV2024-AIGC
Zhihu: https://zhuanlan.zhihu.com/p/706699484

Please credit the source when referencing or reposting.

ECCV 2024 website: https://eccv.ecva.net/

List of accepted ECCV papers:

Full ECCV paper archive:

Conference dates: September 29 to October 4, 2024

Acceptance notification: 2024

【Contents】

  • 1. Image Generation (Image Synthesis)
  • 2. Image Editing
  • 3. Video Generation (Video Synthesis)
  • 4. Video Editing
  • 5. 3D Generation (3D Synthesis)
  • 6. 3D Editing
  • 7. Multi-Modal Large Language Models
  • 8. Others

1. Image Generation (Image Synthesis)

Accelerating Diffusion Sampling with Optimized Time Steps

  • Paper: https://arxiv.org/abs/2402.17376
  • Code: https://github.com/scxue/DM-NonUniform

AnyControl: Create Your Artwork with Versatile Control on Text-to-Image Generation

  • Paper: https://arxiv.org/abs/2406.18958
  • Code: https://github.com/open-mmlab/AnyControl

A Watermark-Conditioned Diffusion Model for IP Protection

  • Paper:
  • Code: https://github.com/rmin2000/WaDiff

BeyondScene: Higher-Resolution Human-Centric Scene Generation With Pretrained Diffusion

  • Paper: https://arxiv.org/abs/2404.04544
  • Code: https://github.com/gwang-kim/BeyondScene

ComFusion: Personalized Subject Generation in Multiple Specific Scenes From Single Image

  • Paper: https://arxiv.org/abs/2402.11849
  • Code:

Data Augmentation for Saliency Prediction via Latent Diffusion

  • Paper:
  • Code: https://github.com/IVRL/AugSal

Defect Spectrum: A Granular Look of Large-Scale Defect Datasets with Rich Semantics

  • Paper: https://arxiv.org/abs/2310.17316
  • Code: https://github.com/EnVision-Research/Defect_Spectrum

DiffFAS: Face Anti-Spoofing via Generative Diffusion Models

  • Paper:
  • Code: https://github.com/murphytju/DiffFAS

DiffiT: Diffusion Vision Transformers for Image Generation

  • Paper: https://arxiv.org/abs/2312.02139
  • Code: https://github.com/NVlabs/DiffiT

Large-scale Reinforcement Learning for Diffusion Models

  • Paper: https://arxiv.org/abs/2401.12244
  • Code: https://github.com/pinterest/atg-research/tree/main/joint-rl-diffusion

MasterWeaver: Taming Editability and Identity for Personalized Text-to-Image Generation

  • Paper: https://arxiv.org/abs/2405.05806
  • Code: https://github.com/csyxwei/MasterWeaver

Memory-Efficient Fine-Tuning for Quantized Diffusion Model

  • Paper:
  • Code: https://github.com/ugonfor/TuneQDM

OMG: Occlusion-friendly Personalized Multi-concept Generation in Diffusion Models

  • Paper: https://arxiv.org/abs/2403.10983
  • Code: https://github.com/kongzhecn/OMG

Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-Experts

  • Paper: https://arxiv.org/abs/2403.09176
  • Code: https://github.com/byeongjun-park/Switch-DiT

2. Image Editing

A Task is Worth One Word: Learning with Task Prompts for High-Quality Versatile Image Inpainting

  • Paper: https://arxiv.org/abs/2312.03594
  • Code: https://github.com/open-mmlab/PowerPaint

BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion

  • Paper: https://arxiv.org/abs/2403.06976
  • Code: https://github.com/TencentARC/BrushNet

FlexiEdit: Frequency-Aware Latent Refinement for Enhanced Non-Rigid Editing

  • Paper:
  • Code: https://github.com/kookie12/FlexiEdit

StableDrag: Stable Dragging for Point-based Image Editing

  • Paper: https://arxiv.org/abs/2403.04437
  • Code:

TinyBeauty: Toward Tiny and High-quality Facial Makeup with Data Amplify Learning

  • Paper: https://arxiv.org/abs/2403.15033
  • Code: https://github.com/TinyBeauty/TinyBeauty

3. Video Generation (Video Synthesis)

Audio-Synchronized Visual Animation

  • Paper: https://arxiv.org/abs/2403.05659
  • Code: https://github.com/lzhangbj/ASVA

Dyadic Interaction Modeling for Social Behavior Generation

  • Paper: https://arxiv.org/abs/2403.09069
  • Code: https://github.com/Boese0601/Dyadic-Interaction-Modeling

EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis

  • Paper: https://arxiv.org/abs/2404.01647
  • Code: https://github.com/tanshuai0219/EDTalk

FreeInit: Bridging Initialization Gap in Video Diffusion Models

  • Paper: https://arxiv.org/abs/2312.07537
  • Code: https://github.com/TianxingWu/FreeInit

MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model

  • Paper: https://arxiv.org/abs/2405.20222
  • Code: https://github.com/MyNiuuu/MOFA-Video

ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video

  • Paper: https://arxiv.org/abs/2310.01324
  • Code:

4. Video Editing

Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation

  • Paper: https://arxiv.org/abs/2403.13745
  • Code: https://github.com/G-U-N/Be-Your-Outpainter

DragAnything: Motion Control for Anything using Entity Representation

  • Paper: https://arxiv.org/abs/2403.07420
  • Code: https://github.com/showlab/DragAnything

5. 3D Generation (3D Synthesis)

EchoScene: Indoor Scene Generation via Information Echo over Scene Graph Diffusion

  • Paper: https://arxiv.org/abs/2405.00915
  • Code: https://github.com/ymxlzgy/echoscene

GenerateCT: Text-Conditional Generation of 3D Chest CT Volumes

  • Paper: https://arxiv.org/abs/2305.16037
  • Code: https://github.com/ibrahimethemhamamci/GenerateCT

GVGEN: Text-to-3D Generation with Volumetric Representation

  • Paper:
  • Code: https://github.com/SOTAMak1r/GVGEN

Motion Mamba: Efficient and Long Sequence Motion Generation with Hierarchical and Bidirectional Selective SSM

  • Paper: https://arxiv.org/abs/2403.07487
  • Code: https://github.com/steve-zeyu-zhang/MotionMamba

ParCo: Part-Coordinating Text-to-Motion Synthesis

  • Paper: https://arxiv.org/abs/2403.18512
  • Code: https://github.com/qrzou/ParCo

Surf-D: Generating High-Quality Surfaces of Arbitrary Topologies Using Diffusion Models

  • Paper: https://arxiv.org/abs/2311.17050
  • Code: https://github.com/Yzmblog/SurfD

6. 3D Editing

Gaussian Grouping: Segment and Edit Anything in 3D Scenes

  • Paper: https://arxiv.org/abs/2312.00732
  • Code: https://github.com/lkeab/gaussian-grouping

SC4D: Sparse-Controlled Video-to-4D Generation and Motion Transfer

  • Paper: https://arxiv.org/abs/2404.03736
  • Code: https://github.com/JarrentWu1031/SC4D

Texture-GS: Disentangling the Geometry and Texture for 3D Gaussian Splatting Editing

  • Paper: https://arxiv.org/abs/2403.10050
  • Code: https://github.com/slothfulxtx/Texture-GS

7. Multi-Modal Large Language Models

An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models

  • Paper: https://arxiv.org/abs/2403.06764
  • Code: https://github.com/pkunlp-icler/FastV

ControlCap: Controllable Region-level Captioning

  • Paper: https://arxiv.org/abs/2401.17910
  • Code: https://github.com/callsys/ControlCap

DriveLM: Driving with Graph Visual Question Answering

  • Paper: https://arxiv.org/abs/2312.14150
  • Code: https://github.com/OpenDriveLab/DriveLM

Elysium: Exploring Object-level Perception in Videos via MLLM

  • Paper: https://arxiv.org/abs/2403.16558
  • Code: https://github.com/Hon-Wong/Elysium

Empowering Multimodal Large Language Model as a Powerful Data Generator

  • Paper:
  • Code: https://github.com/zhaohengyuan1/Genixer

GiT: Towards Generalist Vision Transformer through Universal Language Interface

  • Paper: https://arxiv.org/abs/2403.09394
  • Code: https://github.com/Haiyang-W/GiT

How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs

  • Paper: https://arxiv.org/abs/2311.17600
  • Code: https://github.com/UCSC-VLAA/vllm-safety-benchmark

Long-CLIP: Unlocking the Long-Text Capability of CLIP

  • Paper: https://arxiv.org/abs/2403.15378
  • Code: https://github.com/beichenzbc/Long-CLIP

MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?

  • Paper: https://arxiv.org/abs/2403.14624
  • Code: https://github.com/ZrrSkywalker/MathVerse

Merlin: Empowering Multimodal LLMs with Foresight Minds

  • Paper: https://arxiv.org/abs/2312.00589
  • Code: https://github.com/Ahnsun/merlin

Meta-Prompting for Automating Zero-shot Visual Recognition with LLMs

  • Paper: https://arxiv.org/abs/2403.11755
  • Code: https://github.com/jmiemirza/Meta-Prompting

MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models

  • Paper:
  • Code: https://github.com/isXinLiu/MM-SafetyBench

PointLLM: Empowering Large Language Models to Understand Point Clouds

  • Paper: https://arxiv.org/abs/2308.16911
  • Code: https://github.com/OpenRobotLab/PointLLM

R2-Bench: Benchmarking the Robustness of Referring Perception Models under Perturbations

  • Paper: https://arxiv.org/abs/2403.04924
  • Code: https://github.com/lxa9867/r2bench

SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation

  • Paper:
  • Code: https://github.com/AI-Application-and-Integration-Lab/SAM4MLLM

ShareGPT4V: Improving Large Multi-Modal Models with Better Captions

  • Paper: https://arxiv.org/abs/2311.12793
  • Code: https://github.com/ShareGPT4Omni/ShareGPT4V

ST-LLM: Large Language Models Are Effective Temporal Learners

  • Paper: https://arxiv.org/abs/2404.00308
  • Code: https://github.com/TencentARC/ST-LLM

TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias

  • Paper: https://arxiv.org/abs/2404.00384
  • Code: https://github.com/shjo-april/TTD

UniIR: Training and Benchmarking Universal Multimodal Information Retrievers

  • Paper: https://arxiv.org/abs/2311.17136
  • Code: https://github.com/TIGER-AI-Lab/UniIR

8. Others

Continuously updated~
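Since the list tracks dozens of arXiv links and is updated continuously, pulling the arXiv IDs out of entries programmatically can make maintenance (e.g. deduplication or link checking) easier. A minimal sketch in Python using only the standard library; the function name and sample entry are illustrative, not part of this list:

```python
import re

# arXiv IDs in this list follow the modern YYMM.NNNNN scheme.
ARXIV_RE = re.compile(r"arxiv\.org/abs/(\d{4}\.\d{4,5})")

def extract_arxiv_ids(text):
    """Return all arXiv IDs found in a chunk of the list, in order."""
    return ARXIV_RE.findall(text)

sample = """
Accelerating Diffusion Sampling with Optimized Time Steps
  - Paper: https://arxiv.org/abs/2402.17376
  - Code: https://github.com/scxue/DM-NonUniform
"""
print(extract_arxiv_ids(sample))  # → ['2402.17376']
```

Duplicate IDs reported by such a script would flag copy-paste mistakes between adjacent entries.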


Related Collections

  • Awesome-CVPR2024-AIGC
  • Awesome-AIGC-Research-Groups
  • Awesome-Low-Level-Vision-Research-Groups
  • Awesome-CVPR2024-CVPR2021-CVPR2020-Low-Level-Vision
  • Awesome-ECCV2020-Low-Level-Vision
