Stale Diffusion、Drag Your Noise、PhysReaction、CityGaussian

本文首发于公众号:机器感知

Stale Diffusion、Drag Your Noise、PhysReaction、CityGaussian

Drag Your Noise: Interactive Point-based Editing via Diffusion Semantic  Propagation

图片

Point-based interactive editing serves as an essential tool to complement the controllability of existing generative models. A concurrent work, DragDiffusion, updates the diffusion latent map in response to user inputs, causing global latent map alterations. This results in imprecise preservation of the original content and unsuccessful editing due to gradient vanishing. In contrast, we present DragNoise, offering robust and accelerated editing without retracing the latent map. The core rationale of DragNoise lies in utilizing the predicted noise output of each U-Net as a semantic editor. This approach is grounded in two critical observations: firstly, the bottleneck features of U-Net inherently possess semantically rich features ideal for interactive editing; secondly, high-level semantics, established early in the denoising process, show minimal variation in subsequent stages. Leveraging these insights, DragNoise edits diffusion semantics in a single denoising step and effi......

Stale Diffusion: Hyper-realistic 5D Movie Generation Using Old-school  Methods

图片

Two years ago, Stable Diffusion achieved super-human performance at generating images with super-human numbers of fingers. Following the steady decline of its technical novelty, we propose Stale Diffusion, a method that solidifies and ossifies Stable Diffusion in a maximum-entropy state. Stable Diffusion works analogously to a barn (the Stable) from which an infinite set of horses have escaped (the Diffusion). As the horses have long left the barn, our proposal may be seen as antiquated and irrelevant. Nevertheless, we vigorously defend our claim of novelty by identifying as early adopters of the Slow Science Movement, which will produce extremely important pearls of wisdom in the future. Our speed of contributions can also be seen as a quasi-static implementation of the recent call to pause AI experiments, which we wholeheartedly support. As a result of a careful archaeological expedition to 18-months-old Git commit histories, we found that naturally-accumulating errors have......

PhysReaction: Physically Plausible Real-Time Humanoid Reaction Synthesis  via Forward Dynamics Guided 4D Imitation

图片

Humanoid Reaction Synthesis is pivotal for creating highly interactive and empathetic robots that can seamlessly integrate into human environments, enhancing the way we live, work, and communicate. However, it is difficult to learn the diverse interaction patterns of multiple humans and generate physically plausible reactions. The kinematics-based approaches face challenges, including issues like floating feet, sliding, penetration, and other problems that defy physical plausibility. The existing physics-based method often relies on kinematics-based methods to generate reference states, which struggle with the challenges posed by kinematic noise during action execution. Constrained by their reliance on diffusion models, these methods are unable to achieve real-time inference. In this work, we propose a Forward Dynamics Guided 4D Imitation method to generate physically plausible human-like reactions. The learned policy is capable of generating physically plausible and human-li......

HairFastGAN: Realistic and Robust Hair Transfer with a Fast  Encoder-Based Approach

图片

Our paper addresses the complex task of transferring a hairstyle from a reference image to an input photo for virtual hair try-on. This task is challenging due to the need to adapt to various photo poses, the sensitivity of hairstyles, and the lack of objective metrics. The current state of the art hairstyle transfer methods use an optimization process for different parts of the approach, making them inexcusably slow. At the same time, faster encoder-based models are of very low quality because they either operate in StyleGAN's W+ space or use other low-dimensional image generators. Additionally, both approaches have a problem with hairstyle transfer when the source pose is very different from the target pose, because they either don't consider the pose at all or deal with it inefficiently. In our paper, we present the HairFast model, which uniquely solves these problems and achieves high resolution, near real-time performance, and superior reconstruction compared to optimiza......

CityGaussian: Real-time High-quality Large-Scale Scene Rendering with  Gaussians

图片

The advancement of real-time 3D scene reconstruction and novel view synthesis has been significantly propelled by 3D Gaussian Splatting (3DGS). However, effectively training large-scale 3DGS and rendering it in real-time across various scales remains challenging. This paper introduces CityGaussian (CityGS), which employs a novel divide-and-conquer training approach and Level-of-Detail (LoD) strategy for efficient large-scale 3DGS training and rendering. Specifically, the global scene prior and adaptive training data selection enables efficient training and seamless fusion. Based on fused Gaussian primitives, we generate different detail levels through compression, and realize fast rendering across various scales through the proposed block-wise detail levels selection and aggregation strategy. Extensive experimental results on large-scale scenes demonstrate that our approach attains state-of-theart rendering quality, enabling consistent real-time rendering of largescale scenes......

Condition-Aware Neural Network for Controlled Image Generation

图片

We present Condition-Aware Neural Network (CAN), a new method for adding control to image generative models. In parallel to prior conditional control methods, CAN controls the image generation process by dynamically manipulating the weight of the neural network. This is achieved by introducing a condition-aware weight generation module that generates conditional weight for convolution/linear layers based on the input condition. We test CAN on class-conditional image generation on ImageNet and text-to-image generation on COCO. CAN consistently delivers significant improvements for diffusion transformer models, including DiT and UViT. In particular, CAN combined with EfficientViT (CaT) achieves 2.78 FID on ImageNet 512x512, surpassing DiT-XL/2 while requiring 52x fewer MACs per sampling step. ......

Uncovering the Text Embedding in Text-to-Image Diffusion Models

图片

The correspondence between input text and the generated image exhibits opacity, wherein minor textual modifications can induce substantial deviations in the generated image. While, text embedding, as the pivotal intermediary between text and images, remains relatively underexplored. In this paper, we address this research gap by delving into the text embedding space, unleashing its capacity for controllable image editing and explicable semantic direction attributes within a learning-free framework. Specifically, we identify two critical insights regarding the importance of per-word embedding and their contextual correlations within text embedding, providing instructive principles for learning-free image editing. Additionally, we find that text embedding inherently possesses diverse semantic potentials, and further reveal this property through the lens of singular value decomposition (SVD). These uncovered properties offer practical utility for image editing and semantic disco......

Getting it Right: Improving Spatial Consistency in Text-to-Image Models

图片

One of the key shortcomings in current text-to-image (T2I) models is their inability to consistently generate images which faithfully follow the spatial relationships specified in the text prompt. In this paper, we offer a comprehensive investigation of this limitation, while also developing datasets and methods that achieve state-of-the-art performance. First, we find that current vision-language datasets do not represent spatial relationships well enough; to alleviate this bottleneck, we create SPRIGHT, the first spatially-focused, large scale dataset, by re-captioning 6 million images from 4 widely used vision datasets. Through a 3-fold evaluation and analysis pipeline, we find that SPRIGHT largely improves upon existing datasets in capturing spatial relationships. To demonstrate its efficacy, we leverage only ~0.25% of SPRIGHT and achieve a 22% improvement in generating spatially accurate images while also improving the FID and CMMD scores. Secondly, we find that training......

Video Interpolation with Diffusion Models

图片

We present VIDIM, a generative model for video interpolation, which creates short videos given a start and end frame. In order to achieve high fidelity and generate motions unseen in the input data, VIDIM uses cascaded diffusion models to first generate the target video at low resolution, and then generate the high-resolution video conditioned on the low-resolution generated video. We compare VIDIM to previous state-of-the-art methods on video interpolation, and demonstrate how such works fail in most settings where the underlying motion is complex, nonlinear, or ambiguous while VIDIM can easily handle such cases. We additionally demonstrate how classifier-free guidance on the start and end frame and conditioning the super-resolution model on the original high-resolution frames without additional parameters unlocks high-fidelity results. VIDIM is fast to sample from as it jointly denoises all the frames to be generated, requires less than a billion parameters per diffusion mo......

StructLDM: Structured Latent Diffusion for 3D Human Generation

图片

Recent 3D human generative models have achieved remarkable progress by learning 3D-aware GANs from 2D images. However, existing 3D human generative methods model humans in a compact 1D latent space, ignoring the articulated structure and semantics of human body topology. In this paper, we explore more expressive and higher-dimensional latent space for 3D human modeling and propose StructLDM, a diffusion-based unconditional 3D human generative model, which is learned from 2D images. StructLDM solves the challenges imposed due to the high-dimensional growth of latent space with three key designs: 1) A semantic structured latent space defined on the dense surface manifold of a statistical human body template. 2) A structured 3D-aware auto-decoder that factorizes the global latent space into several semantic body parts parameterized by a set of conditional structured local NeRFs anchored to the body template, which embeds the properties learned from the 2D training data and can b......

A Unified and Interpretable Emotion Representation and Expression  Generation

图片

Canonical emotions, such as happy, sad, and fearful, are easy to understand and annotate. However, emotions are often compound, e.g. happily surprised, and can be mapped to the action units (AUs) used for expressing emotions, and trivially to the canonical ones. Intuitively, emotions are continuous as represented by the arousal-valence (AV) model. An interpretable unification of these four modalities - namely, Canonical, Compound, AUs, and AV - is highly desirable, for a better representation and understanding of emotions. However, such unification remains to be unknown in the current literature. In this work, we propose an interpretable and unified emotion model, referred as C2A2. We also develop a method that leverages labels of the non-unified models to annotate the novel unified one. Finally, we modify the text-conditional diffusion models to understand continuous numbers, which are then used to generate continuous expressions using our unified emotion model. Through quan......

Large Motion Model for Unified Multi-Modal Motion Generation

图片

Human motion generation, a cornerstone technique in animation and video production, has widespread applications in various tasks like text-to-motion and music-to-dance. Previous works focus on developing specialist models tailored for each task without scalability. In this work, we present Large Motion Model (LMM), a motion-centric, multi-modal framework that unifies mainstream motion generation tasks into a generalist model. A unified motion model is appealing since it can leverage a wide range of motion data to achieve broad generalization beyond a single task. However, it is also challenging due to the heterogeneous nature of substantially different motion data and tasks. LMM tackles these challenges from three principled aspects: 1) Data: We consolidate datasets with different modalities, formats and tasks into a comprehensive yet unified motion generation dataset, MotionVerse, comprising 10 tasks, 16 datasets, a total of 320k sequences, and 100 million frames. 2) Archite......

CosmicMan: A Text-to-Image Foundation Model for Humans

图片

We present CosmicMan, a text-to-image foundation model specialized for generating high-fidelity human images. Unlike current general-purpose foundation models that are stuck in the dilemma of inferior quality and text-image misalignment for humans, CosmicMan enables generating photo-realistic human images with meticulous appearance, reasonable structure, and precise text-image alignment with detailed dense descriptions. At the heart of CosmicMan's success are the new reflections and perspectives on data and models: (1) We found that data quality and a scalable data production flow are essential for the final results from trained models. Hence, we propose a new data production paradigm, Annotate Anyone, which serves as a perpetual data flywheel to produce high-quality data with accurate yet cost-effective annotations over time. Based on this, we constructed a large-scale dataset, CosmicMan-HQ 1.0, with 6 Million high-quality real-world human images in a mean resolution of 1488......

MagicMirror: Fast and High-Quality Avatar Generation with a Constrained  Search Space

图片

We introduce a novel framework for 3D human avatar generation and personalization, leveraging text prompts to enhance user engagement and customization. Central to our approach are key innovations aimed at overcoming the challenges in photo-realistic avatar synthesis. Firstly, we utilize a conditional Neural Radiance Fields (NeRF) model, trained on a large-scale unannotated multi-view dataset, to create a versatile initial solution space that accelerates and diversifies avatar generation. Secondly, we develop a geometric prior, leveraging the capabilities of Text-to-Image Diffusion Models, to ensure superior view invariance and enable direct optimization of avatar geometry. These foundational ideas are complemented by our optimization pipeline built on Variational Score Distillation (VSD), which mitigates texture loss and over-saturation issues. As supported by our extensive experiments, these strategies collectively enable the creation of custom avatars with unparalleled vis......

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.mzph.cn/news/796275.shtml

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

vite打包失败 - out of memory

在做项目时&#xff0c;随着需求的不断增加&#xff0c;我们的代码文件会越来越大&#xff0c;但是在打包时&#xff0c;在 Node 中通过 JavaScript 使用内存的大小却是有限制的。于是&#xff0c;今天打算部署代码时&#xff0c;报错了: <--- JS stacktrace ---> JS st…

Nuxt 3 项目中配置 Tailwind CSS

官方文档&#xff1a;https://www.tailwindcss.cn/docs/guides/nuxtjs#standard 安装 Tailwind CSS 及其相关依赖 执行如下命令&#xff0c;在 Nuxt 项目中安装 Tailwind CSS 及其相关依赖 npm install -D tailwindcss postcss autoprefixerpnpm install -D tailwindcss post…

VUE必知必会

一、简介 Vue.js是一个流行的JavaScript框架&#xff0c;用于构建用户界面和单页应用程序&#xff08;SPA&#xff09;。自2014年由前Google工程师尤雨溪发布以来&#xff0c;Vue迅速获得了广泛的关注和使用&#xff0c;特别是在前端开发领域。 核心特性 响应式数据绑定&#…

【cpp】快速排序优化

标题&#xff1a;【cpp】快速排序 水墨不写bug 正文开始&#xff1a; 快速排序的局限性&#xff1a; 虽然快速排序是一种高效的排序算法&#xff0c;但也存在一些局限性&#xff1a; 最坏情况下的时间复杂度&#xff1a;如果选择的基准元素不合适&#xff0c;或者数组中存在大…

vue基本写法

<p style"margin-left:.0001pt;text-align:justify;">Vue.js 是一种流行的 JavaScript 框架&#xff0c;用于构建用户界面。下面是 Vue.js 的一些标准写法和最佳实践:</p> 1. Vue 实例&#xff1a; 创建 Vue 实例时&#xff0c;可以指定一些选项来定义应…

Netty 3 - 组件和设计

这里将回顾我们之前章节讲到过的主要概念和组件。 1 Channel 、EventLoop和ChannelFuture Channel —— Socket;EventLoop —— 控制流、多线程处理、并发;ChannelFuture —— 异步通知。 1.1 Channel 接口 基本的I/O操作&#xff08;bind()、connect()、read()和write()&a…

【嵌入式开发 Linux 常用命令系列 4.3 -- git add 不 add untracked file】

请阅读【嵌入式开发学习必备专栏 】 文章目录 git add 不add untracked file git add 不add untracked file 如果你想要Git在执行git add .时不添加未跟踪的文件&#xff08;untracked files&#xff09;&#xff0c;你可以使用以下命令&#xff1a; git add -u这个命令只会加…

boost共享内存使用(3)managed_shared_memory共享内存分配器

文章目录 概述使用示例 概述 Boost.Interprocess提供了一些基本的类来创建共享内存对象和文件映射&#xff0c;并将这些可映射的类映射到进程的地址空间中。 然而&#xff0c;管理这些内存段对于非平凡的任务来说并不容易。一个映射区域是一个固定长度的内存缓冲区&#xff0…

免注册,ChatGPT可即时访问了!

AI又有啥进展&#xff1f;一起看看吧 Apple进军个人家用机器人 Apple在放弃自动驾驶汽车项目并推出混合现实头显后&#xff0c;正在进军个人机器人领域&#xff0c;处于开发家用环境机器人的早期阶段 报告中提到了两种可能的机器人设计。一种是移动机器人&#xff0c;可以跟…

鸿蒙OS元服务开发:【(Stage模型)学习窗口沉浸式能力】

一、体验窗口沉浸式能力说明 在看视频、玩游戏等场景下&#xff0c;用户往往希望隐藏状态栏、导航栏等不必要的系统窗口&#xff0c;从而获得更佳的沉浸式体验。此时可以借助窗口沉浸式能力&#xff08;窗口沉浸式能力都是针对应用主窗口而言的&#xff09;&#xff0c;达到预…

二叉堆解读

在数据结构和算法中&#xff0c;二叉堆是一种非常重要的数据结构&#xff0c;它被广泛用于实现优先队列、堆排序等场景。本文将介绍二叉堆的基本概念、性质、操作以及应用场景。 一、基本概念 二叉堆是一种特殊的完全二叉树&#xff0c;它满足堆性质&#xff1a;对于每个节点…

练习题(2024/4/3)

1题目描述&#xff1a; 给定两个大小分别为 m 和 n 的正序&#xff08;从小到大&#xff09;数组 nums1 和 nums2。请你找出并返回这两个正序数组的 中位数 。 算法的时间复杂度应该为 O(log (mn)) 。 示例 1&#xff1a; 输入&#xff1a;nums1 [1,3], nums2 [2] 输出&…

Redis Hash结构操作

基础篇Redis 6.4 Hash结构操作 在基础篇的最后&#xff0c;咱们对Hash结构操作一下&#xff0c;收一个小尾巴&#xff0c;这个代码咱们就不再解释啦 马上就开始新的篇章~~~进入到我们的Redis实战篇 SpringBootTest class RedisStringTests {Autowiredprivate StringRedisTe…

电子商务平台中大数据的应用|主流电商平台大数据采集API接口

(一)电商平台物流管理中大数据的应用 电商平台订单详情订单列表物流信息API接口应用 电子商务企业对射频识别设备、条形码扫描设备、全球定位系统及销售网站、交通、库存等管理软件数据进行实时或近实时的分析研究,提高物流速度和准确性。部分电商平台已建立高效的物流配送网…

什么是Java中的分布式系统?举例说明

在Java中&#xff0c;分布式系统是由一组通过网络进行通信、为了完成共同的任务而协调工作的计算机节点组成的系统。这种系统架构的目的是利用更多的机器处理更多的数据&#xff0c;从而解决单个计算机无法应对的计算、存储任务。 当单个节点的处理能力无法满足日益增长的计算…

【STL】vector的底层原理及其实现

vector的介绍 vector是一个可变的数组序列容器。 1.vector的底层实际上就是一个数组。因此vector也可以采用连续存储空间来存储元素。也可以直接用下标访问vector的元素。我们完全可以把它就当成一个自定义类型的数组使用。 2.除了可以直接用下标访问元素&#xff0c;vector还…

掌握数据相关性新利器:基于R、Python的Copula变量相关性分析及AI大模型应用探索

在工程、水文和金融等各学科的研究中&#xff0c;总是会遇到很多变量&#xff0c;研究这些相互纠缠的变量间的相关关系是各学科的研究的重点。虽然皮尔逊相关、秩相关等相关系数提供了变量间相关关系的粗略结果&#xff0c;但这些系数都存在着无法克服的困难。例如&#xff0c;…

使用预训练的bert large model实现问答系统源码(本地实现 question answer system)

pre-trained bert model 预训练好的Bert模型 本地实现问答系统 用这条命令将bert下载到本地&#xff1a; model.save_pretrained("path/to/model") 具体代码 如下链接&#xff1a; https://download.csdn.net/download/qqqweiweiqq/89092005

解决win7作为虚拟机无法复制粘贴共享文件的问题

win7作为虚拟机经常会出现无法与主机的剪切板共享、文件共享。 归根结底是win7虚拟机里面没有安装VMware Tools 能够成功安装vmware tools的条件&#xff1a; 1&#xff09;win7版本为win7 sp1及以上 2&#xff09;安装KB4490628&#xff0c;KB4474419补丁 因此下面来详细介绍…

【LeetCode题解】2192. 有向无环图中一个节点的所有祖先+1026. 节点与其祖先之间的最大差值

文章目录 [2192. 有向无环图中一个节点的所有祖先](https://leetcode.cn/problems/all-ancestors-of-a-node-in-a-directed-acyclic-graph/)思路&#xff1a;BFS记忆化搜索代码&#xff1a; 思路&#xff1a;逆向DFS代码&#xff1a; [1026. 节点与其祖先之间的最大差值](https…