PTQ4SAM, Mamba-Attention, AniTalker, IceFormer, U-DiTs, CogDPM

This article was first published on the WeChat official account "机器感知" (Machine Perception).


PTQ4SAM: Post-Training Quantization for Segment Anything

Segment Anything Model (SAM) has achieved impressive performance in many computer vision tasks. However, as a large-scale model, the immense memory and computation costs hinder its practical deployment. In this paper, we propose a post-training quantization (PTQ) framework for Segment Anything Model, namely PTQ4SAM. First, we investigate the inherent bottleneck of SAM quantization attributed to the bimodal distribution in post-Key-Linear activations. We analyze its characteristics from both per-tensor and per-channel perspectives, and propose a Bimodal Integration strategy, which utilizes a mathematically equivalent sign operation to transform the bimodal distribution offline into a normal distribution that is relatively easy to quantize. Second, SAM encompasses diverse attention mechanisms (i.e., self-attention and two-way cross-attention), resulting in substantial variations in the post-Softmax distributions. Therefore, we introduce an Adaptive Granularity Quantization for Softmax th......
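
The bimodal-integration idea can be illustrated with a small sketch (PyTorch; the function and variable names are ours, not from the paper's code, and this shows only the general principle rather than PTQ4SAM's exact procedure): if a key channel has two symmetric peaks, flipping its sign in both the query and the key leaves the attention logits unchanged while folding that channel into a single mode that is easier to quantize.

```python
import torch

def fold_bimodal_channels(q, k, flip_mask):
    """Illustrative sign folding: flip selected channels in BOTH q and k.

    q, k      : (tokens, channels) activations after the query/key linears.
    flip_mask : bool (channels,) marking key channels whose activations show
                a symmetric bimodal (two-peak) distribution.

    Since (q @ k.T)[i, j] sums q[i, c] * k[j, c] over channels c, flipping
    the sign of a channel in both tensors leaves the product unchanged, so
    the transform is mathematically equivalent and can be applied offline.
    """
    sign = torch.ones(q.shape[-1])
    sign[flip_mask] = -1.0
    return q * sign, k * sign

# toy equivalence check
q, k = torch.randn(4, 8), torch.randn(4, 8)
mask = torch.rand(8) > 0.5
q2, k2 = fold_bimodal_channels(q, k, mask)
assert torch.allclose(q @ k.T, q2 @ k2.T, atol=1e-5)
```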

AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding

The paper introduces AniTalker, an innovative framework designed to generate lifelike talking faces from a single portrait. Unlike existing models that primarily focus on verbal cues such as lip synchronization and fail to capture the complex dynamics of facial expressions and nonverbal cues, AniTalker employs a universal motion representation. This representation effectively captures a wide range of facial dynamics, including subtle expressions and head movements. AniTalker enhances motion depiction through two self-supervised learning strategies: the first involves reconstructing target video frames from source frames within the same identity to learn subtle motion representations, and the second develops an identity encoder using metric learning while actively minimizing mutual information between the identity and motion encoders. This approach ensures that the motion representation is dynamic and devoid of identity-specific details, significantly reducing the n......
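
A rough sketch of the two self-supervised objectives described above (PyTorch; `motion_enc`, `id_enc`, `decoder`, and the loss weights are hypothetical placeholders, and a simple cross-covariance penalty stands in for the paper's mutual-information minimization, which AniTalker implements differently):

```python
import torch
import torch.nn.functional as F

def decoupling_losses(motion_enc, id_enc, decoder, src, tgt, other_id):
    """Illustrative identity/motion decoupling losses.

    src, tgt : two frames of the SAME identity, shape (B, C, H, W).
    other_id : a frame of a DIFFERENT identity (negative for metric learning).
    """
    m = motion_enc(tgt)                 # motion code of the target frame
    i = id_enc(src)                     # identity code from the source frame
    recon = decoder(i, m)               # re-render the target frame
    l_rec = F.l1_loss(recon, tgt)       # reconstruction within one identity

    # metric learning on identity: same person close, different person far
    i_pos, i_neg = id_enc(tgt), id_enc(other_id)
    l_id = F.triplet_margin_loss(i, i_pos, i_neg)

    # stand-in for MI minimization: penalize cross-covariance of the two codes
    mc, ic = m - m.mean(0, keepdim=True), i - i.mean(0, keepdim=True)
    l_mi = ((mc.T @ ic) / m.shape[0]).pow(2).mean()

    return l_rec + l_id + 0.1 * l_mi    # 0.1 is an arbitrary example weight
```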

Matten: Video Generation with Mamba-Attention

In this paper, we introduce Matten, a cutting-edge latent diffusion model with Mamba-Attention architecture for video generation. With minimal computational cost, Matten employs spatial-temporal attention for local video content modeling and bidirectional Mamba for global video content modeling. Our comprehensive experimental evaluation demonstrates that Matten is competitive with current Transformer-based and GAN-based models on standard benchmarks, achieving superior FVD scores and efficiency. Additionally, we observe a direct positive correlation between the complexity of our designed model and the improvement in video quality, indicating the excellent scalability of Matten. ......
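
A toy block in this spirit (PyTorch; for brevity the factorized spatial/temporal attention is collapsed into a single attention over all tokens, and a bidirectional GRU is used purely as a stand-in for the bidirectional Mamba layer, whose real implementation lives in the `mamba_ssm` package):

```python
import torch
import torch.nn as nn

class MattenStyleBlock(nn.Module):
    """Local attention + global bidirectional mixing, loosely after Matten."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        # stand-in for bidirectional Mamba: any bidirectional sequence mixer
        self.global_mixer = nn.GRU(dim, dim // 2, bidirectional=True,
                                   batch_first=True)

    def forward(self, x):                        # x: (batch, frames*tokens, dim)
        h = self.norm1(x)
        a, _ = self.attn(h, h, h)                # attention for content modeling
        x = x + a
        g, _ = self.global_mixer(self.norm2(x))  # global bidirectional pass
        return x + g

video_tokens = torch.randn(2, 16 * 64, 128)      # 16 frames x 64 tokens per frame
print(MattenStyleBlock(128)(video_tokens).shape)  # torch.Size([2, 1024, 128])
```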

SMCD: High Realism Motion Style Transfer via Mamba-based Diffusion

Motion style transfer is a significant research direction in multimedia applications. It enables the rapid switching of different styles of the same motion for virtual digital humans, thus vastly increasing the diversity and realism of movements. It is widely applied in multimedia scenarios such as movies, games, and the Metaverse. However, most current work in this field adopts GANs, which can suffer from instability and convergence issues, making the final generated motion sequence somewhat chaotic and unable to reflect a highly realistic and natural style. To address these problems, we consider style motion as a condition and propose the Style Motion Conditioned Diffusion (SMCD) framework for the first time, which can more comprehensively learn the style features of motion. Moreover, we apply the Mamba model to the motion style transfer field for the first time, introducing the Motion Style Mamba (MSM) module to handle longer motion sequences. Thirdly, aiming at the SMCD......
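
A hedged sketch of what "style motion as a condition" can look like in a diffusion denoiser (PyTorch; the architecture, dimensions, and the GRU style encoder standing in for the paper's Motion Style Mamba module are all illustrative assumptions, not SMCD's design):

```python
import torch
import torch.nn as nn

class StyleConditionedDenoiser(nn.Module):
    """Minimal denoiser conditioned on a style motion clip (illustrative)."""
    def __init__(self, motion_dim: int, hidden: int = 256):
        super().__init__()
        # stand-in for the paper's Motion Style Mamba (MSM) module
        self.style_enc = nn.GRU(motion_dim, hidden, batch_first=True)
        self.t_embed = nn.Embedding(1000, hidden)
        self.net = nn.Sequential(
            nn.Linear(motion_dim + 2 * hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, motion_dim),
        )

    def forward(self, x_t, t, style_motion):
        # x_t: (B, T, motion_dim) noisy motion; style_motion: (B, Ts, motion_dim)
        _, h = self.style_enc(style_motion)                # (1, B, hidden)
        style = h[-1].unsqueeze(1).expand(-1, x_t.shape[1], -1)
        temb = self.t_embed(t).unsqueeze(1).expand(-1, x_t.shape[1], -1)
        return self.net(torch.cat([x_t, style, temb], dim=-1))

x_t = torch.randn(2, 60, 66)        # 60 frames of a 66-D pose representation
style = torch.randn(2, 120, 66)     # a longer style reference clip
t = torch.randint(0, 1000, (2,))
print(StyleConditionedDenoiser(66)(x_t, t, style).shape)   # torch.Size([2, 60, 66])
```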

IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs

One limitation of existing Transformer-based models is that they cannot handle very long sequences as input since their self-attention operations exhibit quadratic time and space complexity. This problem becomes especially acute when Transformers are deployed on hardware platforms equipped only with CPUs. To address this issue, we propose a novel method for accelerating self-attention at inference time that works with pretrained Transformer models out-of-the-box without requiring retraining. We experiment using our method to accelerate various long-sequence Transformers, including a leading LLaMA 2-based LLM, on various benchmarks and demonstrate a greater speedup of 2.73x - 7.63x while retaining 98.6% - 99.6% of the accuracy of the original pretrained models. The code is available on our project website at https://yuzhenmao.github.io/IceFormer/. ......
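
The excerpt does not describe IceFormer's actual algorithm, so the following is only a generic illustration of the kind of retraining-free, inference-time attention approximation this line of work targets: each query attends only to its top-k keys. A real CPU-oriented method would avoid materializing the full score matrix (e.g., via approximate nearest-neighbor search); this sketch does not.

```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, k_keep: int = 64):
    """Softmax attention where each query attends only to its k_keep best keys.

    q, k, v: (batch, seq, dim). Generic illustration only; not IceFormer.
    """
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5     # (B, n, n)
    k_keep = min(k_keep, scores.shape[-1])
    top = scores.topk(k_keep, dim=-1)
    masked = torch.full_like(scores, float("-inf"))
    masked.scatter_(-1, top.indices, top.values)              # keep top-k only
    return F.softmax(masked, dim=-1) @ v

q = k = v = torch.randn(1, 512, 64)
print(topk_sparse_attention(q, k, v).shape)                   # torch.Size([1, 512, 64])
```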

Efficient Text-driven Motion Generation via Latent Consistency Training

Motion diffusion models have recently proven successful for text-driven human motion generation. Despite their excellent generation performance, they are challenging to run in real time due to the multi-step sampling mechanism, which involves tens or hundreds of repeated function evaluations. To this end, we investigate Motion Latent Consistency Training (MLCT) for motion generation to alleviate the computation and time cost of iterative inference. It applies diffusion pipelines to low-dimensional motion latent spaces to mitigate the computational burden of each function evaluation. By framing the diffusion process with probability flow ordinary differential equation (PF-ODE) theory, MLCT enables inference in extremely few steps from the prior distribution to the motion latent representation distribution by maintaining consistency of the outputs along the PF-ODE trajectory. In particular, we introduce a quantization constraint to optimize motion latent r......
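
A minimal sketch of the generic consistency-training recipe that MLCT builds on (PyTorch; `model` and `ema_model` are assumed to take `(latent, sigma)`, and the paper's quantization constraint, boundary-condition parameterization, and EMA update are omitted):

```python
import torch
import torch.nn.functional as F

def consistency_training_step(model, ema_model, x0, sigmas, optimizer):
    """Pull together outputs at two adjacent points of the same PF-ODE path.

    x0     : batch of clean motion latents, shape (B, D)
    sigmas : increasing 1-D tensor of discretized noise levels
    """
    B = x0.shape[0]
    n = torch.randint(0, len(sigmas) - 1, (B,))
    z = torch.randn_like(x0)
    x_hi = x0 + sigmas[n + 1].unsqueeze(-1) * z     # noisier point on the path
    x_lo = x0 + sigmas[n].unsqueeze(-1) * z         # adjacent earlier point

    pred = model(x_hi, sigmas[n + 1])
    with torch.no_grad():                           # EMA copy provides the target
        target = ema_model(x_lo, sigmas[n])

    loss = F.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```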

U-DiTs: Downsample Tokens in U-Shaped Diffusion Transformers

Diffusion Transformers (DiTs) introduce the transformer architecture to diffusion tasks for latent-space image generation. With an isotropic architecture that chains a series of transformer blocks, DiTs demonstrate competitive performance and good scalability; but meanwhile, the abandonment of the U-Net by DiTs and their subsequent improvements is worth rethinking. To this end, we conduct a simple toy experiment comparing a U-Net-architectured DiT with an isotropic one. It turns out that the U-Net architecture gains only a slight advantage from the U-Net inductive bias, indicating potential redundancies within the U-Net-style DiT. Inspired by the discovery that U-Net backbone features are low-frequency-dominated, we perform token downsampling on the query-key-value tuple for self-attention and obtain further improvements despite a considerable reduction in computation. Based on self-attention with downsampled tokens, we propose a series of U-shaped DiTs (U-DiTs) in the ......
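
A rough sketch of attention over downsampled tokens (PyTorch; U-DiT's actual design, which downsamples the query-key-value tuple inside the transformer block, differs in detail, so the class name and the 2x average-pooling choice here are illustrative assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DownsampledTokenAttention(nn.Module):
    """Self-attention on a 2x-downsampled token grid, then upsampled back.

    Motivated by the observation that the backbone features are
    low-frequency dominated, so attending among downsampled tokens loses
    little while cutting the quadratic attention cost.
    """
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                      # x: (B, dim, H, W) feature map
        B, C, H, W = x.shape
        x_ds = F.avg_pool2d(x, 2)              # 2x token downsampling
        tokens = x_ds.flatten(2).transpose(1, 2)          # (B, HW/4, C)
        out, _ = self.attn(tokens, tokens, tokens)
        out = out.transpose(1, 2).reshape(B, C, H // 2, W // 2)
        out = F.interpolate(out, size=(H, W), mode="nearest")
        return x + out                         # residual connection

feat = torch.randn(2, 128, 32, 32)
print(DownsampledTokenAttention(128)(feat).shape)   # torch.Size([2, 128, 32, 32])
```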

From Generalization Analysis to Optimization Designs for State Space Models

A State Space Model (SSM) is a foundation model in time series analysis, which has recently been shown to be an alternative to transformers in sequence modeling. In this paper, we theoretically study the generalization of SSMs and propose improvements to training algorithms based on the generalization results. Specifically, we give a *data-dependent* generalization bound for SSMs, showing an interplay between the SSM parameters and the temporal dependencies of the training sequences. Leveraging the generalization bound, we (1) set up a scaling rule for model initialization based on the proposed generalization measure, which significantly improves the robustness of the output value scales of SSMs to different temporal patterns in the sequence data; (2) introduce a new regularization method for training SSMs to enhance the generalization performance. Numerical experiments are conducted to validate our results. ......

CogDPM: Diffusion Probabilistic Models via Cognitive Predictive Coding

Predictive Coding (PC) is a theoretical framework in cognitive science suggesting that the human brain processes cognition through spatiotemporal prediction of the visual world. Existing studies have developed spatiotemporal prediction neural networks based on PC theory, emulating its two core mechanisms: correcting predictions from residuals and hierarchical learning. However, these models do not show improved prediction skill on real-world forecasting tasks and ignore the Precision Weighting mechanism of PC theory. The precision weighting mechanism posits that the brain allocates more attention to signals with lower precision, contributing to the cognitive ability of human brains. This work introduces the Cognitive Diffusion Probabilistic Models (CogDPM), which demonstrate the connection between diffusion probabilistic models and PC theory. CogDPM features a precision estimation method based on the hierarchical sampling capabilities of diffusion models and we......
