LD-Pruner, EdgeFusion (On-Device T2I), FreeDiff, TextCenGen, MemLLM

This article was first published on the WeChat official account 机器感知 (Machine Perception).

https://mp.weixin.qq.com/s/KiyNfwYWU-wBiCO-hE9qkA

The devil is in the object boundary: towards annotation-free instance segmentation using Foundation Models

Foundation models, pre-trained on large amounts of data, have demonstrated impressive zero-shot capabilities in various downstream tasks. However, in object detection and instance segmentation, two fundamental computer vision tasks that rely heavily on extensive human annotations, foundation models such as SAM and DINO struggle to achieve satisfactory performance. In this study, we reveal that the devil is in the object boundary, i.e., these foundation models fail to discern boundaries between individual objects. For the first time, we show that CLIP, which has never accessed any instance-level annotations, can provide a highly beneficial and strong instance-level boundary prior in the clustering results of one of its intermediate layers. Following this surprising observation, we propose Zip, which Zips up CLIP and SAM in a novel classification-first-then-discovery pipeline, enabling annotation-free, complex-scene-capable, open-vocab......
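
The core observation is that patch-token features from a particular intermediate CLIP layer cluster along instance boundaries. Below is a minimal sketch of that probe, assuming a Hugging Face CLIP vision tower and k-means clustering; the layer index and cluster count are illustrative guesses, not values from the paper.

```python
import torch
from PIL import Image
from sklearn.cluster import KMeans
from transformers import CLIPImageProcessor, CLIPVisionModel

model = CLIPVisionModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("scene.jpg").convert("RGB")  # hypothetical input image
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# Take patch tokens from an intermediate layer (index 8 is an assumption)
# and drop the CLS token.
feats = out.hidden_states[8][0, 1:, :]          # (num_patches, dim)
side = int(feats.shape[0] ** 0.5)               # 7x7 patch grid at 224px / patch32

# Cluster the patch tokens; transitions between clusters approximate
# instance boundaries and can seed point prompts for SAM.
labels = KMeans(n_clusters=6, n_init=10).fit_predict(feats.numpy())
print(labels.reshape(side, side))
```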

LD-Pruner: Efficient Pruning of Latent Diffusion Models using Task-Agnostic Insights

Latent Diffusion Models (LDMs) have emerged as powerful generative models, known for delivering remarkable results under constrained computational resources. However, deploying LDMs on resource-limited devices remains challenging, with obstacles such as memory consumption and inference speed. To address this, we introduce LD-Pruner, a novel performance-preserving structured pruning method for compressing LDMs. Traditional pruning methods for deep neural networks are not tailored to the unique characteristics of LDMs, such as the high computational cost of training and the absence of a fast, straightforward, and task-agnostic way to evaluate model performance. Our method tackles these challenges by leveraging the latent space during the pruning process, enabling us to effectively quantify the impact of pruning on model performance, independently of the task at hand. This targeted pruning of components with minimal impact on the output allows for faster co......
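
The latent-space scoring idea can be illustrated with a toy denoiser: disable one candidate operator at a time, measure how far the latent output drifts, and prune the operators with the least drift. This is a hedged sketch under my own simplifications (identity ablation, L2 drift), not the paper's exact metric.

```python
import torch
import torch.nn as nn

def latent_drift(model: nn.Module, candidate: nn.Module, x: torch.Tensor) -> float:
    """Mean L2 distance between latents with and without `candidate`."""
    with torch.no_grad():
        baseline = model(x)
        original_forward = candidate.forward
        candidate.forward = lambda t, *a, **k: t   # identity: skip the block
        ablated = model(x)
        candidate.forward = original_forward
    return (baseline - ablated).norm(dim=-1).mean().item()

class ResBlock(nn.Module):
    """Toy stand-in for one prunable operator in an LDM denoiser."""
    def __init__(self, d: int):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(d, d), nn.GELU(), nn.Linear(d, d))

    def forward(self, x):
        return x + self.f(x)

model = nn.Sequential(*[ResBlock(64) for _ in range(8)])
latents = torch.randn(16, 64)                      # calibration latents

scores = [(i, latent_drift(model, blk, latents)) for i, blk in enumerate(model)]
for i, s in sorted(scores, key=lambda p: p[1])[:2]:
    print(f"prune block {i}: latent drift {s:.4f}")  # least-impact blocks first
```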

EdgeFusion: On-Device Text-to-Image Generation

The intensive computational burden of Stable Diffusion (SD) for text-to-image generation poses a significant hurdle to its practical application. To tackle this challenge, recent research focuses on methods to reduce sampling steps, such as the Latent Consistency Model (LCM), and on architectural optimizations, including pruning and knowledge distillation. Diverging from existing approaches, we uniquely start from a compact SD variant, BK-SDM. We observe that directly applying LCM to BK-SDM with commonly used crawled datasets yields unsatisfactory results. This leads us to develop two strategies: (1) leveraging high-quality image-text pairs from leading generative models and (2) designing an advanced distillation process tailored for LCM. Through our thorough exploration of quantization, profiling, and on-device deployment, we achieve rapid generation of photo-realistic, text-aligned images in just two steps, with latency under one second on resource-limited edge devices......
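
The overall recipe (a compact backbone plus consistency-style few-step sampling) can be sketched with the diffusers library. The checkpoint id, the scheduler swap, and the two-step setting below are illustrative assumptions, not the released EdgeFusion model.

```python
import torch
from diffusers import StableDiffusionPipeline, LCMScheduler

# BK-SDM is the compact SD variant the paper starts from; this exact
# checkpoint id is an assumption for illustration.
pipe = StableDiffusionPipeline.from_pretrained(
    "nota-ai/bk-sdm-small", torch_dtype=torch.float16
).to("cuda")  # assumes a CUDA device is available

# Swap in LCM-style sampling. A faithful reproduction would also apply the
# paper's tailored LCM distillation, not the scheduler swap alone.
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

image = pipe(
    "a photo of an astronaut riding a horse",
    num_inference_steps=2,    # EdgeFusion targets two-step generation
    guidance_scale=1.0,       # consistency-style sampling uses low/no CFG
).images[0]
image.save("out.png")
```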

SKIP: Skill-Localized Prompt Tuning for Inference Speed Boost-Up

Prompt-tuning methods have shown performance comparable to parameter-efficient fine-tuning (PEFT) methods in various natural language understanding tasks. However, existing prompt-tuning methods still utilize the entire model architecture and thus fail to accelerate inference in deployment. In this paper, we propose a novel approach called SKIll-localized Prompt tuning (SKIP), which is extremely efficient at inference time. Our method significantly enhances inference efficiency by investigating and utilizing a skill-localized subnetwork in a language model. Surprisingly, our method improves inference speed by up to 160% while pruning 52% of the parameters. Furthermore, we demonstrate that our method is applicable across various transformer-based architectures, thereby confirming its practicality and scalability. ......
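
A rough reading of the mechanism: score FFN neurons for their contribution to the target skill, structurally drop the low-scoring ones, then tune only a soft prompt on the frozen, pruned backbone. The activation-times-weight-norm importance proxy below is my assumption, not the paper's scoring rule.

```python
import torch
import torch.nn as nn

def neuron_importance(w_out: nn.Linear, acts: torch.Tensor) -> torch.Tensor:
    """Proxy score per hidden neuron: mean |activation| times output-weight norm."""
    return acts.abs().mean(dim=0) * w_out.weight.norm(dim=0)

def prune_ffn(w_in: nn.Linear, w_out: nn.Linear, keep: torch.Tensor):
    """Structurally remove FFN hidden neurons not listed in `keep`."""
    w_in_new = nn.Linear(w_in.in_features, len(keep))
    w_out_new = nn.Linear(len(keep), w_out.out_features)
    w_in_new.weight.data = w_in.weight.data[keep].clone()
    w_in_new.bias.data = w_in.bias.data[keep].clone()
    w_out_new.weight.data = w_out.weight.data[:, keep].clone()
    w_out_new.bias.data = w_out.bias.data.clone()
    return w_in_new, w_out_new

# Toy FFN (768 -> 3072 -> 768), kept at its top 48% of neurons
# (cf. the paper's 52% parameter pruning).
w_in, w_out = nn.Linear(768, 3072), nn.Linear(3072, 768)
with torch.no_grad():
    acts = torch.relu(w_in(torch.randn(32, 768)))      # calibration activations
keep = neuron_importance(w_out, acts).topk(int(3072 * 0.48)).indices
w_in, w_out = prune_ffn(w_in, w_out, keep)

# Only the soft prompt stays trainable; the pruned backbone is frozen.
soft_prompt = nn.Parameter(torch.randn(20, 768) * 0.02)
```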

TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding

With large language models (LLMs) now widely deployed for long-content generation, there is an increasing demand for efficient long-sequence inference support. However, the key-value (KV) cache, which is stored to avoid re-computation, has emerged as a critical bottleneck because it grows linearly with sequence length. Due to the auto-regressive nature of LLMs, the entire KV cache is loaded for every generated token, resulting in low utilization of computational cores and high latency. While various KV-cache compression methods have been proposed to alleviate this issue, they suffer from degradation in generation quality. We introduce TriForce, a hierarchical speculative decoding system that is scalable to long sequence generation. This approach leverages the original model weights and a dynamic sparse KV cache via retrieval as a draft model, which serves as an intermediate layer in the hierarchy and is further speculated by a smaller model to reduce its......
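
The hierarchy can be shown schematically: a lightweight model drafts for the target weights paired with a retrieval-sparsified KV cache, and that pair in turn drafts for the target weights with the full KV cache. Everything below (the stub models, greedy token-by-token verification) is a simplified assumption; a real system verifies all drafted positions in one batched forward pass.

```python
from typing import Callable, List

Model = Callable[[List[int]], int]   # next-token function over a token context

def speculate(draft: Model, verify: Model, ctx: List[int], gamma: int) -> List[int]:
    """Draft `gamma` tokens, keep the longest prefix the verifier agrees with."""
    drafted: List[int] = []
    for _ in range(gamma):
        drafted.append(draft(ctx + drafted))
    accepted: List[int] = []
    for tok in drafted:
        expected = verify(ctx + accepted)   # a real system batches these calls
        accepted.append(expected)
        if tok != expected:                 # first disagreement: keep fix, stop
            break
    return accepted

# Stub "models" standing in for the three levels of the hierarchy.
tiny      = lambda ctx: sum(ctx) % 97                 # cheap draft model
sparse_kv = lambda ctx: sum(ctx) % 97                 # target weights + retrieved KV
full_kv   = lambda ctx: (sum(ctx) + len(ctx)) % 97    # target weights + full KV

def middle_layer(ctx: List[int]) -> int:
    # The sparse-KV target, itself accelerated by the tiny draft model.
    return speculate(tiny, sparse_kv, ctx, gamma=4)[-1]

print(speculate(middle_layer, full_kv, [1, 2, 3], gamma=4))
```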

FreeDiff: Progressive Frequency Truncation for Image Editing with Diffusion Models

Precise image editing with text-to-image models has attracted increasing interest due to their remarkable generative capabilities and user-friendly nature. However, such attempts face the pivotal challenge of misalignment between the intended precise editing target regions and the broader area impacted by the guidance in practice. Although excellent attention-based methods have been developed to refine the editing guidance, these approaches require complex architectural modifications and are limited to specific editing tasks. In this work, we re-examine the diffusion process and the misalignment problem from a frequency perspective, revealing that, due to the power law of natural images and the decaying noise schedule, the denoising network primarily recovers low-frequency image components during the earlier timesteps and thus introduces excessive low-frequency signals for editing. Leveraging this insight, we introduce a novel fine-tuning free a......
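
The mechanism suggests a simple operation: high-pass filter the guidance term during early denoising steps, when the network is mostly reconstructing low frequencies, so the edit does not bleed across the whole image. Below is a minimal sketch with random tensors standing in for model outputs; the filter shape and cutoff schedule are illustrative assumptions.

```python
import torch

def highpass(guidance: torch.Tensor, cutoff: float) -> torch.Tensor:
    """Zero out 2D frequencies below `cutoff` (as a fraction of Nyquist)."""
    h, w = guidance.shape[-2:]
    fy = torch.fft.fftfreq(h).abs().view(-1, 1)
    fx = torch.fft.fftfreq(w).abs().view(1, -1)
    mask = ((fy ** 2 + fx ** 2).sqrt() >= cutoff * 0.5).to(guidance.dtype)
    return torch.fft.ifft2(torch.fft.fft2(guidance) * mask).real

T = 50
for t in range(T, 0, -1):
    eps_uncond = torch.randn(4, 64, 64)   # stand-in for the unconditional output
    eps_cond = torch.randn(4, 64, 64)     # stand-in for the conditional output
    guidance = eps_cond - eps_uncond
    cutoff = 0.4 * (t / T)                # aggressive truncation early, relaxed late
    eps = eps_uncond + 7.5 * highpass(guidance, cutoff)
    # `eps` would feed the sampler's update rule at timestep t.
```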

TextCenGen: Attention-Guided Text-Centric Background Adaptation for Text-to-Image Generation

Recent advances in text-to-image (T2I) generation have witnessed a shift from adapting text to fixed backgrounds to creating images around text. Traditional approaches are often limited to generating layouts within static images for effective text placement. Our proposed approach, TextCenGen, introduces dynamic adaptation of the blank region for text-friendly image generation, emphasizing text-centric design and visual harmony. Our method employs force-directed attention guidance in T2I models to generate images that strategically reserve whitespace for pre-defined text areas, even for text or icons placed at the golden ratio. Observing how cross-attention maps affect object placement, we detect and repel conflicting objects using a force-directed graph approach, combined with a Spatial Excluding Cross-Attention Constraint for smooth attention in whitespace areas. As a novel task in graphic design, experiments indicate that TextCenGen outperforms existing methods with......
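
One way to read the attention-guided constraint: penalize cross-attention mass that falls inside the reserved text region and nudge the latents along the negative gradient, repelling objects from where the text will sit. The sketch below fakes the attention-latent dependency with a differentiable stand-in just to show the guidance step; it is not the paper's actual pipeline.

```python
import torch

def whitespace_loss(attn: torch.Tensor, region: torch.Tensor) -> torch.Tensor:
    """attn: (tokens, H, W) cross-attention maps; region: (H, W) 0/1 text mask."""
    return (attn * region).sum() / attn.sum()

H = W = 32
attn_maps = torch.rand(8, H, W)             # stand-in for UNet attention maps
region = torch.zeros(H, W)
region[20:30, 4:28] = 1.0                   # reserved text box (hypothetical)

latents = torch.randn(4, H, W, requires_grad=True)
# In a real pipeline the attention maps depend on `latents` through the UNet;
# here a differentiable stand-in fakes that dependency.
attn = torch.sigmoid(latents[:1].expand(8, H, W) + attn_maps)
loss = whitespace_loss(attn, region)
loss.backward()
with torch.no_grad():
    latents -= 0.1 * latents.grad           # push objects out of the text box
```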

Visual Prompting for Generalized Few-shot Segmentation: A Multi-scale Approach

The emergence of attention-based transformer models has led to their extensive use in various tasks, due to their superior generalization and transfer properties. Recent research has demonstrated that such models, when prompted appropriately, are excellent for few-shot inference. However, such techniques are under-explored for dense prediction tasks like semantic segmentation. In this work, we examine the effectiveness of prompting a transformer-decoder with learned visual prompts for the generalized few-shot segmentation (GFSS) task. Our goal is to achieve strong performance not only on novel categories with limited examples, but also to retain performance on base categories. We propose an approach to learn visual prompts with limited examples. These learned visual prompts are used to prompt a multiscale transformer decoder to facilitate accurate dense predictions. Additionally, we introduce a unidirectional causal attention mechanism between the novel prompts, learned with ......
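
The unidirectional attention idea can be made concrete: novel prompts may attend to base prompts to borrow their knowledge, but base prompts never attend to novel ones, so base-class behavior stays undisturbed. A minimal single-head sketch, with dimensions chosen arbitrarily:

```python
import torch
import torch.nn.functional as F

n_base, n_novel, d = 15, 5, 256
base_prompts = torch.randn(n_base, d)     # frozen after base training
novel_prompts = torch.randn(n_novel, d)   # learned from a few shots
prompts = torch.cat([base_prompts, novel_prompts], dim=0)

n = n_base + n_novel
mask = torch.zeros(n, n, dtype=torch.bool)
mask[:n_base, n_base:] = True             # block base -> novel attention

q = k = v = prompts                       # single-head self-attention for brevity
scores = (q @ k.T) / d ** 0.5
scores = scores.masked_fill(mask, float("-inf"))
out = F.softmax(scores, dim=-1) @ v       # (n, d) updated prompt tokens
# out[n_base:] would then query multi-scale decoder features for dense masks.
```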

MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory

While current large language models (LLMs) demonstrate some capabilities in knowledge-intensive tasks, they are limited by relying on their parameters as an implicit storage mechanism. As a result, they struggle with infrequent knowledge and temporal degradation. In addition, the uninterpretable nature of parametric memorization makes it challenging to understand and prevent hallucination. Parametric memory pools and model editing are only partial solutions. Retrieval-Augmented Generation (RAG), though non-parametric, has its own limitations: it lacks structure, complicates interpretability, and makes it hard to effectively manage stored knowledge. In this paper, we introduce MemLLM, a novel method for enhancing LLMs by integrating a structured and explicit read-and-write memory module. MemLLM tackles the aforementioned challenges by enabling dynamic interaction with the memory and improving the LLM's capabilities in using stored knowledge. Our......
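
The interface shape (a structured store the finetuned model reads from and writes to during generation) can be sketched as a triple memory. The method names and (subject, relation, object) format below are my assumptions, not the paper's exact protocol.

```python
from collections import defaultdict

class TripleMemory:
    """Explicit, inspectable memory of (subject, relation, object) facts."""
    def __init__(self):
        self.store = defaultdict(set)

    def write(self, subj: str, rel: str, obj: str) -> None:
        self.store[(subj, rel)].add(obj)

    def read(self, subj: str, rel: str) -> set:
        return self.store.get((subj, rel), set())

mem = TripleMemory()

# During finetuning, the LLM learns to emit structured memory calls like these
# instead of relying on its parameters for long-tail or time-sensitive facts.
mem.write("Tom Brady", "plays_for", "Buccaneers")   # extracted from a passage
retrieved = mem.read("Tom Brady", "plays_for")
prompt = f"Known facts: Tom Brady plays_for {', '.join(retrieved)}.\nQuestion: ..."
print(prompt)
```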

SNP: Structured Neuron-level Pruning to Preserve Attention Scores

Multi-head self-attention (MSA) is a key component of Vision Transformers (ViTs), which have achieved great success in various vision tasks. However, their high computational cost and memory footprint hinder their deployment on resource-constrained devices. Conventional pruning approaches can only compress and accelerate the MSA module using head pruning, although the head is not an atomic unit. To address this issue, we propose a novel graph-aware neuron-level pruning method, Structured Neuron-level Pruning (SNP). SNP prunes neurons with less informative attention scores and eliminates redundancy among heads. Specifically, it prunes graphically connected query and key layers having the least informative attention scores while preserving the overall attention scores. Value layers, which can be pruned independently, are pruned to eliminate inter-head redundancy. Our proposed method effectively compresses and accelerates Transformer-based models for both edge devices and server......
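
The core move, as I read it: because attention logits sum q_i * k_i over head dimensions, query and key projections can be pruned in matched pairs of output neurons, and dropping a low-contribution dimension from both sides barely changes the attention scores. The importance proxy below is a simplified assumption, not the paper's criterion.

```python
import torch
import torch.nn as nn

d_model, d_head, keep = 256, 64, 48

W_q = nn.Linear(d_model, d_head)
W_k = nn.Linear(d_model, d_head)

x = torch.randn(512, d_model)                    # calibration activations
with torch.no_grad():
    q, k = W_q(x), W_k(x)
# Joint importance of attention dimension i: how much |q_i * k_i| it contributes.
importance = q.abs().mean(0) * k.abs().mean(0)   # shape (d_head,)
idx = importance.topk(keep).indices              # dimensions to retain

def slice_linear(layer: nn.Linear, idx: torch.Tensor) -> nn.Linear:
    new = nn.Linear(layer.in_features, len(idx))
    new.weight.data = layer.weight.data[idx].clone()
    new.bias.data = layer.bias.data[idx].clone()
    return new

# Prune the SAME dimensions from query and key so q.k stays aligned.
W_q_small, W_k_small = slice_linear(W_q, idx), slice_linear(W_k, idx)
logits = W_q_small(x) @ W_k_small(x).T / keep ** 0.5   # approximates the original
```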
