[TPAMI 2024] Vision-Language Models for Vision Tasks: A Survey

Paper link: Vision-Language Models for Vision Tasks: A Survey | IEEE Journals & Magazine | IEEE Xplore

Paper GitHub page: GitHub - jingyi0000/VLM_survey: Collection of AWESOME vision-language models for vision tasks

The English here is typed entirely by hand, summarizing and paraphrasing the original paper. Some unavoidable spelling and grammar mistakes may remain; if you spot any, corrections in the comments are welcome! This post reads more like study notes, so take it with a grain of salt.

Contents

1. Takeaways

2. Section-by-Section Close Reading of the Paper

2.1. Abstract

2.2. Introduction

2.3. Background

2.3.1. Training Paradigms for Visual Recognition

2.3.2. Development of VLMs for Visual Recognition

2.3.3. Relevant Surveys

2.4. VLM Foundations

2.4.1. Network Architectures

2.4.2. VLM Pre-Training Objectives

2.4.3. VLM Pre-Training Frameworks

2.4.4. Evaluation Setups and Downstream Tasks

2.5. Datasets

2.5.1. Datasets for Pre-Training VLMs

2.5.2. Datasets for VLM Evaluation

2.6. Vision-Language Model Pre-Training

2.6.1. VLM Pre-Training With Contrastive Objectives

2.6.2. VLM Pre-Training With Generative Objectives

2.6.3. VLM Pre-Training With Alignment Objectives

2.6.4. Summary and Discussion

2.7. VLM Transfer Learning

2.7.1. Motivation of Transfer Learning

2.7.2. Common Setup of Transfer Learning

2.7.3. Common Transfer Learning Methods

2.7.4. Summary and Discussion

2.8. VLM Knowledge Distillation

2.8.1. Motivation of Distilling Knowledge From VLMs

2.8.2. Common Knowledge Distillation Methods

2.8.3. Summary and Discussion

2.9. Performance Comparison

2.9.1. Performance of VLM Pre-Training

2.9.2. Performance of VLM Transfer Learning

2.9.3. Performance of VLM Knowledge Distillation

2.9.4. Summary

2.10. Future Directions

2.11. Conclusion

3. Reference


1. Takeaways

(1) Taking it easy again. It has also been a while since I last read anything from TPAMI; I have always thought highly of TPAMI's quality, so let's give this one a careful read.

(2) Compared with long-winded walkthroughs of n individual models, highlighting the key point of each kind of model works really well. Like another TPAMI survey I read before, this one basically just presents the losses. Spending too much space introducing datasets is also rather pointless, and this paper keeps that part concise. Good, good.

(3) Another nice point is that some of the summaries are placed in tables, without that unpleasant feeling of having walls of text stuffed down your throat.

(4) When introducing some novel models, it seems one does not have to walk through them from start to finish; it is enough to highlight the aspect in which they are novel and just describe the innovation.

(5) Not bad overall; I would give this survey a fairly high rating.

2. Section-by-Section Close Reading of the Paper

2.1. Abstract

        ①Existing problem: a separate DNN must be trained for each visual task, which is laborious and time-consuming

        ②Content: a) background of VLMs for visual tasks, b) foundations of VLMs, c) datasets, d) pre-training, transfer learning and knowledge distillation methods for VLMs, e) benchmarks, f) challenges

laborious  adj. requiring much effort; toilsome

2.2. Introduction

        ①New paradigm: Pre-training (on large-scale data with or without labels), Fine-tuning (on task-specific labelled training data), and Prediction, see (a) and (b):

        ②Vision-Language Model Pre-training and Zero-shot Prediction, which needs no fine-tuning:

        ③VLM publication number on Google Scholar:

frisbee  n. a flying disc used in throwing games

2.3. Background

2.3.1. Training Paradigms for Visual Recognition

(1)Traditional Machine Learning and Prediction

        ①Mostly hand-crafted and lightweight features, but they struggle with complex tasks or multiple tasks

        ②Poor scalability

(2)Deep Learning From Scratch and Prediction

        ①Slow convergence when training from scratch

        ②A large amount of labelled data is needed

(3)Supervised Pre-Training, Fine-Tuning and Prediction

        ①Speed up convergence

(4)Unsupervised Pre-Training, Fine-Tuning & Prediction

        ①Does not require labelled data

        ②Better performance thanks to learning from a larger amount of data

(5)VLM Pre-Training and Zero-Shot Prediction

        ①Discards task-specific fine-tuning

        ②Future directions: a) large scale informative image-text data, b) high-capacity models, c) new pre-training objectives

2.3.2. Development of VLMs for Visual Recognition

        ①3 improvements to VLMs:

2.3.3. Relevant Surveys

        ①Framework of their review:

2.4. VLM Foundations

2.4.1. Network Architectures

        ①Number of image-text pairs: N

        ②Pre-training dataset of image-text pairs: \mathcal{D}=\left \{ x^I_n, x^T_n\right \}^N_{n=1}, where superscript I marks an image sample and T marks a text sample

        ③Image encoder f_\theta and text encoder f_\phi

        ④Encoding operations: z_n^I=f_\theta(x_n^I) and z_n^T=f_\phi(x_n^T)

(1)Architectures for Learning Image Features

        ①CNN-based architectures: such as VGG, ResNet and EfficientNet

        ②Transformer-based architectures: such as ViT

(2)Architectures for Learning Language Features

        ①The framework of the standard Transformer: 6 blocks in the encoder (each with a multi-head self-attention layer and an MLP) and 6 blocks in the decoder (each with a masked multi-head attention layer, a multi-head cross-attention layer and an MLP)

2.4.2. VLM Pre-Training Objectives

(1)Contrastive Objectives

        ①Image Contrastive Learning: pulls a query close to its positive keys and pushes it far from negative keys in the embedding space. For B images (the authors phrase this in terms of "such a batch size", which is close to how the code is written; conceptually you can simply treat B as the total number of samples involved), the loss is usually:

\mathcal{L}_I^\mathrm{InfoNCE}=-\frac{1}{B}\sum_{i=1}^{B}\log\frac{\exp{(z_i^I\cdot z_+^I/\tau)}}{\sum_{j=1,j\neq i}^{B+1}\exp(z_i^I\cdot z_j^I/\tau)}

where z_i^I denotes the query embedding, \{z_j^I\}_{j=1,j\neq i}^{B+1} denotes the key embeddings, z_+^I denotes the positive key for the i-th sample, and \tau is a temperature hyper-parameter
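A minimal PyTorch sketch of this loss (my own illustration, not the paper's code; it assumes the embeddings are already L2-normalized and places the positive key at index 0, which is the usual cross-entropy form of InfoNCE):

```python
import torch
import torch.nn.functional as F

def image_infonce(query: torch.Tensor, positive: torch.Tensor,
                  negatives: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """query: (B, D), positive: (B, D), negatives: (B, K, D); all L2-normalized."""
    pos_logits = (query * positive).sum(dim=-1, keepdim=True) / tau   # (B, 1) similarity to the positive key
    neg_logits = torch.einsum("bd,bkd->bk", query, negatives) / tau   # (B, K) similarities to negative keys
    logits = torch.cat([pos_logits, neg_logits], dim=1)               # (B, 1+K)
    labels = torch.zeros(query.size(0), dtype=torch.long, device=query.device)  # positive sits at index 0
    return F.cross_entropy(logits, labels)                            # -log softmax over [positive, negatives]
```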

        ②Image-Text Contrastive Learning: pulls paired image-text embeddings closer and pushes unpaired ones apart:

\begin{gathered} \mathcal{L}_{I\to T} =-\frac1B\sum_{i=1}^B\log\frac{\exp{(z_i^I\cdot z_i^T/\tau)}}{\sum_{j=1}^B\exp(z_i^I\cdot z_j^T/\tau)} \\ \mathcal{L}_{T\to I} =-\frac1B\sum_{i=1}^B\log\frac{\exp{(z_i^T\cdot z_i^I/\tau)}}{\sum_{j=1}^B\exp(z_i^T\cdot z_j^I/\tau)}\\ \mathcal{L}_{\mathrm{infoNCE}}^{IT}=\mathcal{L}_{I\to T}+\mathcal{L}_{T\to I} \end{gathered}

where \mathcal{L}_{I\to T} denotes contrasting the query image with the text keys, \mathcal{L}_{T\to I} denotes contrasting the query text with image keys
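A sketch of the symmetric objective over a batch of B paired embeddings (CLIP-style; the i-th image and i-th text are the only positive pair for each other, and all other in-batch pairings act as negatives):

```python
import torch
import torch.nn.functional as F

def image_text_infonce(img_emb: torch.Tensor, txt_emb: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """img_emb, txt_emb: (B, D) embeddings of B paired images and texts."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / tau                             # (B, B) cosine similarities
    targets = torch.arange(img_emb.size(0), device=img_emb.device)   # pair i matches pair i
    loss_i2t = F.cross_entropy(logits, targets)                      # contrast each image with all texts
    loss_t2i = F.cross_entropy(logits.t(), targets)                  # contrast each text with all images
    return loss_i2t + loss_t2i
```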

        ③Image-Text-Label Contrastive Learning: a supervised variant that also exploits class labels:

\begin{gathered} \mathcal{L}_{I\to T}^{ITL} =-\sum_{i=1}^B\frac1{|\mathcal{P}(i)|}\sum_{k\in\mathcal{P}(i)}\log\frac{\exp{(z_i^I\cdot z_k^T/\tau)}}{\sum_{j=1}^B\exp(z_i^I\cdot z_j^T/\tau)} \\ \mathcal{L}_{T\to I}^{ITL} =-\sum_{i=1}^B\frac1{|\mathcal{P}(i)|}\sum_{k\in\mathcal{P}(i)}\log\frac{\exp{(z_i^T\cdot z_k^I/\tau)}}{\sum_{j=1}^B\exp(z_i^T\cdot z_j^I/\tau)}\\ \mathcal{L}_{\mathrm{infoNCE}}^{ITL}=\mathcal{L}_{I\to T}^{ITL}+\mathcal{L}_{T\to I}^{ITL} \end{gathered}

where k\in\mathcal{P}(i)=\{k\mid k\in B,\,y_k=y_i\} and y denotes the class label of (z^I,z^T) (in effect, an extra inner sum over all samples sharing the query's class)
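A sketch of this label-aware variant (in the spirit of UniCL-style image-text-label contrast): every in-batch sample that shares the query's class label counts as a positive, so \mathcal{P}(i) can contain several entries:

```python
import torch
import torch.nn.functional as F

def image_text_label_infonce(img_emb: torch.Tensor, txt_emb: torch.Tensor,
                             labels: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """img_emb, txt_emb: (B, D) paired embeddings; labels: (B,) integer class ids."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / tau                       # (B, B) image-to-text similarities
    pos = (labels[:, None] == labels[None, :]).float()         # P(i): same-class pairs (diagonal included)
    log_p_i2t = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    log_p_t2i = logits.t() - torch.logsumexp(logits.t(), dim=1, keepdim=True)
    loss_i2t = (-(pos * log_p_i2t).sum(1) / pos.sum(1)).sum()  # average over positives, sum over queries
    loss_t2i = (-(pos * log_p_t2i).sum(1) / pos.sum(1)).sum()
    return loss_i2t + loss_t2i
```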

(2)Generative Objectives

        ①Masked Image Modelling: learns cross-patch correlation by masking a set of patches and reconstructing them from the remaining ones. The loss is usually:

\mathcal{L}_{MIM}=-\frac1B\sum_{i=1}^B\log f_\theta(\overline{x}_i^I\mid\hat{x}_i^I)

where \overline{x}_i^I denotes the masked patches and \hat{x}_i^I denotes the unmasked patches (the "|" is a conditional: the model maximizes the likelihood of the masked patches given the unmasked ones, i.e., it reconstructs what was masked from what remains visible)
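In practice this likelihood is often instantiated as MAE-style pixel regression: predict the pixel values of the masked patches from the visible ones and penalize only the masked positions. A sketch under that assumption:

```python
import torch

def masked_image_modelling_loss(pred_patches: torch.Tensor,
                                target_patches: torch.Tensor,
                                mask: torch.Tensor) -> torch.Tensor:
    """pred/target: (B, N, P) per-patch pixel values, mask: (B, N) with 1 for masked patches."""
    per_patch = ((pred_patches - target_patches) ** 2).mean(dim=-1)   # (B, N) MSE per patch
    mask = mask.float()
    return (per_patch * mask).sum() / mask.sum().clamp(min=1)         # average only over masked patches
```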

        ②Masked Language Modelling: masks text tokens at a specific ratio (e.g., 15% in BERT) and reconstructs them from the unmasked tokens:

\mathcal{L}_{MLM}=-\frac1B\sum_{i=1}^B\log f_\phi(\overline{x}_i^T\mid\hat{x}_i^T)
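The text-side counterpart is usually a BERT-style cross-entropy over the masked token positions; a sketch:

```python
import torch
import torch.nn.functional as F

def masked_language_modelling_loss(token_logits: torch.Tensor,
                                   token_ids: torch.Tensor,
                                   mask: torch.Tensor) -> torch.Tensor:
    """token_logits: (B, L, V) predictions, token_ids: (B, L) targets, mask: (B, L) with 1 for masked tokens."""
    per_token = F.cross_entropy(token_logits.transpose(1, 2), token_ids, reduction="none")  # (B, L)
    mask = mask.float()
    return (per_token * mask).sum() / mask.sum().clamp(min=1)
```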

        ③Masked Cross-Modal Modelling: randomly masks a subset of image patches and a subset of text tokens, then reconstructs them conditioned on the unmasked patches and tokens:

\mathcal{L}_{MCM}=-\frac{1}{B}\sum_{i=1}^{B}[\log f_{\theta}(\overline{x}_{i}^{I}|\hat{x}_{i}^{I},\hat{x}_{i}^{T})+\log f_{\phi}(\overline{x}_{i}^{T}|\hat{x}_{i}^{I},\hat{x}_{i}^{T})]

        ④Image-to-Text Generation: predicts the text autoregressively from its paired image, token by token:

\mathcal{L}_{ITG}=-\sum_{l=1}^L \log f_\theta(x^T\mid x_{<l}^T,z^I)

where L denotes the number of text tokens and z^I is the embedding of the image paired with x^T (each token x_l^T is predicted from the preceding tokens x_{<l}^T and the image)
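A teacher-forced sketch of this objective, assuming a hypothetical decoder(prev_tokens, img_emb) that returns next-token logits for every position:

```python
import torch
import torch.nn.functional as F

def image_to_text_generation_loss(decoder, text_ids: torch.Tensor, img_emb: torch.Tensor) -> torch.Tensor:
    """text_ids: (B, L) ground-truth caption tokens; img_emb: (B, D) the paired image embedding z^I."""
    logits = decoder(text_ids[:, :-1], img_emb)   # predict token l from tokens < l and the image, shape (B, L-1, V)
    targets = text_ids[:, 1:]                     # the token that actually appears at position l
    return F.cross_entropy(logits.transpose(1, 2), targets)
```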

(3)Alignment Objectives

        ①Image-Text Matching: BCE loss:

\mathcal{L}_{IT}=p\log\mathcal{S}(z^I,z^T)+(1-p)\log(1-\mathcal{S}(z^I,z^T))

where \mathcal{S}(\cdot) measures the alignment probability between the image and the text, and p=1 if they match and 0 otherwise
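As written the equation is a log-likelihood; the quantity minimized in practice is its negation, i.e., a binary cross-entropy. A sketch, with negative pairs typically built by pairing images with shuffled texts from the same batch:

```python
import torch
import torch.nn.functional as F

def image_text_matching_loss(match_score: torch.Tensor, is_match: torch.Tensor) -> torch.Tensor:
    """match_score: (B,) predicted alignment probability S(z^I, z^T) in (0, 1);
    is_match: (B,) with 1.0 for true pairs and 0.0 for mismatched (e.g., shuffled) pairs."""
    return F.binary_cross_entropy(match_score, is_match)
```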

        ②Region-Word Matching: model local cross-modal correlation in dense scenes:

\mathcal{L}_{RW}=p\log\mathcal{S}^r(r^I,w^T)+(1-p)\log(1-\mathcal{S}^r(r^I,w^T))

where (r^I,w^T) denotes a region-word pair, and p=1 if they match and 0 otherwise

2.4.3. VLM Pre-Training Frameworks

        ①two-tower, two-leg and one-tower pre-training approaches:

2.4.4. Evaluation Setups and Downstream Tasks

(1)Zero-Shot Prediction

        ①Image Classification: apply prompt engineering (e.g., "a photo of a [class]") and compare the embeddings of images and texts (see the sketch after this list)

        ②Semantic Segmentation: comparing the embeddings of the given image pixels and texts

        ③Object Detection: comparing the embeddings of the given object proposals and texts

        ④Image-Text Retrieval: retrieve the requested samples of one modality given cues from the other modality, either text-to-image or image-to-text
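A zero-shot classification sketch in the spirit of CLIP (referenced from ① above); image_encoder, text_encoder and tokenizer are hypothetical callables standing in for a pre-trained VLM:

```python
import torch
import torch.nn.functional as F

def zero_shot_classify(image_encoder, text_encoder, tokenizer, image, class_names):
    """Embed one prompt per class and return the class whose text embedding is closest to the image."""
    prompts = [f"a photo of a {name}" for name in class_names]          # simple prompt engineering
    with torch.no_grad():
        txt = F.normalize(text_encoder(tokenizer(prompts)), dim=-1)     # (C, D) class text embeddings
        img = F.normalize(image_encoder(image.unsqueeze(0)), dim=-1)    # (1, D) image embedding
    scores = (img @ txt.t()).squeeze(0)                                 # (C,) cosine similarity per class
    return class_names[int(scores.argmax())]
```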

(2)Linear Probing

        ①Freeze the pre-trained VLM → extract embeddings → train a linear classifier on these embeddings
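A linear-probing sketch: the VLM image encoder stays frozen, and only a linear head on top of its embeddings is trained (frozen_encoder, feat_dim and the training loop are illustrative placeholders):

```python
import torch
import torch.nn as nn

def linear_probe(frozen_encoder, images, labels, num_classes: int, feat_dim: int, epochs: int = 20):
    """Fit a linear classifier on frozen embeddings and return it."""
    with torch.no_grad():
        feats = frozen_encoder(images)                # (N, feat_dim) embeddings; the encoder is never updated
    head = nn.Linear(feat_dim, num_classes)
    opt = torch.optim.Adam(head.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(head(feats), labels)
        loss.backward()
        opt.step()
    return head
```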

2.5. Datasets

         ①Widely Used Image-Text Datasets for VLM Pre-Training:

        ②Widely-Used Visual Recognition Datasets for VLM Evaluation:

2.5.1. Datasets for Pre-Training VLMs

        ①Collecting image-text data is easier and cheaper than building traditional crowd-labelled datasets

        ②⭐Some works utilize auxiliary datasets to provide additional information for better vision-language modelling; for example, GLIP leverages Objects365 for extracting region-level features

2.5.2. Datasets for VLM Evaluation

        ①Tallies the evaluation datasets available for each task type

2.6. Vision-Language Model Pre-Training

        ①Vision-Language Model Pre-Training Methods:

2.6.1. VLM Pre-Training With Contrastive Objectives

(1)Image Contrastive Learning

        ①E.g., SLIP utilizes the InfoNCE loss to learn discriminative image features

(2)Image-Text Contrastive Learning

        ①Learns the correlation between paired image-text samples and pushes irrelevant pairs apart:

(3)Image-Text-Label Contrastive Learning

        ①Encodes images, texts and class labels into one shared space:

(4)Discussion

        ①Challenge 1: jointly optimizing over positive and negative pairs is complicated and challenging

        ②Challenge 2: Heuristic temperature hyper-parameter selection

2.6.2. VLM Pre-Training With Generative Objectives

(1)Masked Image Modelling

        ①Image patch masking strategies:

(2)Masked Language Modelling

        ①Text masking strategies:

(3)Masked Cross-Modal Modelling

        ①Masks image patches and text tokens at the same time

(4)Image-to-Text Generation

        ①Encodes images and then decodes (generates) the paired texts

(5)Discussion

        ①Learning context information

2.6.3. VLM Pre-Training With Alignment Objectives

(1)Image-Text Matching

        ①Match image and text pairs

(2)Region-Word Matching

        ①Matches image regions with words:

(3)Discussion

        ①Alignment objectives mostly serve to enhance contextual information or cross-modal correlation learning

2.6.4. Summary and Discussion

        ①Recent VLM pre-training either focuses on learning global vision-language correlation or models local fine-grained correlation via region-word matching

2.7. VLM Transfer Learning

2.7.1. Motivation of Transfer Learning

        ①Challenges for pre-trained VLMs: a) different downstream data distributions, b) different downstream tasks

2.7.2. Common Setup of Transfer Learning

        ①Unsupervised transfer is more efficient and promising (no downstream labels are required)

2.7.3. Common Transfer Learning Methods

        ①Three types of VLM transfer methods:

(1)Transfer Via Prompt Tuning

        ①Transfer with Text Prompt Tuning (a learnable-prompt sketch follows this list): 

        ②Transfer with Visual Prompt Tuning: 

        ③Transfer with Text-Visual Prompt Tuning: tune image and text together

        ④Discussion: the main limitation is low flexibility, since the learned prompts can only follow the manifold (distribution) of the original VLM
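A CoOp-style sketch of text prompt tuning (see ① above): a few continuous context vectors are learned and prepended to fixed class-name embeddings, while both VLM encoders stay frozen. The frozen_text_encoder that accepts continuous token embeddings is an assumed interface:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableTextPrompt(nn.Module):
    """Learn M context vectors [V1]...[VM] prepended to each (fixed) class-name embedding."""
    def __init__(self, class_name_emb: torch.Tensor, ctx_len: int = 16):
        super().__init__()
        self.register_buffer("cls_emb", class_name_emb)              # (C, 1, D) class-name embeddings, frozen
        dim = class_name_emb.size(-1)
        self.ctx = nn.Parameter(torch.randn(ctx_len, dim) * 0.02)    # the only trainable parameters

    def forward(self, frozen_text_encoder):
        ctx = self.ctx.unsqueeze(0).expand(self.cls_emb.size(0), -1, -1)  # (C, M, D) shared context
        prompts = torch.cat([ctx, self.cls_emb], dim=1)                   # (C, M+1, D) prompt sequences
        return F.normalize(frozen_text_encoder(prompts), dim=-1)          # (C, D) class text features
```

At transfer time an image is classified by the cosine similarity between its frozen image embedding and these class text features; gradients flow back through the frozen text encoder into self.ctx only.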

(2)Transfer Via Feature Adaptation

        ①Fine-tunes the features with an additional lightweight feature adapter:

but it can run into intellectual-property problems
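A CLIP-Adapter-style sketch of feature adaptation: a small bottleneck MLP refines the frozen feature, and a residual ratio alpha blends it back with the original so the pre-trained knowledge is retained:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureAdapter(nn.Module):
    """Lightweight residual adapter placed on top of a frozen VLM feature."""
    def __init__(self, dim: int = 512, bottleneck: int = 128, alpha: float = 0.2):
        super().__init__()
        self.alpha = alpha
        self.mlp = nn.Sequential(
            nn.Linear(dim, bottleneck), nn.ReLU(inplace=True),
            nn.Linear(bottleneck, dim), nn.ReLU(inplace=True),
        )

    def forward(self, frozen_feat: torch.Tensor) -> torch.Tensor:
        adapted = self.mlp(frozen_feat)                                   # task-specific refinement
        blended = self.alpha * adapted + (1 - self.alpha) * frozen_feat   # residual blend keeps VLM knowledge
        return F.normalize(blended, dim=-1)
```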

(3)Other Transfer Methods

        ①Lists other methods

2.7.4. Summary and Discussion

        ①Two main lines of VLM transfer learning: prompt tuning and feature adapters

2.8. VLM Knowledge Distillation

2.8.1. Motivation of Distilling Knowledge From VLMs

        ①VLM knowledge distillation distils the general and robust knowledge of VLMs into task-specific models, without being restricted by the VLM architecture

intact  adj. complete; whole; undamaged

2.8.2. Common Knowledge Distillation Methods

(1)Knowledge Distillation for Object Detection

        ①Introduces standard and prompt-based knowledge distillation methods for open-vocabulary object detection
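A sketch of the feature-level distillation idea behind ViLD-like open-vocabulary detectors (formulations differ per method, so treat this as an illustration): the detector's region embeddings are pulled towards the frozen VLM's embeddings of the same regions, so that text embeddings of arbitrary class names can later be matched against them:

```python
import torch
import torch.nn.functional as F

def region_distillation_loss(student_region_emb: torch.Tensor,
                             teacher_region_emb: torch.Tensor) -> torch.Tensor:
    """student: (R, D) detector region embeddings; teacher: (R, D) frozen VLM embeddings of the same regions."""
    s = F.normalize(student_region_emb, dim=-1)
    t = F.normalize(teacher_region_emb, dim=-1)
    return F.l1_loss(s, t)   # L1 alignment between student and teacher region features
```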

(2)Knowledge Distillation for Semantic Segmentation

        ①Likewise, standard and weakly-supervised distillation methods for semantic segmentation

2.8.3. Summary and Discussion

        ①Knowledge distillation is more flexible than transfer learning, since the student model's architecture need not match the VLM's

2.9. Performance Comparison

2.9.1. Performance of VLM Pre-Training

        ①Performance comparison on image classification:

        ②Data and model size test:

        ③The main sources of VLM advantages: a) large-scale data, b) large models, c) task-agnostic learning

        ④Segmentation performance:

        ⑤Detection performance:

        ⑥Limitations: a) performance saturates when the model scale keeps expanding, b) heavy computing costs in pre-training, c) excessive computation and memory overheads in both training and inference

2.9.2. Performance of VLM Transfer Learning

        ①Image classification performance:

2.9.3. Performance of VLM Knowledge Distillation

        ①Object detection performance:

        ②Semantic segmentation performance:

2.9.4. Summary

        ①The baselines and evaluation settings are not unified across methods

2.10. Future Directions

(1)For VLM pretraining:

        ①Fine-grained vision-language correlation modelling

        ②Unification of vision and language learning

        ③Pre-training VLMs with multiple languages

        ④Data-efficient VLMs: extract more supervision from a limited amount of image-text pairs during training

        ⑤Pre-training VLMs with LLMs: 

(2)For VLM transfer learning:

        ①Unsupervised VLM transfer

        ②VLM transfer with visual prompt/adapter

        ③Test-time VLM transfer

        ④VLM transfer with LLMs

(3)VLM knowledge distillation

        ①Extract knowledge from multiple VLMs

        ②Extend to other visual tasks, such as instance segmentation, panoptic segmentation, person re-identification, etc.

panoptic  adj. showing or taking in the whole at one view; all-encompassing

2.11. Conclusion

        good

3. Reference

Zhang, J. et al. (2024) Vision-Language Models for Vision Tasks: A Survey. TPAMI, 46(8): 5625-5644. doi: 10.1109/TPAMI.2024.3369699
