Introduction to using lang-segment-anything

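Language Segment-Anything is an open-source project that combines instance segmentation with text prompts to generate masks for specific objects in an image. It is built on Meta's recently released segment-anything model and the GroundingDINO detection model, and it is an easy-to-use, effective tool for object detection and image segmentation.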

Zero-shot text-to-bbox approach based on GroundingDINO.
Easy deployment with the Lightning AI app platform.
Customizable text prompts for precise object segmentation.

Project repository: https://github.com/luca-medeiros/lang-segment-anything/tree/main

1. Installation commands

1.1 Installing groundingdino

Installing lang-segment-anything directly can fail with various errors, so install groundingdino first.

git clone https://github.com/IDEA-Research/GroundingDINO.git && cd GroundingDINO
pip install -e .
mkdir weights

----------The following step is optional------------
Download https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth and save it in the weights directory.
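
The optional download can also be scripted. The following is a minimal sketch using only the Python standard library; the helper names `weights_path` and `download_weights` are made up for this illustration and are not part of GroundingDINO. It assumes you run it from the GroundingDINO directory so that the file lands in the weights folder created above.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve

# URL taken from the step above
WEIGHTS_URL = (
    "https://github.com/IDEA-Research/GroundingDINO/releases/download/"
    "v0.1.0-alpha/groundingdino_swint_ogc.pth"
)


def weights_path(url: str, dest_dir: str = "weights") -> Path:
    """Return the local path the checkpoint at `url` would be saved to."""
    filename = Path(urlparse(url).path).name
    return Path(dest_dir) / filename


def download_weights(url: str = WEIGHTS_URL, dest_dir: str = "weights") -> Path:
    """Download the checkpoint unless it is already present (hypothetical helper)."""
    target = weights_path(url, dest_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():
        urlretrieve(url, str(target))  # large file; needs network access
    return target


# Usage, from the GroundingDINO directory:
#     download_weights()
```

Skipping the download when the file already exists keeps repeated runs cheap.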

1.2 Installing lang-segment-anything

You can download the project source with a git command, or open the repository page and download the source manually.

git clone https://github.com/luca-medeiros/lang-segment-anything && cd lang-segment-anything
pip install torch torchvision

Open lang-segment-anything-main\pyproject.toml and comment out the dependency on groundingdino, which was already installed in step 1.1.
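
If you would rather not edit the file by hand, the same change can be sketched in a few lines of Python. The function below is a hypothetical helper, and the sample text is an invented excerpt; the real pyproject.toml will look different, so check the result before installing.

```python
def comment_out_dependency(toml_text: str, package: str) -> str:
    """Prefix '# ' to every line whose declaration starts with `package`."""
    out = []
    for line in toml_text.splitlines():
        # Lines that are already commented start with '#', so they are skipped
        if line.lstrip().startswith(package):
            line = "# " + line
        out.append(line)
    return "\n".join(out)


# Hypothetical excerpt, for illustration only
sample = '''[tool.poetry.dependencies]
python = "^3.8"
groundingdino = {git = "https://github.com/IDEA-Research/GroundingDINO.git"}
torch = "*"'''

print(comment_out_dependency(sample, "groundingdino"))
```

To apply it to the real file, read pyproject.toml as text, pass it through the function, and write the result back. Note that the prefix match also catches names that merely begin with the package name.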

pip install -e .

Install a specific version of urllib3. Otherwise, running LangSAM can raise a ProxyError even when a proxy (VPN) is available.

pip install urllib3==1.25.11

1.3 Fixing related errors

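If the PyTorch version is incompatible with the CUDA version, the installation may fail with an error. In the author's case, PyTorch required CUDA 12.1 while the system had CUDA 11.7 installed. Changing either the CUDA version or the PyTorch version fixes it.
The author has two CUDA versions installed, so it was enough to change the environment variables to switch between them.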
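
To see which CUDA version your PyTorch build expects, a quick check like the sketch below can help. The function name `torch_cuda_report` is made up for this example; the attributes it reads (`torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`) are standard PyTorch ones.

```python
def torch_cuda_report() -> dict:
    """Collect the PyTorch version and the CUDA version it was built against."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "built_for_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


print(torch_cuda_report())
```

Compare `built_for_cuda` with the toolkit version that `nvcc --version` reports; a mismatch like 12.1 versus 11.7 is the situation described above.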

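Later in the installation, possibly because of a Windows limitation, you may hit the error "文件名或扩展名太长" (the file name or extension is too long).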

See https://blog.csdn.net/BBJG_001/article/details/105464695 for a fix.

Also see https://blog.csdn.net/ZxqSoftWare/article/details/108519131 and change the corresponding setting in the Group Policy editor.
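
Assuming the policy in question is the Windows "Win32 long paths" setting (the linked posts are not quoted here, so this is an assumption), its current state can be read from the registry value `LongPathsEnabled`. The sketch below only reads the value and changes nothing; `long_paths_enabled` is a hypothetical helper name.

```python
import sys


def long_paths_enabled():
    """Return True/False for the LongPathsEnabled value, or None if unknown."""
    if sys.platform != "win32":
        return None  # the setting only exists on Windows
    import winreg

    key_path = r"SYSTEM\CurrentControlSet\Control\FileSystem"
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
            value, _ = winreg.QueryValueEx(key, "LongPathsEnabled")
    except OSError:
        return None  # key or value is missing
    return value == 1


print(long_paths_enabled())
```

If this prints False or None on Windows, the long path error above is more likely to appear with deeply nested package directories.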

2. Using lang-sam

'''This is just adapted from the example in the readme.
The main usage is for the built image to have the weights cached.
'''
import os

# Point Hugging Face downloads at the hf-mirror endpoint.
# Set this before importing lang_sam so it is in place when the
# Hugging Face libraries are loaded.
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"

import numpy as np
from PIL import Image
from lang_sam import LangSAM
from lang_sam.utils import draw_image

if __name__ == "__main__":
    model = LangSAM()
    image_pil = Image.open("./assets/car.jpeg").convert("RGB")
    text_prompt = "wheel"
    # Detect objects matching the prompt and segment them
    masks, boxes, phrases, logits = model.predict(image_pil, text_prompt)
    labels = [f"{phrase} {logit:.2f}" for phrase, logit in zip(phrases, logits)]
    # Draw masks, boxes, and labels on the image and display it
    image_array = np.asarray(image_pil)
    image = draw_image(image_array, masks, boxes, labels)
    image = Image.fromarray(np.uint8(image)).convert("RGB")
    image.show()
    print('all ok')
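
The label-building line in the script pairs each detected phrase with its score. The sketch below reproduces just that step with made-up values standing in for the `phrases` and `logits` returned by `model.predict`, so it runs without the model; `make_labels` is a hypothetical helper name.

```python
def make_labels(phrases, logits):
    """Format each phrase with its score rounded to two decimals."""
    return [f"{phrase} {logit:.2f}" for phrase, logit in zip(phrases, logits)]


# Hypothetical stand-ins for two of the values returned by model.predict()
phrases = ["wheel", "wheel"]
logits = [0.81, 0.4567]

labels = make_labels(phrases, logits)
print(labels)  # ['wheel 0.81', 'wheel 0.46']
```

These strings are what `draw_image` receives as the text for each box.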

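The first run prints download progress, because the model files are being downloaded.
Once the script finishes, it displays the image with the masks, boxes, and labels of the detected wheels drawn on it.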

3. Deploying with LitGradio

3.1 Installing dependencies

pip install backoff
pip install deepdiff
pip install lightning_cloud
pip install lightning==2.0.1
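
Since this post pins exact versions of urllib3 and lightning, it can be worth confirming what actually got installed. The following is a small sketch built on the standard library's `importlib.metadata`; `check_pins` is a hypothetical helper, not part of any of these packages.

```python
from importlib import metadata


def check_pins(pins: dict) -> dict:
    """Map each package to (installed version or None, pinned version, match?)."""
    report = {}
    for name, wanted in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed in this environment
        report[name] = (found, wanted, found == wanted)
    return report


# Pins taken from the pip commands in this post
for name, row in check_pins({"urllib3": "1.25.11", "lightning": "2.0.1"}).items():
    print(name, row)
```

A `False` in the last position means the installed version differs from the pinned one, or the package is missing.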

3.2 Deployment code

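app.py provides the code for deploying the web demo, but it hits a bug when run on Windows. To work around it, add the following code immediately before the last line of app.py.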

import os
os.environ["LIGHTNING_DETECTED_DEBUGGER"] = "1"

After the change, app.py looks like this:

import os
import warnings

import gradio as gr
import lightning as L
import numpy as np
from lightning.app.components.serve import ServeGradio
from PIL import Image

from lang_sam import LangSAM
from lang_sam import SAM_MODELS
from lang_sam.utils import draw_image
from lang_sam.utils import load_image

warnings.filterwarnings("ignore")


class LitGradio(ServeGradio):

    inputs = [
        gr.Dropdown(choices=list(SAM_MODELS.keys()), label="SAM model", value="vit_h"),
        gr.Slider(0, 1, value=0.3, label="Box threshold"),
        gr.Slider(0, 1, value=0.25, label="Text threshold"),
        gr.Image(type="filepath", label='Image'),
        gr.Textbox(lines=1, label="Text Prompt"),
    ]
    outputs = [gr.outputs.Image(type="pil", label="Output Image")]

    examples = [
        ['vit_h', 0.36, 0.25, os.path.join(os.path.dirname(__file__), "assets", "fruits.jpg"), "kiwi"],
        ['vit_h', 0.3, 0.25, os.path.join(os.path.dirname(__file__), "assets", "car.jpeg"), "wheel"],
        ['vit_h', 0.3, 0.25, os.path.join(os.path.dirname(__file__), "assets", "food.jpg"), "food"],
    ]

    def __init__(self, sam_type="vit_h"):
        super().__init__()
        self.ready = False
        self.sam_type = sam_type

    def predict(self, sam_type, box_threshold, text_threshold, image_path, text_prompt):
        print("Predicting... ", sam_type, box_threshold, text_threshold, image_path, text_prompt)
        # Rebuild SAM only when a different model type is selected
        if sam_type != self.model.sam_type:
            self.model.build_sam(sam_type)
        image_pil = load_image(image_path)
        masks, boxes, phrases, logits = self.model.predict(image_pil, text_prompt, box_threshold, text_threshold)
        labels = [f"{phrase} {logit:.2f}" for phrase, logit in zip(phrases, logits)]
        image_array = np.asarray(image_pil)
        image = draw_image(image_array, masks, boxes, labels)
        image = Image.fromarray(np.uint8(image)).convert("RGB")
        return image

    def build_model(self, sam_type="vit_h"):
        model = LangSAM(sam_type)
        self.ready = True
        return model


lg = LitGradio()
# Windows workaround, added right before the last line
import os
os.environ["LIGHTNING_DETECTED_DEBUGGER"] = "1"
app = L.LightningApp(lg)