I recently needed CLIPScore to measure the similarity between a text prompt and an image, and ran the following program:
import torch
from PIL import Image
from torchvision import transforms
from torchmetrics.multimodal.clip_score import CLIPScore

def prompt_image_cal(prompt_text, image_path):
    '''Measure how related the prompt and the image are, using CLIP Score.
    The score lies in [0, 100]; larger values mean the two are more related.'''
    # Load the image and scale it back to the [0, 255] pixel range that CLIPScore expects
    transform = transforms.Compose([transforms.ToTensor()])
    image = Image.open(image_path)
    image_tensor = transform(image) * 255.0
    image_tensor = torch.clamp(image_tensor, 0, 255)
    # Build the metric from a local clip-vit-large-patch14 checkpoint
    clip_score_fun = CLIPScore(model_name_or_path="/newdata/SD/PMs/clip-vit-large-patch14")
    clip_score = clip_score_fun(image_tensor, prompt_text)
    return clip_score.item()
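For completeness, the function is called roughly like this (the prompt and image path below are placeholders, not the actual inputs from my script):

# Hypothetical example call; substitute a real prompt and image path
score = prompt_image_cal("a photo of a cat sitting on a sofa", "/path/to/test.png")
print(score)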
Running it reported the following error:
Unused or unrecognized kwargs: padding.
Traceback (most recent call last):
  File "/newdata/PromptPricing/evaluation.py", line 111, in <module>
    clip_score = prompt_image_cal(prompt_text, image_path)
  File "/newdata/PromptPricing/evaluation.py", line 96, in prompt_image_cal
    clip_score = clip_score_fun(image_tensor, prompt_text)
  File "/root/anaconda3/envs/clipiqa/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/anaconda3/envs/clipiqa/lib/python3.8/site-packages/torchmetrics/metric.py", line 302, in forward
    self._forward_cache = self._forward_full_state_update(*args, **kwargs)
  File "/root/anaconda3/envs/clipiqa/lib/python3.8/site-packages/torchmetrics/metric.py", line 317, in _forward_full_state_update
    self.update(*args, **kwargs)
  File "/root/anaconda3/envs/clipiqa/lib/python3.8/site-packages/torchmetrics/metric.py", line 466, in wrapped_func
    update(*args, **kwargs)
  File "/root/anaconda3/envs/clipiqa/lib/python3.8/site-packages/torchmetrics/multimodal/clip_score.py", line 133, in update
    score, n_samples = _clip_score_update(images, text, self.model, self.processor)
  File "/root/anaconda3/envs/clipiqa/lib/python3.8/site-packages/torchmetrics/functional/multimodal/clip_score.py", line 67, in _clip_score_update
    processed_input = processor(text=text, images=[i.cpu() for i in images], return_tensors="pt", padding=True)
  File "/root/anaconda3/envs/clipiqa/lib/python3.8/site-packages/transformers/models/clip/processing_clip.py", line 104, in __call__
    image_features = self.image_processor(images, return_tensors=return_tensors, **kwargs)
  File "/root/anaconda3/envs/clipiqa/lib/python3.8/site-packages/transformers/image_processing_utils.py", line 551, in __call__
    return self.preprocess(images, **kwargs)
  File "/root/anaconda3/envs/clipiqa/lib/python3.8/site-packages/transformers/models/clip/image_processing_clip.py", line 319, in preprocess
    images = [
  File "/root/anaconda3/envs/clipiqa/lib/python3.8/site-packages/transformers/models/clip/image_processing_clip.py", line 320, in <listcomp>
    self.resize(image=image, size=size, resample=resample, input_data_format=input_data_format)
  File "/root/anaconda3/envs/clipiqa/lib/python3.8/site-packages/transformers/models/clip/image_processing_clip.py", line 187, in resize
    return resize(
  File "/root/anaconda3/envs/clipiqa/lib/python3.8/site-packages/transformers/image_transforms.py", line 330, in resize
    resized_image = image.resize((width, height), resample=resample, reducing_gap=reducing_gap)
TypeError: resize() got an unexpected keyword argument 'reducing_gap'
The exception is raised from image_transforms.py inside the transformers library, at this line:
resized_image = image.resize((width, height), resample=resample, reducing_gap=reducing_gap)
This kind of error looked like a version mismatch, so I first uninstalled and reinstalled the transformers library, but the problem persisted. Looking more closely at the resize call, the image object it operates on is a PIL image, which pointed to a version conflict in the Pillow library instead. So I ran pip show pillow to check the installed Pillow version:
$ pip show pillow
Name: Pillow
Version: 6.1.0
Summary: Python Imaging Library (Fork)
Home-page: http://python-pillow.org
Author: Alex Clark (Fork Author)
Author-email: aclark@aclark.net
License: UNKNOWN
Location: /root/anaconda3/envs/clipiqa/lib/python3.8/site-packages
Requires:
Required-by: clip-score, facexlib, mmcv-full, torchvision
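Pillow 6.1.0 predates the reducing_gap argument, which (as far as I can tell) was only added to Image.resize in Pillow 7.0. A quick sketch to confirm that the installed build really does not accept it:

import inspect
from PIL import Image, __version__ as pillow_version

print(pillow_version)  # 6.1.0 in this environment
# On such an old build, resize() has no reducing_gap parameter, which is
# exactly why transformers' image_transforms.resize raises a TypeError
print("reducing_gap" in inspect.signature(Image.Image.resize).parameters)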
That version is quite old, so I reinstalled it with pip uninstall pillow followed by pip install pillow, which pulled in the current latest Pillow release:
$ pip show pillow
Name: pillow
Version: 10.3.0
Summary: Python Imaging Library (Fork)
Home-page:
Author:
Author-email: "Jeffrey A. Clark" <aclark@aclark.net>
License: HPND
Location: /root/anaconda3/envs/clipiqa/lib/python3.8/site-packages
Requires:
Required-by: clip-score, facexlib, mmcv-full, torchvision
Running the function again now succeeds ✌️:
Unused or unrecognized kwargs: padding.
Unused or unrecognized kwargs: padding.
8.029139518737793
PS: I'm not sure whether the warnings in the output affect the result, and I haven't found a way to track them down yet. If anyone knows, please share~
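For what it's worth, the traceback above shows that torchmetrics passes padding=True to the CLIP processor, which forwards it on to the image processor; the image processor merely reports it as unused, so it should not change the score itself. If the message is only noise, one option (assuming it is emitted through transformers' own logger, which I have not verified) is to raise the logging threshold before building the metric:

from transformers.utils import logging as hf_logging

# Assumption: "Unused or unrecognized kwargs: padding." goes through the
# transformers logger, so only errors will be printed after this call
hf_logging.set_verbosity_error()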