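CogVLM is a multimodal large model released by Zhipu AI and Tsinghua KEG, built on their approach to fusing visual and language information. In this article, we show how to use CogVLM on Matpool (矩池云).
Hardware requirements
CogVLM requires CUDA 11.8 or later, and inference needs at least 40 GB of GPU memory in total. You can use a single A40, A6000, or A100 directly, or two cards such as 2x 3090, A30, or 4090.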
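As a rough illustration of the memory requirement, the sketch below sums the nominal memory of a candidate GPU setup and compares it with the 40 GB total stated above. The table holds published per-card capacities; the helper names are hypothetical, and real usable memory is somewhat lower than the nominal figure.

```python
# Nominal memory per card in GB (published specs; the A100 also ships in an 80 GB variant).
GPU_MEMORY_GB = {"A40": 48, "A6000": 48, "A100": 40, "3090": 24, "A30": 24, "4090": 24}

REQUIRED_GB = 40  # total inference memory stated in this article


def total_memory_gb(cards):
    """Sum the nominal memory of a list of card names, e.g. ["3090", "3090"]."""
    return sum(GPU_MEMORY_GB[name] for name in cards)


def meets_requirement(cards, required_gb=REQUIRED_GB):
    """Return True when the combined nominal memory reaches the required total."""
    return total_memory_gb(cards) >= required_gb


if __name__ == "__main__":
    for setup in (["A6000"], ["3090", "3090"], ["3090"]):
        print(setup, total_memory_gb(setup), "GB ->", meets_requirement(setup))
```

A single 24 GB card falls short, which is why the 3090, A30, and 4090 are listed as two-card options.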
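Uploading the model files
First, upload the required model files with the Matpool netdisk. This walkthrough uses the cogvlm-chat model and also needs vicuna-7b-v1.5. Both can be downloaded from the ModelScope platform at the following addresses: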
https://www.modelscope.cn/models/ZhipuAI/cogvlm-chat
https://modelscope.cn/models/AI-ModelScope/vicuna-7b-v1.5
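The netdisk client (https://matpool.com/download/netdisk) can import ModelScope files directly into a chosen netdisk directory. The steps, shown in the figure below, are:
- Click Apps, then ModelScope
- Enter the link of the ModelScope model to upload
- Click the Browse button next to the model address field and select all files
- Click Import to netdisk
- Choose the destination directory, then click OK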
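The model link entered in the client maps to a ModelScope model ID of the form owner/name. The sketch below shows that mapping. It also shows an alternative way to fetch the files from Python, which is not the route this article takes. It assumes the `modelscope` package is installed and the machine has network access; the helper names are hypothetical.

```python
from urllib.parse import urlparse


def model_id_from_url(url):
    """Extract 'owner/name' from a ModelScope model page URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    # Expected path shape: /models/<owner>/<name>
    if len(parts) < 3 or parts[0] != "models":
        raise ValueError(f"not a ModelScope model URL: {url}")
    return f"{parts[1]}/{parts[2]}"


def download(url, cache_dir):
    """Download a model snapshot into cache_dir (needs `pip install modelscope`)."""
    # Imported lazily so the URL parser above works without the package.
    from modelscope import snapshot_download
    return snapshot_download(model_id_from_url(url), cache_dir=cache_dir)


if __name__ == "__main__":
    for link in (
        "https://www.modelscope.cn/models/ZhipuAI/cogvlm-chat",
        "https://modelscope.cn/models/AI-ModelScope/vicuna-7b-v1.5",
    ):
        print(model_id_from_url(link))
```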
While the model files upload, we can move on to configuring the environment in parallel.
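Renting a machine and configuring the environment
Renting a machine
In the Matpool host market (https://matpool.com/host-market/gpu), choose an A-series GPU such as the A2000 and click the rent button. Renting a cheap machine first is enough to set up the environment, which can then be saved. Any other GPU that supports CUDA 11.8 also works.
On the rental page, search for Pytorch2.1 and select the Pytorch 2.1.1 image. Under the advanced options, add a custom port 8501, which is where the CogVLM web UI deployed later will listen. The steps are shown in the figure below.
Once the machine is rented, you will see the link for port 8501. The CogVLM web service started later will be served on this port.
Click to open JupyterLab, then create a new Terminal to configure the environment.
Configuring the environment
First, clone the project code by entering the following command in the Terminal.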
git clone https://mirror.ghproxy.com/https://github.com/THUDM/CogVLM.git
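After cloning, open the project folder by clicking CogVLM in the directory navigation on the left, or enter the CogVLM directory from the Terminal.
Enter the following command to install the required Python dependencies: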
pip install -r requirements.txt
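The installation may report dependency errors. In theory they do not affect the later steps or the final use of CogVLM.
If you are concerned, you can also run the following command to install the matching package versions.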
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
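After installing, it can be worth confirming that the PyTorch build targets CUDA 11.8 or later. The following is a minimal sketch: `torch.version.cuda` and `torch.cuda.is_available()` are standard PyTorch attributes, while `parse_version` and `cuda_ok` are hypothetical helpers written for this illustration.

```python
def parse_version(text):
    """Turn a version string such as '11.8' into a comparable tuple (11, 8)."""
    return tuple(int(part) for part in text.split(".")[:2])


def cuda_ok(version, minimum="11.8"):
    """Return True if the CUDA version string is at least the minimum."""
    return version is not None and parse_version(version) >= parse_version(minimum)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
    else:
        # torch.version.cuda is None on CPU-only builds.
        print("torch", torch.__version__, "CUDA build:", torch.version.cuda)
        print("CUDA >= 11.8:", cuda_ok(torch.version.cuda))
        print("GPU visible:", torch.cuda.is_available())
```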
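Once everything above is configured, click More, then Save to personal environment, on the rental page.
Once all the model files have finished transferring to the netdisk, you can move on to the next step.
Running the program
Rent a machine from the rental page again. To run the program, choose either two A30/3090 cards or a single A40/A6000, as shown in the figure below. Here we rent an A6000 as an example.
Here we try the composite_demo under the CogVLM project. Open client.py under composite_demo. Since we only test the cogvlm-chat-hf model locally, we comment out lines 24-40 (the configuration code for testing agent_chat, vlm_chat, and vlm_grounding), uncomment the single-model configuration code on lines 44-52, and change the model paths to the absolute paths of the models we uploaded to the netdisk earlier. In my case they are stored in /mnt/example/vicuna-7b-v1.5 and /mnt/example/cogvlm-chat.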
# if you just use one model, use like this
# Replace /mnt/example with the path where your own models are stored
models_info = {
    'tokenizer': {'path': os.environ.get('TOKENIZER_PATH', '/mnt/example/vicuna-7b-v1.5')},
    'vlm_chat': {
        'path': os.environ.get('MODEL_PATH_AGENT_CHAT', '/mnt/example/cogvlm-chat'),
        'device': ['cuda:0'],
    },
}
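Because model loading takes several minutes, a wrong path is costly to discover late. The sketch below checks that every configured path exists before the demo is started. It assumes the models are stored as local directories, as in this walkthrough; `missing_paths` is a hypothetical helper and not part of the CogVLM code.

```python
import os


def missing_paths(models_info):
    """Return the names of entries whose 'path' is not an existing directory."""
    return [name for name, cfg in models_info.items() if not os.path.isdir(cfg["path"])]


if __name__ == "__main__":
    # Same shape as the configuration in client.py; replace /mnt/example with your own path.
    models_info = {
        "tokenizer": {"path": os.environ.get("TOKENIZER_PATH", "/mnt/example/vicuna-7b-v1.5")},
        "vlm_chat": {
            "path": os.environ.get("MODEL_PATH_AGENT_CHAT", "/mnt/example/cogvlm-chat"),
            "device": ["cuda:0"],
        },
    }
    missing = missing_paths(models_info)
    print("all model paths found" if not missing else f"missing: {missing}")
```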
Create a new Terminal in JupyterLab and enter the following commands:
# Enter the project directory
cd /CogVLM/composite_demo
# Run the test demo
streamlit run main.py
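The command above relies on Streamlit's default port, 8501, which is the custom port opened when renting the machine. To make the port explicit rather than depend on the default, the launch can be sketched as below. `--server.port` and `--server.address` are standard Streamlit options; binding to 0.0.0.0 is an assumption about what the forwarded port needs, and `streamlit_command` is a hypothetical helper.

```python
import subprocess
import sys


def streamlit_command(script="main.py", port=8501):
    """Build a command that runs Streamlit on an explicit port."""
    return [
        sys.executable, "-m", "streamlit", "run", script,
        "--server.port", str(port),
        "--server.address", "0.0.0.0",  # assumption: listen on all interfaces
    ]


if __name__ == "__main__":
    print(" ".join(streamlit_command()))
    # Uncomment inside /CogVLM/composite_demo to actually start the demo:
    # subprocess.run(streamlit_command(), check=True)
```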
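After it starts, the service listens on port 8501. Open the port 8501 link on the rental page to access it.
For example: https://hz.xxxx.com:xxxx/?token=xxxxx
Note that the request does not actually need the token, so you can use just this part: https://hz.xxxx.com:xxxx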
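Trimming the link by hand is simple enough, but the same step can be sketched in code. The host and port in the example are made up; `strip_query` is a hypothetical helper built on the standard `urllib.parse` module.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url):
    """Drop the query string (e.g. ?token=...) and fragment, keeping scheme, host, port, and path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path.rstrip("/"), "", ""))


if __name__ == "__main__":
    # Hypothetical link in the same shape as the one on the rental page.
    print(strip_query("https://hz.example.com:12345/?token=abc"))
```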
After you open the page, the model starts loading, which takes roughly 6-10 minutes. Fully loaded on a single A6000, the model occupies about 33.5 GB of GPU memory.
Once the model has loaded, click Browse files on the left to upload an image, and then you can ask questions to test it:
Q: describe this painting
A: This is a pixelated artwork depicting a butterfly with vibrant colors. The background consists of layered green and teal mountains, and there's a winding golden pathway that seems to lead the viewer's eyes towards the butterfly. The butterfly itself has a predominantly pink and orange hue with white wings and a red and yellow head.
Comparison tests
Building on this, we ran a few tests comparing CogVLM, GPT4, and Gemini.
Describing the time in an image
Given the image below, CogVLM, GPT4, and Gemini can all describe it with reasonable accuracy.
CogVLM
GPT4
Gemini
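Painting appreciation (Impressionism)
When we raise the difficulty and input Monet's "Woman with a Parasol", the prompt is: This is a famous oil painting. Do you know the author and name of this painting? Please appreciate this painting and analyze it from the perspective of composition, brightness, saturation, contrast, and emotional atmosphere. The models' responses become more interesting: CogVLM's output is short but basically accurate, GPT4's output is longer and reasonably well organized, while Gemini refuses to comment on images containing human figures, even when the image is an oil painting.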
CogVLM
Answer: Yes, this painting is ‘Woman with a Parasol’ by Claude Monet. It’s a masterpiece of Impressionism, showcasing Monet’s unique brushwork and emphasis on light and atmosphere. The composition is balanced with the woman and the parasol occupying the left side, and the child on the right. The brightness and saturation are high, with vivid colors capturing the essence of a sunny day. The contrast is evident between the bright sky and the darker figures, drawing attention to them. The emotional atmosphere is one of tranquility and leisure, capturing a fleeting moment in time.
GPT4
Gemini
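Painting appreciation (abstract art)
We input Picasso's "The Blue Room" with the prompt: This is a famous oil painting. Do you know the author and name of this painting? Please appreciate this painting and analyze it from the perspective of composition, brightness, saturation, contrast, and emotional atmosphere. In this scenario the models' readings of the image itself do not differ much, although GPT4 attributes the work to Valentin Serov rather than Picasso.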
CogVLM
Yes, this painting is ‘The Blue Room’ by Pablo Picasso. It’s a masterpiece that showcases Picasso’s unique style and his ability to capture emotion and atmosphere in his work. From a composition perspective, the painting is balanced with the figure in the foreground and the bed in the background. The brightness and saturation are used to create a sense of warmth and intimacy, contrasting with the cool blue tones that dominate the room. The emotional atmosphere is one of tranquility and introspection, with the figure lost in thought and the room bathed in soft light.
GPT4
This painting is titled “The Blue Room” and was painted by the Russian artist Valentin Serov in 1904. Serov was one of the foremost portrait artists of his time and a key figure in the Russian art scene at the turn of the 20th century.
In terms of composition, “The Blue Room” features a central figure, a nude woman, engaged in what appears to be a bathing ritual. The woman is positioned in a manner that suggests a candid moment rather than a posed one, creating a sense of intimacy. The composition is balanced, with the elements of the room leading the viewer’s eyes around the scene, from the striped item on the left to the flower arrangement on the right, and finally to the artwork and the wall’s details.
The brightness and saturation levels of the painting contribute to the calm and somewhat reflective mood. The cool blue tones dominate the scene, establishing a serene and tranquil atmosphere. Alongside these blues are touches of warmer hues like the reds and oranges found in the patterned textile and the bouquet of flowers. These elements provide a contrast that enlivens the overall cool color scheme without overpowering it.
Contrast is also evident in the treatment of light and shadow. The white of the bed linens and the brighter spots on the wall stand out against the muted blues, lending depth to the scene, and highlighting the form of the woman and the ripple of the bedclothes.
Emotionally, the setting is peaceful and introspective. The posture of the woman, with her head down and her body slightly hunched, suggests a moment of vulnerability or self-absorption. Combined with the cool color palette and the soft lighting, the painting evokes a deep sense of solitude and quietude.
Overall, “The Blue Room” is a masterful display of Serov’s ability to capture a fleeting, personal moment with sensitivity and elegance, using color, light, and composition to express a quiet but profound emotional narrative.
Gemini
The painting you sent is The Blue Room by Pablo Picasso. It is an oil on canvas painting that he painted in 1901 during his Blue Period.
Composition
The painting is composed of two main figures: a nude woman bathing in a tub and a bearded man standing in the doorway. The woman is the central focus of the painting, and her body is positioned diagonally across the canvas, creating a sense of movement and energy. The man is positioned off-center, and his body is turned away from the viewer, creating a sense of mystery and distance.
The painting is also divided into two main planes: the foreground and the background. The foreground is occupied by the woman and the tub, while the background is occupied by the wall and the doorway. The contrast between the two planes creates a sense of depth and perspective.
Brightness
The painting is relatively dark, with the exception of the woman’s skin and the white towel draped over the tub. The darkness of the painting creates a sense of intimacy and moodiness.
Saturation
The colors in the painting are muted and desaturated. This is characteristic of Picasso’s Blue Period paintings, which are often dominated by shades of blue and green. The muted colors create a sense of melancholy and loneliness.
Contrast
The contrast between the light and dark areas of the painting is relatively low. This creates a sense of unity and harmony. However, the contrast between the woman’s skin and the darkness of the background creates a sense of focus and importance.
Emotional atmosphere
The overall emotional atmosphere of the painting is one of melancholy and loneliness. This is evident in the muted colors, the dark background, and the woman’s solitary pose. The painting also conveys a sense of intimacy and voyeurism.
Analysis
The Blue Room is a complex and powerful painting that can be interpreted in many ways. On one level, it can be seen as a reflection of Picasso’s own personal feelings of melancholy and loneliness during his Blue Period. On another level, it can be seen as a more general meditation on the human condition.
The painting is also notable for its innovative composition and use of color. Picasso’s use of diagonal lines and muted colors creates a sense of movement, depth, and mood. The painting is also a masterpiece of light and shadow.
Overall, The Blue Room is a masterpiece of modern art. It is a painting that continues to fascinate and inspire viewers today.
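Fine-grained object recognition (aircraft)
When we switch the input to a sample image of an Airbus A320, we ask each model: Please tell me the type of the plane. CogVLM's answer is again very concise, while GPT4 and Gemini give more detailed answers, with Gemini providing a more specific model designation.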
CogVLM
The plane in the image is an Airbus A320.
GPT4
Gemini
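Object counting
The input image shows books photographed from a non-frontal angle, and each model is asked to count the books in the picture. The human eye can make out about 20: the 14 in front are easy to distinguish, while the 6 at the back are harder to identify.
CogVLM identifies 14 books. GPT4 does better: it says there are 20 books and notes that some are not fully in view. Gemini sees only 5.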
CogVLM
GPT4
Gemini
This simple test shows that CogVLM's capabilities are fairly strong.