Turning LLaVA's single-turn dialogue into multi-turn dialogue
- A heads-up
- Method 1: modify the quick-start code from the official repo directly
- Method 2: drive the basic single-turn code with a bash script
  - 2.1 A review of the most basic code from the LLaVA repo
  - 2.2 Pseudo multi-turn dialogue (a fresh prompt on every run)
  - 2.3 Real ==multi-turn dialogue==: every time a new question prompt goes in, the model's previous answers (the dialogue history) go in with it; the files can be modified as shown below
A heads-up
Use Method 2! Use Method 2! Use Method 2!
Method 2 is far more helpful: the format is easy to follow, and the output file handling is already written and set up.
Method 1 does everything in a single .py file, but it currently doesn't work well because it uses far too much memory. Method 2, which splits the work into a .py file and a bash script, works much better.
Method 1: modify the quick-start code from the official repo directly
The single-turn code from the LLaVA GitHub repo that I was able to run is shown below:
```python
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # use GPU 0

from llava.model.builder import load_pretrained_model
from llava.mm_utils import get_model_name_from_path
from llava.eval.run_llava import eval_model

model_path = "/data1/yjgroup/tym/lab_sync_mac/LLaVA/checkpoints/llava-v1.6-vicuna-7b"

tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path=model_path,
    model_base=None,
    model_name=get_model_name_from_path(model_path),
    # load_4bit=True,
    # load_8bit=True
)

prompt = "What are the things I should be cautious about when I visit here?"
image_file = "/home/data/yjgroup/fsy/VG_100K/1.jpg"

args = type('Args', (), {
    "model_path": model_path,
    "model_base": None,
    "model_name": get_model_name_from_path(model_path),
    "query": prompt,
    "conv_mode": None,
    "image_file": image_file,
    "sep": ",",
    "temperature": 0.2,
    "top_p": None,
    "num_beams": 1,
    "max_new_tokens": 512
})()

eval_model(args)
```
Now, beyond single-turn multimodal dialogue about one input image, I also want multi-turn dialogue. I can modify the code as follows:
Be warned, though: this version uses a lot of memory.
```python
import os
from llava.model.builder import load_pretrained_model
from llava.mm_utils import get_model_name_from_path
from llava.eval.run_llava import eval_model

# Set environment variables
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # use GPU 0

# Load the pretrained model
model_path = "/data1/yjgroup/tym/lab_sync_mac/LLaVA/checkpoints/llava-v1.6-vicuna-7b"
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path=model_path,
    model_base=None,
    model_name=get_model_name_from_path(model_path)
)

# Initial prompt and image path
initial_prompt = "What are the things I should be cautious about when I visit here?"
image_file = "/home/data/yjgroup/fsy/VG_100K/1.jpg"

# A class that stores the dialogue context
class MultiTurnDialog:
    def __init__(self, initial_prompt, image_file):
        self.dialog = [{"role": "user", "content": initial_prompt}]
        self.image_file = image_file

    def add_turn(self, role, content):
        self.dialog.append({"role": role, "content": content})

    def get_dialog(self):
        return "\n".join([f"{turn['role'].capitalize()}: {turn['content']}" for turn in self.dialog])

# Initialize the dialogue
multi_turn_dialog = MultiTurnDialog(initial_prompt, image_file)

# Run the model and return its response
def get_model_response(dialog, image_file):
    args = type('Args', (), {
        "model_path": model_path,
        "model_base": None,
        "model_name": get_model_name_from_path(model_path),
        "query": dialog,
        "conv_mode": None,
        "image_file": image_file,
        "sep": ",",
        "temperature": 0.2,
        "top_p": None,
        "num_beams": 1,
        "max_new_tokens": 512
    })()
    return eval_model(args)

# First run of the model
response = get_model_response(multi_turn_dialog.get_dialog(), multi_turn_dialog.image_file)
print(f"Assistant: {response}")

# Add the first assistant turn to the context
multi_turn_dialog.add_turn("assistant", response)

# Example multi-turn dialogue
user_inputs = [
    "Can you tell me more about the safety measures?",
    "What about the local customs and traditions?"
]

for user_input in user_inputs:
    # Add the user input to the context
    multi_turn_dialog.add_turn("user", user_input)
    # Get the current dialogue context
    current_dialog = multi_turn_dialog.get_dialog()
    # Run the model
    response = get_model_response(current_dialog, multi_turn_dialog.image_file)
    print(f"Assistant: {response}")
    # Add the model's response to the context
    multi_turn_dialog.add_turn("assistant", response)

# Final dialogue content
print("Final dialog context:")
print(multi_turn_dialog.get_dialog())
```
Model output:
(The output below shows one thing: the responses were not being captured, so I revised the code again.)
Loading checkpoint shards: 100%|██████████| 3/3 [00:07<00:00,  2.35s/it]
You are using a model of type llava to instantiate a model of type llava_llama. This is not supported for all configurations of models and can yield errors.
Loading checkpoint shards: 100%|██████████| 3/3 [00:06<00:00,  2.25s/it]
When visiting a place like the one shown in the image, which appears to be a city street with parked cars, trees, and pedestrians, here are some general things to be cautious about:
- Pedestrian Safety: Always be aware of your surroundings and watch for vehicles when crossing streets.
- Parking Regulations: If you're parking your vehicle, make sure to follow the parking signs and regulations to avoid fines or towing.
- Traffic Signals: Pay attention to traffic lights and crosswalks to ensure you're following the rules of the road.
- Personal Safety: Keep an eye on your belongings to prevent theft, and be mindful of your personal space.
- Local Laws and Customs: Familiarize yourself with local laws and customs to avoid inadvertently breaking the law or offending someone.
- Weather Conditions: Depending on the season, be prepared for the weather. If it's cold, dress appropriately. If it's hot, stay hydrated.
- Health Precautions: Depending on the current health advisories, you may need to take precautions such as wearing a mask or using hand sanitizer.
- Communication: If you're in a foreign country, have a way to communicate in case of an emergency, such as a local SIM card for your phone or a translation app.
- Emergency Services: Know the local emergency numbers and the location of the nearest embassy or consulate if you're traveling internationally.
- Cultural Sensitivity: Be respectful of the local culture and traditions. This includes dress codes, behavior in public spaces, and respect for religious sites.
Remember, these are general tips and the specific precautions you should take may vary depending on the specific location and your own personal circumstances.
You are using a model of type llava to instantiate a model of type llava_llama. This is not supported for all configurations of models and can yield errors.
Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  1.65it/s]
You are using a model of type llava to instantiate a model of type llava_llama. This is not supported for all configurations of models and can yield errors.
Loading checkpoint shards: 100%|██████████| 3/3 [00:07<00:00,  2.47s/it]
When visiting an area like the one shown in the image, which appears to be a city street with a sidewalk, here are some general things to be cautious about:
- Traffic: Always be aware of the traffic around you, especially if you're walking near the road. Cars and bicycles can move quickly, so make sure to stay on the sidewalk and follow traffic rules.
- Pedestrian Safety: Look both ways before crossing the street, even if there's a crosswalk. Be mindful of vehicles that may not see you.
- Personal Safety: Keep an eye on your belongings to prevent theft. In busy areas, pickpockets can be a concern.
- Local Laws and Customs: Be aware of local laws and customs. Some places have specific rules about littering, smoking, or drinking in public.
- Weather Conditions: Depending on the season, be prepared for the weather. If it's cold, wear appropriate clothing. If it's hot, stay hydrated and wear sunscreen.
- Health Precautions: Depending on the region, there may be health advisories or precautions to take, such as vaccinations or avoiding certain foods.
- Communication: Have a way to communicate in case of an emergency. This could be a local phone, a map, or a translation app.
- Emergency Services: Know the local emergency numbers and the location of the nearest embassy or consulate if you're traveling internationally.
- Scams: Be wary of common tourist scams. They can range from overpriced services to more elaborate cons.
- Cultural Sensitivity: Be respectful of local customs and traditions. This can include dress codes, behavior in religious sites, and respect for local laws.
Remember, these are general tips and the specific precautions can vary depending on the exact location and the time of your visit.
Assistant: None
You are using a model of type llava to instantiate a model of type llava_llama. This is not supported for all configurations of models and can yield errors.
Loading checkpoint shards: 100%|██████████| 3/3 [00:07<00:00,  2.42s/it]
None
Assistant: None
You are using a model of type llava to instantiate a model of type llava_llama. This is not supported for all configurations of models and can yield errors.
Loading checkpoint shards: 100%|██████████| 3/3 [00:07<00:00,  2.49s/it]
None
Assistant: None
Final dialog context:
User: What are the things I should be cautious about when I visit here?
Assistant: None
User: Can you tell me more about the safety measures?
Assistant: None
User: What about the local customs and traditions?
Assistant: None
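A note on what this output is telling us: the `Assistant: None` lines match how `eval_model` in `llava.eval.run_llava` behaves, as far as I can tell from its code: it prints the decoded answer to stdout and has no return statement, so saving its return value stores `None`. The repeated `Loading checkpoint shards` lines also suggest that `eval_model` reloads the checkpoint on every call, which is where the memory pressure comes from. A minimal illustration:

```python
# eval_model prints the decoded answer to stdout itself and returns nothing,
# so the value we store in the dialogue context is always None.
response = eval_model(args)  # the answer is printed to the console here
print(response)              # prints: None
```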
Third revision:
The problem this time: the outputs really are saved back into the prompt now, but the script uses so much memory that it often fails to run, which is why I switched to a different approach (Method 2).
```python
import os
from llava.model.builder import load_pretrained_model
from llava.mm_utils import get_model_name_from_path
from llava.eval.run_llava import eval_model
from io import StringIO
import sys

# Set environment variables
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # use GPU 0

# Load the pretrained model
model_path = "/data1/yjgroup/tym/lab_sync_mac/LLaVA/checkpoints/llava-v1.6-vicuna-7b"
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path=model_path,
    model_base=None,
    model_name=get_model_name_from_path(model_path)
)

# Initial prompt and image path
initial_prompt = "What are the things I should be cautious about when I visit here?"
image_file = "/home/data/yjgroup/fsy/VG_100K/1.jpg"

# A class that stores the dialogue context
class MultiTurnDialog:
    def __init__(self, initial_prompt, image_file):
        self.dialog = [{"role": "user", "content": initial_prompt}]
        self.image_file = image_file

    def add_turn(self, role, content):
        self.dialog.append({"role": role, "content": content})

    def get_dialog(self):
        return "\n".join([f"{turn['role'].capitalize()}: {turn['content']}" for turn in self.dialog])

# Initialize the dialogue
multi_turn_dialog = MultiTurnDialog(initial_prompt, image_file)

# Run the model and return its response
def get_model_response(dialog, image_file):
    args = type('Args', (), {
        "model_path": model_path,
        "model_base": None,
        "model_name": get_model_name_from_path(model_path),
        "query": dialog,
        "conv_mode": None,
        "image_file": image_file,
        "sep": ",",
        "temperature": 0.2,
        "top_p": None,
        "num_beams": 1,
        "max_new_tokens": 512
    })()
    # Capture stdout: eval_model prints its answer instead of returning it
    old_stdout = sys.stdout
    sys.stdout = mystdout = StringIO()
    try:
        eval_model(args)
    finally:
        # Restore stdout
        sys.stdout = old_stdout
    # The captured text is the model's answer
    model_output = mystdout.getvalue().strip()
    return model_output

# First run of the model
response = get_model_response(multi_turn_dialog.get_dialog(), multi_turn_dialog.image_file)
print(f"Assistant: {response}")

# Add the first assistant turn to the context
multi_turn_dialog.add_turn("assistant", response)

# Example multi-turn dialogue
user_inputs = [
    "Can you tell me more about the safety measures?",
    "What about the local customs and traditions?"
]

for user_input in user_inputs:
    # Add the user input to the context
    multi_turn_dialog.add_turn("user", user_input)
    # Get the current dialogue context
    current_dialog = multi_turn_dialog.get_dialog()
    # Run the model
    response = get_model_response(current_dialog, multi_turn_dialog.image_file)
    print(f"Assistant: {response}")
    # Add the model's response to the context
    multi_turn_dialog.add_turn("assistant", response)

# Final dialogue content
print("Final dialog context:")
print(multi_turn_dialog.get_dialog())
```
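A side note (my addition, not part of the original revision): the standard library's `contextlib.redirect_stdout` does the same stdout capture with less bookkeeping and restores `sys.stdout` automatically, even when an exception is raised. A minimal sketch, assuming the same `args` construction as above; `build_args` is a hypothetical helper wrapping the `type('Args', ...)` call:

```python
import contextlib
from io import StringIO

def get_model_response(dialog, image_file):
    args = build_args(dialog, image_file)  # hypothetical helper: builds the same Args object as above
    buf = StringIO()
    # Everything eval_model prints lands in buf; sys.stdout is restored on exit
    with contextlib.redirect_stdout(buf):
        eval_model(args)
    return buf.getvalue().strip()
```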
Method 2: drive the basic single-turn code with a bash script
2.1 A review of the most basic code from the LLaVA repo
```python
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # use GPU 0

from llava.model.builder import load_pretrained_model
from llava.mm_utils import get_model_name_from_path
from llava.eval.run_llava import eval_model

model_path = "/data1/yjgroup/tym/lab_sync_mac/LLaVA/checkpoints/llava-v1.6-vicuna-7b"

tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path=model_path,
    model_base=None,
    model_name=get_model_name_from_path(model_path),
    # load_4bit=True,
    # load_8bit=True
)

prompt = "What are the things I should be cautious about when I visit here?"
image_file = "/home/data/yjgroup/fsy/VG_100K/1.jpg"

args = type('Args', (), {
    "model_path": model_path,
    "model_base": None,
    "model_name": get_model_name_from_path(model_path),
    "query": prompt,
    "conv_mode": None,
    "image_file": image_file,
    "sep": ",",
    "temperature": 0.2,
    "top_p": None,
    "num_beams": 1,
    "max_new_tokens": 512
})()

eval_model(args)
```
2.2 Pseudo multi-turn dialogue (a fresh prompt on every run)
- Create a bash script (multi_round_dialogue.sh):
```bash
#!/bin/bash

# Variables
MODEL_PATH="/data1/yjgroup/tym/lab_sync_mac/LLaVA/checkpoints/llava-v1.6-vicuna-7b"
IMAGE_FILE="/home/data/yjgroup/fsy/VG_100K/1.jpg"
OUTPUT_FILE="dialogue_output.txt"
CUDA_DEVICE="0"  # CUDA device to use

# Empty the output file
> $OUTPUT_FILE

# The prompt for each round
PROMPTS=(
    "What are the things I should be cautious about when I visit here?"
    "Can you suggest some local foods to try?"
    "What are the best tourist attractions in this area?"
)

# Number of rounds (follows the number of prompts)
ROUNDS=${#PROMPTS[@]}

for ((i=0; i<ROUNDS; i++))
do
    PROMPT=${PROMPTS[$i]}
    echo "Round $((i+1))" >> $OUTPUT_FILE
    echo "User: $PROMPT" >> $OUTPUT_FILE

    # Call the Python script and capture its stdout
    RESPONSE=$(CUDA_VISIBLE_DEVICES=$CUDA_DEVICE python3 run_llava_dialogue.py "$MODEL_PATH" "$IMAGE_FILE" "$PROMPT")

    echo "Assistant: $RESPONSE" >> $OUTPUT_FILE
    echo "" >> $OUTPUT_FILE
done
```
- Create a modified Python script (run_llava_dialogue.py):
```python
import os
import sys
from llava.model.builder import load_pretrained_model
from llava.mm_utils import get_model_name_from_path
from llava.eval.run_llava import eval_model

# Read the inputs from the command line
model_path = sys.argv[1]
image_file = sys.argv[2]
prompt = sys.argv[3]

os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = os.getenv("CUDA_VISIBLE_DEVICES", "0")  # CUDA device passed in by the bash script

# Load the pretrained model
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path=model_path,
    model_base=None,
    model_name=get_model_name_from_path(model_path),
    # load_4bit=True,
    # load_8bit=True
)

args = type('Args', (), {
    "model_path": model_path,
    "model_base": None,
    "model_name": get_model_name_from_path(model_path),
    "query": prompt,
    "conv_mode": None,
    "image_file": image_file,
    "sep": ",",
    "temperature": 0.2,
    "top_p": None,
    "num_beams": 1,
    "max_new_tokens": 512
})()

# Run the model. eval_model prints the answer to stdout itself (its return
# value is None, as the Method 1 output showed), so the bash command
# substitution captures the answer directly; printing the return value
# would just append a stray "None" to the captured response.
eval_model(args)
```
- Usage:
- Make sure multi_round_dialogue.sh and run_llava_dialogue.py are in the same directory.
- Make the bash script executable:
```bash
chmod +x multi_round_dialogue.sh
```
- Run the bash script:
```bash
./multi_round_dialogue.sh
```
This script runs three rounds of dialogue and writes each round's user input and assistant response to dialogue_output.txt. The question for each round is set in advance, and the CUDA device is specified on every run.
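For reference, `dialogue_output.txt` should end up looking roughly like this (answers elided; the exact text depends on the model). Since `$(...)` captures only stdout, the `Loading checkpoint shards` progress bars and the `llava_llama` warning, which typically go to stderr, should stay out of the file:

```
Round 1
User: What are the things I should be cautious about when I visit here?
Assistant: <model's answer to question 1>

Round 2
User: Can you suggest some local foods to try?
Assistant: <model's answer to question 2>

Round 3
User: What are the best tourist attractions in this area?
Assistant: <model's answer to question 3>
```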
2.3 Real multi-turn dialogue: every time a new question prompt goes in, the model's previous answers (the dialogue history) go in with it. The files can be modified as follows:
Modified bash script (multi_round_dialogue.sh):
```bash
#!/bin/bash

# Variables
MODEL_PATH="/data1/yjgroup/tym/lab_sync_mac/LLaVA/checkpoints/llava-v1.6-vicuna-7b"
IMAGE_FILE="/home/data/yjgroup/fsy/VG_100K/1.jpg"
OUTPUT_FILE="dialogue_output.txt"
CUDA_DEVICE="0"  # CUDA device to use

# Empty the output file
> $OUTPUT_FILE

# The prompt for each round
PROMPTS=(
    "What are the things I should be cautious about when I visit here?"
    "Can you suggest some local foods to try?"
    "What are the best tourist attractions in this area?"
)

# Number of rounds (follows the number of prompts)
ROUNDS=${#PROMPTS[@]}

# Initialize the dialogue history
DIALOGUE_HISTORY=""

# A real newline: inside double quotes, "\n" stays a literal backslash-n,
# so we use $'\n' when stitching the history together
NL=$'\n'

for ((i=0; i<ROUNDS; i++))
do
    PROMPT=${PROMPTS[$i]}
    echo "Round $((i+1))" >> $OUTPUT_FILE
    echo "User: $PROMPT" >> $OUTPUT_FILE

    # Combine the dialogue history so far with the new prompt
    if [ -z "$DIALOGUE_HISTORY" ]; then
        INPUT_PROMPT="$PROMPT"
    else
        INPUT_PROMPT="${DIALOGUE_HISTORY}${NL}User: $PROMPT"
    fi

    # Call the Python script and capture its stdout
    RESPONSE=$(CUDA_VISIBLE_DEVICES=$CUDA_DEVICE python3 run_llava_dialogue.py "$MODEL_PATH" "$IMAGE_FILE" "$INPUT_PROMPT")

    echo "Assistant: $RESPONSE" >> $OUTPUT_FILE
    echo "" >> $OUTPUT_FILE

    # Update the dialogue history
    DIALOGUE_HISTORY="${INPUT_PROMPT}${NL}Assistant: $RESPONSE"
done
```
Modified Python script (run_llava_dialogue.py):
```python
import os
import sys
from llava.model.builder import load_pretrained_model
from llava.mm_utils import get_model_name_from_path
from llava.eval.run_llava import eval_model

# Read the inputs from the command line
model_path = sys.argv[1]
image_file = sys.argv[2]
prompt = sys.argv[3]

os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = os.getenv("CUDA_VISIBLE_DEVICES", "0")  # CUDA device passed in by the bash script

# Load the pretrained model
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path=model_path,
    model_base=None,
    model_name=get_model_name_from_path(model_path),
    # load_4bit=True,
    # load_8bit=True
)

args = type('Args', (), {
    "model_path": model_path,
    "model_base": None,
    "model_name": get_model_name_from_path(model_path),
    "query": prompt,
    "conv_mode": None,
    "image_file": image_file,
    "sep": ",",
    "temperature": 0.2,
    "top_p": None,
    "num_beams": 1,
    "max_new_tokens": 512
})()

# Run the model. eval_model prints the answer to stdout itself (its return
# value is None, as the Method 1 output showed), so the bash command
# substitution captures the answer directly; printing the return value
# would just append a stray "None" to the captured response.
eval_model(args)
```
Usage:
- Put multi_round_dialogue.sh and run_llava_dialogue.py in the same directory.
- Make the bash script executable:
```bash
chmod +x multi_round_dialogue.sh
```
- Run the bash script:
```bash
./multi_round_dialogue.sh
```
This script runs three rounds of dialogue and writes each round's user input and assistant response to dialogue_output.txt. Each round's question is passed to the model together with the preceding dialogue history, which is what makes this real multi-turn dialogue. (See the sketch below for what the combined input looks like.)
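To make the history mechanism concrete, this is roughly the `query` string the Python script receives in round 3 (answers elided; note that the first round's question carries no `User:` prefix, because the script sends the bare prompt while the history is still empty):

```
What are the things I should be cautious about when I visit here?
Assistant: <answer from round 1>
User: Can you suggest some local foods to try?
Assistant: <answer from round 2>
User: What are the best tourist attractions in this area?
```

One caveat: the history grows every round, so with enough rounds it will eventually exceed the model's context window (the `context_len` returned by `load_pretrained_model`), at which point the oldest turns would need to be truncated or summarized.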