References: https://github.com/unum-cloud/uform
https://huggingface.co/unum-cloud/uform-gen2-qwen-500m
https://baijiahao.baidu.com/s?id=1787054120353641459&wfr=spider&for=pc
Demo: https://huggingface.co/spaces/unum-cloud/uform-gen2-qwen-500m-demo
UForm is much smaller than most other multimodal models, with fewer than 5B parameters.
UForm-Gen is a small generative vision-language model primarily designed for Image Captioning and Visual Question Answering. The model consists of two parts:
CLIP-like ViT-H/14
Qwen1
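
To make the description above concrete, here is a minimal sketch of captioning an image with the checkpoint linked above through Hugging Face `transformers`. It follows the usage pattern I recall from the model card, so treat the details as assumptions to verify there: the helper name `describe_image`, the default prompt, and the `eos_token_id` value are not from this post. The heavy imports sit inside the function so the file can be loaded without `torch` installed.

```python
MODEL_ID = "unum-cloud/uform-gen2-qwen-500m"  # checkpoint from the links above


def describe_image(image_path: str,
                   prompt: str = "Describe the image.",
                   max_new_tokens: int = 256) -> str:
    """Caption an image or answer a question about it (hypothetical helper)."""
    # Imported lazily: these packages are large and optional for this module.
    import torch
    from PIL import Image
    from transformers import AutoModel, AutoProcessor

    # trust_remote_code=True because the checkpoint ships its own model class.
    model = AutoModel.from_pretrained(MODEL_ID, trust_remote_code=True)
    processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)

    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=[prompt], images=[image], return_tensors="pt")

    with torch.inference_mode():
        output = model.generate(
            **inputs,
            do_sample=False,            # greedy decoding for repeatable captions
            use_cache=True,
            max_new_tokens=max_new_tokens,
            eos_token_id=151645,        # assumption: value recalled from the model card
            pad_token_id=processor.tokenizer.pad_token_id,
        )

    # Drop the prompt tokens so only the newly generated text is returned.
    prompt_len = inputs["input_ids"].shape[1]
    return processor.batch_decode(output[:, prompt_len:])[0]
```

A call would look like `describe_image("image.jpg", "What is in the picture?")`, where `image.jpg` is a placeholder path. The first call downloads the weights, so it needs network access and some disk space.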