SWIFT

:::{Note}
Supported versions: MiniCPM-V 2.6
:::

SWIFT is an efficient and scalable fine-tuning framework for large models. It supports a variety of parameter-efficient fine-tuning methods, such as LoRA, Adapter, and Prompt Tuning.

Install SWIFT

You can install SWIFT quickly with the following commands:

```shell
git clone https://github.com/modelscope/swift.git
cd swift
pip install -r requirements.txt
pip install -e '.[llm]'
```
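
A quick way to check that the installation succeeded is to import the package from Python. This is a minimal sketch; it assumes the repo installs an importable `swift` package exposing `__version__`, which may vary across releases:

```python
# Minimal sanity check: confirm that ms-swift is importable.
# Assumes the package is importable as `swift` and exposes `__version__`.
import swift

print(swift.__version__)
```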

Training

Prepare the Dataset

You can prepare your own dataset following the format below. Custom datasets support both JSON and JSONL formats.

{"query": "What does this picture describe?", "response": "This picture has a giant panda.", "images": ["local_image_path"]}
{"query": "What does this picture describe?", "response": "This picture has a giant panda.", "history": [], "images": ["local_image_path"]}
{"query": "Is bamboo tasty?", "response": "It seems pretty tasty judging by the panda's expression.", "history": [["What's in this picture?", "There's a giant panda in this picture."], ["What is the panda doing?", "Eating bamboo."]], "images": ["image_url"]}

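If you build the dataset programmatically, a small script like the sketch below can emit JSONL in the format above. The image paths and conversations are placeholders; only the field names come from the examples.

```python
# Sketch: write a custom image dataset in the JSONL format shown above.
# Image paths and conversation contents are placeholders.
import json

samples = [
    {
        "query": "What does this picture describe?",
        "response": "This picture has a giant panda.",
        "images": ["/data/images/panda.jpg"],  # hypothetical local path
    },
    {
        "query": "Is bamboo tasty?",
        "response": "It seems pretty tasty judging by the panda's expression.",
        "history": [
            ["What's in this picture?", "There's a giant panda in this picture."],
            ["What is the panda doing?", "Eating bamboo."],
        ],
        "images": ["/data/images/panda_eating.jpg"],  # hypothetical local path
    },
]

with open("train.jsonl", "w", encoding="utf-8") as f:
    for sample in samples:
        f.write(json.dumps(sample, ensure_ascii=False) + "\n")
```
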
You can also directly use datasets hosted on ModelScope, such as the image dataset coco-en-mini or the video dataset video-chatgpt.

Image Fine-tuning

We fine-tune on the coco-en-mini dataset; the task is to describe the content of images.

The training script is configured as follows:

```shell
# By default, `lora_target_modules` is set to all linear layers in `llm` and `resampler`
CUDA_VISIBLE_DEVICES=0,1,2,3 NPROC_PER_NODE=4 swift sft \
  --model_type minicpm-v-v2_6-chat \
  --model_id_or_path OpenBMB/MiniCPM-V-2_6 \
  --sft_type lora \
  --dataset coco-en-mini#20000 \
  --deepspeed default-zero2
```

If you want to use a custom dataset, simply specify it as follows:

```shell
  --dataset train.jsonl \
  --val_dataset val.jsonl \
```

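If you prefer launching training from Python instead of the CLI, ms-swift also exposes a programmatic entry point. The sketch below assumes the `SftArguments`/`sft_main` interface of the ms-swift release this guide targets and uses a single GPU; argument names may differ in other versions, so check the API of your installed version:

```python
# Sketch: launch the same LoRA fine-tuning from Python (single GPU).
# Assumes swift.llm provides SftArguments / sft_main as in the targeted
# ms-swift release; argument names may differ in other versions.
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0"

from swift.llm import SftArguments, sft_main

output = sft_main(SftArguments(
    model_type="minicpm-v-v2_6-chat",
    model_id_or_path="OpenBMB/MiniCPM-V-2_6",
    sft_type="lora",
    dataset=["coco-en-mini#20000"],
))
print(output["best_model_checkpoint"])  # best checkpoint path, if reported
```
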
After fine-tuning, run inference with the following script:

```shell
# Set `--show_dataset_sample -1` to evaluate on the full validation set
CUDA_VISIBLE_DEVICES=0 swift infer \
    --ckpt_dir output/minicpm-v-v2_6-chat/vx-xxx/checkpoint-xxx \
    --load_dataset_config true --merge_lora true
```

Video Fine-tuning

We fine-tune on the video-chatgpt dataset; the task is to describe the content of videos.

The training script is configured as follows:

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 NPROC_PER_NODE=4 swift sft \
  --model_type minicpm-v-v2_6-chat \
  --model_id_or_path OpenBMB/MiniCPM-V-2_6 \
  --sft_type lora \
  --dataset video-chatgpt \
  --deepspeed default-zero2
```

If you want to use a custom dataset:

```shell
  --dataset train.jsonl \
  --val_dataset val.jsonl \
```

Custom datasets support both JSON and JSONL formats. Here is an example of a video dataset:

{"query": "<video>Describe what is happening in this video.", "response": "A dog is playing with a ball in a park.", "videos": ["path/to/video1.mp4"]}
{"query": "What are the people doing in the video?<video>Can you see any vehicles?<video>", "response": "People are walking on the street, and there are cars and bicycles.", "history": [], "videos": ["path/to/video2.mp4", "path/to/video3.mp4"]}
{"query": "Was there a red car in the previous video?", "response": "Yes, there was a red car parked near the sidewalk.", "history": [["What did you see in the video?", "There was a car, a bicycle, and several pedestrians."], ["What time was it?", "It seemed to be in the afternoon."]], "videos": []}

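Before launching training, it can help to sanity-check a video JSONL file. The sketch below uses only the Python standard library; the dataset filename is a placeholder. It verifies that each line carries the required fields and that every referenced video file exists.

```python
# Sketch: basic validation of a video dataset in the JSONL format shown above.
# The dataset path is a placeholder; point it at your own file.
import json
import os

def validate(path: str) -> None:
    with open(path, "r", encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            sample = json.loads(line)
            if "query" not in sample or "response" not in sample:
                print(f"line {line_no}: missing query/response")
            for video in sample.get("videos", []):
                if not os.path.exists(video):
                    print(f"line {line_no}: video file not found: {video}")

validate("train.jsonl")
```
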
After fine-tuning, run inference with the following script:

```shell
CUDA_VISIBLE_DEVICES=0 swift infer \
    --ckpt_dir output/minicpm-v-v2_6-chat/vx-xxx/checkpoint-xxx \
    --load_dataset_config true --merge_lora true
```

Inference

The following command downloads the MiniCPM-V 2.6 model and runs inference on it directly:

```shell
CUDA_VISIBLE_DEVICES=0 swift infer \
  --model_type minicpm-v-v2_6-chat \
  --model_id_or_path OpenBMB/MiniCPM-V-2_6
```
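
Inference can also be run from Python through SWIFT's programmatic interface. This is a sketch that assumes the `InferArguments`/`infer_main` API of the ms-swift release this guide targets; verify the argument names against your installed version:

```python
# Sketch: interactive inference from Python instead of the CLI.
# Assumes swift.llm provides InferArguments / infer_main as in the targeted
# ms-swift release; argument names may differ in other versions.
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0"

from swift.llm import InferArguments, infer_main

infer_main(InferArguments(
    model_type="minicpm-v-v2_6-chat",
    model_id_or_path="OpenBMB/MiniCPM-V-2_6",
))
```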