MiniCPM-V MiniCPM-V & o Cookbook

Dataset Preparation Guide

This tutorial will guide you on how to prepare datasets for supervised fine-tuning of MiniCPM-o. Please strictly follow the format and requirements below to organize your data.

1. Data Format Description

Each data sample should be a dictionary containing the image path and multi-turn conversation content. For example:

{
  "image": "path/to/image.jpg",
  "conversations": [
    {"role": "user", "content": "Describe this image."},
    {"role": "assistant", "content": "This is a cat."}
  ]
}

Multi-image Input Format

To support multiple images, the image field should be a dictionary, with keys like <image_00>, <image_01>, and values as image paths:

{
  "image": {
    "<image_00>": "path/to/image1.jpg",
    "<image_01>": "path/to/image2.jpg"
  },
  "conversations": [
    {"role": "user", "content": "Compare <image_00> and <image_01>."},
    {"role": "assistant", "content": "The first image is a cat, the second is a dog."}
  ]
}

2. Dataset Loading and Preprocessing

Dataset loading and preprocessing mainly rely on the SupervisedDataset class. The core process is as follows:

Main Parameter Description

3. Dataset Examples

Single-image multi-turn conversation example:

{
  "image": "images/cat.jpg",
  "conversations": [
    {"role": "user", "content": "<image> What animal is this?"},
    {"role": "assistant", "content": "This is a cat."},
    {"role": "user", "content": "What is it doing?"},
    {"role": "assistant", "content": "It is sleeping."}
  ]
}

Multi-image conversation example:

{
  "image": {
    "<image_00>": "images/cat.jpg",
    "<image_01>": "images/dog.jpg"
  },
  "conversations": [
    {"role": "user", "content": "Compare <image_00> and <image_01>."},
    {"role": "assistant", "content": "The first image is a cat, the second is a dog."}
  ]
}

4. Common Issues and Notes