MiniCPM-V & o Cookbook
Cook up amazing multimodal AI applications effortlessly with MiniCPM-V and MiniCPM-o, bringing vision, speech, and live-streaming capabilities right to your fingertips.
What's new
- MiniCPM-V 4.6 released — Instruct + Thinking variants, Qwen3.5 hybrid backbone, 256K context, simplified vision merger.
- Inference: Single-image QA · Multi-image QA · Video · OCR · PDF · Grounding
- Deployment: vLLM · SGLang · llama.cpp · Ollama
- Quantization: GGUF · BNB · AWQ
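As a rough guide to the quantization formats above, weight memory scales with bits per parameter. A back-of-the-envelope sketch — the parameter count and the ~10% overhead factor are illustrative assumptions, not official numbers:

```python
def quantized_size_gib(n_params_b: float, bits: float, overhead: float = 1.1) -> float:
    """Rough weight-memory estimate for a quantized checkpoint.

    n_params_b: parameter count in billions (hypothetical, for illustration)
    bits: bits per weight (e.g. 16 for FP16, 4 for 4-bit GGUF/AWQ/BNB)
    overhead: fudge factor for quantization scales and runtime buffers
    """
    gib = n_params_b * 1e9 * bits / 8 / 2**30
    return round(gib * overhead, 2)
```

The useful takeaway is the ratio: a 4-bit quant needs roughly a quarter of the FP16 footprint, which is what makes laptop and phone inference feasible.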
Pick the right recipe
Individuals
Effortless inference on your own machine — runs on CPU or GPU, across macOS / Linux / Windows, even on phones.
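For local single-image QA, earlier MiniCPM-V releases expose a `chat()` helper on the Transformers model. A minimal sketch — the `chat` signature and the message layout are assumptions based on prior releases, so check the recipe for your version:

```python
def build_msgs(image, question: str) -> list:
    # MiniCPM-V's chat helper takes role/content turns; the image is
    # passed inline in the content list alongside the text prompt.
    return [{"role": "user", "content": [image, question]}]

def ask(model, tokenizer, image, question: str) -> str:
    # Hypothetical call mirroring prior MiniCPM-V releases; model and
    # tokenizer come from AutoModel.from_pretrained(..., trust_remote_code=True).
    return model.chat(msgs=build_msgs(image, question), tokenizer=tokenizer)
```

The same `msgs` structure extends to multi-image and multi-turn QA by appending more content items or turns.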
Enterprises
High-throughput, scalable serving with vLLM, SGLang, llama.cpp, or Ollama.
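Serving stacks such as vLLM and SGLang expose an OpenAI-compatible `/v1/chat/completions` endpoint, so clients only need to build a standard chat request. A sketch of the request body for image QA — the model name is a placeholder for whatever your server registered:

```python
import base64

def image_qa_payload(model: str, image_bytes: bytes, question: str) -> dict:
    # Images travel as base64 data URLs in an image_url content part,
    # per the OpenAI chat-completions message format.
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,  # placeholder: the name your server registered
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
                {"type": "text", "text": question},
            ],
        }],
    }
```

POST this as JSON to the server's `/v1/chat/completions` route with any HTTP client.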
Researchers
Train, fine-tune, and customize the models for your own tasks.
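A common starting point for customization is LoRA fine-tuning of the language backbone with the vision encoder frozen. A hypothetical hyperparameter sketch — names and values are illustrative, not the official training recipe; map them onto your framework (e.g. PEFT or LLaMA-Factory):

```python
def lora_finetune_config(rank: int = 16, lr: float = 1e-4) -> dict:
    # Illustrative defaults only; tune rank and lr for your dataset size.
    return {
        "lora_rank": rank,
        "lora_alpha": 2 * rank,        # common heuristic: alpha = 2 * rank
        "lora_dropout": 0.05,
        "learning_rate": lr,
        "freeze_vision_tower": True,   # adapt the LLM, keep the ViT frozen
    }
```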
Versions
This cookbook tracks all currently supported MiniCPM-V & o releases:
| Version | Status | Modalities | Backbone | Context |
|---|---|---|---|---|
| MiniCPM-V 4.6 (latest) | Recommended | Image, Video | Qwen3.5 hybrid | 256K |
| MiniCPM-V 4.5 | Stable | Image, Video | Qwen3 | 32K |
| MiniCPM-o 4.5 | Stable | Image, Video, Audio | Qwen3 | 32K |
Use the version switcher in the sidebar to jump between releases.
Resources
- HuggingFace
- ModelScope
- Technical Blog
- Discord
- Open an Issue