MiniCPM-V & o Cookbook
Cook up amazing multimodal AI applications effortlessly with MiniCPM-V and MiniCPM-o, bringing vision, speech, and live-streaming capabilities right to your fingertips.
What's new
- MiniCPM-V 4.6 released — Instruct + Thinking variants, Qwen3.5 hybrid backbone, 256K context, simplified vision merger.
- Inference: Single-image QA · Multi-image QA · Video · OCR · PDF · Grounding
- Deployment: vLLM · SGLang · llama.cpp · Ollama
- Quantization: GGUF · BNB · AWQ
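As a rough guide to the quantization formats above, weight memory scales with bits per parameter. A back-of-the-envelope sketch — the parameter count and the ~10% overhead factor are illustrative assumptions, not official numbers:

```python
def quantized_size_gib(n_params_b: float, bits: float, overhead: float = 1.1) -> float:
    """Rough weight-memory estimate for a quantized checkpoint.

    n_params_b: parameter count in billions (hypothetical, for illustration)
    bits: bits per weight (e.g. 16 for FP16, 4 for 4-bit GGUF/AWQ/BNB)
    overhead: fudge factor for quantization scales and runtime buffers
    """
    gib = n_params_b * 1e9 * bits / 8 / 2**30
    return round(gib * overhead, 2)
```

The useful takeaway is the ratio: a 4-bit quant needs roughly a quarter of the FP16 footprint, which is what makes laptop and phone inference feasible.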
Pick the right recipe
Individuals
Effortless inference on your own machine — runs on CPU or GPU, across macOS / Linux / Windows, even on phones.
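For local single-image QA, earlier MiniCPM-V releases expose a `chat()` helper on the Transformers model. A minimal sketch — the `chat` signature and the message layout are assumptions based on prior releases, so check the recipe for your version:

```python
def build_msgs(image, question: str) -> list:
    # MiniCPM-V's chat helper takes role/content turns; the image is
    # passed inline in the content list alongside the text prompt.
    return [{"role": "user", "content": [image, question]}]

def ask(model, tokenizer, image, question: str) -> str:
    # Hypothetical call mirroring prior MiniCPM-V releases; model and
    # tokenizer come from AutoModel.from_pretrained(..., trust_remote_code=True).
    return model.chat(msgs=build_msgs(image, question), tokenizer=tokenizer)
```

The same `msgs` structure extends to multi-image and multi-turn QA by appending more content items or turns.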
Enterprises
High-throughput, scalable serving with vLLM, SGLang, llama.cpp, or Ollama.
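Serving stacks such as vLLM and SGLang expose an OpenAI-compatible `/v1/chat/completions` endpoint, so clients only need to build a standard chat request. A sketch of the request body for image QA — the model name is a placeholder for whatever your server registered:

```python
import base64

def image_qa_payload(model: str, image_bytes: bytes, question: str) -> dict:
    # Images travel as base64 data URLs in an image_url content part,
    # per the OpenAI chat-completions message format.
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,  # placeholder: the name your server registered
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
                {"type": "text", "text": question},
            ],
        }],
    }
```

POST this as JSON to the server's `/v1/chat/completions` route with any HTTP client.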
Researchers
Train, fine-tune, and customize the models for your own tasks.
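A common starting point for customization is LoRA fine-tuning of the language backbone with the vision encoder frozen. A hypothetical hyperparameter sketch — names and values are illustrative, not the official training recipe; map them onto your framework (e.g. PEFT or LLaMA-Factory):

```python
def lora_finetune_config(rank: int = 16, lr: float = 1e-4) -> dict:
    # Illustrative defaults only; tune rank and lr for your dataset size.
    return {
        "lora_rank": rank,
        "lora_alpha": 2 * rank,        # common heuristic: alpha = 2 * rank
        "lora_dropout": 0.05,
        "learning_rate": lr,
        "freeze_vision_tower": True,   # adapt the LLM, keep the ViT frozen
    }
```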
Versions
This cookbook tracks all currently supported MiniCPM-V & o releases:
| Version | Status | Modalities | Backbone | Context |
|---|---|---|---|---|
| MiniCPM-V 4.6 (latest) | Recommended | Image, Video | Qwen3.5 hybrid | 256K |
| MiniCPM-V 4.5 | Stable | Image, Video | Qwen3 | 32K |
| MiniCPM-o 4.5 | Stable | Image, Video, Audio | Qwen3 | 32K |
Use the version switcher in the sidebar to jump between releases.
Resources
- HuggingFace
- ModelScope
- Technical Blog
- Discord
- Open an Issue