Qwen: Qwen3 VL 32B Instruct
qwen/qwen3-vl-32b-instruct · Oct 23, 2025 · 131.1K context · 32.8K max output · $0.16/M in · $0.64/M out
Description
Qwen3-VL-32B-Instruct is a large-scale multimodal vision-language model designed for high-precision understanding and reasoning across text, images, and video. With 32 billion parameters, it combines deep visual perception with advanced text comprehension, enabling fine-grained spatial reasoning, document and scene analysis, and long-horizon video understanding. It also provides robust OCR in 32 languages and enhanced multimodal fusion through the Interleaved-MRoPE and DeepStack architectures. Optimized for agentic interaction and visual tool use, Qwen3-VL-32B delivers state-of-the-art performance on complex real-world multimodal tasks.
Specifications
| Property | Value |
|---|---|
| Provider | qwen |
| Context length | 131.1K tokens |
| Max output | 32.8K tokens |
| Input modalities | text, image |
| Output modalities | text |
Pricing
| Type | Price / 1M tokens |
|---|---|
| Input | $0.16 |
| Output | $0.64 |
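Because prices are quoted per million tokens, the cost of one request is (input tokens × $0.16 + output tokens × $0.64) ÷ 1,000,000. The following is a minimal sketch of that arithmetic; the function name `estimate_cost` is hypothetical, and actual billing (rounding, minimum charges, any discounts) may differ from this simple estimate.

```bash
# Hypothetical helper: estimate the USD cost of one request from its
# token counts, using the listed prices ($0.16 per 1M input tokens,
# $0.64 per 1M output tokens). Prints the result with 6 decimals.
estimate_cost() {
  local input_tokens=$1 output_tokens=$2
  awk -v i="$input_tokens" -v o="$output_tokens" \
    'BEGIN { printf "%.6f\n", i * 0.16 / 1000000 + o * 0.64 / 1000000 }'
}

# Example: a 10,000-token prompt that produces a 1,000-token reply.
estimate_cost 10000 1000   # prints 0.002240
```

Output tokens cost four times as much as input tokens here, so long generations dominate the bill faster than long prompts do.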
Quick Start
```bash
curl https://api.ominigate.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-omg-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen3-vl-32b-instruct",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
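The Quick Start request sends text only, while the model also accepts image input. The sketch below sends an image together with a question. It assumes the endpoint accepts OpenAI-style content parts (`"type": "text"` and `"type": "image_url"`), which is not confirmed on this page; the helper `build_vision_body`, the environment variable `OMINIGATE_API_KEY`, and the image URL are all hypothetical placeholders.

```bash
# Hypothetical helper: print a chat-completions request body containing
# one text part and one image part.
# ASSUMPTION: the gateway accepts OpenAI-style "image_url" content parts.
# NOTE: no JSON escaping is performed; keep double quotes and backslashes
# out of both arguments.
build_vision_body() {
  local image_url=$1 question=$2
  printf '{
  "model": "qwen/qwen3-vl-32b-instruct",
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "%s"},
      {"type": "image_url", "image_url": {"url": "%s"}}
    ]
  }]
}\n' "$question" "$image_url"
}

# Send the request only when an API key is available in the environment
# (OMINIGATE_API_KEY is a hypothetical variable name).
if [ -n "${OMINIGATE_API_KEY:-}" ]; then
  build_vision_body "https://example.com/receipt.png" \
    "Transcribe all text in this image." |
    curl -sS https://api.ominigate.ai/v1/chat/completions \
      -H "Authorization: Bearer ${OMINIGATE_API_KEY}" \
      -H "Content-Type: application/json" \
      --data-binary @-
fi
```

Building the body in a separate function keeps the request payload easy to inspect before anything is sent; for arbitrary user text, a JSON-aware tool should be used instead of `printf` so that quotes are escaped correctly.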