QWEN CHAT
GITHUB
HUGGING FACE
MODELSCOPE
DISCORD
Introduction
At the end of January this year, we launched the Qwen2.5-VL series of models, which received widespread attention and positive feedback from the community. Building on the Qwen2.5-VL series, we continued to optimize the model using reinforcement learning and open-sourced the new VL model with the beloved 32B parameter scale under the Apache 2.0 license — Qwen2.5-VL-32B-Instruct. Compared to the previously released Qwen2.5-VL series models, the features of this 32B VL model are as follows:
- Responses More Aligned with Human Preferences: Adjusted the output style to provide more detailed, better-formatted answers that align more closely with human preferences.
- Mathematical Reasoning: Significant improvement in the accuracy of solving complex mathematical problems.
- Fine-grained Image Understanding and Reasoning: Enhanced accuracy and detailed analysis in tasks such as image parsing, content recognition, and visual logic deduction.
Performance
Extensive benchmarking against state-of-the-art (SoTA) models of comparable scale, Qwen2.5-VL-32B-Instruct has demonstrated superiority over baselines, e.g., Mistral-Small-3.1-24B and Gemma-3-27B-IT, even surpassing the larger Qwen2-VL-72B-Instruct. Notably, it achieves significant advantages in multimodal tasks such as MMMU, MMMU-Pro, and MathVista, which focus on complex, multi-step reasoning. On MM-MT-Bench, a benchmark emphasizing subjective user experience evaluation, Qwen2.5-VL-32B-Instruct outperforms its predecessor Qwen2-