Alibaba has introduced Qwen-Image, a new open-source image generation model that stands out for its ability to accurately render complex and multilingual text within images—a challenge that many AI tools still face. Developed by Alibaba’s Qwen Team, Qwen-Image is designed to generate clear and readable text in diverse contexts, ranging from handwritten poetry and bilingual posters to e-commerce labels and educational diagrams. The model supports both alphabetic scripts such as English and logographic scripts like Chinese, making it particularly effective for multilingual applications.
Users can try Qwen-Image on the Qwen Chat website by switching to the “Image Generation” mode. Released under the Apache 2.0 license, the model is freely available for businesses and developers to use, modify, and distribute, including for commercial purposes, provided the license and attribution notices are retained.
Qwen-Image was trained on billions of image-text pairs drawn from natural scenes, portraits, artistic posters, and synthetically generated text data. Notably, all synthetic data was created internally by Alibaba, without relying on AI-generated images from other models. This approach helped the model learn to handle rare and complex characters, particularly in Chinese.
Alibaba adopted a staged training process, beginning with simple captioned images and progressively advancing to complex layouts featuring dense, multilingual text. This curriculum-style method enabled Qwen-Image to generalize well across a wide range of formats.
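The staged idea can be illustrated with a small Python sketch. It is not Alibaba's training code: the `Sample` fields, the `difficulty` score (the number of characters to render), and the thresholds are all hypothetical stand-ins chosen only to show how a curriculum might widen its data pool from simple captions to dense, multilingual text.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Sample:
    caption: str
    rendered_text: str  # hypothetical field: text that must appear in the image


def difficulty(sample: Sample) -> int:
    """Hypothetical difficulty score: more characters to render means harder."""
    return len(sample.rendered_text)


def curriculum_stages(samples: List[Sample], thresholds: List[int]) -> Iterator[List[Sample]]:
    """Yield one training pool per stage.

    Stage i holds every sample whose difficulty is <= thresholds[i], so later
    stages keep the easy data and add harder layouts on top of it.
    """
    for limit in thresholds:
        yield [s for s in samples if difficulty(s) <= limit]


samples = [
    Sample("a cat on a sofa", ""),
    Sample("a shop sign", "OPEN"),
    Sample("a bilingual poster", "Grand Opening 盛大开业"),
]

# Thresholds are illustrative assumptions, not values from the article.
stages = list(curriculum_stages(samples, thresholds=[0, 4, 1000]))
for i, pool in enumerate(stages, start=1):
    print(f"stage {i}: {len(pool)} samples")
```

Each stage here is a superset of the previous one, which is one common way to keep earlier skills from being forgotten; whether Qwen-Image's schedule works this way is not stated in the announcement.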
At its core, Qwen-Image integrates three key components: Qwen2.5-VL, a multimodal language model that interprets the prompt and its context; a variational autoencoder (VAE) encoder/decoder optimized for high-resolution layouts; and MMDiT, a multimodal diffusion transformer with a specialized encoding system for precise spatial alignment. Together, these elements enable the generation of visually appealing images with accurate text placement and formatting.
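To make the division of labor concrete, here is a toy Python sketch of how data might flow through the three components. Every function is a hypothetical stand-in working on plain lists of numbers, not the real model or any library API: `encode_prompt` plays the role of the language model, `denoise` the diffusion model, and `decode` the VAE decoder.

```python
from typing import List

Vector = List[float]


def encode_prompt(prompt: str) -> Vector:
    # Stand-in for the language model (Qwen2.5-VL in the article):
    # turns the prompt into a conditioning vector.
    return [float(len(prompt)), float(len(prompt.split()))]


def denoise(latent: Vector, cond: Vector, steps: int) -> Vector:
    # Stand-in for the diffusion model (MMDiT in the article): each step moves
    # the latent halfway toward the conditioning, mimicking iterative refinement.
    for _ in range(steps):
        latent = [x + 0.5 * (c - x) for x, c in zip(latent, cond)]
    return latent


def decode(latent: Vector, scale: int = 2) -> Vector:
    # Stand-in for the VAE decoder: expands the compact latent into "pixels"
    # by repeating each latent value `scale` times.
    return [x for x in latent for _ in range(scale)]


def generate(prompt: str, steps: int = 20) -> Vector:
    cond = encode_prompt(prompt)
    latent = [0.0] * len(cond)  # a real sampler would start from random noise
    return decode(denoise(latent, cond, steps))


image = generate("poster that says HELLO")
print(len(image))
```

The point of the sketch is only the ordering: the prompt is encoded once, refinement happens in a compact latent space, and the decoder expands the result to full resolution at the end.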
Alibaba reports that Qwen-Image has been evaluated against various industry benchmarks for text clarity, layout accuracy, and prompt adherence. On the AI Arena public leaderboard, which ranks AI image models through human evaluations, Qwen-Image currently holds third place overall and is the highest-ranked open-source model.