Alibaba Unveils Qwen-Image, an Open-Source AI Model Excelling at Multilingual Text in Images

Alibaba has introduced Qwen-Image, a new open-source image generation model that stands out for its ability to accurately render complex and multilingual text within images—a challenge that many AI tools still face. Developed by Alibaba’s Qwen Team, Qwen-Image is designed to generate clear and readable text in diverse contexts, ranging from handwritten poetry and bilingual posters to e-commerce labels and educational diagrams. The model supports both alphabetic scripts such as English and logographic scripts like Chinese, making it particularly effective for multilingual applications.

Users can try Qwen-Image on the Qwen Chat website by switching to the “Image Generation” mode. The model is released under the Apache 2.0 license, so businesses and developers can freely use, modify, and distribute it, including for commercial purposes, provided they retain the license text and attribution notices.

Qwen-Image was trained on billions of image-text pairs drawn from natural scenes, portraits, artistic posters, and synthetically generated text data. Notably, Alibaba created all of the synthetic data internally rather than relying on images generated by other AI models. This approach helped the model learn to render rare and complex characters, particularly in Chinese.

Alibaba adopted a staged training process, beginning with simple captioned images and progressively advancing to complex layouts featuring dense, multilingual text. This curriculum-style method enabled Qwen-Image to generalize well across a wide range of formats.
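
To make the idea of staged training concrete, the following is a minimal Python sketch of a curriculum schedule. It is an illustration only, not Alibaba's training code: the `Sample` fields, the `difficulty` score, the stage limits, and the choice to make each stage a superset of the previous one are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Hypothetical training example; the fields are illustrative only."""
    caption: str
    text_regions: int  # assumed count of rendered text regions in the image
    languages: int     # assumed count of distinct languages in that text


def difficulty(sample: Sample) -> int:
    # Assumed difficulty score: more text regions and more languages = harder.
    return sample.text_regions * sample.languages


def build_stages(samples, limits):
    # One pool per stage. Each stage admits every sample whose difficulty is
    # at or below that stage's limit, so later stages include earlier data.
    return [[s for s in samples if difficulty(s) <= limit] for limit in limits]


samples = [
    Sample("a cat on a sofa", 0, 0),
    Sample("shop sign reading OPEN", 1, 1),
    Sample("bilingual poster with title and subtitle", 4, 2),
    Sample("dense infographic in Chinese and English", 12, 2),
]

stages = build_stages(samples, limits=[0, 2, 8, 24])
for number, stage in enumerate(stages, start=1):
    print(f"stage {number}: {len(stage)} samples")
```

In this sketch the first stage contains only the text-free image, and each later stage adds progressively denser, more multilingual examples while keeping the simpler ones in the pool.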

At its core, Qwen-Image integrates three key components: Qwen2.5-VL, a multimodal language model that interprets the prompt and its context; a variational autoencoder (VAE) encoder/decoder optimized for high-resolution layouts; and MMDiT, a multimodal diffusion transformer with a specialized positional encoding system for precise spatial alignment. Together, these elements enable the generation of visually appealing images with accurate text placement and formatting.
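
The way the three components hand data to one another at generation time can be sketched with toy stand-ins. The Python below is not the Qwen-Image implementation and uses no real model: `encode_prompt`, `denoise`, `decode`, and `generate` are hypothetical names, and simple arithmetic replaces the neural networks. It only shows the assumed order of operations: the prompt becomes a conditioning vector, a noisy latent is refined step by step under that conditioning, and a decoder turns the latent into an image grid.

```python
import random


def encode_prompt(prompt: str) -> list[float]:
    # Stand-in for the multimodal language model: turns the prompt into a
    # conditioning vector (here, just scaled character codes).
    return [ord(ch) / 255.0 for ch in prompt[:8]]


def denoise(latent: list[float], cond: list[float], steps: int) -> list[float]:
    # Stand-in for the diffusion transformer: each step nudges the noisy
    # latent halfway toward a target derived from the conditioning vector.
    cond = cond or [0.0]  # avoid dividing by zero on an empty prompt
    target = sum(cond) / len(cond)
    for _ in range(steps):
        latent = [x + 0.5 * (target - x) for x in latent]
    return latent


def decode(latent: list[float], width: int) -> list[list[float]]:
    # Stand-in for the VAE decoder: reshapes the flat latent into pixel rows.
    return [latent[i:i + width] for i in range(0, len(latent), width)]


def generate(prompt: str, size: int = 4, steps: int = 10, seed: int = 0):
    rng = random.Random(seed)  # seeded so the toy output is reproducible
    latent = [rng.gauss(0.0, 1.0) for _ in range(size * size)]
    cond = encode_prompt(prompt)
    return decode(denoise(latent, cond, steps), size)


image = generate("Hello")
print(len(image), "rows of", len(image[0]), "values")
```

Only the decoder half of the VAE appears in this sketch, since text-to-image generation starts from noise; the encoder half would come into play when an existing image has to be mapped into latent space, for example during training.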

Alibaba reports that Qwen-Image has been evaluated against various industry benchmarks for text clarity, layout accuracy, and prompt adherence. On the AI Arena public leaderboard, which ranks AI image models through human evaluations, Qwen-Image currently holds third place overall and is the highest-ranked open-source model.

 

 
