Imagen


Google AI text-to-image generation model

Published date: 2025-03-17

Imagen is a text-to-image diffusion model with unprecedented realism and deep language understanding. Imagen builds on the power of large Transformer language models for understanding text, and on the strength of diffusion models for high-fidelity image generation. Our key finding is that generic large language models (e.g., T5), pre-trained on a pure text corpus, are surprisingly effective at encoding text for image synthesis: increasing the size of the language model in Imagen improves sample realism and image-text alignment more than increasing the size of the image diffusion model. Imagen achieves a new state-of-the-art FID score of 7.27 on the COCO dataset, having never been trained on COCO; human raters rate Imagen samples as comparable to the COCO data itself in terms of image-text alignment. To evaluate text-to-image models more deeply, we introduce DrawBench, a comprehensive and challenging benchmark for text-to-image models. Using DrawBench, we compare Imagen to recent methods including VQ-GAN+CLIP, Latent Diffusion Models, and DALL-E 2, and find that human evaluators prefer Imagen in terms of both sample quality and image-text alignment in side-by-side comparisons.
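
The FID score quoted above is the Fréchet distance between two Gaussians fitted to image features, one for real images and one for generated images (lower is better). The following is a minimal sketch of that distance, not the evaluation code behind the reported number: the function name `frechet_distance` is hypothetical, and the random arrays stand in for the Inception network features that a real FID computation would use.

```python
import numpy as np

def frechet_distance(feats_real, feats_gen):
    """Frechet distance between Gaussians fitted to two feature sets (rows = samples)."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    # Tr((cov_r cov_g)^(1/2)) is the sum of the square roots of the
    # eigenvalues of the product, which are real and non-negative in theory.
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * tr_sqrt)

# Placeholder features (assumption): random vectors, not real image features.
rng = np.random.default_rng(0)
real = rng.normal(size=(200, 4))
shifted = real + np.array([3.0, 4.0, 0.0, 0.0])

same_score = frechet_distance(real, real)        # identical sets: distance near 0
shifted_score = frechet_distance(real, shifted)  # same covariance, shifted mean
```

In this toy setup the two sets differ only by a constant shift, so the distance reduces to the squared length of the shift vector. With real features, both the mean term and the covariance term contribute.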
