PromptEnhancer – an open-source text-to-image prompt enhancement framework by Tencent

AI Tools updated 2d ago dongdong

What is PromptEnhancer?

PromptEnhancer is an open-source prompt rewriting framework developed by Tencent’s Hunyuan team for text-to-image (T2I) models. It combines Chain-of-Thought (CoT) prompt rewriting with a dedicated reward model, AlignEvaluator, to improve how well T2I models follow complex user instructions and how accurately the generated images reflect them. Because the framework leaves the T2I model’s weights untouched, it works as a general-purpose, plug-and-play module across pretrained models. The rewriter itself is trained in two stages, supervised fine-tuning followed by reinforcement learning, so that its prompts yield images that better match user intent.


Key Features of PromptEnhancer

  • Improved Accuracy and Alignment for T2I Models: By rewriting user prompts before generation, PromptEnhancer brings T2I outputs closer to user intent, including on complex instructions that involve attribute binding, negation, and intricate relational descriptions.

  • General-Purpose and Plug-and-Play: Works with multiple pretrained models such as HunyuanImage, Stable Diffusion, and Imagen without modifying model weights, reducing optimization costs.

  • High-Quality Benchmark Dataset: The open-source benchmark includes 6,000 prompts with multi-dimensional, fine-grained annotations, giving researchers a resource for studying interpretability and reproducibility in prompt optimization.
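
The plug-and-play idea can be illustrated with a minimal sketch: the rewriter sits in front of the image generator and only changes the text the generator receives, so the generator's weights are never touched. The names below (Rewriter, Generator, with_prompt_enhancer, and the toy functions) are hypothetical and chosen for illustration; they are not part of the PromptEnhancer codebase.

```python
from typing import Callable

# Hypothetical type aliases for this sketch only.
Rewriter = Callable[[str], str]      # user prompt -> enhanced prompt
Generator = Callable[[str], bytes]   # prompt -> encoded image


def with_prompt_enhancer(rewrite: Rewriter, generate: Generator) -> Generator:
    """Wrap a T2I generator so every prompt is rewritten before generation.

    The generator is treated as a black box: nothing inside it is modified.
    """
    def enhanced_generate(prompt: str) -> bytes:
        return generate(rewrite(prompt))
    return enhanced_generate


# Stand-ins so the sketch runs without any model installed.
def toy_rewrite(prompt: str) -> str:
    return f"{prompt}, detailed, soft natural lighting"


def toy_generate(prompt: str) -> bytes:
    # A real backend would return image data; here we echo the prompt.
    return prompt.encode("utf-8")


pipeline = with_prompt_enhancer(toy_rewrite, toy_generate)
result = pipeline("a red cube on a blue sphere")
```

Swapping toy_generate for a different backend requires no change to the rewriter, which is the sense in which the approach is model-agnostic.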


Technical Principles of PromptEnhancer

  • Chain-of-Thought (CoT) Prompt Rewriting: Introduces a CoT mechanism that mimics a human designer’s reasoning, working through a simple user instruction in three steps: “core elements – potential ambiguities – detail supplementation.”

  • Dedicated Reward Model (AlignEvaluator): A reward model trained on large-scale annotated data that scores generated images against a 6-category, 24-dimension evaluation system and returns a “precision score.” Dimensions include language understanding (e.g., negation, pronoun reference), visual attributes (e.g., object count, material, expression), and complex relations (e.g., inclusion, similarity, counterfactual scenarios).
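
To make the two components more concrete, here is a small sketch of how a three-step rewriting trace and a multi-dimension score could be represented. Everything in it is an assumption made for illustration: the CoTRewrite class, the precision_score function, and the equal weighting of dimensions and categories are not taken from the PromptEnhancer release, and the article does not say how AlignEvaluator combines its dimensions.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class CoTRewrite:
    """One rewriting trace, following the three steps named in the text."""
    core_elements: list[str]   # step 1: what the user explicitly asked for
    ambiguities: list[str]     # step 2: what the instruction leaves open
    added_details: list[str]   # step 3: details chosen to resolve the gaps

    def refined_prompt(self) -> str:
        # The final prompt keeps the core elements and appends the details.
        return ", ".join(self.core_elements + self.added_details)


def precision_score(scores: dict[str, dict[str, float]]) -> float:
    """Average scores per category, then across categories.

    Equal weighting is an assumption of this sketch.
    """
    return mean(mean(dims.values()) for dims in scores.values())


trace = CoTRewrite(
    core_elements=["a cat", "no collar"],
    ambiguities=["breed and setting are unspecified"],
    added_details=["short-haired tabby", "sitting on a wooden windowsill"],
)

# Only two of the six categories are shown, with made-up scores in [0, 1].
scores = {
    "language understanding": {"negation": 1.0, "pronoun reference": 0.5},
    "visual attributes": {"object count": 1.0, "material": 0.5},
}
overall = precision_score(scores)
```

Keeping per-dimension scores separate until the last step makes it easy to see which kind of instruction (for example, negation) a generated image failed on.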

Two-Stage Training

  1. Stage 1 – Supervised Fine-Tuning (SFT): Initializes the CoT rewriter so that it generates grammatically and logically refined prompts. Training uses large datasets of “original prompt – chain-of-thought – refined prompt” triples generated by large models.

  2. Stage 2 – Group Relative Policy Optimization (GRPO): The rewriter produces multiple candidate prompts for each input. Each candidate is fed into a frozen T2I model, and AlignEvaluator scores the resulting image. Candidates with higher rewards are reinforced, so the rewriter learns to produce prompts that maximize alignment between generated images and user intent.
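
In the usual formulation of GRPO, each candidate's reward is normalized against the other candidates generated for the same input, so no separate value network is needed. The sketch below shows only that normalization step. The reward values are made up (in the real pipeline they would come from AlignEvaluator scoring images produced by the frozen T2I model), and the use of the population standard deviation and the eps constant are assumptions of this sketch.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize rewards within one group of candidate prompts.

    advantage_i = (reward_i - group mean) / (group std + eps)
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four candidate rewrites of the same user prompt.
rewards = [0.2, 0.4, 0.6, 0.8]
advantages = group_relative_advantages(rewards)

# Candidates above the group mean get a positive advantage and are reinforced;
# those below the mean get a negative advantage and are discouraged.
best = max(range(len(rewards)), key=advantages.__getitem__)
```

When every candidate in a group receives the same reward, all advantages are zero and that group contributes no learning signal, which is one reason to sample several diverse candidates per prompt.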



Application Scenarios of PromptEnhancer

  • Advertising Design: Quickly generate high-quality posters and promotional materials, improving design efficiency.

  • Illustration Creation: Helps illustrators rapidly produce creative sketches, saving time and effort.

  • Game Design: Assists game developers in generating concept art for characters, scenes, and props, accelerating game development.

  • Social Media Content: Quickly create engaging images and videos for social media, enhancing content appeal.

  • Video Production: Generate high-quality frames or concept visuals for video creation, aiding editing and special effects production.
