UniCombine: DiT-Powered Triple-Control Fusion — MMDiT + LoRA Redefining Controllable Generation
UniCombine is a Diffusion Transformer (DiT)-based framework for controllable generation that accepts text prompts, spatial maps, and subject images in arbitrary combinations. It introduces a Conditional MMDiT (Multi-Modal Diffusion Transformer) Attention mechanism and adds a trainable LoRA module, which together support both a training-free and a training-based version of the framework. The authors also open-source the SubjectSpatial200K dataset, described as the first large-scale benchmark addressing the scarcity of multi-condition generation data. UniCombine is reported to achieve state-of-the-art performance across multiple multi-conditional generation tasks, with strong framework versatility and consistency with the supplied conditions, which the authors present as a step toward more controllable content creation.
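For readers unfamiliar with LoRA, the sketch below illustrates the general idea behind a trainable low-rank adapter on top of a frozen weight. It is a generic, dependency-free illustration and not UniCombine's code: the class name `LoRALinear`, the helper `matvec`, and the `rank` and `alpha` values are all assumptions chosen for the example.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRALinear:
    """Hypothetical example: frozen weight W plus update (alpha / rank) * B @ A."""

    def __init__(self, weight, rank=2, alpha=4.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen base weight, shape (d_out, d_in)
        # A gets small random values and B starts at zero, so the adapted
        # layer initially reproduces the frozen base layer exactly.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.weight, x)               # frozen path: W x
        delta = matvec(self.B, matvec(self.A, x))   # low-rank path: B (A x)
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_parameter_count(self):
        # Only A and B would be updated during fine-tuning.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


# Usage: a 4x4 identity stands in for a frozen pretrained projection.
W = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
layer = LoRALinear(W, rank=2)
print(layer.forward([1.0, 2.0, 3.0, 4.0]))
print(layer.trainable_parameter_count())
```

Because `B` starts at zero, an untrained adapter leaves the base model's behavior unchanged. This property is what makes it natural to pair a frozen backbone with an optional trained module, as in the training-free and training-based versions described above.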
© Copyright Notice
The copyright of the article belongs to the author. Please do not reprint without permission.