FIBO – An open-source image generation model and the first to natively support JSON

What is FIBO?

FIBO is the first open-source text-to-image model to natively support JSON prompts, and it is designed specifically for long, structured descriptions. Trained on over 100 million structured JSON descriptions (each around 1,000 words), FIBO gives users precise and reproducible control over lighting, composition, color, and camera parameters.
It supports three modes (generation, refinement, and inspiration) and features attribute disentanglement, which lets users adjust an individual property (e.g., camera angle) without disrupting the rest of the scene.
FIBO is trained on 100% licensed data, which supports compliance and legal transparency and makes it suitable for professional production workflows.
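
To make attribute-level control concrete, here is a minimal Python sketch of what a structured prompt and a single-attribute edit might look like. The field names (subject, lighting, composition, color, camera) and the helper functions are illustrative assumptions, not FIBO's actual schema or API; the project repository documents the real format.

```python
import copy
import json

# Hypothetical structured prompt; the keys are illustrative, not FIBO's real schema.
prompt = {
    "subject": "a ceramic teapot on a wooden table",
    "lighting": {"type": "soft window light", "direction": "left"},
    "composition": {"framing": "centered", "depth_of_field": "shallow"},
    "color": {"palette": ["warm beige", "teal"]},
    "camera": {"angle": "eye level", "focal_length_mm": 50},
}


def edit_attribute(base, path, value):
    """Return a copy of `base` with the value at `path` replaced; `base` is untouched."""
    edited = copy.deepcopy(base)
    node = edited
    for key in path[:-1]:
        node = node[key]
    node[path[-1]] = value
    return edited


def changed_paths(a, b, prefix=()):
    """List the key paths whose values differ between two nested dicts."""
    paths = []
    for key in sorted(set(a) | set(b)):
        left, right = a.get(key), b.get(key)
        if isinstance(left, dict) and isinstance(right, dict):
            paths.extend(changed_paths(left, right, prefix + (key,)))
        elif left != right:
            paths.append(prefix + (key,))
    return paths


# Change only the camera angle; every other field keeps its original value.
low_angle = edit_attribute(prompt, ("camera", "angle"), "low angle")
print(json.dumps(low_angle["camera"]))
print(changed_paths(prompt, low_angle))  # [('camera', 'angle')]
```

Because the edit touches exactly one key path, the two descriptions can be compared field by field, which is what makes a structured prompt easier to reproduce and audit than free-form text.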

Key Features of FIBO

Text-to-Image Generation:
Generates high-quality images from textual descriptions provided by the user.
Structured JSON Prompts:
Expands short text prompts into detailed, structured JSON descriptions that cover elements such as lighting, composition, and color.
Iterative & Controllable Generation:
Supports generating images from short prompts or refining existing images through multiple rounds of structured JSON updates.
Attribute Disentanglement Control:
Allows individual attributes (such as camera angle or lighting) to be adjusted without affecting the rest of the scene.
Inspiration Mode:
Extracts structured prompts from input images to generate related visual outputs, sparking creative ideas.
Enterprise-Grade Compliance:
Trained on 100% licensed data, ensuring full compliance, transparency, and reproducibility.
Production-Ready Integration:
Supports API access, ComfyUI nodes, and local inference, making it easy to integrate into professional pipelines.
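
The iterative refinement described in the list above can be pictured as a series of partial JSON updates applied to a running description. The following Python sketch shows one way to do that with a recursive merge. It is an assumption about how a client might manage prompt state, not FIBO's own implementation, and the keys and the merge_update helper are hypothetical.

```python
import copy


def merge_update(base, update):
    """Recursively merge a partial `update` into a copy of `base`."""
    merged = copy.deepcopy(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_update(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical starting description (illustrative keys, not FIBO's real schema).
initial = {
    "subject": "a lighthouse at dusk",
    "lighting": {"type": "golden hour", "direction": "back"},
    "camera": {"angle": "eye level"},
}

# Each refinement round names only the fields it wants to change.
rounds = [
    {"lighting": {"type": "overcast"}},
    {"camera": {"angle": "aerial"}},
]

# Keep every intermediate description so any round can be regenerated later.
history = [initial]
for update in rounds:
    history.append(merge_update(history[-1], update))

final = history[-1]
print(final)
```

Keeping the full history rather than mutating one dictionary means each round's exact prompt stays available, so an earlier image can be regenerated or a later change rolled back.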

Technical Architecture of FIBO

Architecture:
Built on an 8B-parameter Diffusion Transformer (DiT) architecture and trained with the Flow Matching approach.
Text Encoder:
Uses SmolLM3-3B combined with a novel DimFusion conditioning architecture, enabling efficient training on long structured descriptions.
VAE (Variational Autoencoder):
Uses the Wan 2.2 VAE for image encoding and decoding.
VLM Guidance:
Employs a vision-language model (VLM) to expand short text prompts into detailed structured JSON prompts.
Structured Supervision:
Training on structured JSON descriptions promotes attribute disentanglement and helps prevent prompt drift.
Data Compliance:
Trained on over 100 million long, structured JSON descriptions drawn entirely from licensed data, ensuring data legality and ethical transparency.
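
For readers unfamiliar with Flow Matching, the toy Python sketch below illustrates the general idea on a three-number stand-in for an image latent. The article does not specify FIBO's exact formulation, so the straight-line path between noise and data used here (a common choice) is an assumption, as are all function names. An "oracle" velocity replaces the trained network so the example stays self-contained.

```python
import random


def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Regression target along the straight path: d/dt x_t = x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(predicted_v, x0, x1):
    """Mean squared error between a predicted velocity and the target velocity."""
    target = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(predicted_v, target)) / len(target)


def euler_sample(x0, velocity_fn, steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


rng = random.Random(0)
data = [0.5, -1.0, 2.0]                      # toy stand-in for an image latent
noise = [rng.gauss(0.0, 1.0) for _ in data]  # sample from the noise distribution

# During training, a network would see interpolate(noise, data, t) and be
# penalised by flow_matching_loss for mispredicting target_velocity(noise, data).
midpoint = interpolate(noise, data, 0.5)

# Oracle velocity field standing in for the trained network (an assumption).
def oracle(x, t):
    return target_velocity(noise, data)

sample = euler_sample(noise, oracle, steps=10)
print(sample)
```

With the oracle velocity, Euler integration carries the noise sample to the data point, which is the behaviour a trained velocity network is meant to approximate at generation time.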

Project Links

GitHub Repository: https://github.com/Bria-AI/FIBO
HuggingFace Model Page: https://huggingface.co/briaai/FIBO
Online Demo: https://huggingface.co/spaces/briaai/FIBO

Application Scenarios of FIBO

Professional Design & Creative Workflows:
Generates high-quality images for advertising, product design, and graphic design with rapid iteration and precise control to boost creative efficiency.
Film and Entertainment:
Creates concept art and scene designs for movies, games, and animations, accelerating the visual development process.
Education & Training:
Produces educational illustrations and virtual experiment environments to enhance learning and content creation.
Scientific Research:
Converts scientific data into intuitive visual representations, supporting research communication and data visualization.
Medical & Healthcare:
Generates medical diagrams and virtual surgery scenes, aiding medical education and surgical training.