Concept Lancet – An Image Editing Framework Developed by the University of Pennsylvania

What is Concept Lancet?

Concept Lancet (CoLan) is a plug-and-play, zero-shot image editing framework developed by a research team at the University of Pennsylvania. It sparsely decomposes the latent representation of a source image into a linear combination of visual concepts, then transplants concepts according to the editing task: replacing, adding, or removing them.
CoLan leverages the CoLan-150K dataset, which contains over 150,000 descriptions and scenarios of visual concepts, to estimate how strongly each concept is present in an image and to keep edits precise and visually consistent.

Key Features of Concept Lancet

Precise Concept Replacement: Supports accurately replacing one concept in an image with another (e.g., replacing “cat” with “dog”).
Concept Addition and Removal: Supports adding new concepts to images (e.g., “adding watercolor style”) or removing existing ones (e.g., “removing clouds from the background”).
Visual Consistency Preservation: Maintains the overall visual consistency of the image during editing, avoiding visual distortion caused by over-editing or under-editing.
Zero-Shot Plug-and-Play: Can be directly applied to existing diffusion models without the need for retraining or fine-tuning, providing strong generality and flexibility.

Technical Principles of Concept Lancet

Concept Dictionary Construction:
- Visual Concept Extraction: Vision-Language Models (VLMs) parse the input image and prompt to generate a list of visual concepts relevant to the editing task, covering objects, attributes, scenes, etc.
- Concept Stimuli Generation: Large Language Models (LLMs) generate diverse descriptions and scenarios (called concept stimuli) for each concept, capturing the appearance of concepts in different contexts.
- Concept Vector Extraction: Concept stimuli are mapped into the latent space of diffusion models (such as text embedding space or score space) to extract representative vectors for each concept, forming the concept dictionary.
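The dictionary-building step above can be sketched in a few lines. This is a minimal illustration, not CoLan's actual code: it assumes each concept's stimuli have already been embedded into vectors (random toy data stands in for real text embeddings here), and it assumes a concept vector is simply the normalized mean of its stimulus embeddings. The names `concept_vector` and `build_dictionary` are hypothetical.

```python
import numpy as np

def concept_vector(stimulus_embeddings: np.ndarray) -> np.ndarray:
    """Collapse (n_stimuli, dim) embeddings into one unit-norm concept direction.

    Assumption: the mean of the stimuli is used as the representative vector.
    """
    mean = stimulus_embeddings.mean(axis=0)
    return mean / np.linalg.norm(mean)

def build_dictionary(stimuli_by_concept: dict[str, np.ndarray]) -> tuple[list[str], np.ndarray]:
    """Stack one vector per concept as the columns of a (dim, n_concepts) matrix."""
    names = list(stimuli_by_concept)
    columns = [concept_vector(stimuli_by_concept[name]) for name in names]
    return names, np.stack(columns, axis=1)

# Toy stand-in for embedded concept stimuli: 5 stimuli per concept, 8 dimensions.
rng = np.random.default_rng(0)
toy_stimuli = {
    name: rng.normal(loc=i, size=(5, 8))
    for i, name in enumerate(["cat", "dog", "cloud"], start=1)
}
names, D = build_dictionary(toy_stimuli)
```

Averaging many stimuli per concept is one simple way to make the resulting direction less sensitive to the wording of any single description.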
Sparse Decomposition:
The latent representation of the input image (e.g., its text embedding or score) is decomposed into a linear combination of entries in the concept dictionary. The sparse coefficients are found by minimizing the reconstruction error plus a sparsity-inducing regularizer such as an L1 penalty, which yields an accurate and concise estimate of how strongly each concept is present in the source image.
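As a rough sketch of the decomposition described above, the following solves an L1-regularized least-squares problem, minimizing 0.5 * ||z - D w||^2 + lam * ||w||_1, with the iterative soft-thresholding algorithm (ISTA). The solver choice, the value of `lam`, and the toy dictionary are assumptions made for illustration; the source does not say which solver CoLan uses.

```python
import numpy as np

def soft_threshold(x: np.ndarray, t: float) -> np.ndarray:
    """Proximal operator of the L1 norm: shrink every entry toward zero by t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_decompose(D: np.ndarray, z: np.ndarray, lam: float = 0.1, n_iter: int = 500) -> np.ndarray:
    """Minimize 0.5 * ||z - D w||^2 + lam * ||w||_1 with ISTA."""
    # Step size 1/L, where L is the largest eigenvalue of D^T D.
    step = 1.0 / np.linalg.norm(D, 2) ** 2
    w = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ w - z)  # gradient of the reconstruction term
        w = soft_threshold(w - step * grad, step * lam)
    return w

# Toy dictionary with 6 unit-norm concept vectors in a 16-dimensional latent space.
rng = np.random.default_rng(0)
D = rng.normal(size=(16, 6))
D /= np.linalg.norm(D, axis=0)

# A latent built from only two concepts; the solver should recover that sparsity.
w_true = np.array([0.0, 1.5, 0.0, 0.0, -0.8, 0.0])
z = D @ w_true
w = sparse_decompose(D, z, lam=0.01)
```

A larger `lam` pushes more coefficients to exactly zero at the cost of a looser reconstruction, which is the trade-off between concision and accuracy that the decomposition has to balance.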
Concept Transplantation:
Based on the editing task (replacement, addition, or removal), the decomposed coefficients are adjusted accordingly. For a replacement, for example, the contribution estimated for the source concept is reassigned to the target concept, so the target appears with the same strength the source had. The adjusted coefficients are recombined into a new latent representation, and the edited image is produced by the diffusion model's generation process.
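The three edit types can be illustrated on a toy decomposition. This is a sketch under stated assumptions rather than CoLan's implementation: the function name `transplant` and its `strength` parameter are hypothetical, and the part of the latent that the dictionary does not explain (the residual) is assumed to be carried over unchanged.

```python
import numpy as np

def transplant(z, D, w, names, source=None, target=None, strength=1.0):
    """Return an edited latent.

    source and target -> replace (the source coefficient moves to the target)
    source only       -> remove
    target only       -> add the target with the given strength
    """
    residual = z - D @ w  # unexplained part of the latent, kept as-is (assumption)
    w_new = w.copy()
    if source is not None:
        s = names.index(source)
        moved = w_new[s]
        w_new[s] = 0.0
    else:
        moved = strength
    if target is not None:
        w_new[names.index(target)] += moved
    return D @ w_new + residual

# Toy setup: three orthogonal concept vectors in a 4-dimensional latent space.
names = ["cat", "dog", "cloud"]
D = np.eye(4)[:, :3]
w = np.array([2.0, 0.0, 0.5])
z = D @ w + np.array([0.0, 0.0, 0.0, 0.3])  # last entry is outside the dictionary

replaced = transplant(z, D, w, names, source="cat", target="dog")
removed = transplant(z, D, w, names, source="cloud")
added = transplant(z, D, w, names, target="dog", strength=1.0)
```

In a real pipeline the edited latent would then be passed to the diffusion model in place of the original text embedding or score.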
Dataset Support:
To model the concept space thoroughly, the CoLan-150K dataset was constructed, with over 150,000 concept descriptions and scenarios. These concept stimuli provide rich contextual information for each concept, enabling more accurate and robust concept vectors.

Project Links

Official Website: https://peterljq.github.io/project/colan/
GitHub Repository: https://github.com/peterljq/Concept-Lancet
arXiv Paper: https://arxiv.org/pdf/2504.02828

Application Scenarios of Concept Lancet

Creative Design: Quickly transform sketches into artworks, add branding elements, and boost design efficiency.
Film Production: Rapidly generate concept art and scene designs, modify character appearances, and adapt to different storylines.
Game Development: Generate varied game scenes and character variants (for example, switching a scene from day to night) to speed up development.
Education and Training: Produce educational illustrations, transform historical scenes into modern ones, and help students better understand content.
Social Media: Transform ordinary photos into artistic styles, add eye-catching elements, and enhance content appeal.