PrimitiveAnything: A Novel 3D Shape Generation Framework Jointly Launched by Tencent and Tsinghua University

What is PrimitiveAnything?

PrimitiveAnything is a 3D shape generation framework jointly developed by Tencent’s AI Platform Department (AIPD) and Tsinghua University. It decomposes complex 3D shapes into simple primitives, generates these primitives one at a time in an autoregressive manner, and assembles them into a complete 3D model. Its core strengths are generation quality, generalization across shapes, and efficiency.

Key Features of PrimitiveAnything

High-Quality Primitive-Based 3D Generation
It can generate high-fidelity assemblies of 3D primitives that are not only geometrically faithful to the original models but also aligned with human perceptual understanding of shape structures.
Diverse 3D Content Creation
It supports 3D content generation conditioned on text or images, giving users flexible ways to create content.
Efficient Storage and Editing
Leveraging primitive representations, the generated 3D models are more storage-efficient and easier to edit and modify.
Autoregressive Transformer Architecture
Utilizes an autoregressive Transformer to generate primitives step by step, handling sequences of varying lengths and allowing easy extension to new types of primitives.
Unambiguous Parametrization Scheme
Resolves ambiguity in parametrization to ensure stability and accuracy during training and generation.
Geometric Fidelity and Semantic Consistency
Maintains high geometric accuracy while producing semantically meaningful decompositions that align with human cognition.
Modular Design
The framework’s modular structure supports seamless integration of new primitive types without changing the overall architecture, enabling adaptability to various primitive representations.

Technical Principles of PrimitiveAnything

1. Unambiguous Parametrization Scheme

Unified Representation:
Uses a common parameterization scheme to represent various primitive types (e.g., cuboids, elliptical cylinders, ellipsoids). Each primitive’s type, position, rotation, and scale are encoded and fed into the model.
Ambiguity Elimination:
To resolve inherent ambiguities (e.g., different parameter combinations producing identical shapes), the team developed a comprehensive rule set. It analyzes the symmetry of each primitive and, among the rotation configurations that describe the same shape, selects the one with the minimal L1 norm, so that every primitive has a unique and stable representation.
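
The idea can be illustrated with a minimal sketch. It is not the framework's actual rule set, which this article does not spell out. The sketch assumes rotations are stored as Euler angles (composed as Rz·Ry·Rx) and considers only one symmetry shared by cuboids, elliptical cylinders, and ellipsoids: a 180° turn about any local axis leaves the shape unchanged. The names Primitive, canonical_rotation, and canonicalize are hypothetical.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Primitive:
    kind: str        # "cuboid", "elliptical_cylinder" or "ellipsoid"
    position: tuple  # (x, y, z)
    rotation: tuple  # Euler angles (rx, ry, rz) in radians (assumed convention)
    scale: tuple     # (sx, sy, sz)


def euler_to_matrix(rx, ry, rz):
    """Rotation matrix R = Rz(rz) * Ry(ry) * Rx(rx)."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def matrix_to_euler(R):
    """Inverse of euler_to_matrix, with ry in [-pi/2, pi/2]."""
    ry = math.asin(max(-1.0, min(1.0, -R[2][0])))
    if math.hypot(R[0][0], R[1][0]) > 1e-9:
        rx = math.atan2(R[2][1], R[2][2])
        rz = math.atan2(R[1][0], R[0][0])
    else:  # gimbal lock: rx and rz are coupled, so fix rx = 0
        rx = 0.0
        rz = math.atan2(-R[0][1], R[1][1])
    return (rx, ry, rz)


# Column sign patterns for the identity and the 180-degree turns about the
# local x, y and z axes. Multiplying R by one of these on the right gives a
# rotation that places the primitive identically.
SYMMETRY_FLIPS = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]


def canonical_rotation(rotation):
    """Among symmetry-equivalent rotations, return the one with minimal L1 norm."""
    R = euler_to_matrix(*rotation)
    candidates = []
    for signs in SYMMETRY_FLIPS:
        flipped = [[R[i][j] * signs[j] for j in range(3)] for i in range(3)]
        candidates.append(matrix_to_euler(flipped))
    return min(candidates, key=lambda angles: sum(abs(a) for a in angles))


def canonicalize(primitive):
    """Return a copy of the primitive with its rotation in canonical form."""
    return replace(primitive, rotation=canonical_rotation(primitive.rotation))
```

For example, a cuboid rotated by pi about its own x axis occupies the same space as an unrotated one, so canonicalize maps its rotation to approximately zero, while a small rotation such as (0.1, 0.2, 0.3) is left unchanged. A full rule set would also have to cover symmetries that permute the scale axes, which this sketch ignores.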

2. Autoregressive Transformer Architecture

Shape-Conditioned Generation:
Employs a decoder-only Transformer that generates variable-length sequences of primitives conditioned on shape features. A point cloud encoder extracts features from the 3D shape, and the autoregressive Transformer uses these features, together with the previously generated primitives, to predict the next primitive.
Cascaded Decoder:
To capture dependencies among primitive attributes, a cascaded decoder predicts the attributes (type, position, rotation, scale) one after another, with each prediction conditioned on the attributes already predicted. This models natural correlations (e.g., the primitive type may influence its plausible position or orientation) and aligns with human reasoning in assembly tasks.
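
The cascade can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the released model: the heads are random linear maps standing in for trained networks, the attributes are regressed as continuous values for simplicity, and the names CascadedDecoder, make_linear, and decode are hypothetical. What it shows is the conditioning order: each head sees the hidden state plus every attribute predicted before it.

```python
import math
import random

PRIMITIVE_TYPES = ["cuboid", "elliptical_cylinder", "ellipsoid"]


def make_linear(rng, n_out, n_in):
    """Random weight matrix standing in for a trained prediction head."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]


def linear(weights, x):
    """Matrix-vector product."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


class CascadedDecoder:
    """Predicts type, then position, then rotation, then scale."""

    def __init__(self, hidden_dim, seed=0):
        rng = random.Random(seed)
        n_types = len(PRIMITIVE_TYPES)
        # Each head's input grows by the attributes predicted before it.
        self.type_head = make_linear(rng, n_types, hidden_dim)
        self.pos_head = make_linear(rng, 3, hidden_dim + n_types)
        self.rot_head = make_linear(rng, 3, hidden_dim + n_types + 3)
        self.scale_head = make_linear(rng, 3, hidden_dim + n_types + 6)

    def decode(self, hidden):
        n_types = len(PRIMITIVE_TYPES)
        logits = linear(self.type_head, hidden)
        type_idx = max(range(n_types), key=lambda i: logits[i])

        context = list(hidden) + one_hot(type_idx, n_types)
        position = linear(self.pos_head, context)    # conditioned on type

        context = context + position
        rotation = linear(self.rot_head, context)    # conditioned on type, position

        context = context + rotation
        raw_scale = linear(self.scale_head, context)  # conditioned on all of the above
        scale = [math.exp(v) for v in raw_scale]      # exp keeps scales positive

        return {
            "type": PRIMITIVE_TYPES[type_idx],
            "position": position,
            "rotation": rotation,
            "scale": scale,
        }
```

Calling CascadedDecoder(hidden_dim=8).decode([0.1] * 8) returns one primitive as a dictionary. In a full model the hidden vector would come from the Transformer at the current sequence position.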

3. Autoregressive Generation Process

Sequence Generation:
Reframes the primitive abstraction process as a sequence generation task. Given a point cloud as input, the model generates a primitive sequence autoregressively until a special end token is predicted.
Training Objectives:
Training combines a cross-entropy loss, a Chamfer distance term (for reconstruction accuracy), and Gumbel-Softmax (for differentiable sampling), enabling flexible and human-like decomposition of complex 3D shapes.
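
The pieces named above can be sketched as three small, dependency-free functions. This is an illustration under assumptions, not the project's training code. The article does not say how the loss terms are weighted or where Gumbel-Softmax is applied. The END marker, the predict_next callback, and the squared-distance form of the Chamfer distance are choices made here for the example.

```python
import math
import random

END = "<end>"  # hypothetical end-of-sequence marker


def generate(predict_next, shape_features, max_len=64):
    """Autoregressive loop: feed generated primitives back in until END is predicted."""
    sequence = []
    while len(sequence) < max_len:
        nxt = predict_next(shape_features, sequence)
        if nxt == END:
            break
        sequence.append(nxt)
    return sequence


def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance using squared Euclidean distances.

    For each point, take the squared distance to its nearest neighbour in the
    other set, average over the set, and add both directions.
    """
    def one_way(src, dst):
        total = 0.0
        for a in src:
            total += min(sum((p - q) ** 2 for p, q in zip(a, b)) for b in dst)
        return total / len(src)

    return one_way(points_a, points_b) + one_way(points_b, points_a)


def gumbel_softmax(logits, temperature=1.0, rng=random):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / temperature)."""
    noisy = []
    for logit in logits:
        # Clamp the uniform sample away from 0 and 1 so both logs are defined.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    peak = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]
```

In a training setup, the Chamfer term would compare points sampled from the assembled primitives against the input point cloud, and a lower temperature would push the Gumbel-Softmax output closer to a hard one-hot choice.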

Project Links for PrimitiveAnything

Project Website: https://primitiveanything.github.io/
GitHub Repository: https://github.com/PrimitiveAnything/PrimitiveAnything
HuggingFace Model Hub: https://huggingface.co/hyz317/PrimitiveAnything
arXiv Technical Paper: https://arxiv.org/pdf/2505.04622

Application Scenarios of PrimitiveAnything

3D Modeling and Design
Enables rapid generation of the geometric “skeleton” of complex 3D models, so designers can spend their time on detailed refinement and work faster.
Game Asset Generation
Game designers can quickly produce various scene and character models. Players can also create new characters or items by assembling basic geometric blocks, with AI automatically optimizing and integrating them into game engines.
User-Generated Content (UGC)
Supports generating 3D content from text or image inputs. Users can easily edit results, opening up new possibilities for UGC in gaming environments.
Virtual and Augmented Reality (VR/AR)
In immersive VR and AR settings, PrimitiveAnything can rapidly generate realistic 3D objects to enhance user experience.