SenseNova-SI — SenseTime’s open-source Spatial Intelligence large model
What is SenseNova-SI?
SenseNova-SI is SenseTime's open-source spatial intelligence large model, built specifically to strengthen spatial understanding. Trained on large-scale, high-quality spatial datasets, it delivers markedly stronger capabilities in core dimensions such as spatial measurement, relational understanding, and viewpoint transformation.
On multiple authoritative benchmarks, SenseNova-SI outperforms open-source models of similar size and even surpasses top-tier closed-source models such as GPT-5. The project provides detailed installation and usage guides so developers can get started quickly, and it aims to advance embodied AI and world-model research while laying the foundation for AI systems that can understand the 3D world.
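Because the checkpoints are distributed through HuggingFace and the model builds on InternVL-family backbones (see Technical Principles below), a quick start will typically resemble the minimal sketch that follows. This is illustrative rather than the official guide: the checkpoint id is a placeholder, and the `trust_remote_code` loading path, the 448x448 preprocessing, and the `chat()` helper are assumptions carried over from InternVL model cards.

```python
# Hypothetical quick start, assuming an InternVL-style checkpoint on HuggingFace.
# pip install torch torchvision transformers pillow
import torch
from PIL import Image
from torchvision import transforms
from transformers import AutoModel, AutoTokenizer

# Placeholder id: substitute a real checkpoint from the SenseNova-SI collection.
MODEL_ID = "sensenova/<checkpoint-name>"

model = AutoModel.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, trust_remote_code=True
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True, use_fast=False)

# Single-tile preprocessing in the InternVL style: 448x448 input, ImageNet statistics.
preprocess = transforms.Compose([
    transforms.Resize((448, 448)),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])
pixel_values = preprocess(Image.open("kitchen.jpg").convert("RGB")).unsqueeze(0)
pixel_values = pixel_values.to(torch.bfloat16).cuda()

# A spatial-measurement style question; InternVL checkpoints expose a chat() helper.
question = "<image>\nRoughly how far is the mug from the edge of the table, in centimetres?"
response = model.chat(tokenizer, pixel_values, question,
                      dict(max_new_tokens=256, do_sample=False))
print(response)
```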

Key Features of SenseNova-SI
- Spatial measurement and estimation: Accurately estimates object dimensions, distances, and other quantitative spatial attributes.
- Spatial relationship understanding: Understands relative positions, orientations, and spatial layouts between objects.
- Viewpoint transformation: Handles changes in scene appearance from different viewpoints and infers the effects of viewpoint shifts.
- Spatial reconstruction and deformation: Understands the 3D structure of objects and maintains spatial awareness after deformation or reconstruction.
- Spatial reasoning: Performs logical reasoning based on spatial information, such as predicting object movement or layout changes.
- Multimodal fusion: Integrates image, text, and other modalities to better comprehend complex spatial scenes.
Technical Principles of SenseNova-SI
- Scale effect: Through training on massive amounts of high-quality spatial data, SenseTime validates a "scale effect" in which increased data volume leads to significant improvements in spatial cognition. This is the core driver behind SenseNova-SI's performance jump.
- Systematic training methodology: SenseTime defines a classification framework for spatial capabilities and expands the dataset based on it, applying a systematic training approach that improves all dimensions of spatial intelligence in a consistent manner.
- Multimodal fusion architecture: Built on architectures such as InternVL, SenseNova-SI effectively fuses visual and textual information to enhance understanding of complex scenes, as sketched below.
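To make the fusion path concrete, the sketch below shows the general InternVL-style recipe in PyTorch: patch features from a vision encoder are projected into the language model's embedding space by a small MLP and concatenated with the text token embeddings before decoding. This is a schematic under that assumption, not SenseNova-SI's actual implementation; the dimensions and module names are illustrative.

```python
import torch
import torch.nn as nn

class FusionSketch(nn.Module):
    """Illustrative InternVL-style projector that fuses visual and text tokens."""

    def __init__(self, vis_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        # Small MLP that maps vision-encoder features into the LLM embedding space.
        self.projector = nn.Sequential(
            nn.LayerNorm(vis_dim),
            nn.Linear(vis_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, vision_feats: torch.Tensor, text_embeds: torch.Tensor) -> torch.Tensor:
        # vision_feats: (batch, num_patches, vis_dim) from the vision encoder
        # text_embeds:  (batch, num_text_tokens, llm_dim) from the LLM embedding table
        visual_tokens = self.projector(vision_feats)
        # The concatenated sequence is what the language model decoder attends over.
        return torch.cat([visual_tokens, text_embeds], dim=1)

# Dummy shapes only, to show how the token streams are combined.
fused = FusionSketch()(torch.randn(1, 256, 1024), torch.randn(1, 32, 4096))
print(fused.shape)  # torch.Size([1, 288, 4096])
```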
Project Links
- GitHub repository: https://github.com/OpenSenseNova/SenseNova-SI
- HuggingFace model collection: https://huggingface.co/collections/sensenova/sensenova-si
Application Scenarios of SenseNova-SI
- Autonomous driving: With accurate spatial measurement and viewpoint transformation, vehicles can better interpret road environments, predict object movement, and improve safety and reliability.
- Robotic navigation and manipulation: Spatial relationship understanding and reasoning enable robots to navigate complex environments and manipulate objects with greater precision.
- Virtual reality and augmented reality: Enhances the realism of virtual spaces and enables more natural user interaction in VR/AR environments.
- Intelligent security systems: Analyzes surveillance footage with spatial intelligence to quickly detect anomalies or changes in object positions, improving the efficiency and accuracy of security monitoring.
- Architecture and spatial planning: Assists designers with 3D spatial layout planning and rapidly generates or optimizes design schemes using spatial reconstruction capabilities.