DeepSeek-V3.2-Exp – Experimental Version of DeepSeek's Open-Source AI Model


What is DeepSeek-V3.2?

DeepSeek-V3.2-Exp is an experimental AI model released by DeepSeek-AI. By introducing the DeepSeek Sparse Attention (DSA) mechanism, it significantly improves efficiency when processing long texts. The model is obtained by continued training from DeepSeek-V3.1-Terminus, with DSA as the only architectural change: a fine-grained sparse attention mechanism that uses a lightweight Lightning Indexer to select the key tokens each query should attend to, substantially accelerating both training and inference on long sequences.

In terms of performance, DeepSeek-V3.2-Exp achieves results comparable to DeepSeek-V3.1-Terminus on multiple public benchmarks, demonstrating strong capabilities across different domains. The model is open-sourced on Hugging Face and ModelScope, enabling researchers and developers to explore and apply it. Additionally, its API pricing has been significantly reduced, lowering costs for developers and promoting wider deployment in real-world applications.


Key Features of DeepSeek-V3.2-Exp

  • Architectural Innovation: Introduces DeepSeek Sparse Attention (DSA) on top of DeepSeek-V3.1-Terminus. Using the Lightning Indexer and fine-grained token selection, the model achieves significant efficiency gains, especially in long-text scenarios.

  • Performance Optimization: Maintains performance comparable to V3.1-Terminus on multiple public benchmarks while reducing core attention complexity from O(L²) to O(Lk), where L is the sequence length and k is the number of selected tokens, greatly improving long-text processing efficiency (see the back-of-the-envelope sketch after this list).

  • Open Source: Available on Hugging Face and ModelScope with detailed implementation and model weights for research and development purposes.

  • Cost Reduction: API pricing has been drastically lowered, allowing more developers to access the model at minimal cost, promoting broad adoption.

  • Platform Integration: The model has been deployed on official apps, web platforms, and mini-programs, providing users with a more efficient and cost-effective AI experience.
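
As a back-of-the-envelope illustration of the complexity claim above, the snippet below counts query-key pairs under dense versus sparse attention. The context length L and selection size k here are illustrative assumptions, not official configuration figures.

```python
# Dense attention scores every query against all L tokens (O(L^2) pairs),
# while DSA attends only to the top-k tokens picked by the indexer (O(L*k)).

L = 128_000   # context length in tokens (assumed, for illustration)
k = 2_048     # tokens kept per query under sparse attention (assumed)

dense_pairs = L * L    # query-key pairs scored by full attention
sparse_pairs = L * k   # pairs scored after top-k selection

print(f"dense:     {dense_pairs:.3e} pairs")
print(f"sparse:    {sparse_pairs:.3e} pairs")
print(f"reduction: {dense_pairs / sparse_pairs:.1f}x")  # = L / k = 62.5x here
```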


Technical Principles

  • Sparse Attention Mechanism: DSA computes index scores between each query token and its preceding tokens using the Lightning Indexer, then attends only to the highest-scoring key-value entries, achieving fine-grained sparse attention that improves efficiency on long texts (a minimal end-to-end sketch follows this list).

  • Lightning Indexer: The core component of DSA. It scores each query-preceding-token pair using a small number of index heads and a lightweight computation, quickly identifying the tokens most relevant to each query.

  • Fine-Grained Token Selection: Selects the top-k key-value entries per query based on the index scores and restricts attention computation to them, cutting unnecessary calculation and improving inference speed.

  • MLA-Based Implementation: DSA is implemented on top of the Multi-head Latent Attention (MLA) architecture in Multi-Query Attention (MQA) mode, so each latent key-value entry is shared across the query heads, which keeps the sparse computation efficient.

  • Continued Training & Optimization: Starting from the V3.1-Terminus checkpoint, the model goes through a dense warm-up phase, which aligns the Lightning Indexer with the existing dense attention pattern, followed by a sparse training phase that adapts the whole model to sparse attention (a toy version of the warm-up objective appears after the sketch below).
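
To make the pipeline in the bullets above concrete, here is a minimal PyTorch sketch: a simplified stand-in for the Lightning Indexer scores each query against its preceding tokens, the top-k entries per query are kept, and ordinary softmax attention runs on the selected subset. The single-head layout, all shapes, and the function names are illustrative assumptions; the real model implements this under MLA with optimized kernels.

```python
import torch
import torch.nn.functional as F

def lightning_indexer_scores(q_idx, k_idx):
    # Score every (query, preceding-token) pair with a few small "index
    # heads": ReLU-activated dot products summed over heads. A simplified
    # stand-in for the paper's indexer, not its exact formulation.
    # q_idx: [L, h_i, d_i], k_idx: [L, d_i] -> scores: [L, L]
    scores = torch.einsum("qhd,kd->qhk", q_idx, k_idx)
    return F.relu(scores).sum(dim=1)

def sparse_attention(q, k, v, index_scores, top_k):
    # Keep only the top_k highest-scoring previous tokens for each query,
    # then run ordinary softmax attention on that selected subset.
    L, d = q.shape
    causal = torch.tril(torch.ones(L, L, dtype=torch.bool))
    index_scores = index_scores.masked_fill(~causal, float("-inf"))
    selected = index_scores.topk(min(top_k, L), dim=-1).indices   # [L, top_k]

    mask = torch.full((L, L), float("-inf"))
    mask.scatter_(1, selected, 0.0)                   # 0 where selected
    mask = mask.masked_fill(~causal, float("-inf"))   # re-mask the future

    attn = (q @ k.T) / d ** 0.5 + mask
    return F.softmax(attn, dim=-1) @ v

# Toy run with random tensors; every shape here is an assumption.
L, d, h_i, d_i, top_k = 16, 64, 4, 32, 4
q, k, v = (torch.randn(L, d) for _ in range(3))
q_idx, k_idx = torch.randn(L, h_i, d_i), torch.randn(L, d_i)
out = sparse_attention(q, k, v, lightning_indexer_scores(q_idx, k_idx), top_k)
print(out.shape)  # torch.Size([16, 64])
```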
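
And a toy version of the dense warm-up objective: training the indexer to imitate the dense attention distribution can be written as a KL-divergence alignment loss. The tensor shapes and exact loss form are assumptions for illustration, not the official training recipe.

```python
import torch
import torch.nn.functional as F

# Dense warm-up (toy): the main model's attention pattern serves as a
# frozen target while the indexer's scores are trained to imitate it.
dense_attn = F.softmax(torch.randn(16, 16), dim=-1)        # target pattern (frozen)
indexer_scores = torch.randn(16, 16, requires_grad=True)   # trainable indexer output

warmup_loss = F.kl_div(
    F.log_softmax(indexer_scores, dim=-1), dense_attn, reduction="batchmean"
)
warmup_loss.backward()   # gradients flow only into the indexer here
print(float(warmup_loss), indexer_scores.grad.shape)
```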


Project Links

  • Hugging Face: https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp
  • ModelScope: https://modelscope.cn/models/deepseek-ai/DeepSeek-V3.2-Exp

How to Use DeepSeek-V3.2-Exp

  • Via API: Developers can call the DeepSeek-V3.2-Exp API to integrate its capabilities into applications; the reduced pricing lowers access costs (a minimal call sketch follows this list).

  • Local Deployment: Download the model weights from Hugging Face and follow the local deployment guide to run interactive inference (see the hedged loading sketch after this list).

  • Official Applications: Available on DeepSeek’s official app, web, and mini-program platforms for immediate use.

  • Fine-Tuning: Users can fine-tune the model for specific tasks or domains to improve performance in specialized scenarios.

  • Secondary Development: Open-source code allows developers to study model workings and implement custom adaptations.
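
For the API route, DeepSeek's endpoint is OpenAI-compatible, so the standard openai Python client works. A minimal call sketch follows; the model identifier is an assumption, so check the official API docs for the name that currently routes to V3.2-Exp.

```python
from openai import OpenAI  # pip install openai

# DeepSeek's API is OpenAI-compatible; only the base_url and key change.
client = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY", base_url="https://api.deepseek.com")

response = client.chat.completions.create(
    model="deepseek-chat",  # assumed alias for the latest chat model
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize this long report in five bullet points."},
    ],
)
print(response.choices[0].message.content)
```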
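
For local deployment, here is a hedged loading sketch, assuming the published checkpoint ships transformers-compatible remote code; the repo's own deployment guide (which may require custom sparse-attention kernels and a multi-GPU setup for a model of this size) takes precedence.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumes the checkpoint supports transformers with trust_remote_code;
# defer to the official repo's inference instructions where they differ.
repo = "deepseek-ai/DeepSeek-V3.2-Exp"
tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo, trust_remote_code=True, device_map="auto", torch_dtype="auto"
)

inputs = tok("Explain sparse attention in one paragraph.", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(tok.decode(out[0], skip_special_tokens=True))
```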


Application Scenarios

  • Long Text Processing: Ideal for tasks such as long document analysis and long-text generation, with sparse attention significantly improving efficiency.

  • Search & Information Retrieval: Useful for search agents, helping users retrieve information quickly and accurately.

  • Code Generation & Programming Assistance: Supports code completion, optimization, and other programming tasks, enhancing efficiency and code quality.

  • Mathematics & Logical Reasoning: Excels at complex math problem solving and multi-step logical reasoning.

  • Multilingual Processing: Supports cross-language text generation and translation tasks.

  • Intelligent Agents & Interaction: Serves as a core model for building intelligent assistants, chatbots, and other natural language interaction services.
