Tongyi DeepResearch – An open-source deep research agent launched by Alibaba
What is Tongyi DeepResearch?
Tongyi DeepResearch is an open-source deep research agent from Alibaba, built for long-horizon, deep information-retrieval tasks. It has 30 billion total parameters, of which about 3 billion are activated per token, and supports two inference modes: ReAct mode and a deep mode (Heavy Mode). Heavy Mode improves complex reasoning through an iterative research paradigm (IterResearch). The agent is trained on an end-to-end synthetic data pipeline that generates high-quality datasets without human intervention, helping it move past the capability limits of conventional agents. Its training pipeline covers agentic continual pre-training (Agentic CPT), supervised fine-tuning (SFT), and reinforcement learning (RL), forming a complete end-to-end training chain. Tongyi DeepResearch already powers several internal Alibaba applications, such as Amap’s AI-native travel agent and the legal product “Tongyi FaRui.”
Key features of Tongyi DeepResearch
- Long-horizon, deep information retrieval: Designed for complex, multi-step reasoning and planning tasks, making it suitable for academic research, market analysis, policy formulation, and similar scenarios.
- Multi-mode reasoning support: Offers a ReAct mode (following the “thought-action-observation” loop) for evaluating core capabilities, and a Heavy Mode that improves complex reasoning via the IterResearch iterative research paradigm.
- End-to-end synthetic data generation: Employs an in-house synthetic data solution to automatically produce high-quality datasets across the pre-training, fine-tuning, and RL stages, which reduces or eliminates manual data labeling and lets capabilities scale with data.
- End-to-end reinforcement learning: Uses customized RL algorithms (e.g., Group Relative Policy Optimization, GRPO) to align agent behavior with high-level objectives and improve adaptability and stability in dynamic environments.
- Practical deployment: Has been applied inside Alibaba in products such as Amap’s AI-native travel agent and the legal assistant “Tongyi FaRui,” demonstrating practical value.
- Open source and community driven: The project is fully open source, providing code, models, and data to encourage developer participation and collaborative innovation.
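The ReAct mode listed above can be illustrated with a minimal sketch of a thought-action-observation loop. This is not the project's implementation: `react_loop`, `toy_policy`, and `toy_search` are hypothetical names, and the language model is replaced by a hand-written policy function so the example runs on its own.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for the language model: given the transcript so far,
# it returns (thought, action_name, action_input).
Policy = Callable[[List[str]], Tuple[str, str, str]]


def react_loop(question: str, policy: Policy,
               tools: Dict[str, Callable[[str], str]],
               max_steps: int = 8) -> str:
    """Run a thought-action-observation loop until the policy answers."""
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        thought, action, arg = policy(transcript)
        transcript.append(f"Thought: {thought}")
        if action == "answer":            # terminal action: stop and return
            return arg
        observation = tools[action](arg)  # call the chosen tool
        transcript.append(f"Action: {action}[{arg}]")
        transcript.append(f"Observation: {observation}")
    return "no answer within step budget"


# Toy policy and tool, for illustration only.
def toy_policy(transcript: List[str]) -> Tuple[str, str, str]:
    if not any(line.startswith("Observation:") for line in transcript):
        return ("I should look this up.", "search", "Tongyi DeepResearch size")
    last_obs = transcript[-1].split(": ", 1)[1]
    return ("The observation answers the question.", "answer", last_obs)


def toy_search(query: str) -> str:
    return "30B total parameters, about 3B active"


result = react_loop("How large is the model?", toy_policy,
                    {"search": toy_search})
print(result)
```

Note that the transcript grows with every step in this sketch, so the whole history is passed back to the policy each time.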
Technical principles
- Full-pipeline synthetic data scheme: Automatically generates training data without human intervention, supporting a complete training pipeline from pre-training through fine-tuning to reinforcement learning.
- Iterative research paradigm (IterResearch): Breaks a complex task into multiple research rounds; each round dynamically reconstructs a compact workspace and follows a “think → synthesize → act” process to improve complex reasoning and decision quality.
- End-to-end reinforcement learning: Applies customized RL algorithms such as Group Relative Policy Optimization (GRPO) and keeps training signals matched to the model's current capabilities, improving adaptability and stability in dynamic settings.
- Large-scale continual pre-training: Builds an open-world knowledge memory from continuously updated documents, crawled data, and knowledge graphs, and uses it to generate multi-style (question, answer) pairs that steadily expand model capabilities.
- Automated data management: Optimizes training data in real time, guided by training dynamics, using fully automated data synthesis and a dynamic data funnel to stabilize and improve training performance.
- Stable and efficient tool sandbox: Provides a unified sandbox environment that handles concurrency and failures, ensuring reliable tool calls and fast, resilient agent interactions.
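The round structure described for IterResearch can be sketched as follows. This is an illustrative reading of the “think → synthesize → act” description, not the released implementation: the `Workspace` fields, `iter_research`, and the toy step and tool functions are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Workspace:
    """Compact state carried between rounds (hypothetical structure)."""
    question: str
    report: str = ""        # evolving summary of findings so far
    last_result: str = ""   # only the most recent tool output


# Hypothetical model step: reads the workspace and returns
# (updated_report, next_query), with next_query None when research is done.
Step = Callable[[Workspace], Tuple[str, Optional[str]]]


def iter_research(question: str, step: Step, tool: Callable[[str], str],
                  max_rounds: int = 5) -> Tuple[str, int]:
    ws = Workspace(question)
    for round_no in range(1, max_rounds + 1):
        report, query = step(ws)      # think + synthesize into the report
        if query is None:             # the step decided it can stop
            return report, round_no
        result = tool(query)          # act
        # Rebuild the workspace from essentials only; earlier raw tool
        # outputs are dropped rather than appended to a growing history.
        ws = Workspace(question, report, result)
    # Budget exhausted: return the report as of the last synthesis.
    return ws.report, max_rounds


# Toy step and tool, for illustration only.
def toy_step(ws: Workspace) -> Tuple[str, Optional[str]]:
    report = ws.report
    if ws.last_result:
        report = (report + " " + ws.last_result).strip()
    facts = report.count("fact")
    if facts >= 2:
        return report, None
    return report, f"query {facts + 1}"


def toy_tool(query: str) -> str:
    return f"fact for {query}."


final_report, rounds = iter_research("demo question", toy_step, toy_tool)
print(rounds, final_report)
```

The design point is that each round sees only the question, the current report, and the latest result, so the input stays bounded no matter how many rounds run.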
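For the GRPO bullet above, the core idea of the standard algorithm is that each sampled rollout is scored relative to the other rollouts for the same prompt, instead of against a learned value function. The sketch below shows only that group-relative advantage step in its textbook form; the project's customized variant is not specified here, and the function name, the `eps` constant, and the use of the population standard deviation are assumptions.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float],
                              eps: float = 1e-8) -> List[float]:
    """Normalize each reward against its own group's mean and spread."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group (assumption)
    # eps avoids division by zero when every rollout gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four rollouts for one prompt: two succeeded (reward 1), two failed (reward 0).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Successful rollouts receive a positive advantage and failed ones a negative advantage of equal size; if all rollouts in a group score the same, every advantage is zero and the group contributes no gradient signal.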
Project links
- Project page: https://tongyi-agent.github.io/blog/introducing-tongyi-deep-research/
- GitHub repository: https://github.com/Alibaba-NLP/DeepResearch
- Hugging Face model hub: https://huggingface.co/Alibaba-NLP/Tongyi-DeepResearch-30B-A3B
Application scenarios
- Academic research: Accelerates literature reviews and supports complex scholarly research workflows.
- Market analysis: Produces competitor analyses and industry trend reports to support corporate strategy.
- Legal research: Powers legal applications (e.g., “Tongyi FaRui”) that automatically retrieve statutes, precedents, and rulings and perform deep summarization and analysis.
- Travel and routing: Integrated with Amap to provide an AI-native travel agent that draws on real-time data for accurate trip planning.
- Complex information retrieval: Suited to multi-step reasoning and planning tasks such as cross-domain research, policy design, and other scenarios that require deep, iterative information synthesis.