Tongyi DeepResearch – An open-source deep research agent launched by Alibaba

AI Tools updated 1d ago dongdong

What is Tongyi DeepResearch?

Tongyi DeepResearch is an open-source deep research agent launched by Alibaba, designed for long-horizon, deep information-retrieval tasks. The model has about 30 billion total parameters, of which about 3 billion are activated per token, and it supports both a ReAct mode and a deep mode (Heavy Mode). Heavy Mode improves complex reasoning through an iterative research paradigm (IterResearch). The agent is trained on data from an end-to-end synthetic data pipeline that generates high-quality datasets without human intervention, which helps push past conventional limits on agent capability. Its training pipeline covers agentic continual pretraining (Agentic CPT), supervised fine-tuning (SFT), and reinforcement learning (RL), forming a complete end-to-end training chain. Tongyi DeepResearch already powers multiple internal Alibaba applications, such as Amap’s AI-native travel agent and the legal product “Tongyi FaRui.”


Key features of Tongyi DeepResearch

  • Long-horizon, deep information retrieval: Designed for complex, multi-step reasoning and planning tasks, such as those found in academic research, market analysis, and policy formulation.

  • Multi-mode reasoning support: Offers a ReAct mode, which follows the “thought → action → observation” loop and is used to evaluate the model's core capabilities, and a Heavy Mode, which improves complex reasoning through the IterResearch iterative research paradigm.

  • End-to-end synthetic data generation: Employs an in-house synthetic data solution that automatically produces high-quality datasets for the pretraining, fine-tuning, and RL stages, removing the dependence on manual data labeling and making it easier to scale capabilities.

  • End-to-end reinforcement learning: Uses customized RL algorithms (e.g., Group Relative Policy Optimization, GRPO) to align agent behaviors with high-level objectives and improve adaptability and stability in dynamic environments.

  • Practical deployment: Has been applied inside Alibaba in products such as Amap’s AI-native travel agent and the legal assistant “Tongyi FaRui,” demonstrating practical value.

  • Open source and community driven: The project is fully open source, providing code, models, and data to encourage developer participation and collaborative innovation.
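
The ReAct mode listed above can be illustrated with a minimal sketch. This is not the project's actual inference code: `react_loop`, `scripted_policy`, and the `search` tool are hypothetical names, and the scripted policy stands in for the language model purely so the example runs on its own. The sketch only shows the general shape of a thought, action, observation loop in which the full transcript is passed back to the policy at every step.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: a policy maps the transcript so far to
# (thought, action name, action argument). A real agent would call an LLM here.
Policy = Callable[[List[str]], Tuple[str, str, str]]
Tool = Callable[[str], str]


def react_loop(question: str, policy: Policy, tools: Dict[str, Tool],
               max_steps: int = 8) -> str:
    """Run a thought -> action -> observation loop until the policy answers."""
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        thought, action, arg = policy(transcript)
        transcript.append(f"Thought: {thought}")
        if action == "answer":
            return arg  # the policy decided it has enough information
        tool = tools.get(action)
        observation = tool(arg) if tool else f"unknown tool: {action}"
        # The whole history, including every observation, stays in context.
        transcript.append(f"Action: {action}[{arg}]")
        transcript.append(f"Observation: {observation}")
    return "no answer within step budget"


def scripted_policy(transcript: List[str]) -> Tuple[str, str, str]:
    """Toy stand-in for the model: search once, then answer with the result."""
    observations = [t for t in transcript if t.startswith("Observation: ")]
    if not observations:
        return ("I should look this up.", "search", "capital of France")
    return ("The observation answers the question.", "answer",
            observations[-1][len("Observation: "):])


# Hypothetical tool registry with a canned search result.
TOOLS: Dict[str, Tool] = {"search": lambda query: "Paris"}

print(react_loop("What is the capital of France?", scripted_policy, TOOLS))
```

Because the transcript grows with every step, a long task eventually fills the context window, which is the limitation the iterative Heavy Mode is meant to address.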


Technical principles

  • Full-pipeline synthetic data scheme: Automatically generates training data without human intervention, supporting a complete training pipeline from pretraining to fine-tuning and reinforcement learning.

  • Iterative research paradigm (IterResearch): Breaks complex tasks into multiple research rounds; each round dynamically reconstructs a compact working space and follows a “think → synthesize → act” process to improve complex reasoning and decision quality.

  • End-to-end reinforcement learning: Applies customized RL algorithms such as Group Relative Policy Optimization (GRPO) to ensure training signals match current model capabilities, improving adaptability and stability in dynamic settings.

  • Large-scale continual pretraining: Builds an open-world knowledge memory from continuously updated documents, crawled data, and knowledge graphs, then uses it to generate multi-style (question, answer) pairs that steadily expand model capabilities.

  • Automated data management: Optimizes training data in real time guided by training dynamics, using fully automated data synthesis and a dynamic data funnel to stabilize and improve training performance.

  • Stable and efficient tool sandbox: Provides a unified sandbox environment that handles concurrency and failures, so that tool calls stay reliable and agent interactions stay fast.
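
The IterResearch paradigm described in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: `Workspace`, `Step`, `iter_research`, and `toy_model` are hypothetical names, and the assumption that the compact working space holds only the question, an evolving report, and the most recent tool result is one plausible reading of "dynamically reconstructs a compact working space".

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Workspace:
    """Compact per-round context (assumed layout, not the official one)."""
    question: str
    report: str            # synthesized findings carried across rounds
    last_observation: str  # only the most recent tool result is kept


@dataclass
class Step:
    thought: str   # "think"
    report: str    # "synthesize": the updated report
    action: str    # "act": a tool name, or "answer" to finish
    argument: str


def iter_research(question: str, model: Callable[[Workspace], Step],
                  tools: Dict[str, Callable[[str], str]],
                  max_rounds: int = 6) -> str:
    report, observation = "", ""
    for _ in range(max_rounds):
        # The workspace is rebuilt every round, so context size stays bounded
        # instead of growing with the full interaction history.
        step = model(Workspace(question, report, observation))
        report = step.report
        if step.action == "answer":
            return step.argument
        tool = tools.get(step.action)
        observation = tool(step.argument) if tool else f"unknown tool: {step.action}"
    return report  # fall back to the report if the round budget runs out


def toy_model(ws: Workspace) -> Step:
    """Toy stand-in for the LLM: search once, fold the result in, then answer."""
    if not ws.last_observation:
        return Step("Need evidence.", ws.report, "search", ws.question)
    merged = (ws.report + " " + ws.last_observation).strip()
    return Step("Enough evidence.", merged, "answer", merged)


TOY_TOOLS = {"search": lambda query: "Fact about " + query}
print(iter_research("X", toy_model, TOY_TOOLS))
```

The design point is that earlier observations survive only through the report, which forces the model to synthesize each round instead of relying on an ever-growing transcript.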

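For the reinforcement learning principle above, the core idea behind Group Relative Policy Optimization can be shown with a small worked example. The sketch below computes the standard GRPO group-relative advantage, where each sampled rollout for a prompt is scored against the mean and standard deviation of its own group. The customized variant used for Tongyi DeepResearch may differ in its details; the function name `group_relative_advantages` and the `eps` term are assumptions made for illustration.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Standard GRPO advantage: normalize each reward within its group.

    `rewards` holds one scalar reward per rollout sampled for the same prompt
    and must be non-empty. `eps` (an assumed value) guards against division by
    zero when every rollout in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four rollouts for one prompt: two succeeded (reward 1), two failed (reward 0).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 3) for a in advantages])
```

Rollouts that beat their group average receive a positive advantage and the rest a negative one, so no separate value network is needed. A group in which every rollout gets the same reward yields all-zero advantages and therefore no gradient signal, which is one reason training data needs to match the model's current ability.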

Project links


Application scenarios

  • Academic research: Accelerates literature reviews and supports complex scholarly research workflows.

  • Market analysis: Produces competitor analyses and industry trend reports to assist corporate strategy.

  • Legal research: Powers legal applications (e.g., “Tongyi FaRui”) that automatically retrieve statutes, precedents, and rulings and perform deep summarization and analysis.

  • Travel and routing: Integrated with Amap to provide an AI-native travel agent that combines real-time data for accurate trip planning.

  • Complex information retrieval: Suited to multi-step reasoning and planning tasks that span several domains, such as cross-domain research, policy design, and other scenarios requiring deep, iterative information synthesis.
