WebSailor – An Open-Source Web Intelligence Agent by Alibaba DAMO Academy

What is WebSailor?

WebSailor is an open-source web agent developed by Alibaba’s Tongyi Lab for complex information retrieval and reasoning tasks. It combines a data synthesis method (SailorFog-QA) with a two-stage training pipeline (Rejection Sampling Fine-Tuning followed by the DUPO reinforcement learning algorithm) and is built for high-difficulty search scenarios. On benchmarks such as BrowseComp it has outperformed many well-known models, topping the open-source web agent leaderboard.

Its reasoning reconstruction technique produces clear, accurate chains of reasoning, which lets it handle complex tasks efficiently. WebSailor also generalizes well to simpler tasks while performing strongly in complex, real-world information search scenarios.


Key Features of WebSailor

Complex Task Data Synthesis:
Uses the SailorFog-QA method to generate high-uncertainty, complex task data, simulating real-world information search challenges.
Multi-Round Tool Use and Reasoning Reconstruction:
Building on open-source reasoning models, WebSailor calls tools over multiple rounds and reconstructs the reasoning process into concise steps, so it can solve complex problems efficiently.
Reinforcement Learning Optimization:
Uses the DUPO algorithm, whose dynamic sampling strategy improves training efficiency and strengthens the agent’s decision-making.
Information Retrieval and Analysis:
Proactively searches and browses multiple web pages, analyzes interrelated information, and provides complete and accurate answers.
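The multi-round tool use described in these features can be pictured as a loop in which the model alternates between reasoning, calling a tool, and reading the result, until it decides to answer. The sketch below is a minimal illustration under stated assumptions, not WebSailor's implementation: the `search` and `visit` tools, the `run_agent` loop, and the `scripted_policy` that stands in for the model are all hypothetical, and the tools return canned strings instead of live web results.

```python
from dataclasses import dataclass


@dataclass
class Step:
    thought: str
    action: str
    argument: str
    observation: str


# Hypothetical stub tools standing in for a real search API and page fetcher.
def search(query: str) -> str:
    results = {"river city founded 1850s": "page_a: mentions Example City"}
    return results.get(query, "no results")


def visit(url: str) -> str:
    pages = {"page_a": "Example City was founded in 1854 on the Example River."}
    return pages.get(url, "page not found")


TOOLS = {"search": search, "visit": visit}


def run_agent(policy, question, max_steps=8):
    """Alternate between policy decisions and tool calls until the policy answers."""
    history = []
    for _ in range(max_steps):
        thought, action, argument = policy(question, history)
        if action == "answer":
            return argument, history
        observation = TOOLS[action](argument)
        history.append(Step(thought, action, argument, observation))
    return None, history  # step budget exhausted without an answer


def scripted_policy(question, history):
    """Hypothetical stand-in for the model: picks the next action from the history."""
    if not history:
        return "Start with a broad search.", "search", "river city founded 1850s"
    if history[-1].action == "search":
        return "Open the most promising result.", "visit", "page_a"
    return "The page confirms the answer.", "answer", "Example City"


answer, trace = run_agent(scripted_policy, "Which city on the Example River was founded in the 1850s?")
for step in trace:
    print(step.action, "->", step.observation)
print("answer:", answer)
```

In a real agent the policy would be the language model itself, and each observation would be appended to its context before the next decision; the step budget keeps a confused policy from looping forever.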

Technical Principles of WebSailor

Data Synthesis (SailorFog-QA):
WebSailor uses the SailorFog-QA method to create high-uncertainty, complex task data. It applies a “knowledge graph random walk” technique: starting from obscure entities in knowledge bases such as Wikidata, it expands the graph outward to build complex, non-linear relational networks. The resulting problem descriptions are then obfuscated (e.g., exact years replaced with time ranges, or specific details hidden) to increase their initial uncertainty.
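The two ideas in this paragraph, growing a branching subgraph by random walk and then blurring exact details, can be sketched on a toy graph. This is a hypothetical illustration, not the SailorFog-QA pipeline: the hand-built `GRAPH` stands in for Wikidata, `random_walk_subgraph`, `obfuscate_year`, and `to_clues` are invented names, and replacing a year with its decade is just one example of the kind of obfuscation described.

```python
import random
import re

# Toy knowledge graph standing in for Wikidata: entity -> [(relation, target), ...].
GRAPH = {
    "Ada Quill": [("born_in", "Lowmere"), ("wrote", "The Glass Orchard")],
    "Lowmere": [("located_in", "Northshire")],
    "The Glass Orchard": [("published_in", "1871"), ("publisher", "Fenwick Press")],
    "Fenwick Press": [("founded_in", "1843")],
}


def random_walk_subgraph(graph, seed, num_edges, rng):
    """Grow a subgraph from `seed` by repeatedly adding a random unused edge."""
    nodes, edges = [seed], []
    for _ in range(num_edges):
        # Expand from any visited node, not just the newest one, so the
        # sampled structure can branch instead of forming a single chain.
        frontier = [
            (head, rel, tail)
            for head in nodes
            for rel, tail in graph.get(head, [])
            if (head, rel, tail) not in edges
        ]
        if not frontier:
            break
        head, rel, tail = rng.choice(frontier)
        edges.append((head, rel, tail))
        if tail not in nodes:
            nodes.append(tail)
    return edges


def obfuscate_year(text):
    """Replace exact years with their decade, e.g. '1871' -> 'the 1870s'."""
    return re.sub(r"\b(1[0-9]{3}|20[0-9]{2})\b", lambda m: f"the {m.group(0)[:3]}0s", text)


def to_clues(edges, hidden):
    """Turn edges into fuzzy clue strings, masking the entity the question asks about."""
    clues = []
    for head, rel, tail in edges:
        head = "X" if head == hidden else head
        clues.append(obfuscate_year(f"{head} {rel.replace('_', ' ')} {tail}"))
    return clues


edges = random_walk_subgraph(GRAPH, "Ada Quill", num_edges=4, rng=random.Random(0))
for clue in to_clues(edges, hidden="Ada Quill"):
    print(clue)
```

Sampling the next edge from every visited node, rather than only the last one, is what lets the walk produce a tree-like network of clues instead of a simple chain of hops.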
Model Training (RFT Cold Start):
During training, Rejection Sampling Fine-Tuning (RFT) is used for cold start initialization. RFT aligns the model with high-quality solution trajectories, helping it develop foundational reasoning and tool-usage habits.
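Rejection sampling fine-tuning can be sketched as "sample several trajectories, keep only the acceptable ones, and fine-tune on those." In this minimal sketch the acceptance test is an exact match against a gold answer, and the length cap is an extra, hypothetical filter; the names `Trajectory`, `rejection_sample`, `to_sft_examples`, and the scripted `fake_model` are illustrative assumptions, not WebSailor's code.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Trajectory:
    question: str
    steps: list
    final_answer: str


def rejection_sample(questions, gold, sample_fn, k, max_steps):
    """Draw k trajectories per question and keep only the acceptable ones."""
    kept = []
    for question in questions:
        for _ in range(k):
            traj = sample_fn(question)
            correct = traj.final_answer.strip().lower() == gold[question].strip().lower()
            # Hypothetical extra filter: reject overly long trajectories.
            if correct and len(traj.steps) <= max_steps:
                kept.append(traj)
    return kept


def to_sft_examples(trajectories):
    """Format accepted trajectories as prompt/target pairs for supervised fine-tuning."""
    return [
        {"prompt": t.question, "target": "\n".join(t.steps) + f"\nAnswer: {t.final_answer}"}
        for t in trajectories
    ]


# Scripted stand-in for a reasoning model: correct, wrong, then correct but too long.
outcomes = itertools.cycle([("Paris", 3), ("Lyon", 2), ("Paris", 12)])


def fake_model(question):
    answer, n_steps = next(outcomes)
    return Trajectory(question, [f"step {i}" for i in range(n_steps)], answer)


kept = rejection_sample(["Capital of France?"], {"Capital of France?": "Paris"}, fake_model, k=3, max_steps=10)
examples = to_sft_examples(kept)
print(len(kept), examples[0]["target"].splitlines()[-1])
```

Of the three sampled trajectories, only the first survives: the second has the wrong answer and the third exceeds the step cap. The surviving examples would then be used for ordinary supervised fine-tuning.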
Reinforcement Learning (DUPO Algorithm):
After cold start, WebSailor enters the reinforcement learning phase using DUPO (Duplicating Sampling Policy Optimization). DUPO’s core strategy is dynamic sampling: before training, it filters out overly simple samples; during training, it repeatedly samples challenging trajectories. This substantially improves training efficiency and enables rapid iteration on complex tasks.
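The two halves of this dynamic sampling strategy can be sketched in a few lines. The sketch assumes group-relative advantages in the style of GRPO, where a group of rollouts that all receive the same reward yields zero advantage and therefore no learning signal. That is why it keeps only groups whose rewards disagree and fills the batch by duplicating them. All function names are hypothetical, and WebSailor's exact filtering rules and batch mechanics may differ.

```python
import random
from statistics import pstdev


def prefilter_easy(dataset, pass_rate):
    """Before training: drop items the policy already solves in every rollout."""
    return [item for item in dataset if pass_rate[item] < 1.0]


def build_batch(groups, batch_size, rng):
    """During training: keep only groups whose rollouts disagree, then fill the
    batch by duplicating those groups instead of waiting for fresh rollouts."""
    informative = [item for item, rewards in groups.items() if pstdev(rewards) > 0]
    if not informative:
        return []
    batch = informative[:batch_size]
    while len(batch) < batch_size:
        batch.append(rng.choice(informative))
    return batch


def group_advantages(rewards):
    """Group-relative advantages (GRPO-style, an assumption here): zero when all rewards match."""
    mean, std = sum(rewards) / len(rewards), pstdev(rewards)
    return [0.0 if std == 0 else (r - mean) / std for r in rewards]


dataset = ["q1", "q2", "q3", "q4"]
train_set = prefilter_easy(dataset, pass_rate={"q1": 1.0, "q2": 0.5, "q3": 0.0, "q4": 0.25})
# Rewards (1 = correct) from four rollouts per item in the current step.
groups = {"q2": [0, 1, 0, 1], "q3": [0, 0, 0, 0], "q4": [1, 0, 0, 0]}
batch = build_batch(groups, batch_size=4, rng=random.Random(0))
print(train_set, batch)
print(group_advantages(groups["q3"]), group_advantages(groups["q2"]))
```

Here `q1` is removed up front because every rollout already succeeds, and `q3` is left out of the batch because all of its rollouts fail, so its advantages are all zero. Duplicating `q2` and `q4` keeps the batch full without spending time generating new rollouts.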

Project Repository

GitHub: https://github.com/Alibaba-NLP/WebAgent

Application Scenarios

Complex Information Retrieval:
Handles ambiguous and complex queries using multi-step reasoning and cross-validation to locate and verify answers from large-scale data.
Multi-Hop Question Answering:
Excels in multi-hop QA by using multiple tools and reasoning steps to decompose and solve complex questions.
Scientific Research and Data Analysis:
Assists researchers and analysts in mapping out complex knowledge networks and synthesizing information from multiple sources to deliver comprehensive and accurate conclusions.
High-Difficulty Task Training and Optimization:
Because the SailorFog-QA dataset simulates real-world search tasks that have no predefined solution path, WebSailor is well suited to training on, and solving, tasks with high uncertainty and complex relational structures.