LoopTool – an automated data-evolution framework developed by Shanghai Jiao Tong University and Xiaohongshu
What is LoopTool?
LoopTool is an automated, model-aware, iterative data-evolution framework developed by Shanghai Jiao Tong University and the Xiaohongshu team. It enhances large language models (LLMs) on tool-use tasks.
Through a closed-loop optimization process, LoopTool tightly integrates data generation, label correction, and model training, forming a dynamic feedback mechanism. It consists of two major stages — seed data construction and iterative optimization.
The iterative stage includes core modules such as Greedy Capability Probing, Judge-Guided Label Verification, and Error-Driven Data Expansion. These modules dynamically adjust training data and precisely strengthen the model’s weak points.
Experiments show that LoopTool significantly improves model performance on tool-use tasks and achieves state-of-the-art results among open-source models on several public benchmarks.
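The two-stage process described above, seed construction followed by closed-loop rounds of training, probing, verification, and expansion, can be sketched as a skeleton. All component functions here are placeholders passed in by the caller, not LoopTool's actual API:

```python
def looptool(model, build_seed, train, probe, verify, expand, rounds=2):
    """Two-stage skeleton: seed data construction, then iterative rounds
    of train -> probe -> verify -> expand that rebuild the training pool."""
    data = build_seed()                      # stage 1: seed dataset
    for _ in range(rounds):                  # stage 2: closed-loop rounds
        model = train(model, data)           # e.g. GRPO training
        failures = probe(model, data)        # capability probing
        data = verify(failures) + expand(failures)  # corrected + new samples
    return model
```

Each round replaces the training pool with verified failure cases plus newly generated variants, so the data evolves together with the model.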

Key Features of LoopTool
Automated Data Generation:
Builds high-quality seed datasets with multi-agent dialogue generation, ensuring diversity and consistency.
Dynamic Data Optimization:
Automatically identifies and improves weak areas based on model performance, generating more challenging training samples.
Label Verification & Correction:
Uses open-source judge models to compare predictions with labels and correct mistaken annotations, reducing the impact of noisy data.
Model Performance Enhancement:
Significantly improves tool-calling capability across multiple benchmarks while enhancing general reasoning ability.
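The label verification feature above boils down to comparing a model's predicted tool call against the stored label and flagging the differing fields for a judge model to inspect. A minimal sketch, assuming tool calls are JSON objects with `name` and `arguments` fields (this schema is an assumption for illustration, not LoopTool's documented format):

```python
import json

def verify_label(prediction: str, label: str) -> dict:
    """Compare a predicted tool call with the stored label.

    Both inputs are JSON strings like {"name": ..., "arguments": {...}}.
    Returns "match" when the label is confirmed, or "mismatch" plus the
    differing fields, which a judge model would then inspect to decide
    whether the label (not the prediction) is the noisy one.
    """
    pred, gold = json.loads(prediction), json.loads(label)
    if pred == gold:
        return {"verdict": "match"}
    diffs = []
    if pred.get("name") != gold.get("name"):
        diffs.append("name")
    pred_args, gold_args = pred.get("arguments", {}), gold.get("arguments", {})
    for key in set(pred_args) | set(gold_args):
        if pred_args.get(key) != gold_args.get(key):
            diffs.append(f"arguments.{key}")
    return {"verdict": "mismatch", "fields": sorted(diffs)}
```

Restricting the judge's attention to the mismatched fields keeps verification cheap and makes the correction auditable.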
Technical Principles of LoopTool
Automated Tool-Augmented Seed Generation:
Uses semantic trees and constraint trees to synthesize API definitions aligned with functional intent and structural rules.
Employs a multi-agent generation pipeline — including a Planner Agent, User Agent, Assistant Agent, and Tool Agent — to construct high-quality seed datasets.
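The four agent roles above come from LoopTool's pipeline; the orchestration below is a hypothetical sketch of how one seed dialogue might be assembled from them, with the agents supplied as plain callables:

```python
def run_seed_dialogue(planner, user, assistant, tool, api_defs, max_turns=6):
    """Orchestrate one seed dialogue: the Planner picks a task over the
    synthesized APIs, the User opens the conversation, and the Assistant
    and Tool agents alternate until the Assistant produces a final reply."""
    task = planner(api_defs)                  # e.g. a goal spanning the APIs
    dialogue = [{"role": "user", "content": user(task)}]
    for _ in range(max_turns):
        step = assistant(dialogue, api_defs)  # either a tool call or a reply
        dialogue.append(step)
        if step["role"] == "assistant" and step.get("final"):
            break
        if step["role"] == "tool_call":
            dialogue.append({"role": "tool", "content": tool(step)})
    return {"task": task, "dialogue": dialogue}
```

In the real pipeline each callable would wrap an LLM prompt; here they are stand-ins to show the control flow.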
Closed-loop Iterative Model Training & Data Evolution:
GRPO Reinforcement Learning:
Optimizes tool-use performance via a binary reward function.
Greedy Capability Probing (GCP):
Identifies mastered, failed, and boundary samples; retains high-perplexity samples for the next training round.
Judge-Guided Label Verification (JGLV):
Uses open-source models to compare predictions with original labels and correct mismatches.
Error-Driven Data Expansion (EDDE):
Generates new samples with similar structure but diverse contexts from error cases to strengthen learning on difficult instances.
Closed-loop Updates:
Each training round combines high-perplexity samples, corrected error samples, newly generated samples, and unused subsamples, forming a complete “train–evaluate–correct–expand” loop.
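The binary reward and the mastered/boundary/failed triage can be sketched together. The category names come from the text above; the pass-rate-over-sampled-predictions signal stands in for the paper's perplexity-based retention, and all function and field names are illustrative:

```python
def binary_reward(prediction: str, label: str) -> int:
    """GRPO-style binary reward: 1 iff the tool call matches the label."""
    return int(prediction == label)

def probe_and_triage(samples, judge, expand):
    """Capability probing sketch: split samples by pass rate over several
    sampled predictions, then route consistent failures through label
    verification (JGLV) and error-driven expansion (EDDE)."""
    mastered, boundary, failed = [], [], []
    for s in samples:
        rewards = [binary_reward(p, s["label"]) for p in s["predictions"]]
        rate = sum(rewards) / len(rewards)
        if rate == 1.0:
            mastered.append(s)   # fully solved: dropped from training
        elif rate == 0.0:
            failed.append(s)     # consistent failure: verify, then expand
        else:
            boundary.append(s)   # partially solved: keep training on it
    corrected = [judge(s) for s in failed]                # JGLV: fix labels
    expanded = [v for s in corrected for v in expand(s)]  # EDDE: variants
    # Next round's pool: boundary samples + corrected errors + new variants
    return boundary + corrected + expanded
```

Dropping mastered samples while expanding around failures is what concentrates each round's training signal on the model's current weak points.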
Project Links for LoopTool
GitHub Repository: https://github.com/Rednote-DeepExperience/LoopTool
HuggingFace Paper Page: https://huggingface.co/papers/2511.09148
arXiv Paper: https://arxiv.org/pdf/2511.09148
Application Scenarios of LoopTool
API Calling:
Useful for intelligent customer service and automated task execution, enhancing LLMs’ ability to complete queries and data interactions.
Multi-turn Task Planning:
Improves LLM performance in multi-step conversations, enabling effective planning for complex tasks such as multi-step assistant workflows.
Knowledge Retrieval:
Enhances accuracy and efficiency of information retrieval in QA systems, improving the model’s understanding of user intent.
Code Generation & Execution:
Improves accuracy in generating and executing code, suitable for programming assistance and educational platforms, reducing code errors.
Multimodal Tasks:
Optimizes model capabilities in calling multimodal tools, boosting performance in scenarios such as smart security and image recognition.