PC Agent-E: An intelligent agent training framework jointly launched by Shanghai Jiao Tong University and SII

What is PC Agent-E?

PC Agent-E is a data-efficient agent training framework jointly developed by Shanghai Jiao Tong University and SII. Starting from only 312 human-annotated computer-use trajectories, it uses the Claude 3.7 Sonnet model to synthesize diverse alternative action decisions for each step, which enriches the training data. The framework consists of four key components: trajectory collection, thought completion, trajectory boost, and agent training. The resulting model achieves a 141% relative improvement over its base model on the WindowsAgentArena-V2 benchmark and surpasses Claude 3.7 Sonnet with extended thinking, making it a new state-of-the-art (SOTA) open-source PC agent for Windows.

Key Features of PC Agent-E

Efficient Training: Requires only 312 human-labeled trajectories and enhances model performance through data augmentation.
Cross-Platform Generalization: Demonstrates strong cross-platform generalization on the OSWorld benchmark, indicating that it adapts to operating systems beyond Windows.
Task Execution: Capable of performing various complex tasks such as file operations, software usage, and web browsing.
Data Augmentation: Synthesizes diverse action decisions to enrich trajectory data and improve model generalization.

Technical Principles of PC Agent-E

Trajectory Collection: Uses the PC Tracker tool to record human operation trajectories, including task descriptions, screen captures, and keyboard/mouse actions. A simple annotation process collects a small but high-quality set of human demonstrations.
Thought Completion: Employs the Claude 3.7 Sonnet model to supplement each action step with underlying reasoning. Given the task description, action history, and current state, the model generates human-like chains of thought.
Trajectory Boost: Uses the Claude 3.7 Sonnet model to generate multiple alternative action decisions for each trajectory step. Because a task can often be completed in more than one way, these alternatives capture that diversity and enrich the dataset.
Agent Training: Fine-tunes the open-source Qwen2.5-VL-72B model on the augmented trajectories using a streamlined end-to-end training framework.
Evaluation and Validation: Performance is validated on benchmarks such as WindowsAgentArena-V2 and OSWorld. Experiments that vary the number of synthesized actions per step confirm that trajectory augmentation has a critical impact on performance.
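
The data flow described above (recorded steps, thought completion, then trajectory boost) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the class names, the `think` and `propose` callables, and the stub functions standing in for Claude 3.7 Sonnet are all hypothetical, and the real pipeline may structure its samples differently.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    screenshot: str        # identifier of the screen capture for this step
    action: str            # recorded human action, e.g. "click(Rename)"
    thought: str = ""      # reasoning, filled in later by thought completion


@dataclass
class Trajectory:
    task: str
    steps: List[Step]


@dataclass
class Sample:
    task: str
    history: List[str]     # human actions taken before this step
    screenshot: str
    thought: str
    action: str


# Hypothetical model interfaces (assumptions, not a real API).
ThinkFn = Callable[[str, List[str], str, str], str]
ProposeFn = Callable[[str, List[str], str, int], List[Tuple[str, str]]]


def complete_thoughts(traj: Trajectory, think: ThinkFn) -> None:
    """Attach a reasoning string to every recorded human action."""
    for i, step in enumerate(traj.steps):
        history = [s.action for s in traj.steps[:i]]
        step.thought = think(traj.task, history, step.screenshot, step.action)


def boost(traj: Trajectory, propose: ProposeFn, k: int) -> List[Sample]:
    """Keep each human step and add k synthesized alternatives for it."""
    samples: List[Sample] = []
    for i, step in enumerate(traj.steps):
        history = [s.action for s in traj.steps[:i]]
        samples.append(
            Sample(traj.task, history, step.screenshot, step.thought, step.action)
        )
        for thought, action in propose(traj.task, history, step.screenshot, k):
            samples.append(Sample(traj.task, history, step.screenshot, thought, action))
    return samples


# Deterministic stubs standing in for the model calls.
def stub_think(task: str, history: List[str], screenshot: str, action: str) -> str:
    return f"To {task}, after {len(history)} prior actions, I should {action}."


def stub_propose(
    task: str, history: List[str], screenshot: str, k: int
) -> List[Tuple[str, str]]:
    return [(f"Alternative thought {j}", f"alt_action_{j}") for j in range(k)]


demo = Trajectory(
    task="rename report.txt to final.txt",
    steps=[
        Step("shot_0.png", "right_click(report.txt)"),
        Step("shot_1.png", "click(Rename)"),
    ],
)
complete_thoughts(demo, stub_think)
samples = boost(demo, stub_propose, k=3)
print(len(samples))  # 2 steps x (1 human + 3 synthesized) = 8
```

In this sketch every synthesized alternative shares the human action history of its step, so the number of training samples grows linearly with `k`. This makes it straightforward to vary the number of synthesized actions and compare the results.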

Project Links for PC Agent-E

Project Website: https://gair-nlp.github.io/PC-Agent-E/
GitHub Repository: https://github.com/GAIR-NLP/PC-Agent-E
HuggingFace Model Hub: https://huggingface.co/henryhe0123/PC-Agent-E
arXiv Technical Paper: https://arxiv.org/pdf/2505.13909

Application Scenarios for PC Agent-E

Office Automation: Automatically completes tasks such as document editing and data analysis to improve office productivity.
Software Testing: Simulates user operations to detect bugs and issues, enhancing software quality.
Educational Assistance: Serves as a virtual teaching assistant to help students complete computer tasks and provide real-time guidance.
Assistive Technology: Provides support functions that make computers easier to use for people with disabilities.
Cross-Platform Compatibility: Enables seamless task execution across different operating systems.