PC Agent-E: An intelligent agent training framework jointly launched by Shanghai Jiao Tong University and SII

AI Tools updated 2w ago dongdong

What is PC Agent-E?

PC Agent-E is an efficient agent training framework jointly developed by Shanghai Jiao Tong University and SII. Starting from only 312 human-annotated computer-use trajectories, the framework uses the Claude 3.7 Sonnet model to synthesize diverse alternative action decisions, significantly improving data quality. It consists of four key components: trajectory collection, thought completion, trajectory boost, and agent training. PC Agent-E achieves a 241% performance improvement on the WindowsAgentArena-V2 benchmark and surpasses Claude 3.7 Sonnet in extended thinking mode, making it the new state-of-the-art (SOTA) open-source PC agent for Windows systems.


Key Features of PC Agent-E

  • Efficient Training: Requires only 312 human-annotated trajectories and improves model performance through data augmentation.

  • Cross-Platform Generalization: Performs well on the OSWorld benchmark, which indicates that the trained agent can transfer to operating systems other than Windows.

  • Task Execution: Capable of performing various complex tasks such as file operations, software usage, and web browsing.

  • Data Augmentation: Synthesizes diverse action decisions to enrich trajectory data and improve model generalization.


Technical Principles of PC Agent-E

  • Trajectory Collection: Uses the PC Tracker tool to record human operation trajectories, including task descriptions, screen captures, and keyboard/mouse actions. A simple annotation process collects a small but high-quality set of human demonstrations.

  • Thought Completion: Employs the Claude 3.7 Sonnet model to supplement each action step with underlying reasoning. Given the task description, action history, and current state, the model generates human-like chains of thought.

  • Trajectory Boost: Uses the Claude 3.7 Sonnet model to generate multiple alternative action decisions for each step of a trajectory. Because a task can usually be completed in more than one way, this captures that diversity and enriches the dataset.

  • Agent Training: Fine-tunes the open-source Qwen2.5-VL-72B model on the enriched trajectories using a streamlined end-to-end framework.

  • Evaluation and Validation: Performance is validated on benchmarks such as WindowsAgentArena-V2 and OSWorld. Varying the number of synthesized actions shows that Trajectory Boost has a critical impact on performance.
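
The data side of these steps (collection, thought completion, and trajectory boost) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's actual code: the class names `Step`, `Trajectory`, and `Sample`, the functions `complete_thoughts` and `boost`, the parameter `n_alternatives`, and the two callables `explain` and `propose` (stand-ins for calls to a model such as Claude 3.7 Sonnet) are all hypothetical. It also assumes that each synthesized decision is paired with the human action history up to that step and that the original human decision is kept as a training sample.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    """One recorded human step: what was on screen and what the user did."""
    screenshot: str      # path to the screen capture for this step
    action: str          # keyboard/mouse action, e.g. "click(120, 340)"
    thought: str = ""    # reasoning behind the action, filled in later


@dataclass
class Trajectory:
    task: str            # natural-language task description
    steps: List[Step]


@dataclass
class Sample:
    """One training example: context plus a single (thought, action) decision."""
    task: str
    history: Tuple[str, ...]   # human actions taken before this step
    screenshot: str
    thought: str
    action: str


# Hypothetical model interfaces (assumptions, not a real API):
# explain(task, history, screenshot, action) -> thought
# propose(task, history, screenshot) -> (thought, action)
Explain = Callable[[str, Tuple[str, ...], str, str], str]
Propose = Callable[[str, Tuple[str, ...], str], Tuple[str, str]]


def complete_thoughts(traj: Trajectory, explain: Explain) -> None:
    """Thought completion: attach a reasoning string to every human action."""
    for i, step in enumerate(traj.steps):
        history = tuple(s.action for s in traj.steps[:i])
        step.thought = explain(traj.task, history, step.screenshot, step.action)


def boost(traj: Trajectory, propose: Propose, n_alternatives: int) -> List[Sample]:
    """Trajectory boost: keep the human decision at each step and add
    n_alternatives synthesized decisions that share the same context."""
    samples: List[Sample] = []
    for i, step in enumerate(traj.steps):
        history = tuple(s.action for s in traj.steps[:i])
        # The original human decision (assumed to be kept as a sample).
        samples.append(
            Sample(traj.task, history, step.screenshot, step.thought, step.action)
        )
        # Synthesized alternatives for the same step.
        for _ in range(n_alternatives):
            thought, action = propose(traj.task, history, step.screenshot)
            samples.append(
                Sample(traj.task, history, step.screenshot, thought, action)
            )
    return samples
```

With this layout, a trajectory of T steps yields T × (1 + `n_alternatives`) training samples, so changing `n_alternatives` is one way to vary how much synthesized data the agent sees.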



Application Scenarios for PC Agent-E

  • Office Automation: Automatically completes tasks such as document editing and data analysis to improve office productivity.

  • Software Testing: Simulates user operations to detect bugs and issues, enhancing software quality.

  • Educational Assistance: Serves as a virtual teaching assistant to help students complete computer tasks and provide real-time guidance.

  • Assistive Technology: Carries out computer tasks on the user's behalf, making computer use more accessible for people with disabilities.

  • Cross-Platform Compatibility: Enables seamless task execution across different operating systems.
