UFO² – Microsoft Launches a Windows Desktop Agent Operating System
What is UFO²?
UFO² is a multi-agent operating system (AgentOS) developed by Microsoft for the Windows desktop. It automates complex desktop tasks through deep system integration and natural-language interaction. A central HostAgent decomposes each task and coordinates multiple application-specific AppAgents that carry it out. By combining GUI interaction with native API calls, UFO² makes task execution more efficient and more robust.
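The HostAgent/AppAgent division of labor can be pictured as a coordinator that splits a request into per-application subtasks and routes each one to the agent registered for that application. This is a minimal sketch, not UFO²'s code: the class names follow the article's terminology, but `Subtask`, `register`, `decompose`, `run`, and the hard-coded plan are hypothetical. In UFO² the decomposition comes from an LLM, which the sketch replaces with a fixed plan for one example request.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Subtask:
    """One step of a decomposed user request (hypothetical structure)."""
    app: str          # application that should handle this step
    instruction: str  # natural-language instruction for that application's agent


@dataclass
class AppAgent:
    """Hypothetical application-specific executor."""
    app: str
    execute: Callable[[str], str]


@dataclass
class HostAgent:
    """Hypothetical coordinator that dispatches subtasks to registered AppAgents."""
    agents: Dict[str, AppAgent] = field(default_factory=dict)

    def register(self, agent: AppAgent) -> None:
        self.agents[agent.app] = agent

    def decompose(self, request: str) -> List[Subtask]:
        # Stand-in for LLM-based planning: a fixed plan for a single example request.
        if request == "email the Q3 totals":
            return [
                Subtask("Excel", "compute Q3 totals"),
                Subtask("Outlook", "draft an email with the totals"),
            ]
        raise ValueError(f"no plan for request: {request!r}")

    def run(self, request: str) -> List[str]:
        results = []
        for step in self.decompose(request):
            agent = self.agents.get(step.app)
            if agent is None:
                raise LookupError(f"no AppAgent registered for {step.app}")
            results.append(agent.execute(step.instruction))
        return results


host = HostAgent()
host.register(AppAgent("Excel", lambda task: f"Excel: {task}"))
host.register(AppAgent("Outlook", lambda task: f"Outlook: {task}"))
print(host.run("email the Q3 totals"))
```

Keeping planning (HostAgent) separate from execution (AppAgent) means support for a new application only requires registering another agent; the coordinator itself does not change.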
UFO² also introduces hybrid control detection, continuous knowledge integration, and a non-intrusive user experience. It runs in an isolated virtual desktop, so it does not interfere with the user's own work. In the authors' evaluation across a range of real-world Windows applications, UFO² substantially improved the success rate and efficiency of automated tasks.
Key Features of UFO²
- Deep OS Integration: UFO² integrates deeply with Windows, enabling fine-grained control of desktop applications.
- Non-Intrusive User Experience: UFO² runs in an isolated virtual desktop, so the user and the agent can work at the same time without interfering with each other.
- Multi-turn Interaction: Supports multi-turn task execution, so users can iteratively refine instructions or intervene in the agent's actions during a session.
- Safety Mechanisms: Detects potentially risky operations and asks the user for confirmation before executing them, helping protect data and the system.
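The confirmation step for risky operations can be pictured as a gate in front of every action. The sketch below is illustrative only: the set of risky action names, the `execute_with_guard` helper, and the `confirm` callback are hypothetical and do not reflect UFO²'s actual safety rules.

```python
from typing import Callable, Set

# Hypothetical examples of actions that might warrant explicit user confirmation.
RISKY_ACTIONS: Set[str] = {"delete_file", "send_email", "close_without_saving"}


def execute_with_guard(action: str, run: Callable[[], str],
                       confirm: Callable[[str], bool]) -> str:
    """Run `action`, first asking `confirm` if the action is on the risky list."""
    if action in RISKY_ACTIONS and not confirm(action):
        return f"skipped {action}: user declined"
    return run()


# A confirm callback that always declines stands in for a real user prompt.
print(execute_with_guard("send_email", lambda: "email sent", lambda a: False))
print(execute_with_guard("type_text", lambda: "text typed", lambda a: False))
```

Safe actions run without interruption; only actions on the risky list pause for the user.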
Technical Architecture of UFO²
- Multi-Agent Architecture:
  - HostAgent: Acts as the central control plane. It parses user instructions, decomposes tasks, schedules AppAgents, and coordinates execution across applications.
  - AppAgent: An execution module tailored to a specific application, equipped with application-specific APIs, a knowledge base, and a hybrid GUI/API action interface for efficient task execution.
- Hybrid Control Detection: Combines structured data from the Windows UI Automation (UIA) APIs with a visual detection model, so it can reliably identify both standard controls and custom UI elements that UIA does not expose.
- Unified GUI/API Action Layer: Built on the Puppeteer module, this layer lets UFO² choose dynamically between GUI operations and native API calls, shortening execution paths and reducing reliance on fragile GUI interactions.
- Continuous Knowledge Integration: Uses Retrieval-Augmented Generation (RAG) to bring external documentation and historical execution logs into the agents' knowledge base, so the agents can learn and improve at runtime.
- Speculative Multi-Action Execution: Predicts several actions in a single inference and verifies at runtime that each one is still feasible before executing it, reducing the number of inference calls and improving execution efficiency.
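Hybrid control detection can be illustrated by merging two lists of detected controls: the structured ones reported by UIA and the ones found by a vision model. The sketch uses one plausible merging rule, which is an assumption and not necessarily UFO²'s exact logic. It keeps every UIA control and adds a visual detection only if its bounding box does not overlap an existing UIA control above an intersection-over-union (IoU) threshold. `Control`, `merge_controls`, and the 0.5 threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in screen pixels


@dataclass
class Control:
    name: str
    box: Box
    source: str  # "uia" or "vision"


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned rectangles."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_controls(uia: List[Control], vision: List[Control],
                   threshold: float = 0.5) -> List[Control]:
    """Keep all UIA controls; add vision detections that duplicate none of them."""
    merged = list(uia)
    for det in vision:
        if all(iou(det.box, c.box) < threshold for c in uia):
            merged.append(det)
    return merged


uia = [Control("OK button", (10, 10, 60, 30), "uia")]
vision = [
    Control("button", (12, 11, 61, 31), "vision"),         # overlaps the UIA OK button
    Control("canvas icon", (200, 50, 230, 80), "vision"),  # custom control UIA missed
]
print([c.name for c in merge_controls(uia, vision)])  # ['OK button', 'canvas icon']
```

Giving UIA precedence keeps its richer metadata (names, control types) whenever both sources see the same element, while the vision model fills gaps such as custom-drawn controls.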
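The GUI/API choice made by the unified action layer can be sketched as an "API first, GUI otherwise" dispatch. This is a simplified illustration under assumptions. The registry, `perform`, and `gui_fallback` are hypothetical, not the Puppeteer module's real interface, and UFO²'s actual selection logic may weigh more factors.

```python
from typing import Callable, Dict, Tuple

ApiFn = Callable[[str], str]

# Hypothetical registry: (application, operation) -> native API implementation.
API_REGISTRY: Dict[Tuple[str, str], ApiFn] = {
    ("Excel", "write_cell"): lambda arg: f"API: wrote {arg}",
}


def gui_fallback(app: str, operation: str, arg: str) -> str:
    # Stand-in for simulated clicks and keystrokes on the application's UI.
    return f"GUI: {operation} {arg} in {app}"


def perform(app: str, operation: str, arg: str) -> str:
    """Use a native API call when one is registered; otherwise drive the GUI."""
    api = API_REGISTRY.get((app, operation))
    if api is not None:
        return api(arg)
    return gui_fallback(app, operation, arg)


print(perform("Excel", "write_cell", "A1=42"))     # API path
print(perform("Paint", "draw_line", "0,0-10,10"))  # GUI path
```

The API path skips the sequence of clicks and keystrokes a GUI agent would otherwise need, which is where the shorter and less fragile execution paths come from.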
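Continuous knowledge integration relies on RAG, in which relevant snippets from documentation and past execution logs are retrieved and added to the agent's prompt. The sketch below covers only that retrieve-and-assemble step. To stay dependency-free, it replaces the embedding-based retrieval a typical RAG pipeline would use with plain bag-of-words cosine similarity. The knowledge snippets, `retrieve`, `build_prompt`, and the prompt format are all made up for illustration.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> Counter:
    """Lowercased word counts; a crude stand-in for an embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, knowledge: List[str], k: int = 2) -> List[str]:
    """Return up to k snippets most similar to the query, ignoring zero matches."""
    q = tokenize(query)
    scored: List[Tuple[float, str]] = [(cosine(q, tokenize(doc)), doc) for doc in knowledge]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(task: str, knowledge: List[str]) -> str:
    context = "\n".join(f"- {snippet}" for snippet in retrieve(task, knowledge))
    return f"Relevant knowledge:\n{context}\n\nTask: {task}"


knowledge_base = [
    "Doc: In Excel, use Insert > Chart to create a chart from selected cells.",
    "Log: task 'add chart to sales sheet' succeeded by selecting A1:B10 first.",
    "Doc: Word's Track Changes option is under the Review tab.",
]
print(build_prompt("create a chart in Excel", knowledge_base))
```

New documents or execution logs only need to be appended to the knowledge list to influence later prompts. No model retraining is involved.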
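Speculative multi-action execution can be sketched as follows: a single inference returns a batch of predicted actions, and the executor checks each one against the current UI state before running it, stopping at the first action that is no longer feasible so the agent can re-plan. `Action`, `execute_speculatively`, and the dictionary standing in for a live UI query are hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Action:
    control: str  # name of the UI control the action targets
    verb: str     # e.g. "click" or "type"


def execute_speculatively(plan: List[Action], live_controls: Dict[str, bool]) -> List[Action]:
    """Execute predicted actions in order, stopping at the first infeasible one.

    `live_controls` maps control names to whether they are currently enabled;
    it stands in for re-querying the UI before each step.
    """
    executed = []
    for action in plan:
        if not live_controls.get(action.control, False):
            break  # prediction no longer valid: return control for a fresh inference
        executed.append(action)
        # A real executor would perform the action here and refresh the UI state.
    return executed


# One inference predicted four actions; "Merge Cells" turns out to be disabled.
plan = [Action("Bold", "click"), Action("Italic", "click"),
        Action("Merge Cells", "click"), Action("Save", "click")]
ui = {"Bold": True, "Italic": True, "Merge Cells": False, "Save": True}
print([a.control for a in execute_speculatively(plan, ui)])  # ['Bold', 'Italic']
```

Here two actions run on a single inference call instead of two. Stopping at the first failed check, rather than skipping it, prevents later actions from running on the assumption that the earlier step succeeded.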
Project Resources
- Project Homepage: https://microsoft.github.io/UFO/
- GitHub Repository: https://github.com/microsoft/UFO
- arXiv Technical Paper: https://arxiv.org/pdf/2504.14603
Application Scenarios for UFO²
- Office Automation: Automates tasks such as processing Excel data, editing Word documents, and creating PowerPoint presentations.
- Cross-Application Workflows: Coordinates multiple applications to complete complex tasks, such as importing data from Excel into Outlook.
- Enterprise Task Automation: Handles repetitive tasks such as data entry and document processing, reducing manual intervention.
- Intelligent Customer Support: Responds quickly to user requests and resolves issues through natural-language interaction.
- Education and Training: Supports teaching by automatically demonstrating operations or generating learning reports.