UFO² – Microsoft Launches a Windows Desktop Agent Operating System

AI Tools · dongdong

What is UFO²?

UFO² is a multi-agent operating system (AgentOS) developed by Microsoft for the Windows desktop. It automates complex desktop tasks through deep integration with the operating system and natural-language interaction. A central HostAgent decomposes each user request into subtasks and coordinates multiple application-specific AppAgents that carry them out. By combining GUI interaction with native API calls, UFO² makes task execution more efficient and more robust than relying on the GUI alone.
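
The division of work between the HostAgent and the AppAgents can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the `Subtask`, `HostAgent`, and `AppAgent` classes and the `toy_planner` function (a fixed plan standing in for LLM-based task decomposition) are illustrations of the idea, not UFO²'s actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Subtask:
    app: str          # which application should handle this step
    instruction: str  # natural-language step for that application's agent


@dataclass
class AppAgent:
    """Hypothetical per-application executor standing in for a UFO² AppAgent."""
    app: str
    log: List[str] = field(default_factory=list)

    def run(self, instruction: str) -> str:
        # A real AppAgent would inspect the UI and act via GUI or API calls;
        # this sketch only records what would be executed.
        result = f"[{self.app}] {instruction}"
        self.log.append(result)
        return result


class HostAgent:
    """Hypothetical coordinator: decomposes a request and dispatches subtasks."""

    def __init__(self, planner: Callable[[str], List[Subtask]]):
        self.planner = planner  # stands in for LLM-based decomposition
        self.agents: Dict[str, AppAgent] = {}

    def agent_for(self, app: str) -> AppAgent:
        # Create AppAgents lazily, one per application, and reuse them.
        if app not in self.agents:
            self.agents[app] = AppAgent(app)
        return self.agents[app]

    def execute(self, request: str) -> List[str]:
        return [self.agent_for(s.app).run(s.instruction) for s in self.planner(request)]


def toy_planner(request: str) -> List[Subtask]:
    # Fixed plan for illustration; a real system would query an LLM here.
    return [
        Subtask("Excel", "read the contact table"),
        Subtask("Outlook", "draft one email per contact"),
    ]


host = HostAgent(toy_planner)
for line in host.execute("Email everyone in my contacts sheet"):
    print(line)
```

Creating AppAgents lazily and reusing them mirrors the idea that each AppAgent carries knowledge specific to one application and can serve every subtask that targets that program.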

UFO² also introduces hybrid control detection, continuous knowledge integration, and a non-intrusive user experience. The agent runs in an isolated virtual desktop, so it does not interfere with what the user is doing. In Microsoft's evaluation on a range of real-world Windows applications, UFO² achieved higher task success rates and more efficient execution than earlier computer-using agents.

Key Features of UFO²

  • Deep OS Integration: UFO² integrates deeply into the Windows system, enabling fine-grained control of desktop applications.

  • Non-Intrusive User Experience: Runs in isolated virtual desktops, allowing both users and agents to operate simultaneously without interference.

  • Multi-turn Interaction: Supports multi-turn task execution, enabling users to iteratively refine instructions or intervene in the agent’s actions during a session.

  • Safety Mechanisms: Detects potentially risky operations and asks the user for confirmation before executing them, helping protect user data and the system.
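
One simple way to picture the confirmation step is a gate in front of the action loop. The sketch below assumes a hypothetical fixed set of risky action names (`RISKY_ACTIONS`) and a `confirm` callback that asks the user; how UFO² actually classifies risky operations is not reproduced here.

```python
from typing import Callable, Iterable, List

# Hypothetical set of action names treated as risky (illustration only).
RISKY_ACTIONS = {"delete_file", "send_email", "close_without_saving"}


def execute_with_confirmation(
    actions: Iterable[str],
    confirm: Callable[[str], bool],
) -> List[str]:
    """Run actions in order, asking the user before any risky one.

    Returns the list of actions that were actually executed.
    """
    executed: List[str] = []
    for action in actions:
        if action in RISKY_ACTIONS and not confirm(action):
            # User declined: stop here, since later steps may depend on this one.
            break
        executed.append(action)  # a real agent would perform the action here
    return executed


# Simulate a user who refuses every confirmation request.
print(execute_with_confirmation(["open_file", "edit_cell", "delete_file", "save"], lambda a: False))
```

Stopping the whole sequence when the user declines, rather than skipping only the risky step, is a conservative choice: later steps often depend on the one that was refused.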


Technical Architecture of UFO²

  • Multi-Agent Architecture:

    • HostAgent: Acts as the central control plane, responsible for parsing user instructions, decomposing tasks, scheduling AppAgents, and coordinating cross-application execution.

    • AppAgent: Execution modules tailored to specific applications, equipped with application-specific APIs, knowledge bases, and GUI/API hybrid action interfaces for efficient task execution.

  • Hybrid Control Detection: Combines structured data from Windows UI Automation (UIA) APIs with visual detection models to reliably identify both standard and custom UI elements.

  • Unified GUI/API Action Layer: Built on UFO²'s Puppeteer execution module, this layer lets the agent choose, step by step, between GUI operations and native API calls, shortening execution paths and reducing the fragility of pure GUI automation.

  • Continuous Knowledge Integration: Uses Retrieval-Augmented Generation (RAG) to bring external documentation and historical execution logs into the agent's knowledge base, so the agent can improve at runtime without retraining the underlying model.

  • Speculative Multi-Action Execution: Predicts several actions in a single model inference and checks at runtime that each one is still valid against the current UI state before executing it, reducing the number of inference calls and improving execution efficiency.
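
The hybrid control detection idea can be sketched as a merge of two candidate lists. The Python below assumes that UIA controls are kept as-is and that a visually detected element counts as a duplicate when its bounding box overlaps a UIA control with intersection-over-union (IoU) at or above a threshold. The `Control` type, the 0.5 threshold, and this de-duplication rule are illustrative assumptions, not UFO²'s published parameters.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in screen pixels


@dataclass
class Control:
    name: str
    box: Box
    source: str  # "uia" or "vision"


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_controls(uia: List[Control], vision: List[Control], threshold: float = 0.5) -> List[Control]:
    """Keep every UIA control and add only the vision detections UIA missed."""
    merged = list(uia)
    for det in vision:
        # Hypothetical rule: a high-overlap vision box duplicates a UIA control.
        if all(iou(det.box, c.box) < threshold for c in uia):
            merged.append(det)
    return merged


uia = [Control("Save", (10, 10, 60, 30), "uia")]
vision = [
    Control("button", (12, 10, 60, 30), "vision"),         # same button UIA already reports
    Control("canvas-icon", (200, 50, 230, 80), "vision"),  # custom control invisible to UIA
]
for c in merge_controls(uia, vision):
    print(c.source, c.name)
```

Trusting UIA first keeps the richer structured metadata (names, control types) for standard controls, while the vision pass only fills gaps such as custom-drawn widgets.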
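
The GUI/API choice made by the action layer can be reduced to a lookup with a fallback. In this sketch, `API_REGISTRY`, `dispatch`, and `gui_fallback` are hypothetical names, and the registered Excel handler only returns a string; it does not call any real Office API.

```python
from typing import Callable, Dict, Tuple

# Hypothetical registry: (application, action) -> native API handler.
NativeHandler = Callable[..., str]
API_REGISTRY: Dict[Tuple[str, str], NativeHandler] = {
    ("Excel", "write_cell"): lambda cell, value: f"API: set {cell}={value}",
}


def gui_fallback(app: str, action: str, **kwargs) -> str:
    # Stand-in for simulated clicks and keystrokes on the identified control.
    return f"GUI: {action} in {app} with {kwargs}"


def dispatch(app: str, action: str, **kwargs) -> str:
    """Prefer a native API call when one is registered; otherwise drive the GUI."""
    handler = API_REGISTRY.get((app, action))
    if handler is not None:
        return handler(**kwargs)
    return gui_fallback(app, action, **kwargs)


print(dispatch("Excel", "write_cell", cell="B2", value=42))
print(dispatch("Paint", "click", x=100, y=40))
```

Preferring the API path when it exists avoids depending on screen layout, which is the GUI fragility the action layer is meant to reduce; the GUI path remains available for applications that expose no API.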
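
The retrieval step behind continuous knowledge integration can be illustrated with a toy retriever. This sketch scores help documents and past execution logs by shared-word overlap, a deliberately simple stand-in for the embedding-based search a production RAG system would typically use; `retrieve`, `build_prompt`, and the sample knowledge entries are hypothetical.

```python
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query: str, doc: str) -> int:
    """Count shared tokens (with multiplicity) between query and document."""
    q, d = Counter(tokenize(query)), Counter(tokenize(doc))
    return sum((q & d).values())


def retrieve(query: str, knowledge: List[str], k: int = 2) -> List[str]:
    """Return up to k entries with the highest overlap score, skipping zero scores."""
    ranked: List[Tuple[int, str]] = sorted(
        ((score(query, doc), doc) for doc in knowledge), key=lambda p: p[0], reverse=True
    )
    return [doc for s, doc in ranked[:k] if s > 0]


def build_prompt(task: str, knowledge: List[str]) -> str:
    # Retrieved entries are prepended to the task as context for the model.
    context = "\n".join(f"- {doc}" for doc in retrieve(task, knowledge))
    return f"Relevant knowledge:\n{context}\n\nTask: {task}"


knowledge = [
    "Help doc: In Word, insert a table via Insert > Table.",
    "Execution log: succeeded adding a chart in Excel using Insert > Chart.",
    "Help doc: In Outlook, attach files with the Attach File button.",
]
print(build_prompt("insert a table in Word", knowledge))
```

Because new documents and logs are only appended to `knowledge`, the agent's context improves as it accumulates experience while the model itself stays unchanged.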
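
Speculative multi-action execution can be sketched as follows: take a batch of actions predicted by one model call, re-check the live UI before each action, and hand back whatever remains once a target control disappears. The `Action` type, the `run_speculative` function, and the simulated UI are hypothetical; a real agent would read the control list from UI Automation rather than from a Python set.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple


@dataclass
class Action:
    control: str  # identifier of the UI control the action targets
    op: str       # e.g. "click" or "type"


def run_speculative(
    predicted: List[Action],
    live_controls: Callable[[], Set[str]],
    perform: Callable[[Action], None],
) -> Tuple[List[Action], List[Action]]:
    """Execute a batch of predicted actions, re-checking the UI before each one.

    Returns (executed, pending). Execution stops at the first action whose target
    control is no longer present; pending actions go back to the model for re-planning.
    """
    executed: List[Action] = []
    for i, action in enumerate(predicted):
        if action.control not in live_controls():
            return executed, predicted[i:]
        perform(action)
        executed.append(action)
    return executed, []


ui = {"Bold", "Italic", "Close"}


def perform(action: Action) -> None:
    # Simulated side effect: clicking "Close" removes the toolbar controls.
    if action.control == "Close":
        ui.clear()


batch = [Action("Bold", "click"), Action("Close", "click"), Action("Italic", "click")]
done, pending = run_speculative(batch, lambda: ui, perform)
print(len(done), [a.control for a in pending])
```

Here one inference yields three actions; two execute, and only the remaining one needs a new model call. Checking the UI is far cheaper than an inference, which is where the efficiency gain comes from.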


Application Scenarios for UFO²

  • Office Automation: Automates tasks like processing Excel data, editing Word documents, and creating PowerPoint presentations.

  • Cross-Application Workflows: Coordinates multiple applications to complete complex tasks, such as importing data from Excel into Outlook.

  • Enterprise Task Automation: Reduces manual intervention and efficiently handles repetitive tasks like data entry and document processing.

  • Intelligent Customer Support: Responds quickly to user requests and resolves issues through natural language interaction.

  • Education and Training: Assists with teaching by automatically demonstrating operations or generating learning reports.
