Magentic-UI – Microsoft’s Open-Source Research Prototype for Human-AI Collaborative Agents

What is Magentic-UI?

Magentic-UI is an open-source research prototype developed by Microsoft to explore human-in-the-loop AI agent systems. Designed as a human-centered AI agent, it collaborates with users to complete complex web-based tasks such as browsing websites, executing code, and handling files. Its core features are co-planning, co-tasking, action guards (safety mechanisms), and plan learning. Because users take part in both planning and execution, the interaction stays transparent and controllable. The system aims to improve task efficiency by incorporating human feedback and to reduce manual effort, and it doubles as a research platform for studying human-agent collaboration.

Key Features of Magentic-UI

Co-Planning
Generates step-by-step task plans before execution. Users can review, modify, or approve the plan to ensure the process aligns with their intent.
Co-Tasking
Displays upcoming actions in real time, allowing users to intervene or take control at any moment to ensure task accuracy.
Action Guards
Requests user approval before executing critical or irreversible actions. Users can define custom approval policies to ensure safety and control.
Plan Learning and Reuse
After task completion, the plan is saved for future reuse or modification, improving efficiency for similar tasks.
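
The plan learning and reuse idea can be illustrated with a minimal sketch. It assumes an in-memory store and a naive word-overlap match; the names PlanGallery, SavedPlan, save, and find_similar are hypothetical and are not taken from the Magentic-UI codebase, which may store and retrieve plans quite differently.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SavedPlan:
    task: str
    steps: list[str]


@dataclass
class PlanGallery:
    """Hypothetical in-memory store of plans from completed tasks."""

    plans: list[SavedPlan] = field(default_factory=list)

    def save(self, task: str, steps: list[str]) -> None:
        # Copy the steps so later edits by the caller do not alter the saved plan.
        self.plans.append(SavedPlan(task, list(steps)))

    def find_similar(self, task: str) -> SavedPlan | None:
        """Return the saved plan whose task shares the most words with `task`.

        Word overlap is a deliberately crude assumption used for illustration.
        """
        query = set(task.lower().split())
        best, best_score = None, 0
        for plan in self.plans:
            score = len(query & set(plan.task.lower().split()))
            if score > best_score:
                best, best_score = plan, score
        return best


gallery = PlanGallery()
gallery.save("compare laptop prices", ["open store A", "open store B", "list prices"])
gallery.save("book a flight to Paris", ["search flights", "pick a fare"])
reused = gallery.find_similar("compare phone prices")
```

A retrieved plan would then be shown to the user as an editable starting point rather than executed as-is, which matches the review step described under Co-Planning.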

Technical Overview

System Architecture
Magentic-UI is built on the Magentic-One multi-agent system, which is powered by AutoGen. It uses a set of specialized agents that work together to complete tasks:
- Orchestrator: A language model-driven coordinator that collaborates with the user to co-plan tasks, decides when to request feedback, and delegates sub-tasks to other agents.
- WebSurfer: A browser-controlling LLM agent that performs actions such as clicking, scrolling, and typing, based on the Orchestrator's instructions.
- Coder: A code-executing LLM agent running in a Docker container, which sends results back to the Orchestrator.
- FileSurfer: A file-handling LLM agent equipped with conversion tools and containerized execution, capable of locating files, converting them to Markdown, and answering content-based questions.
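
The division of labor among these agents can be sketched as a simple dispatch table. This is an illustration only: the functions below are hypothetical stand-ins that return strings, not the real agents, and the delegate helper is an assumed name rather than a Magentic-UI or AutoGen API.

```python
from typing import Callable, Dict


# Hypothetical stand-ins: each "agent" just reports what it was asked to do.
def web_surfer(instruction: str) -> str:
    return f"[WebSurfer] browsed: {instruction}"


def coder(instruction: str) -> str:
    return f"[Coder] ran: {instruction}"


def file_surfer(instruction: str) -> str:
    return f"[FileSurfer] read: {instruction}"


AGENTS: Dict[str, Callable[[str], str]] = {
    "WebSurfer": web_surfer,
    "Coder": coder,
    "FileSurfer": file_surfer,
}


def delegate(agent_name: str, instruction: str) -> str:
    """Route one sub-task to the named agent, as an orchestrator might."""
    if agent_name not in AGENTS:
        raise ValueError(f"unknown agent: {agent_name}")
    return AGENTS[agent_name](instruction)


result = delegate("Coder", "print(1)")
```

In the real system the Orchestrator is itself a language model that chooses which agent to call; the fixed lookup here only shows the routing shape.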
Interaction Flow
Users interact with Magentic-UI through text and, optionally, image inputs. The Orchestrator drafts a task plan in natural language, which the user can edit in the interface before execution begins. The Orchestrator then assigns each step to the most appropriate agent or to the user. Once all steps are complete, it delivers the final output. If it detects that the plan is inadequate during execution, it re-plans, subject to user approval.
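
The flow above can be sketched as a short loop: draft a plan, let the user review it, then execute each step with its assignee. The names Step, run_plan, review, and execute are hypothetical, and re-planning is left out to keep the sketch small; this is not the actual Orchestrator implementation.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    description: str
    assignee: str  # hypothetical label: an agent name or "user"


def run_plan(
    draft: list[Step],
    review: Callable[[list[Step]], list[Step]],
    execute: Callable[[Step], str],
) -> list[str]:
    """Run a plan only after the user has had a chance to edit it."""
    plan = review(draft)  # co-planning: nothing executes before this returns
    results = []
    for step in plan:
        results.append(execute(step))  # each step goes to its assignee
    return results


draft = [
    Step("Search for flights", "WebSurfer"),
    Step("Enter passenger details", "user"),
    Step("Chart the prices", "Coder"),
]


def drop_chart_step(plan: list[Step]) -> list[Step]:
    # Simulates a user who deletes the last step before approving the plan.
    return plan[:-1]


def describe(step: Step) -> str:
    return f"{step.assignee}: {step.description}"


outputs = run_plan(draft, drop_chart_step, describe)
```

Passing the review step in as a callback keeps the user's edit point explicit: the executor never sees the draft, only the plan the user approved.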
Safety and Control
Users configure an allowlist of websites the agents may access; visiting any other site requires explicit user approval. Tasks can be interrupted at any stage, which halts any pending code execution or web actions. The browser and code execution run in isolated Docker containers, which protects the host environment and helps prevent problems such as credential leaks. Action approval policies let users set guardrails around critical operations.
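
A minimal sketch of the two checks described above, the list of permitted websites and the approval policy for critical actions, might look as follows. The function names, the policy values ("always", "never", "irreversible_only"), and the set of irreversible actions are all assumptions made for illustration, not Magentic-UI's actual configuration options.

```python
from typing import Callable, Set
from urllib.parse import urlparse

# Hypothetical action names treated as irreversible in this sketch.
IRREVERSIBLE_ACTIONS = {"submit_form", "purchase", "delete_file"}


def is_allowed(url: str, allowed_hosts: Set[str]) -> bool:
    """True if the URL's host is a permitted host or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in allowed_hosts)


def may_visit(url: str, allowed_hosts: Set[str], ask_user: Callable[[str], bool]) -> bool:
    """Permitted sites pass directly; anything else needs explicit approval."""
    if is_allowed(url, allowed_hosts):
        return True
    return ask_user(f"Allow access to {url}?")


def needs_approval(action: str, policy: str) -> bool:
    """Decide whether an action must be confirmed under an assumed policy name."""
    if policy == "always":
        return True
    if policy == "never":
        return False
    if policy == "irreversible_only":
        return action in IRREVERSIBLE_ACTIONS
    raise ValueError(f"unknown policy: {policy}")


allowed = {"github.com", "microsoft.com"}
deny_all = lambda prompt: False  # stands in for a user who declines every request
```

Matching on the parsed hostname rather than on a substring of the URL avoids approving addresses such as https://github.com.example.org by accident.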

Project Links

Official Website: https://www.microsoft.com/en-us/research/blog/magentic-ui
GitHub Repository: https://github.com/microsoft/magentic-ui

Application Scenarios for Magentic-UI

Complex Task Automation
Assists users in executing multi-step web tasks such as price comparison, filling out forms, or booking travel.
Coding Assistance
Generates and safely runs code snippets for use cases like data analysis or script generation.
File Handling and Information Retrieval
Converts file formats, searches file contents, and answers questions based on file data.
Research and Development
Serves as an experimental platform for researchers exploring new paradigms of human-agent collaboration.
Education and Training
Functions as a teaching tool to help users learn about task planning, automation, and AI interaction.