What is OAgents?
OAgents is an open-source foundational agent framework developed by OPPO PersonalAI Lab. It combines a modular architecture with a standardized evaluation protocol and aims to advance agent research. Through systematic empirical studies, the authors analyze how design choices in key agent components, such as planning, tool usage, and memory, affect overall performance. The project reports the highest average score on the GAIA benchmark, 73.93%, which it cites as evidence of effectiveness and robustness across varied tasks. OAgents also supports the integration of diverse agent modules, providing a foundation for future research on intelligent agents.
Key Features of OAgents
- Multimodal Tool Integration: OAgents integrates tools for handling text, audio, images, and videos, enabling direct interaction with multimodal inputs. This improves the agent’s ability to extract and interpret factual information from complex real-world environments.
- Optimized Search Agent: With improved multi-source retrieval, query optimization, and a minimal browsing architecture, OAgents performs efficient web searches, expands its knowledge base, and supports complex information tasks with high accuracy.
- Dynamic Planning & Task Decomposition: OAgents uses dynamic planning to break complex tasks into executable subtasks. Plans are adjusted in real time based on environmental feedback, which improves task management and reasoning.
- Memory-Augmented Knowledge System: OAgents includes a hierarchical memory system composed of current memory, memory summarization, vectorized retrieval, and long-term memory. Together these help the agent perceive, reason, and act effectively in complex environments.
- Test-Time Augmentation Strategies: During inference, OAgents employs test-time strategies such as diversity enhancement, optimization, and reward modeling. These allow the agent to adapt its decision-making dynamically and improve exploration and performance.
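To make the hierarchical memory item above more concrete, here is a minimal Python sketch. It is not OAgents' actual implementation: the class name `HierarchicalMemory`, its fields, the first-sentence "summarizer", and the bag-of-words similarity are all assumptions chosen to keep the example self-contained. A real system would likely use an LLM for summarization and a learned embedding model for retrieval.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (assumption; stands in for a learned encoder)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class HierarchicalMemory:
    """Hypothetical four-tier memory mirroring the components listed above."""
    window: int = 3                                      # size of current memory
    current: list[str] = field(default_factory=list)     # recent steps, verbatim
    summaries: list[str] = field(default_factory=list)   # condensed older steps
    long_term: list[str] = field(default_factory=list)   # knowledge kept across tasks

    def add(self, step: str) -> None:
        """Append a step; summarize the oldest one when the window overflows."""
        self.current.append(step)
        if len(self.current) > self.window:
            evicted = self.current.pop(0)
            # Placeholder summarizer: keep only the first sentence.
            self.summaries.append(evicted.split(".")[0].strip())

    def retrieve(self, query: str, k: int = 1) -> list[str]:
        """Vectorized retrieval over summaries and long-term entries."""
        pool = self.summaries + self.long_term
        q = embed(query)
        ranked = sorted(pool, key=lambda m: cosine(q, embed(m)), reverse=True)
        return ranked[:k]

    def context(self, query: str) -> str:
        """Build a prompt context: retrieved history first, then recent steps."""
        return "\n".join(self.retrieve(query) + self.current)
```

In this sketch, recent steps stay verbatim in `current`, older steps are compressed into `summaries`, and `retrieve` ranks both summaries and long-term entries against the query, so the context passed to the model stays short while relevant history remains reachable.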
Technical Principles of OAgents
- Multimodal Tool Mechanism: Converts non-text content (e.g., images, videos) into textual descriptions and performs semantic parsing across modalities. Interaction is formalized as $\text{Response} = A(x_{\text{text}}, T_{\text{image}}(I), T_{\text{video}}(V))$, where $A$ is the agent function, $x_{\text{text}}$ is the text input, and $T_{\text{image}}$ / $T_{\text{video}}$ are the image and video tool functions applied to an image $I$ and a video $V$.
- Search Agent Mechanism: Combines commercial APIs and document archives for multi-source search. Employs closed-loop query optimization via semantic alignment and morphological expansion. Browsing is reduced to three atomic functions (search, access, and read) to limit system complexity.
- Dynamic Planning Mechanism: Generates high-level plans and decomposes tasks into actionable steps. Plans are periodically revised based on observations. A dependency graph supports hierarchical task decomposition and dynamic scheduling.
- Memory-Augmented System: Current memory stores short-term context; memory summarization extracts high-value insights; vector retrieval allows fast lookup of relevant history; long-term memory integrates past knowledge for optimized execution.
- Test-Time Augmentation: Enhances diversity through mixed sampling strategies, optimizes decision paths using reward functions, and applies real-time self-reflection mechanisms for adaptive problem solving.
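The dependency graph mentioned under Dynamic Planning Mechanism can be illustrated with a short Python sketch built on the standard library's `graphlib`. This shows the general idea only, not how OAgents implements planning: the subtask names, the `schedule` and `revise` helpers, and the notion of "waves" are hypothetical and introduced purely for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical plan: each subtask maps to the set of subtasks it depends on.
plan = {
    "search_sources": set(),
    "read_paper": {"search_sources"},
    "read_dataset_card": {"search_sources"},
    "compare_numbers": {"read_paper", "read_dataset_card"},
    "write_answer": {"compare_numbers"},
}


def schedule(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group subtasks into waves; each task in a wave has all dependencies done."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the plan is circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for deterministic output
        waves.append(ready)
        sorter.done(*ready)
    return waves


def revise(
    graph: dict[str, set[str]],
    completed: set[str],
    additions: dict[str, set[str]],
) -> dict[str, set[str]]:
    """Drop finished subtasks, then add or overwrite subtasks from new observations."""
    remaining = {
        task: deps - completed
        for task, deps in graph.items()
        if task not in completed
    }
    remaining.update(additions)
    return remaining
```

For the sample plan, `schedule(plan)` yields four waves: the search step, then the two reading steps (which could run in parallel), then the comparison, then the final answer. Calling `revise` after a wave completes and re-running `schedule` on the result is one simple way to model periodic plan revision: the graph is edited in response to observations and the remaining work is rescheduled.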
Project Links
- GitHub Repository: https://github.com/OPPO-PersonalAI/OAgents
- arXiv Technical Paper: https://arxiv.org/pdf/2506.15741
Application Scenarios of OAgents
- Intelligent Customer Support: Delivers fast, accurate responses and personalized solutions, and handles complex queries to improve customer satisfaction.
- Educational Tutoring: Provides personalized learning plans and adjusts content dynamically based on student feedback and progress, supporting diverse learning materials.
- Medical Consultation: Assists doctors in analyzing medical records, suggesting diagnoses, and recommending treatments using up-to-date research and clinical guidelines.
- Smart Office Assistant: Helps schedule meetings, write reports, and organize meeting notes, and remembers user preferences to provide tailored support.
- Smart Home Control: Integrates with smart home devices, enabling control through voice or text and automating scenarios for seamless, natural interactions.