Top researchers from Meta, Yale University, Stanford University, Google DeepMind, and Microsoft have systematically summarized the current understanding of agents in a 264-page paper.
The following are some of their key findings:
They mapped the components of an intelligent agent (such as perception, memory, and world modeling) onto regions of the human brain and compared the two:
• The energy efficiency of the human brain is far higher than that of artificial intelligence systems.
• Intelligent agents do not possess true “experiences” or subjective consciousness.
• The human brain is capable of continuous learning, whereas intelligent agents are typically static (fixed after training).
An intelligent agent can be divided into the following parts:
Perception: The input mechanism of an intelligent agent. It can be enhanced through multimodal input and through feedback mechanisms such as human error correction.
Cognition: It includes core functions such as learning, reasoning, planning, and memory. Large language models (LLMs) play a crucial role in this aspect.
Action: Refers to the output capability of an intelligent agent and its ability to use tools.
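The three parts above can be sketched as a simple loop. This is a minimal illustration, not the paper's implementation: the `Agent` class, its method names, and the two toy tools are all hypothetical, and the `think` method is a rule-based stand-in for what would normally be an LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Agent:
    # Hypothetical tool registry: tool name -> function taking one string argument.
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    # Observations seen so far (kept only for illustration).
    history: List[str] = field(default_factory=list)

    def perceive(self, raw_input: str) -> str:
        """Perception: normalize raw input into an observation."""
        return raw_input.strip().lower()

    def think(self, observation: str) -> Tuple[str, str]:
        """Cognition: rule-based stand-in for an LLM; picks a tool and its argument."""
        self.history.append(observation)
        if observation.startswith("add "):
            return "calculator", observation[len("add "):]
        return "echo", observation

    def act(self, tool_name: str, argument: str) -> str:
        """Action: run the chosen tool and return its output."""
        return self.tools[tool_name](argument)

    def step(self, raw_input: str) -> str:
        """One perception -> cognition -> action cycle."""
        observation = self.perceive(raw_input)
        tool_name, argument = self.think(observation)
        return self.act(tool_name, argument)


def add_numbers(argument: str) -> str:
    """Toy calculator tool: sums whitespace-separated integers."""
    return str(sum(int(token) for token in argument.split()))


agent = Agent(tools={"calculator": add_numbers, "echo": lambda text: text})
```

Calling `agent.step("  ADD 2 3 ")` routes the cleaned input to the calculator tool, while any other input falls through to the echo tool.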
The memory of an intelligent agent can be represented as:
Sensory memory: The instantaneous or very short-term retention of input information. Current intelligent agent systems place little emphasis on it.
Short-term memory: Corresponds to the context window of large language models, used to process information in the current task.
Long-term memory: Refers to external storage, such as the document store behind retrieval-augmented generation (RAG) or a knowledge graph, used to store and retrieve long-term knowledge.
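As a rough sketch of how these three tiers might relate in code, the example below uses a bounded deque as a stand-in for the context window and a plain list as a stand-in for external storage. The `TieredMemory` class and its spill-over rule (evicted items move to long-term storage) are assumptions made for illustration, not something the paper prescribes.

```python
from collections import deque
from typing import Deque, List


class TieredMemory:
    """Toy model of sensory, short-term, and long-term memory."""

    def __init__(self, context_size: int = 3) -> None:
        # Sensory memory: only the most recent raw input is retained.
        self.sensory: str = ""
        # Short-term memory: bounded, like an LLM context window.
        self.short_term: Deque[str] = deque(maxlen=context_size)
        # Long-term memory: unbounded stand-in for an external store.
        self.long_term: List[str] = []

    def observe(self, raw_input: str) -> None:
        self.sensory = raw_input
        # Assumption: when the window is full, the oldest item is
        # archived to long-term memory before it is pushed out.
        if len(self.short_term) == self.short_term.maxlen:
            self.long_term.append(self.short_term[0])
        self.short_term.append(raw_input)
```

With `context_size=2`, observing three inputs leaves the last two in short-term memory and moves the first into long-term memory.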
The memory system of intelligent agents can be improved and studied from the following aspects:
• Increasing the storage capacity of information
• Retrieving the most relevant information
• Combining the memory within the context window with external memory
• Determining which memory content should be forgotten or updated
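The sketch below touches each of these aspects in a deliberately simplified way: a fixed capacity, retrieval by word overlap (a crude stand-in for embedding similarity), prompt assembly that merges recalled notes with the context window, and forgetting of the least recently used note. The `ExternalMemory` class and `build_prompt` function are hypothetical names chosen for this example, and the scoring and eviction rules are assumptions rather than the paper's method.

```python
from typing import Dict, List


class ExternalMemory:
    """Toy external memory with overlap-based retrieval and LRU forgetting."""

    def __init__(self, capacity: int = 3) -> None:
        self.capacity = capacity
        self.clock = 0
        # Maps each stored note to the tick at which it was last used.
        self.last_used: Dict[str, int] = {}

    def _touch(self, note: str) -> None:
        self.clock += 1
        self.last_used[note] = self.clock

    def store(self, note: str) -> None:
        self._touch(note)
        # Forgetting: once over capacity, drop the least recently used note.
        if len(self.last_used) > self.capacity:
            stalest = min(self.last_used, key=self.last_used.get)
            del self.last_used[stalest]

    def retrieve(self, query: str, k: int = 1) -> List[str]:
        # Relevance: number of words shared between the query and a note.
        query_words = set(query.lower().split())
        scored = [
            (len(query_words & set(note.lower().split())), note)
            for note in self.last_used
        ]
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
        hits = [note for score, note in ranked if score > 0][:k]
        for note in hits:
            self._touch(note)  # retrieved notes count as recently used
        return hits


def build_prompt(context_window: List[str], memory: ExternalMemory, query: str) -> str:
    """Combine recalled external notes with the in-context messages."""
    recalled = memory.retrieve(query, k=2)
    return "\n".join(recalled + context_window + [query])
```

A production system would more likely use vector similarity for retrieval and a learned or rule-based policy for updates; the least-recently-used rule here only shows where such a policy would plug in.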
Paper link: https://huggingface.co/papers/2504.01990