AgentCLUE-ICabin – AI Agent Benchmark for Intelligent Vehicle Cabins
What is AgentCLUE-ICabin?
AgentCLUE-ICabin is an AI agent benchmark designed for intelligent vehicle cabin scenarios. It comprehensively evaluates large language models’ tool-calling capabilities in smart cabin environments. The benchmark is built on 12 common driving scenarios, covering a wide range of needs from daily commuting to long-distance road trips, closely aligned with real-life user interactions in China.
Each evaluation consists of a multi-turn dialogue of 1–10 rounds, and every round requires at least one tool call, testing the model’s interaction ability in complex environments.
AgentCLUE-ICabin adopts an objective 0/1 evaluation mechanism: a response is scored by checking whether its function calls and the post-execution system state are consistent with the expected ones, which keeps the evaluation fair. The toolset spans five major categories (mobility, vehicle control, entertainment, safety, and general functions) and covers more than 70 features, from navigation to seat adjustment. The evaluation process consists of scenario collection, toolset construction, dialogue data generation, and answer verification, which together support the benchmark’s scientific rigor and practical value.
Key Features of AgentCLUE-ICabin
- Scenario construction: Covers 12 common driving scenarios, such as daily commuting, long-distance trips, and family travel, so that testing conditions are diverse.
- Multi-turn interaction: Uses dialogues of 1–10 rounds, with at least one tool call per round, simulating continuous interactions in real-world cabin use.
- Tool invocation: Groups smart cabin tools into mobility, vehicle control, entertainment, safety, and general utilities, covering more than 70 functions.
- Evaluation mechanism: Uses a 0/1 scoring system that compares function calls and post-execution system states against the expected ones, keeping results fair and objective.
- Data generation: Uses large models to generate multi-turn dialogue data, which is then refined through human verification into high-quality automotive QA pairs.
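The constraints listed above (1–10 rounds per dialogue, at least one tool call per round) can be illustrated with a small data structure. This is only a sketch: the class names `ToolCall`, `Turn`, and `DialogueCase`, the helper `validate_case`, and the tool name `start_navigation` are hypothetical and are not taken from the benchmark itself.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolCall:
    """One reference function call: a tool name plus its arguments."""
    name: str
    arguments: dict[str, Any] = field(default_factory=dict)


@dataclass
class Turn:
    """One dialogue round: the user utterance and the reference tool calls."""
    user_utterance: str
    expected_calls: list[ToolCall]


@dataclass
class DialogueCase:
    """A multi-turn dialogue tied to one driving scenario (hypothetical layout)."""
    scenario: str
    turns: list[Turn]


def validate_case(case: DialogueCase) -> list[str]:
    """Return the constraint violations found; an empty list means well formed."""
    problems = []
    if not 1 <= len(case.turns) <= 10:
        problems.append(f"expected 1-10 rounds, got {len(case.turns)}")
    for i, turn in enumerate(case.turns, start=1):
        if not turn.expected_calls:
            problems.append(f"round {i} has no tool call")
    return problems


case = DialogueCase(
    scenario="daily commuting",
    turns=[
        Turn("Navigate to the office.",
             [ToolCall("start_navigation", {"destination": "office"})]),
        Turn("Play some music.", []),  # deliberately missing a tool call
    ],
)
print(validate_case(case))  # ['round 2 has no tool call']
```

A check like this would typically run during data generation, before human verification, so that malformed dialogues are caught early.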
Technical Principles of AgentCLUE-ICabin
Scenario-driven multi-turn interaction design
- Scenario construction: Builds test sets around 12 typical driving scenarios (e.g., commuting, long trips, family travel), reflecting realistic user demands.
- Multi-turn interaction: Simulates 1–10 rounds of dialogue with at least one tool call per round, testing continuity in smart cabin conversations.
- Tool categorization:
  - Mobility services: navigation, traffic updates, gas station search.
  - Vehicle control: AC, windows, seat adjustment.
  - Entertainment: music, radio, video.
  - Safety: tire pressure monitoring, sentry mode, child lock.
  - General utilities: steering wheel adjustment, lighting, seat functions.
- Tool invocation: Models must accurately call and execute the right functions based on user input.
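To make the idea of categorized tools and tool invocation concrete, here is a minimal sketch of a tool registry with a dispatcher that updates a cabin state. Everything in it is an assumption for illustration: the registry `TOOLS`, the decorator `tool`, the function `invoke`, and the three tool names are hypothetical, and the benchmark's actual toolset of 70+ functions is not reproduced here.

```python
from typing import Any, Callable

CabinState = dict[str, Any]
# Registry maps tool name -> (category, handler). All entries are illustrative.
TOOLS: dict[str, tuple[str, Callable[..., None]]] = {}


def tool(name: str, category: str):
    """Decorator that registers a handler under a tool name and category."""
    def register(fn):
        TOOLS[name] = (category, fn)
        return fn
    return register


@tool("set_ac_temperature", "vehicle_control")
def set_ac_temperature(state: CabinState, celsius: float) -> None:
    state["ac_temperature"] = celsius


@tool("start_navigation", "mobility")
def start_navigation(state: CabinState, destination: str) -> None:
    state["navigation_destination"] = destination


@tool("set_child_lock", "safety")
def set_child_lock(state: CabinState, enabled: bool) -> None:
    state["child_lock"] = enabled


def invoke(state: CabinState, name: str, arguments: dict[str, Any]) -> None:
    """Dispatch one model-issued call; raise if the tool name is unknown."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    _, handler = TOOLS[name]
    # A TypeError is raised here if the arguments do not match the handler.
    handler(state, **arguments)


state: CabinState = {}
invoke(state, "set_ac_temperature", {"celsius": 22.0})
invoke(state, "set_child_lock", {"enabled": True})
print(state)  # {'ac_temperature': 22.0, 'child_lock': True}
```

Keeping the handlers as plain functions over a state dictionary makes the post-execution state easy to compare against an expected state.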
Objective and fair evaluation mechanism
- 0/1 scoring: Matches the model's function calls against the reference calls and compares the resulting system state with the expected state, so scoring does not depend on subjective judgment.
- Multi-round feedback: In each round, the model may retry up to three times, receiving error feedback after a failed attempt so that it can adjust.
- Dialogue data generation: Multi-turn dialogues are generated by large models and validated by humans.
- State tracking: Monitors cabin state changes across interactions, requiring the model to manage state continuity.
- State comparison: Confirms that both the tool call and its effect on the system state are correct.
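The scoring ideas above can be sketched as a small loop. This is a toy illustration under stated assumptions, not the benchmark's actual harness: "up to three times" is read as three attempts per round, a dialogue is assumed to score 1 only if every round passes, and the executor `apply_calls` simply writes each call's arguments into the state. All names (`score_round`, `score_dialogue`, `scripted_model`, the tool names) are hypothetical.

```python
import copy
from typing import Any, Callable, Optional

Call = tuple[str, dict[str, Any]]  # (tool name, arguments)
State = dict[str, Any]
# A model maps (user utterance, feedback from the last failed attempt) to calls.
Model = Callable[[str, Optional[str]], list[Call]]
MAX_ATTEMPTS = 3  # assumption: three attempts per round


def apply_calls(state: State, calls: list[Call]) -> State:
    """Toy executor: each call writes its arguments into a copy of the state."""
    new_state = copy.deepcopy(state)
    for _name, arguments in calls:
        new_state.update(arguments)
    return new_state


def score_round(model: Model, utterance: str, expected_calls: list[Call],
                state: State) -> tuple[int, State]:
    """Return (1 or 0, expected state after this round)."""
    expected_state = apply_calls(state, expected_calls)
    feedback: Optional[str] = None
    for _ in range(MAX_ATTEMPTS):
        calls = model(utterance, feedback)
        calls_match = calls == expected_calls
        # In this toy executor the state check follows from the call check;
        # it is kept to mirror the two-part criterion described in the text.
        state_match = apply_calls(state, calls) == expected_state
        if calls_match and state_match:
            return 1, expected_state
        feedback = f"calls {calls} did not produce the expected result"
    return 0, expected_state


def score_dialogue(model: Model, rounds: list[tuple[str, list[Call]]]) -> int:
    """Assumed aggregation: 1 only if every round passes, otherwise 0."""
    state: State = {}
    for utterance, expected_calls in rounds:
        passed, state = score_round(model, utterance, expected_calls, state)
        if not passed:
            return 0
    return 1


def scripted_model(utterance: str, feedback: Optional[str]) -> list[Call]:
    """Stand-in for an LLM: wrong on the first AC attempt, right after feedback."""
    if utterance == "Set the AC to 22":
        if feedback is None:
            return [("set_ac_temperature", {"ac_temperature": 25})]
        return [("set_ac_temperature", {"ac_temperature": 22})]
    return [("set_child_lock", {"child_lock": True})]


rounds = [
    ("Set the AC to 22", [("set_ac_temperature", {"ac_temperature": 22})]),
    ("Lock the rear doors", [("set_child_lock", {"child_lock": True})]),
]
print(score_dialogue(scripted_model, rounds))  # 1
```

Carrying the expected state forward from round to round is what makes state tracking matter: a later round is judged against the state that earlier rounds should have produced.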
Core Advantages of AgentCLUE-ICabin
- Comprehensive coverage: 12 typical driving scenarios closely aligned with Chinese user needs, ensuring practical and valuable evaluation outcomes.
- Complex interactions: Multi-turn dialogues with multiple tool calls simulate real-world cabin conversations, testing depth and robustness.
- Objective scoring: 0/1 evaluation with state tracking ensures fairness and avoids subjective bias.
- Rich toolset: 70+ functions across five categories provide extensive evaluation scope for smart cabin features.
- High-quality data: Model-generated, human-validated dialogue data ensures accuracy and reliability for benchmarking and training.
Application Scenarios of AgentCLUE-ICabin
- Daily commuting: Road condition queries, music playback, and news updates for more convenient and enjoyable commutes.
- Long-distance driving: Precision navigation, seat massage, and gas station search for smooth and comfortable trips.
- Family travel: Child lock control, rear-seat entertainment, and family-friendly facility search for safety and convenience.
- In-car office: Mobile office functions such as Bluetooth conferencing, voice notes, and in-car Wi-Fi.
- Shopping trips: Mall navigation, parking spot search, and trunk control for convenient shopping experiences.
- School pick-up/drop-off: Temporary parking queries, preset cabin temperature, and precise navigation to schools to streamline the process.