
MMBench is a multimodal benchmark developed by researchers from the Shanghai AI Laboratory, Nanyang Technological University, The Chinese University of Hong Kong, the National University of Singapore, and Zhejiang University. It establishes a comprehensive evaluation pipeline that assesses capabilities hierarchically from perception to cognition across 20 fine-grained abilities, drawing on approximately 3,000 multiple-choice questions collected from the Internet and authoritative benchmark datasets. Breaking away from conventional one-pass, rule-based option matching, it circularly shuffles the answer options to verify the consistency of the model's outputs and uses ChatGPT to match free-form responses precisely to the options.
Features and Advantages of MMBench
- Evaluation dimensions subdivided step by step from perception and reasoning. Approximately 3,000 single-choice questions cover 20 fine-grained evaluation dimensions such as object detection, text recognition, action recognition, image understanding, and relational reasoning.
- A more robust evaluation protocol. Each single-choice question is asked multiple times with its options circularly shifted, and the model is counted as correct only if its outputs consistently point to the correct option across all rotations. Compared with traditional single-pass evaluation, top-1 accuracy drops by 10%-20% on average under this protocol, which minimizes the impact of noise factors on the evaluation results and ensures reproducibility (see the CircularEval sketch after this list).
- A more reliable method for extracting model answers. ChatGPT is used to match the model's free-form output to the options, so even when the model does not answer in the instructed format, its response can still be accurately mapped to the most reasonable option (a sketch of this matching step also follows the list).
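The circular-evaluation idea can be summarized in a few lines. The sketch below is illustrative rather than MMBench's actual code: `ask_model` is a hypothetical callable that takes a question and an ordered option list and returns the letter the model chose, and four options are assumed.

```python
from typing import Callable, List

def circular_eval(question: str,
                  options: List[str],
                  answer: str,
                  ask_model: Callable[[str, List[str]], str]) -> bool:
    """Ask one question len(options) times, rotating the options each pass.

    The question counts as passed only if the model picks the ground-truth
    answer in every rotation, whatever letter it appears under.
    """
    letters = "ABCD"  # assumes four options per question
    for shift in range(len(options)):
        # Rotation 0 presents ABCD, rotation 1 presents BCDA, and so on.
        rotated = options[shift:] + options[:shift]
        letter = ask_model(question, rotated)  # e.g. returns "A"
        if rotated[letters.index(letter)] != answer:
            return False  # one failed rotation fails the whole question
    return True
```

Because a lucky guess rarely survives every rotation, this stricter criterion is what drives the 10%-20% accuracy drop noted above.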
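The ChatGPT-based answer extraction can likewise be sketched. The `match_answer` helper, prompt wording, and model name below are assumptions for illustration, not MMBench's exact prompt; the sketch uses the OpenAI Python client (v1+).

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def match_answer(question: str, options: list, model_output: str) -> str:
    """Use an LLM to map a free-form model reply onto one option letter."""
    letters = [chr(ord("A") + i) for i in range(len(options))]
    option_block = "\n".join(f"{l}. {o}" for l, o in zip(letters, options))
    prompt = (
        "Given a question, its options, and a model's free-form reply, "
        "respond with only the single letter of the option that best "
        "matches the reply.\n\n"
        f"Question: {question}\n{option_block}\n"
        f"Model reply: {model_output}\n"
        "Answer letter:"
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumption: any capable chat model works
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip()[0]  # e.g. "B"
```

This is what lets the benchmark score models that answer in full sentences ("The animal in the image is clearly a dog") instead of a bare option letter.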
Similar Sites
- C-Eval
- Open LLM Leaderboard
- OpenCompass
- PubMedQA
- CMMLU
- FlagEval