HELM

updated 1m ago 20 0 0

The large model evaluation system launched by Stanford University.

published date:
2025-03-12
HELMHELM
HELM

HELM, short for Holistic Evaluation of Language Models, is a large model evaluation system launched by Stanford University. This evaluation method mainly consists of three modules: scenarios, adaptations, and metrics. Each evaluation run requires specifying a scenario, a prompt for the adapted model, and one or more metrics. Its evaluation mainly focuses on English, with seven metrics including accuracy, uncertainty/calibration, robustness, fairness, bias, toxicity, and inference efficiency. The tasks include question answering, information retrieval, summarization, text classification, etc.

Similar Sites

No comments yet...

none
No comments yet...