
HELM
HELM (Holistic Evaluation of Language Models) is Stanford University's framework for evaluating large language models.
Biomedical Research Q&A Dataset and Model Leaderboard
PubMedQA is a biomedical research question-answering dataset containing 1K expert-annotated, 61.2K unlabeled, and 211.3K artificially generated QA instances. The leaderboard currently lists the scores of 18 models on this medical benchmark.