
OpenCompass
An open evaluation platform for large models, launched by Shanghai AI Laboratory
Biomedical Research Q&A Dataset and Model Leaderboard
PubMedQA is a biomedical research question-answering dataset containing 1K expert-annotated, 61.2K unannotated, and 211.3K artificially generated QA instances. This leaderboard currently lists scores for 18 models on this medical benchmark.