
PubMedQA
Biomedical Research Q&A Dataset and Model Score Ranking List
FlagEval (Tiancheng) is the large model evaluation platform launched by the Beijing Academy of Artificial Intelligence (BAAI).
FlagEval (Tiancheng) is developed jointly by BAAI and multiple university teams. It is a large model evaluation platform built on a three-dimensional “capability-task-indicator” evaluation framework, aiming to provide comprehensive and detailed evaluation results. The platform offers evaluation across more than 600 dimensions, spanning over 30 capabilities, 5 tasks, and 4 major categories of indicators. The task dimension comprises 22 subjective and objective evaluation datasets with 84,433 questions.
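The "more than 600 dimensions" figure is consistent with taking every combination of capability, task, and indicator category (30 × 5 × 4 = 600). The sketch below only illustrates that arithmetic. It assumes the dimensions are the full cross product of the three axes, and every label in it is a hypothetical placeholder, since the actual FlagEval capability, task, and indicator names are not listed here.

```python
from itertools import product

# Hypothetical placeholder labels; the real FlagEval names are not given in the text.
capabilities = [f"capability_{i}" for i in range(1, 31)]       # 30 (the text says "over 30")
tasks = [f"task_{i}" for i in range(1, 6)]                     # 5 tasks
indicators = [f"indicator_category_{i}" for i in range(1, 5)]  # 4 indicator categories

# Assumption: one evaluation dimension per (capability, task, indicator) combination.
dimensions = list(product(capabilities, tasks, indicators))

print(len(dimensions))  # 600
```

With more than 30 capabilities, the same cross product exceeds 600, which matches the "more than 600" wording.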