I. Evaluating internlm2-chat-1.8b on the MMLU dataset with OpenCompass
1. Deploy the internlm2-chat-1.8b model with lmdeploy
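Before moving on, it is worth confirming that the server is actually up. The sketch below assumes the model was started with lmdeploy's OpenAI-compatible server (for example lmdeploy serve api_server internlm/internlm2-chat-1_8b, which binds 0.0.0.0:23333 by default) and that this server answers GET /v1/models. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: lmdeploy's api_server is running locally on its default port 23333
# and exposes the OpenAI-compatible GET /v1/models route.
DEFAULT_API_ADDR = "http://127.0.0.1:23333"


def parse_model_ids(payload: dict) -> list[str]:
    """Extract model ids from an OpenAI-style /v1/models response body."""
    return [item["id"] for item in payload.get("data", [])]


def list_served_models(api_addr: str = DEFAULT_API_ADDR, timeout: float = 5.0) -> list[str]:
    """Ask the running api_server which model(s) it is serving."""
    url = api_addr.rstrip("/") + "/v1/models"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_model_ids(json.load(resp))


if __name__ == "__main__":
    try:
        print("served models:", list_served_models())
    except OSError as exc:  # URLError and socket timeouts are OSError subclasses
        print("api_server not reachable:", exc)
```

If the model id is printed, the API address used in step 3 should point at the same host and port.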
2. Install OpenCompass and download the datasets by following the official OpenCompass tutorial:
opencompass/README_zh-CN.md at main · open-compass/opencompass · GitHub
Note:
On Python 3.11, installing pyext fails, because pyext calls inspect.getargspec, which was removed from the standard library in Python 3.11.
Fix (based on a CSDN blog post):
[Python] AttributeError: module 'inspect' has no attribute 'getargspec'. Did you mean: 'getargs'
Download the source:
pyext · PyPI
Edit the following file in the unpacked source:
pyext-0.7/pyext.py
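The edit is presumably the one the error message points to: replacing inspect.getargspec with inspect.getfullargspec. A minimal sketch of that substitution, assuming the unpacked source sits in ./pyext-0.7 (the helper names are hypothetical). Note that getfullargspec returns a FullArgSpec whose keywords field is called varkw, so check the patched file if pyext reads that field by name.

```python
from pathlib import Path

# Assumption: the fix is a plain textual swap of the removed function
# for its Python 3 replacement; the path matches the file named above.
PYEXT_FILE = Path("pyext-0.7/pyext.py")


def patch_getargspec(source: str) -> str:
    """Replace every inspect.getargspec reference with inspect.getfullargspec."""
    # "inspect.getfullargspec" does not contain "inspect.getargspec",
    # so running this twice is harmless.
    return source.replace("inspect.getargspec", "inspect.getfullargspec")


def main(path: Path = PYEXT_FILE) -> None:
    if not path.exists():
        print(f"{path} not found; run this next to the unpacked pyext-0.7 directory")
        return
    original = path.read_text(encoding="utf-8")
    patched = patch_getargspec(original)
    if patched != original:
        path.write_text(patched, encoding="utf-8")
        print(f"patched {path}")
    else:
        print(f"nothing to change in {path}")


if __name__ == "__main__":
    main()
```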
Then, from the pyext-0.7 directory, run python setup.py install to install it.
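Once the datasets are downloaded and unpacked, a quick check that all MMLU subjects are present can save a failed run. This sketch assumes the OpenCompass data archive unpacks MMLU to ./data/mmlu/ with one <subject>_test.csv per subject under test/ (57 subjects); adjust the path if your layout differs. The helper names are hypothetical.

```python
from pathlib import Path

# Assumption: MMLU lives under ./data/mmlu/test/ as "<subject>_test.csv" files.
MMLU_ROOT = Path("data/mmlu")
EXPECTED_SUBJECTS = 57


def subjects_from_filenames(names: list[str]) -> list[str]:
    """Turn '<subject>_test.csv' file names into sorted subject names."""
    suffix = "_test.csv"
    return sorted(n[: -len(suffix)] for n in names if n.endswith(suffix))


def find_mmlu_subjects(root: Path = MMLU_ROOT) -> list[str]:
    """List the subjects found in root/test, or [] if the folder is missing."""
    test_dir = root / "test"
    if not test_dir.is_dir():
        return []
    return subjects_from_filenames([p.name for p in test_dir.iterdir()])


if __name__ == "__main__":
    found = find_mmlu_subjects()
    print(f"found {len(found)} MMLU subjects (expected {EXPECTED_SUBJECTS})")
```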
3. In opencompass/configs/eval_internlm_chat_lmdeploy_apiserver.py, add a model entry that points to the API address of the 1.8b model.
Note: when evaluating MMLU, comment out the other datasets.
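A sketch of both edits to that config file is shown below. It is a fragment to paste into the existing file, not a standalone script: read_base, TurboMindAPIModel and meta_template are assumed to be already imported or defined there. The field names mirror the existing model entries in the version I used and may differ in yours, so copy them from the neighbouring entries. The MMLU import suffix (mmlu_gen_a484b3) and the address http://0.0.0.0:23333 (lmdeploy's default port) are assumptions; abbr matches the result column shown below.

```python
with read_base():
    # Keep only the MMLU import; comment out every other *_datasets import.
    from .datasets.mmlu.mmlu_gen_a484b3 import mmlu_datasets  # suffix as in your copy of the file
    # from .datasets.<other_dataset>... import <other>_datasets

# Hypothetical entry for the 1.8b model served by lmdeploy's api_server.
internlm2_chat_1_8b = dict(
    type=TurboMindAPIModel,            # already imported at the top of the file
    abbr='internlm2-chat-1_8b',        # becomes the model column in the results
    api_addr='http://0.0.0.0:23333',   # address:port of the lmdeploy api_server
    max_out_len=100,
    max_seq_len=2048,
    batch_size=8,
    meta_template=meta_template,       # reuse the template defined in the file
    run_cfg=dict(num_gpus=1, num_procs=1),
)

models = [internlm2_chat_1_8b]
```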
4. Run the evaluation script (the model is called through the API):
python run.py configs/eval_internlm_chat_lmdeploy_apiserver.py -w outputs/turbomind/internlm-1-8b --datasets mmlu_ppl
Evaluation results:
dataset                                            version    metric    mode      internlm2-chat-1_8b
-------------------------------------------------  ---------  --------  ------  ---------------------
lukaemon_mmlu_college_biology                      8c2e29     accuracy  gen                     46.53
lukaemon_mmlu_college_chemistry                    0afccd     accuracy  gen                     41.00
lukaemon_mmlu_college_computer_science             c1c1b4     accuracy  gen                     41.00
lukaemon_mmlu_college_mathematics                  9deed0     accuracy  gen                     33.00
lukaemon_mmlu_college_physics                      f5cf5e     accuracy  gen                     36.27
lukaemon_mmlu_electrical_engineering               3d694d     accuracy  gen                     40.00
lukaemon_mmlu_astronomy                            7ef16f     accuracy  gen                     48.03
lukaemon_mmlu_anatomy                              2d597d     accuracy  gen                     41.48
lukaemon_mmlu_abstract_algebra                     ec092c     accuracy  gen                     33.00
lukaemon_mmlu_machine_learning                     d489ae     accuracy  gen                     27.68
lukaemon_mmlu_clinical_knowledge                   af10df     accuracy  gen                     52.83
lukaemon_mmlu_global_facts                         cad9e0     accuracy  gen                     24.00
lukaemon_mmlu_management                           65f310     accuracy  gen                     68.93
lukaemon_mmlu_nutrition                            80bf96     accuracy  gen                     50.65
lukaemon_mmlu_marketing                            9a98c0     accuracy  gen                     68.38
lukaemon_mmlu_professional_accounting              9cc7e2     accuracy  gen                     28.01
lukaemon_mmlu_high_school_geography                c28a4c     accuracy  gen                     56.57
lukaemon_mmlu_international_law                    408d4e     accuracy  gen                     56.20
lukaemon_mmlu_moral_scenarios                      9f30a6     accuracy  gen                     25.70
lukaemon_mmlu_computer_security                    2753c1     accuracy  gen                     55.00
lukaemon_mmlu_high_school_microeconomics           af9eae     accuracy  gen                     52.52
lukaemon_mmlu_professional_law                     7c7a62     accuracy  gen                     34.49
lukaemon_mmlu_medical_genetics                     b1a3a7     accuracy  gen                     56.00
lukaemon_mmlu_professional_psychology              c6b790     accuracy  gen                     42.32
lukaemon_mmlu_jurisprudence                        f41074     accuracy  gen                     53.70
lukaemon_mmlu_world_religions                      d44a95     accuracy  gen                     61.40
lukaemon_mmlu_philosophy                           d36ef3     accuracy  gen                     47.91
lukaemon_mmlu_virology                             0a5f8e     accuracy  gen                     38.55
lukaemon_mmlu_high_school_chemistry                5b2ef9     accuracy  gen                     42.36
lukaemon_mmlu_public_relations                     4c7898     accuracy  gen                     51.82
lukaemon_mmlu_high_school_macroeconomics           3f841b     accuracy  gen                     47.95
lukaemon_mmlu_human_sexuality                      4d1f3e     accuracy  gen                     51.15
lukaemon_mmlu_elementary_mathematics               0f5d3a     accuracy  gen                     32.54
lukaemon_mmlu_high_school_physics                  0dd929     accuracy  gen                     31.79
lukaemon_mmlu_high_school_computer_science         bf31fd     accuracy  gen                     41.00
lukaemon_mmlu_high_school_european_history         d1b67e     accuracy  gen                     59.39
lukaemon_mmlu_business_ethics                      af53f3     accuracy  gen                     47.00
lukaemon_mmlu_moral_disputes                       48239e     accuracy  gen                     45.95
lukaemon_mmlu_high_school_statistics               47e18e     accuracy  gen                     48.61
lukaemon_mmlu_miscellaneous                        573569     accuracy  gen                     57.47
lukaemon_mmlu_formal_logic                         7a0414     accuracy  gen                     31.75
lukaemon_mmlu_high_school_government_and_politics  d907eb     accuracy  gen                     61.66
lukaemon_mmlu_prehistory                           65aa94     accuracy  gen                     50.00
lukaemon_mmlu_security_studies                     9ea7d3     accuracy  gen                     53.06
lukaemon_mmlu_high_school_biology                  775183     accuracy  gen                     55.48
lukaemon_mmlu_logical_fallacies                    19746a     accuracy  gen                     53.99
lukaemon_mmlu_high_school_world_history            6665dc     accuracy  gen                     67.09
lukaemon_mmlu_professional_medicine                a05bab     accuracy  gen                     41.54
lukaemon_mmlu_high_school_mathematics              0e6a7e     accuracy  gen                     28.52
lukaemon_mmlu_college_medicine                     5215f1     accuracy  gen                     46.82
lukaemon_mmlu_high_school_us_history               b5f235     accuracy  gen                     54.41
lukaemon_mmlu_sociology                            4980ec     accuracy  gen                     60.70
lukaemon_mmlu_econometrics                         4d590b     accuracy  gen                     29.82
lukaemon_mmlu_high_school_psychology               440e96     accuracy  gen                     65.50
lukaemon_mmlu_human_aging                          d0a8e1     accuracy  gen                     47.98
lukaemon_mmlu_us_foreign_policy                    adcc88     accuracy  gen                     72.00
lukaemon_mmlu_conceptual_physics                   a111d3     accuracy  gen                     34.04
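The table lists the 57 MMLU subjects individually but gives no overall score. A small sketch for averaging them, assuming OpenCompass also wrote the same table as a summary CSV (somewhere under the -w output directory, e.g. a summary/ subfolder) with the columns dataset, version, metric, mode, internlm2-chat-1_8b. The helper name is hypothetical, and it computes a plain unweighted mean over subjects, which is not the same as accuracy weighted by the number of questions per subject.

```python
import csv
import io
import sys
from pathlib import Path
from statistics import mean

# Assumption: the summary CSV uses the same columns as the table above.
MODEL_COLUMN = "internlm2-chat-1_8b"


def mmlu_average(csv_text: str, model: str = MODEL_COLUMN) -> tuple[int, float]:
    """Return (number of MMLU subjects, unweighted mean accuracy)."""
    scores = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Only per-subject accuracy rows are counted.
        if "mmlu" in row["dataset"] and row["metric"] == "accuracy":
            try:
                scores.append(float(row[model]))
            except ValueError:  # skip "-" or empty cells for missing results
                continue
    return len(scores), (mean(scores) if scores else float("nan"))


if __name__ == "__main__":
    if len(sys.argv) > 1:
        n, avg = mmlu_average(Path(sys.argv[1]).read_text(encoding="utf-8"))
        print(f"{n} subjects, mean accuracy {avg:.2f}")
```

Usage: python mmlu_avg.py path/to/summary.csv (the script name is arbitrary).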