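InternLM (书生·浦语) Large Model Practical Camp: Evaluating Large Models with OpenCompass

InternLM (书生·浦语) Large Model Practical Camp, OpenCompass: mule or horse, take it out for a trot and find out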


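Why study the evaluation of large models?

Large models are flourishing, with many contenders in the field at once.

  • First, evaluation is essential for a complete picture of the strengths and limitations of large language models. Many studies report that large language models match or exceed human performance on a range of general tasks, yet doubts remain about whether this ability reflects real understanding or merely memorization of the training data. For example, a large language model can sometimes output the correct answer when given only a LeetCode problem number and no problem statement, which suggests that the training data may be contaminated.
  • Second, evaluation helps guide and improve collaboration between humans and large language models. Since these models ultimately serve people, designing better human-computer interaction paradigms requires a comprehensive assessment of each of their capabilities.
  • Finally, evaluation helps plan the future development of large language models and guard against unknown and potential risks. As the models evolve, their capabilities keep growing. A sound, scientific evaluation mechanism lets us track how those capabilities change over time and anticipate potential risks in advance, which makes it an important research topic.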

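Introduction to OpenCompass

The scientist team at Shanghai AI Laboratory has officially released 司南 (OpenCompass 2.0), an open-source, open evaluation system for large models that provides one-stop evaluation for large language models, multimodal models, and more. Its main features are:

  • Open source and reproducible: a fair, open, and reproducible evaluation scheme for large models.
  • Comprehensive capability dimensions: five dimensions covering 70+ datasets with about 400,000 questions, for a thorough assessment of model capabilities.
  • Broad model support: 20+ HuggingFace and API models are already supported.
  • Efficient distributed evaluation: a single command handles task partitioning and distributed evaluation, so a full evaluation of a model with hundreds of billions of parameters finishes within hours.
  • Diverse evaluation paradigms: zero-shot, few-shot, and chain-of-thought evaluation are supported, combined with standard or conversational prompt templates, to bring out the best performance of each model.
  • Flexible extension: new models and datasets, custom task-partitioning strategies, and even new cluster management systems can all be plugged into OpenCompass.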

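Evaluation targets

The main evaluation targets of this library are large language models and large multimodal models. Taking large language models as the example, the specific model types are:

  • Base model: usually obtained by self-supervised training on massive text data (for example OpenAI's GPT-3 and Meta's LLaMA); it typically has strong text-continuation ability.
  • Chat model: usually obtained from a base model through instruction fine-tuning or human-preference alignment (for example OpenAI's ChatGPT and Shanghai AI Laboratory's InternLM (书生·浦语)); it understands human instructions and has strong conversational ability.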

Tool architecture


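  • Model layer: the main model types involved in large-model evaluation. OpenCompass focuses on base models and chat models.
  • Capability layer: OpenCompass designs its evaluation dimensions around general capabilities and specialized capabilities. General capabilities are evaluated along dimensions such as language, knowledge, understanding, reasoning, and safety. Specialized capabilities are evaluated along dimensions such as long context, code, tool use, and knowledge augmentation.
  • Method layer: OpenCompass uses both objective and subjective evaluation. Objective evaluation conveniently measures a model on tasks with definite answers (multiple choice, fill-in-the-blank, closed-ended question answering, and so on). Subjective evaluation measures how satisfied users really are with a model's replies, and OpenCompass supports two forms of it: model-assisted and human-feedback-based.
  • Tool layer: OpenCompass offers rich functionality for automated, efficient evaluation of large language models, including distributed evaluation, prompt engineering, evaluation database integration, leaderboard publishing, and evaluation report generation.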

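Design philosophy

To evaluate the capabilities of large language models accurately, comprehensively, and systematically, OpenCompass starts from the perspective of general artificial intelligence and combines frontier academic progress with industrial best practice to propose a capability evaluation system oriented toward real applications. The OpenCompass capability dimensions cover two parts: general capabilities and specialized capabilities.

Evaluation methods

OpenCompass combines objective and subjective evaluation. For capability dimensions and scenarios with deterministic answers, it builds rich evaluation sets to assess model ability comprehensively. For open-ended or semi-open-ended questions and for model safety questions, it uses a combination of subjective and objective evaluation.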

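Objective evaluation

For objective questions with standard answers, quantitative metrics can be used to compare the model's output with the reference answer and measure performance from the result. Because large language model output is highly free-form, the inputs and outputs must also be constrained and designed during evaluation to minimize the effect of noisy output; only then is the assessment of the model's ability complete and objective.

To better elicit the model's ability in the tested domain and guide it to answer in a given template, OpenCompass uses prompt engineering and in-context learning for objective evaluation.

In practice, the model's output is usually evaluated in one of the following two ways:

  • Discriminative evaluation: the question is combined with each candidate answer, the model's perplexity is computed on every combination, and the answer with the lowest perplexity is taken as the model's final output. For example, if the model's perplexity on "question? answer1" is 0.1 and on "question? answer2" is 0.2 (the values are illustrative only), answer1 is chosen as the model's output.
  • Generative evaluation: mainly used for generation tasks such as translation, code generation, and logical analysis. The question is given as the model's raw input, with the answer area left blank for the model to complete. The output usually needs post-processing so that it meets the dataset's requirements.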
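The two scoring modes above can be illustrated with a small sketch. It is not OpenCompass code: perplexity, pick_by_perplexity, and extract_choice are hypothetical helpers, and the per-token log-probabilities are made-up numbers standing in for what a real model would return.

```python
import math
import re


def perplexity(token_logprobs):
    """Perplexity = exp of the mean negative log-likelihood per token."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def pick_by_perplexity(candidates):
    """Discriminative evaluation: return the candidate with the lowest perplexity.

    `candidates` maps each answer to the token log-probs of "question + answer".
    """
    return min(candidates, key=lambda answer: perplexity(candidates[answer]))


def extract_choice(generated_text):
    """Generative evaluation: post-process free-form output into an option letter."""
    match = re.search(r"\b([ABCD])\b", generated_text)
    return match.group(1) if match else None


# Made-up log-probabilities for two "question? answer" combinations.
scores = {
    "answer1": [-0.2, -0.1, -0.3],
    "answer2": [-0.9, -1.2, -0.7],
}
chosen = pick_by_perplexity(scores)
letter = extract_choice("The answer is B, because ...")
print(chosen, letter)
```

The lowest-perplexity rule only needs a ranking, so the absolute perplexity values matter less than their order. The regular expression is one possible post-processing rule; real datasets may need a different one.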

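Subjective evaluation

Language is vivid and varied, and many scenarios and capabilities cannot be judged by objective metrics. For areas such as model safety and language ability, evaluation based on human judgment better reflects the model's real ability and better matches how large models are actually used.

The subjective evaluation in OpenCompass relies on the subjective judgment of human raters to assess conversational large language models. In practice, a set of subjective test questions is built in advance from the model capability dimensions, the replies of different models to the same question are shown to the raters, and their subjective scores are collected. Because subjective testing is expensive, the scheme also uses a high-performing large language model to simulate human raters and produce scores. In actual evaluations, subjective evaluation by real human experts is combined with model-based scoring.

Concretely, OpenCompass carries out subjective evaluation in two ways: single-model reply satisfaction statistics and multi-model satisfaction comparison.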
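As a rough illustration of the two modes named above, the sketch below aggregates judge scores for a single model and head-to-head votes between two models. The function names, the 1-to-5 rating scale, the threshold, and the sample ratings are all assumptions made for illustration; they are not OpenCompass's implementation or data.

```python
def satisfaction_rate(ratings, threshold=4):
    """Single-model mode: share of replies rated at or above `threshold`."""
    return sum(r >= threshold for r in ratings) / len(ratings)


def win_rate(pairwise_votes, model):
    """Multi-model mode: share of decided head-to-head comparisons won by `model`.

    Each vote is the name of the preferred model, or "tie".
    """
    decided = [v for v in pairwise_votes if v != "tie"]
    return sum(v == model for v in decided) / len(decided) if decided else 0.0


# Hypothetical ratings (1-5) and votes from a human or model judge.
ratings_model_a = [5, 4, 3, 4, 2]
votes = ["model_a", "model_b", "model_a", "tie", "model_a"]

print(satisfaction_rate(ratings_model_a))
print(win_rate(votes, "model_a"))
```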

Quick start


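Overview

Evaluating a model in OpenCompass usually involves the following stages: configure -> inference -> evaluation -> visualization.

  • Configure: the starting point of the workflow. You configure the whole evaluation, choosing the models and datasets to evaluate. You can also choose the evaluation strategy and compute backend, and define how results are displayed.
  • Inference and evaluation: OpenCompass runs inference and evaluation on the models and datasets in parallel. Inference has the model produce outputs from the dataset; evaluation measures how well those outputs match the reference answers. Both are split into multiple "tasks" that run concurrently for efficiency, but note that with limited compute this strategy can make the evaluation slower. For details on the issue and how to address it, see FAQ: Efficiency.
  • Visualization: when evaluation finishes, OpenCompass organizes the results into readable tables and saves them as CSV and TXT files. You can also enable Feishu (Lark) status reporting to receive evaluation status updates in the Feishu client.

Next, we show the basic usage of OpenCompass by evaluating InternLM (书生·浦语) on the C-Eval benchmark. The configuration files can be found in configs/eval_demo.py.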
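To make the four stages concrete, here is a toy end-to-end sketch in plain Python. It does not use OpenCompass: toy_model, the two-question dataset, and the CSV layout are all invented for illustration, whereas the real tool drives these stages from configuration files.

```python
import csv
import io


# 1. Configure: choose the model and dataset (both are toy stand-ins).
def toy_model(question):
    """Hypothetical model that always answers 'A'."""
    return "A"


config = {
    "model": toy_model,
    "dataset": [
        {"question": "1 + 1 = ?  A. 2  B. 3", "gold": "A"},
        {"question": "2 + 2 = ?  A. 5  B. 4", "gold": "B"},
    ],
}

# 2. Inference: let the model produce an output for every sample.
predictions = [config["model"](item["question"]) for item in config["dataset"]]

# 3. Evaluation: compare the outputs with the reference answers.
correct = sum(p == item["gold"] for p, item in zip(predictions, config["dataset"]))
accuracy = 100.0 * correct / len(predictions)

# 4. Visualization: write a small summary table as CSV text.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["dataset", "metric", "toy_model"])
writer.writerow(["toy_arithmetic", "accuracy", f"{accuracy:.2f}"])
summary_csv = buffer.getvalue()
print(summary_csv)
```

In this toy version the stages run one after another in a single process. The parallel "tasks" described above correspond to splitting steps 2 and 3 across datasets and workers.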

Environment setup

Create a development machine and conda environment


Environment installation for GPU

studio-conda -o internlm-base -t opencompass
source activate opencompass
git clone -b 0.2.4 https://github.com/open-compass/opencompass
cd opencompass
pip install -e .


pip install -r requirements.txt


Data preparation

Extract the evaluation datasets into data/

cp /share/temp/datasets/OpenCompassData-core-20231110.zip /root/opencompass/
unzip OpenCompassData-core-20231110.zip

A data folder now appears under opencompass.

List supported datasets and models

List all configurations related to internlm and ceval:

python tools/list_configs.py internlm ceval
(opencompass) root@intern-studio-061925:~/opencompass# python tools/list_configs.py internlm ceval
+----------------------------------------+----------------------------------------------------------------------+
| Model                                  | Config Path                                                          |
|----------------------------------------+----------------------------------------------------------------------|
| hf_internlm2_1_8b                      | configs/models/hf_internlm/hf_internlm2_1_8b.py                      |
| hf_internlm2_20b                       | configs/models/hf_internlm/hf_internlm2_20b.py                       |
| hf_internlm2_7b                        | configs/models/hf_internlm/hf_internlm2_7b.py                        |
| hf_internlm2_base_20b                  | configs/models/hf_internlm/hf_internlm2_base_20b.py                  |
| hf_internlm2_base_7b                   | configs/models/hf_internlm/hf_internlm2_base_7b.py                   |
| hf_internlm2_chat_1_8b                 | configs/models/hf_internlm/hf_internlm2_chat_1_8b.py                 |
| hf_internlm2_chat_1_8b_sft             | configs/models/hf_internlm/hf_internlm2_chat_1_8b_sft.py             |
| hf_internlm2_chat_20b                  | configs/models/hf_internlm/hf_internlm2_chat_20b.py                  |
| hf_internlm2_chat_20b_sft              | configs/models/hf_internlm/hf_internlm2_chat_20b_sft.py              |
| hf_internlm2_chat_20b_with_system      | configs/models/hf_internlm/hf_internlm2_chat_20b_with_system.py      |
| hf_internlm2_chat_7b                   | configs/models/hf_internlm/hf_internlm2_chat_7b.py                   |
| hf_internlm2_chat_7b_sft               | configs/models/hf_internlm/hf_internlm2_chat_7b_sft.py               |
| hf_internlm2_chat_7b_with_system       | configs/models/hf_internlm/hf_internlm2_chat_7b_with_system.py       |
| hf_internlm2_chat_math_20b             | configs/models/hf_internlm/hf_internlm2_chat_math_20b.py             |
| hf_internlm2_chat_math_20b_with_system | configs/models/hf_internlm/hf_internlm2_chat_math_20b_with_system.py |
| hf_internlm2_chat_math_7b              | configs/models/hf_internlm/hf_internlm2_chat_math_7b.py              |
| hf_internlm2_chat_math_7b_with_system  | configs/models/hf_internlm/hf_internlm2_chat_math_7b_with_system.py  |
| hf_internlm_20b                        | configs/models/hf_internlm/hf_internlm_20b.py                        |
| hf_internlm_7b                         | configs/models/hf_internlm/hf_internlm_7b.py                         |
| hf_internlm_chat_20b                   | configs/models/hf_internlm/hf_internlm_chat_20b.py                   |
| hf_internlm_chat_7b                    | configs/models/hf_internlm/hf_internlm_chat_7b.py                    |
| hf_internlm_chat_7b_8k                 | configs/models/hf_internlm/hf_internlm_chat_7b_8k.py                 |
| hf_internlm_chat_7b_v1_1               | configs/models/hf_internlm/hf_internlm_chat_7b_v1_1.py               |
| internlm_7b                            | configs/models/internlm/internlm_7b.py                               |
| lmdeploy_internlm2_chat_20b            | configs/models/hf_internlm/lmdeploy_internlm2_chat_20b.py            |
| lmdeploy_internlm2_chat_7b             | configs/models/hf_internlm/lmdeploy_internlm2_chat_7b.py             |
| ms_internlm_chat_7b_8k                 | configs/models/ms_internlm/ms_internlm_chat_7b_8k.py                 |
+----------------------------------------+----------------------------------------------------------------------+
+--------------------------------+------------------------------------------------------------------+
| Dataset                        | Config Path                                                      |
|--------------------------------+------------------------------------------------------------------|
| ceval_clean_ppl                | configs/datasets/ceval/ceval_clean_ppl.py                        |
| ceval_contamination_ppl_810ec6 | configs/datasets/contamination/ceval_contamination_ppl_810ec6.py |
| ceval_gen                      | configs/datasets/ceval/ceval_gen.py                              |
| ceval_gen_2daf24               | configs/datasets/ceval/ceval_gen_2daf24.py                       |
| ceval_gen_5f30c7               | configs/datasets/ceval/ceval_gen_5f30c7.py                       |
| ceval_internal_ppl_1cd8bf      | configs/datasets/ceval/ceval_internal_ppl_1cd8bf.py              |
| ceval_ppl                      | configs/datasets/ceval/ceval_ppl.py                              |
| ceval_ppl_1cd8bf               | configs/datasets/ceval/ceval_ppl_1cd8bf.py                       |
| ceval_ppl_578f8d               | configs/datasets/ceval/ceval_ppl_578f8d.py                       |
| ceval_ppl_93e5ce               | configs/datasets/ceval/ceval_ppl_93e5ce.py                       |
| ceval_zero_shot_gen_bd40ef     | configs/datasets/ceval/ceval_zero_shot_gen_bd40ef.py             |
+--------------------------------+------------------------------------------------------------------+


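Launch the evaluation (10% A100, 8 GB resources)

After installing OpenCompass and preparing the datasets as described above, run the following command to evaluate the InternLM2-Chat-1.8B model on the C-Eval dataset. OpenCompass launches evaluation in parallel by default, so for the first run it is worth starting in --debug mode to check for problems. In --debug mode, tasks run sequentially and output is printed in real time.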

python run.py --datasets ceval_gen --hf-path /share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b --tokenizer-path /share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b --tokenizer-kwargs padding_side='left' truncation='left' trust_remote_code=True --model-kwargs trust_remote_code=True device_map='auto' --max-seq-len 1024 --max-out-len 16 --batch-size 2 --num-gpus 1 --debug

Command breakdown

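The options of the command above, one per line with annotations (for reading, not for pasting into a shell):

python run.py
--datasets ceval_gen  # dataset configuration to evaluate
--hf-path /share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b  # HuggingFace model path
--tokenizer-path /share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b  # HuggingFace tokenizer path (can be omitted if it is the same as the model path)
--tokenizer-kwargs padding_side='left' truncation='left' trust_remote_code=True  # arguments for building the tokenizer
--model-kwargs device_map='auto' trust_remote_code=True  # arguments for building the model
--max-seq-len 1024  # maximum sequence length the model can accept
--max-out-len 16  # maximum number of tokens to generate
--batch-size 2  # batch size
--num-gpus 1  # number of GPUs required to run the model
--debug  # run tasks sequentially and print output in real time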
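A comment cannot follow a line-continuation backslash in a shell, so one way to keep the options annotated and still runnable is to assemble the argument list in Python. This is only a sketch: it reuses the flags and paths from the command above, and the MODEL_PATH and args names are introduced here for illustration.

```python
import shlex

MODEL_PATH = "/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b"

args = [
    "python", "run.py",
    "--datasets", "ceval_gen",             # dataset configuration to evaluate
    "--hf-path", MODEL_PATH,               # HuggingFace model path
    "--tokenizer-path", MODEL_PATH,        # HuggingFace tokenizer path
    "--tokenizer-kwargs",                  # arguments for building the tokenizer
    "padding_side=left", "truncation=left", "trust_remote_code=True",
    "--model-kwargs",                      # arguments for building the model
    "device_map=auto", "trust_remote_code=True",
    "--max-seq-len", "1024",               # maximum sequence length
    "--max-out-len", "16",                 # maximum number of generated tokens
    "--batch-size", "2",                   # batch size
    "--num-gpus", "1",                     # GPUs required to run the model
    "--debug",                             # sequential tasks, real-time output
]

# Print the equivalent single-line shell command.
command = shlex.join(args)
print(command)
# To launch from the opencompass directory: subprocess.run(args, check=True)
```

The shell strips the single quotes in padding_side='left' before the program sees them, which is why the list holds padding_side=left without quotes.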

If the run fails at this point with an error about protobuf, the fix is:
pip install protobuf
Run the script again. If it fails with the error mkl-service + Intel® MKL MKL_THREADING_LAYER=INTEL is incompatible with libgomp.so.1 …, the fix is:

export MKL_SERVICE_FORCE_INTEL=1
# or
export MKL_THREADING_LAYER=GNU
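
If the evaluation is started from a Python launcher rather than an interactive shell, the same variable can be passed through the child process environment. A minimal sketch, assuming a hypothetical launcher script; only the variable name and value come from the fix above.

```python
import os

# Copy the current environment and add the threading-layer override,
# mirroring `export MKL_THREADING_LAYER=GNU`.
env = dict(os.environ)
env["MKL_THREADING_LAYER"] = "GNU"

print(env["MKL_THREADING_LAYER"])
# A launcher would then pass it to the evaluation command, for example:
#   subprocess.run(args, env=env, check=True)
# where `args` is the run.py argument list.
```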


Run it again. The evaluation output is as follows:


(opencompass) root@intern-studio-061925:~/opencompass# export MKL_THREADING_LAYER=GNU
(opencompass) root@intern-studio-061925:~/opencompass# python run.py --datasets ceval_gen --hf-path /share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b --tokenizer-path /share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b --tokenizer-kwargs padding_side='left' truncation='left' trust_remote_code=True --model-kwargs trust_remote_code=True device_map='auto' --max-seq-len 1024 --max-out-len 16 --batch-size 2 --num-gpus 1 --debug
04/22 15:26:54 - OpenCompass - INFO - Loading ceval_gen: configs/datasets/ceval/ceval_gen.py
04/22 15:26:54 - OpenCompass - INFO - Loading example: configs/summarizers/example.py
04/22 15:26:55 - OpenCompass - WARNING - SlurmRunner is not used, so the partition argument is ignored.
04/22 15:26:55 - OpenCompass - INFO - Partitioned into 1 tasks.
04/22 15:27:36 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-college_economics,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-accountant,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-tax_accountant,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-physician,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-civil_servant,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-urban_and_rural_planner,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-teacher_qualification,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-college_programming,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-electrical_engineer,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-business_administration,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-art_studies,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-fire_engineer,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-environmental_impact_assessment_engineer,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-education_science,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-professional_tour_guide,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-college_chemistry,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-metrology_engineer,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-mao_zedong_thought,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-law,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-veterinary_medicine,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-modern_chinese_history,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-chinese_language_and_literature,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-legal_professional,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-logic,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-middle_school_history,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-plant_protection,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-clinical_medicine,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-computer_architecture,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-middle_school_biology,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-middle_school_politics,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-middle_school_chemistry,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-high_school_history,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-computer_network,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-operating_system,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-college_physics,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-advanced_mathematics,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-high_school_physics,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-high_school_chemistry,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-high_school_biology,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-middle_school_mathematics,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-middle_school_physics,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-marxism,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-high_school_politics,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-high_school_geography,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-ideological_and_moral_cultivation,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-high_school_chinese,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-sports_science,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-basic_medicine,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-probability_and_statistics,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-high_school_mathematics,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-discrete_mathematics,opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-middle_school_geography]
Loading checkpoint shards: 100%|████████████████████████████████████████████████| 2/2 [00:50<00:00, 25.22s/it]
04/22 15:30:28 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-college_economics]
100%|████████████████████████████████████████████████████████████████████| 55/55 [00:00<00:00, 1176973.06it/s]
[2024-04-22 15:30:28,693] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 28/28 [01:02<00:00,  2.24s/it]
04/22 15:31:31 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-accountant]
100%|████████████████████████████████████████████████████████████████████| 49/49 [00:00<00:00, 1447330.25it/s]
[2024-04-22 15:31:31,622] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 25/25 [01:05<00:00,  2.63s/it]
04/22 15:32:37 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-tax_accountant]
100%|████████████████████████████████████████████████████████████████████| 49/49 [00:00<00:00, 1208946.45it/s]
[2024-04-22 15:32:37,873] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 25/25 [01:05<00:00,  2.62s/it]
04/22 15:33:43 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-physician]
100%|█████████████████████████████████████████████████████████████████████| 49/49 [00:00<00:00, 856337.07it/s]
[2024-04-22 15:33:43,519] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 25/25 [00:42<00:00,  1.71s/it]
04/22 15:34:26 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-civil_servant]
100%|████████████████████████████████████████████████████████████████████| 47/47 [00:00<00:00, 1359533.02it/s]
[2024-04-22 15:34:26,580] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 24/24 [01:09<00:00,  2.90s/it]
04/22 15:35:36 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-urban_and_rural_planner]
100%|████████████████████████████████████████████████████████████████████| 46/46 [00:00<00:00, 1330606.79it/s]
[2024-04-22 15:35:36,320] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 23/23 [00:42<00:00,  1.85s/it]
04/22 15:36:19 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-teacher_qualification]
100%|████████████████████████████████████████████████████████████████████| 44/44 [00:00<00:00, 1408773.86it/s]
[2024-04-22 15:36:19,129] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 22/22 [00:37<00:00,  1.69s/it]
04/22 15:36:56 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-college_programming]
100%|█████████████████████████████████████████████████████████████████████| 37/37 [00:00<00:00, 952081.28it/s]
[2024-04-22 15:36:56,550] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 19/19 [00:41<00:00,  2.17s/it]
04/22 15:37:37 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-electrical_engineer]
100%|████████████████████████████████████████████████████████████████████| 37/37 [00:00<00:00, 1149549.99it/s]
[2024-04-22 15:37:37,913] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 19/19 [00:38<00:00,  2.02s/it]
04/22 15:38:16 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-business_administration]
100%|████████████████████████████████████████████████████████████████████| 33/33 [00:00<00:00, 1032925.61it/s]
[2024-04-22 15:38:16,512] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 17/17 [00:32<00:00,  1.90s/it]
04/22 15:38:48 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-art_studies]
100%|████████████████████████████████████████████████████████████████████| 33/33 [00:00<00:00, 1125301.07it/s]
[2024-04-22 15:38:48,984] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 17/17 [00:25<00:00,  1.48s/it]
04/22 15:39:14 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-fire_engineer]
100%|█████████████████████████████████████████████████████████████████████| 31/31 [00:00<00:00, 977619.73it/s]
[2024-04-22 15:39:14,342] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 16/16 [00:34<00:00,  2.18s/it]
04/22 15:39:49 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-environmental_impact_assessment_engineer]
100%|████████████████████████████| 31/31 [00:00<00:00, 714414.42it/s]
[2024-04-22 15:39:49,343] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|████████████████████████████████| 16/16 [00:31<00:00,  1.99s/it]
04/22 15:40:21 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-education_science]
100%|████████████████████████████| 29/29 [00:00<00:00, 887845.37it/s]
[2024-04-22 15:40:21,321] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|████████████████████████████████| 15/15 [00:24<00:00,  1.65s/it]
04/22 15:40:46 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-professional_tour_guide]
100%|████████████████████████████| 29/29 [00:00<00:00, 800229.05it/s]
[2024-04-22 15:40:46,143] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|████████████████████████████████| 15/15 [00:30<00:00,  2.01s/it]
04/22 15:41:16 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-college_chemistry]
100%|████████████████████████████| 24/24 [00:00<00:00, 867787.03it/s]
[2024-04-22 15:41:16,371] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|████████████████████████████████| 12/12 [00:30<00:00,  2.56s/it]
04/22 15:41:47 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-metrology_engineer]
100%|████████████████████████████| 24/24 [00:00<00:00, 860370.05it/s]
[2024-04-22 15:41:47,324] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|████████████████████████████████| 12/12 [00:26<00:00,  2.21s/it]
04/22 15:42:14 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-mao_zedong_thought]
100%|████████████████████████████| 24/24 [00:00<00:00, 689474.63it/s]
[2024-04-22 15:42:14,099] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|████████████████████████████████| 12/12 [00:24<00:00,  2.01s/it]
04/22 15:42:38 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-law]
100%|█████████████████████████████████████████████████████████████████████| 24/24 [00:00<00:00, 906876.54it/s]
[2024-04-22 15:42:38,372] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 12/12 [00:29<00:00,  2.50s/it]
04/22 15:43:08 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-veterinary_medicine]
100%|█████████████████████████████████████████████████████████████████████| 23/23 [00:00<00:00, 725330.77it/s]
[2024-04-22 15:43:08,540] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 12/12 [00:20<00:00,  1.67s/it]
04/22 15:43:28 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-modern_chinese_history]
100%|█████████████████████████████████████████████████████████████████████| 23/23 [00:00<00:00, 853707.89it/s]
[2024-04-22 15:43:28,748] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 12/12 [00:24<00:00,  2.08s/it]
04/22 15:43:53 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-chinese_language_and_literature]
100%|█████████████████████████████████████████████████████████████████████| 23/23 [00:00<00:00, 665303.39it/s]
[2024-04-22 15:43:53,825] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 12/12 [00:19<00:00,  1.65s/it]
04/22 15:44:13 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-legal_professional]
100%|█████████████████████████████████████████████████████████████████████| 23/23 [00:00<00:00, 810663.80it/s]
[2024-04-22 15:44:13,833] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 12/12 [00:37<00:00,  3.15s/it]
04/22 15:44:51 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-logic]
100%|█████████████████████████████████████████████████████████████████████| 22/22 [00:00<00:00, 846556.77it/s]
[2024-04-22 15:44:51,898] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 11/11 [00:42<00:00,  3.84s/it]
04/22 15:45:34 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-middle_school_history]
100%|█████████████████████████████████████████████████████████████████████| 22/22 [00:00<00:00, 775417.55it/s]
[2024-04-22 15:45:34,567] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 11/11 [00:23<00:00,  2.18s/it]
04/22 15:45:58 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-plant_protection]
100%|█████████████████████████████████████████████████████████████████████| 22/22 [00:00<00:00, 595320.57it/s]
[2024-04-22 15:45:58,686] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 11/11 [00:16<00:00,  1.53s/it]
04/22 15:46:15 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-clinical_medicine]
100%|█████████████████████████████████████████████████████████████████████| 22/22 [00:00<00:00, 795471.45it/s]
[2024-04-22 15:46:15,740] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 11/11 [00:28<00:00,  2.59s/it]
04/22 15:46:44 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-computer_architecture]
100%|█████████████████████████████████████████████████████████████████████| 21/21 [00:00<00:00, 587202.56it/s]
[2024-04-22 15:46:44,319] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 11/11 [00:28<00:00,  2.60s/it]
04/22 15:47:12 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-middle_school_biology]
100%|█████████████████████████████████████████████████████████████████████| 21/21 [00:00<00:00, 752823.79it/s]
[2024-04-22 15:47:13,049] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 11/11 [00:20<00:00,  1.90s/it]
04/22 15:47:34 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-middle_school_politics]
100%|█████████████████████████████████████████████████████████████████████| 21/21 [00:00<00:00, 579476.21it/s]
[2024-04-22 15:47:34,348] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 11/11 [00:38<00:00,  3.49s/it]
04/22 15:48:12 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-middle_school_chemistry]
100%|█████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 612307.15it/s]
[2024-04-22 15:48:12,963] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 10/10 [00:31<00:00,  3.12s/it]
04/22 15:48:44 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-high_school_history]
100%|█████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 704925.04it/s]
[2024-04-22 15:48:44,261] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 10/10 [00:20<00:00,  2.01s/it]
04/22 15:49:04 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-computer_network]
100%|█████████████████████████████████████████████████████████████████████| 19/19 [00:00<00:00, 622592.00it/s]
[2024-04-22 15:49:04,556] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 10/10 [00:15<00:00,  1.58s/it]
04/22 15:49:20 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-operating_system]
100%|█████████████████████████████████████████████████████████████████████| 19/19 [00:00<00:00, 731117.21it/s]
[2024-04-22 15:49:20,460] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|████████████████████████████████| 10/10 [00:13<00:00,  1.36s/it]
04/22 15:49:34 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-college_physics]
100%|████████████████████████████| 19/19 [00:00<00:00, 692971.97it/s]
[2024-04-22 15:49:34,308] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|████████████████████████████████| 10/10 [00:26<00:00,  2.60s/it]
04/22 15:50:00 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-advanced_mathematics]
100%|████████████████████████████| 19/19 [00:00<00:00, 569226.97it/s]
[2024-04-22 15:50:00,577] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|████████████████████████████████| 10/10 [00:31<00:00,  3.19s/it]
04/22 15:50:32 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-high_school_physics]
100%|████████████████████████████| 19/19 [00:00<00:00, 485925.46it/s]
[2024-04-22 15:50:32,703] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|████████████████████████████████| 10/10 [00:17<00:00,  1.73s/it]
04/22 15:50:50 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-high_school_chemistry]
100%|████████████████████████████| 19/19 [00:00<00:00, 664098.13it/s]
[2024-04-22 15:50:50,151] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|████████████████████████████████| 10/10 [00:18<00:00,  1.89s/it]
04/22 15:51:09 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-high_school_biology]
100%|████████████████████████████| 19/19 [00:00<00:00, 498073.60it/s]
[2024-04-22 15:51:09,228] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|████████████████████████████████| 10/10 [00:19<00:00,  1.91s/it]
04/22 15:51:28 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-middle_school_mathematics]
100%|████████████████████████████| 19/19 [00:00<00:00, 608334.17it/s]
[2024-04-22 15:51:28,484] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|████████████████████████████████| 10/10 [00:25<00:00,  2.52s/it]
04/22 15:51:53 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-middle_school_physics]
100%|████████████████████████████| 19/19 [00:00<00:00, 699050.67it/s]
[2024-04-22 15:51:53,900] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|████████████████████████████████| 10/10 [00:18<00:00,  1.90s/it]
04/22 15:52:13 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-marxism]
100%|████████████████████████████| 19/19 [00:00<00:00, 504378.33it/s]
[2024-04-22 15:52:13,408] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|████████████████████████████████| 10/10 [00:15<00:00,  1.56s/it]
04/22 15:52:29 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-high_school_politics]
100%|█████████████████████████████████████████████████████████████████████| 19/19 [00:00<00:00, 664098.13it/s]
[2024-04-22 15:52:29,316] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 10/10 [00:38<00:00,  3.87s/it]
04/22 15:53:08 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-high_school_geography]
100%|█████████████████████████████████████████████████████████████████████| 19/19 [00:00<00:00, 744782.95it/s]
[2024-04-22 15:53:08,169] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 10/10 [00:16<00:00,  1.67s/it]
04/22 15:53:24 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-ideological_and_moral_cultivation]
100%|█████████████████████████████████████████████████████████████████████| 19/19 [00:00<00:00, 681126.29it/s]
[2024-04-22 15:53:25,034] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 10/10 [00:14<00:00,  1.44s/it]
04/22 15:53:39 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-high_school_chinese]
100%|█████████████████████████████████████████████████████████████████████| 19/19 [00:00<00:00, 705236.96it/s]
[2024-04-22 15:53:39,637] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 10/10 [00:36<00:00,  3.60s/it]
04/22 15:54:15 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-sports_science]
100%|█████████████████████████████████████████████████████████████████████| 19/19 [00:00<00:00, 692971.97it/s]
[2024-04-22 15:54:15,783] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 10/10 [00:15<00:00,  1.58s/it]
04/22 15:54:31 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-basic_medicine]
100%|█████████████████████████████████████████████████████████████████████| 19/19 [00:00<00:00, 681126.29it/s]
[2024-04-22 15:54:31,735] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|█████████████████████████████████████████████████████████████████████████| 10/10 [00:16<00:00,  1.66s/it]
04/22 15:54:48 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-probability_and_statistics]
100%|█████████████████████████████████████████████████████████████████████| 18/18 [00:00<00:00, 645277.54it/s]
[2024-04-22 15:54:48,620] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|███████████████████████████████████████████████████████████████████████████| 9/9 [00:39<00:00,  4.43s/it]
04/22 15:55:28 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-high_school_mathematics]
100%|█████████████████████████████████████████████████████████████████████| 18/18 [00:00<00:00, 656499.76it/s]
[2024-04-22 15:55:28,709] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|███████████████████████████████████████████████████████████████████████████| 9/9 [00:32<00:00,  3.60s/it]
04/22 15:56:01 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-discrete_mathematics]
100%|█████████████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 524288.00it/s]
[2024-04-22 15:56:01,183] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|███████████████████████████████████████████████████████████████████████████| 8/8 [00:14<00:00,  1.78s/it]
04/22 15:56:15 - OpenCompass - INFO - Start inferencing [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-middle_school_geography]
100%|█████████████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 479349.03it/s]
[2024-04-22 15:56:15,577] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|███████████████████████████████████████████████████████████████████████████| 6/6 [00:11<00:00,  1.95s/it]
04/22 15:56:27 - OpenCompass - INFO - time elapsed: 1730.61s
04/22 15:56:47 - OpenCompass - INFO - Partitioned into 52 tasks.
04/22 15:56:50 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-computer_network]: {'accuracy': 47.368421052631575}
04/22 15:56:52 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-operating_system]: {'accuracy': 47.368421052631575}
04/22 15:56:54 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-computer_architecture]: {'accuracy': 23.809523809523807}
04/22 15:56:56 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-college_programming]: {'accuracy': 13.513513513513514}
04/22 15:56:59 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-college_physics]: {'accuracy': 42.10526315789473}
04/22 15:57:01 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-college_chemistry]: {'accuracy': 33.33333333333333}
04/22 15:57:03 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-advanced_mathematics]: {'accuracy': 10.526315789473683}
04/22 15:57:05 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-probability_and_statistics]: {'accuracy': 38.88888888888889}
04/22 15:57:08 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-discrete_mathematics]: {'accuracy': 25.0}
04/22 15:57:10 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-electrical_engineer]: {'accuracy': 27.027027027027028}
04/22 15:57:12 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-metrology_engineer]: {'accuracy': 54.166666666666664}
04/22 15:57:14 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-high_school_mathematics]: {'accuracy': 16.666666666666664}
04/22 15:57:17 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-high_school_physics]: {'accuracy': 42.10526315789473}
04/22 15:57:19 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-high_school_chemistry]: {'accuracy': 47.368421052631575}
04/22 15:57:21 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-high_school_biology]: {'accuracy': 26.31578947368421}
04/22 15:57:23 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-middle_school_mathematics]: {'accuracy': 36.84210526315789}
04/22 15:57:25 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-middle_school_biology]: {'accuracy': 80.95238095238095}
04/22 15:57:28 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-middle_school_physics]: {'accuracy': 47.368421052631575}
04/22 15:57:30 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-middle_school_chemistry]: {'accuracy': 80.0}
04/22 15:57:33 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-veterinary_medicine]: {'accuracy': 43.47826086956522}
04/22 15:57:35 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-college_economics]: {'accuracy': 32.72727272727273}
04/22 15:57:38 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-business_administration]: {'accuracy': 36.36363636363637}
04/22 15:57:40 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-marxism]: {'accuracy': 68.42105263157895}
04/22 15:57:43 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-mao_zedong_thought]: {'accuracy': 70.83333333333334}
04/22 15:57:45 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-education_science]: {'accuracy': 55.172413793103445}
04/22 15:57:48 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-teacher_qualification]: {'accuracy': 59.09090909090909}
04/22 15:57:50 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-high_school_politics]: {'accuracy': 57.89473684210527}
04/22 15:57:53 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-high_school_geography]: {'accuracy': 47.368421052631575}
04/22 15:57:55 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-middle_school_politics]: {'accuracy': 71.42857142857143}
04/22 15:57:58 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-middle_school_geography]: {'accuracy': 75.0}
04/22 15:58:00 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-modern_chinese_history]: {'accuracy': 52.17391304347826}
04/22 15:58:02 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-ideological_and_moral_cultivation]: {'accuracy': 73.68421052631578}
04/22 15:58:05 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-logic]: {'accuracy': 27.27272727272727}
04/22 15:58:07 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-law]: {'accuracy': 29.166666666666668}
04/22 15:58:09 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-chinese_language_and_literature]: {'accuracy': 47.82608695652174}
04/22 15:58:11 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-art_studies]: {'accuracy': 42.42424242424242}
04/22 15:58:14 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-professional_tour_guide]: {'accuracy': 51.724137931034484}
04/22 15:58:16 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-legal_professional]: {'accuracy': 34.78260869565217}
04/22 15:58:19 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-high_school_chinese]: {'accuracy': 42.10526315789473}
04/22 15:58:21 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-high_school_history]: {'accuracy': 65.0}
04/22 15:58:23 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-middle_school_history]: {'accuracy': 86.36363636363636}
04/22 15:58:25 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-civil_servant]: {'accuracy': 42.5531914893617}
04/22 15:58:28 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-sports_science]: {'accuracy': 52.63157894736842}
04/22 15:58:30 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-plant_protection]: {'accuracy': 40.909090909090914}
04/22 15:58:33 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-basic_medicine]: {'accuracy': 68.42105263157895}
04/22 15:58:35 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-clinical_medicine]: {'accuracy': 31.818181818181817}
04/22 15:58:37 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-urban_and_rural_planner]: {'accuracy': 47.82608695652174}
04/22 15:58:40 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-accountant]: {'accuracy': 36.734693877551024}
04/22 15:58:42 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-fire_engineer]: {'accuracy': 38.70967741935484}
04/22 15:58:44 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-environmental_impact_assessment_engineer]: {'accuracy': 51.61290322580645}
04/22 15:58:47 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-tax_accountant]: {'accuracy': 36.734693877551024}
04/22 15:58:49 - OpenCompass - INFO - Task [opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b/ceval-physician]: {'accuracy': 42.857142857142854}
dataset                                         version    metric         mode      opencompass.models.huggingface.HuggingFace_Shanghai_AI_Laboratory_internlm2-chat-1_8b
----------------------------------------------  ---------  -------------  ------  ---------------------------------------------------------------------------------------
ceval-computer_network                          db9ce2     accuracy       gen                                                                                       47.37
ceval-operating_system                          1c2571     accuracy       gen                                                                                       47.37
ceval-computer_architecture                     a74dad     accuracy       gen                                                                                       23.81
ceval-college_programming                       4ca32a     accuracy       gen                                                                                       13.51
ceval-college_physics                           963fa8     accuracy       gen                                                                                       42.11
ceval-college_chemistry                         e78857     accuracy       gen                                                                                       33.33
ceval-advanced_mathematics                      ce03e2     accuracy       gen                                                                                       10.53
ceval-probability_and_statistics                65e812     accuracy       gen                                                                                       38.89
ceval-discrete_mathematics                      e894ae     accuracy       gen                                                                                       25
ceval-electrical_engineer                       ae42b9     accuracy       gen                                                                                       27.03
ceval-metrology_engineer                        ee34ea     accuracy       gen                                                                                       54.17
ceval-high_school_mathematics                   1dc5bf     accuracy       gen                                                                                       16.67
ceval-high_school_physics                       adf25f     accuracy       gen                                                                                       42.11
ceval-high_school_chemistry                     2ed27f     accuracy       gen                                                                                       47.37
ceval-high_school_biology                       8e2b9a     accuracy       gen                                                                                       26.32
ceval-middle_school_mathematics                 bee8d5     accuracy       gen                                                                                       36.84
ceval-middle_school_biology                     86817c     accuracy       gen                                                                                       80.95
ceval-middle_school_physics                     8accf6     accuracy       gen                                                                                       47.37
ceval-middle_school_chemistry                   167a15     accuracy       gen                                                                                       80
ceval-veterinary_medicine                       b4e08d     accuracy       gen                                                                                       43.48
ceval-college_economics                         f3f4e6     accuracy       gen                                                                                       32.73
ceval-business_administration                   c1614e     accuracy       gen                                                                                       36.36
ceval-marxism                                   cf874c     accuracy       gen                                                                                       68.42
ceval-mao_zedong_thought                        51c7a4     accuracy       gen                                                                                       70.83
ceval-education_science                         591fee     accuracy       gen                                                                                       55.17
ceval-teacher_qualification                     4e4ced     accuracy       gen                                                                                       59.09
ceval-high_school_politics                      5c0de2     accuracy       gen                                                                                       57.89
ceval-high_school_geography                     865461     accuracy       gen                                                                                       47.37
ceval-middle_school_politics                    5be3e7     accuracy       gen                                                                                       71.43
ceval-middle_school_geography                   8a63be     accuracy       gen                                                                                       75
ceval-modern_chinese_history                    fc01af     accuracy       gen                                                                                       52.17
ceval-ideological_and_moral_cultivation         a2aa4a     accuracy       gen                                                                                       73.68
ceval-logic                                     f5b022     accuracy       gen                                                                                       27.27
ceval-law                                       a110a1     accuracy       gen                                                                                       29.17
ceval-chinese_language_and_literature           0f8b68     accuracy       gen                                                                                       47.83
ceval-art_studies                               2a1300     accuracy       gen                                                                                       42.42
ceval-professional_tour_guide                   4e673e     accuracy       gen                                                                                       51.72
ceval-legal_professional                        ce8787     accuracy       gen                                                                                       34.78
ceval-high_school_chinese                       315705     accuracy       gen                                                                                       42.11
ceval-high_school_history                       7eb30a     accuracy       gen                                                                                       65
ceval-middle_school_history                     48ab4a     accuracy       gen                                                                                       86.36
ceval-civil_servant                             87d061     accuracy       gen                                                                                       42.55
ceval-sports_science                            70f27b     accuracy       gen                                                                                       52.63
ceval-plant_protection                          8941f9     accuracy       gen                                                                                       40.91
ceval-basic_medicine                            c409d6     accuracy       gen                                                                                       68.42
ceval-clinical_medicine                         49e82d     accuracy       gen                                                                                       31.82
ceval-urban_and_rural_planner                   95b885     accuracy       gen                                                                                       47.83
ceval-accountant                                002837     accuracy       gen                                                                                       36.73
ceval-fire_engineer                             bc23f5     accuracy       gen                                                                                       38.71
ceval-environmental_impact_assessment_engineer  c64e2d     accuracy       gen                                                                                       51.61
ceval-tax_accountant                            3a5e3c     accuracy       gen                                                                                       36.73
ceval-physician                                 6e277d     accuracy       gen                                                                                       42.86
ceval-stem                                      -          naive_average  gen                                                                                       39.21
ceval-social-science                            -          naive_average  gen                                                                                       57.43
ceval-humanities                                -          naive_average  gen                                                                                       50.23
ceval-other                                     -          naive_average  gen                                                                                       44.62
ceval-hard                                      -          naive_average  gen                                                                                       32
ceval                                           -          naive_average  gen                                                                                       46.19
04/22 15:58:49 - OpenCompass - INFO - write summary to /root/opencompass/outputs/default/20240422_152654/summary/summary_20240422_152654.txt
04/22 15:58:49 - OpenCompass - INFO - write csv to /root/opencompass/outputs/default/20240422_152654/summary/summary_20240422_152654.csv
(opencompass) root@intern-studio-061925:~/opencompass#


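Needle in a Haystack: Stars Hidden in the Deep, a Pearl Hard to Find in a Sea of Words

The needle-in-a-haystack test (inspired by NeedleInAHaystack) inserts a key piece of information at a random position in a long text to form the prompt for a large language model (LLM). Checking whether the model can extract that information from the long text measures its long-text information extraction ability, which reflects a basic capability for long-text understanding.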
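The construction described above can be sketched in a few lines of Python. This is a minimal illustration, not OpenCompass's implementation: the function names `insert_needle` and `build_prompt` are hypothetical, and the depth is measured in characters here, whereas a real test would normally count tokens.

```python
def insert_needle(haystack: str, needle: str, depth_percent: float) -> str:
    """Insert `needle` into `haystack` at roughly `depth_percent` of its length.

    Assumption: depth is measured in characters for simplicity.
    """
    if not 0 <= depth_percent <= 100:
        raise ValueError("depth_percent must be between 0 and 100")
    position = int(len(haystack) * depth_percent / 100)
    return haystack[:position] + needle + haystack[position:]


def build_prompt(haystack: str, needle: str, depth_percent: float, question: str) -> str:
    """Build the full prompt: long context with the needle, then the question."""
    context = insert_needle(haystack, needle, depth_percent)
    return f"{context}\n\n{question}"


# 100 filler characters stand in for a long document.
haystack = "0123456789" * 10
needle = "[NEEDLE]"
prompt = build_prompt(haystack, needle, 50, "Where is the needle?")
```

Sweeping `depth_percent` from 0 to 100 for several haystack lengths gives the grid of test cases that the heatmaps in such reports are drawn from.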

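GPT-4 Turbo (128K) shows poor accuracy when the corpus is longer than 72K and the sentence (the "needle") is hidden near the beginning of the text.

https://github.com/gkamradt/LLMTest_NeedleInAHaystack/tree/main

Claude 2.1 appears to lose accuracy once the corpus exceeds 20K, and accuracy is especially poor when the sentence (the "needle") is placed near the beginning of the corpus.

https://github.com/gkamradt/LLMTest_NeedleInAHaystack/blob/main/viz/CreateVisFromLangSmithTesting.ipynb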

Kimi Chat publishes its "needle in a haystack" long-text stress test results
https://mp.weixin.qq.com/s?__biz=Mzk0NDU1MDkyNg==&mid=2247483766&idx=1&sn=8754ec4138905dd12c44d321957f0956&chksm=c323a417f4542d01106c1821c7b7fac4c9b0a55e9f97f72a0f3d8dec4359bf94e4257660ff1a&mpshare=1&scene=23&srcid=0126SmQjMKBNs9wTvHOJDAxi&sharer_shareinfo=52aa3b48e79441dd8d77643f5c91fe7f&sharer_shareinfo_first=a2f6f776598a5f3f32ca224e4ef5f5e3#rd


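Dataset overview

The Skywork/ChineseDomainModelingEval dataset collects high-quality Chinese articles published between September and October 2023, covering multiple domains. The selection is intended to provide a fair and challenging benchmark. The dataset includes the following domain-specific files:

  • zh_finance.jsonl: finance
  • zh_game.jsonl: games
  • zh_government.jsonl: government affairs
  • zh_movie.jsonl: movies
  • zh_tech.jsonl: technology
  • zh_general.jsonl: general

These files are used to evaluate an LLM's understanding of different specific domains.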
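Since the files are in JSONL format (one JSON object per line), reading them takes only the standard library. The sketch below is a hedged illustration: the helper name `load_texts` is hypothetical, and the field name `text` is an assumption about the record layout that should be checked against the downloaded files.

```python
import json
import tempfile
from pathlib import Path


def load_texts(path, text_key="text"):
    """Read a JSONL file and return the value of `text_key` from each record.

    Assumption: each line is a JSON object with a `text` field.
    """
    texts = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:  # skip blank lines
                continue
            texts.append(json.loads(line)[text_key])
    return texts


# Demo on a throwaway file; a real run would point at e.g. zh_finance.jsonl.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.jsonl"
    sample.write_text(
        '{"text": "first article"}\n\n{"text": "second article"}\n',
        encoding="utf-8",
    )
    texts = load_texts(sample)

print(texts)
```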

Evaluation steps

  • Download the dataset from Skywork/ChineseDomainModelingEval.

  • Place the downloaded files under opencompass/data/CDME/.

[Figure: expected file structure of the CDME directory]

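Configuring the dataset

In the latest version, datasets are no longer generated manually by running a script. Instead, they are defined and loaded dynamically in the configuration file. Users specify the dataset parameters in the configuration file according to their needs, which offers more flexibility and customization.

Dataset configuration example

The following example shows how a dataset is defined in the configuration file configs/datasets/cdme/cdme8k.py. It configures a Chinese dataset with a context length of 8000 tokens.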

for original_context_length in context_lengths:
    for depth_percent in generate_depth_percents(
            document_depth_percent_intervals,
            document_depth_percent_interval_type):
        dataset_dict = {
            'abbr': f'CDME_Length{original_context_length}Depth{int(depth_percent)}',
            'type': CDMEDataset,
            'path': base_path,
            'length': original_context_length,
            'depth': int(depth_percent),
            'tokenizer_model': 'gpt-4',
            'file_list': file_list,
            'num_repeats_per_file': 10,
            'length_buffer': 200,
            'guide': True,
            'language': 'Chinese',
            'needle': '\n小明最喜欢的实习的地点就是上海人工智能实验室。\n',
            'retrieval_question': '小明最喜欢的实习地点是哪里?请按照“小明最喜欢的实习地点就是________。”的格式回答。',
            'reader_cfg': cdme_reader_cfg,
            'infer_cfg': cdme_infer_cfg,
            'eval_cfg': cdme_eval_cfg
        }
        cdme_datasets.append(dataset_dict)

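The main parameters in this configuration are:

abbr: short name of the dataset.

type: dataset type.

path: path to the dataset files.

length: context length, in tokens.

depth: depth percentage within the document.

tokenizer_model: the tokenizer model to use.

file_list: list of source data files.

num_repeats_per_file: number of repetitions per file.

length_buffer: length buffer.

guide: whether this is a guided dataset.

language: language of the dataset.

needle: the specific text (the "needle") to be found in the dataset.

retrieval_question: the question used to prompt the model to retrieve the needle.

reader_cfg, infer_cfg, eval_cfg: the reader, inference, and evaluation configurations, respectively.

By defining these parameters in the configuration file, you can flexibly create datasets that fit your needs. Configuration files provide a highly customizable and extensible way to manage how datasets are generated and used.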
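To make the loop in the configuration more concrete, here is a small sketch of how a set of depth percentages and the resulting abbr names could be produced. It assumes a linear spacing between 0 and 100; the function `generate_linear_depth_percents` and the values in `context_lengths` are hypothetical stand-ins and are not taken from OpenCompass's own `generate_depth_percents` helper.

```python
def generate_linear_depth_percents(num_intervals):
    """Return `num_intervals` evenly spaced depth percentages from 0 to 100.

    Assumption: linear spacing; the real helper may support other spacings.
    """
    if num_intervals < 2:
        raise ValueError("num_intervals must be at least 2")
    step = 100 / (num_intervals - 1)
    return [step * i for i in range(num_intervals)]


# Hypothetical context lengths, for illustration only.
context_lengths = [1000, 2000]

# One dataset name per (length, depth) pair, using the abbr pattern
# from the configuration example.
abbrs = [
    f"CDME_Length{length}Depth{int(depth)}"
    for length in context_lengths
    for depth in generate_linear_depth_percents(5)
]

print(abbrs)
```

With two lengths and five depths this yields ten dataset entries, which shows how quickly the number of evaluation tasks grows as the grid is refined.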

To evaluate with the internlm model, use the following commands:

python run.py configs/eval_needleinahaystack.py --slurm -p partition_name -q auto --max-num-workers 32 

python run.py configs/eval_needlebench.py --slurm -p partition_name -q auto --max-num-workers 32
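Once a run finishes, the per-dataset scores can be regrouped by context length and depth, for example to draw a heatmap. The sketch below assumes that result rows are keyed by the abbr values defined in the dataset configuration (CDME_Length...Depth...); the helper names and the scores are made up for illustration and do not come from an actual run.

```python
import re

# Matches the abbr pattern used in the dataset configuration example.
ABBR_PATTERN = re.compile(r"^CDME_Length(\d+)Depth(\d+)$")


def parse_abbr(abbr):
    """Split an abbr such as 'CDME_Length8000Depth50' into (length, depth)."""
    match = ABBR_PATTERN.match(abbr)
    if match is None:
        raise ValueError(f"unexpected abbr format: {abbr!r}")
    return int(match.group(1)), int(match.group(2))


def to_grid(scores):
    """Regroup {abbr: score} into {length: {depth: score}}."""
    grid = {}
    for abbr, score in scores.items():
        length, depth = parse_abbr(abbr)
        grid.setdefault(length, {})[depth] = score
    return grid


# Made-up scores, for illustration only.
scores = {
    "CDME_Length8000Depth0": 100.0,
    "CDME_Length8000Depth50": 80.0,
    "CDME_Length16000Depth50": 60.0,
}
grid = to_grid(scores)
print(grid)
```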

