1. Qwen
Tokenizer changes:
1.1. Change how the tokenizer class is loaded in build_tokenizer.
File: /mnt/nas/pretrain/code/Megatron-LM/megatron/tokenizer/__init__.py or tokenizer.py
In the build_tokenizer function, add a branch for QwenTokenizer:
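    elif args.tokenizer_type == "QwenTokenizer":
        assert args.tokenizer_name_or_path is not None
        # Imports QWenTokenizer from tokenization_qwen.py in this same package.
        from .tokenization_qwen import QWenTokenizer
        tokenizer = QWenTokenizer.from_pretrained(
            args.tokenizer_name_or_path,
            model_max_length=args.seq_length,
            padding_side='right',
            use_fast=False,
        )
        # Map Qwen's own pad / end-of-document ids onto the standard attribute names.
        tokenizer.pad_token_id = tokenizer.pad_id
        tokenizer.eos_token_id = tokenizer.eod_id
        # No rounding to a multiple here: the size is vocab_size + extra_vocab_size.
        args.padded_vocab_size = tokenizer.vocab_size + args.extra_vocab_size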
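The Qwen branch skips Megatron-LM's usual vocabulary padding, which rounds the tokenizer's vocabulary up to a multiple of `make_vocab_size_divisible_by * tensor_model_parallel_size`, and uses `vocab_size + extra_vocab_size` as-is. Because the embedding and output layers are split evenly across tensor-parallel ranks, the resulting size still has to be divisible by the tensor-parallel size. The sketch below compares the two rules; the helper names are hypothetical and the numbers are placeholders, not Qwen's real vocabulary sizes.

```python
def megatron_default_padding(orig_vocab_size: int,
                             make_vocab_size_divisible_by: int,
                             tensor_model_parallel_size: int) -> int:
    """Round up to a multiple of make_vocab_size_divisible_by * TP size,
    the way Megatron-LM's default tokenizer path pads the vocabulary."""
    multiple = make_vocab_size_divisible_by * tensor_model_parallel_size
    return ((orig_vocab_size + multiple - 1) // multiple) * multiple


def qwen_padded_vocab_size(vocab_size: int, extra_vocab_size: int,
                           tensor_model_parallel_size: int) -> int:
    """Hypothetical helper mirroring the QwenTokenizer branch (no rounding),
    plus a check that the result splits evenly across tensor-parallel ranks."""
    padded = vocab_size + extra_vocab_size
    if padded % tensor_model_parallel_size != 0:
        raise ValueError(
            f"padded_vocab_size={padded} is not divisible by "
            f"tensor_model_parallel_size={tensor_model_parallel_size}; "
            "adjust --extra-vocab-size")
    return padded


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    print(megatron_default_padding(1000, 128, 2))  # 1024
    print(qwen_padded_vocab_size(1000, 24, 2))     # 1024
```

Choosing `--extra-vocab-size` so that the sum divides evenly (ideally by the same multiple the default path would use) keeps the vocab-parallel embedding from failing its divisibility check when tensor parallelism is enabled.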
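1.2. For DLC jobs, create the main launch .sh script; for debugging, edit the argument names passed to the main function instead.
When debugging: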
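Whichever route is used, the branch in 1.1 reads `args.tokenizer_name_or_path` and `args.extra_vocab_size`, which are generally not stock Megatron-LM arguments, and `--tokenizer-type` only accepts a fixed list of `choices`. So the flags have to be registered, and `QwenTokenizer` allowed, before a .sh script or a debug configuration can pass them. A minimal standalone argparse sketch follows; the flag names are assumed from the attribute names (argparse turns `--extra-vocab-size` into `args.extra_vocab_size`), the function name is hypothetical, and the path and values are placeholders. In Megatron-LM itself `--tokenizer-type` already exists, so there only its `choices` would be extended.

```python
import argparse
import shlex


def add_qwen_tokenizer_args(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    """Hypothetical sketch: register the flags read by the QwenTokenizer branch."""
    group = parser.add_argument_group(title="tokenizer")
    # Megatron-LM already defines --tokenizer-type with a longer choices list;
    # this standalone version only shows that "QwenTokenizer" must be allowed.
    group.add_argument("--tokenizer-type", type=str, default=None,
                       choices=["GPT2BPETokenizer", "QwenTokenizer"])
    # Stored as args.tokenizer_name_or_path (argparse maps '-' to '_').
    group.add_argument("--tokenizer-name-or-path", type=str, default=None,
                       help="Directory with the Qwen tokenizer files.")
    # Stored as args.extra_vocab_size; added on top of tokenizer.vocab_size.
    group.add_argument("--extra-vocab-size", type=int, default=0)
    return parser


# Placeholder flags, as a .sh launch script would pass them.
LAUNCH_FLAGS = ("--tokenizer-type QwenTokenizer "
                "--tokenizer-name-or-path /path/to/qwen-tokenizer "
                "--extra-vocab-size 0")

if __name__ == "__main__":
    parser = add_qwen_tokenizer_args(argparse.ArgumentParser())
    # For debugging, parse the same flags from a list instead of sys.argv.
    args = parser.parse_args(shlex.split(LAUNCH_FLAGS))
    print(args.tokenizer_type, args.tokenizer_name_or_path, args.extra_vocab_size)
    # prints: QwenTokenizer /path/to/qwen-tokenizer 0
```

Keeping the flags in one string lets the DLC launch script and an IDE debug session share the same settings.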