1. Problem description
As the title says, the error `"topk_cpu" not implemented for 'Half'` occurred while loading a model locally with the transformers library. The full traceback:
File "/Users/guomiansheng/anaconda3/envs/ep1/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
File "/Users/guomiansheng/.cache/huggingface/modules/transformers_modules/chatglm2-6b/modeling_chatglm.py", line 1028, in chat
    outputs = self.generate(**inputs, **gen_kwargs)
File "/Users/guomiansheng/anaconda3/envs/ep1/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
File "/Users/guomiansheng/anaconda3/envs/ep1/lib/python3.8/site-packages/transformers/generation/utils.py", line 1485, in generate
    return self.sample(
File "/Users/guomiansheng/anaconda3/envs/ep1/lib/python3.8/site-packages/transformers/generation/utils.py", line 2538, in sample
    next_token_scores = logits_warper(input_ids, next_token_scores)
File "/Users/guomiansheng/anaconda3/envs/ep1/lib/python3.8/site-packages/transformers/generation/logits_process.py", line 92, in __call__
    scores = processor(input_ids, scores)
File "/Users/guomiansheng/anaconda3/envs/ep1/lib/python3.8/site-packages/transformers/generation/logits_process.py", line 302, in __call__
    indices_to_remove = scores < torch.topk(scores, top_k)[0][..., -1, None]
RuntimeError: "topk_cpu" not implemented for 'Half'
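The failure can be reproduced without loading any model: it only takes a half-precision tensor on the CPU. A minimal sketch (the tensor shape here is made up; on newer torch builds that added a Half kernel for `topk_cpu`, the call may succeed instead of raising):

```python
import torch

# Fake fp16 logits living on the CPU, standing in for the model's
# next_token_scores in the traceback above.
scores = torch.randn(1, 100, dtype=torch.float16)

try:
    # On affected torch versions this is the line that fails, because
    # topk has no CPU kernel for the Half dtype.
    torch.topk(scores, 5)
    print("topk on fp16 succeeded (this torch build has a Half kernel)")
except RuntimeError as e:
    print(e)  # "topk_cpu" not implemented for 'Half' (on affected versions)
```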
2. Solution
If the model weights are half precision (fp16), they save memory but not every CPU op supports them: the fp16 chatglm2-6b model needs about 13 GB of memory, so a 16 GB MacBook Pro can run short of memory and become very slow. The error above arises because torch.topk has no fp16 (Half) kernel on CPU. The fix is to convert to fp32 with float() first, and then optionally move the model to("mps"). Of course, unless there is no other option, running on CUDA is still the better choice.
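The fix above can be sketched at the tensor level; the shape and `top_k` value are illustrative, and the model-level lines in the comments are a hypothetical loading snippet, not taken from the original post:

```python
import torch

top_k = 5
# Stand-in for the fp16 logits produced by a half-precision model on CPU.
scores = torch.randn(1, 100, dtype=torch.float16)

# Fix: upcast to fp32 before topk; the float32 CPU kernel exists,
# so this call no longer raises.
values, indices = torch.topk(scores.float(), top_k)
print(values.shape)  # torch.Size([1, 5])

# At the model level, the equivalent choices are (hypothetical sketch):
#   model = AutoModel.from_pretrained("chatglm2-6b", trust_remote_code=True).float()
# to run everything in fp32 on CPU, or, to keep fp16 and use the
# Apple-silicon GPU instead of the missing CPU kernel:
#   model = AutoModel.from_pretrained("chatglm2-6b", trust_remote_code=True).half().to("mps")
```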