The following error came up while using spaCy for NLP:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-164-8ef00790b0bb> in <module>
      2 opt = nlp.begin_training()
      3 for i in range(n):
----> 4     loss = train(nlp, train_data, opt)
      5     acc = evaluate(nlp, valid_text, valid_label)
      6     print(f"Loss: {loss['textcat']:.3f} \t Accuracy: {accuracy:.3f}")

<ipython-input-155-47db869d5b7c> in train(model, train, optimizer, batch_size)
      8     for batch in batches:
      9         text, label = zip(*batch)
---> 10         model.update(text, label, sgd=optimizer, losses=loss)
     11     return loss

~\AppData\Roaming\Python\Python37\site-packages\spacy\language.py in update(self, docs, golds, drop, sgd, losses, component_cfg)
    508             sgd = self._optimizer
    509         # Allow dict of args to GoldParse, instead of GoldParse objects.
--> 510         docs, golds = self._format_docs_and_golds(docs, golds)
    511         grads = {}
    512

~\AppData\Roaming\Python\Python37\site-packages\spacy\language.py in _format_docs_and_golds(self, docs, golds)
    480                     err = Errors.E151.format(unexp=unexpected, exp=expected_keys)
    481                     raise ValueError(err)
--> 482             gold = GoldParse(doc, **gold)
    483             doc_objs.append(doc)
    484             gold_objs.append(gold)

gold.pyx in spacy.gold.GoldParse.__init__()

TypeError: object of type 'float' has no len()
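Cause:
The training data contains NaN values. NaN is a float, not a string, which is why the message complains that a 'float' has no len(). These values have to be handled before training.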
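The message itself can be reproduced without spaCy. The sketch below is only an illustration: the helper `describe` is a made-up name, and it assumes that the failing call inside GoldParse boils down to taking `len()` of a value that was expected to be text.

```python
import math

# A missing value as it typically appears in loaded data: NaN is a plain Python float.
missing = float("nan")


def describe(value):
    """Return len(value), or the TypeError message if the value has no length.

    Hypothetical helper, used only to show where the message comes from.
    """
    try:
        return len(value)
    except TypeError as exc:
        return str(exc)


text_result = describe("some review text")  # a normal string has a length
nan_result = describe(missing)              # NaN does not

print(type(missing).__name__, math.isnan(missing))
print(nan_result)
```

The second call returns the same wording as the last line of the traceback, which points to a non-string value among the training texts.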
Solution:
- Drop the rows that contain NaN:
train = train.dropna()
- Or replace NaN with a placeholder string (a single space here):
train = train.fillna(" ")
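Both options can be tried on a small frame first. This is a minimal sketch assuming `train` is a pandas DataFrame; the column names `text` and `label` and the sample rows are invented for illustration and are not taken from the original data.

```python
import pandas as pd

# Hypothetical training frame; column names and rows are assumptions.
train = pd.DataFrame({
    "text": ["good product", None, "terrible service"],
    "label": [1, 0, 0],
})

# Count missing values per column before deciding how to handle them.
missing_per_column = train.isna().sum()

# Option 1: drop the rows whose text is missing.
dropped = train.dropna(subset=["text"]).reset_index(drop=True)

# Option 2: keep every row and substitute a single space for missing text.
filled = train.fillna({"text": " "})

# After either option, every remaining text should be a str.
all_str = all(isinstance(t, str) for t in dropped["text"])

print(missing_per_column)
print(len(dropped), len(filled), all_str)
```

Dropping removes the label along with the empty text, while filling keeps a row whose text carries no information, so dropping is usually the safer choice for a text classifier unless the row count must be preserved.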