Rasa NLU in Practice

Contents

- 1. Directory structure
- 2. nlu.yml
- 3. config.yml
- 4. domain.yml
- 5. Hands-on

Learned from https://github.com/Chinese-NLP-book/rasa_chinese_book_code

1. Directory structure

[Image: project directory structure]

2. nlu.yml

version: "3.0"
nlu:
  - intent: greet
    examples: |
      - 你好
      - hello
      - hi
      - 喂
      - 在么
  - intent: goodbye
    examples: |
      - 拜拜
      - 再见
      - 拜
      - 退出
      - 结束
  - intent: medicine
    examples: |
      - [感冒](disease)了该吃什么药
      - 我[便秘](disease)了，该吃什么药
      - 我[胃痛](disease)，该吃什么药
      - 一直[打喷嚏](disease)吃什么药好
      - 父母都有[高血压](disease)，我应该推荐他们吃什么药好呢
      - 头上烫烫的，感觉[发烧](disease)了，该吃什么药好
      - [减肥](disease)有什么好的药品推荐吗？
  - intent: medical_department
    examples: |
      - [感冒](disease)了该吃去哪个科室看病
      - 我[便秘](disease)了，该去挂哪个科室的号
      - 我[胃痛](disease)，该去医院看哪个门诊啊
      - 一直[打喷嚏](disease)挂哪一个科室的号啊
      - [头疼](disease)该挂哪科
  - intent: medical_hospital
    examples: |
      - 我生病了，不知道去哪里看病
      - [减肥](disease)有什么好的医院或者健康中心推荐吗？
      - 想做个[体检](disease)，有哪家医院或者哪里的诊所或者健康中心比较实惠啊？
      - 父母都有[高血压](disease)，我应该推荐他们去哪家医院好呢

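This file defines the dialogue intents and, for each intent, example utterances a user might say. Entities are annotated inline in the form [text](entity_type), for example [感冒](disease).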
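To make the inline annotation format concrete, here is a minimal sketch that turns one annotated example into plain text plus entity spans with character offsets. It is not Rasa's own parser: the function name parse_example is made up for illustration, and the sketch handles only the simple [text](entity_type) form used in this file (Rasa also accepts a richer JSON-style annotation, which is ignored here).

```python
import re

# Matches the simple inline form [value](entity_type) used in nlu.yml above.
# Assumption: the richer [value]{"entity": ...} form is out of scope for this sketch.
ANNOTATION = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def parse_example(example):
    """Return (plain_text, entities) for one annotated training example."""
    plain_parts = []
    entities = []
    cursor = 0      # current position in the annotated string
    plain_len = 0   # length of the plain text built so far
    for match in ANNOTATION.finditer(example):
        # Copy the unannotated text that precedes this annotation.
        before = example[cursor:match.start()]
        plain_parts.append(before)
        plain_len += len(before)
        value, entity_type = match.group(1), match.group(2)
        # Offsets refer to the plain text, not to the annotated string.
        entities.append({
            "entity": entity_type,
            "value": value,
            "start": plain_len,
            "end": plain_len + len(value),
        })
        plain_parts.append(value)
        plain_len += len(value)
        cursor = match.end()
    plain_parts.append(example[cursor:])
    return "".join(plain_parts), entities


text, entities = parse_example("我[便秘](disease)了，该吃什么药")
print(text)
print(entities)
```

The start and end values follow the same convention as the offsets that appear in the parse results later in this post: start is inclusive, end is exclusive.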

3. config.yml

recipe: default.v1
language: zh
pipeline:
  - name: JiebaTokenizer
  - name: LanguageModelFeaturizer
    model_name: bert
    model_weights: bert-base-chinese
  - name: "DIETClassifier"
    epochs: 100
policies:
# # No configuration for policies was provided. The following default policies were used to train your model.
# # If you'd like to customize them, uncomment and adjust the policies.
# # See https://rasa.com/docs/rasa/policies for more information.
#   - name: MemoizationPolicy
#   - name: RulePolicy
#   - name: UnexpecTEDIntentPolicy
#     max_history: 5
#     epochs: 100
#   - name: TEDPolicy
#     max_history: 5
#     epochs: 100
#     constrain_similarities: true

This file configures the language, the tokenizer, the featurizer model, and training parameters such as the number of epochs.
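The order of the pipeline entries matters: the featurizer works on tokens, and the classifier works on features. The sketch below writes the pipeline out as plain Python data and checks that ordering. The ROLES and REQUIRES tables and the check_order function are assumptions made for this illustration only; Rasa performs its own validation from its component classes.

```python
# The pipeline from config.yml, written out as plain Python data.
pipeline = [
    {"name": "JiebaTokenizer"},
    {"name": "LanguageModelFeaturizer",
     "model_name": "bert",
     "model_weights": "bert-base-chinese"},
    {"name": "DIETClassifier", "epochs": 100},
]

# Hypothetical role tables for this sketch; not part of Rasa's API.
ROLES = {
    "JiebaTokenizer": "tokenizer",
    "LanguageModelFeaturizer": "featurizer",
    "DIETClassifier": "classifier",
}
REQUIRES = {
    "tokenizer": set(),
    "featurizer": {"tokenizer"},
    "classifier": {"featurizer"},
}


def check_order(components):
    """Return a list of messages for components placed before what they need."""
    seen = set()
    problems = []
    for component in components:
        role = ROLES[component["name"]]
        missing = REQUIRES[role] - seen
        if missing:
            problems.append(
                f"{component['name']} needs an earlier {', '.join(sorted(missing))}"
            )
        seen.add(role)
    return problems


print(check_order(pipeline))                  # order as in config.yml
print(check_order(list(reversed(pipeline))))  # deliberately wrong order
```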

4. domain.yml

version: "3.0"
intents:
  - greet
  - goodbye
  - medicine
  - medical_department
  - medical_hospital

This file lists all of the intent classes.
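Assuming you want the intent list in domain.yml to stay in sync with the intents used in nlu.yml, a small set comparison is enough to spot drift. The intent names below are copied by hand from the two files; the function name intent_mismatch is made up for this sketch.

```python
# Intent names copied from nlu.yml and domain.yml as shown in this post.
nlu_intents = {"greet", "goodbye", "medicine", "medical_department", "medical_hospital"}
domain_intents = {"greet", "goodbye", "medicine", "medical_department", "medical_hospital"}


def intent_mismatch(nlu, domain):
    """Report intents that appear in only one of the two files."""
    return {
        "missing_from_domain": sorted(nlu - domain),
        "unused_in_nlu": sorted(domain - nlu),
    }


# Both files agree, so both lists are empty.
print(intent_mismatch(nlu_intents, domain_intents))

# An intent added only to the NLU data shows up as missing from the domain.
print(intent_mismatch(nlu_intents | {"praise"}, domain_intents))
```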

5. Hands-on

pip install --no-deps -r full_requirements.txt
cd Chapter02/

Train the model with rasa train nlu:

rasa train nlu
┌────────────────────────────────────────────────────────────────────────────────┐
│ Rasa Open Source reports anonymous usage telemetry to help improve the product │
│ for all its users.                                                             │
│                                                                                │
│ If you'd like to opt-out, you can use `rasa telemetry disable`.                │
│ To learn more, check out https://rasa.com/docs/rasa/telemetry/telemetry.       │
└────────────────────────────────────────────────────────────────────────────────┘
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/tensorflow/python/autograph/impl/api.py:22: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/keras_preprocessing/image/utils.py:23: DeprecationWarning: NEAREST is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.NEAREST or Dither.NONE instead.
  'nearest': pil_image.NEAREST,
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/keras_preprocessing/image/utils.py:24: DeprecationWarning: BILINEAR is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BILINEAR instead.
  'bilinear': pil_image.BILINEAR,
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/keras_preprocessing/image/utils.py:25: DeprecationWarning: BICUBIC is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BICUBIC instead.
  'bicubic': pil_image.BICUBIC,
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/keras_preprocessing/image/utils.py:28: DeprecationWarning: HAMMING is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.HAMMING instead.
  if hasattr(pil_image, 'HAMMING'):
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/keras_preprocessing/image/utils.py:30: DeprecationWarning: BOX is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BOX instead.
  if hasattr(pil_image, 'BOX'):
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/keras_preprocessing/image/utils.py:33: DeprecationWarning: LANCZOS is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.LANCZOS instead.
  if hasattr(pil_image, 'LANCZOS'):
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/matplotlib/__init__.py:169: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  if LooseVersion(module.__version__) < minver:
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  other = LooseVersion(other)
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/tensorflow_addons/utils/ensure_tf_install.py:47: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  min_version = LooseVersion(INCLUSIVE_MIN_TF_VERSION)
2022-11-07 10:00:26 INFO     transformers.file_utils  - TensorFlow version 2.6.5 available.
2022-11-07 10:00:27 INFO     rasa.engine.training.hooks  - Starting to train component 'JiebaTokenizer'.
2022-11-07 10:00:27 INFO     rasa.engine.training.hooks  - Finished training component 'JiebaTokenizer'.
Building prefix dict from the default dictionary ...
Dumping model to file cache /tmp/jieba.cache
Loading model cost 0.675 seconds.
Prefix dict has been built successfully.
Downloading: 100%|████████████████████████████████████████████████████████████████████████| 110k/110k [00:00<00:00, 250kB/s]
2022-11-07 10:00:30 INFO     transformers.tokenization_utils  - loading file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-chinese-vocab.txt from cache at /home/web/.cache/torch/transformers/8a0c070123c1f794c42a29c6904beb7c1b8715741e235bee04aca2c7636fc83f.9b42061518a39ca00b8b52059fd2bede8daa613f8a8671500e518a8c29de8c00
Downloading: 100%|██████████████████████████████████████████████████████████████████████████| 624/624 [00:00<00:00, 613kB/s]
2022-11-07 10:00:32 INFO     transformers.configuration_utils  - loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-chinese-config.json from cache at /home/web/.cache/torch/transformers/8a3b1cfe5da58286e12a0f5d7d182b8d6eca88c08e26c332ee3817548cf7e60a.f12a4f986e43d8b328f5b067a641064d67b91597567a06c7b122d1ca7dfd9741
2022-11-07 10:00:32 INFO     transformers.configuration_utils  - Model config BertConfig {"architectures": ["BertForMaskedLM"],"attention_probs_dropout_prob": 0.1,"directionality": "bidi","hidden_act": "gelu","hidden_dropout_prob": 0.1,"hidden_size": 768,"initializer_range": 0.02,"intermediate_size": 3072,"layer_norm_eps": 1e-12,"max_position_embeddings": 512,"model_type": "bert","num_attention_heads": 12,"num_hidden_layers": 12,"pad_token_id": 0,"pooler_fc_size": 768,"pooler_num_attention_heads": 12,"pooler_num_fc_layers": 3,"pooler_size_per_head": 128,"pooler_type": "first_token_transform","type_vocab_size": 2,"vocab_size": 21128}
Downloading: 100%|████████████████████████████████████████████████████████████████████████| 478M/478M [17:05<00:00, 466kB/s]
2022-11-07 10:17:41 INFO     transformers.modeling_tf_utils  - loading weights file https://cdn.huggingface.co/bert-base-chinese-tf_model.h5 from cache at /home/web/.cache/torch/transformers/86a460b592673bcac3fe5d858ecf519e4890b4f6eddd1a46a077bd672dee6fe5.e6b974f59b54219496a89fd32be7afb020374df0976a796e5ccd3a1733d31537.h5
2022-11-07 10:17:43 INFO     transformers.modeling_tf_utils  - Layers from pretrained model not used in TFBertModel: ['nsp___cls', 'mlm___cls']
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/rasa/shared/nlu/training_data/features.py:152: DeprecationWarning: tostring() is deprecated. Use tobytes() instead.
  f_as_text = self.features.tostring()
2022-11-07 10:17:44 INFO     rasa.engine.training.hooks  - Starting to train component 'DIETClassifier'.
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/rasa/utils/train_utils.py:527: UserWarning: constrain_similarities is set to `False`. It is recommended to set it to `True` when using cross-entropy loss.
  rasa.shared.utils.io.raise_warning(
Epochs: 100%|██████████████████████████████████████████████| 100/100 [00:31<00:00,  3.15it/s, t_loss=0.458, i_acc=1, e_f1=1]
2022-11-07 10:18:16 INFO     rasa.engine.training.hooks  - Finished training component 'DIETClassifier'.
Your Rasa model is trained and saved at 'models/nlu-20221107-100026-rainy-gazebo.tar.gz'.

The trained model has been saved:

ll models/
total 39536
-rw-rw-r-- 1 web web 20238663 Nov  7 10:18 nlu-20221107-100026-rainy-gazebo.tar.gz
-rw-rw-r-- 1 web web 20238659 Nov 10 09:55 nlu-20221110-095458-green-trill.tar.gz

Test the model interactively with rasa shell nlu:

rasa shell nlu
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/future/standard_library/__init__.py:65: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/keras_preprocessing/image/utils.py:23: DeprecationWarning: NEAREST is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.NEAREST or Dither.NONE instead.
  'nearest': pil_image.NEAREST,
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/keras_preprocessing/image/utils.py:24: DeprecationWarning: BILINEAR is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BILINEAR instead.
  'bilinear': pil_image.BILINEAR,
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/keras_preprocessing/image/utils.py:25: DeprecationWarning: BICUBIC is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BICUBIC instead.
  'bicubic': pil_image.BICUBIC,
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/keras_preprocessing/image/utils.py:28: DeprecationWarning: HAMMING is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.HAMMING instead.
  if hasattr(pil_image, 'HAMMING'):
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/keras_preprocessing/image/utils.py:30: DeprecationWarning: BOX is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BOX instead.
  if hasattr(pil_image, 'BOX'):
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/keras_preprocessing/image/utils.py:33: DeprecationWarning: LANCZOS is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.LANCZOS instead.
  if hasattr(pil_image, 'LANCZOS'):
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/matplotlib/__init__.py:169: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  if LooseVersion(module.__version__) < minver:
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  other = LooseVersion(other)
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/tensorflow_addons/utils/ensure_tf_install.py:47: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  min_version = LooseVersion(INCLUSIVE_MIN_TF_VERSION)
2022-11-10 09:57:48 INFO     rasa.core.processor  - Loading model models/nlu-20221110-095458-green-trill.tar.gz...
2022-11-10 09:57:50 INFO     transformers.tokenization_utils  - loading file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-chinese-vocab.txt from cache at /home/web/.cache/torch/transformers/8a0c070123c1f794c42a29c6904beb7c1b8715741e235bee04aca2c7636fc83f.9b42061518a39ca00b8b52059fd2bede8daa613f8a8671500e518a8c29de8c00
2022-11-10 09:57:50 INFO     transformers.configuration_utils  - loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-chinese-config.json from cache at /home/web/.cache/torch/transformers/8a3b1cfe5da58286e12a0f5d7d182b8d6eca88c08e26c332ee3817548cf7e60a.f12a4f986e43d8b328f5b067a641064d67b91597567a06c7b122d1ca7dfd9741
2022-11-10 09:57:50 INFO     transformers.configuration_utils  - Model config BertConfig {"architectures": ["BertForMaskedLM"],"attention_probs_dropout_prob": 0.1,"directionality": "bidi","hidden_act": "gelu","hidden_dropout_prob": 0.1,"hidden_size": 768,"initializer_range": 0.02,"intermediate_size": 3072,"layer_norm_eps": 1e-12,"max_position_embeddings": 512,"model_type": "bert","num_attention_heads": 12,"num_hidden_layers": 12,"pad_token_id": 0,"pooler_fc_size": 768,"pooler_num_attention_heads": 12,"pooler_num_fc_layers": 3,"pooler_size_per_head": 128,"pooler_type": "first_token_transform","type_vocab_size": 2,"vocab_size": 21128}
2022-11-10 09:57:52 INFO     transformers.modeling_tf_utils  - loading weights file https://cdn.huggingface.co/bert-base-chinese-tf_model.h5 from cache at /home/web/.cache/torch/transformers/86a460b592673bcac3fe5d858ecf519e4890b4f6eddd1a46a077bd672dee6fe5.e6b974f59b54219496a89fd32be7afb020374df0976a796e5ccd3a1733d31537.h5
2022-11-10 09:57:57 INFO     transformers.modeling_tf_utils  - Layers from pretrained model not used in TFBertModel: ['nsp___cls', 'mlm___cls']
/opt/bdp/data01/anaconda3/envs/rasa/lib/python3.8/site-packages/rasa/utils/train_utils.py:527: UserWarning: constrain_similarities is set to `False`. It is recommended to set it to `True` when using cross-entropy loss.
  rasa.shared.utils.io.raise_warning(
NLU model loaded. Type a message and press enter to parse it.
Next message:

Test 1:

Next message:
我有点感冒，吃什么药好呢？
Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
Loading model cost 0.585 seconds.
Prefix dict has been built successfully.
{
  "text": "我有点感冒，吃什么药好呢？",
  "intent": {"name": "medicine", "confidence": 0.9998257756233215},
  "entities": [
    {"entity": "disease", "start": 3, "end": 5, "confidence_entity": 0.9954996705055237, "value": "感冒", "extractor": "DIETClassifier"}
  ],
  "text_tokens": [[0,1],[1,3],[3,5],[5,6],[6,7],[7,9],[9,11],[11,12],[12,13]],
  "intent_ranking": [
    {"name": "medicine", "confidence": 0.9998257756233215},
    {"name": "medical_department", "confidence": 0.0001144336347351782},
    {"name": "medical_hospital", "confidence": 2.84777233900968e-05},
    {"name": "goodbye", "confidence": 2.1356245269998908e-05},
    {"name": "greet", "confidence": 9.92826244328171e-06}
  ]
}
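The start/end pairs in the result above are character offsets into the input text, with the end exclusive. A minimal sketch that slices the text with them, using an abridged copy of the result (only the fields needed here are kept):

```python
import json

# Abridged copy of the parse result above; other fields are omitted.
result = json.loads("""
{
  "text": "我有点感冒，吃什么药好呢？",
  "entities": [{"entity": "disease", "start": 3, "end": 5, "value": "感冒"}],
  "text_tokens": [[0,1],[1,3],[3,5],[5,6],[6,7],[7,9],[9,11],[11,12],[12,13]]
}
""")

text = result["text"]

# Each text_tokens pair is a [start, end) slice of the original text.
tokens = [text[start:end] for start, end in result["text_tokens"]]
print(tokens)

# Entity offsets follow the same convention, so the slice equals "value".
for entity in result["entities"]:
    span = text[entity["start"]:entity["end"]]
    print(entity["entity"], span, span == entity["value"])
```

Here the entity span [3, 5) coincides with the third token, which is the segment the tokenizer produced for 感冒.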

Next message:
我有点晕，该看什么医生？
{
  "text": "我有点晕，该看什么医生？",
  "intent": {"name": "medicine", "confidence": 0.7516889572143555},
  "entities": [],
  "text_tokens": [[0,1],[1,3],[3,4],[4,5],[5,6],[6,7],[7,9],[9,11],[11,12]],
  "intent_ranking": [
    {"name": "medicine", "confidence": 0.7516889572143555},
    {"name": "medical_department", "confidence": 0.23077963292598724},
    {"name": "medical_hospital", "confidence": 0.014110525138676167},
    {"name": "goodbye", "confidence": 0.0021244173403829336},
    {"name": "greet", "confidence": 0.0012964475899934769}
  ]
}

Next message:
早上好
{
  "text": "早上好",
  "intent": {"name": "greet", "confidence": 0.9996402263641357},
  "entities": [],
  "text_tokens": [[0,3]],
  "intent_ranking": [
    {"name": "greet", "confidence": 0.9996402263641357},
    {"name": "medical_department", "confidence": 0.00014932868361938745},
    {"name": "goodbye", "confidence": 0.00014898570952937007},
    {"name": "medical_hospital", "confidence": 5.417354987002909e-05},
    {"name": "medicine", "confidence": 7.281645139300963e-06}
  ]
}

Next message:
人民医院在哪里
{
  "text": "人民医院在哪里",
  "intent": {"name": "medical_hospital", "confidence": 0.541263997554779},
  "entities": [],
  "text_tokens": [[0,2],[2,4],[4,5],[5,7]],
  "intent_ranking": [
    {"name": "medical_hospital", "confidence": 0.541263997554779},
    {"name": "medical_department", "confidence": 0.2764747440814972},
    {"name": "greet", "confidence": 0.16937503218650818},
    {"name": "goodbye", "confidence": 0.011964843608438969},
    {"name": "medicine", "confidence": 0.0009213921148329973}
  ]
}

Next, extend nlu.yml slightly by adding a few examples of praise:

version: "3.0"
nlu:
  - intent: praise
    examples: |
      - 你真有才华
      - 你真帅气
      - 你好棒啊

Retrain with rasa train nlu, then test again with rasa shell nlu:

Next message:
你很优雅的完成了任务
{
  "text": "你很优雅的完成了任务",
  "intent": {"name": "praise", "confidence": 0.3122938573360443},
  "entities": [],
  "text_tokens": [[0,1],[1,2],[2,4],[4,5],[5,7],[7,8],[8,10]],
  "intent_ranking": [
    {"name": "praise", "confidence": 0.3122938573360443},
    {"name": "medical_hospital", "confidence": 0.24623937904834747},
    {"name": "goodbye", "confidence": 0.20737841725349426},
    {"name": "medicine", "confidence": 0.19506700336933136},
    {"name": "medical_department", "confidence": 0.020120976492762566},
    {"name": "greet", "confidence": 0.018900321796536446}
  ]
}
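With only three praise examples, the top intent here wins with a confidence of about 0.31 and a small lead over the runner-up. One common way to handle such cases is to treat low-confidence or near-tie predictions as "not understood". The sketch below applies that idea to the intent_ranking lists printed above (abridged to the top two entries). The function name needs_fallback and both threshold values are assumptions chosen for illustration; Rasa's FallbackClassifier component applies a similar idea inside the pipeline.

```python
def needs_fallback(intent_ranking, threshold=0.7, ambiguity_threshold=0.1):
    """Return True if the top intent is too weak or too close to the runner-up.

    Both thresholds are illustrative values, not recommendations.
    """
    ranked = sorted(intent_ranking, key=lambda item: item["confidence"], reverse=True)
    top = ranked[0]
    if top["confidence"] < threshold:
        return True
    if len(ranked) > 1 and top["confidence"] - ranked[1]["confidence"] < ambiguity_threshold:
        return True
    return False


# Top two entries of the rankings shown in this post.
praise_ranking = [
    {"name": "praise", "confidence": 0.3122938573360443},
    {"name": "medical_hospital", "confidence": 0.24623937904834747},
]
greet_ranking = [
    {"name": "greet", "confidence": 0.9996402263641357},
    {"name": "medical_department", "confidence": 0.00014932868361938745},
]

print(needs_fallback(praise_ranking))  # weak top intent
print(needs_fallback(greet_ranking))   # confident top intent
```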