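These are study notes I compiled for myself; if you spot any mistakes, please point them out~
[doccano] Text annotation tool: annotating your own business data for aspect-level sentiment analysis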
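- 1. Overview
- 2. Prerequisites
- 3. Creating a project in doccano
- 4. Adding a dataset
- 5. Adding labels
- 6. Annotating the data
- 7. Exporting the data and converting the format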
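1. Overview
2. Prerequisites
Make sure doccano is already installed.
For installation steps, see the article:
[doccano] Text annotation tool: installation and running tutorial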
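3. Creating a project in doccano
Choose Sequence Labeling as the project type.
To allow annotated spans to overlap in the text,
check "allow overlapping spans".
To annotate relations between entities in the text,
check "use relation labeling".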
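4. Adding a dataset
The dataset is a plain-text (.txt) file
with one review per line.
Select textline as the file format and import the file.
The import is then complete.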
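The one-review-per-line requirement above can be prepared with a small script. This is only a minimal sketch: the function name write_textline_file and the file name comments.txt are hypothetical, and it assumes the reviews are already available as a Python list of strings. Line breaks inside a review are collapsed to spaces so that each review stays on a single line.

```python
from pathlib import Path


def write_textline_file(comments, path):
    """Write one review per line and return the number of lines written.

    Whitespace runs (including embedded line breaks) are collapsed to a
    single space, and reviews that end up empty are dropped.
    """
    cleaned = [" ".join(c.split()) for c in comments]
    cleaned = [c for c in cleaned if c]
    Path(path).write_text("".join(c + "\n" for c in cleaned), encoding="utf-8")
    return len(cleaned)


if __name__ == "__main__":
    # Hypothetical sample reviews, not taken from the original dataset
    count = write_textline_file(["电池很耐用\n拍照一般", "屏幕不错"], "comments.txt")
    print(f"wrote {count} reviews")
```

Cleaning should happen before the import, because the annotated offsets refer to the text exactly as doccano stores it.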
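5. Adding labels
Labels can be added one at a time, or imported from a custom label file.
Each label text below has the form category:polarity, with 1 for positive and -1 for negative sentiment. The categories are 体验 (experience), 设计 (design), 电池 (battery), 性能 (performance), 摄像 (camera) and 通信 (communication).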
[
  {"text": "体验:1", "background_color": "#FF0000", "text_color": "#ffffff"},
  {"text": "体验:-1", "background_color": "#FF0000", "text_color": "#ffffff"},
  {"text": "设计:1", "background_color": "#00FF00", "text_color": "#000000"},
  {"text": "设计:-1", "background_color": "#00FF00", "text_color": "#000000"},
  {"text": "电池:1", "background_color": "#0000FF", "text_color": "#ffffff"},
  {"text": "电池:-1", "background_color": "#0000FF", "text_color": "#ffffff"},
  {"text": "性能:1", "background_color": "#FFFF00", "text_color": "#000000"},
  {"text": "性能:-1", "background_color": "#FFFF00", "text_color": "#000000"},
  {"text": "摄像:1", "background_color": "#FF00FF", "text_color": "#ffffff"},
  {"text": "摄像:-1", "background_color": "#FF00FF", "text_color": "#ffffff"},
  {"text": "通信:1", "background_color": "#00FFFF", "text_color": "#000000"},
  {"text": "通信:-1", "background_color": "#00FFFF", "text_color": "#000000"}
]
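Writing the label file by hand gets tedious as the number of categories grows. The sketch below generates the same structure from a mapping of category names to colors. The helper name build_labels is hypothetical; the keys text, background_color and text_color are the ones used in the label file above, and the rest is an assumption about how one might organize the input.

```python
import json


def build_labels(aspects):
    """Build a positive (1) and a negative (-1) label for every category.

    aspects maps a category name to (background_color, text_color).
    """
    labels = []
    for aspect, (background, foreground) in aspects.items():
        for polarity in ("1", "-1"):
            labels.append({
                "text": f"{aspect}:{polarity}",
                "background_color": background,
                "text_color": foreground,
            })
    return labels


# Two of the categories from the label file, as an illustration
aspects = {
    "体验": ("#FF0000", "#ffffff"),
    "设计": ("#00FF00", "#000000"),
}

# ensure_ascii=False keeps the Chinese category names readable in the file
print(json.dumps(build_labels(aspects), ensure_ascii=False, indent=2))
```

Using json.dumps also avoids hand-editing slips such as a trailing comma before the closing bracket, which a strict JSON parser rejects.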
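6. Annotating the data
7. Exporting the data and converting the format
Export the annotated data in JSONL format, then change the file extension to .json.
Convert it to txt format: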
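import json

# Read the exported file; each line holds one JSON record
with open('admin.json', 'r', encoding='utf-8') as file:
    lines = file.readlines()

# Process every record and append the results to output.txt.
# Mode 'a' appends, so delete output.txt before re-running to avoid duplicates.
with open('output.txt', 'a', encoding='utf-8') as output_file:
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        data = json.loads(line)
        text = data['text']
        label = data['label']
        for lbl in label:
            start = lbl[0]
            end = lbl[1]
            # A label looks like "category:tag"; split it into category name and tag
            category, tag = lbl[2].split(":", 1)
            output_file.write(f"{tag}\t{category}#{text[start:end]}\t{text}\n")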
Output format:
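To make the layout of each output line concrete, here is a minimal sketch that applies the same conversion to a single record in memory. The record below is hypothetical and not taken from the author's data; it assumes the exported shape the script reads, a label list of [start, end, "category:tag"] entries. The helper name record_to_lines is also hypothetical.

```python
def record_to_lines(record):
    """Turn one exported record into tab-separated lines:
    tag, category#annotated_span, full_text.
    """
    text = record["text"]
    lines = []
    for start, end, label in record["label"]:
        category, tag = label.split(":", 1)
        lines.append(f"{tag}\t{category}#{text[start:end]}\t{text}")
    return lines


# Hypothetical record with two annotated spans
sample = {
    "id": 1,
    "text": "电池很耐用,拍照一般",
    "label": [[0, 2, "电池:1"], [6, 8, "摄像:-1"]],
}

for line in record_to_lines(sample):
    print(line)
# 1	电池#电池	电池很耐用,拍照一般
# -1	摄像#拍照	电池很耐用,拍照一般
```

Each annotated span yields one line, so a review with several annotated aspects appears several times in output.txt, once per aspect.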