Simple Dynamic Sensitive-Word Validation with Special-Symbol Support
I previously did a quick survey of sensitive-word filtering; it can be implemented with jieba segmentation and its bundled dictionary — see my earlier post:
Sensitive-word validation
This article builds on that approach with a few optimizations. The steps are recorded below.
1. Requirements
Our company needed a sensitive-word feature. An earlier investigation showed it could be implemented with jieba segmentation plus a custom dictionary. Recently the word list has had to change frequently, so the requirement became a dynamic list that can be modified at any time and takes effect immediately, with support for common punctuation inside the words. I therefore made some changes on top of the original approach; the implementation follows.
2. Implementation
The main steps, simplified, are:
- Add the jieba-analysis POM dependency and prepare the initial dictionary file.
- Read the dictionary, initialize the data, and add punctuation support. The load order is: first read the sensitive words from /dict/custom.dict under the local resource directory; if the database has not been initialized yet, load them into it (this step is skipped once the data is present). Then generate a temporary file containing each sensitive word with its frequency, load that file into jieba's dictionary, and finally delete the file. The custom.dict file format is as follows:
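The original screenshot of the file is not reproduced here. As a hypothetical illustration consistent with the loader (which treats each line as one sensitive word; frequencies are added later when the temp file for jieba is written), custom.dict might look like:

```text
敏感词一
敏感词二
巫淫新骚伊宁市
```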
- CRUD for the dictionary table; whenever the dictionary changes, it must be reloaded.
- The system is a microservice; when one node is modified, the other nodes must be notified so that their dictionaries are updated in sync.
The overall project directory structure is shown below:
2.1 Add the POM dependency
We use a Spring Boot project; add the following dependency:

```xml
<!-- sensitive-word (jieba segmentation) dependency -->
<dependency>
    <groupId>com.huaban</groupId>
    <artifactId>jieba-analysis</artifactId>
    <version>1.0.2</version>
</dependency>
```
2.2 Load the dictionary at startup
- Create the dictionary table:

```sql
DROP TABLE MANAGE.TB_SYS_SENSITIVE_WORDS;

-- TbSysSensitiveWords
CREATE TABLE MANAGE.TB_SYS_SENSITIVE_WORDS (
    ID varchar(32),
    SENSITIVE_WORD varchar(100),
    SENSITIVE_EXCHANGE_WORD varchar(100),
    WORD_FREQUENCY number,
    WORD_DESC varchar(200),
    ctime date DEFAULT sysdate,
    mtime date DEFAULT sysdate,
    is_del varchar(1) DEFAULT 0,
    primary key (ID)
);

COMMENT ON COLUMN MANAGE.TB_SYS_SENSITIVE_WORDS.ID IS '主键';
COMMENT ON COLUMN MANAGE.TB_SYS_SENSITIVE_WORDS.SENSITIVE_WORD IS '敏感词';
COMMENT ON COLUMN MANAGE.TB_SYS_SENSITIVE_WORDS.SENSITIVE_EXCHANGE_WORD IS '特殊符号转义后的敏感词';
COMMENT ON COLUMN MANAGE.TB_SYS_SENSITIVE_WORDS.WORD_FREQUENCY IS '词频,注意如果敏感词不好用,可以加大词频,使其生效,默认50';
COMMENT ON COLUMN MANAGE.TB_SYS_SENSITIVE_WORDS.WORD_DESC IS '词语备注描述';
COMMENT ON TABLE MANAGE.TB_SYS_SENSITIVE_WORDS IS '敏感词词库表';
```
- Add the initialization method:

```java
package cn.git.manage.init;

import cn.git.manage.util.CommonAnalyzerUtil;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

import javax.annotation.PostConstruct;

/**
 * @description: dictionary initialization
 * @program: bank-credit-sy
 * @author: lixuchun
 * @create: 2024-12-03
 */
@Component
public class CommonAnalyzerInit {

    @Autowired
    private CommonAnalyzerUtil analyzerUtil;

    /**
     * Load the custom dictionary at startup.
     *
     * This works in the IDEA dev environment, but on Linux a custom dictionary
     * inside the Spring Boot fat jar cannot be read directly (no Path object can
     * be obtained for a resource inside the jar), so a temporary copy is written
     * to the local filesystem and loaded from there.
     */
    @PostConstruct
    public void init() {
        analyzerUtil.analyzerInit();
    }
}
```
- Add the CommonAnalyzerUtil utility class. Dictionary loading and the escaping of special punctuation both live in this class:

```java
package cn.git.manage.util;

import cn.git.common.exception.ServiceException;
import cn.git.common.util.LogUtil;
import cn.git.common.util.ServerIpUtil;
import cn.git.elk.util.NetUtil;
import cn.git.manage.entity.TbSysSensitiveWords;
import cn.git.manage.mapper.TbSysSensitiveWordsMapper;
import cn.hutool.core.util.IdUtil;
import cn.hutool.core.util.ObjectUtil;
import cn.hutool.core.util.StrUtil;
import cn.hutool.http.HttpUtil;
import com.huaban.analysis.jieba.WordDictionary;
import lombok.extern.slf4j.Slf4j;
import org.apache.ibatis.session.ExecutorType;
import org.apache.ibatis.session.SqlSession;
import org.apache.ibatis.session.SqlSessionFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.*;

/**
 * @description: common dictionary utility methods
 * @program: bank-credit-sy
 * @author: lixuchun
 * @create: 2024-12-03
 */
@Slf4j
@Component
public class CommonAnalyzerUtil {

    @Autowired
    private TbSysSensitiveWordsMapper sensitiveWordsMapper;

    /** In-memory set of sensitive words */
    public static Set<String> SENSITIVE_WORDS_SET = new HashSet<>();

    /** Custom dictionary resource path */
    private static final String DICT_PATH = "/dict/custom.dict";

    /** Temporary file name */
    private static final String TEMP_FILE_NAME = "custom_tmp.dict";

    /** Windows OS marker */
    private static final String WINDOWS_SYS = "windows";

    /** OS name system property */
    private static final String OS_FLAG = "os.name";

    /** Current project directory property */
    private static final String USER_DIR = "user.dir";

    @Autowired
    private ServerIpUtil serverIpUtil;

    @Autowired
    private SqlSessionFactory sqlSessionFactory;

    /** Special-symbol set */
    private static final Set<Character> SPECIAL_SYMBOLS = new HashSet<>();

    /** Replacement map for special symbols, and its reverse for restoring */
    private static final Map<Character, String> REPLACEMENTS = new HashMap<>();
    private static final Map<String, Character> REVERSE_REPLACEMENTS = new HashMap<>();

    static {
        // full- and half-width punctuation that may appear inside a sensitive word
        Collections.addAll(SPECIAL_SYMBOLS,
                '|', '?', '#', '?', '*', '$', '^', '&', '(', ')', '(', ')',
                '{', '}', '【', '】', '[', ']', '"', '\'', ';', ':', '!', '!',
                ',', ',', '.', '<', '>', '《', '》', '%', '@', '~', '=', '_',
                ' ', '\\', '+', '-', '/');
        // symbol -> readable token used when escaping a sensitive word
        REPLACEMENTS.put('|', "竖线");
        REPLACEMENTS.put('?', "问号");
        REPLACEMENTS.put('?', "中文问号");
        REPLACEMENTS.put('!', "中文感叹号");
        REPLACEMENTS.put('!', "感叹号");
        REPLACEMENTS.put('*', "星号");
        REPLACEMENTS.put('$', "美元");
        REPLACEMENTS.put('^', "尖号");
        REPLACEMENTS.put('\\', "反斜线");
        REPLACEMENTS.put('/', "斜线");
        REPLACEMENTS.put('&', "与");
        REPLACEMENTS.put('(', "中文左括号");
        REPLACEMENTS.put(')', "中文右括号");
        REPLACEMENTS.put('(', "左括号");
        REPLACEMENTS.put(')', "右括号");
        REPLACEMENTS.put('{', "左大括号");
        REPLACEMENTS.put('}', "右大括号");
        REPLACEMENTS.put('【', "中文左中括号");
        REPLACEMENTS.put('】', "中文右中括号");
        REPLACEMENTS.put('[', "左中括号");
        REPLACEMENTS.put(']', "右中括号");
        REPLACEMENTS.put('"', "双引号");
        REPLACEMENTS.put('\'', "单引号");
        REPLACEMENTS.put(';', "分号");
        REPLACEMENTS.put(':', "冒号");
        REPLACEMENTS.put(',', "逗号");
        REPLACEMENTS.put(',', "中文逗号");
        REPLACEMENTS.put('.', "点");
        REPLACEMENTS.put('<', "左尖括号");
        REPLACEMENTS.put('>', "右尖括号");
        REPLACEMENTS.put('《', "中文左尖括号");
        REPLACEMENTS.put('》', "中文右尖括号");
        REPLACEMENTS.put('%', "百分号");
        REPLACEMENTS.put('@', "AT");
        REPLACEMENTS.put('#', "井号");
        REPLACEMENTS.put('~', "波浪号");
        REPLACEMENTS.put('=', "等号");
        REPLACEMENTS.put(' ', "空格");
        REPLACEMENTS.put('+', "加号");
        REPLACEMENTS.put('-', "减号");
        REPLACEMENTS.put('_', "下划线");
        // build the reverse mapping
        for (Map.Entry<Character, String> entry : REPLACEMENTS.entrySet()) {
            REVERSE_REPLACEMENTS.put(entry.getValue(), entry.getKey());
        }
    }

    /**
     * Initialize the sensitive-word dictionary.
     */
    public void analyzerInit() {
        log.info("开始执行analyzerInit!");
        // fetch all sensitive words; if empty, seed the database from the dict file
        List<TbSysSensitiveWords> sensitiveWordsList = sensitiveWordsMapper.selectList(null);
        if (ObjectUtil.isEmpty(sensitiveWordsList)) {
            InputStream dictInputStream = this.getClass().getResourceAsStream(DICT_PATH);
            if (dictInputStream == null) {
                throw new ServiceException(StrUtil.format("获取文件[{}]失败,请确认文件存在!", DICT_PATH));
            }
            // open a SqlSession in batch mode
            SqlSession session = sqlSessionFactory.openSession(ExecutorType.BATCH);
            try (InputStream inputStream = dictInputStream;
                 BufferedReader bufferedReader = new BufferedReader(
                         new InputStreamReader(inputStream, StandardCharsets.UTF_8))) {
                TbSysSensitiveWordsMapper batchMapper = session.getMapper(TbSysSensitiveWordsMapper.class);
                String line;
                while ((line = bufferedReader.readLine()) != null) {
                    TbSysSensitiveWords tbSysSensitiveWords = new TbSysSensitiveWords();
                    tbSysSensitiveWords.setId(IdUtil.simpleUUID());
                    // lower-case and strip spaces (a sensitive word may contain spaces)
                    tbSysSensitiveWords.setSensitiveWord(line.toLowerCase().replaceAll(StrUtil.SPACE, StrUtil.EMPTY));
                    // default word frequency is 50
                    tbSysSensitiveWords.setWordFrequency(50);
                    if (checkSpecialSymbol(line)) {
                        tbSysSensitiveWords.setSensitiveExchangeWord(exchangeSensitiveWord(line));
                    }
                    tbSysSensitiveWords.setWordDesc(line);
                    batchMapper.insert(tbSysSensitiveWords);
                }
                session.commit();
            } catch (IOException e) {
                throw new RuntimeException(e);
            } finally {
                // make sure the SqlSession is closed
                session.close();
            }
            // re-query after the initial seed
            sensitiveWordsList = sensitiveWordsMapper.selectList(null);
        }
        // write a temp dictionary file in the working directory for jieba to load
        String userDir = System.getProperty(USER_DIR);
        String tempFilePath = userDir.concat(File.separator).concat(TEMP_FILE_NAME);
        try (BufferedWriter writer = new BufferedWriter(new FileWriter(tempFilePath))) {
            int size = sensitiveWordsList.size();
            for (int i = 0; i < size; i++) {
                TbSysSensitiveWords sysSensitiveWord = sensitiveWordsList.get(i);
                if (ObjectUtil.isNull(sysSensitiveWord.getSensitiveWord())
                        || ObjectUtil.isNull(sysSensitiveWord.getWordFrequency())) {
                    throw new ServiceException("初始化敏感词表部分信息为空,请确认信息是否完整!");
                }
                String sensitiveExchangeWord = sysSensitiveWord.getSensitiveExchangeWord();
                // if a word is not recognized, raising its frequency makes it take effect
                Integer wordFrequency = sysSensitiveWord.getWordFrequency();
                // add the word (or its escaped form, if it contains special symbols) to the set
                String line;
                if (StrUtil.isNotBlank(sensitiveExchangeWord)) {
                    SENSITIVE_WORDS_SET.add(sensitiveExchangeWord);
                    line = sensitiveExchangeWord.concat(StrUtil.SPACE).concat(wordFrequency.toString());
                } else {
                    SENSITIVE_WORDS_SET.add(sysSensitiveWord.getSensitiveWord());
                    line = sysSensitiveWord.getSensitiveWord().concat(StrUtil.SPACE).concat(wordFrequency.toString());
                }
                writer.write(line);
                // no trailing newline after the last entry
                if (i < size - 1) {
                    writer.newLine();
                }
            }
        } catch (IOException e) {
            String errorMessage = LogUtil.getStackTraceInfo(e);
            throw new ServiceException(StrUtil.format("自定义词典文件写入异常,异常信息为:{}", errorMessage));
        }
        // load the temp file into jieba's user dictionary, then delete it
        File dictTempFile = new File(tempFilePath);
        if (dictTempFile.exists()) {
            log.info("开始加载敏感词信息!");
            WordDictionary.getInstance().loadUserDict(dictTempFile.toPath());
            log.info("加载敏感词信息完毕!");
            boolean delete = dictTempFile.delete();
            if (delete) {
                log.info("删除临时文件成功!");
            } else {
                log.info("删除临时文件失败!");
            }
        } else {
            throw new ServiceException("自定义词典文件不存在,请检查确认!");
        }
    }

    /**
     * Tell the other nodes to reload their segmenters.
     */
    public void sendResetSignal() {
        // all service IPs, minus this machine's IP
        List<String> ipList = serverIpUtil.getServerIpListByName("management-server");
        String localIp = NetUtil.getLocalIp();
        if (StrUtil.isNotBlank(localIp)) {
            ipList.remove(localIp);
        }
        log.info("发送重置信号到服务ip:{},去除本机ip为[{}]", String.join(StrUtil.COMMA, ipList), localIp);
        // call the reset endpoint on every other management node
        ipList.forEach(ip -> new Thread(() -> {
            String uri = "http://".concat(ip).concat(":").concat("11102").concat("/manage/analyzer/reset");
            HttpUtil.get(uri, 30000);
        }).start());
    }

    /**
     * Check whether a string contains any special symbol.
     */
    public boolean checkSpecialSymbol(String content) {
        if (StrUtil.isBlank(content)) {
            throw new ServiceException("校验字符串是否包含特殊符号参数为空,请检查参数是否正确!");
        }
        for (char symbol : SPECIAL_SYMBOLS) {
            if (content.indexOf(symbol) != -1) {
                return true;
            }
        }
        return false;
    }

    /**
     * Escape the special symbols in a sensitive word.
     */
    public String exchangeSensitiveWord(String content) {
        StringBuilder result = new StringBuilder();
        for (char c : content.toCharArray()) {
            if (REPLACEMENTS.containsKey(c)) {
                result.append(REPLACEMENTS.get(c));
            } else {
                result.append(c);
            }
        }
        return result.toString();
    }

    /**
     * Restore an escaped sensitive word.
     *
     * @param content the escaped content
     * @return the restored content
     */
    public String restoreSensitiveWord(String content) {
        StringBuilder result = new StringBuilder();
        int i = 0;
        while (i < content.length()) {
            char currentChar = content.charAt(i);
            boolean foundReplacement = false;
            // try every replacement token at the current position
            for (Map.Entry<String, Character> entry : REVERSE_REPLACEMENTS.entrySet()) {
                String replacement = entry.getKey();
                if (content.startsWith(replacement, i)) {
                    result.append(entry.getValue());
                    i += replacement.length();
                    foundReplacement = true;
                    break;
                }
            }
            // no token matched: keep the character as-is
            if (!foundReplacement) {
                result.append(currentChar);
                i++;
            }
        }
        return result.toString();
    }
}
```
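The escape/restore pair is the core trick: every special symbol is swapped for a unique readable token before segmentation, and swapped back when the hit is reported. A minimal stand-alone sketch of that round trip (`SymbolCodec` is a hypothetical name, and its map carries only three of the roughly forty symbols the real class registers):

```java
import java.util.*;

public class SymbolCodec {

    // Hypothetical minimal mapping; the real CommonAnalyzerUtil registers ~40 symbols.
    private static final Map<Character, String> REPLACEMENTS = new LinkedHashMap<>();
    static {
        REPLACEMENTS.put('|', "竖线");
        REPLACEMENTS.put('%', "百分号");
        REPLACEMENTS.put('\\', "反斜线");
    }

    // replace every mapped symbol with its readable token
    static String exchange(String content) {
        StringBuilder sb = new StringBuilder();
        for (char c : content.toCharArray()) {
            sb.append(REPLACEMENTS.getOrDefault(c, String.valueOf(c)));
        }
        return sb.toString();
    }

    // scan the encoded string and turn tokens back into symbols
    static String restore(String content) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        outer:
        while (i < content.length()) {
            for (Map.Entry<Character, String> e : REPLACEMENTS.entrySet()) {
                if (content.startsWith(e.getValue(), i)) {
                    sb.append(e.getKey());
                    i += e.getValue().length();
                    continue outer;
                }
            }
            sb.append(content.charAt(i));
            i++;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String word = "你\\好|赵老四%";
        String encoded = exchange(word);
        System.out.println(encoded);                      // 你反斜线好竖线赵老四百分号
        System.out.println(restore(encoded).equals(word)); // true
    }
}
```

One caveat of this scheme: the replacement tokens must never collide with (or be prefixes of) each other, or the restore scan can become ambiguous.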
2.3 Dictionary CRUD
The CRUD part is standard table handling: a controller and a service, plus the analyzerCheck validation method. The implementation classes are as follows:
- AnalyzerController

```java
package cn.git.manage.controller;

import cn.git.common.result.Result;
import cn.git.manage.dto.CommonAnalyzerDTO;
import cn.git.manage.service.AnalyzerService;
import cn.git.manage.vo.analyzer.AnalyzerAddInVO;
import cn.git.manage.vo.analyzer.AnalyzerCheckInVO;
import cn.git.manage.vo.analyzer.AnalyzerPageInVO;
import cn.git.manage.vo.analyzer.AnalyzerPageOutVO;
import io.swagger.annotations.*;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.*;

import javax.validation.Valid;

/**
 * @description: sensitive-word controller
 * @program: bank-credit-sy
 * @author: lixuchun
 * @create: 2024-12-03
 */
@Slf4j
@Api(tags = "系统管理=>系统管理=>敏感词管理")
@RestController
@RequestMapping("/manage")
public class AnalyzerController {

    @Autowired
    private AnalyzerService analyzerService;

    /**
     * Add a sensitive word
     */
    @ApiOperation(value = "添加敏感词", notes = "添加敏感词")
    @ApiResponses({
            @ApiResponse(code = 1, message = "OK", response = Result.class),
            @ApiResponse(code = -1, message = "error", response = Result.class)
    })
    @PostMapping("/analyzer/add")
    public Result<String> addSensitiveWords(
            @ApiParam(name = "analyzerAddInVO", value = "敏感词分词器分词inVO", required = true)
            @RequestBody @Valid AnalyzerAddInVO analyzerAddInVO) {
        // convert the request parameters
        CommonAnalyzerDTO commonAnalyzerDTO = new CommonAnalyzerDTO();
        commonAnalyzerDTO.setSensitiveWord(analyzerAddInVO.getSensitiveWord());
        commonAnalyzerDTO.setWordFrequency(analyzerAddInVO.getWordFrequency());
        commonAnalyzerDTO.setWordDesc(analyzerAddInVO.getWordDesc());
        analyzerService.addSensitiveWords(commonAnalyzerDTO);
        return Result.ok("数据修改成功,数据字典加载需要半分钟,敏感词半分钟后生效!");
    }

    /**
     * Delete a sensitive word by id
     */
    @ApiOperation(value = "删除敏感词", notes = "删除敏感词")
    @ApiResponses({
            @ApiResponse(code = 1, message = "OK", response = Result.class),
            @ApiResponse(code = -1, message = "error", response = Result.class)
    })
    @GetMapping("/analyzer/delete/{id}")
    public Result<String> deleteSensitiveWords(@PathVariable("id") String id) {
        analyzerService.deleteSensitiveWordById(id);
        return Result.ok("删除成功!");
    }

    /**
     * Page query of sensitive words
     */
    @ApiOperation(value = "分页查询敏感词", notes = "分页查询敏感词")
    @ApiResponses({
            @ApiResponse(code = 1, message = "OK", response = Result.class),
            @ApiResponse(code = -1, message = "error", response = Result.class)
    })
    @PostMapping("/analyzer/page")
    public Result<AnalyzerPageOutVO> getAnalyzerPageBean(
            @ApiParam(name = "commonAnalyzerDTO", value = "分页查询敏感词inVO", required = true)
            @RequestBody AnalyzerPageInVO analyzerPageInVO) {
        // convert the request parameters
        CommonAnalyzerDTO commonAnalyzerDTO = new CommonAnalyzerDTO();
        commonAnalyzerDTO.setSensitiveWord(analyzerPageInVO.getSensitiveWord());
        commonAnalyzerDTO.setWordDesc(analyzerPageInVO.getWordDesc());
        // page query
        CommonAnalyzerDTO analyzerPageBean = analyzerService.getAnalyzerPageBean(commonAnalyzerDTO);
        AnalyzerPageOutVO outVO = new AnalyzerPageOutVO();
        outVO.setPageBean(analyzerPageBean.getPageBean());
        return Result.ok(outVO);
    }

    /**
     * Validate content against the sensitive words
     */
    @ApiOperation(value = "敏感词校验", notes = "敏感词校验")
    @ApiResponses({
            @ApiResponse(code = 1, message = "OK", response = Result.class),
            @ApiResponse(code = -1, message = "error", response = Result.class)
    })
    @PostMapping("/analyzer/check")
    public Result<String> analyzerCheck(@RequestBody AnalyzerCheckInVO analyzerCheckInVO) {
        analyzerService.checkAnalyzer(analyzerCheckInVO);
        return Result.ok("校验通过!");
    }

    /**
     * Reload the segmenter dictionary
     */
    @ApiOperation(value = "重置分词", notes = "重置分词")
    @GetMapping("/analyzer/reset")
    public Result<String> reset() {
        analyzerService.resetAnalyzer();
        return Result.ok("重置成功!");
    }
}
```
- AnalyzerService

```java
package cn.git.manage.service;

import cn.git.manage.dto.CommonAnalyzerDTO;
import cn.git.manage.vo.analyzer.AnalyzerCheckInVO;

/**
 * @description: sensitive-word service
 * @program: bank-credit-sy
 * @author: lixuchun
 * @create: 2024-12-03
 */
public interface AnalyzerService {

    /**
     * Add a sensitive word
     */
    void addSensitiveWords(CommonAnalyzerDTO commonAnalyzerDTO);

    /**
     * Delete a sensitive word by id
     */
    void deleteSensitiveWordById(String id);

    /**
     * Page query of sensitive words
     */
    CommonAnalyzerDTO getAnalyzerPageBean(CommonAnalyzerDTO commonAnalyzerDTO);

    /**
     * Validate content against the sensitive words
     */
    void checkAnalyzer(AnalyzerCheckInVO analyzerCheckInVO);

    /**
     * Reload the segmenter dictionary
     */
    void resetAnalyzer();
}
```
- AnalyzerServiceImpl

```java
package cn.git.manage.service.impl;

import cn.git.common.exception.ServiceException;
import cn.git.common.page.CustomPageUtil;
import cn.git.common.page.PageBean;
import cn.git.common.page.PaginationContext;
import cn.git.manage.dto.CommonAnalyzerDTO;
import cn.git.manage.entity.TbSysSensitiveWords;
import cn.git.manage.mapper.TbSysSensitiveWordsMapper;
import cn.git.manage.service.AnalyzerService;
import cn.git.manage.util.CommonAnalyzerUtil;
import cn.git.manage.vo.analyzer.AnalyzerCheckEntity;
import cn.git.manage.vo.analyzer.AnalyzerCheckInVO;
import cn.hutool.core.text.StrBuilder;
import cn.hutool.core.util.IdUtil;
import cn.hutool.core.util.StrUtil;
import com.alibaba.fastjson.JSONObject;
import com.baomidou.mybatisplus.core.conditions.query.QueryWrapper;
import com.huaban.analysis.jieba.JiebaSegmenter;
import com.huaban.analysis.jieba.SegToken;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import java.util.List;
import java.util.stream.Collectors;

/**
 * @description: sensitive-word service implementation
 * @program: bank-credit-sy
 * @author: lixuchun
 * @create: 2024-12-03
 */
@Slf4j
@Service
public class AnalyzerServiceImpl implements AnalyzerService {

    @Autowired
    private TbSysSensitiveWordsMapper sensitiveWordsMapper;

    @Autowired
    private CommonAnalyzerUtil commonAnalyzerUtil;

    @Autowired
    private CustomPageUtil customPageUtil;

    /**
     * Validate content against the sensitive words.
     */
    @Override
    public void checkAnalyzer(AnalyzerCheckInVO analyzerCheckInVO) {
        List<AnalyzerCheckEntity> checkEntityList = analyzerCheckInVO.getCheckEntityList();
        // segmenter instance
        JiebaSegmenter jiebaSegmenter = new JiebaSegmenter();
        StrBuilder errorMessageBuilder = new StrBuilder();
        for (AnalyzerCheckEntity entity : checkEntityList) {
            // lower-case and strip spaces
            String lowerCaseStr = entity.getCheckContent().replace(StrUtil.SPACE, StrUtil.EMPTY).toLowerCase();
            // escape the content if it contains special symbols
            boolean ifSpecialSymbol = commonAnalyzerUtil.checkSpecialSymbol(lowerCaseStr);
            if (ifSpecialSymbol) {
                lowerCaseStr = commonAnalyzerUtil.exchangeSensitiveWord(lowerCaseStr);
            }
            // segment the content
            List<SegToken> segTokenList = jiebaSegmenter.process(lowerCaseStr, JiebaSegmenter.SegMode.INDEX);
            log.info("[{}]分词结果为 : {}", entity.getCheckContentDesc(),
                    JSONObject.toJSONString(segTokenList.stream().map(word -> word.word).collect(Collectors.toList())));
            String uncheckWord = "";
            for (SegToken segToken : segTokenList) {
                if (CommonAnalyzerUtil.SENSITIVE_WORDS_SET.contains(segToken.word)) {
                    // restore the original form of an escaped word before reporting it
                    if (ifSpecialSymbol) {
                        uncheckWord = uncheckWord.concat(commonAnalyzerUtil.restoreSensitiveWord(segToken.word)).concat(StrUtil.COMMA);
                    } else {
                        uncheckWord = uncheckWord.concat(segToken.word).concat(StrUtil.COMMA);
                    }
                }
            }
            if (StrUtil.isNotBlank(uncheckWord)) {
                // drop the trailing comma
                uncheckWord = uncheckWord.substring(0, uncheckWord.length() - 1);
                String errorMessage = StrUtil.format("[{}]包含敏感词[{}]\n", entity.getCheckContentDesc(), uncheckWord);
                errorMessageBuilder.append(errorMessage);
            }
        }
        // report all hits at once
        if (StrUtil.isNotBlank(errorMessageBuilder.toString())) {
            throw new ServiceException(errorMessageBuilder.toString());
        }
    }

    /**
     * Delete a sensitive word by id.
     */
    @Override
    public void deleteSensitiveWordById(String id) {
        int delNum = sensitiveWordsMapper.deleteById(id);
        if (delNum > 0) {
            log.info("通过id[{}]删除敏感词成功,重新加载敏感词信息", id);
            // reload the dictionary and notify the other nodes asynchronously
            new Thread(() -> {
                commonAnalyzerUtil.analyzerInit();
                commonAnalyzerUtil.sendResetSignal();
            }).start();
        }
    }

    /**
     * Add a sensitive word.
     */
    @Override
    public void addSensitiveWords(CommonAnalyzerDTO commonAnalyzerDTO) {
        boolean ifSpecialSymbol = commonAnalyzerUtil.checkSpecialSymbol(commonAnalyzerDTO.getSensitiveWord());
        TbSysSensitiveWords tbSysSensitiveWords = new TbSysSensitiveWords();
        tbSysSensitiveWords.setSensitiveWord(commonAnalyzerDTO.getSensitiveWord());
        tbSysSensitiveWords.setWordFrequency(commonAnalyzerDTO.getWordFrequency());
        // escape the word if it contains special symbols
        if (ifSpecialSymbol) {
            tbSysSensitiveWords.setSensitiveExchangeWord(
                    commonAnalyzerUtil.exchangeSensitiveWord(commonAnalyzerDTO.getSensitiveWord()));
        }
        tbSysSensitiveWords.setWordDesc(commonAnalyzerDTO.getWordDesc());
        tbSysSensitiveWords.setId(IdUtil.simpleUUID());
        int insertNum = sensitiveWordsMapper.insert(tbSysSensitiveWords);
        if (insertNum > 0) {
            // reload the dictionary and notify the other nodes asynchronously
            new Thread(() -> {
                commonAnalyzerUtil.analyzerInit();
                commonAnalyzerUtil.sendResetSignal();
            }).start();
        }
    }

    /**
     * Page query of sensitive words.
     */
    @Override
    public CommonAnalyzerDTO getAnalyzerPageBean(CommonAnalyzerDTO commonAnalyzerDTO) {
        // conditional query
        QueryWrapper<TbSysSensitiveWords> wrapper = new QueryWrapper<>();
        wrapper.lambda()
                .like(StrUtil.isNotBlank(commonAnalyzerDTO.getSensitiveWord()),
                        TbSysSensitiveWords::getSensitiveWord, commonAnalyzerDTO.getSensitiveWord())
                .like(StrUtil.isNotBlank(commonAnalyzerDTO.getWordDesc()),
                        TbSysSensitiveWords::getWordDesc, commonAnalyzerDTO.getWordDesc());
        List<TbSysSensitiveWords> sensitiveWordsList = sensitiveWordsMapper.selectList(wrapper);
        // apply in-memory paging
        PageBean<TbSysSensitiveWords> pageBean = customPageUtil.setFlowListPage(sensitiveWordsList,
                PaginationContext.getPageNum(), PaginationContext.getPageSize());
        commonAnalyzerDTO.setPageBean(pageBean);
        return commonAnalyzerDTO;
    }

    /**
     * Reload the segmenter dictionary.
     */
    @Override
    public void resetAnalyzer() {
        commonAnalyzerUtil.analyzerInit();
    }
}
```
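The check flow boils down to: normalize (lower-case, strip spaces), escape special symbols, segment, then test each token against the in-memory set. A self-contained sketch of that pipeline with a naive substring scan standing in for the jieba segmenter (`NaiveChecker` and `findHits` are hypothetical names, not part of the project):

```java
import java.util.*;

public class NaiveChecker {

    // Simplified stand-in for checkAnalyzer: normalize the content, then
    // collect dictionary hits. The real service segments with jieba and
    // checks each token against SENSITIVE_WORDS_SET; here a plain substring
    // scan replaces the segmenter so the sketch runs without dependencies.
    static List<String> findHits(String content, Set<String> dict) {
        String normalized = content.replace(" ", "").toLowerCase();
        List<String> hits = new ArrayList<>();
        for (String word : dict) {
            if (normalized.contains(word)) {
                hits.add(word);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList("badword", "禁词"));
        System.out.println(findHits("This Bad Word and 禁词 appear", dict));
    }
}
```

Note the trade-off: a substring scan flags every embedded occurrence, while the segmenter-based approach only flags words that jieba actually produces as tokens, which is why the word frequency sometimes has to be raised for a word to be detected.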
- The VOs, DTOs, and database entity used for parameter passing

AnalyzerAddInVO

```java
package cn.git.manage.vo.analyzer;

import io.swagger.annotations.ApiModel;
import io.swagger.annotations.ApiModelProperty;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

import javax.validation.constraints.NotBlank;
import javax.validation.constraints.NotNull;

/**
 * @description: add-sensitive-word inVO
 * @program: bank-credit-sy
 * @author: lixuchun
 * @create: 2024-12-03
 */
@Data
@NoArgsConstructor
@AllArgsConstructor
@ApiModel(value = "AnalyzerAddInVO", description = "敏感词分词器分词inVO")
public class AnalyzerAddInVO {

    @NotBlank(message = "敏感词不能为空")
    @ApiModelProperty(value = "敏感词,必填")
    private String sensitiveWord;

    @NotNull(message = "词频不能为空")
    @ApiModelProperty(value = "词频,必填")
    private Integer wordFrequency;

    @ApiModelProperty(value = "备注词语描述,非必填")
    private String wordDesc;
}
```
AnalyzerCheckEntity

```java
package cn.git.manage.vo.analyzer;

import io.swagger.annotations.ApiModel;
import io.swagger.annotations.ApiModelProperty;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

import javax.validation.constraints.NotBlank;

/**
 * @description: check entity
 * @program: bank-credit-sy
 * @author: lixuchun
 * @create: 2024-12-03
 */
@Data
@NoArgsConstructor
@AllArgsConstructor
@ApiModel(value = "AnalyzerCheckEntity", description = "敏感词校验实体对象")
public class AnalyzerCheckEntity {

    @NotBlank(message = "校验字段描述信息不能为空!")
    @ApiModelProperty(value = "校验字段描述信息,eg: 贷款用途")
    private String checkContentDesc;

    @NotBlank(message = "校验字段详情不能为空!")
    @ApiModelProperty(value = "校验字段详情")
    private String checkContent;
}
```
AnalyzerCheckInVO

```java
package cn.git.manage.vo.analyzer;

import io.swagger.annotations.ApiModel;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

import javax.validation.constraints.NotNull;
import java.util.List;

/**
 * @description: sensitive-word check inVO
 * @program: bank-credit-sy
 * @author: lixuchun
 * @create: 2024-12-03
 */
@Data
@NoArgsConstructor
@AllArgsConstructor
@ApiModel(value = "AnalyzerCheckInVO", description = "敏感词分词器分词inVO")
public class AnalyzerCheckInVO {

    /** Entities to validate */
    @NotNull(message = "校验字段列表不能为空")
    private List<AnalyzerCheckEntity> checkEntityList;
}
```
AnalyzerPageInVO

```java
package cn.git.manage.vo.analyzer;

import io.swagger.annotations.ApiModel;
import io.swagger.annotations.ApiModelProperty;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

/**
 * @description: sensitive-word page-query inVO
 * @program: bank-credit-sy
 * @author: lixuchun
 * @create: 2024-12-03
 */
@Data
@NoArgsConstructor
@AllArgsConstructor
@ApiModel(value = "AnalyzerPageInVO", description = "敏感词page查询inVO")
public class AnalyzerPageInVO {

    @ApiModelProperty(value = "敏感词")
    private String sensitiveWord;

    @ApiModelProperty(value = "备注词语描述")
    private String wordDesc;
}
```
AnalyzerPageOutVO

```java
package cn.git.manage.vo.analyzer;

import cn.git.common.page.PageBean;
import cn.git.manage.entity.TbSysSensitiveWords;
import io.swagger.annotations.ApiModel;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

/**
 * @description: sensitive-word page-query outVO
 * @program: bank-credit-sy
 * @author: lixuchun
 * @create: 2024-12-03
 */
@Data
@NoArgsConstructor
@AllArgsConstructor
@ApiModel(value = "AnalyzerPageOutVO", description = "敏感词page查询outVO")
public class AnalyzerPageOutVO {

    /** Paged data */
    private PageBean<TbSysSensitiveWords> pageBean;
}
```
The database entity:

TbSysSensitiveWords

```java
package cn.git.manage.entity;

import com.baomidou.mybatisplus.annotation.*;
import lombok.Data;
import lombok.EqualsAndHashCode;
import lombok.experimental.Accessors;

import java.util.Date;

/**
 * @description: sensitive-word dictionary table entity
 * @program: bank-credit-sy
 * @author: lixuchun
 * @create: 2024-12-03
 */
@Data
@EqualsAndHashCode(callSuper = false)
@Accessors(chain = true)
@TableName("TB_SYS_SENSITIVE_WORDS")
public class TbSysSensitiveWords {

    /** Primary key */
    @TableId(value = "ID", type = IdType.ASSIGN_ID)
    private String id;

    /** Sensitive word */
    @TableField("SENSITIVE_WORD")
    private String sensitiveWord;

    /** Escaped form of the sensitive word */
    @TableField("SENSITIVE_EXCHANGE_WORD")
    private String sensitiveExchangeWord;

    /** Word frequency */
    @TableField("WORD_FREQUENCY")
    private Integer wordFrequency;

    /** Description */
    @TableField("WORD_DESC")
    private String wordDesc;

    /** Created date */
    @TableField(value = "CTIME", fill = FieldFill.INSERT)
    private Date ctime;

    /** Modified date */
    @TableField(value = "MTIME", fill = FieldFill.UPDATE)
    private Date mtime;

    /** Soft-delete flag */
    @TableField(value = "IS_DEL")
    private String isDel;
}
```
2.4 Notify the other services of changes
- ServerIpUtil

The project is a set of microservices and every module runs several nodes. After one node modifies the dictionary, the others must be notified to refresh theirs, so we need the IPs of the peer nodes from the registry. This utility class resolves those IPs; the add and delete methods then invoke the notification, and before the call the local IP is removed from the list so the node does not call itself.

```java
package cn.git.common.util;

import cn.git.common.exception.ServiceException;
import cn.hutool.core.util.ObjectUtil;
import cn.hutool.core.util.StrUtil;
import com.alibaba.cloud.nacos.NacosDiscoveryProperties;
import com.alibaba.cloud.nacos.NacosServiceManager;
import com.alibaba.nacos.api.exception.NacosException;
import com.alibaba.nacos.api.naming.NamingService;
import com.alibaba.nacos.api.naming.pojo.Instance;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.scheduling.annotation.EnableScheduling;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

/**
 * Resolve the currently available service IPs for a given server name.
 * @program: bank-credit-sy
 * @author: lixuchun
 * @create: 2022-05-18
 */
@Slf4j
@Component
@EnableScheduling
public class ServerIpUtil {

    /** Cached server list per service name */
    private static Map<String, List<String>> convertServerListMap = new HashMap<>();

    @Autowired
    private NacosDiscoveryProperties discoveryProperties;

    @Autowired
    private NacosServiceManager nacosServiceManager;

    /** Server names to pre-cache */
    private static final String[] SERVER_NAME_ARR = {"uaa-server", "converter-server"};

    /** Reset the request counter once it reaches this value */
    private static final Integer MAX_REQ_COUNT = 1000;

    /** Minimum number of servers */
    private static final Integer INIT_SERVER_SIZE = 1;

    /** Round-robin counter */
    private AtomicInteger serversCount = new AtomicInteger(0);

    /**
     * Pick one available IP for a service name (round-robin).
     */
    public String getIpFromServerName(String serverName) {
        String convertServerIp = null;
        try {
            List<Instance> convertServerList;
            if (ObjectUtil.isEmpty(convertServerListMap.get(serverName))) {
                log.info("获取全部nacos对应[{}]namingService!", serverName);
                NamingService configService = nacosServiceManager.getNamingService(discoveryProperties.getNacosProperties());
                // fetch the instance list from the registry
                convertServerList = configService.getAllInstances(serverName);
                log.info("获取服务nacos对应[{}]服务全部在线服务列表信息成功!", serverName);
                if (ObjectUtil.isNotEmpty(convertServerList)) {
                    List<String> serverIpList = convertServerList.stream()
                            .map(Instance::getIp)
                            .collect(Collectors.toList());
                    convertServerListMap.put(serverName, serverIpList);
                }
            }
            // round-robin over the cached IPs
            if (ObjectUtil.isNotEmpty(convertServerListMap.get(serverName))) {
                Integer selectServerIndex = serversCount.incrementAndGet() % convertServerListMap.get(serverName).size();
                convertServerIp = convertServerListMap.get(serverName).get(selectServerIndex);
            }
            // reset the counter every 1000 requests
            if (serversCount.get() == MAX_REQ_COUNT) {
                serversCount.set(0);
            }
        } catch (NacosException e) {
            log.error("通过服务名称[{}]获取服务ip失败!", serverName);
            e.printStackTrace();
        }
        if (StrUtil.isBlank(convertServerIp)) {
            throw new RuntimeException(StrUtil.format("通过服务名称[{}]获取服务ip失败!", serverName));
        }
        log.info("通过服务名称[{}]成功获取服务ip[{}]地址!", serverName, convertServerIp);
        return convertServerIp;
    }

    /**
     * Get all available IPs for a service name.
     */
    public List<String> getServerIpListByName(String serverName) {
        try {
            NamingService configService = nacosServiceManager.getNamingService(discoveryProperties.getNacosProperties());
            List<Instance> convertServerList = configService.getAllInstances(serverName);
            if (ObjectUtil.isNotEmpty(convertServerList)) {
                return convertServerList.stream().map(Instance::getIp).collect(Collectors.toList());
            } else {
                throw new ServiceException(StrUtil.format("通过服务名称[{}]获取服务ip为空!", serverName));
            }
        } catch (NacosException e) {
            throw new ServiceException(StrUtil.format("通过服务名称[{}]获取服务ip失败!", serverName));
        }
    }

    /**
     * Scheduled refresh of the cached server lists:
     * every 20 minutes between 07:00 and 23:00.
     */
    @Scheduled(cron = "0 0/20 7-23 * * ?")
    public void setServerInfoMap() {
        Arrays.stream(SERVER_NAME_ARR).forEach(serverName -> {
            if (ObjectUtil.isEmpty(convertServerListMap.get(serverName))) {
                try {
                    NamingService configService = nacosServiceManager.getNamingService(discoveryProperties.getNacosProperties());
                    List<Instance> convertServerList = configService.getAllInstances(serverName);
                    if (ObjectUtil.isNotEmpty(convertServerList)) {
                        List<String> serverIpList = convertServerList.stream()
                                .map(Instance::getIp)
                                .collect(Collectors.toList());
                        log.info("获取服务[{}]服务serverInfo信息成功,当前服务共[{}]个节点!", serverName, serverIpList.size());
                        convertServerListMap.put(serverName, serverIpList);
                    } else {
                        log.info("获取服务[{}]服务serverInfo信息失败!", serverName);
                    }
                } catch (NacosException e) {
                    e.printStackTrace();
                }
            }
        });
    }
}
```
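The notification itself is just a fire-and-forget GET to each peer's reset endpoint. A minimal sketch of that step (`ResetNotifier` is a hypothetical name; port 11102 and the /manage/analyzer/reset path come from the article, and the actual HTTP send is left to a pluggable Consumer so the sketch needs no HTTP client dependency):

```java
import java.util.*;
import java.util.function.Consumer;

public class ResetNotifier {

    // Build the reset URLs for the peer nodes, excluding the local IP
    // so the node does not call itself.
    static List<String> buildResetUrls(List<String> ips, String localIp) {
        List<String> urls = new ArrayList<>();
        for (String ip : ips) {
            if (ip.equals(localIp)) {
                continue;
            }
            urls.add("http://" + ip + ":11102/manage/analyzer/reset");
        }
        return urls;
    }

    // Fire each request on its own thread, mirroring the fire-and-forget
    // style of sendResetSignal.
    static void notifyPeers(List<String> urls, Consumer<String> sender) {
        for (String url : urls) {
            new Thread(() -> sender.accept(url)).start();
        }
    }

    public static void main(String[] args) {
        List<String> urls = buildResetUrls(
                Arrays.asList("10.0.0.1", "10.0.0.2", "10.0.0.3"), "10.0.0.2");
        // print instead of sending, to keep the sketch side-effect free
        notifyPeers(urls, System.out::println);
    }
}
```

In production, the Consumer would wrap the HTTP client of choice with a timeout, as the real code does with a 30-second limit.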
3. Testing
With the steps above in place, testing can begin: start the local service, set the token, and open the service's Swagger test page.
Pick a random word from the back-end sensitive-word table, 巫淫新骚伊宁市, and validate it.
The validation endpoint and parameters:
The validation result:
Next, use the add endpoint to create a custom sensitive word containing special symbols (你\好|赵老四%,gaga) and add it to the dictionary:
The add succeeds with the following message:
Run the validation again, this time with the newly added word 你\好|赵老四%,gaga.
As the result below shows, the new word is now detected:
This completes the sensitive-word optimization.