Contents
I. Requirements
II. ES Index Design
III. Page Search Conditions
IV. ES Paginated Search DSL
V. Other Notes
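I. Requirements
The announcement list must support the following searches:
1. Fuzzy search of the announcement title and body by free-text input.
2. Search by announcement type (single select).
3. Search by the province, city, or district of the announcement (multi-select). For example, "Zhejiang", "Guangzhou", and "Chaoyang District" can all be selected at once.
4. Search by the industries an announcement covers. Industries have a first level and a second level (multi-select).
5. Search by publication time, such as the last 7 days, the last month, or a chosen date range.
6. Search by the amount range mentioned in the announcement, such as 10,000-100,000, 100,000-2 million, or 2 million-5 million. Several ranges can be selected at once.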
II. ES Index Design
{
  "aliases": {
    "announcement_index_read": {},
    "announcement_index_write": {}
  },
  "settings": {
    "index": {
      // Sort by announcement publication time by default
      "sort.field": "createTime",
      "sort.order": "desc"
    },
    "analysis": {
      "analyzer": {
        "ik_max_word_ignore_html": {
          "type": "custom",
          "char_filter": ["html_strip"],
          "tokenizer": "ik_max_word"
        },
        "comma": {
          "type": "pattern",
          "pattern": ","
        }
      },
      "char_filter": {
        "html_char_filter": {
          "escaped_tags": [],
          "type": "html_strip"
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "id": {
          "type": "keyword"
        },
        // Province/city/district codes. For example, Zhejiang Province, Hangzhou City,
        // Yuhang District is stored as "100,101,111", so the announcement can be found
        // by searching on the province as well as on the district.
        "districtCodes": {
          "type": "text",
          "analyzer": "comma"
        },
        // Announcement types. One announcement may have several types,
        // stored as a comma-separated string.
        "annTypes": {
          "type": "text",
          "analyzer": "comma"
        },
        "announcementId": {
          "type": "keyword"
        },
        // Announcement title, supports fuzzy search
        "title": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_smart"
        },
        // Announcement publication time
        "createTime": {
          "type": "date",
          "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
        },
        // Amount mentioned in the announcement
        "amount": {
          "type": "double"
        },
        // Industry names: all first-level and second-level industries, comma-separated,
        // e.g. "安防,电子电器,数码产品". Supports fuzzy search by industry name and
        // exact search by several industries.
        "industryNames": {
          "type": "text",
          "fields": {
            "fuzzy": {
              "type": "text",
              "analyzer": "ik_max_word"
            },
            "keyword": {
              "type": "keyword"
            }
          },
          "analyzer": "comma",
          "search_analyzer": "ik_smart"
        },
        // Industry information for display only. The front end needs the nested form, so
        // the first-level/second-level pairs are stored as a JSON string, e.g.
        // "[{\"firstIndustryName\":\"安防\",\"secondIndustryName\":\"安防设备\"},{\"firstIndustryName\":\"行政办公\",\"secondIndustryName\":\"办公用品\"}]"
        "industryNameJson": {
          "type": "text"
        },
        // Announcement body. It contains HTML tags and must support fuzzy search
        // on the input text.
        "content": {
          "type": "text",
          "term_vector": "with_positions_offsets",
          "analyzer": "ik_max_word_ignore_html",
          "search_analyzer": "ik_smart"
        }
      }
    }
  }
}
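To make the comma-separated fields more concrete, here is a minimal Python sketch of how a document might be assembled before indexing, together with a rough approximation of what the `comma` pattern analyzer does to such a field. The helper names `build_doc` and `comma_tokens` and all sample values are hypothetical; the real tokenization happens inside Elasticsearch (the pattern analyzer splits on the configured pattern and lowercases tokens by default).

```python
import json


def comma_tokens(value):
    """Approximate the `comma` pattern analyzer: split on "," and lowercase."""
    return [tok.lower() for tok in value.split(",") if tok]


def build_doc(doc_id, ann_id, title, content, district_codes, ann_types,
              industries, amount, create_time):
    """Assemble one announcement document (hypothetical helper).

    `industries` is a list of (first_level_name, second_level_name) pairs.
    """
    # Flatten first- and second-level names, keeping order, dropping duplicates.
    names = list(dict.fromkeys(n for pair in industries for n in pair))
    nested = [{"firstIndustryName": first, "secondIndustryName": second}
              for first, second in industries]
    return {
        "id": doc_id,
        "announcementId": ann_id,
        "title": title,
        "content": content,
        "districtCodes": ",".join(district_codes),
        "annTypes": ",".join(ann_types),
        "industryNames": ",".join(names),
        # Display-only nested form, stored as a JSON string.
        "industryNameJson": json.dumps(nested, ensure_ascii=False),
        "amount": amount,
        "createTime": create_time,
    }


doc = build_doc(
    doc_id="1",
    ann_id="a-1",
    title="信息化建设项目招标公告",
    content="<p>信息化建设项目</p>",
    district_codes=["100", "101", "111"],  # province, city, district
    ann_types=["招标公告"],
    industries=[("安防", "安防设备"), ("行政办公", "办公用品")],
    amount=150000000,
    create_time="2023-01-05 10:00:00",
)
print(doc["districtCodes"])
print(comma_tokens(doc["districtCodes"]))
```

Because every code ends up as its own token, a `terms` query on any single code (province, city, or district) can match the document.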
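III. Page Search Conditions
Example search: announcement type = "招标公告" (tender announcement); industries = "安防" (security) and "办公" (office); districts 330000 and 650000; amount ranges 1 million to 2 million and 200,000 to 500,000 (amountRange is expressed in units of 10,000); publication date from 2021/03/03 to 2023/10/12; title and body matching "信息化" (informatization).
{
  "annType": "招标公告",
  "industry": "安防,办公",
  "amountRange": "100#200,20#50",
  "districtCodes": "330000,650000",
  "createTimeRange": "2021/03/03#2023/10/12",
  "searchValue": "信息化",
  "sort": "desc",
  "curPage": 1,
  "pageSize": 10
}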
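The `#`-delimited range strings have to be converted into numbers before they can go into a query. The following is a minimal Python sketch of that parsing step. It rests on two assumptions inferred from the values in the DSL query later in this document rather than stated by the author: dates are interpreted in UTC+8, and `amount` is stored in fen (1/100 yuan), so one unit of `amountRange` equals 10,000 yuan, i.e. 1,000,000 fen. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

CST = timezone(timedelta(hours=8))  # assumption: dates are in UTC+8
FEN_PER_WAN = 10_000 * 100          # assumption: amount is stored in fen


def parse_amount_ranges(raw):
    """'100#200,20#50' -> [(100000000, 200000000), (20000000, 50000000)]."""
    ranges = []
    for part in raw.split(","):
        low, high = part.split("#")
        ranges.append((int(low) * FEN_PER_WAN, int(high) * FEN_PER_WAN))
    return ranges


def parse_time_range(raw):
    """'2021/03/03#2023/10/12' -> (start_ms, end_ms).

    start_ms is midnight of the first day; end_ms is 23:59:59 of the last day.
    """
    start_s, end_s = raw.split("#")
    start = datetime.strptime(start_s, "%Y/%m/%d").replace(tzinfo=CST)
    end = datetime.strptime(end_s, "%Y/%m/%d").replace(tzinfo=CST)
    end += timedelta(days=1) - timedelta(seconds=1)
    return int(start.timestamp() * 1000), int(end.timestamp() * 1000)


def page_offset(cur_page, page_size):
    """Convert a 1-based page number into the `from` offset."""
    return (cur_page - 1) * page_size


print(parse_amount_ranges("100#200,20#50"))
print(parse_time_range("2021/03/03#2023/10/12"))
print(page_offset(1, 10))
```

Under these assumptions, the parsed values agree with the numbers used in the DSL query.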
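IV. ES Paginated Search DSL
Note: searching by title and body actually matches the title and content fields at the same time (the query below also matches industryNames.fuzzy). The DSL wraps the per-field match queries in a single should clause. Each match uses operator or by default, which accepts a document containing any one of the analyzed query terms. Changing operator to and requires every term to be present, which made the results noticeably more precise.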
{
  "size": 10,
  "query": {
    "bool": {
      "filter": [
        {"terms": {"annTypes": ["招标公告"], "boost": 1}},
        {"terms": {"districtCodes": ["330000", "650000"], "boost": 1}},
        {
          "range": {
            "createTime": {
              "include_lower": true,
              "include_upper": false,
              "from": 1614700800000,
              "boost": 1,
              "to": 1697126399000
            }
          }
        },
        {
          "bool": {
            "adjust_pure_negative": true,
            "should": [
              {"range": {"amount": {"include_lower": true, "include_upper": true, "from": 100000000, "boost": 1, "to": 200000000}}},
              {"range": {"amount": {"include_lower": true, "include_upper": true, "from": 20000000, "boost": 1, "to": 50000000}}}
            ],
            "minimum_should_match": "1",
            "boost": 1
          }
        },
        {
          "bool": {
            "adjust_pure_negative": true,
            "should": [
              {
                "match": {
                  "industryNames.fuzzy": {
                    "auto_generate_synonyms_phrase_query": true,
                    "query": "信息化",
                    "zero_terms_query": "NONE",
                    "fuzzy_transpositions": true,
                    "boost": 1,
                    "prefix_length": 0,
                    "operator": "AND",
                    "lenient": false,
                    "max_expansions": 50
                  }
                }
              },
              {
                "match": {
                  "content": {
                    "auto_generate_synonyms_phrase_query": true,
                    "query": "信息化",
                    "zero_terms_query": "NONE",
                    "fuzzy_transpositions": true,
                    "boost": 1,
                    "prefix_length": 0,
                    "operator": "AND",
                    "lenient": false,
                    "max_expansions": 50
                  }
                }
              },
              {
                "match": {
                  "title": {
                    "auto_generate_synonyms_phrase_query": true,
                    "query": "信息化",
                    "zero_terms_query": "NONE",
                    "fuzzy_transpositions": true,
                    "boost": 1,
                    "prefix_length": 0,
                    "operator": "AND",
                    "lenient": false,
                    "max_expansions": 50
                  }
                }
              }
            ],
            "boost": 1
          }
        }
      ],
      "adjust_pure_negative": true,
      "must": [
        {"terms": {"industryNames": ["安防", "办公"], "boost": 1}}
      ],
      "boost": 1
    }
  },
  "from": 0,
  "_source": {"excludes": ["content"]},
  "sort": [{"createTime": {"order": "desc"}}],
  "timeout": "20s"
}
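For readers who assemble the request body by hand, the query above can be sketched as a small Python function that returns a plain dict. This is a simplified illustration, not the author's implementation: `build_query` and `TEXT_FIELDS` are hypothetical names, default-valued parameters such as `boost` are omitted, the range clauses use the standard `gte`/`lt`/`lte` keys, and the sort is placed on `createTime`, the date field defined in the mapping.

```python
import json

# Fields searched by the free-text input (taken from the query above).
TEXT_FIELDS = ("industryNames.fuzzy", "content", "title")


def build_query(ann_type, district_codes, industries, amount_ranges,
                time_range_ms, search_value, cur_page=1, page_size=10):
    """Build a search body as a dict (hypothetical helper).

    amount_ranges: list of (low, high) pairs, both ends inclusive.
    time_range_ms: (start_ms, end_ms), start inclusive, end exclusive.
    """
    amount_should = [{"range": {"amount": {"gte": low, "lte": high}}}
                     for low, high in amount_ranges]
    # operator "and": every analyzed term of the input must occur in the field.
    text_should = [{"match": {field: {"query": search_value, "operator": "and"}}}
                   for field in TEXT_FIELDS]
    filters = [
        {"terms": {"annTypes": [ann_type]}},
        {"terms": {"districtCodes": district_codes}},
        {"range": {"createTime": {"gte": time_range_ms[0],
                                  "lt": time_range_ms[1]}}},
        # At least one amount range must match.
        {"bool": {"should": amount_should, "minimum_should_match": 1}},
        # At least one text field must match.
        {"bool": {"should": text_should, "minimum_should_match": 1}},
    ]
    return {
        "from": (cur_page - 1) * page_size,
        "size": page_size,
        "query": {"bool": {
            "filter": filters,
            "must": [{"terms": {"industryNames": industries}}],
        }},
        "_source": {"excludes": ["content"]},  # skip the large HTML body
        "sort": [{"createTime": {"order": "desc"}}],
        "timeout": "20s",
    }


body = build_query(
    ann_type="招标公告",
    district_codes=["330000", "650000"],
    industries=["安防", "办公"],
    amount_ranges=[(100000000, 200000000), (20000000, 50000000)],
    time_range_ms=(1614700800000, 1697126399000),
    search_value="信息化",
)
print(json.dumps(body, ensure_ascii=False, indent=2))
```

Keeping the type, district, time, amount, and text conditions in `filter` means they restrict the result set without contributing to the relevance score, which suits a list that is sorted by time anyway.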
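V. Other Notes
ES paginated search limits how deep a result set can be paged: from + size may not exceed 10,000 (the default value of index.max_result_window). The total hit count is still reported correctly, but no data comes back for records beyond the first 10,000, because by default Elasticsearch rejects such a request. The product design therefore needs to cap front-end pagination. For example, with 10 records per page, show at most 1,000 pages.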
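The pagination cap can be enforced on the server side as well. Below is a minimal Python sketch, assuming the default window of 10,000 records; `max_page` and `clamp_page` are hypothetical helper names.

```python
MAX_RESULT_WINDOW = 10_000  # Elasticsearch default for index.max_result_window


def max_page(page_size, window=MAX_RESULT_WINDOW):
    """Largest page number whose last record still fits inside the window."""
    if not 1 <= page_size <= window:
        raise ValueError("page_size must be between 1 and the result window")
    return window // page_size


def clamp_page(cur_page, page_size, window=MAX_RESULT_WINDOW):
    """Return (page, from_offset) with the page forced into the reachable range."""
    page = max(1, min(cur_page, max_page(page_size, window)))
    return page, (page - 1) * page_size


print(max_page(10))
print(clamp_page(1001, 10))
```

With 10 records per page, the last reachable page is 1,000, whose offset of 9,990 plus a size of 10 lands exactly on the window limit.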