elasticsearch5.x:查询建议介绍、Suggester 介绍
参考:http://www.cnblogs.com/leeSmall/p/9206646.html
参考(重点):https://elasticsearch.cn/article/142
参考(官网):https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters-completion.html
一、查询建议介绍
1. 查询建议是什么?
查询建议,为用户提供良好的使用体验。主要包括: 拼写检查; 自动建议查询词(自动补全)
拼写检查如图:
自动建议查询词(自动补全):
2. ES中查询建议的API
查询建议也是使用_search端点地址。在DSL中suggest节点来定义需要的建议查询
示例1:定义单个建议查询词
POST twitter/_search
{"query" : {"match": {"message": "tring out Elasticsearch"}},"suggest" : { <!-- 定义建议查询 -->"my-suggestion" : { <!-- 一个建议查询名 -->"text" : "tring out Elasticsearch", <!-- 查询文本 -->"term" : { <!-- 使用词项建议器 -->"field" : "message" <!-- 指定在哪个字段上获取建议词 -->}}}
}PUT index
{"mappings":{"completion":{"properties":{"title": {"type": "text","analyzer": "ik_smart"},"title_suggest": {"type": "completion","analyzer": "ik_smart","search_analyzer": "ik_smart"}}}}}
示例2:定义多个建议查询词
POST _search {"suggest": {"my-suggest-1" : {"text" : "tring out Elasticsearch","term" : {"field" : "message"}},"my-suggest-2" : {"text" : "kmichy","term" : {"field" : "user"}}} }
示例3:多个建议查询可以使用全局的查询文本
POST _search {"suggest": {"text" : "tring out Elasticsearch","my-suggest-1" : {"term" : {"field" : "message"}},"my-suggest-2" : {"term" : {"field" : "user"}}} }
二、Suggester 介绍
1. Term suggester
term 词项建议器,对给入的文本进行分词,为每个词进行模糊查询提供词项建议。对于在索引中存在词默认不提供建议词,不存在的词则根据模糊查询结果进行排序后取一定数量的建议词。
常用的建议选项:
示例1:
POST twitter/_search
{"query" : {"match": {"message": "tring out Elasticsearch"}},"suggest" : { <!-- 定义建议查询 -->"my-suggestion" : { <!-- 一个建议查询名 -->"text" : "tring out Elasticsearch", <!-- 查询文本 -->"term" : { <!-- 使用词项建议器 -->"field" : "message" <!-- 指定在哪个字段上获取建议词 -->}}}
}
2. phrase suggester
phrase 短语建议,在term的基础上,会考量多个term之间的关系,比如是否同时出现在索引的原文里,相邻程度,以及词频等
示例
POST twitter/_search {"query" : {"match": {"message": "tring out Elasticsearch"}},"suggest" : {"my-suggestion" : {"text" : "tring out Elasticsearch","phrase" : {"field" : "message"}}} }
结果:
{"took": 30,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": 2,"max_score": 1.113083,"hits": [{"_index": "twitter","_type": "tweet","_id": "4","_score": 1.113083,"_source": {"user": "kimchy","postDate": "2018-07-23T07:29:57.653Z","message": "trying out Elasticsearch"}},{"_index": "twitter","_type": "tweet","_id": "7","_score": 0.98382175,"_source": {"user": "yuchen20","postDate": "2018-07-23T08:12:05.604Z","message": "trying out Elasticsearch"}}]},"suggest": { <!-- 建议-->"my-suggestion": [{"text": "tring out Elasticsearch","offset": 0,"length": 23,"options": [{{"text": "trying out elasticsearch","score": 0.5118434}]}]}
}
3. Completion suggester 自动补全
针对自动补全场景而设计的建议器。此场景下用户每输入一个字符的时候,就需要即时发送一次查询请求到后端查找匹配项,在用户输入速度较高的情况下对后端响应速度要求比较苛刻。因此实现上它和前面两个Suggester采用了不同的数据结构,索引并非通过倒排来完成,而是将analyze过的数据编码成FST和索引一起存放。对于一个open状态的索引,FST会被ES整个装载到内存里的,进行前缀查找速度极快。但是FST只能用于前缀查找,这也是Completion Suggester的局限所在。
官网链接:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters-completion.html
示例1:
为了使用自动补全,索引中用来提供补全建议的字段需特殊设计,字段类型为 completion。 先设置mapping:
PUT index/ {"mappings":{"completion":{"properties":{"title": {"type": "text","analyzer": "ik_smart"},"title_suggest": {"type": "completion","analyzer": "ik_smart","search_analyzer": "ik_smart"}}}}}
重点是title_suggest,这个字段就是之后我们搜索补全的字段,需要设置type为completion,analyzer按情况设置分析器
索引数据:
POST /index/completion/_bulk { "index" : { } } { "title": "背景天安门广场大学", "title_suggest": "背景天安门广场大学"} { "index" : { } } { "title": "北京天安门","title_suggest": "北京天安门"} { "index" : { } } { "title": "北京鸟巢","title_suggest": "北京鸟巢"} { "index" : { } } { "title": "奥林匹克公园","title_suggest": "奥林匹克公园"} { "index" : { } } { "title": "奥林匹克森林公园","title_suggest": "奥林匹克森林公园"} { "index" : { } } { "title": "北京奥林匹克公园","title_suggest": "北京奥林匹克公园"} { "index" : { } } { "title": "北京奥林匹克公园","title_suggest": {"input": "我爱中国","weight": 100}}
索引的时候可以对suggest字段,增加weight增加排序权重
搜索补全:
POST /index/completion/_search {"size": 0,"suggest":{"blog-suggest":{"prefix":"北京","completion":{"field":"title_suggest"}}} }
结果:
{"took": 3,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": 0,"max_score": 0,"hits": []},"suggest": {"blog-suggest": [{"text": "北京","offset": 0,"length": 2,"options": [{"text": "北京天安门","_index": "index","_type": "completion","_id": "AWSRo_hn9K_aupETR6FR","_score": 1,"_source": {"title": "北京天安门","title_suggest": "北京天安门"}},{"text": "北京奥林匹克公园","_index": "index","_type": "completion","_id": "AWSRo_hn9K_aupETR6FV","_score": 1,"_source": {"title": "北京奥林匹克公园","title_suggest": "北京奥林匹克公园"}},{"text": "北京鸟巢","_index": "index","_type": "completion","_id": "AWSRo_hn9K_aupETR6FS","_score": 1,"_source": {"title": "北京鸟巢","title_suggest": "北京鸟巢"}}]}]} }
示例2:
创建映射
PUT music
{"mappings": {"docc" : {"properties" : {"suggest" : {"type" : "completion"},"title" : {"type": "keyword"}}}}
}
Input 指定输入词 Weight 指定排序值(可选)
PUT music/docc/1?refresh
{"suggest" : {"input": [ "Nevermind", "Nirvana" ],"weight" : 34}
}
指定不同的排序值:
PUT music/_doc/1?refresh
{"suggest" : [{"input": "Nevermind","weight" : 10},{"input": "Nirvana","weight" : 3}]}
放入一条重复数据
PUT music/docc/2?refresh {"suggest" : {"input": [ "Nevermind", "Nirvana" ],"weight" : 20} }
查询建议根据前缀查询:
POST music/_search?pretty
{"suggest": {"song-suggest" : {"prefix" : "nir", "completion" : { "field" : "suggest" }}}
}
对建议查询结果去重: "skip_duplicates": true ,该特性在6.x支持,5.x不支持
POST music/_search?pretty {"suggest": {"song-suggest" : {"prefix" : "nir", "completion" : { "field" : "suggest","skip_duplicates": true }} }}
查询建议文档存储短语
PUT music/docc/3?refresh {"suggest" : {"input": [ "lucene solr", "lucene so cool","lucene elasticsearch" ],"weight" : 20} }PUT music/docc/4?refresh {"suggest" : {"input": ["lucene solr cool","lucene elasticsearch" ],"weight" : 10} }
查询
POST music/_search?pretty
{"suggest": {"song-suggest" : {"prefix" : "lucene s", "completion" : { "field" : "suggest" }}}}
三 、java -api
## elasticsearch5.x:查询建议java-api介绍、Suggester 介绍
参考:http://www.mamicode.com/info-detail-2347270.htmlpackage com.youlan.es.util;import java.util.concurrent.ExecutionException;import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.rest.RestStatus;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.suggest.*;
import org.elasticsearch.search.suggest.completion.CompletionSuggestion;
import org.elasticsearch.search.suggest.phrase.PhraseSuggestion;
import org.elasticsearch.search.suggest.term.TermSuggestion;public class SuggestDemo {private static Logger logger = LogManager.getRootLogger();//拼写检查(英文)public static void termSuggest(TransportClient client) {// 1、创建search请求//SearchRequest searchRequest = new SearchRequest();SearchRequest searchRequest = new SearchRequest("twitter");// 2、用SearchSourceBuilder来构造查询请求体 ,请仔细查看它的方法,构造各种查询的方法都在这。SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();sourceBuilder.size(0);//做查询建议//词项建议SuggestionBuilder termSuggestionBuilder =SuggestBuilders.termSuggestion("message").text("tring out Elticsearch");//搜索框输入内容:tring out ElticsearchSuggestBuilder suggestBuilder = new SuggestBuilder();suggestBuilder.addSuggestion("suggest_user", termSuggestionBuilder);sourceBuilder.suggest(suggestBuilder);searchRequest.source(sourceBuilder);try{//3、发送请求SearchResponse searchResponse = client.search(searchRequest).get();//4、处理响应//搜索结果状态信息if(RestStatus.OK.equals(searchResponse.status())) {// 获取建议结果Suggest suggest = searchResponse.getSuggest();TermSuggestion termSuggestion = suggest.getSuggestion("suggest_user");for (TermSuggestion.Entry entry : termSuggestion.getEntries()) {logger.info("text: " + entry.getText().string());for (TermSuggestion.Entry.Option option : entry) {String suggestText = option.getText().string();//建议内容logger.info(" suggest option : " + suggestText);}}}} catch (InterruptedException | ExecutionException e) {logger.error(e);}/*"suggest": {"my-suggestion": [{"text": "tring","offset": 0,"length": 5,"options": [{"text": "trying","score": 0.8,"freq": 2}]},{"text": "out","offset": 6,"length": 3,"options": []},{"text": "elasticsearch","offset": 10,"length": 13,"options": []}]}*/}public static void phraseSuggest(TransportClient client){//1、创建search请求SearchRequest searchRequest = new SearchRequest("twitter");//2、构造查询qing'qi请求体SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();sourceBuilder.size(0);SuggestionBuilder phraseSuggestBuilder = SuggestBuilders.phraseSuggestion( "message").text("tring out");SuggestBuilder suggestBuilder = new SuggestBuilder();suggestBuilder.addSuggestion("my-suggestion",phraseSuggestBuilder);sourceBuilder.suggest(suggestBuilder);searchRequest.source(sourceBuilder);try {//3、发送请求SearchResponse searchResponse = client.search(searchRequest).get();//4、处理响应//搜索状态信息if (RestStatus.OK.equals(searchResponse.status())){//获得建议Suggest suggest = searchResponse.getSuggest();PhraseSuggestion phraseSuggestion =suggest.getSuggestion("my-suggestion");for (PhraseSuggestion.Entry entry:phraseSuggestion){logger.info("text:"+entry.getText().string());for (PhraseSuggestion.Entry.Option option:entry){String suggestText = option.getText().string();logger.info(" suggest option :"+suggestText);}}}} catch (InterruptedException e) {logger.error("请求出错:"+e);} catch (ExecutionException e) {logger.error(e);}}//自动补全public static void completionSuggester(TransportClient client) {// 1、创建search请求//SearchRequest searchRequest = new SearchRequest();SearchRequest searchRequest = new SearchRequest("music");// 2、用SearchSourceBuilder来构造查询请求体 ,请仔细查看它的方法,构造各种查询的方法都在这。SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();sourceBuilder.size(0);//做查询建议//自动补全/*POST music/_search?pretty{"suggest": {"song-suggest" : {"prefix" : "lucene s","completion" : {"field" : "suggest" ,"skip_duplicates": true}}}}*/SuggestionBuilder termSuggestionBuilder =SuggestBuilders.completionSuggestion("suggest").prefix("lucene s");// .skipDuplicates(true) 6.x去重;SuggestBuilder suggestBuilder = new SuggestBuilder();suggestBuilder.addSuggestion("song-suggest", termSuggestionBuilder);sourceBuilder.suggest(suggestBuilder);searchRequest.source(sourceBuilder);try {//3、发送请求SearchResponse searchResponse = client.search(searchRequest).get();//4、处理响应//搜索结果状态信息if(RestStatus.OK.equals(searchResponse.status())) {// 获取建议结果Suggest suggest = searchResponse.getSuggest();CompletionSuggestion termSuggestion = suggest.getSuggestion("song-suggest");for (CompletionSuggestion.Entry entry : termSuggestion.getEntries()) {logger.info("text: " + entry.getText().string());for (CompletionSuggestion.Entry.Option option : entry) {String suggestText = option.getText().string();logger.info(" suggest option : " + suggestText);}}}} catch (InterruptedException | ExecutionException e) {logger.error(e);}
// 结果:
// {
// "took": 7,
// "timed_out": false,
// "_shards": {
// "total": 5,
// "successful": 5,
// "skipped": 0,
// "failed": 0
// },
// "hits": {
// "total": 0,
// "max_score": 0,
// "hits": []
// },
// "suggest": {
// "song-suggest": [
// {
// "text": "lucene s",
// "offset": 0,
// "length": 8,
// "options": [
// {
// "text": "lucene so cool",
// "_index": "music",
// "_type": "docc",
// "_id": "3",
// "_score": 20,
// "_source": {
// "suggest": {
// "input": [
// "lucene solr",
// "lucene so cool",
// "lucene elasticsearch"
// ],
// "weight": 20
// }
// }
// },
// {
// "text": "lucene solr cool",
// "_index": "music",
// "_type": "docc",
// "_id": "4",
// "_score": 10,
// "_source": {
// "suggest": {
// "input": [
// "lucene solr cool",
// "lucene elasticsearch"
// ],
// "weight": 10
// }
// }
// }
// ]
// }
// ]
// }
// }}public static void main(String[] args) {EsClient esClient= new EsClient();try (TransportClient client =esClient.getConnection() ;) {logger.info("---------------- 拼写检查:termSuggest----------------------");termSuggest(client);logger.info("------------------ 短语建议:phraseSuggest--------------------");phraseSuggest(client);logger.info("------------------ 自动补全:completionSuggester--------------------");completionSuggester(client);} catch (Exception e) {logger.error(e);}}
}