文章目录
- DSL 查询种类
- DSL query 基本语法
- 1、全文检索
- 2、精确查询
- 3、地理查询
- 4、function score (算分控制)
- 5、bool 查询
- 搜索结果处理
- 1、排序
- 2、分页
- 3、高亮
- RestClient操作
DSL 查询种类
- 查询所有:查询所有数据,一般在测试时使用。march_all,但是一般显示全部,有一个分页的功能
- 全文检索(full text)查询:利用分词器对用户的输入内容进行分词,然后去倒排索引库匹配。例如:
- match_query
- mutil_match_query
- 精确查询:根据精确词条值查询数据,一般查找的时keyword、数值、日期、boolean等字段。例如:
- ids
- term
- range
- 地理查询(geo):根据经纬度查询,例如:
- geo_distance
- geo_bounding_box
- 复合(compound)查询:复合查询时将上面各种查询条件组合在一起,合并查询条件。例如:
- bool
- funcation_score
DSL query 基本语法
1、全文检索
# DSL查询
GET /indexName/_search
{"query":{"查询类型":{"查询条件":"条件值"}}
}
match 与 multi_match 的与别是前者根据单字段查,后者根据多字段查。
参与搜索的字段越多,查询效率越低,建议利用copy_to将多个检索字段放在一起,然后使用match—all字段查。
GET /hotel/_search
{"query": {"match": {"city": "上海"}}
}GET /hotel/_search
{"query": {"match": {"all": "如家"}}
}GET /hotel/_search{"query": {"multi_match": {"query": "如家","fields": ["name","brand","business"]}}}
2、精确查询
精确查询: term字段全值匹配,range字段范围匹配。
精确查询一般查找keyword、数值、boolean等不可分词的字段
# term
GET /hotel/_search
{"query": {"term": {"city": {"value": "北京"}}}
}
# range
GET /hotel/_search
{"query": {"range": {"price": {"gt": 1000,"lt": 2000}}}
}
3、地理查询
GET /hotel/_search
{"query": {"geo_bounding_box": {"location": {"top_left": {"lat": 31.1,"lon": 121.5},"bottom_right": {"lat": 30.9,"lon": 121.7}}}}
}GET /hotel/_search
{"query": {"geo_distance": {"distance": "20km","location": {"lat": 31.13,"lon": 121.8}}}
}
4、function score (算分控制)
复合查询(compound ):将简单查询条件组合在一起,实现复杂搜索逻辑。
function score:算分函数查询
,可以控制文档的相关性算分,控制排名。例如百度竞价
es在5.1及之后就弃用了 TF-IDF 算法,开始采用 BM25算法。BM25算法不会因为词的出现频率变大而导致算分无限增大,会逐渐趋近一个值
function score query :可以修改文档相关性算分,得到新的算分。
三要素
- 过滤条件:决定哪些条件要加分
- 算分函数:如何计算function score
- 加权方式:function score 与 query score如何运算
GET /hotel/_search
{"query": {"function_score": {"query": {"match": {"all": "如家酒店"}},"functions": [{"filter": {"term": {"city": "上海"}},"weight": 10}],"boost_mode": "sum"}}
}
5、bool 查询
boolean query:布尔查询是一个或多个子查询的组合。
- must:必须匹配每个子查询,类似”and“
- should:选择性匹配子查询,类似”or“
- must_not:必须不匹配,不参与算分,类似”非“
- filter:必须匹配,不参与算分
GET /hotel/_search
{"query": {"bool": {"must": [{"match": {"all": "上海"}}],"must_not": [{"range": {"price": {"gt": 500}}}],"filter": [{"geo_distance": {"distance": "10km","location": {"lat": 31.21,"lon": 121.5}}}]}}
}
搜索结果处理
1、排序
es支持对搜索结构进行排序,默认是根据相关度算分(_score)进行排序。可以排序的字段有keyword,数值、地理坐标、日期类型等。
GET /hotel/_search
{"query": {"match_all": {}},"sort": [{"id": {"order": "desc"}}]
}
GET /hotel/_search
{"query": {"match_all": {}},"sort": [{"_geo_distance": {"location": {"lat": 31.2,"lon": 121.5},"order": "asc","unit": "km"}}]
}
这个排序的结果就是相聚的公里数。
2、分页
针对深度分页;ES给出了两种方案
- search after:分页时需要排序,原理是从上次的排序值开始(末尾值),查询下一页的数据。官方推荐使用,不会太占内存。手机向下反动滚页。
- scroll:原理是将排序数据形成快照,保存在内存。不推荐
3、高亮
ES默认搜索字段和高亮字段必须一致,否则不会高亮。或者使用 "require_field_match": "false"
也能高亮。
最后将查询结果中 highlight 与 指定高亮的字段进行替换返回给前端就行。
RestClient操作
普通查询
@Testpublic void testMatchAll() throws IOException {SearchRequest searchRequest = new SearchRequest("hotel");searchRequest.source().query(QueryBuilders.matchAllQuery());SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);SearchHits searchHits = searchResponse.getHits();long value = searchHits.getTotalHits().value;System.out.println(value);SearchHit[] hits = searchHits.getHits();System.out.println(hits[0]);HotelDoc hotelDoc = JSON.parseObject(hits[0].getSourceAsString(), HotelDoc.class);System.out.println(hotelDoc);}QueryBuilders.matchAllQuery()QueryBuilders.matchQuery("all","如家")QueryBuilders.multiMatchQuery("如家","name","brand","business")QueryBuilders.termQuery("city","上海")QueryBuilders.rangeQuery("price").gt(100).lt(400)BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();boolQueryBuilder.must(QueryBuilders.termQuery("city","北京"));boolQueryBuilder.filter(QueryBuilders.rangeQuery("price").gt(100).lt(400));
分页和排序
public void testPageAndSort() throws IOException {int pageNum = 2, pageSize = 10;SearchRequest searchRequest = new SearchRequest("hotel");BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("brand", "如家");MatchQueryBuilder matchQueryBuilder = QueryBuilders.matchQuery("all", "北京");boolQueryBuilder.must(termQueryBuilder);boolQueryBuilder.must(matchQueryBuilder);searchRequest.source().query(boolQueryBuilder);searchRequest.source().from((pageNum - 1) * pageSize).size(pageSize);searchRequest.source().sort("price", SortOrder.ASC);SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);SearchHit[] hits = searchResponse.getHits().getHits();for (SearchHit hit : hits) {String source = hit.getSourceAsString();HotelDoc hotelDoc = JSON.parseObject(source, HotelDoc.class);System.out.println(hotelDoc);}}
高亮
public void testHighLight() throws IOException {SearchRequest searchRequest = new SearchRequest("hotel");searchRequest.source().query(QueryBuilders.matchQuery("all","如家"));searchRequest.source().highlighter(new HighlightBuilder().field("name").requireFieldMatch(false));SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);SearchHit[] hits = searchResponse.getHits().getHits();for (SearchHit hit : hits) {String source = hit.getSourceAsString();HotelDoc hotelDoc = JSON.parseObject(source, HotelDoc.class);Map<String, HighlightField> highlightFields = hit.getHighlightFields();if(!highlightFields.isEmpty()){HighlightField highlightField = highlightFields.get("name");//一般value只有一个元素,取数组第一个String name = highlightField.getFragments()[0].string();hotelDoc.setName(name);}System.out.println(hotelDoc);}}
算分
让指定酒店置顶 (function_score )广告业务
// 算分控制FunctionScoreQueryBuilder functionScoreQueryBuilder = QueryBuilders.functionScoreQuery(// 原始查询boolQueryBuilder,// FunctionScore 数组new FunctionScoreQueryBuilder.FilterFunctionBuilder[]{new FunctionScoreQueryBuilder.FilterFunctionBuilder(QueryBuilders.termQuery("isAD", true),ScoreFunctionBuilders.weightFactorFunction(10))});