Elasticsearch Compound Queries: Boosting Query

Preface

ES has 5 compound queries:

Boolean Query
Boosting Query
Constant Score Query
Disjunction Max Query
Function Score Query

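Boolean Query was covered in an earlier post. Today we look at how to use Boosting Query. It is very simple and can be summed up in one sentence: it lowers the relevance score of documents that match keywords you would rather not see near the top.

The boost mechanism itself is a capability provided by Lucene underneath. Important data can be boosted at two points: at index time or at query time. Index-time boosting gives better query performance but is inflexible, so boosting is usually done at query time. The syntax is simple, for example: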

title:china^20 OR content:china^20

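Most single full-text queries in ES accept a boost parameter, but boost alone cannot demote the documents that match an unwanted term: Lucene does not support this directly, and the boost value must be a positive number (fractions between 0 and 1 are allowed). Without a dedicated query you would have to use a function score query to get the effect indirectly. ES therefore wraps this up as the Boosting Query, which demotes documents matching certain keywords. They are not removed from the results; they simply rank lower.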
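To make the query-time boost concrete, here is a minimal Python sketch that builds the request body equivalent to the query string above, using the boost parameter of the match query. The helper name boosted_match and the variable search_body are made up for illustration, and the sketch only constructs the JSON; it does not talk to a cluster.

```python
import json


def boosted_match(field, text, boost):
    """Build a match clause with a query-time boost (hypothetical helper)."""
    # The text states that boost values must be positive; this sketch
    # only rejects negative values and lets the server judge the rest.
    if boost < 0:
        raise ValueError("boost must not be negative")
    return {"match": {field: {"query": text, "boost": boost}}}


# Equivalent of: title:china^20 OR content:china^20
search_body = {
    "query": {
        "bool": {
            "should": [
                boosted_match("title", "china", 20),
                boosted_match("content", "china", 20),
            ]
        }
    }
}

print(json.dumps(search_body, indent=2))
```

A boost between 0 and 1 passed to this helper only reduces the weight of that clause relative to the others; it does not penalize documents for matching it, which is the gap Boosting Query fills.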

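Writing the test data

In the Dev Tools console in Kibana, run the POST request below as is. Note that on ES versions below 7.x the path must include the type (doc in this example), otherwise the request fails: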

POST test01/doc/_bulk
{ "index" : { "_id" : "1" } }
{ "title" : "Collecting  Service", "content": "Logstash" }
{ "index" : { "_id" : "2" } }
{ "title" : "Collecting  Service", "content": "Beats" }
{ "index" : { "_id" : "3" } }
{ "title" : "Collecting  Service", "content": "FLume" }
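If you prefer to generate the bulk request from a script, the following sketch builds the same newline-delimited body in Python. The helper names bulk_body and bulk_path are hypothetical; the sketch only assembles strings and leaves sending them to whatever HTTP client you use.

```python
import json

# The three test documents from the request above, keyed by _id.
docs = {
    "1": {"title": "Collecting  Service", "content": "Logstash"},
    "2": {"title": "Collecting  Service", "content": "Beats"},
    "3": {"title": "Collecting  Service", "content": "FLume"},
}


def bulk_body(documents):
    """Return the NDJSON body: one action line plus one source line per doc."""
    lines = []
    for doc_id, source in documents.items():
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


def bulk_path(index, doc_type=None):
    """Path with a type for pre-7.x clusters, without one otherwise."""
    return f"{index}/{doc_type}/_bulk" if doc_type else f"{index}/_bulk"


print("POST", bulk_path("test01", "doc"))
print(bulk_body(docs), end="")
```

Calling bulk_path("test01") without a type gives the typeless path test01/_bulk.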

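Once the data is written, you can create an index pattern manually under Management => Index patterns => Create Index Patterns and inspect the mapping that ES generated. Note that fields cannot be removed from this automatically inferred mapping, because the data has already been written to ES. To control which fields are generated, for example to avoid the content.keyword field, you have to define the mapping before writing any data.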
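As an illustration of defining the mapping up front, the sketch below builds a create-index body in which title and content are plain text fields with no keyword sub-field. The function name text_only_mapping is hypothetical. The body would be sent with PUT test01 before any document is indexed; nesting the properties under the type name for pre-7.x clusters is an assumption based on the typed path used above.

```python
import json


def text_only_mapping(field_names, doc_type=None):
    """Build a create-index body with text fields and no .keyword sub-fields."""
    properties = {name: {"type": "text"} for name in field_names}
    if doc_type:
        # Assumed pre-7.x layout: properties sit under the type name.
        return {"mappings": {doc_type: {"properties": properties}}}
    # Typeless layout for 7.x and later.
    return {"mappings": {"properties": properties}}


mapping_body = text_only_mapping(["title", "content"])
print(json.dumps(mapping_body, indent=2))
```

Because each field is declared explicitly as text, dynamic mapping has nothing to infer for it and no content.keyword sub-field is created.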

Querying the test data

GET test01/_search
{
  "query": {
    "match": { "title": "Collecting" }
  }
}

Response:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : { "total" : 6, "successful" : 6, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : 3,
    "max_score" : 0.2876821,
    "hits" : [
      { "_index" : "test01", "_type" : "doc", "_id" : "3", "_score" : 0.2876821, "_source" : { "title" : "Collecting  Service", "content" : "FLume" } },
      { "_index" : "test01", "_type" : "doc", "_id" : "2", "_score" : 0.2876821, "_source" : { "title" : "Collecting  Service", "content" : "Beats" } },
      { "_index" : "test01", "_type" : "doc", "_id" : "1", "_score" : 0.2876821, "_source" : { "title" : "Collecting  Service", "content" : "Logstash" } }
    ]
  }
}

All three scores are equal. If I now want documents that match Logstash to rank lower rather than be shown first, Boosting Query is the tool for the job:

GET test01/_search
{
  "query": {
    "boosting": {
      "positive": { "match": { "title": "Collecting  Service" } },
      "negative": { "match": { "content": "Logstash" } },
      "negative_boost": 0.5
    }
  }
}

Result:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : { "total" : 6, "successful" : 6, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : 3,
    "max_score" : 0.5753642,
    "hits" : [
      { "_index" : "test01", "_type" : "doc", "_id" : "3", "_score" : 0.5753642, "_source" : { "title" : "Collecting  Service", "content" : "FLume" } },
      { "_index" : "test01", "_type" : "doc", "_id" : "2", "_score" : 0.5753642, "_source" : { "title" : "Collecting  Service", "content" : "Beats" } },
      { "_index" : "test01", "_type" : "doc", "_id" : "1", "_score" : 0.2876821, "_source" : { "title" : "Collecting  Service", "content" : "Logstash" } }
    ]
  }
}
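The numbers in this response can be checked by hand. The short Python sketch below takes the score of an undemoted document and the negative_boost from the request and confirms that their product is the score reported for the Logstash document. The variable names are made up for illustration.

```python
import math

positive_score = 0.5753642   # _score of documents 2 and 3 in the response
negative_boost = 0.5         # value sent in the boosting query
reported_demoted = 0.2876821  # _score of document 1 (content: Logstash)

# A document matching the negative query keeps its positive score
# multiplied by negative_boost.
demoted_score = positive_score * negative_boost

print(demoted_score)
assert math.isclose(demoted_score, reported_demoted, rel_tol=1e-6)
```

The undemoted score is also twice the 0.2876821 seen in the first search, presumably because the positive query here matches two terms (Collecting and Service) instead of one, so the doubling is unrelated to the boosting itself.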

How Boosting Query works

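positive:

This is the required query. A document must match it to appear in the results at all, and the relevance score it produces is the starting point for the document's final score.

negative:

This query identifies the documents to demote. A document that matches the positive query and also matches the negative query stays in the results, but its score from the positive query is multiplied by negative_boost, a number between 0 and 1. The score the negative query itself would produce is not used; only whether the document matches it matters.

Boosting Query works as follows:

Parse the query: Elasticsearch first parses the Boosting Query supplied by the user and extracts the positive query, the negative query, and negative_boost.
Run the positive query: documents in the index are matched against the positive query, and a score is computed for each match.
Check the negative query: each matched document is checked to see whether it also matches the negative query.
Apply the adjustment: if the document matches the negative query, its positive score is multiplied by negative_boost; otherwise the score is left unchanged.
Return the results: the matched documents are sorted by final score and returned to the user.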
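The scoring flow can be sketched as a toy simulation in Python. It assumes the documented behaviour of the boosting query: a document that matches the negative query keeps its positive score multiplied by negative_boost, and every other document keeps its positive score unchanged. Real matching and scoring are done by Lucene; here the positive score is simply a number taken from the response above, the negative match is a lowercase substring check, and all function names are hypothetical.

```python
def boosting_score(positive_score, matches_negative, negative_boost):
    """Final score of one document under a boosting query (toy model)."""
    if not 0 <= negative_boost <= 1:
        raise ValueError("negative_boost is expected to be between 0 and 1")
    if matches_negative:
        return positive_score * negative_boost
    return positive_score


def rank(documents, positive_score, negative_term, negative_boost):
    """Score every document, then sort by final score, highest first."""
    scored = []
    for doc in documents:
        # Stand-in for the negative match query on the content field.
        matches_negative = negative_term.lower() in doc["content"].lower()
        score = boosting_score(positive_score, matches_negative, negative_boost)
        scored.append({"id": doc["id"], "score": score})
    return sorted(scored, key=lambda hit: hit["score"], reverse=True)


test_docs = [
    {"id": "1", "content": "Logstash"},
    {"id": "2", "content": "Beats"},
    {"id": "3", "content": "FLume"},
]

# All three documents get the same positive score, as in the response above.
ranked = rank(test_docs, positive_score=0.5753642,
              negative_term="Logstash", negative_boost=0.5)
for hit in ranked:
    print(hit["id"], hit["score"])
```

With negative_boost set to 1 the demotion disappears and all three documents tie again; with 0 the Logstash document still appears, but with a score of 0.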
