Elasticsearch Search Relevance and Scoring: The Underlying Principles

Contents

  • I. Introduction to Relevance and Scoring
  • II. The TF-IDF Scoring Formula
  • III. BM25 (Best Matching 25)
  • IV. Using explain to View TF-IDF
  • V. Controlling Relevance with Boosting

I. Introduction to Relevance and Scoring


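An example illustrates the idea:

  1. Suppose a user of an e-commerce site types the keyword "phone" into the search box and runs a search. Elasticsearch finds every document in the index that contains "phone" and scores each one by its relevance to the query.

  2. The relevance score determines the quality and order of the results: the higher the score, the better the document matches the user's query.

  3. For example, a document in which "phone" appears several times across the title, description, and other fields is likely to score high, while a document that mentions "phone" only once in its description is likely to score lower.

  4. Before version 5.0, Elasticsearch used the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm by default to judge relevance and compute scores.

  5. TF-IDF is a classic information retrieval algorithm built on two quantities:
    Term frequency (TF): how many times a term occurs in a document.
    Inverse document frequency (IDF): how rare a term is across the whole document collection; the more documents contain the term, the lower its IDF.

  6. Under TF-IDF, the more often a search term occurs in a document, the more it contributes to that document's relevance. The more documents in the collection contain the term, however, the less it contributes, because a term that appears everywhere does little to distinguish one document from another.

  7. Starting with version 5.0, Elasticsearch uses BM25 (Best Matching 25) as its default scoring algorithm. In addition to term frequency and inverse document frequency, BM25 takes document length into account and limits how much repeated occurrences of a term can raise the score.

  8. BM25 generally ranks results better than classic TF-IDF in practice, so on Elasticsearch 5.0 and later it is advisable to keep the default BM25 similarity.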

II. The TF-IDF Scoring Formula

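TF-IDF (Term Frequency-Inverse Document Frequency) measures how important a term is to a document within a document collection. It is computed as:

TF-IDF = TF * IDF

TF (term frequency) is the relative frequency of the term in the document:

TF = (number of times the term occurs in the document) / (total number of terms in the document)

IDF (inverse document frequency) measures how rare the term is across the collection:

IDF = log((total number of documents in the collection) / (number of documents containing the term))

The higher the TF-IDF score, the more important the term is to that document. The core idea is that a term which occurs often in one document (high TF) but in few other documents (high IDF) says a lot about what that document is about.

TF-IDF is widely used in information retrieval, text classification, keyword extraction, and other natural language processing tasks.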
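The formulas above can be sketched in a few lines of Python. This is only an illustration of the textbook definition, not how Elasticsearch computes scores internally. The function names are made up for this example, and the corpus is the set of four sample sentences indexed later in this article, lowercased and split into words by hand.

```python
import math

def tf(term, doc_tokens):
    """Term frequency: occurrences of the term divided by the document's token count."""
    return doc_tokens.count(term) / len(doc_tokens)

def idf(term, corpus):
    """Inverse document frequency: log(total documents / documents containing the term).

    Assumes the term occurs in at least one document of the corpus.
    """
    containing = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / containing)

def tf_idf(term, doc_tokens, corpus):
    """TF-IDF = TF * IDF, exactly as in the formulas above."""
    return tf(term, doc_tokens) * idf(term, corpus)

# Hand-tokenized sample documents (tokenization is an assumption of this sketch).
corpus = [
    ["this", "is", "the", "first", "document"],
    ["this", "document", "is", "the", "second", "document"],
    ["and", "this", "is", "the", "third", "one"],
    ["is", "this", "the", "first", "document"],
]

print("this :", tf_idf("this", corpus[0], corpus))
print("first:", tf_idf("first", corpus[0], corpus))
```

"this" occurs in every document, so its IDF is log(4/4) = 0 and its TF-IDF is 0 no matter how often it appears. "first" occurs in only two of the four documents, so it receives a positive score.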

III. BM25 (Best Matching 25)

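BM25 (Best Matching 25) is a scoring algorithm for information retrieval that can be seen as a refinement and extension of TF-IDF. It changes several aspects of TF-IDF to improve the quality of search results.

The main improvements are:

  1. Document length normalization: classic TF-IDF has no tunable control over how document length affects the score. BM25 adds a length factor that compares the field length (dl) with the average field length (avgdl), weighted by the parameter b, so documents of different lengths are handled better.

  2. Term frequency saturation: in TF-IDF the term-frequency component grows linearly with the number of occurrences, which tends to overemphasize heavily repeated terms. BM25 replaces it with the saturating ratio freq / (freq + k1 * (1 - b + b * dl / avgdl)), so each additional occurrence adds less than the previous one and the component never exceeds 1.

  3. Smoothed inverse document frequency: BM25 computes IDF as log(1 + (N - n + 0.5) / (n + 0.5)), where N is the number of documents with the field and n is the number of documents containing the term. Terms that appear in most documents receive a weight close to zero, which avoids overemphasizing common words.

In short, BM25 builds on TF-IDF, evaluates the importance of a term more accurately, and improves retrieval quality.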
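As a rough sketch, the per-term BM25 score can be written out in Python using the formulas that Elasticsearch prints in its explain output later in this article (`boost * idf * tf`). The function names are made up for this example. The boost of 2.2 is copied from that output rather than derived here, and k1 = 1.2 and b = 0.75 are the parameter values shown there.

```python
import math

def bm25_idf(n, N):
    """IDF as printed by explain: log(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (N - n + 0.5) / (n + 0.5))

def bm25_tf(freq, dl, avgdl, k1=1.2, b=0.75):
    """Saturating TF as printed by explain: freq / (freq + k1 * (1 - b + b * dl / avgdl))."""
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))

def bm25_term_score(freq, n, N, dl, avgdl, boost=2.2, k1=1.2, b=0.75):
    """Score contribution of one term in one document: boost * idf * tf."""
    return boost * bm25_idf(n, N) * bm25_tf(freq, dl, avgdl, k1, b)

# Inputs taken from the explain output for the term "first"
# in the document "This is the first document."
score = bm25_term_score(freq=1, n=2, N=4, dl=5, avgdl=5.5)
print(round(score, 4))  # about 0.7199, in line with the explain output

# Saturation: every extra occurrence raises the TF component by less,
# and the component stays below 1.
for freq in (1, 2, 5, 20):
    print(freq, round(bm25_tf(freq, dl=5, avgdl=5.5), 3))
```

Elasticsearch computes these values in single precision, so the last digits can differ slightly from the Python result.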

IV. Using explain to View TF-IDF

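Setting "explain": true in a search request makes Elasticsearch return a breakdown of how each hit's score was computed. On Elasticsearch 5.0 and later (the output below comes from 7.17) the default similarity is BM25, so the breakdown shows BM25's tf and idf components rather than the classic TF-IDF formula. The following example shows how to use it:

  1. First, run the following commands in Kibana Dev Tools to create a new index and index a few documents: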
PUT my_index
{
  "mappings": {
    "properties": {
      "text": { "type": "text" }
    }
  }
}

POST my_index/_doc
{ "text": "This is the first document." }

POST my_index/_doc
{ "text": "This document is the second document." }

POST my_index/_doc
{ "text": "And this is the third one." }

POST my_index/_doc
{ "text": "Is this the first document?" }

  2. Next, request the score explanation by running the following command in Kibana Dev Tools:

GET my_index/_search
{
  "explain": true,
  "query": {
    "match": { "text": "this is the first document" }
  }
}

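This command returns the search results, and every matching hit carries an _explanation field with a detailed breakdown of how its score was calculated.

The _explanation field lists the inputs to the score, such as the term frequency (freq), the document frequency (n), and the field length (dl). You can parse it to extract and analyze the contribution of each term.

  3. The response with explain enabled looks like this: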

#! Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html to enable security.
{"took" : 16,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 4,"relation" : "eq"},"max_score" : 1.4186639,"hits" : [{"_shard" : "[my_index][0]","_node" : "NkTFrxzGQuOf4zelToSeKQ","_index" : "my_index","_type" : "_doc","_id" : "08FDOokBF8lAViln2DRj","_score" : 1.4186639,"_source" : {"text" : "This is the first document."},"_explanation" : {"value" : 1.4186639,"description" : "sum of:","details" : [{"value" : 0.10943023,"description" : "weight(text:this in 0) [PerFieldSimilarity], result of:","details" : [{"value" : 0.10943023,"description" : "score(freq=1.0), computed as boost * idf * tf from:","details" : [{"value" : 2.2,"description" : "boost","details" : [ ]},{"value" : 0.105360515,"description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:","details" : [{"value" : 4,"description" : "n, number of documents containing term","details" : [ ]},{"value" : 4,"description" : "N, total number of documents with field","details" : [ ]}]},{"value" : 0.472103,"description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:","details" : [{"value" : 1.0,"description" : "freq, occurrences of term within document","details" : [ ]},{"value" : 1.2,"description" : "k1, term saturation parameter","details" : [ ]},{"value" : 0.75,"description" : "b, length normalization parameter","details" : [ ]},{"value" : 5.0,"description" : "dl, length of field","details" : [ ]},{"value" : 5.5,"description" : "avgdl, average length of field","details" : [ ]}]}]}]},{"value" : 0.10943023,"description" : "weight(text:is in 0) [PerFieldSimilarity], result of:","details" : [{"value" : 0.10943023,"description" : "score(freq=1.0), computed as boost * idf * tf from:","details" : [{"value" : 2.2,"description" : "boost","details" : [ ]},{"value" : 0.105360515,"description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:","details" : [{"value" : 4,"description" : "n, number of documents 
containing term","details" : [ ]},{"value" : 4,"description" : "N, total number of documents with field","details" : [ ]}]},{"value" : 0.472103,"description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:","details" : [{"value" : 1.0,"description" : "freq, occurrences of term within document","details" : [ ]},{"value" : 1.2,"description" : "k1, term saturation parameter","details" : [ ]},{"value" : 0.75,"description" : "b, length normalization parameter","details" : [ ]},{"value" : 5.0,"description" : "dl, length of field","details" : [ ]},{"value" : 5.5,"description" : "avgdl, average length of field","details" : [ ]}]}]}]},{"value" : 0.10943023,"description" : "weight(text:the in 0) [PerFieldSimilarity], result of:","details" : [{"value" : 0.10943023,"description" : "score(freq=1.0), computed as boost * idf * tf from:","details" : [{"value" : 2.2,"description" : "boost","details" : [ ]},{"value" : 0.105360515,"description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:","details" : [{"value" : 4,"description" : "n, number of documents containing term","details" : [ ]},{"value" : 4,"description" : "N, total number of documents with field","details" : [ ]}]},{"value" : 0.472103,"description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:","details" : [{"value" : 1.0,"description" : "freq, occurrences of term within document","details" : [ ]},{"value" : 1.2,"description" : "k1, term saturation parameter","details" : [ ]},{"value" : 0.75,"description" : "b, length normalization parameter","details" : [ ]},{"value" : 5.0,"description" : "dl, length of field","details" : [ ]},{"value" : 5.5,"description" : "avgdl, average length of field","details" : [ ]}]}]}]},{"value" : 0.7199211,"description" : "weight(text:first in 0) [PerFieldSimilarity], result of:","details" : [{"value" : 0.7199211,"description" : "score(freq=1.0), computed as boost * idf * tf from:","details" : [{"value" : 2.2,"description" : 
"boost","details" : [ ]},{"value" : 0.6931472,"description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:","details" : [{"value" : 2,"description" : "n, number of documents containing term","details" : [ ]},{"value" : 4,"description" : "N, total number of documents with field","details" : [ ]}]},{"value" : 0.472103,"description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:","details" : [{"value" : 1.0,"description" : "freq, occurrences of term within document","details" : [ ]},{"value" : 1.2,"description" : "k1, term saturation parameter","details" : [ ]},{"value" : 0.75,"description" : "b, length normalization parameter","details" : [ ]},{"value" : 5.0,"description" : "dl, length of field","details" : [ ]},{"value" : 5.5,"description" : "avgdl, average length of field","details" : [ ]}]}]}]},{"value" : 0.3704521,"description" : "weight(text:document in 0) [PerFieldSimilarity], result of:","details" : [{"value" : 0.3704521,"description" : "score(freq=1.0), computed as boost * idf * tf from:","details" : [{"value" : 2.2,"description" : "boost","details" : [ ]},{"value" : 0.35667494,"description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:","details" : [{"value" : 3,"description" : "n, number of documents containing term","details" : [ ]},{"value" : 4,"description" : "N, total number of documents with field","details" : [ ]}]},{"value" : 0.472103,"description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:","details" : [{"value" : 1.0,"description" : "freq, occurrences of term within document","details" : [ ]},{"value" : 1.2,"description" : "k1, term saturation parameter","details" : [ ]},{"value" : 0.75,"description" : "b, length normalization parameter","details" : [ ]},{"value" : 5.0,"description" : "dl, length of field","details" : [ ]},{"value" : 5.5,"description" : "avgdl, average length of field","details" : [ ]}]}]}]}]}},{"_shard" : "[my_index][0]","_node" : 
"NkTFrxzGQuOf4zelToSeKQ","_index" : "my_index","_type" : "_doc","_id" : "1sFEOokBF8lAVilnADS4","_score" : 1.4186639,"_source" : {"text" : "Is this the first document?"},"_explanation" : {"value" : 1.4186639,"description" : "sum of:","details" : [{"value" : 0.10943023,"description" : "weight(text:this in 0) [PerFieldSimilarity], result of:","details" : [{"value" : 0.10943023,"description" : "score(freq=1.0), computed as boost * idf * tf from:","details" : [{"value" : 2.2,"description" : "boost","details" : [ ]},{"value" : 0.105360515,"description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:","details" : [{"value" : 4,"description" : "n, number of documents containing term","details" : [ ]},{"value" : 4,"description" : "N, total number of documents with field","details" : [ ]}]},{"value" : 0.472103,"description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:","details" : [{"value" : 1.0,"description" : "freq, occurrences of term within document","details" : [ ]},{"value" : 1.2,"description" : "k1, term saturation parameter","details" : [ ]},{"value" : 0.75,"description" : "b, length normalization parameter","details" : [ ]},{"value" : 5.0,"description" : "dl, length of field","details" : [ ]},{"value" : 5.5,"description" : "avgdl, average length of field","details" : [ ]}]}]}]},{"value" : 0.10943023,"description" : "weight(text:is in 0) [PerFieldSimilarity], result of:","details" : [{"value" : 0.10943023,"description" : "score(freq=1.0), computed as boost * idf * tf from:","details" : [{"value" : 2.2,"description" : "boost","details" : [ ]},{"value" : 0.105360515,"description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:","details" : [{"value" : 4,"description" : "n, number of documents containing term","details" : [ ]},{"value" : 4,"description" : "N, total number of documents with field","details" : [ ]}]},{"value" : 0.472103,"description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) 
from:","details" : [{"value" : 1.0,"description" : "freq, occurrences of term within document","details" : [ ]},{"value" : 1.2,"description" : "k1, term saturation parameter","details" : [ ]},{"value" : 0.75,"description" : "b, length normalization parameter","details" : [ ]},{"value" : 5.0,"description" : "dl, length of field","details" : [ ]},{"value" : 5.5,"description" : "avgdl, average length of field","details" : [ ]}]}]}]},{"value" : 0.10943023,"description" : "weight(text:the in 0) [PerFieldSimilarity], result of:","details" : [{"value" : 0.10943023,"description" : "score(freq=1.0), computed as boost * idf * tf from:","details" : [{"value" : 2.2,"description" : "boost","details" : [ ]},{"value" : 0.105360515,"description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:","details" : [{"value" : 4,"description" : "n, number of documents containing term","details" : [ ]},{"value" : 4,"description" : "N, total number of documents with field","details" : [ ]}]},{"value" : 0.472103,"description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:","details" : [{"value" : 1.0,"description" : "freq, occurrences of term within document","details" : [ ]},{"value" : 1.2,"description" : "k1, term saturation parameter","details" : [ ]},{"value" : 0.75,"description" : "b, length normalization parameter","details" : [ ]},{"value" : 5.0,"description" : "dl, length of field","details" : [ ]},{"value" : 5.5,"description" : "avgdl, average length of field","details" : [ ]}]}]}]},{"value" : 0.7199211,"description" : "weight(text:first in 0) [PerFieldSimilarity], result of:","details" : [{"value" : 0.7199211,"description" : "score(freq=1.0), computed as boost * idf * tf from:","details" : [{"value" : 2.2,"description" : "boost","details" : [ ]},{"value" : 0.6931472,"description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:","details" : [{"value" : 2,"description" : "n, number of documents containing term","details" : [ ]},{"value" 
: 4,"description" : "N, total number of documents with field","details" : [ ]}]},{"value" : 0.472103,"description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:","details" : [{"value" : 1.0,"description" : "freq, occurrences of term within document","details" : [ ]},{"value" : 1.2,"description" : "k1, term saturation parameter","details" : [ ]},{"value" : 0.75,"description" : "b, length normalization parameter","details" : [ ]},{"value" : 5.0,"description" : "dl, length of field","details" : [ ]},{"value" : 5.5,"description" : "avgdl, average length of field","details" : [ ]}]}]}]},{"value" : 0.3704521,"description" : "weight(text:document in 0) [PerFieldSimilarity], result of:","details" : [{"value" : 0.3704521,"description" : "score(freq=1.0), computed as boost * idf * tf from:","details" : [{"value" : 2.2,"description" : "boost","details" : [ ]},{"value" : 0.35667494,"description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:","details" : [{"value" : 3,"description" : "n, number of documents containing term","details" : [ ]},{"value" : 4,"description" : "N, total number of documents with field","details" : [ ]}]},{"value" : 0.472103,"description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:","details" : [{"value" : 1.0,"description" : "freq, occurrences of term within document","details" : [ ]},{"value" : 1.2,"description" : "k1, term saturation parameter","details" : [ ]},{"value" : 0.75,"description" : "b, length normalization parameter","details" : [ ]},{"value" : 5.0,"description" : "dl, length of field","details" : [ ]},{"value" : 5.5,"description" : "avgdl, average length of field","details" : [ ]}]}]}]}]}},{"_shard" : "[my_index][0]","_node" : "NkTFrxzGQuOf4zelToSeKQ","_index" : "my_index","_type" : "_doc","_id" : "1MFDOokBF8lAViln6TSq","_score" : 0.78294927,"_source" : {"text" : "This document is the second document."},"_explanation" : {"value" : 0.78294927,"description" : "sum 
of:","details" : [{"value" : 0.10158265,"description" : "weight(text:this in 0) [PerFieldSimilarity], result of:","details" : [{"value" : 0.10158265,"description" : "score(freq=1.0), computed as boost * idf * tf from:","details" : [{"value" : 2.2,"description" : "boost","details" : [ ]},{"value" : 0.105360515,"description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:","details" : [{"value" : 4,"description" : "n, number of documents containing term","details" : [ ]},{"value" : 4,"description" : "N, total number of documents with field","details" : [ ]}]},{"value" : 0.43824703,"description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:","details" : [{"value" : 1.0,"description" : "freq, occurrences of term within document","details" : [ ]},{"value" : 1.2,"description" : "k1, term saturation parameter","details" : [ ]},{"value" : 0.75,"description" : "b, length normalization parameter","details" : [ ]},{"value" : 6.0,"description" : "dl, length of field","details" : [ ]},{"value" : 5.5,"description" : "avgdl, average length of field","details" : [ ]}]}]}]},{"value" : 0.10158265,"description" : "weight(text:is in 0) [PerFieldSimilarity], result of:","details" : [{"value" : 0.10158265,"description" : "score(freq=1.0), computed as boost * idf * tf from:","details" : [{"value" : 2.2,"description" : "boost","details" : [ ]},{"value" : 0.105360515,"description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:","details" : [{"value" : 4,"description" : "n, number of documents containing term","details" : [ ]},{"value" : 4,"description" : "N, total number of documents with field","details" : [ ]}]},{"value" : 0.43824703,"description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:","details" : [{"value" : 1.0,"description" : "freq, occurrences of term within document","details" : [ ]},{"value" : 1.2,"description" : "k1, term saturation parameter","details" : [ ]},{"value" : 0.75,"description" : "b, 
length normalization parameter","details" : [ ]},{"value" : 6.0,"description" : "dl, length of field","details" : [ ]},{"value" : 5.5,"description" : "avgdl, average length of field","details" : [ ]}]}]}]},{"value" : 0.10158265,"description" : "weight(text:the in 0) [PerFieldSimilarity], result of:","details" : [{"value" : 0.10158265,"description" : "score(freq=1.0), computed as boost * idf * tf from:","details" : [{"value" : 2.2,"description" : "boost","details" : [ ]},{"value" : 0.105360515,"description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:","details" : [{"value" : 4,"description" : "n, number of documents containing term","details" : [ ]},{"value" : 4,"description" : "N, total number of documents with field","details" : [ ]}]},{"value" : 0.43824703,"description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:","details" : [{"value" : 1.0,"description" : "freq, occurrences of term within document","details" : [ ]},{"value" : 1.2,"description" : "k1, term saturation parameter","details" : [ ]},{"value" : 0.75,"description" : "b, length normalization parameter","details" : [ ]},{"value" : 6.0,"description" : "dl, length of field","details" : [ ]},{"value" : 5.5,"description" : "avgdl, average length of field","details" : [ ]}]}]}]},{"value" : 0.47820133,"description" : "weight(text:document in 0) [PerFieldSimilarity], result of:","details" : [{"value" : 0.47820133,"description" : "score(freq=2.0), computed as boost * idf * tf from:","details" : [{"value" : 2.2,"description" : "boost","details" : [ ]},{"value" : 0.35667494,"description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:","details" : [{"value" : 3,"description" : "n, number of documents containing term","details" : [ ]},{"value" : 4,"description" : "N, total number of documents with field","details" : [ ]}]},{"value" : 0.6094183,"description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:","details" : [{"value" : 
2.0,"description" : "freq, occurrences of term within document","details" : [ ]},{"value" : 1.2,"description" : "k1, term saturation parameter","details" : [ ]},{"value" : 0.75,"description" : "b, length normalization parameter","details" : [ ]},{"value" : 6.0,"description" : "dl, length of field","details" : [ ]},{"value" : 5.5,"description" : "avgdl, average length of field","details" : [ ]}]}]}]}]}},{"_shard" : "[my_index][0]","_node" : "NkTFrxzGQuOf4zelToSeKQ","_index" : "my_index","_type" : "_doc","_id" : "1cFDOokBF8lAViln9jS-","_score" : 0.30474794,"_source" : {"text" : "And this is the third one."},"_explanation" : {"value" : 0.30474794,"description" : "sum of:","details" : [{"value" : 0.10158265,"description" : "weight(text:this in 0) [PerFieldSimilarity], result of:","details" : [{"value" : 0.10158265,"description" : "score(freq=1.0), computed as boost * idf * tf from:","details" : [{"value" : 2.2,"description" : "boost","details" : [ ]},{"value" : 0.105360515,"description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:","details" : [{"value" : 4,"description" : "n, number of documents containing term","details" : [ ]},{"value" : 4,"description" : "N, total number of documents with field","details" : [ ]}]},{"value" : 0.43824703,"description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:","details" : [{"value" : 1.0,"description" : "freq, occurrences of term within document","details" : [ ]},{"value" : 1.2,"description" : "k1, term saturation parameter","details" : [ ]},{"value" : 0.75,"description" : "b, length normalization parameter","details" : [ ]},{"value" : 6.0,"description" : "dl, length of field","details" : [ ]},{"value" : 5.5,"description" : "avgdl, average length of field","details" : [ ]}]}]}]},{"value" : 0.10158265,"description" : "weight(text:is in 0) [PerFieldSimilarity], result of:","details" : [{"value" : 0.10158265,"description" : "score(freq=1.0), computed as boost * idf * tf from:","details" : 
[{"value" : 2.2,"description" : "boost","details" : [ ]},{"value" : 0.105360515,"description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:","details" : [{"value" : 4,"description" : "n, number of documents containing term","details" : [ ]},{"value" : 4,"description" : "N, total number of documents with field","details" : [ ]}]},{"value" : 0.43824703,"description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:","details" : [{"value" : 1.0,"description" : "freq, occurrences of term within document","details" : [ ]},{"value" : 1.2,"description" : "k1, term saturation parameter","details" : [ ]},{"value" : 0.75,"description" : "b, length normalization parameter","details" : [ ]},{"value" : 6.0,"description" : "dl, length of field","details" : [ ]},{"value" : 5.5,"description" : "avgdl, average length of field","details" : [ ]}]}]}]},{"value" : 0.10158265,"description" : "weight(text:the in 0) [PerFieldSimilarity], result of:","details" : [{"value" : 0.10158265,"description" : "score(freq=1.0), computed as boost * idf * tf from:","details" : [{"value" : 2.2,"description" : "boost","details" : [ ]},{"value" : 0.105360515,"description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:","details" : [{"value" : 4,"description" : "n, number of documents containing term","details" : [ ]},{"value" : 4,"description" : "N, total number of documents with field","details" : [ ]}]},{"value" : 0.43824703,"description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:","details" : [{"value" : 1.0,"description" : "freq, occurrences of term within document","details" : [ ]},{"value" : 1.2,"description" : "k1, term saturation parameter","details" : [ ]},{"value" : 0.75,"description" : "b, length normalization parameter","details" : [ ]},{"value" : 6.0,"description" : "dl, length of field","details" : [ ]},{"value" : 5.5,"description" : "avgdl, average length of field","details" : [ ]}]}]}]}]}}]}
}
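The _explanation field in the response above is a nested tree in which every node has a value, a description, and a details list. A minimal Python sketch for walking such a tree is shown below. The helper names are made up for this example, and the sample dictionary is an abbreviated copy of the first hit's explanation with the inner details left out. It assumes the top-level node is a "sum of:" node whose children are the per-term weights, as in the response above.

```python
def flatten_explanation(node, depth=0):
    """Yield (depth, value, description) for every node of an _explanation tree."""
    yield depth, node["value"], node["description"]
    for child in node.get("details", []):
        yield from flatten_explanation(child, depth + 1)

def term_weights(explanation):
    """Map each top-level 'weight(field:term ...)' entry to its score contribution."""
    weights = {}
    for detail in explanation.get("details", []):
        description = detail["description"]
        if description.startswith("weight("):
            # "weight(text:first in 0) [PerFieldSimilarity], result of:" -> "text:first"
            key = description[len("weight("):].split(" ", 1)[0]
            weights[key] = detail["value"]
    return weights

# Abbreviated copy of the first hit's _explanation (inner details omitted).
explanation = {
    "value": 1.4186639,
    "description": "sum of:",
    "details": [
        {"value": 0.10943023, "description": "weight(text:this in 0) [PerFieldSimilarity], result of:", "details": []},
        {"value": 0.10943023, "description": "weight(text:is in 0) [PerFieldSimilarity], result of:", "details": []},
        {"value": 0.10943023, "description": "weight(text:the in 0) [PerFieldSimilarity], result of:", "details": []},
        {"value": 0.7199211, "description": "weight(text:first in 0) [PerFieldSimilarity], result of:", "details": []},
        {"value": 0.3704521, "description": "weight(text:document in 0) [PerFieldSimilarity], result of:", "details": []},
    ],
}

# Print the tree with indentation that reflects the nesting depth.
for depth, value, description in flatten_explanation(explanation):
    print("  " * depth + f"{value}  {description}")

# The per-term contributions add up to the hit's score.
weights = term_weights(explanation)
print(weights)
print(sum(weights.values()))
```

In this hit the rarer term "first" contributes far more to the score than "this", "is", and "the", which occur in all four documents.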

V. Controlling Relevance with Boosting

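In search engines and information retrieval, boosting raises or lowers the relevance score of documents so that results better match what the user is looking for.

Boosting can be applied at different levels, including the query level and the document level.

1. Query-level boosting: raises or lowers the score contributed by a particular query. It is usually done by setting a weight on a query clause or by choosing a particular query type. For example, you can give certain keywords a higher weight, or use more complex query types (such as bool queries or fuzzy queries) to adjust the score.

2. Document-level boosting: raises or lowers the score of particular documents. It is usually done by defining an extra factor or attribute on the document. For example, you can give some documents a higher weight so that they rank higher in the results, or a lower weight so that they rank lower.

With boosting you can adjust relevance scores according to your own requirements and priorities, which leads to more precise results and a better user experience.

The following example uses the boost parameter: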

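1. First, create an index and add some documents, using Kibana Dev Tools or any other client. For example, create an index named my_index and add a few documents with title and content fields. If my_index still exists from the previous section, delete it first with DELETE my_index, because creating an index that already exists fails: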

PUT my_index
{
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "content": { "type": "text" }
    }
  }
}

POST my_index/_doc
{
  "title": "Elasticsearch is powerful",
  "content": "Elasticsearch is a highly scalable and distributed search engine"
}

POST my_index/_doc
{
  "title": "Kibana is a visualization tool",
  "content": "Kibana is used to explore and visualize data stored in Elasticsearch"
}

2. Then run a query with the boost parameter to adjust the score. For example, the following query multiplies the score of the match on the title field by 2:

GET my_index/_search
{
  "query": {
    "match": {
      "title": {
        "query": "Elasticsearch",
        "boost": 2
      }
    }
  }
}

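This query uses a match query to find documents whose title contains the keyword "Elasticsearch".
Setting boost to 2 multiplies the score contributed by this match clause by 2.
Because the query has only one clause, every hit's score is scaled by the same factor and the order of the results does not change. Boost affects the ranking when a query combines several clauses that carry different boosts.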

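The boost value has the following effect:

  • When boost is greater than 1, the clause's score contribution increases, giving it more weight in the ranking.
  • When 0 < boost < 1, the clause's score contribution decreases, giving it less weight in the ranking.
  • Negative boost values are not supported: Elasticsearch 7.0 and later reject them. To demote matching documents, use a boost between 0 and 1, or use the boosting query with its negative_boost parameter.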
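Since a boost only changes the ranking relative to other clauses, a more typical use is to search several fields at once and weight them differently. The Python sketch below only builds the request body for such a query. The helper name boosted_multi_field_query is made up for this example, and the boost values 2 and 1 are arbitrary. The resulting JSON would be sent as the body of GET my_index/_search.

```python
import json

def boosted_multi_field_query(text, field_boosts):
    """Build a bool/should query with one boosted match clause per field."""
    should = [
        {"match": {field: {"query": text, "boost": boost}}}
        for field, boost in field_boosts.items()
    ]
    return {"query": {"bool": {"should": should}}}

# A match in the title counts twice as much as a match in the content.
body = boosted_multi_field_query("Elasticsearch", {"title": 2, "content": 1})
print(json.dumps(body, indent=2))
```

With the two sample documents above, both match on content, but only the first one also matches on title, and its title match is boosted on top of that.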

