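When you run an `aggs` terms aggregation on a `text` field in Elasticsearch, the aggregation works on the analyzed tokens of the field, not on the original string. Each bucket reports how many documents contain a given token. Note that this is a document count, not an occurrence count: even if a token appears several times in one document, that document adds only 1 to the token's `doc_count`.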
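The difference between a document count and an occurrence count can be sketched in a few lines of Python. This is only an illustration of the counting rule, not how Elasticsearch implements it; the sample documents are made up, and a whitespace split stands in for the real analyzer.

```python
from collections import Counter


def doc_counts(docs):
    """Return, for each token, the number of documents that contain it."""
    counts = Counter()
    for tokens in docs:
        # set() collapses repeats inside one document, so each
        # document contributes at most 1 per token.
        counts.update(set(tokens))
    return counts


# Hypothetical documents, already split into tokens.
docs = [
    "ok ok ok thanks".split(),
    "ok bye".split(),
    "thanks".split(),
]
counts = doc_counts(docs)
print(counts["ok"])  # "ok" occurs 4 times in total but in only 2 documents
```

Here `ok` appears three times in the first document, yet it is counted once for that document, which mirrors how `doc_count` behaves in the aggregation result.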
Add a mapping for an analyzed `text` field.
The field needs two properties, `analyzer` and `fielddata`:
    "allContent": {
      "type": "text",
      "analyzer": "ik_smart",
      "fielddata": true
    }
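For context, here is a rough sketch of where that fragment might sit inside a full mapping body. The nesting under `transData` is an assumption inferred from the field path `transData.allContent` used in the query; the helper name `text_field_mapping` is hypothetical. How the body is wrapped (for example under a mapping type name in older versions) depends on your Elasticsearch version, and `ik_smart` is only available when the IK analysis plugin is installed.

```python
import json


def text_field_mapping(analyzer="ik_smart"):
    """Build the mapping for a text field that can be used in a terms aggregation."""
    return {
        "type": "text",
        "analyzer": analyzer,  # tokenizer applied at index time
        "fielddata": True,     # allows aggregating on the analyzed tokens
    }


# Assumed layout: allContent is a sub-field of an object field named transData.
mapping_body = {
    "properties": {
        "transData": {
            "properties": {
                "allContent": text_field_mapping(),
            }
        }
    }
}

print(json.dumps(mapping_body, indent=2))
```

Keep in mind that `fielddata` is held in heap memory, so enabling it on a large text field can be expensive.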
Example query:
    GET voice*/_search
    {
      "_source": "transData.allContent",
      "query": {
        "match_all": {}
      },
      "aggs": {
        "hotword": {
          "terms": {
            "field": "transData.allContent",
            "size": 10,
            "order": {
              "_count": "desc"
            }
          }
        }
      },
      "size": 0
    }
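The top-level `"size": 0` controls how many hits are returned, so the response carries only the aggregation result. The `"size": 10` inside `terms` is a separate setting that limits the number of buckets.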
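Since the two `size` settings are easy to confuse, a small builder function can keep them apart. This is a minimal sketch that only constructs the request body as a Python dict; the function name `hot_word_query` and its parameters are hypothetical, and sending the body to the cluster is left to whatever client you use.

```python
import json


def hot_word_query(field, top_n=10):
    """Build a search body that returns only the top_n most common tokens of a field."""
    return {
        "size": 0,  # number of hits to return: none, we only want the aggregation
        "query": {"match_all": {}},
        "aggs": {
            "hotword": {
                "terms": {
                    "field": field,
                    "size": top_n,  # number of buckets to return
                    "order": {"_count": "desc"},
                }
            }
        },
    }


body = hot_word_query("transData.allContent", top_n=10)
print(json.dumps(body, indent=2))
```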
Example response:
    {
      "took": 0,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
      },
      "hits": {
        "total": 1,
        "max_score": 0,
        "hits": []
      },
      "aggregations": {
        "hotword": {
          "doc_count_error_upper_bound": 1,
          "sum_other_doc_count": 314,
          "buckets": [
            {
              "key": "ok",
              "doc_count": 119
            },
            {
              "key": "一",
              "doc_count": 123
            },
            {
              "key": "一下",
              "doc_count": 114
            },
            {
              "key": "一个",
              "doc_count": 91
            },
            {
              "key": "一个月",
              "doc_count": 52
            },
            {
              "key": "一些",
              "doc_count": 23
            },
            {
              "key": "一包",
              "doc_count": 13
            },
            {
              "key": "一块",
              "doc_count": 11
            },
            {
              "key": "一天",
              "doc_count": 4
            },
            {
              "key": "一定",
              "doc_count": 2
            }
          ]
        }
      }
    }
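To turn a response like this into a ranked hot-word list on the client side, something along these lines could work. It is a sketch only: the helper `top_terms` is hypothetical, and the `response` dict is a trimmed copy of the example above with just its first three buckets.

```python
def top_terms(response, agg_name="hotword"):
    """Extract (token, doc_count) pairs from a terms aggregation, highest count first."""
    buckets = response["aggregations"][agg_name]["buckets"]
    pairs = [(bucket["key"], bucket["doc_count"]) for bucket in buckets]
    # Sort on the client as well, so the result is ranked regardless of bucket order.
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Trimmed copy of the example response (first three buckets only).
response = {
    "aggregations": {
        "hotword": {
            "doc_count_error_upper_bound": 1,
            "sum_other_doc_count": 314,
            "buckets": [
                {"key": "ok", "doc_count": 119},
                {"key": "一", "doc_count": 123},
                {"key": "一下", "doc_count": 114},
            ],
        }
    }
}

ranked = top_terms(response)
for token, count in ranked:
    print(token, count)
```

The two other fields in the aggregation are worth a glance: `sum_other_doc_count` is the combined document count of the buckets that were not returned, and `doc_count_error_upper_bound` is an upper bound on how far a reported count may be off, because each shard only contributes its own top terms.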