elasticSearch
概述
Elasticsearch是一个近实时的搜索平台。这意味着,从索引一个文档直到这个文档能够被搜索到有一个很小的延迟(通常是一秒)
集群
一个集群就是由一个或多个节点组织在一起, 它们共同持有你全部的数据, 并一起提供索引和搜索功能。 一个集群由一个唯一的名字标识, 这个名字默认就是“elasticsearch”。 这个名字很重要, 因为一个节点只能通过指定某个集群的名字来加入这个集群。
节点(node)
一个节点是你集群中的一个服务器,作为集群的一部分,它存储你的数据,参与集群的索引和搜索功能。 和集群类似, 一个节点也是由一个名字来标识的, 默认情况下, 在节点启动时会随机分配一个全局唯一标示来作为它的名字,如果你不希望是默认的你也可以自己指定。这个名字在集群管理时很重要,因为在管理过程中,你希望根据这个名称去确定网络中的服务器对应的是Elasticsearch集群中的哪个节点。
一个节点可以通过配置集群名称来加入一个指定的集群。 默认情况下,每个节点都会被安排加入到一个叫做 elasticsearch
的集群中,这意味着,如果你在你的网络中启动了若干个节点, 并假定它们能够相互发现彼此,它们将会自动地形成并加入到一个叫做 elasticsearch
的集群中。
在一个集群里可以拥有任意多个节点。而且,如果当前你的网络中没有运行任何Elasticsearch节点,这时启动一个节点,会默认创建并加入一个叫做 elasticsearch
的单节点集群。
注意:
在不同环境中不要重复使用相同的集群名字
索引库(index)
一个索引库就是一些拥有相似特征文档的集合。例如,你可以有一个会员数据的索引库,一个商品目录的索引库,还有一个订单数据的索引库。一个索引库由一个名字来标识(必须全部是小写字母的),在对 document(文档)执行 indexing(索引),search(搜索),update(更新)和 delete(删除)动作时都需要通过此名字来操作。
文档(document)
文档是索引信息的基本单位。。理由,你可以拥有某一个会员文档、一个商品文档、一个订单文档。文档以JSON格式来表示,JSON是一个到处存在的互联网数据交互格式。
在一个索引库或类型里面,你可以存储任意多的文档。注意,一个文档物理上存在于一个索引库之中,但文档必须被编入或分配到一个索引库的类型。
分片和副本(shards and replicas)
一个索引可以存储超出单个结点硬件限制的大量数据。比如,一个具有10亿文档的索引库需要占据1TB的磁盘空间,他将不可能存储在单一节点、或者处理搜索请求响应会非常慢。
为了解决这个问题,Elasticsearch提供了将索引库划分成多个分片的能力。当你创建一个索引库的时候,你可以指定你想要的分片的数量。每个分片本身也是一个全功能且独立的“索引”,这个“索引” 可以被放置到集群中的任何节点上。
分片之所以重要,主要有两方面的原因:
允许你水平的拆分与扩展容量
允许你在分片(位于多个节点上)之间进行分布式的、并行的操作,进而提高性能与吞吐量
至于分片怎样分布、搜索请求时它的文档怎样聚合返回,完全由Elasticsearch管理,对于用户来说这些都是透明的。
在一个网络或云的环境里异常时可预见的。在分片或节点因为某些原因处于离线状态或者消失的情况下,故障转移机制是非常有用且强烈推荐的。为此,Elasticsearch允许你为分片创建一份或多份拷贝,这些拷贝叫做副本分片,或者直接叫副本。
副本之所以重要,有两个主要原因:
在分片/节点失败的情况下,副本提供了高可用性。基于整个原因,副本分片不要与原分片或主要分片存放在同一节点上是非常重要的。
因为搜索可以在所有的副本上并行运行,副本可以扩展你的搜索量或吞吐量。
总之,每个索引库可以被分成多个分片。一个索引也可以被复制0次(即没有副本) 或多次。一旦复制了,每个索引就有了主分片(作为复制源的分片)和副本分片(主分片的拷贝)。 分片和副本的数量可以在索引创建的时候指定。在索引创建之后,你可以在任何时候动态地改变副本的数量,但是你不能再改变分片的数量。
默认情况下,Elasticsearch中的每个索引分配5个主分片和1个副本。这意味着,如果你的集群中至少有两个节点,你的索引将会有5个主分片和另外5个复制分片(1个全量拷贝),这样每个索引总共就有10个分片。
注意
每个Elasticsearch的分片都是一个独立的Lucene索引。在单个 Lucene 索引中有一个最大的文档数量限制。从LUCENE-5843的时候开始,该限制为 2,147,483,519(=Interger.MAX_VALUE - 128)个文档。您可以使用 _cat/shardsapi来监控分片大小。
启动elasticSearch
./elasticsearch -Ecluster.name=my_cluster_name -Enode.name=my_node_name
rest 接口
现在我们已经有一个正常运行的节点(和集群),下一步就是要去理解怎样与其通信。幸运的是,Elasticsearch提供了非常全面和强大的REST API,利用这个REST API你可以同你的集群交互。下面是利用这个API,可以做的几件事情:
检查你的集群、节点和索引库的健康状态和各种统计信息
管理你的集群、节点、索引数据和元数据
对你的索引库进行CRUD(创建、读取、更新和删除)和搜索操作
执行高级的搜索操作, 像是分页、排序、过滤、脚本编写(scripting)、聚合(aggregations)等其它操作
直接访问时可能无法访问,此时需要修改一下配置:
将xpack.security.enabled: 和
xpack.security.http.ssl:enabled: 设置为false即可。设置完了之后重启elasticsearch,然后在进行访问。
访问下面的地址
集群信息:
http://localhost:9200/
get
{"name": "node-1","cluster_name": "gavin","cluster_uuid": "tBk-G0sNTiG9NkPlRoBfgg","version": {"number": "8.13.4","build_flavor": "default","build_type": "tar","build_hash": "da95df118650b55a500dcc181889ac35c6d8da7c","build_date": "2024-05-06T22:04:45.107454559Z","build_snapshot": false,"lucene_version": "9.10.0","minimum_wire_compatibility_version": "7.17.0","minimum_index_compatibility_version": "7.0.0"},"tagline": "You Know, for Search"
}
集群健康
http://localhost:9200/_cat/health?v
get
epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1716804343 10:05:43 gavin green 1 1 1 1 0 0 0 0 - 100.0%
当我们查看集群状态的时候,我们可能得到绿色、黄色或红色三种状态。绿色代表一切正常(集群功能齐全);黄色意味着所有的数据都是可用的,但是某些副本没有被分配(集群功能齐全);红色则代表因为某些原因,某些数据不可用。注意,即使是集群状态是红色的,集群仍然是部分可用的(它仍然会利用可用的分片来响应搜索请求),但是可能你需要尽快修复它,因为你有丢失的数据。
获取所有索引库
_cat Api
http://localhost:9200/_cat/indices?v
--get
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size dataset.size
这个结果意味着,在我们的集群中没有任何索引。
创建一个索引库
http://localhost:9200/customer?pretty
put
{"acknowledged": true,"shards_acknowledged": true,"index": "customer"
}
此时再查看索引库:
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size dataset.size
yellow open customer JWwuOiv_RNOpVUpMpzHxJg 1 1 0 0 227b 227b 227b
黄色意味着所有的数据都是可用的,但是某些副本没有被分配(集群功能齐全
黄色意味着某些副本没有(或者还未)被分配。这个索引之所以这样,是因为Elasticsearch会默认为这个索引库创建一份副本。 然而由于我们现在只有一个节点在运行,那这份副本就分配不了了(为了高可用),直到另外一个节点加入到这个集群后,才能分配。一旦那份副本被分配到第二个节点,这个索引库的健康状态就会变成绿色。
索引文档创建与查询
put或post请求都可以
索引名/索引类型/id
索引类型 _doc _create
http://localhost:9200/customer/_doc/1?pretty --put请求body
{"name":"gavin"
}
返回:
{"_index": "customer","_id": "1","_version": 1,"result": "created","_shards": {"total": 2,"successful": 1,"failed": 0},"_seq_no": 0,"_primary_term": 1
}
一个新的会员文档在customer索引库的_doc类型中被成功创建。文档也有一个内部id 1, 这个id是我们在创建索引文档的时候指定的。
查看索引:
http://localhost:9200/customer/_doc/1?pretty
返回:
{"_index": "customer","_id": "1","_version": 1,"_seq_no": 0,"_primary_term": 1,"found": true,"_source": {"name": "gavin"}
}
_index elastic索引库类型
_id id
found 找到了
source 返回我们存入的数据
如果没找到:
{"error": {"root_cause": [{"type": "index_not_found_exception","reason": "no such index [customer2]","resource.type": "index_or_alias","resource.id": "customer2","index_uuid": "_na_","index": "customer2"}],"type": "index_not_found_exception","reason": "no such index [customer2]","resource.type": "index_or_alias","resource.id": "customer2","index_uuid": "_na_","index": "customer2"},"status": 404
}
如果创建时指定了id,如果id不存在,则创建,否则就是删除原文档,然后插入这个新的;
post时如果不传id则会自动生成一个id
并发创建时
并发创建时可以指定文档类型为_create
创建时指定了文档类型 _create ,
localhost:9200/gavinlim/_create/1
如果有id了,则会报错:
{"error": {"root_cause": [{"type": "version_conflict_engine_exception","reason": "[1]: version conflict, document already exists (current version [2])","index_uuid": "cjK_AGUBRJaViGVAaB5a4w","shard": "0","index": "gavinlim"}],"type": "version_conflict_engine_exception","reason": "[1]: version conflict, document already exists (current version [2])","index_uuid": "cjK_AGUBRJaViGVAaB5a4w","shard": "0","index": "gavinlim"},"status": 409
}
删除索引
http://localhost:9200/customer?pretty
delete请求
{"acknowledged": true
}
再查看索引 http://localhost:9200/customer/_doc/1?pretty
返回:
{"error": {"root_cause": [{"type": "index_not_found_exception","reason": "no such index [customer]","resource.type": "index_or_alias","resource.id": "customer","index_uuid": "_na_","index": "customer"}],"type": "index_not_found_exception","reason": "no such index [customer]","resource.type": "index_or_alias","resource.id": "customer","index_uuid": "_na_","index": "customer"},"status": 404
}
修改数据
http://localhost:9200/gavin/_doc/1?pretty
put
body:
{"name":"gavinLim"
}
以上的命令将会把这个文档索引到customer索引的_doc类型中,其ID是1。如果我们对一个不同(或相同)的文档应用以上的命令,Elasticsearch将会用一个新的文档来替换(重新索引)当前ID为1的那个文档。
返回值:
{"_index": "gavin","_id": "1","_version": 2,"result": "updated","_shards": {"total": 2,"successful": 1,"failed": 0},"_seq_no": 1,"_primary_term": 1
}
创建 与修改 为一个接口,根据id不同来判断时是否是修改
如果创建的时候不传id,则需要使用post请求
这样会生成一个随机id
{"_index": "gavin","_id": "7g8GvY8BLmyEZRBgSBJo","_version": 1,"result": "created","_shards": {"total": 2,"successful": 1,"failed": 0},"_seq_no": 3,"_primary_term": 1
}
查询 http://localhost:9200/gavin/_doc/7g8GvY8BLmyEZRBgSBJo?pretty
get
返回值:
{"_index": "gavin","_id": "7g8GvY8BLmyEZRBgSBJo","_version": 1,"_seq_no": 3,"_primary_term": 1,"found": true,"_source": {"name": "gavinLimTest"}
}
在我们想要做一次更新的时候,Elasticsearch先删除旧文档,然后再索引新的文档。
更新文档
除了可以索引、替换文档之外,我们也可以更新一个文档。但要注意,Elasticsearch底层并不支持原地更新。在我们想要做一次更新的时候,Elasticsearch先删除旧文档,然后再索引新的文档。
下面的例子展示了怎样将ID为1的文档的name字段改成“Jane Doe”:
POST /customer/external/1/_update?pretty
{"doc": { "name": "Jane Doe" }
}
下面的例子展示了怎样将ID为1的文档的name字段改成“Jane Doe”的同时,给它加上age字段:
POST /customer/external/1/_update?pretty
{"doc": { "name": "Jane Doe", "age": 20 }
}
更新也可以通过使用简单的脚本来进行。这个例子使用一个脚本将age加5:
该脚本在es5中测试通过,在最新的版本8中测试失败;
POST /customer/external/1/_update?pretty
{"script" : "ctx._source.age += 5"
}
在上面的例子中,ctx._source
指向当前被更新的文档。
注意,目前的更新操作只能一次修改在一个文档上。将来Elasticsearch将提供同时更新符合指定查询条件的多个文档的功能(类似于SQL的 UPDATE-WHERE
语句)。
只更新部分字段 _update
localhost:9200/gavinlim/_update/1 postbody{"doc": {"age": 32}
}
多次请求后版本号没有发生变化
{"_index": "gavinlim","_type": "_doc","_id": "1","_version": 11,"result": "noop","_shards": {"total": 0,"successful": 0,"failed": 0},"_seq_no": 10,"_primary_term": 1
}
如果同时修改了name字段
body
{"doc": {"name":"lisisis","age": 32}
}
返回结果:
{"_index": "gavinlim","_type": "_doc","_id": "1","_version": 12,"result": "noop","_shards": {"total": 0,"successful": 0,"failed": 0},"_seq_no": 11,"_primary_term": 1
}
先查询后更新:
更新前:
{"_index": "gavinlim","_type": "_doc","_id": "1","_version": 14,"_seq_no": 13,"_primary_term": 1,"found": true,"_source": {"name": "张三","doc": {"name": "李四"},"age": 32}
}
更新请求:
localhost:9200/gavinlim/_update_by_query/ post
{"query":{"match":{"_id":1}},"script":{"source":"ctx._source.name='张三xxxx'"}
}
更新后:
{"_index": "gavinlim","_type": "_doc","_id": "1","_version": 15,"_seq_no": 14,"_primary_term": 1,"found": true,"_source": {"name": "张三xxxx","doc": {"name": "李四"},"age": 32}
}
ctx 是上下文,然后可以通过脚本修改上下文中的数据
问题1:如何获得某一个版本号的数据呢?
删除索引
http://localhost:9200/gavin/external/1?pretty
delete
{"found": true,"_index": "gavin","_type": "external","_id": "1","_version": 4,"result": "deleted","_shards": {"total": 2,"successful": 1,"failed": 0}
}
可以通过根据查询条件删除API来一次删除符合指定条件的文档。如果要通过 根据查询条件删除API
来删除所有文档,使用删除整个索引库替代会效率更高。
批量处理
_bulk API
除了能够对单个的文档进行索引、更新和删除之外,Elasticsearch也提供了操作的批量处理功能,它通过使用_bulk API实现。这个功能非常重要,因为它提供了非常高效的机制来尽可能快的完成多个操作,与此同时尽可能地减少网络交互。
作为一个快速的例子,以下调用在一次bulk操作中索引了两个文档(ID 1 - John Doe 与 ID 2 - Jane Doe) :
POST /customer/external/_bulk?pretty
{"index":{"_id":"1"}}
{"name": "John Doe" }
{"index":{"_id":"2"}}
{"name": "Jane Doe" }
以下例子在一个bulk操作中,首先更新第一个文档(ID为1),然后删除第二个文档(ID为2)
POST /customer/external/_bulk?pretty
{"update":{"_id":"1"}}
{"doc": { "name": "John Doe becomes Jane Doe" } }
{"delete":{"_id":"2"}}
注意上面的delete动作,由于删除动作只需要被删除文档的ID,所以并没有对应的源文档。
Bulk API不会因为其中的一个动作失败而整体失败。如果其中一个动作因为某些原因失败了,它将会继续处理后面的动作。在Bulk API返回时,它将提供每个动作的状态(按照同样的顺序),所以你能够看到某个动作成功与否。
批量导入文件中数据:
curl -H "Content-Type: application/json" -XPOST 'localhost:9200/bank/account/_bulk?pretty&refresh' --data-binary "@account.json"
导入之后查看:
[zzy@Gavin elasticsearch-7.17.21]$ curl 'localhost:9200/_cat/indices?v'
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open hive nV38T6MrQR6rNvNC0PoZRQ 1 1 2 0 6.4kb 6.4kb
green open .geoip_databases IdS_bUOGTrmowaomEqIUww 1 0 34 0 31.3mb 31.3mb
yellow open bank snqpK3BNSeuc7eYh5BC4JQ 1 1 1000 0 374.5kb 374.5kb
yellow open customertest wdvFyvoyQ32i34ER8qZf3A 1 1 1 0 4.1kb 4.1kb
yellow open customer 47-YlaAlSD-4VOGPvrCXEw 1 1 1 0 3.8kb 3.8kb
yellow open bankccb y70wfcwSTmmQ70UodmGZxQ 1 1 1000 0 374.5kb
错误解决:
The bulk request must be terminated by a n ewline [\n]"
{"error": {"root_cause": [{"type": "illegal_argument_exception","reason": "The bulk request must be terminated by a newline [\\n]"}],"type": "illegal_argument_exception","reason": "The bulk request must be terminated by a newline [\\n]"},"status": 400
}
在json文件开头和结尾添加换行即可
搜索文档
http://localhost:9200/bank/_search?q=*&sort=account_number:asc?pretty
索引类型/_search?q=查询条件&sort=排序字段:排序方式?pretty
我们在 bank
索引库中搜索( _search
端点),并且 q=*
参数指示Elasticsearch去匹配这个索引中所有的文档。sort=account_number:asc
指示结果按 account_number
字段升序排列。pretty参数仅仅是告诉Elasticsearch返回美观的JSON结果。
示例:
http://localhost:9200/bank/_search?q=firstname:Amber&sort=account_number:asc?pretty
get
返回值:
{"took": 7,"timed_out": false,"_shards": {"total": 1,"successful": 1,"skipped": 0,"failed": 0},"hits": {"total": {"value": 1,"relation": "eq"},"max_score": 6.5032897,"hits": [{"_index": "bank","_type": "account","_id": "1","_score": 6.5032897,"_source": {"account_number": 1,"balance": 39225,"firstname": "Amber","lastname": "Duke","age": 32,"gender": "M","address": "880 Holmes Lane","employer": "Pyrami","email": "amberduke@pyrami.com","city": "Brogan","state": "IL"}}]}
}
对于这个响应,我们可以看到如下的部分:
took
:Elasticsearch 执行这个搜索的耗时,以毫秒为单位timed_out
:指明这个搜索是否超时_shards
:指出多少个分片被搜索了,同时也标记了搜索成功与失败分片的数量hits
:搜索结果hits.total
:匹配查询条件的文档的总数目hits.hits
:真正的搜索结果数组(默认是前10个文档)hits.sort
:结果排序字段(如果缺失则按照得分排序)hits._score
和max_score
:现在先忽略这些字段
使用json搜索文档
使用json搜索文档 ~ get请求,条件放在body中(这样也能接受body中的参数)
http://localhost:9200/bank/_searchbody:{"query": { "match": {"gender":"F"} },"sort": [{ "account_number": "asc" }]
}
一旦你取回了搜索结果,Elasticsearch就完成了使命,它不会保持任何服务器端的资源或者在你的结果中打开游标。
分页
浅分页与深分页
浅分页:
一旦你取回了搜索结果,Elasticsearch就完成了使命,它不会保持任何服务器端的资源或者在你的结果中打开游标。
即一旦查询完成,不会保持与服务器的连接
浅分页~from + size 浅分页:
默认elasticSearch对查询结果进行分页,默认一页10条,
可以指定要查询的数量大小~body
{"query": { "match_all": {} },"size":100,"sort": [{ "account_number": "asc" }]}
或者:
http://localhost:9200/bank/_search?q=*&sort=account_number:asc&size=20?pretty
"浅"分页可以理解为简单意义上的分页。它的原理很简单,就是查询前20条数据,然后截断前10条,只返回10-20的数据。这样其实白白浪费了前10条的查询。
其中,from定义了目标数据的偏移值,size定义当前返回的数目。默认from为0,size为10,即所有的查询默认仅仅返回前10条数据。
在这里有必要了解一下from/size的原理:
因为es是基于分片的,假设有5个分片,from=100,size=10。则会根据排序规则从5个分片中各取回100条数据数据,然后汇总成500条数据后选择最后面的10条数据。
做过测试,越往后的分页,执行的效率越低。总体上会随着from的增加,消耗时间也会增加。而且数据量越大,就越明显!
深分页~scroll 深分页:
from+size查询在10000-50000条数据(1000到5000页)以内的时候还是可以的,但是如果数据过多的话,就会出现深分页问题。
为了解决上面的问题,elasticsearch提出了一个scroll滚动的方式。
scroll 类似于sql中的cursor,使用scroll,每次只能获取一页的内容,然后会返回一个scroll_id。根据返回的这个scroll_id可以不断地获取下一页的内容,所以scroll并不适用于有跳页的情景。
scroll=5m 表示连接客户端5分钟
先获取到查询结果的子集之后可以不断的从服务器提取结果中的剩余部分,这个数据集使用了一种有状态的服务器端游标技术。
http://localhost:9200/bank/_search?scroll=5m get请求body{"query": { "match_all": {} },"size":1,"from": 0,"sort": [{ "account_number": "asc" }]
返回:
{"_scroll_id": "FGluY2x1ZGVfY29udGV4dF91dWlkDXF1ZXJ5QW5kRmV0Y2gBFlFURlJSZVV2UlNTMThHblBPeF9qc3cAAAAAAAAAMRZJcTllT29FbVIxT0dmQ2wyQmc3OU5B","took": 9,"timed_out": false,"_shards": {"total": 1,"successful": 1,"skipped": 0,"failed": 0},"hits": {"total": {"value": 1000,"relation": "eq"},"max_score": null,"hits": [{"_index": "bank","_type": "account","_id": "0","_score": null,"_source": {"account_number": 0,"balance": 16623,"firstname": "Bradshaw","lastname": "Mckenzie","age": 29,"gender": "F","address": "244 Columbus Place","employer": "Euron","email": "bradshawmckenzie@euron.com","city": "Hobucken","state": "CO"},"sort": [0]}]}
}
根据scroll_id查询下一页
http://localhost:9200/bank/_search?scroll=5m get请求body{"scroll_id":"FGluY2x1ZGVfY29udGV4dF91dWlkDXF1ZXJ5QW5kRmV0Y2gBFlFURlJSZVV2UlNTMThHblBPeF9qc3cAAAAAAAAAMRZJcTllT29FbVIxT0dmQ2wyQmc3OU5B","scroll": "5m"
}
返回:
{"_scroll_id": "FGluY2x1ZGVfY29udGV4dF91dWlkDXF1ZXJ5QW5kRmV0Y2gBFlFURlJSZVV2UlNTMThHblBPeF9qc3cAAAAAAAAAMRZJcTllT29FbVIxT0dmQ2wyQmc3OU5B","took": 13,"timed_out": false,"_shards": {"total": 1,"successful": 1,"skipped": 0,"failed": 0},"hits": {"total": {"value": 1000,"relation": "eq"},"max_score": null,"hits": [{"_index": "bank","_type": "account","_id": "1","_score": null,"_source": {"account_number": 1,"balance": 39225,"firstname": "Amber","lastname": "Duke","age": 32,"gender": "M","address": "880 Holmes Lane","employer": "Pyrami","email": "amberduke@pyrami.com","city": "Brogan","state": "IL"},"sort": [1]}]}
}
也可以使用post请求:
只需要将上面的get请求方式改为post即可
scroll删除 根据官方文档的说法,scroll的搜索上下文会在scroll的保留时间截止后自动清除,但是我们知道scroll是非常消耗资源的,所以一个建议就是当不需要了scroll数据的时候,尽可能快的把scroll_id显式删除掉。
删除scroll_id
http://localhost:9200/_search/scroll/FGluY2x1ZGVfY29udGV4dF91dWlkDXF1ZXJ5QW5kRmV0Y2gBFlFURlJSZVV2UlNTMThHblBPeF9qc3cAAAAAAAAAMxZJcTllT29FbVIxT0dmQ2wyQmc3OU5B
delete请求
返回:
{"succeeded": true,"num_freed": 1
}
删除所有的
http://localhost:9200/_search/scroll/_all
delete请求
scroll 的方式,官方的建议不用于实时的请求(一般用于数据导出),因为每一个 scroll_id 不仅会占用大量的资源,而且会生成历史快照,对于数据的变更不会反映到快照上。
深分页~search_after
search_after 分页的方式是根据上一页的最后一条数据来确定下一页的位置,同时在分页请求的过程中,如果有索引数据的增删改查,这些变更也会实时的反映到游标上。但是需要注意,因为每一页的数据依赖于上一页最后一条数据,所以无法跳页请求。
为了找到每一页最后一条数据,每个文档必须有一个全局唯一值,官方推荐使用 _uid 作为全局唯一值
,其实使用业务层的 id 也可以。
查询
http://localhost:9200/bank/_search?q=*&from=0&size=10&sort=account_number:asc
返回值:
{"took": 5,"timed_out": false,"_shards": {"total": 1,"successful": 1,"skipped": 0,"failed": 0},"hits": {"total": {"value": 1000,"relation": "eq"},"max_score": null,"hits": [{"_index": "bank","_type": "account","_id": "0","_score": null,"_source": {"account_number": 0,"balance": 16623,"firstname": "Bradshaw","lastname": "Mckenzie","age": 29,"gender": "F","address": "244 Columbus Place","employer": "Euron","email": "bradshawmckenzie@euron.com","city": "Hobucken","state": "CO"},"sort": [0]},...略...{"_index": "bank","_type": "account","_id": "9","_score": null,"_source": {"account_number": 9,"balance": 24776,"firstname": "Opal","lastname": "Meadows","age": 39,"gender": "M","address": "963 Neptune Avenue","employer": "Cedward","email": "opalmeadows@cedward.com","city": "Olney","state": "OH"},"sort": [9]}]}
}
查询下一页~要把上一次查询的sort值传入
可以看到返回值为9,所以下一页的查询条件应该是account_number:gt:9
所以下一页的请求:
http://localhost:9200/bank/_searchbody
{"query": {"match_all": {}},"from": 0,"size": 10,"sort": {"account_number": "asc"},"search_after": [9]
}
结果:
{"took": 4,"timed_out": false,"_shards": {"total": 1,"successful": 1,"skipped": 0,"failed": 0},"hits": {"total": {"value": 1000,"relation": "eq"},"max_score": null,"hits": [{"_index": "bank","_type": "account","_id": "10","_score": null,"_source": {"account_number": 10,"balance": 46170,"firstname": "Dominique","lastname": "Park","age": 37,"gender": "F","address": "100 Gatling Place","employer": "Conjurica","email": "dominiquepark@conjurica.com","city": "Omar","state": "NJ"},"sort": [10]},...略...{"_index": "bank","_type": "account","_id": "19","_score": null,"_source": {"account_number": 19,"balance": 27894,"firstname": "Schwartz","lastname": "Buchanan","age": 28,"gender": "F","address": "449 Mersereau Court","employer": "Sybixtex","email": "schwartzbuchanan@sybixtex.com","city": "Greenwich","state": "KS"},"sort": [19]}]}
}
查询
不加查询条件的情况下返回文档的全部字段。是返回完整的JSON文档的。这可以通过 source
来引用整个的文档内容,
如果我们不想返回完整的源文档,我们可以通过添加搜索条件以指定返回的几个字段。
http://localhost:9200/bank/_search?pretty getbody
{"query": {"match_all": {}},"from": 0,"size": 1,"_source": ["account_number","balance"]
}
返回结果中只_source包含 account_number balance
{"took": 2,"timed_out": false,"_shards": {"total": 1,"successful": 1,"skipped": 0,"failed": 0},"hits": {"total": {"value": 1000,"relation": "eq"},"max_score": 1.0,"hits": [{"_index": "bank","_type": "account","_id": "1","_score": 1.0,"_source": {"account_number": 1,"balance": 39225}}]}
}
单字段值查询
单值查询~精确查询
http://localhost:9200/bank/_search get body
{"query":{"match":{"account_number":20}},"from":0,"size":4
}
返回账户编号为20的文档:
{"took": 2,"timed_out": false,"_shards": {"total": 1,"successful": 1,"skipped": 0,"failed": 0},"hits": {"total": {"value": 1,"relation": "eq"},"max_score": 1.0,"hits": [{"_index": "bank","_type": "account","_id": "20","_score": 1.0,"_source": {"account_number": 20,"balance": 16418,"firstname": "Elinor","lastname": "Ratliff","age": 36,"gender": "M","address": "282 Kings Place","employer": "Scentric","email": "elinorratliff@scentric.com","city": "Ribera","state": "WA"}}]}
}
单值查询~模糊查询
http://localhost:9200/bank/_search getbody{"query":{"match":{"address":"mill"}},"from":2,"size":2
}
返回结果:返回地址中包含了“mill”词条(term)的所有账户:
{"took": 2,"timed_out": false,"_shards": {"total": 1,"successful": 1,"skipped": 0,"failed": 0},"hits": {"total": {"value": 4,"relation": "eq"},"max_score": 5.4032025,"hits": [{"_index": "bank","_type": "account","_id": "345","_score": 5.4032025,"_source": {"account_number": 345,"balance": 9812,"firstname": "Parker","lastname": "Hines","age": 38,"gender": "M","address": "715 Mill Avenue","employer": "Baluba","email": "parkerhines@baluba.com","city": "Blackgum","state": "KY"}},{"_index": "bank","_type": "account","_id": "472","_score": 5.4032025,"_source": {"account_number": 472,"balance": 25571,"firstname": "Lee","lastname": "Long","age": 32,"gender": "F","address": "288 Mill Street","employer": "Comverges","email": "leelong@comverges.com","city": "Movico","state": "MT"}}]}
}
多字段值查询
http://localhost:9200/bank/_search getbody{"query":{"match":{"address":"mill lane"}},"from":3,"size":4
}
返回结果:返回地址中包含“mill”或者“lane”词条的账户:
{"took": 1,"timed_out": false,"_shards": {"total": 1,"successful": 1,"skipped": 0,"failed": 0},"hits": {"total": {"value": 19,"relation": "eq"},"max_score": 9.507477,"hits": [{"_index": "bank","_type": "account","_id": "472","_score": 5.4032025,"_source": {"account_number": 472,"balance": 25571,"firstname": "Lee","lastname": "Long","age": 32,"gender": "F","address": "288 Mill Street","employer": "Comverges","email": "leelong@comverges.com","city": "Movico","state": "MT"}},{"_index": "bank","_type": "account","_id": "1","_score": 4.1042743,"_source": {"account_number": 1,"balance": 39225,"firstname": "Amber","lastname": "Duke","age": 32,"gender": "M","address": "880 Holmes Lane","employer": "Pyrami","email": "amberduke@pyrami.com","city": "Brogan","state": "IL"}},{"_index": "bank","_type": "account","_id": "70","_score": 4.1042743,"_source": {"account_number": 70,"balance": 38172,"firstname": "Deidre","lastname": "Thompson","age": 33,"gender": "F","address": "685 School Lane","employer": "Netplode","email": "deidrethompson@netplode.com","city": "Chestnut","state": "GA"}},{"_index": "bank","_type": "account","_id": "556","_score": 4.1042743,"_source": {"account_number": 556,"balance": 36420,"firstname": "Collier","lastname": "Odonnell","age": 35,"gender": "M","address": "591 Nolans Lane","employer": "Sultraxin","email": "collierodonnell@sultraxin.com","city": "Fulford","state": "MD"}}]}
}
使用match 匹配字段时,如果字段间有空格,则是会作为两个字段处理,如果查询条件中有空格,则需要使用match_phrase 匹配字段,否则会匹配不到结果:
http://localhost:9200/bank/_search getbody
{"query":{"match_phrase":{"address":"mill lane"}},"from":0,"size":4}
结果:
{"took": 2,"timed_out": false,"_shards": {"total": 1,"successful": 1,"skipped": 0,"failed": 0},"hits": {"total": {"value": 1,"relation": "eq"},"max_score": 9.507477,"hits": [{"_index": "bank","_type": "account","_id": "136","_score": 9.507477,"_source": {"account_number": 136,"balance": 45801,"firstname": "Winnie","lastname": "Holland","age": 38,"gender": "M","address": "198 Mill Lane","employer": "Neteria","email": "winnieholland@neteria.com","city": "Urie","state": "IL"}}]}
}
布尔查询
布尔查询,可以组合多个查询条件,通过设置must,must_not,should,filter来组合查询条件,其中must,must_not,filter是必须的,而should是可选的,如果should中有多个条件,则至少需要满足一个条件,否则返回的结果中不会包含该条件对应的文档:
http://localhost:9200/bank/_search get
{"query":{"bool":{"must":[{"match":{"address":"mill"}},{"match":{"address":"lane"}}]}}
}
{"took": 4,"timed_out": false,"_shards": {"total": 1,"successful": 1,"skipped": 0,"failed": 0},"hits": {"total": {"value": 1,"relation": "eq"},"max_score": 9.507477,"hits": [{"_index": "bank","_type": "account","_id": "136","_score": 9.507477,"_source": {"account_number": 136,"balance": 45801,"firstname": "Winnie","lastname": "Holland","age": 38,"gender": "M","address": "198 Mill Lane","employer": "Neteria","email": "winnieholland@neteria.com","city": "Urie","state": "IL"}}]}
}
复杂条件查询:
{"query": {"bool": {"must": [{"match": {"address": "mill"}},{"match": {"address": "lane"}}],"must_not": [{"match": {"account_number": 20}}],"should": [{"match": {"address": "yt"}}],"filter":[{"match":{"balance":100}}]}}
}
范围查询
http://localhost:9200/bank/_search getbody
{"query": {"bool": {"must": {"match_all": {}},"filter": {"range": {"balance": {"gte": 20000,"lte": 30000}}}}},"from": 0,"size": 1,"sort": [{"account_number": "asc"}]
}
结果:
{"took": 20,"timed_out": false,"_shards": {"total": 1,"successful": 1,"skipped": 0,"failed": 0},"hits": {"total": {"value": 217,"relation": "eq"},"max_score": null,"hits": [{"_index": "bank","_type": "account","_id": "2","_score": null,"_source": {"account_number": 2,"balance": 28838,"firstname": "Roberta","lastname": "Bender","age": 22,"gender": "F","address": "560 Kingsway Place","employer": "Chillium","email": "robertabender@chillium.com","city": "Bennett","state": "LA"},"sort": [2]}]}
}
聚合统计
http://localhost:9200/bank/_search get
{"size": 0,"aggs": {"group_by_gender": {"terms": {"field": "gender.keyword"}}}
}
如果size大小为0则只展示聚合数据
注意我们将size
设置成 0,这样我们就可以只看到聚合结果了,而不会显示命中的文档的详细结果。
{"took": 29,"timed_out": false,"_shards": {"total": 1,"successful": 1,"skipped": 0,"failed": 0},"hits": {"total": {"value": 1000,"relation": "eq"},"max_score": null,"hits": []},"aggregations": {"group_by_gender": {"doc_count_error_upper_bound": 0,"sum_other_doc_count": 0,"buckets": [{"key": "M","doc_count": 507},{"key": "F","doc_count": 493}]}}
}
结果:
{"took": 3,"timed_out": false,"_shards": {"total": 1,"successful": 1,"skipped": 0,"failed": 0},"hits": {"total": {"value": 1000,"relation": "eq"},"max_score": null,"hits": []},"aggregations": {"group_by_gender": {"doc_count_error_upper_bound": 0,"sum_other_doc_count": 0,"buckets": [{"key": "M","doc_count": 507},{"key": "F","doc_count": 493}]}}
}
使用年龄段(20-29,30-39,40-49)分组,然后再用性别分组,最后为每一个年龄段的每组性别计算平均账户余额。
http://localhost:9200/bank/_search getbody
{"size": 0,"aggs": {"group_by_age": {"range": {"field": "age","ranges": [{"from": 20,"to": 30},{"from": 30,"to": 40},{"from": 40,"to": 50}]},"aggs": {"group_by_gender": {"terms": {"field": "gender.keyword"},"aggs": {"average_balance": {"avg": {"field": "balance"}}}}}}}
}
结果:
{"took": 22,"timed_out": false,"_shards": {"total": 1,"successful": 1,"skipped": 0,"failed": 0},"hits": {"total": {"value": 1000,"relation": "eq"},"max_score": null,"hits": []},"aggregations": {"group_by_age": {"buckets": [{"key": "20.0-30.0","from": 20.0,"to": 30.0,"doc_count": 451,"group_by_gender": {"doc_count_error_upper_bound": 0,"sum_other_doc_count": 0,"buckets": [{"key": "M","doc_count": 232,"average_balance": {"value": 27374.05172413793}},{"key": "F","doc_count": 219,"average_balance": {"value": 25341.260273972603}}]}},{"key": "30.0-40.0","from": 30.0,"to": 40.0,"doc_count": 504,"group_by_gender": {"doc_count_error_upper_bound": 0,"sum_other_doc_count": 0,"buckets": [{"key": "F","doc_count": 253,"average_balance": {"value": 25670.869565217392}},{"key": "M","doc_count": 251,"average_balance": {"value": 24288.239043824702}}]}},{"key": "40.0-50.0","from": 40.0,"to": 50.0,"doc_count": 45,"group_by_gender": {"doc_count_error_upper_bound": 0,"sum_other_doc_count": 0,"buckets": [{"key": "M","doc_count": 24,"average_balance": {"value": 26474.958333333332}},{"key": "F","doc_count": 21,"average_balance": {"value": 27992.571428571428}}]}}]}}
}
elasticSearch 入门
起因:全文检索
索引~倒排索引—反向索引
假如有一个站内搜索功能通过某个关键词来搜索相关文章,那么这个关键词可能出现在标题,也可能出现在文章内容中,我们在创建和修改文章的时候,建立一个关键词与文章的对应关系表,我们可以称之为倒排索引;
正排索引: 主键对应到数据
文章id 标题 内容
1 浅析JAVA设计模式 JAVA设计模式是每一个JAVA程序员都应该掌渥的进阶知识
2 JAVA多线程设计模式 JAVA 多线程与设计模式结合
倒排索引: 数据对应到主键
id 关键词 文章id
1 JAVA 1,2
2 设计模式 1,2
3 多线程 2
es常见启动时的错误:
文件索引
/etc/security/ limits .cont末尾添加如下配置:
soft nofile 65536
hard nofile 65536
soft nproc 4096
hard nproc 4096
最大本地线程数:
[ 2 ]: max number Of threads [ 1024 ] for user [es] is too low, increase to at least [ 4096 ]
无法创建本地线程问题,用户最大可创建线程数太小
vim /etc/security/limits.d/20-nproc .conf改为如下配置:
soft nproc 4096
最大虚拟内存太小:
vi /etc/sysctl.conf
追加以下内容:vm.max_map_count=262144
保存退出之后执行如下命
sysctl -p
elasticsearch 安装插件
查看已安装插件
elasticcsearch-plugin list
安装插件:
elasticsearch-plugin install 插件名
分词器->
elasticsearch-plugin install analysis-icu
安装后需要重启才能生效
POST _analyze
{"analyzer":"icu_analyzer","text":"中华人民共和国山东省烟台市"
}
结果:
移除插件
elasticsearch-plugin remove 插件名
分词器默认分词规则~会单字拆分插件
POST _analyze
{"analyzer":"standard","text":"中华人民共和国山东省烟台市"
}结果:
{"tokens" : [{"token" : "中","start_offset" : 0,"end_offset" : 1,"type" : "<IDEOGRAPHIC>","position" : 0},{"token" : "华","start_offset" : 1,"end_offset" : 2,"type" : "<IDEOGRAPHIC>","position" : 1},{"token" : "人","start_offset" : 2,"end_offset" : 3,"type" : "<IDEOGRAPHIC>","position" : 2},{"token" : "民","start_offset" : 3,"end_offset" : 4,"type" : "<IDEOGRAPHIC>","position" : 3},{"token" : "共","start_offset" : 4,"end_offset" : 5,"type" : "<IDEOGRAPHIC>","position" : 4},{"token" : "和","start_offset" : 5,"end_offset" : 6,"type" : "<IDEOGRAPHIC>","position" : 5},{"token" : "国","start_offset" : 6,"end_offset" : 7,"type" : "<IDEOGRAPHIC>","position" : 6}]
}
插件的另一种安装方式:
直接在es的plugins文件夹下建立相应的插件目录,然后把下载的插件解压(如果是jar包的话,直接放进对应的目录即可);
插件的安装目录: $es_home/plugins/ik/
解压文件~重启es
插件对应版本要一致,不然可能会像下面这样;
[zzy@Gavin e1asticsearch-7.17.21 0[zzy@Gavin e1asticsearch-7.17.21 0j ava · lang · 111ega1ArgumentException ·runnlng· /startelastic 。 shuncaught exception in thread [main]P1ugin [analysis-ik] was built for E1asticsearch version6 · 3 · 9but version7 · 1 7 · 211 Sat org.elasticsearch.plugins.IuginsServich1uginsServiceat org.elasticsearch.plugins.at org.elasticsearch.plugins.hluginsservicat org.elasticsearch.plugins.1u insServic.verifyCompatibi1ity( java : 391 )799 氵.10adBund1e P1uginsService.java.10adBund1es(P1uginsServic · j ava : 533 )·<主 n 主 t >( 0 〗 ug 〔《 ns $ 4 〕 0 财《 44 java:170)
但是,如果没有对应的版本的分词器插件,那咋办?
就比如说 7.17.21版本,没有对用的ik分词器插件,那咋办?
最新es7版本的ik分词插件 是7.17.6,此时我们需要修改一下配置文件~修改 plugin-descriptor.properties 文件中 elasticsearch.version=你的ES版本号
然后重启es即可
测试一下:
POST _analyze
{"analyzer":"ik_max_word","text":"中华人民共和国"
}{"tokens": [{"token": "中华人民共和国","start_offset": 0,"end_offset": 7,"type": "CN_WORD","position": 0},{"token": "中华人民","start_offset": 0,"end_offset": 4,"type": "CN_WORD","position": 1},{"token": "中华","start_offset": 0,"end_offset": 2,"type": "CN_WORD","position": 2},{"token": "华人","start_offset": 1,"end_offset": 3,"type": "CN_WORD","position": 3},{"token": "人民共和国","start_offset": 2,"end_offset": 7,"type": "CN_WORD","position": 4},{"token": "人民","start_offset": 2,"end_offset": 4,"type": "CN_WORD","position": 5},{"token": "共和国","start_offset": 4,"end_offset": 7,"type": "CN_WORD","position": 6},{"token": "共和","start_offset": 4,"end_offset": 6,"type": "CN_WORD","position": 7},{"token": "国","start_offset": 6,"end_offset": 7,"type": "CN_CHAR","position": 8}]
}
elasticsearch 与关系型数据库
关系型数据库 database table row collumn
es index type document field
从es7开始,一个索引只能建立一个type,所以,es7以后,一个索引就相当于一个数据库,一个type就相当于一个表,一个document就相当于一行数据,一个field就相当于一个列
{"_index": "hivery","_type": "_doc","_id": "1","_version": 1,"_seq_no": 0,"_primary_term": 1,"found": true,"_source": {"msg": "操作失败","code": 400,"data": false,"desc": "success"}
}
_index:文档所属索引
_type:文档所属type
_id:文档id
_source:文档内容
_version:文档版本,修改或者删除操作,version会自增
_seq_no:文档的seq_no,seq_no是es集群中文档的版本号
_primary_term: 用于恢复数据时处理多个文档的seqno一样时的冲突,保证primaryshard写入时不会被覆盖,每当primaryshard发生重新分配时,比如重启,primary选举时,_primary_term会自增
_primary_term: 用于恢复数据时处理多个文档的seqno一样时的冲突,保证primaryshard写入时不会被覆盖,每当primaryshard发生重新分配时,比如重启,primary选举时,_primary_term会自增
并发场景下修改文档:
_eq_no 和 _primary_term 都是es集群中文档的版本号,当_eq_no和_primary_term都相同时,说明是同一个文档,可以进行修改,否则,说明是另一个文档,不能进行修改,需要重新获取_eq_no和_primary_term,再进行修改
http://localhost:9200/hivery/_doc/1?if_seq_no=3&if_primary_term=1 post# 修改之前的seqno=3 primary_term=1body{"msg": "操作失败","code": 500,"data": false,"desc":"success"
}结果:
{"_index": "hivery","_type": "_doc","_id": "1","_version": 5,"result": "updated","_shards": {"total": 2,"successful": 1,"failed": 0},"_seq_no": 4,"_primary_term": 1
}
修改完之后,再重复请求一次
{"error": {"root_cause": [{"type": "version_conflict_engine_exception","reason": "[1]: version conflict, required seqNo [3], primary term [1]. current document has seqNo [4] and primary term [1]","index_uuid": "md-mBvqVQQaL_d0m9ffV5A","shard": "0","index": "hivery"}],"type": "version_conflict_engine_exception","reason": "[1]: version conflict, required seqNo [3], primary term [1]. current document has seqNo [4] and primary term [1]","index_uuid": "md-mBvqVQQaL_d0m9ffV5A","shard": "0","index": "hivery"},"status": 409
}
所以如果并发修改的时候,带上版本号 seqno primary_term,就可以解决并发修改的问题,
如果不带版本号,那么会覆盖掉之前的数据
设置默认分词器:
创建索引时设置分词器
localhost:9200/gavinlim put
{"settings": {"index": {"analysis.analyzers.default.type": "ik_max_word"}}
}
获得索引的分词器类型:
localhost:9200/gavinlim/_settings get结果:
{"gavinlim": {"settings": {"index": {"routing": {"allocation": {"include": {"_tier_preference": "data_content"}}},"number_of_shards": "1","provided_name": "gavinlim","creation_date": "1717576990165","analysis": {"analyzers": {"default": {"type": "ik_max_word"}}},"number_of_replicas": "1","uuid": "cjK_AGUBRJaViGVAaB5a4w","version": {"created": "7172199"}}}}
}
添加文档时
a": false,
“desc”:“success”
}
结果:
{
“_index”: “hivery”,
“_type”: “_doc”,
“_id”: “1”,
“_version”: 5,
“result”: “updated”,
“_shards”: {
“total”: 2,
“successful”: 1,
“failed”: 0
},
“_seq_no”: 4,
“_primary_term”: 1
}
修改完之后,再重复请求一次```json
{"error": {"root_cause": [{"type": "version_conflict_engine_exception","reason": "[1]: version conflict, required seqNo [3], primary term [1]. current document has seqNo [4] and primary term [1]","index_uuid": "md-mBvqVQQaL_d0m9ffV5A","shard": "0","index": "hivery"}],"type": "version_conflict_engine_exception","reason": "[1]: version conflict, required seqNo [3], primary term [1]. current document has seqNo [4] and primary term [1]","index_uuid": "md-mBvqVQQaL_d0m9ffV5A","shard": "0","index": "hivery"},"status": 409
}
所以如果并发修改的时候,带上版本号 seqno primary_term,就可以解决并发修改的问题,
如果不带版本号,那么会覆盖掉之前的数据
设置默认分词器:
创建索引时设置分词器
localhost:9200/gavinlim put
{"settings": {"index": {"analysis.analyzers.default.type": "ik_max_word"}}
}
获得索引的分词器类型:
localhost:9200/gavinlim/_settings get结果:
{"gavinlim": {"settings": {"index": {"routing": {"allocation": {"include": {"_tier_preference": "data_content"}}},"number_of_shards": "1","provided_name": "gavinlim","creation_date": "1717576990165","analysis": {"analyzers": {"default": {"type": "ik_max_word"}}},"number_of_replicas": "1","uuid": "cjK_AGUBRJaViGVAaB5a4w","version": {"created": "7172199"}}}}
}
添加文档时