1. Delete my test indexes: old_index and new_index
curl -X DELETE "http://`hostname -i`:9200/old_index"
curl -X DELETE "http://`hostname -i`:9200/new_index"
2. Check the indexes in the cluster
curl -X GET "http://`hostname -i`:9200/_cat/indices?v"
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open .geoip_databases ib6tlhzjTf-MQBu-XGIVWg 1 0 33 0 31.1mb 31.1mb
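3. Create the test index: old_index
# Notes
# 1. I have only one node, so to keep testing simple number_of_replicas is set to 0
# 2. Assume the source index has 1 shard: number_of_shards is set to 1 so it can be compared with the target later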
curl -X PUT "http://`hostname -i`:9200/old_index" -H 'Content-Type: application/json' -d'
{"mappings": {"properties": {"name": { "type": "text" },"description": { "type": "text" },"publish_date": { "type": "date" }}},"settings": {"number_of_shards": 1,"number_of_replicas": 0}
}'
# Response: the index was created successfully
{"acknowledged":true,"shards_acknowledged":true,"index":"old_index"}
4. Insert a few test documents into old_index
curl -X POST "http://`hostname -i`:9200/old_index/_bulk" -H 'Content-Type: application/x-ndjson' --data-binary '
{ "index": { "_index": "old_index", "_id": "1" } }
{ "name": "可乐", "description": "大数据SRE工程师", "publish_date": "1991-05-20" }
{ "index": { "_index": "old_index", "_id": "2" } }
{ "name": "炎长", "description": "DBA工程师", "publish_date": "1992-11-23" }
'
# Response
{"took": 6,"errors": false,"items": [{"index": {"_index": "old_index","_type": "_doc","_id": "1","_version": 1,"result": "created","_shards": {"total": 1,"successful": 1,"failed": 0},"_seq_no": 0,"_primary_term": 1,"status": 201}}, {"index": {"_index": "old_index","_type": "_doc","_id": "2","_version": 1,"result": "created","_shards": {"total": 1,"successful": 1,"failed": 0},"_seq_no": 1,"_primary_term": 1,"status": 201}}]
}
5. Query the data in old_index
curl -X GET "http://`hostname -i`:9200/old_index/_search"
# Query result
{"took": 7,"timed_out": false,"_shards": {"total": 1,"successful": 1,"skipped": 0,"failed": 0},"hits": {"total": {"value": 2,"relation": "eq"},"max_score": 1.0,"hits": [{"_index": "old_index","_type": "_doc","_id": "1","_score": 1.0,"_source": {"name": "可乐","description": "大数据SRE工程师","publish_date": "1991-05-20"}}, {"_index": "old_index","_type": "_doc","_id": "2","_score": 1.0,"_source": {"name": "炎长","description": "DBA工程师","publish_date": "1992-11-23"}}]}
}
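6. Create the target index: new_index
# Notes
# 1. This time the shard count is set to 2, to demonstrate that reindex can split the data across more shards
# 2. Setting the target index's replica count to 0 is recommended: without replicas the target index accepts writes faster, so the reindex task finishes sooner than it would with replicas. Once the reindex is done, the replica count can be raised again as needed.
curl -X PUT "http://`hostname -i`:9200/new_index" -H 'Content-Type: application/json' -d'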
{"mappings": {"properties": {"name": { "type": "text" },"description": { "type": "text" },"publish_date": { "type": "date" }}},"settings": {"number_of_shards": 2,"number_of_replicas": 0}
}'
# Response
{"acknowledged":true,"shards_acknowledged":true,"index":"new_index"}
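Note 2 above suggests raising the replica count again once the reindex has finished. A minimal sketch of that step, using the update index settings API (`PUT /<index>/_settings`). The function names and the `ES_URL` variable are illustrative assumptions, not part of the original walkthrough.

```shell
# Assumption: ES_URL may be exported beforehand; otherwise use this article's address.
es_url() {
  echo "${ES_URL:-http://$(hostname -i):9200}"
}

# Build the settings body that changes the replica count (plain string helper).
replica_settings_body() {
  printf '{"index":{"number_of_replicas":%d}}' "$1"
}

# Apply the new replica count to an existing index.
set_replicas() {
  local index="$1" replicas="$2"
  curl -s -X PUT "$(es_url)/${index}/_settings" \
    -H 'Content-Type: application/json' \
    -d "$(replica_settings_body "$replicas")"
}

# Usage, after the reindex has finished:
#   set_replicas new_index 1
```

On a single-node cluster like the one used here, a replica cannot be allocated, so the index would turn yellow; the call is mainly useful on multi-node clusters.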
7. Check the state of both indexes
curl -X GET "http://`hostname -i`:9200/_cat/indices?v"
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open .geoip_databases ib6tlhzjTf-MQBu-XGIVWg 1 0 33 0 31.1mb 31.1mb
green open new_index GrJiGswYRqCibszGIVjZhg 2 0 0 0 454b 454b
green open old_index 8k4beb7ETpu6Ki-LpOu_EQ 1 0 2 0 4kb 4kb
8. Use reindex to migrate the data from the source index old_index to the target index new_index
curl -X POST "http://`hostname -i`:9200/_reindex" -H 'Content-Type: application/json' -d'
{"source": {"index": "old_index"},"dest": {"index": "new_index"}
}
'
# Response: the documents were created successfully
{"took":8,"timed_out":false,"total":2,"updated":0,"created":2,"deleted":0,"batches":1,"version_conflicts":0,"noops":0,"retries":{"bulk":0,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0,"failures":[]}
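For larger indexes, the same request can be submitted in the background and throttled. The sketch below relies on two documented `_reindex` query parameters, `wait_for_completion=false` (the response is then a `{"task":"..."}` object instead of the summary above) and `requests_per_second`. The helper names, the default of 500 and the `ES_URL` variable are assumptions chosen for illustration.

```shell
# Assumption: ES_URL may be exported beforehand; otherwise use this article's address.
es_url() {
  echo "${ES_URL:-http://$(hostname -i):9200}"
}

# Build the reindex request body for a source and a destination index.
reindex_body() {
  printf '{"source":{"index":"%s"},"dest":{"index":"%s"}}' "$1" "$2"
}

# Pull the task id out of a {"task":"<node>:<number>"} response read from stdin.
extract_task_id() {
  sed -n 's/.*"task"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Start a throttled reindex in the background and print its task id.
# The third argument (documents per second) is an arbitrary example value.
start_reindex() {
  local src="$1" dst="$2" rps="${3:-500}"
  curl -s -X POST \
    "$(es_url)/_reindex?wait_for_completion=false&requests_per_second=${rps}" \
    -H 'Content-Type: application/json' \
    -d "$(reindex_body "$src" "$dst")" | extract_task_id
}

# Usage:
#   task_id=$(start_reindex old_index new_index 500)
```

Throttling trades total run time for a lower load on the cluster while the copy is running.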
9. Check the progress of the migration
# With so little data the reindex finishes almost immediately, so the task may not show up here
curl -X GET "http://`hostname -i`:9200/_tasks?detailed=true&actions=*reindex&human=true"
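When a reindex was started with `wait_for_completion=false`, its task id can be polled through `GET /_tasks/<task_id>`, whose response carries a top-level `completed` flag. A rough sketch of such a polling loop follows. The function names, the 5-second interval, the retry limit and the `ES_URL` variable are assumptions.

```shell
# Assumption: ES_URL may be exported beforehand; otherwise use this article's address.
es_url() {
  echo "${ES_URL:-http://$(hostname -i):9200}"
}

# Succeeds when the task JSON on stdin reports "completed": true.
is_completed() {
  grep -q '"completed"[[:space:]]*:[[:space:]]*true'
}

# Poll a task until it completes; give up after max_tries attempts.
wait_for_task() {
  local task_id="$1" interval="${2:-5}" max_tries="${3:-120}" i=0
  while [ "$i" -lt "$max_tries" ]; do
    if curl -s "$(es_url)/_tasks/${task_id}" | is_completed; then
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  return 1
}

# Usage (the task id is a placeholder):
#   wait_for_task "<node_id>:<task_number>" && echo "reindex finished"
```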
10. Check both indexes in the cluster again
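# Note: the UUIDs and the primary shard count of new_index (1) below differ from step 7, so this output appears to have been captured in a separate run
curl -X GET "http://`hostname -i`:9200/_cat/indices?v"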
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open .geoip_databases ib6tlhzjTf-MQBu-XGIVWg 1 0 33 0 31.1mb 31.1mb
green open new_index aU3mztzXRXOSk9Q1oiP2RA 1 0 2 0 4.4kb 4.4kb
green open old_index g24b-XDfQZ6BO5zdcIOM0A 1 0 2 0 4.4kb 4.4kb
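Instead of comparing docs.count by eye, the two indexes can be checked with the `_count` API, which returns a body such as `{"count":2,"_shards":{...}}`. This is only a sketch: the helper names and the `ES_URL` variable are assumptions, and the target index may need a refresh before freshly written documents are counted.

```shell
# Assumption: ES_URL may be exported beforehand; otherwise use this article's address.
es_url() {
  echo "${ES_URL:-http://$(hostname -i):9200}"
}

# Extract the numeric "count" field from a _count response read from stdin.
extract_count() {
  sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Print the number of documents in an index.
doc_count() {
  curl -s "$(es_url)/$1/_count" | extract_count
}

# Succeeds only when both counts are non-empty and equal.
counts_match() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# Usage:
#   counts_match "$(doc_count old_index)" "$(doc_count new_index)" \
#     && echo "document counts match"
```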
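Summary
In real production use, running reindex inside a single cluster has a very large impact on that cluster's performance, so I do not recommend using it this way. Reindex works by querying first and then writing: one full scan of the source index plus sustained writes to the target, all on the same cluster. If disk performance is poor and the cluster is already under heavy load, the reindex can push it over the edge. The better approach is to migrate the index to a new ES cluster: the source cluster then only serves the queries, which keeps the impact as small as possible, and since the new cluster carries no business traffic yet, the writes add little burden there.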
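One way to follow that advice is reindex from remote: the request is sent to the new cluster, which pulls the documents from the old one through `source.remote.host`. A minimal sketch, assuming the old cluster's address has been added to `reindex.remote.whitelist` in the new cluster's elasticsearch.yml. The host `old-es:9200`, the `NEW_ES_URL` variable and the function names are hypothetical.

```shell
# Build a reindex-from-remote body: remote host, source index, destination index.
remote_reindex_body() {
  printf '{"source":{"remote":{"host":"%s"},"index":"%s"},"dest":{"index":"%s"}}' \
    "$1" "$2" "$3"
}

# Send the request to the NEW cluster, which reads from the old one.
# NEW_ES_URL is an assumed variable pointing at the new cluster.
reindex_from_remote() {
  local remote_host="$1" src="$2" dst="$3"
  curl -s -X POST "${NEW_ES_URL:-http://localhost:9200}/_reindex" \
    -H 'Content-Type: application/json' \
    -d "$(remote_reindex_body "$remote_host" "$src" "$dst")"
}

# Usage (old-es:9200 is a placeholder for the source cluster):
#   reindex_from_remote "http://old-es:9200" old_index new_index
```

As in step 6, the target index is best created on the new cluster beforehand, with the desired mappings and shard settings.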