Hive Indexes

Creating an index

hive (zmgdb)> create index index_t1 on table v_t1(name)
> as
> 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
> with
> deferred rebuild in table save_index_t1_table;
OK
Time taken: 0.524 seconds

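save_index_t1_table: the table that stores the index.

Every index needs a table to hold its entries, one index table per index, and that table is stored in HDFS.
as specifies the index handler. org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler is the built-in compact index handler and the one most commonly used.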

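Rebuilding the index. Because the index was created with deferred rebuild, it stays empty until it is rebuilt, and it must be rebuilt again whenever new data is added. Only after the rebuild does the index table save_index_t1_table contain index entries.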

hive (zmgdb)> alter index index_t1 on v_t1 rebuild;

WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20160923005139_9caf10f1-5481-4de8-b95a-889c19e45032
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1474540738385_0003, Tracking URL = http://hello110:8088/proxy/application_1474540738385_0003/
Kill Command = /home/hadoop/app/hadoop-2.7.2/bin/hadoop job -kill job_1474540738385_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2016-09-23 00:51:46,046 Stage-1 map = 0%, reduce = 0%
2016-09-23 00:51:54,485 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.91 sec
2016-09-23 00:52:00,724 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4.76 sec
MapReduce Total cumulative CPU time: 4 seconds 760 msec
Ended Job = job_1474540738385_0003
Loading data to table zmgdb.save_index_t1_table
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 4.76 sec HDFS Read: 9845 HDFS Write: 426 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 760 msec
OK
Time taken: 22.73 seconds

Inspecting the index table

hive (zmgdb)> select * from save_index_t1_table;
OK
save_index_t1_table.name  save_index_t1_table._bucketname                              save_index_t1_table._offsets
lisi                      hdfs://hello110:9000/user/hive/warehouse/zmgdb.db/v_t1/v_t1  [0]
xiaohua                   hdfs://hello110:9000/user/hive/warehouse/zmgdb.db/v_t1/v_t1  [49]
xiaoji                    hdfs://hello110:9000/user/hive/warehouse/zmgdb.db/v_t1/v_t1  [32]
ximing                    hdfs://hello110:9000/user/hive/warehouse/zmgdb.db/v_t1/v_t1  [15]
xx                        hdfs://hello110:9000/user/hive/warehouse/zmgdb.db/v_t1/v_t1  [67]
Time taken: 0.073 seconds, Fetched: 5 row(s)

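The index stores three things: the value of the indexed column, the file that contains that value, and the offset of the value inside that file.

When a query filters on the indexed column, for example name = 'lisi', Hive can look the value up in the index, find the file and the offset where it is stored, and read the row directly at that offset. Note that Hive only consults the index automatically when hive.optimize.index.filter is set to true; it is false by default.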
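
The lookup described above can be illustrated with a small Python sketch. It is not Hive's implementation; it only mimics the idea of a compact index, that is, a mapping from a key value to byte offsets in a file, followed by a seek to those offsets. The sample rows, the comma-separated layout, and the names build_index, lookup and demo are assumptions made for this illustration.

```python
import os
import tempfile
from collections import defaultdict


def build_index(path, key_column, sep=","):
    """Map each key value to the byte offsets of the lines that contain it."""
    index = defaultdict(list)
    offset = 0
    with open(path, "rb") as f:
        for raw in f:
            fields = raw.decode("utf-8").rstrip("\n").split(sep)
            index[fields[key_column]].append(offset)
            offset += len(raw)  # the next line starts right after this one
    return dict(index)


def lookup(path, index, key, sep=","):
    """Seek straight to the recorded offsets instead of scanning the file."""
    rows = []
    with open(path, "rb") as f:
        for offset in index.get(key, []):
            f.seek(offset)
            rows.append(f.readline().decode("utf-8").rstrip("\n").split(sep))
    return rows


def demo():
    # Hypothetical rows (id,name,age); not the real contents of v_t1.
    data = "1,lisi,20\n2,ximing,21\n3,xiaoji,22\n"
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, newline="", encoding="utf-8"
    ) as tmp:
        tmp.write(data)
        path = tmp.name
    try:
        index = build_index(path, key_column=1)
        return index, lookup(path, index, "ximing")
    finally:
        os.remove(path)


if __name__ == "__main__":
    index, rows = demo()
    print(index)
    print(rows)
```

As in the index table shown earlier, each key maps to a list of offsets, so a key that occurs in several rows simply gets several entries.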

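Without an index, Hive starts a MapReduce job that scans every file under the table directory. With an index, the table is like a book with a table of contents: you look up the entry instead of reading the whole book, which is generally faster. Also note that Hive answers simple SELECT queries without MapReduce, while more complex queries do start a MapReduce job.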
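
The book analogy can be made concrete with a rough Python sketch that counts how many rows each approach has to inspect. This is only an in-memory illustration, not a measurement of Hive; the 1000 generated rows and the names full_scan and indexed_lookup are hypothetical.

```python
def full_scan(rows, name):
    """Without an index: every row is inspected."""
    inspected = 0
    hits = []
    for row in rows:
        inspected += 1
        if row[1] == name:
            hits.append(row)
    return hits, inspected


def indexed_lookup(rows, index, name):
    """With an index: only the rows listed under the key are touched."""
    positions = index.get(name, [])
    return [rows[p] for p in positions], len(positions)


# Hypothetical data: 1000 rows (id, name), exactly one of them named "lisi".
rows = [(i, f"user{i}") for i in range(1000)]
rows[500] = (500, "lisi")

# Build the name -> row positions mapping once, up front.
index = {}
for position, (_, name) in enumerate(rows):
    index.setdefault(name, []).append(position)

scan_hits, scan_cost = full_scan(rows, "lisi")
index_hits, index_cost = indexed_lookup(rows, index, "lisi")
print(scan_cost, index_cost)
```

Both approaches return the same row; the difference is only in how much data has to be read to find it.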

Showing the indexes on a table

show formatted index on v_t1;

Dropping an index

drop index if exists index_t1 on v_t1;

Note:

Whenever the data in the table changes, the index on that table must be rebuilt.
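
Why the rebuild is needed can be seen in a small Python sketch that mimics the idea (it is not Hive code). The index is a snapshot taken at build time, so rows added afterwards have no entry until the index is built again. The in-memory table, its rows, and the name build_index are hypothetical.

```python
def build_index(rows, key_column):
    """Snapshot: map each key value to the positions of the rows holding it."""
    index = {}
    for position, row in enumerate(rows):
        index.setdefault(row[key_column], []).append(position)
    return index


# Hypothetical table contents (id, name); not the real rows of v_t1.
table = [(1, "lisi"), (2, "ximing")]
index = build_index(table, key_column=1)

# New data arrives after the index was built.
table.append((3, "xiaoji"))
stale_hit = index.get("xiaoji")  # the old snapshot has no entry for the new row

# Building again, in the spirit of ALTER INDEX ... REBUILD, brings it up to date.
index = build_index(table, key_column=1)
fresh_hit = index.get("xiaoji")

print(stale_hit, fresh_hit)
```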


