进入hive客户端后:
1、建表:
-- 1. Create a partitioned table.
-- sequencefile stores rows as binary key/value pairs; textfile is the
-- plain-text default.
create table page_view(
    viewTime int,
    userid bigint,
    page_url string,
    referrer_url string,
    ip string comment 'IP Address of the User'
)
comment 'This is the page view table'
partitioned by (dt string, country string)
row format delimited
    fields terminated by '\t'   -- field separator within a row
stored as sequencefile;         -- alternatively: stored as textfile
//sequencefile 是以键值对组织的二进制文件
//hive的默认库是default,表也是一个文件夹,置于hdfs的/user/hive/warehouse下
//新建的库就是在hdfs的/user/hive/warehouse下新建一个库名文件夹
2、在hive客户端将本地文件上传至数据仓库
实际就是将文件上传至对应的hdfs中表文件夹下
-- 2. Load a local file into the warehouse: the file is copied into the
-- table's HDFS directory. NOTE: Hive does not shell-expand '~' inside the
-- quoted path; use an absolute path.
load data local inpath '/home/hadoop/data.txt' into table table_name;
-- Equivalent to: hadoop fs -put ~/data.txt /user/hive/warehouse/<db>/<table>
-- Without 'local', the path is on HDFS and the file is MOVED into the
-- table's directory.
load data inpath '/data.txt' into table table_name;
3、外表 external
内部表:文件找表——load 时把数据文件移动/复制到表自己的目录下,删表时数据一并删除
外部表:表找文件——建表时用 location 指向已存在的文件,删表时只删元数据,数据文件保留
-- 3. External table: 'location' points at existing HDFS files; dropping the
-- table removes only the metadata, not the data files.
create external table tab_ip_ext(
    id int,
    name string,
    ip string,
    country string
)
row format delimited fields terminated by ','
stored as textfile            -- was misspelled 'testfile'
location '/path/on/hdfs';     -- the ';' must end the statement, not sit inside the quoted path
4、CTAS(create table as select):用查询结果直接建表,常用于保存中间结果;注意新表是内部表,会在 hdfs 的仓库目录下建立对应的表目录
-- Create a table whose schema is derived from the query's select list;
-- the result rows become the table's data, sorted per reducer by new_id.
create table tab_ip_ctas as
select
    id      as new_id,
    name    as new_name,
    ip      as new_ip,
    country as new_country
from tab_ip_ext
sort by new_id;
5、向表中写入查询数据
-- 5. Copy only the schema of an existing table, then fill it from a query.
create table tab_ip_like like tab_ip;
-- 'insert into' appends; 'insert overwrite' replaces the existing data.
-- (The original note's comment had swallowed the select clause below.)
insert overwrite table tab_ip_like
select * from tab_ip;
6、分区 PARTITION
分区就是在表的文件夹下再建一层文件夹,文件夹名称为"分区字段名=分区值",如"part_flag=part1"
-- 6. Partitioned table: each partition is a sub-directory under the table's
-- directory named "<partition_column>=<value>", e.g. "part_flag=part1".
-- The partition column is a pseudo-column, not stored in the data files;
-- it may mirror a data field or be an independent label.
create table tab_ip_part(
    id int,
    name string,
    ip string,
    country string
)
partitioned by (part_flag string)
row format delimited fields terminated by ',';

load data local inpath '/home/hadoop/ip.txt'
overwrite into table tab_ip_part partition(part_flag='part1');
load data local inpath '/home/hadoop/ip_part2.txt'
overwrite into table tab_ip_part partition(part_flag='part2');

select * from tab_ip_part;
-- The partition column is queried like an ordinary column; its value
-- ('part2') appears appended to each returned row.
select * from tab_ip_part where part_flag='part2';
select count(*) from tab_ip_part where part_flag='part2';
7、将查询结果写入文件
-- Write query results to a local directory ('//' is not a valid HiveQL
-- comment; use '--'). Despite the '.txt' name, the path is treated as a
-- directory and Hive writes its output files underneath it.
insert overwrite local directory '/tmp/test.txt'
select * from tab_ip_part where part_flag='part1';
-- Write query results to an HDFS directory.
insert overwrite directory '/hiveout.txt'
select * from tab_ip_part where part_flag='part1';
-- 8. Array column type: fields are split by '\t', array elements by ','.
create table tab_array(a array<int>,b array<string>)
row format delimited
fields terminated by '\t'
collection items terminated by ',';
示例数据
1,2,3	one,two,three
4,5,6	d,e,f
-- Index into an array column (0-based).
select a[0] from tab_array;
-- array_contains(arr, value) tests membership.
select * from tab_array where array_contains(b,'word');
-- array(...) constructs an array value in a query.
insert into table tab_array select array(0),array(name,ip) from tab_ext t;
9、数据类型为map
-- Map column type: fields are split by '\t', map entries by ';',
-- and key from value within an entry by ':'.
create table tab_map(name string,info map<string,string>)
row format delimited
fields terminated by '\t'
collection items terminated by ';'
map keys terminated by ':';
示例数据:
name key1:value1;key2:value2;key3:value3
-- map(k1, v1, k2, v2, ...) constructs a map value in a query.
insert into table tab_map select name,map('name',name,'ip',ip) from tab_ext;
10、数据类型为struct
-- 10. Struct column type.
create table tab_struct(
    name string,
    info struct<age:int,tel:string,addr:string>
)
row format delimited
fields terminated by '\t'
collection items terminated by ',';  -- terminator was missing before the insert below

-- named_struct('field1', v1, 'field2', v2, ...) constructs a struct value.
insert into table tab_struct
select name, named_struct('age',id,'tel',name,'addr',country) from tab_ext;
11、在shell环境下执行HQL
# 11. Run HQL from the shell (-S silent, -e inline query) and redirect the
# result to a file. The original query mixed count(*) with a non-aggregated
# column and no GROUP BY, which Hive rejects — 'group by country' is required.
hive -S -e 'select country,count(*) from 库名.表名 group by country' > /tmp/query.txt
可用脚本(bash,python)进行hql批量查询
12、用户自定义函数 UDF
-- NOTE(review): pseudocode — 'udf(id=1,first,no-first)' appears to sketch a
-- conditional UDF behaving like if(id = 1, first, not-first); confirm the
-- intended function name and argument list before running.
select udf(id=1,first,no-first),name from tab_ext;
写一个Java类,定义udf函数逻辑,此类继承UDF,重写evaluate方法,public修饰,参数与返回值按需指定;
打成jar包,告知hive此jar包的位置:hive>add jar /..../xxx.jar
在hive中创建函数udf,将udf和此Java类挂钩即可:
hive>CREATE TEMPORARY FUNCTION my_udf AS 'org.dht.Lower(类全名)'; //退出后,my_udf失去作用
-- 13. Store a table as SequenceFile (binary key/value format). The original
-- line had the keywords fused ('delimitedfields', "','stored") and would not
-- parse.
create table tab_ip_seq(
    id int,
    name string,
    ip string,
    country string
)
row format delimited fields terminated by ','
stored as sequencefile;

insert overwrite table tab_ip_seq select * from tab_ext;
14、分区将不同的记录放在不同的文件夹中,分桶将不同的记录置于不同的文件 CLUSTER
-- 14. Bucketed table (CLUSTER): rows are hash-distributed by the
-- clustered-by column into a fixed number of files within the table
-- directory. The original had the DDL and a load statement fused on one line.
create table tab_ip_cluster(id int, name string, ip string, country string)
clustered by (id) into 3 buckets;

-- CAUTION: 'load data' only copies files and does NOT bucket the rows;
-- populate bucketed tables with insert ... select while bucketing is enforced.
load data local inpath '/.../ip.txt' overwrite into table tab_ip_cluster;

set hive.enforce.bucketing = true;
insert into table tab_ip_cluster select * from tab_ip;

-- Sample bucket 2 of 3, hashed on id — gives an evenly distributed sample.
select * from tab_ip_cluster tablesample(bucket 2 out of 3 on id);
用于抽样,保证均匀抽样。