文章目录
- 1. 特点
- 2. Hive 体系架构
- 3. 安装 Hive
- 3.1 安装 MySQL
- 3.2 安装 Hive
- 3.3 配置 Hive
- 4. 实践
- 4.1 函数
- 4.2 Hive Shell
安装没成功:直接用现成的镜像
大数据Linux实验环境虚拟机镜像文件 http://dblab.xmu.edu.cn/blog/1645-2/
cloudera-quickstart-vm-集成了大数据平台的虚拟机镜像
1. 特点
- 查询语言与 SQL 接近
- 并行执行
- 使用 HDFS 存储
- 支持多种数据格式
- 不支持数据更新
- 不支持索引
- 执行延迟高(不适合在线数据查询)
- 可扩展性高
- 数据规模大
2. Hive 体系架构
3. 安装 Hive
- 先安装 hadoop:hadoop 多机全分布式安装步骤(虚拟机1master+2slave)
3.1 安装 MySQL
- 在 master 上安装
yum localinstall https://dev.mysql.com/get/mysql57-community-release-el7-11.noarch.rpm
或者
wget https://dev.mysql.com/get/mysql57-community-release-el7-11.noarch.rpm
rpm -ivh mysql57-community-release-el7-11.noarch.rpm
- 查看可选版本
yum repolist all | grep mysql
yum install -y mysql-community-server
- 启动 MySQL
[root@master opt]# systemctl start mysqld
[root@master opt]# systemctl status mysqld
● mysqld.service - MySQL ServerLoaded: loaded (/usr/lib/systemd/system/mysqld.service; enabled; vendor preset: disabled)Active: active (running) since 一 2021-03-29 06:25:16 CST; 22h agoDocs: man:mysqld(8)http://dev.mysql.com/doc/refman/en/using-systemd.htmlProcess: 49287 ExecStart=/usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/mysqld.pid $MYSQLD_OPTS (code=exited, status=0/SUCCESS)Process: 49266 ExecStartPre=/usr/bin/mysqld_pre_systemd (code=exited, status=0/SUCCESS)Main PID: 49291 (mysqld)Tasks: 28CGroup: /system.slice/mysqld.service└─49291 /usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/my...3月 29 06:25:13 master systemd[1]: Starting MySQL Server...
3月 29 06:25:16 master systemd[1]: Started MySQL Server.
- 测试 MySQL
mysql -u root -p
enter 空密码,报错
报错参考https://blog.csdn.net/vv19910825/article/details/82979563
(操作:vim /etc/my.cnf, 添加 skip-grant-tables
跳过密码 systemctl restart mysqld
,下面改过密码了,在删除)
修改密码,注意满足密码强度
mysql> flush privileges; -> alter user 'root'@'localhost' identified by '!@#123qwe';-> flush privileges;
vim /etc/my.cnf, 删除 skip-grant-tables
systemctl restart mysqld
mysql -u root -p
使用新密码,再次登录
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| performance_schema |
| sys |
+--------------------+
4 rows in set (0.01 sec)
- 在 MySQL 中创建 Hive 所需用户和数据库,并授权
mysql> create user 'dnn' identified by '!@#123Qwe';
mysql> create database hive;
Query OK, 1 row affected (0.02 sec)
mysql> grant all privileges on hive.* to 'dnn'@'localhost' identified by '!@#123Qwe';
mysql> flush privileges;
hive 表创建成功
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| hive |
| mysql |
| performance_schema |
| sys |
+--------------------+
5 rows in set (0.00 sec)
3.2 安装 Hive
下载地址 : https://hive.apache.org/downloads.html
我下载的是 apache-hive-3.1.2-bin.tar.gz
tar -zxvf apache-hive-3.1.2-bin.tar.gz
- 赋权限给普通用户 dnn
[root@master opt]# chown -R dnn /opt/hive3.1.2/
- 复制 MySQL 的 JDBC 驱动包到 Hive 的 /lib 下
下载地址 https://dev.mysql.com/downloads/connector/j/
解压
tar -zxvf mysql-connector-java-5.1.49.tar.gz
移动
[dnn@master Downloads]$ mv mysql-connector-java-5.1.49/mysql-connector-java-5.1.49-bin.jar /opt/hive3.1.2/lib
[dnn@master Downloads]$ rm -rf mysql-connector-java-5.1.49
3.3 配置 Hive
- 配置 hive-env.sh
[dnn@master opt]$ cd /opt/hive3.1.2/conf/
[dnn@master conf]$ cp hive-env.sh.template hive-env.sh
[dnn@master conf]$ vim hive-env.sh
第48行改为自己的路径 HADOOP_HOME=/opt/hadoop-3.3.0
第51行 export HIVE_CONF_DIR=/opt/hive3.1.2/conf
第54行 export HIVE_AUX_JARS_PATH=/opt/hive3.1.2/lib
- 配置 hive-default.xml,直接复制模板
cp hive-default.xml.template hive-default.xml
- 配置 vim hive-site.xml
<configuration><property><name>javax.jdo.option.ConnectionURL</name><value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value><description>JDBC connect string for a JDBC metastore.
ql.jdbc.Driver To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.</description></property><property><name>javax.jdo.option.ConnectionDriverName</name><value>com.mysql.cj.jdbc.Driver</value><description>Driver class name for a JDBC metastore,com.mysql.jdbc.Driver is depricated</description></property><property><name>javax.jdo.option.ConnectionUserName</name><value>dnn</value><description>Username to use against metastore database</description></property><property><name>javax.jdo.option.ConnectionPassword</name><value>!@#123Qwe</value><description>password to use against metastore database</description></property>
</configuration>
- 初始化 Hive Metastore
在 hive 的 bin 目录下./schematool -initSchema -dbType mysql
配置环境变量
vim ~/.bashrc
添加
export HIVE_HOME=/opt/hive3.1.2
export PATH=${HIVE_HOME}/bin:$PATH
使之生效
source ~/.bashrc
4. 实践
启动 hadoop 命令:
start-dfs.sh
start-yarn.sh
mr-jobhistory-daemon.sh start historyserver
# 第三条可以用下面的命令,上面的显示过期了,以后弃用
mapred --daemon start historyserver
4.1 函数
显示函数 show functions;
hive> show functions;
OK
!
!=
$sum0
%
(省略。。。)
显示函数帮助信息:describe function upper(函数名);
hive> > describe function upper;
OK
upper(str) - Returns str with all characters changed to uppercase
Time taken: 0.007 seconds, Fetched: 1 row(s)
- 自定义函数,重写 UDF 中的
evaluate()
package michael_package;import org.apache.hadoop.hive.ql.exec.UDF;public class Sub extends UDF{public Integer evaluate(Integer a, Integer b) {if(a==null || b==null)return null;return a-b;}public Double evaluate(Double a, Double b) {if(a==null || b==null)return null;return a-b;}public Integer evaluate(Integer a, Integer[] b) {if(a==null || b==null)return null;int ans = a;for(int i = 0; i < b.length; ++i) {if(b[i] != null)ans -= b[i];}return ans;}
}
- 表生成自定义函数 UDTF
- 聚集自定义函数 UDAF
4.2 Hive Shell
create table
,show tables
,describe 表name
hive> create table student(> id string,> name string,> sex string,> age tinyint,> dept string)> row format delimited fields terminated by '\t';
OK
Time taken: 0.464 seconds
hive> show tables;
OK
student
Time taken: 0.085 seconds, Fetched: 1 row(s)
hive> describe student;
OK
id string
name string
sex string
age tinyint
dept string
Time taken: 0.069 seconds, Fetched: 5 row(s)
- 编写个数据文件
hadoop@dblab-VirtualBox:/usr/local/eclipse$ vim /home/hadoop/workspace/student.txt
hadoop@dblab-VirtualBox:/usr/local/eclipse$ cat /home/hadoop/workspace/student.txt
1 michael male 18 bigdata
2 ming male 19 AI
3 lili female 18 math
4 huahua female 20 AI
- 加载数据到表格
hive> load data local inpath '/home/hadoop/workspace/student.txt' into table student;
Loading data to table default.student
OK
Time taken: 1.158 seconds
- select 操作表
hive> select * from student;
OK
1 michael male 18 bigdata
2 ming male 19 AI
3 lili female 18 math
4 huahua female 20 AI
Time taken: 0.789 seconds, Fetched: 4 row(s)
hive> select * from student where sex='male';
OK
1 michael male 18 bigdata
2 ming male 19 AI
Time taken: 1.565 seconds, Fetched: 2 row(s)
HiveQL 将命令转换为 MapReduce 操作
hive> select sex, count(*) from student group by sex;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20210331231151_0e76f173-ac54-46e0-8886-f018f8a7f480
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2021-03-31 23:11:52,943 Stage-1 map = 100%, reduce = 100%
Ended Job = job_local236806530_0001
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 352 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
female 2
male 2
Time taken: 1.55 seconds, Fetched: 2 row(s)
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| hive |
| mysql |
| performance_schema |
| spark |
| sys |
+--------------------+
6 rows in set (0.00 sec)mysql> use hive;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -ADatabase changed
mysql> show tables;
+---------------------------+
| Tables_in_hive |
+---------------------------+
| AUX_TABLE |
| BUCKETING_COLS |
| CDS |
| COLUMNS_V2 |
| COMPACTION_QUEUE |
| COMPLETED_COMPACTIONS |
| COMPLETED_TXN_COMPONENTS |
| DATABASE_PARAMS |
| DBS |
| DB_PRIVS |
| DELEGATION_TOKENS |
| FUNCS |
| FUNC_RU |
| GLOBAL_PRIVS |
| HIVE_LOCKS |
| IDXS |
| INDEX_PARAMS |
| KEY_CONSTRAINTS |
| MASTER_KEYS |
| NEXT_COMPACTION_QUEUE_ID |
| NEXT_LOCK_ID |
| NEXT_TXN_ID |
| NOTIFICATION_LOG |
| NOTIFICATION_SEQUENCE |
| NUCLEUS_TABLES |
| PARTITIONS |
| PARTITION_EVENTS |
| PARTITION_KEYS |
| PARTITION_KEY_VALS |
| PARTITION_PARAMS |
| PART_COL_PRIVS |
| PART_COL_STATS |
| PART_PRIVS |
| ROLES |
| ROLE_MAP |
| SDS |
| SD_PARAMS |
| SEQUENCE_TABLE |
| SERDES |
| SERDE_PARAMS |
| SKEWED_COL_NAMES |
| SKEWED_COL_VALUE_LOC_MAP |
| SKEWED_STRING_LIST |
| SKEWED_STRING_LIST_VALUES |
| SKEWED_VALUES |
| SORT_COLS |
| TABLE_PARAMS |
| TAB_COL_STATS |
| TBLS |
| TBL_COL_PRIVS |
| TBL_PRIVS |
| TXNS |
| TXN_COMPONENTS |
| TYPES |
| TYPE_FIELDS |
| VERSION |
| WRITE_SET |
+---------------------------+
57 rows in set (0.00 sec)
- 非交互式,加载查询脚本
hadoop@dblab-VirtualBox:~/workspace$ vim script.q
hadoop@dblab-VirtualBox:~/workspace$ cat script.q
select * from student;hadoop@dblab-VirtualBox:~/workspace$ hive -f script.q
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]Logging initialized using configuration in jar:file:/usr/local/hive/lib/hive-common-2.1.0.jar!/hive-log4j2.properties Async: true
Wed Mar 31 23:36:09 CST 2021 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Wed Mar 31 23:36:09 CST 2021 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Wed Mar 31 23:36:09 CST 2021 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Wed Mar 31 23:36:09 CST 2021 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Wed Mar 31 23:36:10 CST 2021 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Wed Mar 31 23:36:10 CST 2021 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Wed Mar 31 23:36:10 CST 2021 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Wed Mar 31 23:36:10 CST 2021 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
OK
1 michael male 18 bigdata
2 ming male 19 AI
3 lili female 18 math
4 huahua female 20 AI
Time taken: 1.376 seconds, Fetched: 4 row(s)
-e
选项内嵌语句
hadoop@dblab-VirtualBox:~/workspace$ hive -e 'select * from student'