Syncing Data from a Hive Table to an Elasticsearch Cluster (2023-08-30)

Background: a real project required syncing data from a Hive table into an Elasticsearch cluster. I had not done this before; after reading a few posts I found a solution that works quite well, so I am writing it down.

My environment is as follows:

| Software | Version |
| --- | --- |
| Hadoop | 3.3.0 |
| Hive | 3.1.3 |
| JDK | 1.8 |
| Elasticsearch | 7.10.2 |
| Kibana | 7.10.2 |
| Logstash | 7.10.2 |
| ES-Hadoop | 7.10.2 |

Introducing ES-Hadoop

The relationship between Hadoop, Hive, and ES is shown in the figure below. In the middle sits a component called ES-Hadoop, the bridge between Hadoop and ES. Elastic provides this component on its official site to handle data synchronization between Hadoop and ES.

(Figure: Hadoop and Elasticsearch, with ES-Hadoop as the bridge between them)

The concrete steps for the data sync are described below.

Step 1: Download the ES-Hadoop package from the Elastic website

Note: the ES-Hadoop version must match your ES version.
Download it from the official Elastic website.
Type es-hadoop into the search box and, under version, pick the one that matches your ES.
(Figure: ES-Hadoop download page on the Elastic website)
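For my versions, the download can also be scripted; the URL below follows Elastic's usual artifact naming and is an assumption — verify it against the download page:

wget https://artifacts.elastic.co/downloads/elasticsearch-hadoop/elasticsearch-hadoop-7.10.2.zip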

Step 2: Upload the package to the cluster, unpack it, and put the jar into a new directory on HDFS

The command is as follows:

hadoop fs -put elasticsearch-hadoop-7.10.2.jar /user/hive/warehouse/es_hadoop/
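For context, a sketch of the steps just before that command — unpacking the archive and creating the HDFS directory (the archive layout is an assumption; ES-Hadoop releases usually ship the jars in a dist/ folder):

# unpack the downloaded archive; the jars sit in its dist/ directory
unzip elasticsearch-hadoop-7.10.2.zip
cd elasticsearch-hadoop-7.10.2/dist
# create the target HDFS directory before the put
hadoop fs -mkdir -p /user/hive/warehouse/es_hadoop/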

Step 3: Add the jar to Hive

Enter the following command in the Hive CLI:

add jar hdfs:///user/hive/warehouse/es_hadoop/elasticsearch-hadoop-7.10.2.jar;
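To confirm the jar is registered in the current session, Hive's list command can be used; the expected output is roughly as sketched in the comments:

list jars;
-- should print something like:
-- hdfs:///user/hive/warehouse/es_hadoop/elasticsearch-hadoop-7.10.2.jar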

Step 4: Create a temporary test table in Hive

Create a Hive table and add some test data.

CREATE EXTERNAL TABLE `hive_test`(
  `name` string,
  `age` int,
  `hight` int)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('field.delim'=',', 'serialization.format'=',')
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 'hdfs://master:9000/user/hive/warehouse/pdata_dynamic.db/hive_test';
With the data added, it looks like this:
hive> select * from hive_test;
OK
吴占喜  30      175
令狐冲  50      180
任我行  60      160
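For reference, the test rows can be loaded like this (a sketch; the local CSV path is hypothetical):

-- insert the sample rows directly (Hive 3 supports INSERT ... VALUES)
INSERT INTO hive_test VALUES ('吴占喜', 30, 175), ('令狐冲', 50, 180), ('任我行', 60, 160);
-- or load a comma-delimited file that matches the table's field.delim
LOAD DATA LOCAL INPATH '/tmp/hive_test.csv' INTO TABLE hive_test;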

Step 5: Create the Hive-to-ES mapping table

CREATE EXTERNAL TABLE `es_hadoop_cluster`(
  `name` string COMMENT 'from deserializer',
  `age` string COMMENT 'from deserializer',
  `hight` string COMMENT 'from deserializer')
ROW FORMAT SERDE 'org.elasticsearch.hadoop.hive.EsSerDe'
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
WITH SERDEPROPERTIES ('serialization.format'='1')
LOCATION 'hdfs://master:9000/user/hive/warehouse/pdata_dynamic.db/es_hadoop_cluster'
TBLPROPERTIES (
  'bucketing_version'='2',
  'es.batch.write.retry.count'='6',
  'es.batch.write.retry.wait'='60s',
  'es.index.auto.create'='TRUE',
  'es.index.number_of_replicas'='0',
  'es.index.refresh_interval'='-1',
  'es.mapping.name'='name:name,age:age,hight:hight',
  'es.nodes'='172.16.27.133:9200',
  'es.nodes.wan.only'='TRUE',
  'es.resource'='hive_to_es/_doc');

Parameter reference for the mapping table

| Parameter | Value | Description |
| --- | --- | --- |
| bucketing_version | 2 | Bucketing version Hive records in the table properties |
| es.batch.write.retry.count | 6 | Number of retries when a batch write to ES fails |
| es.batch.write.retry.wait | 60s | Wait time between batch-write retries |
| es.index.auto.create | TRUE | Whether writing to ES from Hadoop auto-creates a missing index (true: create automatically; false: do not) |
| es.index.number_of_replicas | 0 | Number of replicas for the target index |
| es.index.refresh_interval | -1 | Refresh interval of the target index (-1 disables periodic refresh during the bulk load) |
| es.mapping.name | name:name,age:age,hight:hight | Field mapping between the Hive table and the ES index |
| es.nodes | 172.16.27.133:9200 | Address of the Elasticsearch instance; an internal address is recommended |
| es.nodes.wan.only | TRUE | For clusters reachable only through declared (e.g. cloud/virtual) IPs: connect only via the addresses in es.nodes and skip node sniffing (true/false) |
| es.resource | hive_to_es/_doc | Index (and type) name in the ES cluster |
| es.nodes.discovery | TRUE | Whether to discover additional cluster nodes beyond those listed in es.nodes (true: discover; false: use only the listed nodes) |
| es.input.use.sliced.partitions | TRUE | Whether to use sliced partitions when reading. true can make the pre-read phase noticeably longer, sometimes far exceeding the query itself; false is recommended for query efficiency |
| es.read.metadata | FALSE | Enable when you need ES internal fields such as _id |

Step 6: Write the sync SQL and give it a try

The sync SQL:

insert into es_hadoop_cluster select * from hive_test;

It failed — nice, a good opportunity to show how to resolve this error.
One line in the error output stands out: Caused by: java.lang.ClassNotFoundException: org.elasticsearch.hadoop.hive.EsHiveInputFormat
Cause: the add jar xxx command in Hive only takes effect in the current session.
Solutions:
1. Run the jar-adding command each time before executing the insert:
add jar hdfs:///user/hive/warehouse/es_hadoop/elasticsearch-hadoop-7.10.2.jar;
2. Load the jar in permanently (a sketch follows below).
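A minimal sketch of the permanent approach, assuming a standard Hive layout (the lib path below matches the one from the logs in this post; adjust to your install). Either copy the single connector jar into Hive's lib directory and restart Hive, or reference it through hive.aux.jars.path:

# option 1: copy just the one jar into Hive's lib directory, then restart Hive
cp elasticsearch-hadoop-7.10.2.jar /root/soft/hive/apache-hive-3.1.3-bin/lib/
# option 2: point hive.aux.jars.path at the jar in hive-site.xml, e.g.
#   <property>
#     <name>hive.aux.jars.path</name>
#     <value>file:///root/soft/hive/apache-hive-3.1.3-bin/lib/elasticsearch-hadoop-7.10.2.jar</value>
#   </property>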

Query ID = root_20230830041146_d125a01c-4174-48a5-8b9c-a15b41d27403
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1693326922484_0004, Tracking URL = http://master:8088/proxy/application_1693326922484_0004/
Kill Command = /root/soft/hadoop/hadoop-3.3.0//bin/mapred job  -kill job_1693326922484_0004
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 0
2023-08-30 04:11:51,842 Stage-2 map = 0%,  reduce = 0%
2023-08-30 04:12:45,532 Stage-2 map = 100%,  reduce = 0%
Ended Job = job_1693326922484_0004 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1693326922484_0004_m_000000 (and more) from job job_1693326922484_0004
Task with the most failures(4): 
-----
Task ID: task_1693326922484_0004_m_000000
URL: http://master:8088/taskdetails.jsp?jobid=job_1693326922484_0004&tipid=task_1693326922484_0004_m_000000
-----
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: Failed to load plan: hdfs://master:9000/tmp/hive/root/582efbca-df6e-4f87-b4a4-5a5f03667fd9/hive_2023-08-30_04-11-47_010_5312656494742717839-1/-mr-10002/ae1cf98b-a157-4266-92b3-7d618b411f00/map.xml
    at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:502)
    at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:335)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:435)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:881)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:874)
    at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:716)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:175)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:444)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:178)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1845)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:172)
Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: org.elasticsearch.hadoop.hive.EsHiveInputFormat
Serialization trace:
inputFileFormatClass (org.apache.hadoop.hive.ql.plan.TableDesc)
tableInfo (org.apache.hadoop.hive.ql.plan.FileSinkDesc)
conf (org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator)
childOperators (org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator)
childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
    at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:156)
    at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:133)
    at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)
    ...
    at org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializePlan(SerializationUtilities.java:613)
    at org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializePlan(SerializationUtilities.java:590)
    at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:463)
    ... 13 more
Caused by: java.lang.ClassNotFoundException: org.elasticsearch.hadoop.hive.EsHiveInputFormat
    at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:154)
    ... 60 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched: 
Stage-Stage-2: Map: 1   HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
With the jar added, it errored out again — very nice, another error worth demonstrating 😄

The error output is shown below.
Cause analysis: look closely at the line Error: java.lang.ClassNotFoundException: org.apache.commons.httpclient.HttpConnectionManager — the failure comes from a missing httpclient jar.
Solution: add the httpclient jar the same way as the es-hadoop jar above.

hive> insert into es_hadoop_cluster select * from hive_test;
Query ID = root_20230830041906_20c85a80-072a-4023-b5ab-8b532e5db092
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1693326922484_0005, Tracking URL = http://master:8088/proxy/application_1693326922484_0005/
Kill Command = /root/soft/hadoop/hadoop-3.3.0//bin/mapred job  -kill job_1693326922484_0005
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 0
2023-08-30 04:19:15,561 Stage-2 map = 0%,  reduce = 0%
2023-08-30 04:19:34,095 Stage-2 map = 100%,  reduce = 0%
Ended Job = job_1693326922484_0005 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1693326922484_0005_m_000000 (and more) from job job_1693326922484_0005
Task with the most failures(4): 
-----
Task ID: task_1693326922484_0005_m_000000
URL: http://master:8088/taskdetails.jsp?jobid=job_1693326922484_0005&tipid=task_1693326922484_0005_m_000000
-----
Diagnostic Messages for this Task:
Error: java.lang.ClassNotFoundException: org.apache.commons.httpclient.HttpConnectionManager
    at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    at org.elasticsearch.hadoop.rest.commonshttp.CommonsHttpTransportFactory.create(CommonsHttpTransportFactory.java:40)
    at org.elasticsearch.hadoop.rest.NetworkClient.selectNextNode(NetworkClient.java:99)
    at org.elasticsearch.hadoop.rest.NetworkClient.<init>(NetworkClient.java:82)
    at org.elasticsearch.hadoop.rest.NetworkClient.<init>(NetworkClient.java:58)
    at org.elasticsearch.hadoop.rest.RestClient.<init>(RestClient.java:101)
    at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:620)
    at org.elasticsearch.hadoop.mr.EsOutputFormat$EsRecordWriter.init(EsOutputFormat.java:175)
    at org.elasticsearch.hadoop.hive.EsHiveOutputFormat$EsHiveRecordWriter.write(EsHiveOutputFormat.java:59)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:987)
    ...
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:172)
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched: 
Stage-Stage-2: Map: 1   HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
In the Hive CLI, add the httpclient jar and rerun the insert statement.

The command is as follows:

add jar hdfs:///user/hive/warehouse/es_hadoop/commons-httpclient-3.1.jar;
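The jar has to exist at that HDFS path first; assuming you obtained commons-httpclient-3.1.jar (it is available on Maven Central), upload it next to the es-hadoop jar:

hadoop fs -put commons-httpclient-3.1.jar /user/hive/warehouse/es_hadoop/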

It ran successfully 😄:

hive> insert into es_hadoop_cluster select * from hive_test;
Query ID = root_20230830043006_1cf3a042-e5e4-4bec-8cee-49e670ac9b49
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1693326922484_0006, Tracking URL = http://master:8088/proxy/application_1693326922484_0006/
Kill Command = /root/soft/hadoop/hadoop-3.3.0//bin/mapred job  -kill job_1693326922484_0006
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 0
2023-08-30 04:30:17,504 Stage-2 map = 0%,  reduce = 0%
2023-08-30 04:30:21,633 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU 0.79 sec
MapReduce Total cumulative CPU time: 790 msec
Ended Job = job_1693326922484_0006
MapReduce Jobs Launched: 
Stage-Stage-2: Map: 1   Cumulative CPU: 0.79 sec   HDFS Read: 6234 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 790 msec
OK
Time taken: 16.482 seconds

Step 7: Query the mapping table, then check on the ES cluster whether the data synced

1. Query the mapping table

Nice, yet another error, shown below.
Cause analysis: when I first set this up, I had copied every jar from the unpacked archive into Hive's lib directory; it turns out only one is needed, so the extras have to go.
Solution: delete the redundant ES-Hadoop jars from Hive's lib directory.

hive> select * from es_hadoop_cluster;
OK
Exception in thread "main" java.lang.ExceptionInInitializerError
    at org.elasticsearch.hadoop.rest.RestService.findPartitions(RestService.java:216)
    at org.elasticsearch.hadoop.mr.EsInputFormat.getSplits(EsInputFormat.java:414)
    at org.elasticsearch.hadoop.hive.EsHiveInputFormat.getSplits(EsHiveInputFormat.java:115)
    at org.elasticsearch.hadoop.hive.EsHiveInputFormat.getSplits(EsHiveInputFormat.java:51)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.generateWrappedSplits(FetchOperator.java:425)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextSplits(FetchOperator.java:395)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:314)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:540)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:509)
    at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146)
    at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2691)
    at org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:229)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
Caused by: java.lang.RuntimeException: Multiple ES-Hadoop versions detected in the classpath; please use only one
jar:file:/root/soft/hive/apache-hive-3.1.3-bin/lib/elasticsearch-hadoop-7.17.6.jar
jar:file:/root/soft/hive/apache-hive-3.1.3-bin/lib/elasticsearch-hadoop-hive-7.17.6.jar
jar:file:/root/soft/hive/apache-hive-3.1.3-bin/lib/elasticsearch-hadoop-mr-7.17.6.jar
jar:file:/root/soft/hive/apache-hive-3.1.3-bin/lib/elasticsearch-hadoop-pig-7.17.6.jar
jar:file:/root/soft/hive/apache-hive-3.1.3-bin/lib/elasticsearch-spark-20_2.11-7.17.6.jar
jar:file:/root/soft/hive/apache-hive-3.1.3-bin/lib/elasticsearch-storm-7.17.6.jar
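A minimal cleanup sketch, assuming the jars live in the lib directory listed above and keeping only the all-in-one elasticsearch-hadoop jar (the per-module jars duplicate its contents):

cd /root/soft/hive/apache-hive-3.1.3-bin/lib
rm elasticsearch-hadoop-hive-7.17.6.jar \
   elasticsearch-hadoop-mr-7.17.6.jar \
   elasticsearch-hadoop-pig-7.17.6.jar \
   elasticsearch-spark-20_2.11-7.17.6.jar \
   elasticsearch-storm-7.17.6.jar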

After deleting the extra jars, run the insert once more and query again. As shown below, querying in Hive now works (each row appears twice because the insert has now run twice):

hive> select * from es_hadoop_cluster ;
OK
吴占喜  30      175
令狐冲  50      180
任我行  60      160
吴占喜  30      175
令狐冲  50      180
任我行  60      160
2. Query on the ES cluster

Querying by the index name set when the mapping table was created, the data shows up correctly — nice.
(Figure: query results for the hive_to_es index in Kibana)
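If you prefer the command line over Kibana, the same check can be done with curl against the index from es.resource, using the host from es.nodes (a sketch):

curl -s 'http://172.16.27.133:9200/hive_to_es/_search?pretty'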
Happy days 😄 — questions and comments are welcome.