Error shown in CDH

Cause of the problem:

hbase org.apache.hadoop.hbase.mapreduce.Import -Dmapred.job.queue.name=etl crawl:wechat_biz /hbase/test4

Running this import wrote too much data in a short period of time, which caused the write failures below.
18/09/11 09:44:27 INFO mapreduce.Job: Task Id : attempt_1536465059397_0003_m_000125_1, Status : FAILED
Error: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 404 actions: org.apache.hadoop.hbase.RegionTooBusyException: Over memstore limit=256.0M, regionName=2431744e123e49dee5f099876ebb8bff, server=testHostName,16020,1536467992250
        at org.apache.hadoop.hbase.regionserver.HRegion.checkResources(HRegion.java:4194)
        at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3815)
        at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3755)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:1027)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicBatchOp(RSRpcServices.java:959)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:922)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2666)
        at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:42014)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
: 404 times, servers with issues: fwqzx011.zh,16020,1536467992250
        at org.apache.hadoop.hbase.client.BatchErrors.makeException(BatchErrors.java:54)
        at org.apache.hadoop.hbase.client.AsyncRequestFutureImpl.getErrors(AsyncRequestFutureImpl.java:1225)
        at org.apache.hadoop.hbase.client.BufferedMutatorImpl.doFlush(BufferedMutatorImpl.java:309)
        at org.apache.hadoop.hbase.client.BufferedMutatorImpl.mutate(BufferedMutatorImpl.java:203)
        at org.apache.hadoop.hbase.client.BufferedMutatorImpl.mutate(BufferedMutatorImpl.java:179)
        at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:143)
        at org.apache.hadoop.hbase.mapreduce.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:93)
        at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:670)
        at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
        at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
        at org.apache.hadoop.hbase.mapreduce.Import$Importer.processKV(Import.java:584)
        at org.apache.hadoop.hbase.mapreduce.Import$Importer.writeResult(Import.java:539)
        at org.apache.hadoop.hbase.mapreduce.Import$Importer.map(Import.java:522)
        at org.apache.hadoop.hbase.mapreduce.Import$Importer.map(Import.java:505)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1685)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
Cause: when a region's memstore grows past hbase.hregion.memstore.flush.size (128M by default), a flush is triggered. If data keeps arriving faster than it can be flushed and the memstore reaches a multiple of the flush size, the RegionServer throws RegionTooBusyException, rejects further writes, and the client starts retrying. The two relevant parameters default to:
hbase.hregion.memstore.flush.size=128M
hbase.hregion.memstore.block.multiplier=4
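
In other words, roughly, each region starts blocking writes once its memstore reaches

    hbase.hregion.memstore.flush.size × hbase.hregion.memstore.block.multiplier

which is 128M × 4 = 512M with the defaults above, and 512M × 8 = 4G with the adjusted values below. The "Over memstore limit=256.0M" in the log above suggests this cluster was running with a smaller product than the 512M default, so the limit was hit even sooner.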
Adjust the two parameters so that the import no longer hits this limit (an hbase-site.xml sketch of the same values follows below):
hbase.hregion.memstore.flush.size=512M
hbase.hregion.memstore.block.multiplier=8
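
For reference, a minimal hbase-site.xml sketch of these adjusted values, assuming they are applied to the RegionServer configuration (on CDH this is typically done through the HBase configuration or the hbase-site.xml safety valve in Cloudera Manager, followed by a RegionServer restart); note that the flush size is given in bytes here:

<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <!-- 512M expressed in bytes -->
  <value>536870912</value>
</property>
<property>
  <name>hbase.hregion.memstore.block.multiplier</name>
  <value>8</value>
</property>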
Reference for the hbase.hregion.memstore.block.multiplier parameter:
https://blog.csdn.net/zhangshenghang/article/details/82745205