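Contents
1. Do join / left join / full join statements filter out null values in the join key?
(1) join
(2) left join / full join
2. Does a group by statement sort its output?
1. Do join / left join / full join statements filter out null values in the join key?
(1) join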
sql:
explain
select a.into_id,a.into_bus_id,b.customer_type as customer_type
from
-- loan application intake table
(select * from dp_ods.o_fk_eagle_jsd_intopieces_s
where etl_date = '2023-09-06' and substr(into_time ,1,4) >= '2019' ) a
join
(select * from dp_ods.o_hyd_jsd_loan_order_s where etl_date = '2023-09-06') b
on a.into_id=b.loan_id
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: o_fk_eagle_jsd_intopieces_s
            Statistics: Num rows: 36940508 Data size: 360575316992 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: ((substr(into_time, 1, 4) >= '2019') and into_id is not null) (type: boolean)
              Statistics: Num rows: 12313502 Data size: 120191765823 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: into_id (type: string), into_bus_id (type: string)
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 12313502 Data size: 120191765823 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  key expressions: _col0 (type: string)
                  sort order: +
                  Map-reduce partition columns: _col0 (type: string)
                  Statistics: Num rows: 12313502 Data size: 120191765823 Basic stats: COMPLETE Column stats: NONE
                  value expressions: _col1 (type: string)
          TableScan
            alias: o_hyd_jsd_loan_order_s
            Statistics: Num rows: 44174590 Data size: 84039429353 Basic stats: COMPLETE Column stats: NONE
            Filter Operator
              predicate: loan_id is not null (type: boolean)
              Statistics: Num rows: 44174590 Data size: 84039429353 Basic stats: COMPLETE Column stats: NONE
              Select Operator
                expressions: loan_id (type: string), customer_type (type: int)
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 44174590 Data size: 84039429353 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  key expressions: _col0 (type: string)
                  sort order: +
                  Map-reduce partition columns: _col0 (type: string)
                  Statistics: Num rows: 44174590 Data size: 84039429353 Basic stats: COMPLETE Column stats: NONE
                  value expressions: _col1 (type: int)
      Reduce Operator Tree:
        Join Operator
          condition map:
               Inner Join 0 to 1
          keys:
            0 _col0 (type: string)
            1 _col0 (type: string)
          outputColumnNames: _col0, _col1, _col3
          Statistics: Num rows: 48592050 Data size: 92443374291 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: _col0 (type: string), _col1 (type: string), _col3 (type: int)
            outputColumnNames: _col0, _col1, _col2
            Statistics: Num rows: 48592050 Data size: 92443374291 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              Statistics: Num rows: 48592050 Data size: 92443374291 Basic stats: COMPLETE Column stats: NONE
              table:
                  input format: org.apache.hadoop.mapred.TextInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
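The Filter Operator predicates contain a condition that the original SQL does not: the join key on each side must be non-null (into_id is not null, loan_id is not null). So an inner join does filter out rows whose join key is null.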
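The added filter does not change the result, because under SQL comparison semantics `null = null` is not true, so a row with a null join key can never match in an inner join. A minimal sketch on made-up inline data (the CTE names `a` and `b`, the columns, and the values are hypothetical, and it assumes a Hive version that allows `select` without `from`):

```sql
-- Hypothetical toy data: each side has one row with a null join key.
with a as (
  select 1 as id, 'x' as v
  union all
  select cast(null as int) as id, 'y' as v
),
b as (
  select 1 as id, 'p' as w
  union all
  select cast(null as int) as id, 'q' as w
)
-- Inner join: only the id = 1 pair matches; the two null-key rows
-- never satisfy a.id = b.id, so dropping them early is safe.
select a.id, a.v, b.w
from a
join b
on a.id = b.id;
```

Only the row with `id = 1` comes back. Filtering the null keys before the shuffle simply avoids sending rows to the reducers that could never join.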
(2) left join / full join
For left join and full join, the execution plan contains no such null filter on the join key.
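One way to see why such a filter would be wrong for an outer join: rows from the preserved side must appear in the result even when their join key is null. A small sketch on made-up inline data (the CTE names `a` and `b`, the columns, and the values are hypothetical, and it assumes a Hive version that allows `select` without `from`):

```sql
-- Hypothetical toy data: each side has one row with a null join key.
with a as (
  select 1 as id, 'x' as v
  union all
  select cast(null as int) as id, 'y' as v
),
b as (
  select 1 as id, 'p' as w
  union all
  select cast(null as int) as id, 'q' as w
)
-- Left join: both rows of a are kept. The null-key row of a finds no
-- match, so its b.w is null. Changing "left join" to "full join" also
-- keeps the null-key row of b, with a.id and a.v null.
select a.id, a.v, b.w
from a
left join b
on a.id = b.id;
```

The left join returns two rows and the full join would return three, so the null-key rows of a preserved table cannot be discarded before the join. To check the real tables, prefix the original query with `explain` after replacing `join` with `left join` or `full join`, and look at the Filter Operator predicates.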
2. Does a group by statement sort its output?
explain
select date(create_time),max(actual_amount) max_actual_amount
from dp_ods.o_hyd_jsd_loan_order_s
where etl_date = '2023-09-06'
group by date(create_time)
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 depends on stages: Stage-1
  Stage-0 depends on stages: Stage-2

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: o_hyd_jsd_loan_order_s
            Statistics: Num rows: 44174590 Data size: 84039429353 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: CAST( create_time AS DATE) (type: date), actual_amount (type: double)
              outputColumnNames: _col0, _col1
              Statistics: Num rows: 44174590 Data size: 84039429353 Basic stats: COMPLETE Column stats: NONE
              Group By Operator
                aggregations: max(_col1)
                keys: _col0 (type: date)
                mode: hash
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 44174590 Data size: 84039429353 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  key expressions: _col0 (type: date)
                  sort order: +
                  Map-reduce partition columns: rand() (type: double)
                  Statistics: Num rows: 44174590 Data size: 84039429353 Basic stats: COMPLETE Column stats: NONE
                  value expressions: _col1 (type: double)
      Reduce Operator Tree:
        Group By Operator
          aggregations: max(VALUE._col0)
          keys: KEY._col0 (type: date)
          mode: partials
          outputColumnNames: _col0, _col1
          Statistics: Num rows: 44174590 Data size: 84039429353 Basic stats: COMPLETE Column stats: NONE
          File Output Operator
            compressed: false
            table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe

  Stage: Stage-2
    Map Reduce
      Map Operator Tree:
          TableScan
            Reduce Output Operator
              key expressions: _col0 (type: date)
              sort order: +
              Map-reduce partition columns: _col0 (type: date)
              Statistics: Num rows: 44174590 Data size: 84039429353 Basic stats: COMPLETE Column stats: NONE
              value expressions: _col1 (type: double)
      Reduce Operator Tree:
        Group By Operator
          aggregations: max(VALUE._col0)
          keys: KEY._col0 (type: date)
          mode: final
          outputColumnNames: _col0, _col1
          Statistics: Num rows: 22087295 Data size: 42019714676 Basic stats: COMPLETE Column stats: NONE
          File Output Operator
            compressed: false
            Statistics: Num rows: 22087295 Data size: 42019714676 Basic stats: COMPLETE Column stats: NONE
            table:
                input format: org.apache.hadoop.mapred.TextInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
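The plan shows that the group by key is sorted in ascending order (each Reduce Output Operator has sort order: + on the key), and the query results show the same ordering.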
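The sort seen here comes from the MapReduce shuffle in this particular plan. SQL itself makes no promise about the order of group by output, and with more than one reducer each reducer sorts only the keys it receives. When the order matters, a safer sketch is to state it explicitly (the alias `create_date` is introduced here for illustration and is not in the original query):

```sql
-- Same aggregation as above, with the ordering requested explicitly
-- instead of relying on the shuffle sort of the group by key.
select date(create_time) as create_date,
       max(actual_amount) as max_actual_amount
from dp_ods.o_hyd_jsd_loan_order_s
where etl_date = '2023-09-06'
group by date(create_time)
order by create_date;
```

The explicit `order by` may add a final sort stage to the plan, which is the cost of making the ordering independent of the execution engine and its settings.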