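In an earlier post I shared a single SQL statement that ran for two days. It scanned two large tables, which made the execution time very long, and simplifying the SQL brought a sizable improvement. Today let's see what else can be done.
The statement after the previous round of simplification looks like this: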
with tmp_logical_date as
 (SELECT logical_date
    FROM logical_date
   WHERE logical_date_type = 'R'
     AND expiration_date IS NULL)
SELECT trim(TO_CHAR(COUNT(distinct coll.entity_id), '000000000'))
  FROM cl1_coll_entity coll,
       table_bpm_step_inst bpm,
       table_bpm_step,
       ar1_account,
       csm_account,
       csm_pay_channel,
       customer,
       subscriber,
       ar1_billing_arrangement,
       ar1_address_name,
       charge_distribute,
       tmp_logical_date
 WHERE coll.entity_id(+) = csm_account.ban
   AND coll.proc_inst_id = bpm.parent2proc_inst
   AND bpm.step2step = table_bpm_step.objid
   AND bpm.status = 30
   AND coll.entity_id = ar1_account.account_id
   AND csm_account.ban = csm_pay_channel.ban
   -- AND ar1_account.account_id = ar1_aged_trial_balance.account_id
   AND csm_account.customer_id = customer.customer_id
   AND csm_account.customer_id = subscriber.customer_id
   AND ar1_account.account_id = ar1_billing_arrangement.account_id
   AND ar1_account.account_id = ar1_address_name.account_id
   AND ar1_address_name.address_type = 'ACC'
   and exists(
       (SELECT 1
          FROM ar1_aged_trial_balance
         WHERE aged_type = 'D'
           AND group_type = 'B'
           AND status = 'EFF'
           AND TRUNC(tmp_logical_date.logical_date - due_date) >= 0
           AND account_id = coll.entity_id
       )
   )
   AND subscriber.trx_id = charge_distribute.trx_id
   AND subscriber.subscriber_no = charge_distribute.agreement_no
   AND charge_distribute.target_pcn = csm_pay_channel.pym_channel_no
   AND csm_account.ban = csm_pay_channel.ban
   AND EXISTS
       (SELECT null -- cl1_treatment_activity.entity_id
          FROM cl1_treatment_activity, table_bpm_step_inst, table_bpm_step
         WHERE cl1_treatment_activity.step_id = table_bpm_step_inst.objid
           AND table_bpm_step_inst.step2step = table_bpm_step.objid
           AND table_bpm_step.NAME LIKE '%IVR%'
           AND table_bpm_step_inst.status = 65
           AND TO_DATE(TO_CHAR(cl1_treatment_activity.activity_date,
                               'YYYYMMDD'),
                       'YYYYMMDD') = tmp_logical_date.logical_date
           AND cl1_treatment_activity.entity_id = csm_account.ban)
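One small detail in the statement above may be worth a look. For a DATE value, TO_DATE(TO_CHAR(x, 'YYYYMMDD'), 'YYYYMMDD') simply drops the time portion, which is what TRUNC(x) does. The sketch below checks that against dual and then writes the same filter as a range on the bare column. The range form is only equivalent if logical_date.logical_date is stored without a time portion; that is an assumption, the statement itself does not tell us. Whether it actually helps depends on the indexes on cl1_treatment_activity, which are not known here.

```sql
-- Both expressions strip the time portion of a DATE, so this returns 'same'.
SELECT CASE
         WHEN TO_DATE(TO_CHAR(d, 'YYYYMMDD'), 'YYYYMMDD') = TRUNC(d)
         THEN 'same'
         ELSE 'different'
       END AS result
  FROM (SELECT SYSDATE AS d FROM dual);

-- Range form of the date filter used in the second EXISTS subquery.
-- Assumption: logical_date.logical_date is stored at midnight (no time portion).
SELECT COUNT(*)
  FROM cl1_treatment_activity act,
       logical_date ld
 WHERE ld.logical_date_type = 'R'
   AND ld.expiration_date IS NULL
   AND act.activity_date >= ld.logical_date
   AND act.activity_date <  ld.logical_date + 1;
```

Leaving the column free of function calls gives the optimizer the option of using a plain index on activity_date, if one exists.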
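Looking at the joins alone, this is a thorny statement: more than ten large tables are joined together. From a purely technical point of view, Oracle's analysis is quite thorough. Given the data volumes, it uses an index wherever an index makes sense, and its row estimates are close to reality.
That is also why adjusting the execution plan alone leaves little room for further improvement.
So let's try a different approach. I had lost track of all these joins, so I put together the diagram below, which shows clearly how the tables are connected.
The query is nominally built around cl1_coll_entity, but the diagram suggests the center of gravity has shifted: the table everything hangs off appears to be csm_account.
Look at how csm_account and cl1_coll_entity are joined. It is an outer join with the (+) on the cl1_coll_entity side, so every csm_account row is kept whether or not it has a match in cl1_coll_entity. In other words, csm_account holds the complete set of data and cl1_coll_entity holds a subset of it.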
coll.entity_id(+) = csm_account.ban
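One point worth checking for this predicate: in Oracle, a table that is outer-joined with (+) in one predicate but also appears in another join predicate without (+) no longer behaves as outer-joined, because the NULL-extended rows fail the second predicate. In the statement above, coll also appears in coll.proc_inst_id = bpm.parent2proc_inst and coll.entity_id = ar1_account.account_id, and neither carries a (+). A minimal sketch of the effect, using hypothetical tables oj_account, oj_coll and oj_step_inst (names and data are made up for illustration):

```sql
-- Hypothetical demo tables, not part of the real schema.
create table oj_account (ban number);
create table oj_coll (entity_id number, proc_inst_id number);
create table oj_step_inst (parent2proc_inst number);

insert into oj_account values (1);
insert into oj_account values (2);
insert into oj_account values (3);
insert into oj_coll values (1, 100);
insert into oj_step_inst values (100);

-- Outer join only: all 3 accounts come back, entity_id is NULL for ban 2 and 3.
select a.ban, c.entity_id
  from oj_coll c, oj_account a
 where c.entity_id(+) = a.ban;

-- Add a second predicate on c without (+): for ban 2 and 3 c.proc_inst_id is NULL,
-- the comparison is not true, and only the row for ban 1 survives.
select a.ban, c.entity_id
  from oj_coll c, oj_account a, oj_step_inst s
 where c.entity_id(+) = a.ban
   and c.proc_inst_id = s.parent2proc_inst;
```

If the real statement behaves the same way, the join between csm_account and cl1_coll_entity is effectively an inner join, which is consistent with treating cl1_coll_entity as the driving subset.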
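With that in mind, look at the joins inside the red box. Since csm_account is the complete set and is effectively read in full, much like a full table scan, the ring of joins hanging off it is redundant. The joins inside the box are all business-level relationships, complete mappings between these entities, and none of those tables contributes an extra filter condition.
A simple example illustrates this. We create two tables, csm_account and cl1_coll_entity: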
create table csm_account(id number);
insert into csm_account values(1);
insert into csm_account values(2);
insert into csm_account values(3);
create table cl1_coll_entity(id number);
insert into cl1_coll_entity values(1);
-- the two csm_account rows without a match come back with a NULL (blank) id
select coll.id from cl1_coll_entity coll,csm_account
where coll.id(+)=csm_account.id;
        ID
----------
         1


3 rows selected.
-- count(coll.id) ignores the NULLs
select count(coll.id) from cl1_coll_entity coll,csm_account
where coll.id(+)=csm_account.id;
COUNT(COLL.ID)
--------------
             1
1 row selected.
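Because the data in cl1_coll_entity is a subset of csm_account, the complete mappings that hang off csm_account have no effect at all on the cl1_coll_entity rows that are counted. Since they change nothing, there is no need to keep them.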
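The same idea can be sketched with one extra mapping table. The tables cm_account, cm_coll and cm_customer below are hypothetical, as is their data. The count is the same with and without the join to cm_customer as long as every account has a customer, and it changes as soon as that guarantee is broken, which is why the simplification rests on the business rule rather than on anything the SQL itself expresses.

```sql
-- Hypothetical demo tables, not part of the real schema.
create table cm_account (ban number, customer_id number);
create table cm_coll (entity_id number);
create table cm_customer (customer_id number);

insert into cm_account values (1, 10);
insert into cm_account values (2, 20);
insert into cm_account values (3, 30);
insert into cm_coll values (1);
insert into cm_customer values (10);
insert into cm_customer values (20);
insert into cm_customer values (30);

-- Without the customer join: returns 1.
select count(distinct c.entity_id)
  from cm_coll c, cm_account a
 where c.entity_id(+) = a.ban;

-- With the customer join, mapping complete: still returns 1.
select count(distinct c.entity_id)
  from cm_coll c, cm_account a, cm_customer cu
 where c.entity_id(+) = a.ban
   and a.customer_id = cu.customer_id;

-- Break the mapping for account 1 and the join now drops that account: returns 0.
delete from cm_customer where customer_id = 10;

select count(distinct c.entity_id)
  from cm_coll c, cm_account a, cm_customer cu
 where c.entity_id(+) = a.ban
   and a.customer_id = cu.customer_id;
```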
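The same reasoning applies to ar1_billing_arrangement, marked in red: its rows map many-to-one onto ar1_account, and that too is guaranteed entirely at the business level. Because the query only counts distinct entity_id values, the extra rows this join produces do not change the result.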
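A small sketch of the many-to-one case, again with hypothetical tables (mo_account and mo_billing) and made-up data. The join multiplies rows, which shows up in COUNT(*), but COUNT(DISTINCT ...) on the parent key is the same with or without it, provided every account has at least one billing arrangement.

```sql
-- Hypothetical demo tables, not part of the real schema.
create table mo_account (account_id number);
create table mo_billing (account_id number, ba_no number);

insert into mo_account values (1);
insert into mo_account values (2);
insert into mo_billing values (1, 1);
insert into mo_billing values (1, 2);
insert into mo_billing values (2, 3);

-- Row count grows with the join: returns 3.
select count(*)
  from mo_account a, mo_billing b
 where a.account_id = b.account_id;

-- Distinct count with the join: returns 2.
select count(distinct a.account_id)
  from mo_account a, mo_billing b
 where a.account_id = b.account_id;

-- Distinct count without the join: also returns 2.
select count(distinct a.account_id)
  from mo_account a;
```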
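After the simplification the joins look like this:
What started as a 14-table join ends up as an 8-table join, which is a substantial reduction.
This way of simplifying is worth keeping in mind for everyday tuning. When the business rules fully guarantee a relationship between the data, joining through it again and again is redundant. The database cannot know this on its own: at the technical level we simply do not have that detail.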
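Before dropping a join on the strength of a business guarantee, it may be worth confirming that the data really honors it. The queries below are a minimal sketch of such a check, using only table and column names that appear in the statement above. They assume these columns are the complete join keys, and they may themselves be expensive on large tables.

```sql
-- Accounts whose customer_id has no matching row in customer.
-- Expected to return 0 if the mapping is complete.
SELECT COUNT(*) AS accounts_without_customer
  FROM csm_account a
 WHERE NOT EXISTS (SELECT 1
                     FROM customer c
                    WHERE c.customer_id = a.customer_id);

-- Accounts with no billing arrangement at all.
-- Expected to return 0 if every account has at least one.
SELECT COUNT(*) AS accounts_without_ba
  FROM ar1_account a
 WHERE NOT EXISTS (SELECT 1
                     FROM ar1_billing_arrangement ba
                    WHERE ba.account_id = a.account_id);
```

A non-zero result would mean the removed join was acting as a filter after all, and the simplified statement could return a different count.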
Either way, the goal is the same: simpler logic and lower resource consumption.