PostgreSQL Oracle Compatibility - INDEX SKIP SCAN (an extreme recursive-query optimization): optimizing index scans on non-leading columns...

Tags

PostgreSQL , Oracle , index skip scan , non-leading column conditions , recursive query , subtree


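Background

When a query's predicate is on a non-leading column of a composite index, how can an index scan still be used efficiently?

Oracle handles this case efficiently with an index skip scan:

An index skip scan is typically used when the WHERE clause does not reference the leading column of the index but does reference other indexed columns, and the leading column has few distinct values.

In that situation the database logically splits the composite index into multiple subindexes and searches each subindex in turn for the values that the WHERE clause specifies on the non-leading columns.

Usage: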

/*+ INDEX_SS ( [ @ qb_name ] tablespec [ indexspec [ indexspec ]... ] ) */  

The INDEX_SS hint instructs the optimizer to perform an index skip scan for the specified table. If the statement uses an index range scan, then Oracle scans the index entries in ascending order of their indexed values. In a partitioned index, the results are in ascending order within each partition. Each parameter serves the same purpose as in "INDEX Hint". For example:

SELECT /*+ INDEX_SS(e emp_name_ix) */ last_name FROM employees e WHERE first_name = 'Steven';  

The following is quoted from Oracle Performance Tuning:

Index skip scans improve index scans by nonprefix columns. Often, scanning index blocks is faster than scanning table data blocks.

Skip scanning lets a composite index be split logically into smaller subindexes. In skip scanning, the initial column of the composite index is not specified in the query. In other words, it is skipped.

The number of logical subindexes is determined by the number of distinct values in the initial column. Skip scanning is advantageous if there are few distinct values in the leading column of the composite index and many distinct values in the nonleading key of the index.

Example 13-5 Index Skip Scan

Consider, for example, a table

employees(  
sex,   
employee_id,  
address  
)   

with a composite index on

(sex, employee_id).   

Splitting this composite index would result in two logical subindexes, one for M and one for F.

For this example, suppose you have the following index data:

('F',98)('F',100)('F',102)('F',104)('M',101)('M',103)('M',105)  

The index is split logically into the following two subindexes:

The first subindex has the keys with the value F.

The second subindex has the keys with the value M.

The column sex is skipped in the following query:

SELECT * FROM employees WHERE employee_id = 101;

A complete scan of the index is not performed, but the subindex with the value F is searched first, followed by a search of the subindex with the value M.
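
The subindex-by-subindex search described above can be modeled in a few lines. The Python sketch below only illustrates the idea and is not Oracle's implementation: the sorted list stands in for the index entries from the example, and the function name `skip_scan` and the `math.inf` sentinel used to jump to the next leading value are assumptions made for this sketch.

```python
import math
from bisect import bisect_left

# Entries of the composite index (sex, employee_id), in index order.
INDEX = [("F", 98), ("F", 100), ("F", 102), ("F", 104),
         ("M", 101), ("M", 103), ("M", 105)]


def skip_scan(index, employee_id):
    """Find entries with the given employee_id without a condition on sex.

    Probes one logical subindex per distinct leading value instead of
    reading every entry. Returns (matches, subindexes_probed).
    """
    matches = []
    probed = 0
    pos = 0
    while pos < len(index):
        leading = index[pos][0]          # leading value of this subindex
        probed += 1
        # Binary search inside the current subindex for (leading, employee_id).
        hit = bisect_left(index, (leading, employee_id), lo=pos)
        while hit < len(index) and index[hit] == (leading, employee_id):
            matches.append(index[hit])
            hit += 1
        # Jump to the first entry of the next leading value; (leading, inf)
        # sorts after every real entry that shares this leading value.
        pos = bisect_left(index, (leading, math.inf), lo=pos)
    return matches, probed


print(skip_scan(INDEX, 101))   # ([('M', 101)], 2)
```

The F subindex is probed first and yields nothing, then the M subindex yields the match, which is the order the quoted text describes. The number of probes grows with the number of distinct leading values, not with the number of index entries.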

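PostgreSQL without skip scan

PostgreSQL can use an index scan for a condition on a non-leading column, but it has to scan the entire index.

Example

1. Create the test table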

postgres=# create table t(id int, c1 int);  
CREATE TABLE  

2. Insert 10 million rows of test data

postgres=# insert into t select random()*1 , id from generate_series(1,10000000) id;  
INSERT 0 10000000  

3. Create the multicolumn index

postgres=# create index idx_t on t(id,c1);  
CREATE INDEX  

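4. Query on the non-leading column. The same query is shown under each scan type below; the planner settings presumably used to force each plan are not shown in the original.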

index only scan

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t where c1=1;
                                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using idx_t on public.t  (cost=10000000000.43..10000105164.89 rows=1 width=8) (actual time=0.043..152.288 rows=1 loops=1)
   Output: id, c1
   Index Cond: (t.c1 = 1)
   Heap Fetches: 0
   Buffers: shared hit=27326
 Execution time: 152.328 ms
(6 rows)

index scan

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t where c1=1;
                                                       QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_t on public.t  (cost=0.43..105165.99 rows=1 width=8) (actual time=0.022..151.845 rows=1 loops=1)
   Output: id, c1
   Index Cond: (t.c1 = 1)
   Buffers: shared hit=27326
 Execution time: 151.881 ms
(5 rows)

bitmap scan

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t where c1=1;
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.t  (cost=105164.88..105166.00 rows=1 width=8) (actual time=151.731..151.732 rows=1 loops=1)
   Output: id, c1
   Recheck Cond: (t.c1 = 1)
   Heap Blocks: exact=1
   Buffers: shared hit=27326
   ->  Bitmap Index Scan on idx_t  (cost=0.00..105164.88 rows=1 width=0) (actual time=151.721..151.721 rows=1 loops=1)
         Index Cond: (t.c1 = 1)
         Buffers: shared hit=27325
 Execution time: 151.777 ms
(9 rows)

seq scan (full table scan)

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t where c1=1;
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Seq Scan on public.t  (cost=0.00..169248.41 rows=1 width=8) (actual time=0.014..594.535 rows=1 loops=1)
   Output: id, c1
   Filter: (t.c1 = 1)
   Rows Removed by Filter: 9999999
   Buffers: shared hit=44248
 Execution time: 594.568 ms
(6 rows)

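The index-based scans need no Filter step and read fewer blocks (27326 versus 44248 shared buffers), so they are somewhat faster than the sequential scan (about 152 ms versus 595 ms). They still read every page of the index, however, so none of them counts as a skip scan.

So how can PostgreSQL be made to perform an index skip scan?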

PostgreSQL skip scan

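The principle is essentially the same as Oracle's: supply conditions on the leading column and scan once per leading value, which produces the effect of a skip scan (that is, a scan of multiple subtrees).

Likewise, this works best when the leading column has few distinct values.

PostgreSQL's recursive query syntax can deliver this speedup. The same technique is also used to obtain count(distinct), distinct values, and so on.

"An extreme recursive optimization for distinct xx and count(distinct xx) - index-converging (skip scan) scans"

For example, with this method we can quickly obtain the distinct values of the leading column: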

with recursive skip as (
  (
    select min(t.id) as id from t where t.id is not null
  )
  union all
  (
    select (select min(t.id) as id from t where t.id > s.id and t.id is not null)
      from skip s where s.id is not null
  )  -- the "where s.id is not null" here is required; without it the recursion never terminates
)
select id from skip ;
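
The recursion above amounts to repeatedly asking the index for the smallest id greater than the previous one. As a rough model (a sketch under stated assumptions, not PostgreSQL internals), the Python below uses a sorted list in place of the index on id and `bisect_right` in place of the `min(t.id) ... where t.id > s.id` probe; `distinct_leading_values` is a name invented for this illustration.

```python
from bisect import bisect_right


def distinct_leading_values(sorted_ids):
    """Return the distinct values of a sorted list, one probe per value.

    Mirrors the recursive CTE: start from min(id), then keep fetching
    min(id) WHERE id > previous until no larger id exists.
    """
    values = []
    pos = 0                                   # position of min(id)
    while pos < len(sorted_ids):              # no row left ends the recursion
        current = sorted_ids[pos]
        values.append(current)
        # First position whose id is strictly greater than current.
        pos = bisect_right(sorted_ids, current, lo=pos)
    return values


ids = sorted([0, 1, 1, 0, 1, 0, 0, 1])        # like random()*1 cast to int
print(distinct_leading_values(ids))           # [0, 1]
```

Unlike this sketch, the SQL version also emits one final NULL row: the probe after the largest id finds nothing, and that NULL is what the `s.id is not null` filter uses to stop the recursion.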

Then wrap it in the following SQL to get the effect of a skip scan:

explain (analyze,verbose,timing,costs,buffers) select * from t where id in
(
  with recursive skip as (
    (
      select min(t.id) as id from t where t.id is not null
    )
    union all
    (
      select (select min(t.id) as id from t where t.id > s.id and t.id is not null)
        from skip s where s.id is not null
    )  -- the "where s.id is not null" here is required; without it the recursion never terminates
  )
  select id from skip
) and c1=1
union all
select * from t where id is null and c1=1;

Or

explain (analyze,verbose,timing,costs,buffers) select * from t where id = any(array
(
  with recursive skip as (
    (
      select min(t.id) as id from t where t.id is not null
    )
    union all
    (
      select (select min(t.id) as id from t where t.id > s.id and t.id is not null)
        from skip s where s.id is not null
    )  -- the "where s.id is not null" here is required; without it the recursion never terminates
  )
  select id from skip
)) and c1=1
union all
select * from t where id is null and c1=1;

Look at the execution plan. The result is much better:

                                                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=55.00..215.22 rows=2 width=8) (actual time=0.127..0.138 rows=1 loops=1)
   Buffers: shared hit=21
   ->  Nested Loop  (cost=55.00..213.64 rows=1 width=8) (actual time=0.126..0.127 rows=1 loops=1)
         Output: t.id, t.c1
         Buffers: shared hit=18
         ->  HashAggregate  (cost=54.57..55.58 rows=101 width=4) (actual time=0.108..0.109 rows=3 loops=1)
               Output: skip.id
               Group Key: skip.id
               Buffers: shared hit=11
               ->  CTE Scan on skip  (cost=51.29..53.31 rows=101 width=4) (actual time=0.052..0.102 rows=3 loops=1)
                     Output: skip.id
                     Buffers: shared hit=11
                     CTE skip
                       ->  Recursive Union  (cost=0.46..51.29 rows=101 width=4) (actual time=0.050..0.099 rows=3 loops=1)
                             Buffers: shared hit=11
                             ->  Result  (cost=0.46..0.47 rows=1 width=4) (actual time=0.049..0.049 rows=1 loops=1)
                                   Output: $1
                                   Buffers: shared hit=4
                                   InitPlan 3 (returns $1)
                                     ->  Limit  (cost=0.43..0.46 rows=1 width=4) (actual time=0.045..0.046 rows=1 loops=1)
                                           Output: t_3.id
                                           Buffers: shared hit=4
                                           ->  Index Only Scan using idx_t on public.t t_3  (cost=0.43..205165.21 rows=10000033 width=4) (actual time=0.045..0.045 rows=1 loops=1)
                                                 Output: t_3.id
                                                 Index Cond: (t_3.id IS NOT NULL)
                                                 Heap Fetches: 0
                                                 Buffers: shared hit=4
                             ->  WorkTable Scan on skip s  (cost=0.00..4.88 rows=10 width=4) (actual time=0.015..0.015 rows=1 loops=3)
                                   Output: (SubPlan 2)
                                   Filter: (s.id IS NOT NULL)
                                   Rows Removed by Filter: 0
                                   Buffers: shared hit=7
                                   SubPlan 2
                                     ->  Result  (cost=0.46..0.47 rows=1 width=4) (actual time=0.018..0.019 rows=1 loops=2)
                                           Output: $3
                                           Buffers: shared hit=7
                                           InitPlan 1 (returns $3)
                                             ->  Limit  (cost=0.43..0.46 rows=1 width=4) (actual time=0.018..0.018 rows=0 loops=2)
                                                   Output: t_2.id
                                                   Buffers: shared hit=7
                                                   ->  Index Only Scan using idx_t on public.t t_2  (cost=0.43..76722.42 rows=3333344 width=4) (actual time=0.017..0.017 rows=0 loops=2)
                                                         Output: t_2.id
                                                         Index Cond: ((t_2.id > s.id) AND (t_2.id IS NOT NULL))
                                                         Heap Fetches: 0
                                                         Buffers: shared hit=7
         ->  Index Only Scan using idx_t on public.t  (cost=0.43..1.56 rows=1 width=8) (actual time=0.005..0.005 rows=0 loops=3)
               Output: t.id, t.c1
               Index Cond: ((t.id = skip.id) AND (t.c1 = 1))
               Heap Fetches: 0
               Buffers: shared hit=7
   ->  Index Only Scan using idx_t on public.t t_1  (cost=0.43..1.56 rows=1 width=8) (actual time=0.010..0.010 rows=0 loops=1)
         Output: t_1.id, t_1.c1
         Index Cond: ((t_1.id IS NULL) AND (t_1.c1 = 1))
         Heap Fetches: 0
         Buffers: shared hit=3
 Execution time: 0.256 ms
(56 rows)

Execution time drops from more than 150 ms to 0.256 ms, and shared buffer hits drop from 27326 to 21.
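
Because the rewrite changes the shape of the query, it is worth checking that it still returns exactly the same rows as the plain `where c1=1` query, including rows whose id is NULL. The sketch below runs that check with Python's built-in sqlite3 module, used here purely as a stand-in so the example runs without a PostgreSQL server. The small data set is made up for illustration, and plans or timings from SQLite say nothing about PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, c1 INTEGER)")
# Two distinct leading values (0 and 1) plus two rows with a NULL id.
rows = [(i % 2, i) for i in range(1, 1001)]
rows += [(None, 1), (None, 7)]
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
conn.execute("CREATE INDEX idx_t ON t (id, c1)")

PLAIN = "SELECT id, c1 FROM t WHERE c1 = 1"

# Same structure as the rewritten query in the article.
REWRITTEN = """
WITH RECURSIVE skip(id) AS (
    SELECT min(id) FROM t WHERE id IS NOT NULL
    UNION ALL
    SELECT (SELECT min(t.id) FROM t WHERE t.id > s.id AND t.id IS NOT NULL)
    FROM skip AS s WHERE s.id IS NOT NULL
)
SELECT id, c1 FROM t WHERE id IN (SELECT id FROM skip) AND c1 = 1
UNION ALL
SELECT id, c1 FROM t WHERE id IS NULL AND c1 = 1
"""


def result_set(sql):
    """Run a query and return its rows in a comparable, order-free form."""
    return sorted(conn.execute(sql).fetchall(), key=repr)


plain_rows = result_set(PLAIN)
rewritten_rows = result_set(REWRITTEN)
print(plain_rows == rewritten_rows, plain_rows)
```

The second branch of the `union all` matters here: `id in (...)` can never match a NULL id, so without the `id is null and c1=1` branch the rewritten query would silently drop such rows.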

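Kernel-level optimization

The approach would be similar to Oracle's, or equivalently to the recursive rewrite shown earlier.

Improving the optimizer along these lines would deliver the effect of an index skip scan without any need to rewrite the SQL.

References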

"An extreme recursive optimization for distinct xx and count(distinct xx) - index-converging (skip scan) scans"
