A Brief Look at the ORCA Optimizer: The Main QueryToDXL Flow (CDXLLogical + CDXLScalar)

Orca is the new query optimizer for Pivotal data management products, including GPDB and HAWQ. Orca is a modern top-down query optimizer based on the Cascades optimization framework. While many Cascades optimizers are tightly coupled with their host systems, a unique feature of Orca is its ability to run outside the database system as a stand-alone optimizer. This ability is crucial to supporting products with different computing architectures (e.g., MPP and Hadoop) using one optimizer. It also allows leveraging the extensive legacy of relational optimization in new query processing paradigms like Hadoop. Furthermore, running the optimizer as a stand-alone product enables elaborate testing without going through the monolithic structure of a database system.
Decoupling the optimizer from the database system requires building a communication mechanism to process queries. Orca includes a framework for exchanging information between the optimizer and the database system called Data eXchange Language (DXL). The framework uses an XML-based language to encode the necessary information for communication, such as input queries, output plans and metadata. Overlaid on DXL is a simple communication protocol to send the initial query structure and retrieve the optimized plan. A major benefit of DXL is packaging Orca as a stand-alone product. Figure 2 shows the interaction between Orca and an external database system. The input to Orca is a DXL query. The output of Orca is a DXL plan. During optimization, the database system can be queried for metadata (e.g., table definitions). Orca abstracts metadata access details by allowing the database system to register a metadata provider (MD Provider) that is responsible for serializing metadata into DXL before it is sent to Orca. Metadata can also be consumed from regular files containing metadata objects serialized in DXL format.

The database system needs to include translators that consume/emit data in DXL format. The Query2DXL translator converts a query parse tree into a DXL query, while the DXL2Plan translator converts a DXL plan into an executable plan. The implementation of such translators is done completely outside Orca, which allows multiple systems to use Orca by providing the appropriate translators. The architecture of Orca is highly extensible; all components can be replaced individually and configured separately. Figure 3 shows the different components of Orca. We briefly describe these components as follows.
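The round trip described above (parse tree to DXL query, DXL query to DXL plan, DXL plan to executable plan, with metadata pulled through a registered provider) can be sketched in a few lines. This is only a toy model: ToyOptimizer, RegisterMDProvider, RunPipeline and the string "documents" are hypothetical names invented for illustration, not Orca's API, and real DXL is XML rather than concatenated strings.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Toy stand-in: real DXL documents are XML; plain strings keep the sketch short.
using DxlDocument = std::string;

// An MD provider serializes one metadata object (e.g. a table definition) to DXL.
using MDProvider = std::function<DxlDocument(const std::string &object_name)>;

class ToyOptimizer {
public:
    // The host registers how metadata is fetched; the optimizer never reads
    // the catalog directly.
    void RegisterMDProvider(MDProvider provider) { provider_ = std::move(provider); }

    // Input: a DXL query. Output: a DXL plan. Metadata is requested on demand.
    DxlDocument Optimize(const DxlDocument &dxl_query, const std::string &table) const {
        assert(provider_ && "an MD provider must be registered first");
        return "<Plan>" + dxl_query + provider_(table) + "</Plan>";
    }

private:
    MDProvider provider_;
};

// Host-side translators; they live outside the optimizer.
DxlDocument Query2DXL(const std::string &parse_tree) {
    return "<Query>" + parse_tree + "</Query>";
}

std::string DXL2Plan(const DxlDocument &dxl_plan) {
    return "executable:" + dxl_plan;
}

// End-to-end flow: parse tree -> DXL query -> DXL plan -> executable plan.
std::string RunPipeline(const std::string &parse_tree, const std::string &table) {
    ToyOptimizer optimizer;
    optimizer.RegisterMDProvider([](const std::string &name) {
        return "<Relation Name=\"" + name + "\"/>";
    });
    return DXL2Plan(optimizer.Optimize(Query2DXL(parse_tree), table));
}
```

The point of the design is that the optimizer only ever sees DXL: swapping the host system means swapping the two translators and the provider, not the optimizer.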

CTranslatorQueryToDXL

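The main QueryToDXL call flow lives in the OptimizeTask function. The work itself is done by the CTranslatorQueryToDXL class, and QueryToDXLInstance is the factory function of that class. The CTranslatorQueryToDXL constructor depends on the metadata accessor (mda) and the Query tree, and the conversion is carried out by the main function TranslateQueryToDXL.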
CTranslatorQueryToDXL::QueryToDXLInstance is a static factory function that creates a new CTranslatorQueryToDXL object for translating the given top-level query. Note that it makes use of the CContextQueryToDXL class.
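A minimal sketch of the static-factory pattern mentioned above, under the assumption that the factory creates one shared context for the top-level query and that nested translators reuse it. All types here are simplified, hypothetical stand-ins (ContextQueryToDXL only holds a column-id counter, ForSubquery and NextColId are invented for the example); the real classes carry a memory pool and considerably more state.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical, simplified stand-ins for the types named in the text.
struct Query { std::string text; };
struct MDAccessor {};

// Shared translation state: counters that must stay unique across all
// nested translators of one top-level query.
struct ContextQueryToDXL { unsigned next_colid = 1; };

class TranslatorQueryToDXL {
public:
    // Static factory for the top-level query: fresh context, query level 0.
    static std::unique_ptr<TranslatorQueryToDXL> QueryToDXLInstance(
        MDAccessor *mda, const Query *query) {
        auto context = std::make_shared<ContextQueryToDXL>();
        return std::unique_ptr<TranslatorQueryToDXL>(
            new TranslatorQueryToDXL(std::move(context), mda, query, 0));
    }

    // A translator for a nested query shares the context, one level deeper.
    std::unique_ptr<TranslatorQueryToDXL> ForSubquery(const Query *subquery) const {
        return std::unique_ptr<TranslatorQueryToDXL>(
            new TranslatorQueryToDXL(context_, mda_, subquery, query_level_ + 1));
    }

    unsigned NextColId() { return context_->next_colid++; }
    unsigned QueryLevel() const { return query_level_; }

private:
    // Private constructor: callers must go through the factory.
    TranslatorQueryToDXL(std::shared_ptr<ContextQueryToDXL> context,
                         MDAccessor *mda, const Query *query, unsigned query_level)
        : context_(std::move(context)), mda_(mda), query_(query),
          query_level_(query_level) {
        assert(query_ != nullptr);
    }

    std::shared_ptr<ContextQueryToDXL> context_;
    MDAccessor *mda_;
    const Query *query_;
    unsigned query_level_;
};
```

Because the counter lives in the shared context, column ids handed out by a subquery translator never collide with those of its parent.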
The CTranslatorQueryToDXL class is implemented in src\backend\gpopt\translate\CTranslatorQueryToDXL.cpp.

  • CTranslatorQueryToDXL.h includes: CContextQueryToDXL.h + CMappingVarColId.h + CTranslatorScalarToDXL.h + CTranslatorUtils.h + CDXLNode.h
  • CTranslatorQueryToDXL.cpp includes: CCTEListEntry.h + CQueryMutators.h + CTranslatorDXLToPlStmt.h + CTranslatorRelcacheToDXL.h + CDXLDatumInt8.h + CDXLScalarBooleanTest.h + dxlops.h + dxltokens.h + CMDIdGPDBCtas.h + CMDTypeBoolGPDB.h + IMDAggregate.h + IMDScalarOp.h + IMDTypeBool.h + IMDTypeInt8.h. The important members of the class are:
    CTranslatorScalarToDXL *m_scalar_translator; // scalar translator used to convert scalar operation into DXL.
    CMappingVarColId *m_var_to_colid_map; // holds the var to col id information mapping
    HMUlCTEListEntry *m_query_level_to_cte_map; // hash map that maintains the list of CTEs defined at a particular query level (key: query level, value: the list of CTEs)
    CDXLNodeArray *m_dxl_cte_producers; // list of CTE producers
    UlongBoolHashMap *m_cteid_at_current_query_level_map; // CTE producer IDs defined at the current query level

CTranslatorQueryToDXL::CTranslatorQueryToDXL(CContextQueryToDXL *context, CMDAccessor *md_accessor, const CMappingVarColId *var_colid_mapping, Query *query, ULONG query_level, BOOL is_top_query_dml, HMUlCTEListEntry *query_level_to_cte_map)

  1. Call CheckSupportedCmdType(query) and CheckRangeTable(query). WITH CHECK OPTION views are not supported yet.
  2. If var_colid_mapping is not NULL, copy it into m_var_to_colid_map; otherwise initialize a new, empty mapping.
  3. If query_level_to_cte_map is not NULL, walk it one query level at a time and insert the CTE lists of the outer levels (levels smaller than the current query level) into m_query_level_to_cte_map. This guarantees that the query at the current level only sees CTEs defined in enclosing queries.
  4. CheckUnsupportedNodeTypes(query) checks whether the query tree contains unsupported node types. CheckSirvFuncsWithoutFromClause(query) checks whether the query has SIRV functions in the target list without a FROM clause.
  5. Normalize the query first: m_query = CQueryMutators::NormalizeQuery(m_mp, m_md_accessor, query, query_level)
  6. If m_query->cteList is not empty, call ConstructCTEProducerList(m_query->cteList, query_level)
  7. Create the scalar translator: m_scalar_translator = GPOS_NEW(m_mp)CTranslatorScalarToDXL(m_context, m_md_accessor, m_query_level, m_query_level_to_cte_map, m_dxl_cte_producers)
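Step 3 of the constructor is essentially a filtered copy of a map keyed by query level. The sketch below illustrates that idea with standard containers; CteList, LevelToCteMap and VisibleOuterCtes are hypothetical names chosen for the example, and CTEs are reduced to their names. It is not the hash-map code used by the real class.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical simplification: a CTE list is just a list of CTE names.
using CteList = std::vector<std::string>;
// Key: query level. Value: the CTEs defined at that level.
using LevelToCteMap = std::map<unsigned, CteList>;

// Copy only the entries that belong to enclosing query levels, so the
// translator for `query_level` cannot see CTEs defined at its own level
// or deeper by some other branch of the query.
LevelToCteMap VisibleOuterCtes(const LevelToCteMap &outer, unsigned query_level) {
    LevelToCteMap visible;
    for (const auto &entry : outer) {
        if (entry.first < query_level) {
            visible.insert(entry);
        }
    }
    // Postcondition: nothing from the current level leaked in.
    assert(visible.count(query_level) == 0);
    return visible;
}
```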

TranslateQueryToDXL is the main driver function. Its flow is described here using TranslateSelectQueryToDXL as an example.

TranslateSelectQueryToDXL translates a Query into a DXL tree. The function allocates memory in the translator memory pool, and the caller is responsible for freeing it.

  1. Check permissions on the range table: CTranslatorUtils::CheckRTEPermissions(m_query->rtable)
  2. Construct CTEAnchor operators for the CTEs defined at the top level: CDXLNode *dxl_cte_anchor_top = NULL; CDXLNode *dxl_cte_anchor_bottom = NULL; ConstructCTEAnchors(m_dxl_cte_producers, &dxl_cte_anchor_top, &dxl_cte_anchor_bottom);
  3. Build child_dxlnode, depending on the shape of the query:
    If m_query->setOperations is not NULL, the query is a set operation such as UNION:
      child_dxlnode = TranslateSetOpToDXL(m_query->setOperations, m_query->targetList, output_attno_to_colid_mapping)
      CDXLLogicalSetOp *dxlop = CDXLLogicalSetOp::Cast(child_dxlnode->GetOperator());
      const CDXLColDescrArray *dxl_col_descr_array = dxlop->GetDXLColumnDescrArray();
      ForEach(lc, target_list) {
        TargetEntry *target_entry = (TargetEntry *) lfirst(lc);
        if (0 < target_entry->ressortgroupref) {
          ULONG colid = ((*dxl_col_descr_array)[resno - 1])->Id();
          AddSortingGroupingColumn( target_entry, sort_group_attno_to_colid_mapping, colid);
        }
        resno++;
      }
    If m_query->windowClause is not NULL:
      CDXLNode *dxlnode = TranslateFromExprToDXL(m_query->jointree)
      child_dxlnode = TranslateWindowToDXL(dxlnode, m_query->targetList, m_query->windowClause, m_query->sortClause, sort_group_attno_to_colid_mapping, output_attno_to_colid_mapping)
    Otherwise:
      child_dxlnode = TranslateGroupingSets(m_query->jointree, m_query->targetList, m_query->groupClause,m_query->hasAggs, sort_group_attno_to_colid_mapping,output_attno_to_colid_mapping);
  4. translate limit clause CDXLNode *limit_dxlnode = TranslateLimitToDXLGroupBy(m_query->sortClause, m_query->limitCount, m_query->limitOffset, child_dxlnode, sort_group_attno_to_colid_mapping);
  5. If m_query->targetList is not NULL, create m_dxl_query_output_cols by calling CreateDXLOutputCols(m_query->targetList, output_attno_to_colid_mapping)
  6. result_dxlnode = limit_dxlnode
  7. If dxl_cte_anchor_top is not NULL, the CTE anchors have to be added on top: dxl_cte_anchor_bottom->AddChild(result_dxlnode); result_dxlnode = dxl_cte_anchor_top;
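The control flow of the steps above (pick one of three translation branches, wrap the result in a limit, then hang the whole tree under the chain of CTE anchors) can be sketched on a toy tree. Everything in this block is hypothetical: Node, ToyQuery, TranslateSelect and Describe are illustration names, the operator names are plain strings, and the sketch assumes that the limit step returns its input unchanged when the query has no limit.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy tree node: an operator name plus owned children.
struct Node {
    std::string op;
    std::vector<std::unique_ptr<Node>> children;
    explicit Node(std::string o) : op(std::move(o)) {}
    void AddChild(std::unique_ptr<Node> c) { children.push_back(std::move(c)); }
};

std::unique_ptr<Node> Make(const std::string &op) {
    return std::unique_ptr<Node>(new Node(op));
}

// Hypothetical summary of the Query fields the driver looks at.
struct ToyQuery {
    bool has_set_ops = false;
    bool has_window = false;
    bool has_limit = false;
    unsigned num_ctes = 0;
};

std::unique_ptr<Node> TranslateSelect(const ToyQuery &q) {
    // Step 3: exactly one of three branches builds the child tree.
    std::unique_ptr<Node> result;
    if (q.has_set_ops) {
        result = Make("SetOp");
    } else {
        result = Make(q.has_window ? "Window" : "GroupingSets");
        result->AddChild(Make("From"));
    }
    // Step 4: the limit goes on top of whatever step 3 produced.
    if (q.has_limit) {
        std::unique_ptr<Node> limit = Make("Limit");
        limit->AddChild(std::move(result));
        result = std::move(limit);
    }
    // Step 7: anchors form a chain; the tree hangs under the bottom anchor
    // and the top anchor becomes the new root.
    if (q.num_ctes > 0) {
        std::unique_ptr<Node> top = Make("CTEAnchor");
        Node *bottom = top.get();
        for (unsigned i = 1; i < q.num_ctes; ++i) {
            std::unique_ptr<Node> next = Make("CTEAnchor");
            Node *raw = next.get();
            bottom->AddChild(std::move(next));
            bottom = raw;
        }
        bottom->AddChild(std::move(result));
        result = std::move(top);
    }
    assert(result != nullptr);
    return result;
}

// Render a tree as nested text, e.g. "Limit(SetOp)".
std::string Describe(const Node &n) {
    std::string s = n.op;
    if (!n.children.empty()) {
        s += "(";
        for (std::size_t i = 0; i < n.children.size(); ++i) {
            if (i > 0) s += ",";
            s += Describe(*n.children[i]);
        }
        s += ")";
    }
    return s;
}
```

For a UNION query with a LIMIT and two top-level CTEs, Describe yields CTEAnchor(CTEAnchor(Limit(SetOp))), which mirrors the bottom->AddChild / result = top pair in step 7.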

CDXLLogical

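The CDXLNode class has four important members (only two are covered for now). m_dxl_op is a variable of type CDXLOperator; in the QueryToDXL flow it holds one of the CDXLOperator subclasses CDXLLogical or CDXLScalar. m_dxl_array is the array of child nodes that belong to this node; each child is itself a CDXLNode whose operator is again a CDXLLogical or a CDXLScalar. The CDXLLogical subclasses currently supported by ORCA are shown in the following figure.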
[Figure: CDXLLogical subclasses currently supported by ORCA]
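The relationship between a node, its operator and its children can be illustrated with a small class sketch. The names below (Operator, LogicalOperator, ScalarOperator, Node, MakeNode, CountKind) are hypothetical simplifications of CDXLOperator, CDXLLogical, CDXLScalar and CDXLNode; the real classes use ORCA's memory pool and reference counting rather than std::unique_ptr.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

enum class OperatorKind { Logical, Scalar };

// Base operator: every node carries exactly one of these.
class Operator {
public:
    virtual ~Operator() = default;
    virtual OperatorKind Kind() const = 0;
    const std::string &Name() const { return name_; }
protected:
    explicit Operator(std::string name) : name_(std::move(name)) {}
private:
    std::string name_;
};

class LogicalOperator : public Operator {
public:
    explicit LogicalOperator(std::string name) : Operator(std::move(name)) {}
    OperatorKind Kind() const override { return OperatorKind::Logical; }
};

class ScalarOperator : public Operator {
public:
    explicit ScalarOperator(std::string name) : Operator(std::move(name)) {}
    OperatorKind Kind() const override { return OperatorKind::Scalar; }
};

// A node owns its operator and an array of child nodes.
class Node {
public:
    explicit Node(std::unique_ptr<Operator> op) : op_(std::move(op)) {
        assert(op_ != nullptr);
    }
    Node *AddChild(std::unique_ptr<Node> child) {
        children_.push_back(std::move(child));
        return children_.back().get();
    }
    const Operator &Op() const { return *op_; }
    const std::vector<std::unique_ptr<Node>> &Children() const { return children_; }
private:
    std::unique_ptr<Operator> op_;
    std::vector<std::unique_ptr<Node>> children_;
};

std::unique_ptr<Node> MakeNode(OperatorKind kind, const std::string &name) {
    std::unique_ptr<Operator> op;
    if (kind == OperatorKind::Logical) {
        op.reset(new LogicalOperator(name));
    } else {
        op.reset(new ScalarOperator(name));
    }
    return std::unique_ptr<Node>(new Node(std::move(op)));
}

// Count the operators of one kind in the subtree rooted at `node`.
std::size_t CountKind(const Node &node, OperatorKind kind) {
    std::size_t n = node.Op().Kind() == kind ? 1 : 0;
    for (const auto &child : node.Children()) {
        n += CountKind(*child, kind);
    }
    return n;
}
```

A single tree can therefore mix logical and scalar operators freely, for example a logical filter node with a logical scan child and a scalar predicate subtree.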
Take TranslateRTEToDXLLogicalGet (Returns a CDXLNode representing a from relation range table entry) as an example of how a child node of the Query tree is converted into a DXL node. First, a word on the RangeTblEntry node: A range table entry may represent a plain relation, a sub-select in FROM, or the result of a JOIN clause. (Only explicit JOIN syntax produces an RTE, not the implicit join resulting from multiple FROM items. This is because we only need the RTE to deal with SQL features like outer joins and join-output-column aliasing.) Other special RTE types also exist, as indicated by RTEKind: RTE_RELATION (ordinary relation reference), RTE_SUBQUERY (subquery in FROM), RTE_JOIN (join), RTE_FUNCTION (function in FROM), RTE_VALUES (VALUES (<exprlist>), (<exprlist>), ...), RTE_VOID (CDB: deleted RTE), RTE_CTE (common table expr (WITH list element)), RTE_TABLEFUNCTION (CDB: Functions over multiset input). TranslateRTEToDXLLogicalGet only handles RangeTblEntry nodes of type RTE_RELATION (ordinary relation reference).

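  1. Construct a table descriptor for the range table entry.
  2. Use the metadata accessor to fetch the IMDRelation metadata object md_rel for the table descriptor.
  3. Create a different CDXLLogical depending on the storage type reported by md_rel: CDXLLogicalExternalGet for external tables, CDXLLogicalGet for all other tables.
  4. Create the CDXLNode and assign the dxl_op object created in step 3 to its m_dxl_op member.
  5. Record the table's column information in m_var_to_colid_map of CTranslatorQueryToDXL.
  6. Make note of the operator classes used in the distribution key.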
CDXLNode *
CTranslatorQueryToDXL::TranslateRTEToDXLLogicalGet(const RangeTblEntry *rte,
                                                   ULONG rt_index,
                                                   ULONG  //current_query_level
)
{
    if (false == rte->inh)
    {
        GPOS_ASSERT(RTE_RELATION == rte->rtekind);
        // RangeTblEntry::inh is set to false iff there is ONLY in the FROM
        // clause. c.f. transformTableEntry, called from transformFromClauseItem
        GPOS_RAISE(gpdxl::ExmaDXL, gpdxl::ExmiQuery2DXLUnsupportedFeature,
                   GPOS_WSZ_LIT("ONLY in the FROM clause"));
    }

    // construct table descriptor for the scan node from the range table entry
    CDXLTableDescr *dxl_table_descr = CTranslatorUtils::GetTableDescr(
        m_mp, m_md_accessor, m_context->m_colid_counter, rte,
        &m_context->m_has_distributed_tables);

    CDXLLogicalGet *dxl_op = NULL;
    const IMDRelation *md_rel =
        m_md_accessor->RetrieveRel(dxl_table_descr->MDId());
    if (IMDRelation::ErelstorageExternal == md_rel->RetrieveRelStorageType())
    {
        dxl_op = GPOS_NEW(m_mp) CDXLLogicalExternalGet(m_mp, dxl_table_descr);
    }
    else
    {
        dxl_op = GPOS_NEW(m_mp) CDXLLogicalGet(m_mp, dxl_table_descr);
    }

    CDXLNode *dxl_node = GPOS_NEW(m_mp) CDXLNode(m_mp, dxl_op);

    // make note of new columns from base relation
    m_var_to_colid_map->LoadTblColumns(m_query_level, rt_index, dxl_table_descr);

    // make note of the operator classes used in the distribution key
    NoteDistributionPolicyOpclasses(rte);

    return dxl_node;
}
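To make the effect of the function easier to see outside the GPDB code base, here is a stand-alone toy version of the same steps: reject ONLY, look up the relation, choose the operator by storage type, and register one column id per column under the key (query level, range table index, attribute number). ToyTranslator, ToyRelation, ToyRte and the string result are all hypothetical; in particular, the std::map catalog stands in for the metadata accessor, and column ids are assigned inside the function rather than by a table descriptor.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <tuple>
#include <vector>

enum class Storage { Heap, External };

// Hypothetical metadata object for one relation.
struct ToyRelation {
    std::string name;
    Storage storage;
    std::vector<std::string> columns;
};

// Hypothetical range table entry: relation name plus the `inh` flag.
struct ToyRte {
    std::string relname;
    bool inh = true;
};

// (query level, range table index, attribute number) -> column id
using VarColIdMap = std::map<std::tuple<unsigned, unsigned, unsigned>, unsigned>;

struct ToyTranslator {
    std::map<std::string, ToyRelation> catalog;  // stand-in for the metadata accessor
    VarColIdMap var_to_colid;
    unsigned next_colid = 1;
    unsigned query_level = 0;

    std::string TranslateRteToLogicalGet(const ToyRte &rte, unsigned rt_index) {
        assert(rt_index >= 1);  // range table indexes are 1-based
        if (!rte.inh) {
            throw std::runtime_error("ONLY in the FROM clause");
        }
        // Steps 1-2: look up the relation's metadata.
        const ToyRelation &rel = catalog.at(rte.relname);
        // Steps 3-4: the storage type decides which operator is created.
        const std::string op = rel.storage == Storage::External
                                   ? "LogicalExternalGet"
                                   : "LogicalGet";
        // Step 5: every column of the base relation gets a fresh column id.
        for (unsigned attno = 1; attno <= rel.columns.size(); ++attno) {
            var_to_colid[std::make_tuple(query_level, rt_index, attno)] = next_colid++;
        }
        return op + "(" + rel.name + ")";
    }
};
```

Later stages that meet a column reference (level, table index, attribute number) can resolve it to a column id by a single lookup in the map, which is the role the text assigns to m_var_to_colid_map.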
