Chapter 2.2 StarRocks Table Design: Sort Keys and Data Models

This article covers the data models in StarRocks 2.5.4. If you find a mistake, please point it out.

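I. Data Model Overview

1.1 The Four Models

1.2 Sort Keys

1.2.1 Overview

1.2.2 Sort Keys by Model

1.2.3 Notes

II. Duplicate Key Model

2.1 Overview

2.2 Use Cases

2.3 CREATE TABLE Statement and Notes

III. Aggregate Key Model

3.1 Overview

3.2 Use Cases

3.3 How Aggregation Works

3.4 CREATE TABLE Statement and Notes

IV. Unique Key Model

4.1 Overview

4.2 Use Cases

4.3 How Updates Work

4.4 CREATE TABLE Statement and Notes

V. Primary Key Model

5.1 Overview

5.2 Use Cases

5.3 How Updates Work

5.4 CREATE TABLE Statement and Notes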

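I. Data Model Overview

In StarRocks, data is described logically as tables. A table consists of rows and columns: a row is one record of user data, and a column describes one field of that record.

Columns fall into two categories, Key and Value. From a business perspective, Key columns are dimension columns and Value columns are metric columns. The Key columns are the ones listed after the keyword 'duplicate key', 'aggregate key', 'unique key', or 'primary key' in the CREATE TABLE statement; every other column is a Value column.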

1.1 The Four Models

Duplicate Key Model: the detail model
Aggregate Key Model: the aggregate model
Unique Key Model: the update model
Primary Key Model: the primary key model

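1.2 Sort Keys

1.2.1 Overview

When you create a table in StarRocks, you can designate one or more columns (typically the first three) as the table's sort key. On load, rows are stored on disk in sort-key order. When a query filters on the sort columns, StarRocks can use that ordering to locate the matching range of data quickly, so the scan phase skips large amounts of irrelevant data and the query runs faster.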
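To make the benefit of sorted storage concrete, here is a minimal Python sketch. It is not how StarRocks is implemented; the rows and the helper scan_prefix are made up for illustration. It only shows that once rows are ordered by the sort key, the rows matching a filter on the leading sort column form one contiguous range that a binary search can find.

```python
from bisect import bisect_left, bisect_right

# Hypothetical rows of (event_time, event_type, user_id), kept sorted by the sort key.
rows = sorted([
    ("2023-03-02", 1, 7),
    ("2023-03-01", 2, 5),
    ("2023-03-01", 1, 9),
    ("2023-03-03", 1, 2),
    ("2023-03-02", 3, 4),
])

def scan_prefix(sorted_rows, event_time):
    """Return the contiguous slice whose first sort column equals event_time."""
    first_column = [row[0] for row in sorted_rows]
    lo = bisect_left(first_column, event_time)
    hi = bisect_right(first_column, event_time)
    return sorted_rows[lo:hi]

matches = scan_prefix(rows, "2023-03-02")
print(matches)
```

Rows outside the range found by the two binary searches are never inspected, which is the effect the paragraph above describes for the scan phase.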

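At first glance, the sort key of each model is simply the column list that follows duplicate key, aggregate key, unique key, or primary key in the CREATE TABLE statement. The four models nevertheless differ in the details:

1.2.2 Sort Keys by Model

Duplicate Key model: the sort key is the most flexible here and may cover only some of the dimension columns. You can define it explicitly with duplicate key(). If you omit duplicate key(col1, col2, ...), the first three columns of the table become the sort key by default. In the CREATE TABLE statement, the sort key columns must be defined before all other columns, and the columns in the sort key must appear in the same order as in the column list; otherwise the statement fails.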

-- CREATE TABLE statement:
create table if not exists test1 (
    event_time datetime not null comment "datetime of event",
    event_type int not null comment "type of event",
    user_id int comment "id of user",
    channel int comment ""
)
duplicate key(event_time, event_type, user_id)
distributed by hash(user_id) buckets 10;
-- With an explicit duplicate key(), four combinations are accepted without an error:
--   event_time
--   event_time, event_type
--   event_time, event_type, user_id
--   event_time, event_type, user_id, channel
-- If duplicate key(col1, col2, ...) is omitted, the first three columns become the sort key:
create table if not exists test1 (
    event_time datetime not null comment "datetime of event",
    event_type int not null comment "type of event",
    user_id int comment "id of user",
    channel int comment ""
)
distributed by hash(user_id) buckets 10;
-- which is equivalent to:
create table if not exists test1 (
    event_time datetime not null comment "datetime of event",
    event_type int not null comment "type of event",
    user_id int comment "id of user",
    channel int comment ""
)
duplicate key(event_time, event_type, user_id)
distributed by hash(user_id) buckets 10;

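Aggregate Key model: data is aggregated by the sort key (aggregate key) and then sorted. The sort key must satisfy the uniqueness constraint and must list all dimension columns in the order they are defined in the table.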

-- CREATE TABLE statement:
create table if not exists test2 (
    site_id largeint not null comment "id of site",
    date date not null comment "time of event",
    city_code varchar(20) comment "city_code of user",
    pv bigint sum default "0" comment "total page views"
)
aggregate key(site_id, date, city_code)
distributed by hash(site_id)
properties (
    "replication_num" = "3"
);
-- The sort key must satisfy the uniqueness constraint and must list all dimension columns in table order.
-- Here the sort key is site_id, date, city_code and the metric column is pv.
-- The statement above can be shortened to:
create table if not exists test2 (
    site_id largeint not null comment "id of site",
    date date not null comment "time of event",
    city_code varchar(20) comment "city_code of user",
    pv bigint sum default "0" comment "total page views"
)
distributed by hash(site_id)
properties (
    "replication_num" = "3"
);

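Unique Key model: the sort key (also called the primary key) can be written in only one way, inside the parentheses of unique key(), and it must satisfy the uniqueness constraint.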

-- CREATE TABLE statement:
create table if not exists test3 (
    create_time date not null comment "create time of an order",
    order_id bigint not null comment "id of an order",
    order_state int comment "state of an order",
    total_price bigint comment "price of an order"
)
unique key(create_time, order_id)
distributed by hash(order_id) buckets 8
properties (
    "replication_num" = "3"
);
-- Here the sort key is create_time, order_id.
-- The frequently filtered columns create_time (order creation time) and order_id (order number) form the
-- primary key, which is also the sort key; order_state and total_price are the metric columns.

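For both the Unique Key model and the Primary Key model, the sort key must be specified explicitly and cannot be omitted. With UNIQUE KEY(create_time, order_id), as in the test3 statement above, the columns used for sorting are create_time and order_id.

Primary Key model: the sort key is specified inside the parentheses of primary key(), and it must satisfy the uniqueness constraint.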

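1.2.3 Notes

In the CREATE TABLE statement, the sort key columns must be defined before all other columns.
The columns in the sort key must appear in the same order as in the column list; otherwise the statement fails.
You can define one or more columns as the sort key when creating a table. The order in which they appear in the sort key is the order of the multi-level sort applied when data is stored.
Do not put too many columns in the sort key. Sorting on a large number of columns increases load time and resource usage.
Choose the sort key according to your query patterns: columns that often appear in query conditions are good candidates. When the sort key has several columns, put the frequently queried, highly selective columns first.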
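The first two notes amount to saying that the sort key must be a leading prefix of the column list. The following Python sketch models that rule as this article states it; it is not StarRocks's own validation code, and both function names are made up for illustration.

```python
def is_valid_sort_key(columns, sort_key):
    """True if sort_key is a non-empty leading prefix of columns, in the same order."""
    n = len(sort_key)
    return 0 < n <= len(columns) and list(columns[:n]) == list(sort_key)

def default_duplicate_sort_key(columns):
    """Sort key assumed when duplicate key(...) is omitted: the first three columns."""
    return list(columns[:3])

# Column list of the test1 table used earlier in this section.
columns = ["event_time", "event_type", "user_id", "channel"]

print(is_valid_sort_key(columns, ["event_time", "event_type"]))   # leading prefix
print(is_valid_sort_key(columns, ["event_type", "event_time"]))   # wrong order
print(is_valid_sort_key(columns, ["event_time", "user_id"]))      # skips a column
print(default_duplicate_sort_key(columns))
```

Only the first call passes the check; the other two correspond to the cases in which the article says the CREATE TABLE statement fails.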

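II. Duplicate Key Model

2.1 Overview

The Duplicate Key model is the most commonly used data model in StarRocks. It suits raw data that needs neither aggregation nor a unique primary key constraint. Under this model, even two completely identical rows are both stored exactly as loaded.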
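As a rough illustration of this behavior (a toy model, not StarRocks internals), the Python sketch below appends batches to a list and only re-sorts by the sort key. The rows, the load helper, and the two-column sort key are assumptions made for the example.

```python
# Hypothetical rows of (event_time, event_type, user_id, channel);
# the sort key is assumed to be the first two columns.
SORT_KEY_LEN = 2

def load(table, batch):
    """Append a batch and keep rows ordered by the sort key; nothing is merged or dropped."""
    return sorted(table + batch, key=lambda row: row[:SORT_KEY_LEN])

table = []
table = load(table, [("2023-03-02 08:00:00", 1, 7, 3)])
table = load(table, [
    ("2023-03-01 09:00:00", 2, 5, 1),
    ("2023-03-02 08:00:00", 1, 7, 3),  # exact duplicate of the row loaded first
])

print(len(table))
print(table.count(("2023-03-02 08:00:00", 1, 7, 3)))
```

Both copies of the duplicated row survive, which is the difference from the three models described later.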

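2.2 Use Cases

The Duplicate Key model is typically used for append-only writes. It is a good fit when:

Queries are flexible and should not be limited to pre-aggregated analysis
Existing data is never updated; new data is only appended

2.3 CREATE TABLE Statement and Notes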

-- CREATE TABLE statement
create table if not exists detail (
    event_time datetime not null comment "datetime of event",
    event_type int not null comment "type of event",
    user_id int comment "id of user",
    device_code int comment "device code",
    channel int comment ""
)
duplicate key(event_time, event_type)
distributed by hash(user_id)
properties (
    "replication_num" = "3"
);
-- duplicate key(event_time, event_type) explicitly selects the Duplicate Key model
-- and makes event_time and event_type the sort key.
-- user_id is the bucketing key; the whole table has a single partition.

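Notes on the statement:

The bucketing key must be specified with a distributed by hash clause; otherwise table creation fails.
The sort key columns must be defined before all other columns. In the statement above the sort key is event_time and event_type.
In the Duplicate Key model the sort key may cover some or all of the dimension columns.
If duplicate key(col1, col2, ...) is omitted, the first three columns of the table become the sort key by default.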

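-- The duplicate key clause can also be omitted. The sort key then defaults to the first three
-- columns (event_time, event_type, user_id), one column more than in the statement above:
create table if not exists detail (
    event_time datetime not null comment "datetime of event",
    event_type int not null comment "type of event",
    user_id int comment "id of user",
    device_code int comment "device code",
    channel int comment ""
)
distributed by hash(user_id)
properties (
    "replication_num" = "3"
);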

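III. Aggregate Key Model

3.1 Overview

At table creation you define the sort key (the dimension, or key, columns) and the metric (value) columns, and assign an aggregate function to each metric column. On load, rows with identical dimension columns are aggregated using the functions of the metric columns, and the table keeps only the aggregated result.

3.2 Use Cases

Statistical analysis and summaries, for example total visit duration or total visit count per user
No need to query the original detailed data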

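3.3 How Aggregation Works

The Aggregate Key model is implemented as merge on read, and data is aggregated in the following three phases:

ETL phase of each load batch: each batch forms one version, and within a version rows with the same sort key are aggregated.
Compaction phase on the BE: the BE periodically merges the files of multiple loaded versions into one larger version file.
Query phase: for the data a query touches, rows with the same sort key are aggregated across all versions before the final result is returned.

PS: a table implemented this way is called an MoR table. An MoR table does not merge data at load time; it merges dynamically at query time. Writes are therefore simple and fast, but queries must aggregate multiple versions online. Because of the Merge operator, predicates and indexes cannot be pushed down, which severely hurts query performance.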
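The three phases can be sketched in a few lines of Python. This is a conceptual model only, with made-up rows and helper names (aggregate, query); it assumes a single metric column pv with the sum function, as in the table used in this article.

```python
from collections import defaultdict

def aggregate(rows):
    """Sum pv over rows that share the same key (site_id, date, city_code)."""
    totals = defaultdict(int)
    for site_id, date, city_code, pv in rows:
        totals[(site_id, date, city_code)] += pv
    return sorted(key + (pv,) for key, pv in totals.items())

# Phase 1: each load batch is aggregated on its own and becomes one version.
batch1 = [(1, "2023-03-01", "bj", 3), (1, "2023-03-01", "bj", 2), (2, "2023-03-01", "sh", 1)]
batch2 = [(1, "2023-03-01", "bj", 5), (2, "2023-03-01", "sh", 4)]
versions = [aggregate(batch1), aggregate(batch2)]

# Phase 3: a query merges the same key across all versions it touches.
def query(version_list):
    return aggregate([row for version in version_list for row in version])

result_before_compaction = query(versions)

# Phase 2: compaction merges several versions into one larger version in the background.
versions = [aggregate([row for version in versions for row in version])]
result_after_compaction = query(versions)

print(result_before_compaction)
print(result_after_compaction)
```

The query result is the same before and after compaction; compaction only reduces how many versions the query has to merge.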

3.4 CREATE TABLE Statement and Notes

-- Analyze the total number of visits to different web pages by users from different cities over a period of time
create table if not exists aggregate_tbl (
    site_id largeint not null comment "id of site",
    date date not null comment "time of event",
    city_code varchar(20) comment "city_code of user",
    pv bigint sum default "0" comment "total page views"
)
aggregate key(site_id, date, city_code)
distributed by hash(site_id)
properties (
    "replication_num" = "3"
);
-- The sort key must satisfy the uniqueness constraint and must list all dimension columns in table order.
-- Here the sort key is site_id, date, city_code.

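Notes on the statement:

The bucketing key must be specified with a distributed by hash clause; otherwise table creation fails.
Sort key: the sort key columns must be defined before all other columns. The sort key can be defined explicitly with aggregate key; in the statement above the sort key is site_id, date, and city_code, and the metric column is pv.
If the sort key is not defined explicitly with aggregate key, every column other than the metric columns is part of the sort key by default.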

-- The statement above can be shortened to:
create table if not exists aggregate_tbl (
    site_id largeint not null comment "id of site",
    date date not null comment "time of event",
    city_code varchar(20) comment "city_code of user",
    pv bigint sum default "0" comment "total page views"
)
distributed by hash(site_id)
properties (
    "replication_num" = "3"
);

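Metric columns: specifying an aggregate function in a column's definition makes it a metric column; these are usually the values to be summarized.
Aggregate functions: the function applied to a metric column, for example sum or max.
At query time, filters on sort key columns are applied before the multi-version aggregation, and filters on metric columns after it. It therefore pays to make frequently filtered columns part of the sort key, so that rows are filtered out before aggregation and queries run faster.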
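A small Python sketch shows why the two kinds of filter are ordered this way. It is a toy model with made-up data, where each version holds (key, pv) pairs and pv is summed: filtering by key before the merge gives the same answer as filtering after it, whereas filtering by the metric before the merge would drop partial values and give a wrong answer.

```python
from collections import defaultdict

def merge(rows):
    """Sum pv per key across all versions."""
    totals = defaultdict(int)
    for key, pv in rows:
        totals[key] += pv
    return dict(totals)

# Two versions; site1 appears in both with partial pv values.
versions = [[("site1", 3), ("site2", 9)], [("site1", 4)]]
all_rows = [row for version in versions for row in version]

# Filter on the key column: safe to apply before the merge.
key_filter_first = merge([row for row in all_rows if row[0] == "site1"])
key_filter_last = {k: v for k, v in merge(all_rows).items() if k == "site1"}

# Filter on the metric column (pv >= 5): only correct after the merge.
metric_filter_last = {k: v for k, v in merge(all_rows).items() if v >= 5}
metric_filter_first = merge([row for row in all_rows if row[1] >= 5])  # loses site1

print(key_filter_first == key_filter_last)
print(metric_filter_last, metric_filter_first)
```

Because site1 only reaches 7 once its two partial rows are summed, a metric filter applied too early discards it, so only key filters can be evaluated before the versions are merged.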

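IV. Unique Key Model

4.1 Overview

At table creation you define a primary key and metric columns; a query returns the most recent row among those sharing the same primary key.

The Duplicate Key model keeps every written row, and the Aggregate Key model aggregates written rows; the Unique Key model keeps only the most recently loaded row for each primary key. In this model the sort key forms the table's uniqueness constraint and acts as what is commonly called the primary key.

4.2 Use Cases

Business scenarios with real-time, frequent updates. In e-commerce, for example, order status changes often, and daily order updates can exceed one hundred million.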

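4.3 How Updates Work

The Unique Key model is essentially a special case of the Aggregate Key model in which the metric columns use the replace aggregate function, so a query returns the most recent row among those sharing the same primary key. Like the Aggregate Key model, the Unique Key model is implemented as merge on read.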
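Viewed as "aggregation with replace", the read path can be sketched in Python as follows. This is a conceptual model with made-up order rows; merge_replace is a hypothetical helper, and versions are assumed to be ordered from oldest to newest.

```python
def merge_replace(versions):
    """Merge on read with replace semantics: the newest row per primary key wins."""
    latest = {}
    for version in versions:  # oldest version first
        for create_time, order_id, order_state, total_price in version:
            latest[(create_time, order_id)] = (order_state, total_price)
    return latest

versions = [
    [("2023-03-01", 1001, 1, 50), ("2023-03-01", 1002, 1, 80)],  # first load
    [("2023-03-01", 1001, 2, 50)],                               # order 1001 changes state
]

current = merge_replace(versions)
print(current[("2023-03-01", 1001)])
print(current[("2023-03-01", 1002)])
```

Both stored versions of order 1001 still exist; the older one is only hidden when the query merges the versions, which is the cost that merge on read pays at query time.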

4.4 CREATE TABLE Statement and Notes

-- In e-commerce order analysis, order status is often analyzed by date
create table if not exists orders (
    create_time date not null comment "create time of an order",
    order_id bigint not null comment "id of an order",
    order_state int comment "state of an order",
    total_price bigint comment "price of an order"
)
unique key(create_time, order_id)
distributed by hash(order_id) buckets 8
properties (
    "replication_num" = "3"
);
-- This supports real-time order status updates while still allowing fast filtering in queries.
-- The frequently filtered columns create_time (order creation time) and order_id (order number) form
-- the primary key; order_state and total_price are the metric columns.

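Notes on the statement:

The bucketing key must be specified with a distributed by hash clause; otherwise table creation fails.
The sort key (also called the primary key in this model) must be defined before all other columns. In the statement above the sort key (primary key) is create_time, order_id.
The primary key must satisfy the uniqueness constraint.
At query time, filters on sort key (primary key) columns are applied before the multi-version aggregation, and filters on metric columns after it. It therefore pays to make frequently filtered columns part of the sort key, so that rows are filtered out before aggregation and queries run faster.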

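V. Primary Key Model

5.1 Overview

At table creation you define a primary key and metric columns; a query returns the most recent row among those sharing the same primary key. The difference from the Unique Key model lies in the implementation: the Unique Key model uses merge on read (MoR), whereas the Primary Key model uses merge on write (MoW). Neither the Aggregate Key model nor the Unique Key model supports update; the Primary Key model implements update with a Delete+Insert strategy.

PS: (Unique Key model) An MoR table does not merge data at load time; it merges dynamically at query time. Writes are therefore simple and fast, but queries must aggregate multiple versions online. Because of the Merge operator, predicates and indexes cannot be pushed down, which severely hurts query performance.

(Primary Key model) An MoW table merges data at load time so that each key value has exactly one record: during load, rows that are overwritten or updated are marked as deleted, and the new data is written to new files. At query time all rows marked as deleted are filtered out at the file level, so what is read is always the latest data and the aggregation step of merge on read disappears. This speeds up queries at the cost of more expensive loads. Compared with the Unique Key model, the Primary Key model needs no aggregation at query time and supports predicate and index pushdown.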

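5.2 Use Cases

The Primary Key model suits real-time, frequently updated scenarios, for example:

Streaming transactional data into StarRocks in real time: besides inserts, transactional databases usually involve many update and delete operations.
Multi-stream JOINs through partial column updates: when several business streams feed the same table, each stream simply updates the columns relevant to it, while still benefiting from the Primary Key model's real-time synchronization of inserts, updates, and deletes and its efficient query performance.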
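The partial-update case can be pictured with a short Python sketch. It models only the idea that each stream touches its own columns of a row identified by the primary key; the column names, the in-memory dict, and the partial_update helper are assumptions for illustration and do not reflect the StarRocks load interface.

```python
# Hypothetical value columns of a users table keyed by user_id.
COLUMNS = ("name", "email", "last_active")
table = {}

def partial_update(user_id, **changes):
    """Update only the given columns of one row; other columns keep their current value."""
    unknown = set(changes) - set(COLUMNS)
    if unknown:
        raise KeyError(f"unknown columns: {sorted(unknown)}")
    row = table.setdefault(user_id, dict.fromkeys(COLUMNS))
    row.update(changes)

# Stream A maintains profile data; stream B maintains activity data.
partial_update(1, name="alice", email="alice@example.com")
partial_update(1, last_active="2023-03-01 10:00:00")
partial_update(1, email="alice@new.example.com")

print(table[1])
```

The row ends up combining columns from both streams without either stream having to join against the other before writing.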

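5.3 How Updates Work

The Primary Key model uses a Delete+Insert strategy to guarantee that only one record exists per primary key, which avoids the Merge operation entirely. It is implemented as merge on write: during load, rows that are overwritten or updated are marked as deleted and the new data is written to new files. At query time all rows marked as deleted are filtered out at the file level, so what is read is always the latest data and the aggregation step of merge on read disappears. Merge on write works as follows:

When StarRocks receives an update for a record, it locates the record through the primary key index, marks the old record as deleted, and inserts a new record. In effect, Update is rewritten as Delete+Insert.
When StarRocks receives a delete for a record, it locates the record through the primary key index and marks it as deleted.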
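The two operations above can be modeled with a toy Python class. This is a sketch of the Delete+Insert idea only, under simplifying assumptions: storage is one append-only list, the primary key index is a dict, and delete marks are a set of positions. The class and method names are made up for illustration.

```python
class PrimaryKeyTable:
    """Toy merge-on-write table: append-only rows, a key index, and delete marks."""

    def __init__(self):
        self.rows = []        # append-only storage of (key, value)
        self.deleted = set()  # positions of rows marked as deleted
        self.index = {}       # primary key -> position of the live row

    def upsert(self, key, value):
        """Update = mark the old row deleted (if any), then insert a new row."""
        old = self.index.get(key)
        if old is not None:
            self.deleted.add(old)
        self.rows.append((key, value))
        self.index[key] = len(self.rows) - 1

    def delete(self, key):
        """Delete = look the row up in the index and mark it deleted."""
        old = self.index.pop(key, None)
        if old is not None:
            self.deleted.add(old)

    def scan(self):
        """Reads skip rows marked as deleted; no merging across versions is needed."""
        return [row for pos, row in enumerate(self.rows) if pos not in self.deleted]

users = PrimaryKeyTable()
users.upsert(1, "alice")
users.upsert(2, "bob")
users.upsert(1, "alice-v2")  # update of key 1
users.delete(2)
print(users.scan())
```

The write path does the extra work (index lookup plus delete mark), so the read path only filters marked rows, which matches the trade-off described for merge on write.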

5.4 CREATE TABLE Statement and Notes

-- User data must be analyzed in real time, so user_id is the primary key and the remaining columns are metric columns.
create table users (
    user_id bigint not null, name string not null, email string null, address string null,
    age tinyint null, sex tinyint null, last_active datetime,
    property0 tinyint not null, property1 tinyint not null,
    property2 tinyint not null, property3 tinyint not null
) primary key (user_id)
distributed by hash(user_id) buckets 4
properties ("replication_num" = "3", "enable_persistent_index" = "true");
-- Partition columns and bucketing columns must be part of the primary key.
-- Here the bucketing key is user_id, and the whole table has a single partition.

Notes on the statement: