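In PostgreSQL, GROUPING SETS, CUBE, and ROLLUP let a single query produce several levels of aggregation at once, without rewriting the query multiple times or stitching results together with complex UNION statements. They are especially useful in data analysis, because they let you group and aggregate the same data along different dimensions.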
1. Preparing the test table and data
1.1. The built-in json_to_recordset function
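json_to_recordset is a built-in PostgreSQL function that expands a JSON array of objects into a set of records (a table-like result) that SQL can query. Because it returns the generic record type, the caller must supply a column definition list in an AS clause.
The CREATE TABLE statement in section 1.2 uses ROWS FROM to join two functions into a single FROM target. json_to_recordset() is instructed to return three columns: the first two text and the third integer. The result of generate_series() is used directly as a fourth column. The ORDER BY clause sorts the rows by that integer column.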
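As a minimal sketch of json_to_recordset on its own (the alias x and the column names a and b are made up for illustration and are not part of the test table), the column definition list after AS tells PostgreSQL which keys to extract and which types to cast them to:

```sql
-- Each JSON object becomes one row; keys are matched to columns by name.
-- The quoted "2" is cast to integer because column a is declared as int.
SELECT *
FROM json_to_recordset('[{"a":1,"b":"foo"},{"a":"2","b":"bar"}]')
     AS x(a int, b text);
```

This should return two rows, (1, foo) and (2, bar). A key that is missing from an object would simply come back as NULL in that column.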
1.2. Creating the test table
CREATE TABLE t_sales_table AS
SELECT *
FROM ROWS FROM (
    json_to_recordset('[
        {"sales":10,"b":"foo","size":"L"},
        {"sales":"20","b":"bar","size":"M"},
        {"sales":15,"b":"foo","size":"M"},
        {"sales":5,"b":"bar","size":"L"},
        {"sales":3,"b":"super","size":"L"}
    ]') AS (b TEXT, size TEXT, sales INTEGER),
    generate_series(1, 5)
) AS T1 (brand, c_size, c_qty, id)
ORDER BY id;

select * from t_sales_table;

 brand | c_size | c_qty | id
-------+--------+-------+----
 foo   | L      |    10 |  1
 bar   | M      |    20 |  2
 foo   | M      |    15 |  3
 bar   | L      |     5 |  4
 super | L      |     3 |  5
(5 rows)
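2. GROUPING SETS
GROUPING SETS lets you list several grouping criteria and get the result for each of them in one pass. For example, to aggregate the sales quantity by (brand) and by (c_size), and to see the per-brand totals, the per-size totals, and the overall total in the same result, you can use GROUPING SETS.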
SELECT brand, c_size, SUM(c_qty)
FROM t_sales_table
GROUP BY GROUPING SETS (
    (brand),
    (c_size),
    ()  -- the empty set aggregates over all rows, i.e. the overall total quantity
)
ORDER BY brand, c_size;

 brand | c_size | sum
-------+--------+-----
 bar   |        |  25
 foo   |        |  25
 super |        |   3
       | L      |  18
       | M      |  35
       |        |  53
(6 rows)
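In the output above, a NULL in brand or c_size only means "this column was not part of the grouping set for this row". If the data itself could contain NULLs, the two cases become hard to tell apart. A minimal sketch using PostgreSQL's GROUPING() function, assuming the t_sales_table from section 1.2 (the alias grp is made up for illustration):

```sql
-- GROUPING(brand, c_size) returns a bit mask: a bit is 0 when the column
-- takes part in the grouping set of the current row and 1 when it does not.
-- The rightmost argument (c_size) is the least significant bit.
SELECT brand,
       c_size,
       GROUPING(brand, c_size) AS grp,
       SUM(c_qty) AS sum_c_qty
FROM t_sales_table
GROUP BY GROUPING SETS ((brand), (c_size), ())
ORDER BY brand, c_size;
```

With the sample data, the per-brand rows should carry grp = 1, the per-size rows grp = 2, and the grand-total row grp = 3.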
3. Grouping with multiple SELECT statements and the UNION ALL set operator
SELECT brand, NULL as c_size, sum(c_qty) as sum_c_qty FROM t_sales_table GROUP BY brand
UNION ALL
SELECT NULL as brand, c_size, sum(c_qty) as sum_c_qty FROM t_sales_table GROUP BY c_size
UNION ALL
SELECT NULL as brand, NULL as c_size, sum(c_qty) as sum_c_qty FROM t_sales_table;

 brand | c_size | sum_c_qty
-------+--------+-----------
 bar   |        |        25
 foo   |        |        25
 super |        |         3
       | L      |        18
       | M      |        35
       |        |        53
(6 rows)
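One way to check that the UNION ALL formulation and the GROUPING SETS formulation agree is to subtract one result from the other. The following is a sketch under the assumption that t_sales_table exists as created in section 1.2; the explicit NULL::text casts are added here only to keep the column types unambiguous.

```sql
-- Rows produced by GROUPING SETS that the UNION ALL version does not produce.
-- An empty result means every GROUPING SETS row has a matching UNION ALL row.
(
    SELECT brand, c_size, sum(c_qty) AS sum_c_qty
    FROM t_sales_table
    GROUP BY GROUPING SETS ((brand), (c_size), ())
)
EXCEPT ALL
(
    SELECT brand, NULL::text, sum(c_qty) FROM t_sales_table GROUP BY brand
    UNION ALL
    SELECT NULL::text, c_size, sum(c_qty) FROM t_sales_table GROUP BY c_size
    UNION ALL
    SELECT NULL::text, NULL::text, sum(c_qty) FROM t_sales_table
);
```

Swapping the two operands of EXCEPT ALL checks the other direction. Set operations treat NULLs as equal to each other, so the subtotal rows compare as expected.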
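4. CUBE and ROLLUP
4.1. CUBE enumerates every possible combination of the specified columns as grouping sets
CUBE is shorthand for a particular GROUPING SETS clause, not a superset of it: it automatically generates every possible grouping combination of the listed columns (all columns together, every subset of them, and the empty set for the grand total). For n columns this yields 2^n grouping sets.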
https://www.postgresql.org/docs/current/queries-table-expressions.html#QUERIES-GROUPING-SETS
select brand, c_size, sum(c_qty) as sum_c_qty
from t_sales_table
GROUP BY CUBE(brand, c_size);

 brand | c_size | sum_c_qty
-------+--------+-----------
       |        |        53
 bar   | M      |        20
 super | L      |         3
 foo   | M      |        15
 bar   | L      |         5
 foo   | L      |        10
 bar   |        |        25
 foo   |        |        25
 super |        |         3
       | L      |        18
       | M      |        35
(11 rows)
CUBE ( a, b, c )
is equivalent to
GROUPING SETS (
    ( a, b, c ),
    ( a, b    ),
    ( a,    c ),
    ( a       ),
    (    b, c ),
    (    b    ),
    (       c ),
    (         )
)
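Applied to the two columns of the test table, the same expansion can be written out by hand. This is a sketch assuming the t_sales_table from section 1.2; it should return the same 11 rows as GROUP BY CUBE(brand, c_size), though possibly in a different order since there is no ORDER BY.

```sql
-- CUBE(brand, c_size) spelled out as its 2^2 = 4 grouping sets.
SELECT brand, c_size, sum(c_qty) AS sum_c_qty
FROM t_sales_table
GROUP BY GROUPING SETS (
    (brand, c_size),  -- every brand/size pair
    (brand),          -- subtotal per brand
    (c_size),         -- subtotal per size
    ()                -- grand total
);
```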
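4.2. ROLLUP produces grouping sets by aggregating level by level
ROLLUP is also shorthand for a particular GROUPING SETS clause: it generates the full column list, then each successively shorter prefix of that list, and finally the empty set (the grand total). Unlike CUBE, it does not include every possible combination; for n columns it yields n + 1 grouping sets, which suits hierarchical data.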
select brand, c_size, sum(c_qty) as sum_c_qty
from t_sales_table
GROUP BY ROLLUP(brand, c_size);

 brand | c_size | sum_c_qty
-------+--------+-----------
       |        |        53
 bar   | M      |        20
 super | L      |         3
 foo   | M      |        15
 bar   | L      |         5
 foo   | L      |        10
 bar   |        |        25
 foo   |        |        25
 super |        |         3
(9 rows)
ROLLUP ( e1, e2, e3, ... )
is equivalent to
GROUPING SETS (
    ( e1, e2, e3, ... ),
    ...
    ( e1, e2 ),
    ( e1 ),
    ( )
)
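For the two columns of the test table, the expansion can be written out explicitly. This is a sketch assuming the t_sales_table from section 1.2; it should return the same 9 rows as GROUP BY ROLLUP(brand, c_size). Note that there is no (c_size) grouping set, which is why the per-size subtotals seen with CUBE are absent.

```sql
-- ROLLUP(brand, c_size) spelled out as its 2 + 1 = 3 grouping sets.
SELECT brand, c_size, sum(c_qty) AS sum_c_qty
FROM t_sales_table
GROUP BY GROUPING SETS (
    (brand, c_size),  -- most detailed level
    (brand),          -- subtotal per brand
    ()                -- grand total
);
```

Because ROLLUP only drops columns from the right, the column order matters: ROLLUP(c_size, brand) would give per-size subtotals instead of per-brand ones.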