探索ClickHouse——使用Projection加速查询

在测试Projection之前,我们需要先创建一张表,并导入大量数据。
我们可以直接使用指令,从URL指向的文件中获取内容并导入表。但是担心网络不稳定,我们先将文件下载下来。

下载文件

wget wget http://prod.publicdata.landregistry.gov.uk.s3-website-eu-west-1.amazonaws.com/pp-complete.csv .

检查文件

wc -l pp-complete.csv 

28497127 pp-complete.csv

ll pp-complete.csv

-rw-rw-r-- 1 fangliang fangliang 4982107267 Aug 29 05:13 pp-complete.csv

即这个文件约有2850万行,占4个多G磁盘。

移动文件

su root
cp pp-complete.csv /var/lib/clickhouse/user_files/
exit

创建表

查看文件

使用下面指令查看文件内容

head -10 pp-complete.csv 
"{F887F88E-7D15-4415-804E-52EAC2F10958}","70000","1995-07-07 00:00","MK15 9HP","D","N","F","31","","ALDRICH DRIVE","WILLEN","MILTON KEYNES","MILTON KEYNES","MILTON KEYNES","A","A"
"{40FD4DF2-5362-407C-92BC-566E2CCE89E9}","44500","1995-02-03 00:00","SR6 0AQ","T","N","F","50","","HOWICK PARK","SUNDERLAND","SUNDERLAND","SUNDERLAND","TYNE AND WEAR","A","A"
"{7A99F89E-7D81-4E45-ABD5-566E49A045EA}","56500","1995-01-13 00:00","CO6 1SQ","T","N","F","19","","BRICK KILN CLOSE","COGGESHALL","COLCHESTER","BRAINTREE","ESSEX","A","A"
"{28225260-E61C-4E57-8B56-566E5285B1C1}","58000","1995-07-28 00:00","B90 4TG","T","N","F","37","","RAINSBROOK DRIVE","SHIRLEY","SOLIHULL","SOLIHULL","WEST MIDLANDS","A","A"
"{444D34D7-9BA6-43A7-B695-4F48980E0176}","51000","1995-06-28 00:00","DY5 1SA","S","N","F","59","","MERRY HILL","BRIERLEY HILL","BRIERLEY HILL","DUDLEY","WEST MIDLANDS","A","A"
"{AE76CAF1-F8CC-43F9-8F63-4F48A2857D41}","17000","1995-03-10 00:00","S65 1QJ","T","N","L","22","","DENMAN STREET","ROTHERHAM","ROTHERHAM","ROTHERHAM","SOUTH YORKSHIRE","A","A"
"{709FB471-3690-4945-A9D6-4F48CE65AAB6}","58000","1995-04-28 00:00","PE7 3AL","D","Y","F","4","","BROOK LANE","FARCET","PETERBOROUGH","PETERBOROUGH","CAMBRIDGESHIRE","A","A"
"{5FA8692E-537B-4278-8C67-5A060540506D}","19500","1995-01-27 00:00","SK10 2QW","T","N","L","38","","GARDEN STREET","MACCLESFIELD","MACCLESFIELD","MACCLESFIELD","CHESHIRE","A","A"
"{E78710AD-ED1A-4B11-AB99-5A0614D519AD}","20000","1995-01-16 00:00","SA6 5AY","D","N","F","592","","CLYDACH ROAD","YNYSTAWE","SWANSEA","SWANSEA","SWANSEA","A","A"
"{1DFBF83E-53A7-4813-A37C-5A06247A09A8}","137500","1995-03-31 00:00","NR2 2NQ","D","N","F","26","","LIME TREE ROAD","NORWICH","NORWICH","NORWICH","NORFOLK","A","A"

使用客户端连接服务端

clickhouse-client

创建表

CREATE TABLE uk_price_paid ( price UInt32, date Date, postcode1 LowCardinality(String), postcode2 LowCardinality(String), type Enum8('terraced' = 1, 'semi-detached' = 2, 'detached' = 3, 'flat' = 4, 'other' = 0), is_new UInt8, duration Enum8('freehold' = 1, 'leasehold' = 2, 'unknown' = 0), addr1 String, addr2 String, street LowCardinality(String), locality LowCardinality(String), town LowCardinality(String), district LowCardinality(String), county LowCardinality(String) ) ENGINE = MergeTree ORDER BY (postcode1, postcode2, addr1, addr2);

导入数据

INSERT INTO uk_price_paid WITH splitByChar(' ', postcode) AS p SELECT toUInt32(price_string) AS price, parseDateTimeBestEffortUS(time) AS date, p[1] AS postcode1, p[2] AS postcode2, transform(a, ['T', 'S', 'D', 'F', 'O'], ['terraced', 'semi-detached', 'detached', 'flat', 'other']) AS type, b = 'Y' AS is_new, transform(c, ['F', 'L', 'U'], ['freehold', 'leasehold', 'unknown']) AS duration, addr1, addr2, street, locality, town, district, county FROM file( 'pp-complete.csv', 'CSV', 'uuid_string String, price_string String, time String, postcode String, a String, b String, c String, addr1 String, addr2 String, street String, locality String, town String, district String, county String, d String, e String' );

在这里插入图片描述
整个处理速度大概是210 thousand rows/s,36.5MB/s。

INSERT INTO uk_price_paid WITH splitByChar(’ ', postcode) AS p
SELECT
toUInt32(price_string) AS price,
parseDateTimeBestEffortUS(time) AS date,
p[1] AS postcode1,
p[2] AS postcode2,
transform(a, [‘T’, ‘S’, ‘D’, ‘F’, ‘O’], [‘terraced’, ‘semi-detached’, ‘detached’, ‘flat’, ‘other’]) AS type,
b = ‘Y’ AS is_new,
transform(c, [‘F’, ‘L’, ‘U’], [‘freehold’, ‘leasehold’, ‘unknown’]) AS duration,
addr1,
addr2,
street,
locality,
town,
district,
county
FROM file(‘pp-complete.csv’, ‘CSV’, ‘uuid_string String, price_string String, time String, postcode String, a String, b String, c String, addr1 String, addr2 String, street String, locality String, town String, district String, county String, d String, e String’)
Query id: 32a2a670-8417-470d-ab26-6368dd1725e5
Ok.
0 rows in set. Elapsed: 140.063 sec. Processed 28.50 million rows, 4.98 GB (203.46 thousand rows/s., 35.57 MB/s.)

检查数据

检查数据行数

SELECT count() From uk_price_paid;

SELECT count()
FROM uk_price_paid
Query id: 2d05b3f1-c683-4f2d-bcaf-e05b777eb3f8
┌──count()───┐
│ 28497127 │
└──────────┘
1 row in set. Elapsed: 0.005 sec.

一共有28,497,127行数据,和文件中行数一致。

检查所占磁盘

SELECT formatReadableSize(total_bytes) FROM system.tables WHERE name = 'uk_price_paid';

SELECT formatReadableSize(total_bytes)
FROM system.tables
WHERE name = ‘uk_price_paid’
Query id: 7cca5694-6d15-4f38-8f8d-ef8331a4caa3
┌─formatReadableSize(total_bytes)─┐
│ 308.18 MiB │
└──────────────────────┘
1 row in set. Elapsed: 0.007 sec.

和之前文件4G多大小对比,减少了9/10,这个比例是相当大的。

查询

SELECT toYear(date), district, town, avg(price), sum(price), count() FROM uk_price_paid  GROUP BY toYear(date), district, town;

80441 rows in set. Elapsed: 2.114 sec. Processed 28.50 million rows, 284.78 MB (13.48 million rows/s., 134.71 MB/s.)

新增PROJECTION

使用下面指令给toYear(date), district, town创建一个PROJECTION ,这样之后插入的数据就会被自动优化。

ALTER TABLE uk_price_paid ADD PROJECTION projection_by_year_district_town(SELECT toYear(date), district, town, avg(price), sum(price), count() GROUP BY toYear(date), district, town);

ALTER TABLE uk_price_paid
ADD PROJECTION projection_by_year_district_town
(
SELECT
toYear(date),
district,
town,
avg(price),
sum(price),
count()
GROUP BY
toYear(date),
district,
town
)
Query id: 3c5ca13e-4805-412c-845a-ab18c411261c
Ok.
0 rows in set. Elapsed: 0.007 sec.

然后使用下面指令修改现有数据

ALTER TABLE uk_price_paid MATERIALIZE PROJECTION projection_by_year_district_town SETTINGS mutations_sync = 1;

ALTER TABLE uk_price_paid
MATERIALIZE PROJECTION projection_by_year_district_town
SETTINGS mutations_sync = 1
Query id: 7bd22c05-c74c-4972-be6d-174eaf99c498
Ok.
0 rows in set. Elapsed: 0.183 sec.

优化后查询

80441 rows in set. Elapsed: 0.170 sec. Processed 92.93 thousand rows, 5.76 MB (548.06 thousand rows/s., 33.98 MB/s.)

可以看到时间也缩短到未优化的1/10。

参考资料

  • https://clickhouse.com/docs/zh/getting-started/example-datasets/uk-price-paid

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.mzph.cn/news/88212.shtml

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

项目开发中使用Date和LocalDateTime处理日期

文章目录 项目开发中使用Date和LocalDateTime处理日期Date类型验证数据库表设计(年月日情况)实体类说明映射文件xml响应展示情况注意事项 LocalDateTime验证数据库设计实体类日期类型动态SQL日期类型响应展示情况 总结 项目开发中使用Date和LocalDateTim…

Windows10/11显示文件扩展名 修改文件后缀名教程

前言 写这篇文章的原因是由于我分享的教程中的文件、安装包基本都是存在阿里云盘的,下载后需要改后缀名才能使用。 但是好多同学不会改。。 Windows 10 随便打开一个文件夹,在上方工具栏点击 “查看”点击 “查看” 后下方会显示更详细的工具栏然后点…

SPA项目之登录注册--请求问题(POSTGET)以及跨域问题

🥳🥳Welcome Huihuis Code World ! !🥳🥳 接下来看看由辉辉所写的关于VueElementUI的相关操作吧 目录 🥳🥳Welcome Huihuis Code World ! !🥳🥳 一.ElementUI是什么 💡…

ubuntu下用pycharm专业版连接AI服务器及其docker环境

一:用pycharm专业版连接AI服务器 1、首先在自己电脑上新建一个文件夹,后续用于映射服务器上自己所要用的项目文件 2、用pycharm专业版打开该文件夹,作为一个项目打开 3、然后在工具->部署->配置 4、配置中形式如下: 点击左…

Docker 入门 (详细命令讲解)

1.1 容器简介 1.1.1 什么是 Linux 容器 Linux容器是与系统其他部分隔离开的一系列进程,从另一个镜像运行,并由该镜像提供支持进程所需的全部文件。容器提供的镜像包含了应用的所有依赖项,因而在从开发到测试再到生产的整个过程中&#xff0…

五、点击切换、滚动切换、键盘切换

简介 通过事件改变当前展示的信息组件,交互的事件有点击上下切换、鼠标轮动上下切换、键盘上下键切换。欢迎访问个人的简历网站预览效果 本章涉及修改与新增的文件:App.vue、public 一、鼠标点击上下箭头切换 <template><div class="app-background"…

Nodejs基于Vue.js编程语言在线学习平台的设计与实现5y4p2

本编程语言在线学习平台是为了提高用户查阅信息的效率和管理人员管理信息的工作效率&#xff0c;可以快速存储大量数据&#xff0c;还有信息检索功能&#xff0c;这大大的满足了用户和管理员这二者的需求。操作简单易懂&#xff0c;合理分析各个模块的功能&#xff0c;尽可能优…

【慕伏白教程】 Linux 深度学习服务器配置指北

文章目录 镜像烧录系统安装系统配置常用包安装 镜像烧录 下载 Ubuntu 镜像 Ubuntu 桌面版 下载烧录工具 balenaEtcher 准备至少 8G 的 空白U盘 开始烧录 系统安装 开机进入BIOS&#xff0c;修改U盘为第一启动 选择 Try or Install Ubuntu 往下拉&#xff0c;选择 中文&a…

Redis 线程模式

Redis 是单线程吗&#xff1f; Redis 单线程指的是 [接收客户端请求 -> 解析请求 -> 进行数据读写操作 -> 发送数据给客户端] 这个过程是由一个线程 (主线程) 来完成的&#xff0c;这也是常说的 Redis 是单线程的原因。 但是 &#xff0c;Redis 程序不是单线程的&am…

OpenCV 实现 SIFT→SURF 算法关键点检测实现

目录 1&#xff0c;SIFT算法原理 1.1&#xff0c;基本流程 1.1.1 尺度空间极值检测 1.1.2 关键点定位 1.1.3 关键点方向确定 1.1.4 关键点描述 1.1.5 总结 1.2 SURF原理 2 代码实现 3 结果展示 4&#xff0c;你肯定会遇到报错 cv2.error: OpenCV(3.4.8) C…

Spring面试题8:面试官:说一说Spring的BeanFactory

该文章专注于面试,面试只要回答关键点即可,不需要对框架有非常深入的回答,如果你想应付面试,是足够了,抓住关键点 面试官:说一说Spring的BeanFactory Spring的BeanFactory是Spring框架的核心容器,负责管理和创建Bean对象。它是一个工厂类,用于实例化、配置和管理Bean的…

nodejs+vue 医院病历管理系统

系统使用权限分别包括管理员、病人和医生&#xff0c;其中管理员拥有着最大的权限&#xff0c;同时管理员的功能模块也是最多的&#xff0c;管理员可以对系统上所有信息进行管理。用户可以修改个人信息&#xff0c;对医院病历信息进行查询&#xff0c;对住院信息进行添加、修改…

权威认可!安全狗获CNVD“漏洞信息报送贡献单位”殊荣

9月24日&#xff0c;国家信息安全漏洞共享平台公布了2022年度CNVD支撑单位年度工作情况及优秀单位个人表彰名单。 作为国内云原生安全领导厂商&#xff0c;安全狗入选漏洞信息报送贡献单位。 厦门服云信息科技有限公司&#xff08;品牌名&#xff1a;安全狗&#xff09;成立于…

tp5连接多个数据库

一、如果你的主数据库配置文件都在config.php里 直接在config.php中中定义db2&#xff1a; 控制器中打印一下&#xff1a; <?php namespace app\index\controller; use think\Controller; use think\Db; use think\Request; class Index extends Controller {public fun…

win10,WSL的Ubuntu配python3.7手记

1.装linux 先在windows上安装WSL版本的Ubuntu Windows10系统安装Ubuntu子系统_哔哩哔哩_bilibili &#xff08;WSL2什么的一直没搞清楚&#xff09; 图形界面会出一些问题&#xff0c;注意勾选ccsm出的界面设置 win10安装Ubuntu16.04子系统&#xff0c;并开启桌面环境_win…

记录下电脑windows安装Tina的过程

下面图片记录windows下安装电路仿真软件Tina的整个过程。 首先肯定下载安装包 然后就一直点下一步下一步 这里随便填一下用户名和公司名称 默认安装位置是C盘&#xff0c;如果C盘空间不够&#xff0c;可以修改安装位置 然后继续下一步下一步 这里不知道什么意思&#xff…

【红日靶场】vulnstack1-完整渗透过程

目录 下载地址红日靶场基本环境配置攻击思维导图网络结构 系统环境配置外网打点对phpmyadmin渗透对zzcms的渗透&#xff1a;getshell失败案例getshell成功案例模版制作&#xff1a;应用导入上传&#xff1a;其他方式&#xff1a; 内网渗透信息收集msf上线&#xff1a;搭建隧道内…

LeetCode刷题

一 螺旋矩阵 题目链接&#xff1a;59. 螺旋矩阵 II - 力扣&#xff08;LeetCode&#xff09; 题目描述&#xff1a; 给你一个正整数 n &#xff0c;生成一个包含 1 到 n2 所有元素&#xff0c;且元素按顺时针顺序螺旋排列的 n x n 正方形矩阵 matrix 。 示例 1&#xff1a;…

算法基础之归并排序

一、归并排序的形象理解 原题链接 示例代码 void merge_sort(int q[], int l, int r) {if (l > r) return;int mid l r >> 1;merge_sort(q, l, mid), merge_sort(q, mid 1, r);int k 0, i l, j mid 1;while (i < mid && j < r) //第一处if (q[i]…

计算机类软件方向适合参加的比赛

前言 博主是一名计算机专业的大三学生&#xff0c;在校时候参加了很多比赛和训练营&#xff0c;现在给大家博主参加过的几个的比赛&#xff0c;希望能给大一大二的学生提供一点建议。 正文 最近也有比赛的&#xff0c;我会从时间线上来给大家推荐一些比赛&#xff0c;并且给…