ClickHouse Deployment and Python3 Stress-Testing Practice
1. ClickHouse Database Deployment
- Version: yandex/clickhouse-server:latest
- Deployment method: docker
- docker-compose.yml:

```yaml
version: "3"
services:
  clickhouse:
    image: yandex/clickhouse-server:latest
    container_name: clickhouse
    ports:
      - "8123:8123"
      - "9000:9000"
      - "9009:9009"
      - "9004:9004"
    volumes:
      - ./data/config:/var/lib/clickhouse
    ulimits:
      nproc: 65535
      nofile:
        soft: 262144
        hard: 262144
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "localhost:8123/ping"]
      interval: 30s
      timeout: 5s
      retries: 3
    deploy:
      resources:
        limits:
          cpus: '4'
          memory: 4096M
        reservations:
          memory: 4096M
```
- CREATE TABLE statement (the original ended in `ENGINE = MergeTree:`, which is invalid — MergeTree requires an ORDER BY clause, added here as `ORDER BY id`):

```sql
CREATE TABLE test_table (
    id Int32,
    feild1 String, feild2 String, feild3 String, feild4 String, feild5 String,
    feild6 String, feild7 String, feild8 String, feild9 String, feild10 String,
    feild11 String, feild12 String, feild13 String, feild14 String, feild15 String,
    feild16 String, feild17 String, feild18 String, feild19 String, feild20 String
) ENGINE = MergeTree ORDER BY id;
```
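With one id column plus 20 identical String columns, the DDL can also be generated programmatically rather than typed out. This is a minimal sketch of my own (the `build_ddl` helper is not from the article); column names keep the article's "feild" spelling:

```python
def build_ddl(table="test_table", n_fields=20, engine="MergeTree ORDER BY id"):
    """Return a CREATE TABLE statement with an Int32 id plus n_fields String columns."""
    cols = ["id Int32"] + [f"feild{i} String" for i in range(1, n_fields + 1)]
    return f"CREATE TABLE {table} ({', '.join(cols)}) ENGINE = {engine}"

print(build_ddl())
```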
2. Python3 Insert Stress Test
- Key libraries: clickhouse_driver, concurrent.futures
- Code (two bugs in the original are fixed and marked in comments: the inner loop variable shadowed the task index, and the connection picker never selected `clients[0]`):

```python
import random
import time
from clickhouse_driver import Client
from concurrent.futures import ThreadPoolExecutor, as_completed

# Use several connections so that no single connection gets overwhelmed
clients = [
    Client(host='ip'),
    Client(host='ip'),
    Client(host='ip'),
    Client(host='ip'),
]

# Use batched inserts: testing showed that concurrent single-row inserts
# perform poorly, sustaining only 2-5 INSERTs per second.
def task(i):
    sql = ("INSERT INTO ck_table (id, feild1, feild2, feild3, feild4, feild5, "
           "feild6, feild7, feild8, feild9, feild10, feild11, feild12, feild13, "
           "feild14, feild15, feild16, feild17, feild18, feild19, feild20) VALUES")
    values = []
    for n in range(1000):  # renamed from i: the original shadowed the task index
        row = (random.randint(1, 10000000),
               "feild1-" + str(random.randint(1, 10000000)))
        row += tuple("feild%d-%d" % (k, n) for k in range(2, 21))
        values.append(row)
    # was randint(1, len(clients) - 1), which never picked clients[0]
    clid = random.randint(0, len(clients) - 1)
    clients[clid].execute(sql, values)
    return 'connection', clid, 'batch', i, 'inserted'

if __name__ == '__main__':
    print("starting")
    executor = ThreadPoolExecutor(max_workers=2)  # 'exec' shadows a builtin; renamed
    # ress = []
    start_time = time.perf_counter()
    for j in range(4000000):  # total number of batches to submit
        res = executor.submit(task, j)
        # ress.append(res)
    # for f in as_completed(ress):
    #     print("status", f.result())
    print("elapsed", time.perf_counter() - start_time, "s")
```
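The batch-construction logic can be factored out and sanity-checked offline, without a ClickHouse server. A sketch under my own naming (`make_batch` is not from the article); rows are shaped like the stress-test script's — a random id, a random feild1, and feild2..feild20 derived from the row index:

```python
import random

def make_batch(batch_size=1000, n_fields=20):
    """Build one insert batch: each row is a 1 + n_fields tuple."""
    values = []
    for n in range(batch_size):
        row = (random.randint(1, 10000000),
               "feild1-" + str(random.randint(1, 10000000)))
        # feild2..feild20 all encode the row index n
        row += tuple("feild%d-%d" % (k, n) for k in range(2, n_fields + 1))
        values.append(row)
    return values

batch = make_batch()
print(len(batch), len(batch[0]))
```

Factoring it out this way also makes it easy to vary the batch size when probing for the disconnect threshold described in the conclusions.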
3. Python3 Query Test
- Key libraries: clickhouse_driver, concurrent.futures
- Code:

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from clickhouse_driver import Client

client = Client(host='10.10.16.110')

# the where-query used in the single-condition timing tests below
query_sql = """select * from ck_table where feild2='feild2-1009'"""

def new_task(i):
    count_sql = """select count(*) from ck_table"""
    time.sleep(1)
    return "task", i, client.execute(count_sql)

if __name__ == '__main__':
    print("starting")
    executor = ThreadPoolExecutor(max_workers=1)  # 'exec' shadows a builtin; renamed
    ress = []
    start_time = time.perf_counter()
    for j in range(1000):
        res = executor.submit(new_task, j)
        ress.append(res)
    for f in as_completed(ress):
        print("status", f.result())
    print("elapsed", time.perf_counter() - start_time, "s")
```
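To turn per-query timings into the kind of summary reported in the conclusions, a small stdlib helper can compute mean and p95 latency. This is my own sketch (the function name and the sample numbers, taken from the Memory-engine timings below, are illustrative); in practice you would wrap each `client.execute(...)` in `time.perf_counter()` calls and collect the deltas:

```python
import statistics

def summarize_latencies(durations):
    """Return (mean, p95) in seconds for a list of per-query wall-clock times."""
    ordered = sorted(durations)
    p95_index = max(0, round(0.95 * len(ordered)) - 1)
    return statistics.mean(ordered), ordered[p95_index]

mean, p95 = summarize_latencies([0.33, 0.57, 0.54, 0.56, 0.565])
print(round(mean, 3), p95)
```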
4. Test Conclusions
ClickHouse insert-plus-query test on the 21-column table: with under 2M rows, CPU usage stayed above 100%, peaking at 133.6% and averaging around 110%.
1. Frequent single-row inserts are not supported (roughly 1-2 per second at most); pushing harder causes disconnections and other errors, so only batch inserts are viable. The script ran without errors using 2 worker threads at 1,000 rows per batch; more workers than that triggered disconnection errors.
2. High-frequency queries are not supported either; the official guidance is to keep QPS at 100 or below, otherwise CPU usage climbs sharply and drives up server load.
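Given the ~100 QPS guidance, a client-side throttle in front of `client.execute` can cap the query rate. Below is a minimal token-bucket sketch of my own (not from the article); the clock is injected as a callable so the behavior can be verified deterministically without real sleeping:

```python
class TokenBucket:
    """Allow at most `rate` operations per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.tokens = float(capacity)
        self.clock = clock        # callable returning the current time in seconds
        self.last = clock()

    def allow(self):
        now = self.clock()
        # refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# simulated clock: 100 QPS limit with a burst allowance of 10
t = [0.0]
bucket = TokenBucket(rate=100, capacity=10, clock=lambda: t[0])
burst = sum(bucket.allow() for _ in range(20))   # only the burst capacity passes
t[0] += 0.05                                     # 50 ms later: ~5 tokens refilled
refill = sum(bucket.allow() for _ in range(20))
print(burst, refill)
```

In a real script, a query thread would call `bucket.allow()` (with `clock=time.monotonic`) and skip or delay the query when it returns False.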
3. Query performance:
- 1-condition WHERE query (Memory engine): 600K rows, 0.33s
- 5-condition WHERE query (Memory): 800K rows, 0.57s
- 5-condition WHERE query (Memory): 1M rows, 0.54s
- 5-condition WHERE query (Memory): 1.12M rows, 0.56s
- 5-condition WHERE query (Memory): 2M rows, 0.565s
- 5-condition WHERE query (Memory): 5M rows, 1.2s (with inserts stopped)
- 5-condition WHERE query (Memory): 5.6M rows, 1.97s (with inserts stopped)
- 5-condition WHERE query (TinyLog): 70M rows, 1m 47s
- 2-condition WHERE query (TinyLog): 104.6M rows, 89s
- 5-condition WHERE query (TinyLog): 104.6M rows, 84s
- 10-condition WHERE query (TinyLog): 104.6M rows, 87s
Note: beyond roughly 4.5M rows, the insert thread and the query thread could no longer run at the same time; slow queries consumed so much memory that 16GB was not enough. 5-condition WHERE queries still completed, in 1-2s.
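The TinyLog figures above imply a scan throughput that quick arithmetic makes explicit (numbers are taken from the list above; the `scan_rate` helper is my own illustration):

```python
def scan_rate(rows, seconds):
    """Full-scan throughput in rows per second."""
    return rows / seconds

# 104.6M-row TinyLog scans: 2, 5, and 10 WHERE conditions
for conds, secs in [(2, 89), (5, 84), (10, 87)]:
    print(f"{conds} conditions: {scan_rate(104_600_000, secs) / 1e6:.2f}M rows/s")
```

The rate stays around 1.2M rows/s regardless of how many WHERE conditions are applied, consistent with the full scan, not predicate evaluation, dominating the cost on a TinyLog table.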
(1) Server status at 5M rows (CPU averaged around 320%; of the 16GB RAM, only 500-800MB stayed free; after stopping writes/queries, CPU returned to normal levels and about 800MB remained free):

```
total  used  free  shared  buff/cache  available
15G    5.9G  519M  9.2M    9.1G        9.2G

%CPU   %MEM
429.5  26.0
```
(2) Server status at 100M rows (38% of the 1TB disk used in total; this dataset is estimated to account for about 6%):

```
total  used  free  shared  buff/cache  available
15G    2.7G  181M  9.2M    12G         12G

%CPU   %MEM
103.7  3.6
```
Summary:
- 1. Concurrent, high-frequency single-row inserts are not supported; they cause errors and disconnections that can lose data.
- 2. High-concurrency queries are not supported; official guidance is QPS ≤ 100, beyond which server load rises and CPU and memory consumption grow too high.
- 3. Hardware requirements are significant: at the 100M-row scale, 16+ CPU cores and 64GB+ RAM are generally recommended.
- 4. Its strengths are fast queries and efficient batch inserts; prefer low-frequency, large-batch inserts.