Reposted from 万能修实验室 (author: 公先生, ID: dropudatabase)
When evaluating a database product, data safety matters just as much as stability and ease of use, and backup and restore are usually the last line of defense.
If the backup strategy is incomplete or the restore procedure does not work, an accidental data deletion really is beyond saving.
ClickHouse currently offers several backup approaches:
Text file export/import
Table snapshots
ALTER TABLE…FREEZE
The clickhouse-backup tool
Clickhouse-Copier
Let's try each of them below.
# Data backup overview
https://clickhouse.tech/docs/en/operations/backup/
1. Text file export/import
# Test data
The source data in MySQL is 6.70 GB; the table has 8.99 million rows.
-- Test table row count: 8.99 million
-- Source data in MySQL: 6.70 GB
0 rows in set. Elapsed: 71.482 sec. Processed 8.99 million rows, 6.70 GB (125.77 thousand rows/s., 93.71 MB/s.)
# Export
clickhouse-client --query="select * from caihao.ch_test_customer" > /data/clickhouse/tmp/caihao.ch_test_customer.tsv
# Import (note: the format name after FORMAT is uppercase); multiple files can be imported with a wildcard such as ch_test*
cat /data/clickhouse/tmp/caihao.ch_test_customer.tsv | clickhouse-client --query="insert into caihao.ch_test_customer FORMAT TSV"
Speed: the import takes a little over 20 seconds.
# ClickHouse on-disk size: 368 MB
368 ch_test_customer
# The exported backup file is 3.5 GB; 139 MB after compression
[root@clickhouse-01 tmp]# du -hsm *
3539    caihao.ch_test_customer.tsv
[root@clickhouse-01 tmp]# gzip caihao.ch_test_customer.tsv
[root@clickhouse-01 tmp]# du -hsm *
139     caihao.ch_test_customer.tsv.gz
# Space usage comparison:
MySQL -- 6.7 GB
ClickHouse -- 368 MB
Exported text -- 3.5 GB
After compression -- 139 MB
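To skip the intermediate 3.5 GB TSV file on disk, the export can be piped straight through gzip. A minimal sketch, assuming the same table and paths as above:
# Hedged sketch: export and compress in one pass
clickhouse-client --query="select * from caihao.ch_test_customer" | gzip > /data/clickhouse/tmp/caihao.ch_test_customer.tsv.gz
# Import by decompressing on the fly
zcat /data/clickhouse/tmp/caihao.ch_test_customer.tsv.gz | clickhouse-client --query="insert into caihao.ch_test_customer FORMAT TSV"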
2. CTAS table snapshot
# 1 Copy a table locally
clickhouse-01 :) create table ch1 as ch_test_customer ;
CREATE TABLE ch1 AS ch_test_customer
Ok.
0 rows in set. Elapsed: 0.006 sec.

clickhouse-01 :) insert into table ch1 select * from ch_test_customer ;
INSERT INTO ch1 SELECT *
FROM ch_test_customer
Ok.
0 rows in set. Elapsed: 18.863 sec. Processed 8.99 million rows, 6.70 GB (476.59 thousand rows/s., 355.13 MB/s.)
# 2 Copy a table from a remote server
https://clickhouse.tech/docs/en/sql-reference/table-functions/remote/
# Syntax
remote('addresses_expr', db, table[, 'user'[, 'password']])
remote('addresses_expr', db.table[, 'user'[, 'password']])
# Example:
dba-docker :) insert into table ch1 select * from remote('10.222.2.222','caihao.ch_test_customer','ch_app','qwerty_123');
INSERT INTO ch1 SELECT *
FROM remote('10.222.2.222', 'caihao.ch_test_customer', 'ch_app', 'qwerty_123')
Ok.
0 rows in set. Elapsed: 17.914 sec. Processed 8.99 million rows, 6.70 GB (501.85 thousand rows/s., 373.95 MB/s.)
3. ALTER TABLE…FREEZE
Syntax:
ALTER TABLE table_name FREEZE [PARTITION partition_expr]
This operation creates a local backup of the specified partition.
If the PARTITION clause is omitted, it creates backups of all partitions at once. The whole backup process does not require stopping the service.
Note: FREEZE PARTITION only copies data; it does not back up metadata. By default the metadata is in /var/lib/clickhouse/metadata/database/table.sql.
1. Backup steps:
# Confirm that the shadow directory is empty:
(default location: /var/lib/clickhouse/shadow/)
# OPTIMIZE TABLE merges data from temporary parts into the existing partitions
OPTIMIZE TABLE caihao.test_restore_tab PARTITION '2020-10' FINAL;
or
OPTIMIZE TABLE caihao.test_restore_tab FINAL;
# Have ClickHouse freeze the table:
echo -n 'alter table caihao.ch_test_customer freeze' | clickhouse-client
# Files produced by the backup
[root@clickhouse-01 shadow]# ll /data/clickhouse/data/shadow/
total 8
drwxr-x--- 3 clickhouse clickhouse 4096 Oct 16 15:34 1
-rw-r----- 1 clickhouse clickhouse    2 Oct 16 15:34 increment.txt
[root@clickhouse-01 shadow]# du -hsm *
309     1
1       increment.txt
# Save the backup under a dated directory:
mkdir -p /data/clickhouse/data/backup/20201016/
cp -r /data/clickhouse/data/shadow/ /data/clickhouse/data/backup/20201016/
# Finally, clean the shadow directory for the next backup:
rm -rf /data/clickhouse/data/shadow/*
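Putting these steps together, here is a minimal sketch of a FREEZE-based backup script. It reuses the paths, database and table names from the examples above, and adds a copy of the metadata .sql file, since FREEZE itself does not back up metadata:
#!/bin/bash
# Hedged sketch of a FREEZE-based backup; paths and table names are the ones used in this article
BACKUP_DIR=/data/clickhouse/data/backup/$(date +%Y%m%d)
mkdir -p "$BACKUP_DIR"
# 1. Freeze the table (creates hard links under the shadow directory)
echo -n 'alter table caihao.ch_test_customer freeze' | clickhouse-client
# 2. Copy the frozen data and the table metadata (.sql) into the dated backup directory
cp -r /data/clickhouse/data/shadow/ "$BACKUP_DIR"
cp /data/clickhouse/data/metadata/caihao/ch_test_customer.sql "$BACKUP_DIR"
# 3. Clean the shadow directory for the next backup
rm -rf /data/clickhouse/data/shadow/*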
2. Manual restore
To restore data from a backup, follow these steps:
If the table does not exist, create it first. Get the DDL from the .sql file (replacing ATTACH with CREATE).
Copy the data from the backup's data/database/table/ directory into /var/lib/clickhouse/data/database/table/detached/.
Run ALTER TABLE t ATTACH PARTITION to attach the data to the table.
The test below restores the data into a new table, test_restore_tab.
# 1 Get the CREATE TABLE statement:
cat /data/clickhouse/data/metadata/caihao/ch_test_customer.sql
Then change ATTACH TABLE in the DDL to CREATE TABLE.
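A minimal sketch of that edit, assuming the paths above and swapping the original table name for the new name test_restore_tab (the older, non-Atomic metadata format is assumed):
# Hedged sketch: rewrite the ATTACH DDL into a CREATE statement for the new table and run it
sed -e 's/ATTACH TABLE/CREATE TABLE/' \
    -e 's/ch_test_customer/test_restore_tab/' \
    /data/clickhouse/data/metadata/caihao/ch_test_customer.sql | clickhouse-client --database=caihao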
# 2 Copy the backup into the table's 'detached' directory:
cp -rl /data/clickhouse/data/backup/20201016/shadow/1/data/caihao/ch_test_customer/* /data/clickhouse/data/data/caihao/test_restore_tab/detached/
chown clickhouse:clickhouse -R /data/clickhouse/data/data/caihao/test_restore_tab/detached/*
# 3 Attach the data to the table (ATTACH PARTITION)
echo 'alter table caihao.test_restore_tab attach partition 202010' | clickhouse-client
echo 'alter table caihao.test_restore_tab attach partition 202009' | clickhouse-client
Run this for every partition; in the end, all the parts under detached are moved up into the table's data directory. A looped version is sketched below.
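A minimal sketch of attaching every detached partition in one pass; it assumes the system.detached_parts system table is available in your ClickHouse version and uses the table names from this test:
# Hedged sketch: attach all detached partitions of caihao.test_restore_tab
clickhouse-client --query="
    SELECT DISTINCT partition_id
    FROM system.detached_parts
    WHERE database = 'caihao' AND table = 'test_restore_tab'" | \
while read -r part; do
    echo "attaching partition ${part}"
    clickhouse-client --query="ALTER TABLE caihao.test_restore_tab ATTACH PARTITION ID '${part}'"
done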
# 4 Confirm the data is restored:
echo 'select count() from caihao.test_restore_tab' | clickhouse-client

clickhouse-01 :) select count(*) from test_restore_tab;
SELECT count(*)
FROM test_restore_tab
┌─count()─┐
│ 8990020 │
└─────────┘
1 rows in set. Elapsed: 0.002 sec.

clickhouse-01 :) select count(*) from ch_test_customer;
SELECT count(*)
FROM ch_test_customer
┌─count()─┐
│ 8990020 │
└─────────┘
1 rows in set. Elapsed: 0.002 sec.
The row count matches the original table.
4. Clickhouse-Backup
# About clickhouse-backup
https://github.com/AlexAkulov/clickhouse-backup
# Limitations:
Supports ClickHouse 1.1.54390 and above
MergeTree-family table engines only
Backing up tiered storage or storage_policy is not supported
The maximum backup size on cloud storage is 5 TB
The maximum number of parts on AWS S3 is 10,000
-- Installation method 1: binary tarball
# Download clickhouse-backup:
wget https://github.com/AlexAkulov/clickhouse-backup/releases/download/v0.6.0/clickhouse-backup.tar.gz
# Extract and use
tar -xf clickhouse-backup.tar.gz
cd clickhouse-backup/
sudo cp clickhouse-backup /usr/local/bin
-- Installation method 2: RPM:
wget https://github.com/AlexAkulov/clickhouse-backup/releases/download/v0.6.0/clickhouse-backup-0.6.0-1.x86_64.rpm
rpm -ivh clickhouse-backup-0.6.0-1.x86_64.rpm
# Check the version
[root@clickhouse-01 clickhouse-backup]# clickhouse-backup -v
Version:     0.6.0
Git Commit:  7d7df1e36575f0d94d330c7bfe00aef7a2100276
Build Date:  2020-10-02
# Edit the configuration file:
mkdir -p /etc/clickhouse-backup/
vi /etc/clickhouse-backup/config.yml
Add some basic configuration:
general:
  remote_storage: none
  backups_to_keep_local: 7
  backups_to_keep_remote: 31
clickhouse:
  username: default
  password: ""
  host: localhost
  port: 9000
  data_path: "/data/clickhouse/data"
# View all the default configuration options
clickhouse-backup default-config
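As a starting point (a hedged convenience, assuming the config path used above), the full defaults can also be dumped straight into the config file and then trimmed down:
clickhouse-backup default-config > /etc/clickhouse-backup/config.yml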
# List the tables that can be backed up
clickhouse-backup tables
# Create a backup
1. Full backup
clickhouse-backup create
Backups are stored under $data_path/backup. The backup name defaults to a timestamp, but a name can also be given explicitly. For example:
clickhouse-backup create ch_bk_20201020
A backup contains two directories:
'metadata': the DDL SQL needed to recreate the tables
'shadow': the data produced by the ALTER TABLE ... FREEZE operation.
2. Single-table backup
Syntax:
clickhouse-backup create [-t, --tables=<db>.<table>] <backup_name>
Back up the table caihao.ch_test_customer:
clickhouse-backup create -t caihao.ch_test_customer ch_test_customer
3. Back up multiple tables
clickhouse-backup create -t caihao.test_restore_tab,caihao.ch1 ch_bak_2tab
# List backups
[root@clickhouse-01 backup]# clickhouse-backup list
Local backups:
- 'test20201019' (created at 20-10-2020 14:18:40)
- 'ch_bk_20201020' (created at 20-10-2020 14:20:35)
- '2020-10-20T06-27-08' (created at 20-10-2020 14:27:08)
- 'ch_test_customer' (created at 20-10-2020 15:17:13)
- 'ch_bak_2tab' (created at 20-10-2020 15:33:41)
# Delete a backup
[root@clickhouse-01 backup]# clickhouse-backup delete local test20201019
[root@clickhouse-01 backup]# clickhouse-backup list
Local backups:
- 'ch_bk_20201020' (created at 20-10-2020 14:20:35)
- '2020-10-20T06-27-08' (created at 20-10-2020 14:27:08)
- 'ch_test_customer' (created at 20-10-2020 15:17:13)
- 'ch_bak_2tab' (created at 20-10-2020 15:33:41)
# Clean the temporary backup files under shadow
[root@clickhouse-01 shadow]# clickhouse-backup clean
2020/10/20 14:19:13 Clean /data/clickhouse/data/shadow
# Restoring data
Syntax:
clickhouse-backup restore <backup_name>
[root@clickhouse-01 ~]# clickhouse-backup restore -help
NAME:
   clickhouse-backup restore - Create schema and restore data from backup
USAGE:
   clickhouse-backup restore [--schema] [--data] [-t, --tables=<db>.<table>] <backup_name>
OPTIONS:
   --config FILE, -c FILE                   Config FILE name. (default: "/etc/clickhouse-backup/config.yml") [$CLICKHOUSE_BACKUP_CONFIG]
   --table value, --tables value, -t value
   --schema, -s                             Restore schema only
   --data, -d                               Restore data only
Key options:
--table restores only specific tables; regular expressions are supported.
For example, to target a specific database: --table=dbname.*
--schema restores the table schema only
--data restores the data only
# Backing up to a remote destination
clickhouse-backup can upload and download backups to and from remote object storage such as S3, GCS, or IBM COS.
For AWS S3, for example, edit the configuration file /etc/clickhouse-backup/config.yml:
s3:
  access_key:
  secret_key:
  bucket:
  region: us-east-1
  path: "/some/path/in/bucket"
Then a backup can be uploaded:
$ clickhouse-backup upload 2020-07-06T20-13-02
2020/07/07 15:22:32 Upload backup '2020-07-06T20-13-02'
2020/07/07 15:22:49 Done.
Or downloaded:
$ sudo clickhouse-backup download 2020-07-06T20-13-02
2020/07/07 15:27:16 Done.
# Backup retention policy
Two parameters under general: control the retention policy:
backups_to_keep_local: 0    # number of local backups to keep
backups_to_keep_remote: 0   # number of remote backups to keep
The default is 0, which means no automatic backup cleanup.
They can be set to, for example:
backups_to_keep_local: 7
backups_to_keep_remote: 31
When uploading with clickhouse-backup upload, the --diff-from option can be used.
It compares files against a previous local backup and uploads only new or changed files.
The previous backup must be kept so that the new backup can still be restored.
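A minimal sketch of an incremental upload; the backup names here are made up for illustration, in the spirit of the examples above:
# Hedged sketch: upload only the files that changed relative to the previous local backup
clickhouse-backup create ch_bk_20201021
clickhouse-backup upload --diff-from=ch_bk_20201020 ch_bk_20201021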
# Backup and restore test:
The test database has three tables with identical row counts.
dba-docker :) show tables;
SHOW TABLES
┌─name─┐
│ ch1  │   # 8990020 rows
│ ch2  │   # 8990020 rows
│ ch3  │   # 8990020 rows
└──────┘
Take a backup named bk_3_tab:
clickhouse-backup create bk_3_tab
Now damage the data:
truncate table ch1;
insert into ch2 select * from ch3;
drop table ch3;
The row counts at this point:
dba-docker :) show tables;
SHOW TABLES
┌─name─┐
│ ch1  │   # 0 rows
│ ch2  │   # 8990020*2 = 17980040 rows
└──────┘   # ch3 has been dropped
Use --schema to restore only the schema of table ch3:
clickhouse-backup restore bk_3_tab -table caihao.ch3 --schema
Only the table structure is back; there is no data:
dba-docker :) select count(*) from ch3;
SELECT count(*)
FROM ch3
┌─count()─┐
│       0 │
└─────────┘
Use --data to restore the data in ch3:
(Note: since this is an ATTACH PART operation, running it twice doubles the data.)
clickhouse-backup restore bk_3_tab -table caihao.ch3 --data
The data has been imported:
dba-docker :) select count(*) from ch3;
SELECT count(*)
FROM ch3
┌─count()─┐
│ 8990020 │
└─────────┘
Restore the other tables:
[root@dba-docker ~]# clickhouse-backup restore bk_3_tab
2020/10/20 17:42:37 Create table 'caihao.ch1'
2020/10/20 17:42:37 can't create table 'caihao.ch1': code: 57, message: Table caihao.ch1 already exists.
Because the restore has to recreate the tables, a full-database restore only works after the existing tables are dropped.
Drop the database, then run the full restore:
clickhouse-backup restore bk_3_tab
Verification shows all the data was restored:
dba-docker :) show tables;
SHOW TABLES
┌─name─┐
│ ch1  │
│ ch2  │
│ ch3  │
└──────┘
dba-docker :) select count(*) from ch1;
SELECT count(*)
FROM ch1
┌─count()─┐
│ 8990020 │
└─────────┘
# Add it to a daily backup job:
mkdir -p /data/clickhouse/scripts
vi /data/clickhouse/scripts/CH_Full_Backup.sh
#!/bin/bash
BACKUP_NAME=CH_Full_Backup_$(date +%Y-%m-%dT%H-%M-%S)
/usr/bin/clickhouse-backup create $BACKUP_NAME
# /usr/bin/clickhouse-backup upload $BACKUP_NAME
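To schedule the script, a crontab entry along these lines runs it every night (a hedged sketch; the time and log path are arbitrary choices):
# Hedged sketch: run the full backup at 02:30 every day (add with crontab -e as root)
30 2 * * * /bin/bash /data/clickhouse/scripts/CH_Full_Backup.sh >> /var/log/ch_backup.log 2>&1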
Since it requires a replicated environment, Clickhouse-Copier is not tested here.
The database's "cure for regret"
As the database's last "cure for regret," backups matter a great deal:
No backup in place? Watch out for a dropped database;
Once the database is dropped, you'd better run fast;
And if you get caught, it's fifteen years to start.
So if you can't get backup and restore right, practice running away and shifting the blame instead, and when things look bad, bolt.