本文转载自万能修实验室作者:公先生ID:dropudatabase 我们评估一款数据库产品,除了稳定性和易用性外,数据安全也十分重要,备份与恢复往往是最后一道安全门。 但如果备份策略不完善、恢复手段无效,一旦发生数据误删除就真的抢救无效扑街了。 目前Clickhouse的备份方式有以下几种: 文本文件导入导出 表快照 ALTERTABLEFREEZE 备份工具ClickhouseBackup ClickhouseCopier 下面就逐个试试吧。数据备份概述 https:clickhouse。techdocsenoperationsbackup 1。文本文件导入导出 测试数据 MySQL中源数据6。70G,表数据量899万测试表数据量899万MySQL中源数据6。70G0rowsinset。Elapsed:71。482sec。Processed8。99millionrows,6。70GB(125。77thousandrowss。,93。71MBs。) 导出clickhouseclientqueryselectfromcaihao。chtestcustomerdataclickhousetmpcaihao。chtestcustomer。tsv 导入(注意FORMAT后面大写)多个文件可以用chtestcatdataclickhousetmpcaihao。chtestcustomer。tsvclickhouseclientqueryinsertintocaihao。chtestcustomerFORMATTSV 速度:导入需要20多秒 CH文件磁盘占用368MB368chtestcustomer 备份文件3。5G压缩后139MB〔rootclickhouse01tmp〕duhsm3539caihao。chtestcustomer。tsv〔rootclickhouse01tmp〕gzipcaihao。chtestcustomer。tsv〔rootclickhouse01tmp〕duhsm139caihao。chtestcustomer。tsv。gz 对比下占用空间: MySQL6。7G ClickHouse368M 导出文本3。5G 压缩后139M 2。CTAS表快照 1本地复制表clickhouse01:)createtablech1aschtestcustomer;CREATETABLEch1ASchtestcustomerOk。0rowsinset。Elapsed:0。006sec。clickhouse01:)insertintotablech1selectfromchtestcustomer;INSERTINTOch1SELECTFROMchtestcustomerOk。0rowsinset。Elapsed:18。863sec。Processed8。99millionrows,6。70GB(476。59thousandrowss。,355。13MBs。) 2远程复制表 https:clickhouse。techdocsensqlreferencetablefunctionsremote语法remote(addressesexpr,db,table〔,user〔,password〕〕)remote(addressesexpr,db。table〔,user〔,password〕〕)例子:dbadocker:)insertintotablech1selectfromremote(10。222。2。222,caihao。chtestcustomer,chapp,qwerty123);INSERTINTOch1SELECTFROMremote(10。222。2。222,caihao。chtestcustomer,chapp,qwerty123)Ok。0rowsinset。Elapsed:17。914sec。Processed8。99millionrows,6。70GB(501。85thousandrowss。,373。95MBs。) 3。ALTERTABLEFREEZE 语法:ALTERTABLEtablenameFREEZE〔PARTITIONpartitionexpr〕 该操作为指定分区创建一个本地备份。 如果PARTITION语句省略,该操作会一次性为所有分区创建备份。整个备份过程不需要停止服务 注意:FREEZEPARTITION只复制数据,不备份元数据。元数据默认在文件varlibclickhousemetadatadatabasetable。sql 1。备份的步骤: 确认shadow目录为空: (默认位置:varlibclickhouseshadow) OPTIMIZETABLE把临时分区的数据,合并到已有分区中OPTIMIZETABLEcaihao。testrestoretabPARTITION202010FINAL; 或者OPTIMIZETABLEcaihao。testrestoretabFINAL; 让ClickHouse冻结表:echonaltertablecaihao。chtestcustomerfreezeclickhouseclient 备份后的文件〔rootclickhouse01shadow〕lldataclickhousedatashadowtotal8drwxrx3clickhouseclickhouse4096Oct1615:341rwr1clickhouseclickhouse2Oct1615:34increment。txt〔rootclickhouse01shadow〕duhsm30911increment。txt 按日期保存备份:mkdirpdataclickhousedatabackup20201016cprdataclickhousedatashadowdataclickhousedatabackup20201016 最后,为下次备份清理shadow目录:rmrfdataclickhousedatashadow 2。手动恢复 从备份中恢复数据,按如下步骤操作: 如果表不存在,先创建。查看。sql文件获取执行语句(将ATTACH替换成CREATE)。 从备份的datadatabasetable目录中,将数据复制到varlibclickhousedatadatabasetabledetached目录 运行ALTERTABLEtATTACHPARTITION操作,将数据添加到表中 测试把数据恢复到一个新表testrestoretab中 1获取建表语句:catdataclickhousedatametadatacaihaochtestcustomer。sql 然后将DDL语句中的ATTACHTABLE改为CREATETABLE 2备份复制到表的detached目录中:cprldataclickhousedatabackup20201016shadow1datacaihaochtestcustomerdataclickhousedatadatacaihaotestrestoretabdetachedchownclickhouse:clickhouseRdataclickhousedatadatacaihaotestrestoretabdetached 3将数据添加到表中attachpartitionechoaltertablecaihao。testrestoretabattachpartition202010clickhouseclientechoaltertablecaihao。testrestoretabattachpartition202009clickhouseclient 要把所有分区都执行一遍,最终detached目录下所有的分区,都移动到了上一目录 4确认数据已还原:echoselectcount()fromcaihao。testrestoretabattachclickhouseclientclickhouse01:)selectcount()fromtestrestoretab;SELECTcount()FROMtestrestoretabcount()89900201rowsinset。Elapsed:0。002sec。clickhouse01:)selectcount()fromchtestcustomer;SELECTcount()FROMchtestcustomercount()89900201rowsinset。Elapsed:0。002sec。 数据量和原表一致 4。ClickhouseBackup ClickhouseBackup简介 https:github。comAlexAkulovclickhousebackup 使用限制: 支持1。1。54390以上的ClickHouse 仅MergeTree系列表引擎 不支持备份Tieredstorage或storagepolicy 云存储上的最大备份大小为5TB AWSS3上的parts数最大为10,000 安装方式1:二进制文件安装 clickhousebackup下载:wgethttps:github。comAlexAkulovclickhousebackupreleasesdownloadv0。6。0clickhousebackup。tar。gz 解压即用tarxfclickhousebackup。tar。gzcdclickhousebackupsudocpclickhousebackupusrlocalbin 安装方式2:rpm安装:wgethttps:github。comAlexAkulovclickhousebackupreleasesdownloadv0。6。0clickhousebackup0。6。01。x8664。rpmrpmivhclickhousebackup0。6。01。x8664。rpm 查看版本〔rootclickhouse01clickhousebackup〕clickhousebackupvVersion:0。6。0GitCommit:7d7df1e36575f0d94d330c7bfe00aef7a2100276BuildDate:20201002 编辑配置文件:mkdirpetcclickhousebackupvietcclickhousebackupconfig。yml 添加一些基本的配置信息general:remotestorage:nonebackupstokeeplocal:7backupstokeepremote:31clickhouse:username:defaultpassword:host:localhostport:9000datapath:dataclickhousedata 查看全部默认的配置项clickhousebackupdefaultconfig 查看可备份的表clickhousebackuptables 创建备份 1。全库备份clickhousebackupcreate 备份存储在中datapathbackup下,备份名称默认为时间戳,可手动指定备份名称。例如:clickhousebackupcreatechbk20201020 备份包含两个目录: metadata目录:包含重新创建所需的DDLSQL shadow目录:包含作为ALTERTABLE。。。FREEZE操作结果的数据。 2。单表备份 语法:clickhousebackupcreate〔t,tables。〕pre 备份表caihao。chtestcustomerclickhousebackupcreatetcaihao。chtestcustomerchtestcustomer 3。备份多个表clickhousebackupcreatetcaihao。testrestoretab,caihao。ch1chbak2tab 查看备份文件〔rootclickhouse01backup〕clickhousebackuplistLocalbackups:test20201019(createdat2010202014:18:40)chbk20201020(createdat2010202014:20:35)20201020T062708(createdat2010202014:27:08)chtestcustomer(createdat2010202015:17:13)chbak2tab(createdat2010202015:33:41) 删除备份文件〔rootclickhouse01backup〕clickhousebackupdeletelocaltest20201019〔rootclickhouse01backup〕〔rootclickhouse01backup〕clickhousebackuplistLocalbackups:chbk20201020(createdat2010202014:20:35)20201020T062708(createdat2010202014:27:08)chtestcustomer(createdat2010202015:17:13)chbak2tab(createdat2010202015:33:41) 清除shadow下的临时备份文件〔rootclickhouse01shadow〕clickhousebackupclean2020102014:19:13Cleandataclickhousedatashadow 数据恢复 语法: clickhousebackuprestore备份名〔rootclickhouse01〕clickhousebackuprestorehelpNAME:clickhousebackuprestoreCreateschemaandrestoredatafrombackupUSAGE:clickhousebackuprestore〔schema〕〔data〕〔t,tables。〕OPTIONS:configFILE,cFILEConfigFILEname。(default:etcclickhousebackupconfig。yml)〔CLICKHOUSEBACKUPCONFIG〕tablevalue,tablesvalue,tvalueschema,sRestoreschemaonlydata,dRestoredataonlypre 一些参数: table只恢复特定表,可使用正则。 如针对特定的数据库:tabledbname。 schema只还原表结构 data只还原数据 备份到远程目标 Clickhousebackup支持从远程对象存储(例如S3,GCS或IBM的COS)上载和下载备份。 例如AWS的S3,修改配置文件etcclickhousebackupconfig。yml s3:accesskey:secretkey:bucket:存储桶BUCKET名称region:useast1path:somepathinbucket备份路径 然后即可以上传备份:nbsp;clickhousebackupupload20200706T2013022020070715:22:32Uploadbackup20200706T2013022020070715:22:49Done。 或者下载备份:nbsp;sudoclickhousebackupdownload20200706T2013022020070715:27:16Done。 备份保留策略 general:下的2个参数来控制备份的保留策略 backupstokeeplocal:0本地备份保留个数 backupstokeepremote:0远程备份保留个数 默认为0,即不自动做备份清理。 可以设置为: backupstokeeplocal:7 backupstokeepremote:31 使用clickhousebackupupload上传备份可以使用参数difffrom 将文件与以前的本地备份进行比较,仅上载新的更改的文件。 必须保留先前的备份,以便从新备份中进行还原。 备份恢复测试: 测试库有3张表,数据量一样dbadocker:)showtables;SHOWTABLESnamech1数据量8990020ch2数据量8990020ch3数据量8990020 做个备份:bk3tabclickhousebackupcreatebk3tab 进行数据破坏:truncatetablech1;insertintoch2selectfromch3;droptablech3; 此时的数据量dbadocker:)showtables;SHOWTABLESnamech1数据量0ch2数据量8990020217980040ch3被drop 只使用schema恢复ch3表的表结构clickhousebackuprestorebk3tabtablecaihao。ch3schema 只有表结构,没数据dbadocker:)selectcount()fromch3;SELECTcount()FROMch3count()0 用data恢复ch3表中数据 (注意,由于是ATTACHPART操作,如果执行2次的话,数据会翻倍)clickhousebackuprestorebk3tabtablecaihao。ch3data 数据已导入dbadocker:)selectcount()fromch3;SELECTcount()FROMch3count()8990020 恢复其他表:〔rootdbadocker〕clickhousebackuprestorebk3tab2020102017:42:37Createtablecaihao。ch12020102017:42:37cantcreatetablecaihao。ch1:code:57,message:Tablecaihao。ch1alreadyexists。 由于要新建表,只能把表drop掉才能全库恢复。 直接dropdatabase,然后全库恢复clickhousebackuprestorebk3tab 验证后数据是全部恢复成功了dbadocker:)showtables;SHOWTABLESnamech1ch2ch3 dbadocker:)selectcount()fromch1;SELECTcount()FROMch1count()8990020 加到每日备份任务中:mkdirpdataclickhousescriptsvidataclickhousescriptsCHFullBackup。sh !binbashBACKUPNAMECHFullBackup(dateYmdTHMS)usrbinclickhousebackupcreateBACKUPNAMEusrbinclickhousebackupuploadBACKUPNAME 由于需要副本环境才能使用,ClickhouseCopier就不做测试了。 数据库的后悔药 备份作为数据库最后的后悔药意义十分重大: 没有做备份,小心被删库; 一旦删了库,就要快跑路; 万一被抓住,十五年起步。 所以,如果做不好备份恢复,就苦练跑路甩锅,情况不好,拔腿就跑。