KingbaseES V8R6 Cluster Backup and Recovery Case Study: Backup and Restore of the Primary in single-pro Mode

2025/1/12 18:21:49 / Article source: https://www.cnblogs.com/tiany1224/p/18520950

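Case description:
Physical backup of a KingbaseES V8R6 cluster supports the single-pro mode. In this case a backup was taken in single-pro mode, the cluster was switched several times, and a restore test was then run against the cluster. This document records the restore procedure in detail.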

Applicable version:
KingbaseES V8R6

Cluster architecture:

 ID | Name  | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+-------+---------+--------------------
 1  | node1 | primary | * running |          | running | 44482 | no      | n/a
 2  | node2 | standby |   running | node1    | running | 15369 | no      | 1 second(s) ago

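I. Review the cluster backups
After physical backup is initialized in single-pro mode (sys_backup.sh init), a backup is also taken on both the primary and the standby nodes: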

1. Primary backup

[kingbase@node201 bin]$ /home/kingbase/cluster/R6C8/HAC8/kingbase/bin/sys_rman --config=/home/kingbase/kbbr_repo/sys_rman.conf --stanza=kingbase info
WARN: set process-max 4 is too large, auto set to CPU core count 1
stanza: kingbase
    status: ok
    cipher: none

    db (current)
        wal archive min/max (V008R006C008B0014): 000000490000000300000040/00000049000000030000004A

        full backup: 20241101-105946F
            timestamp start/stop: 2024-11-01 10:59:46 / 2024-11-01 10:59:53
            wal start/stop: 000000490000000300000048 / 000000490000000300000048
            database size: 378.4MB, database backup size: 378.4MB
            repo1: backup set size: 378.4MB, backup size: 378.4MB

        incr backup: 20241101-105946F_20241101-110356I
            timestamp start/stop: 2024-11-01 11:03:56 / 2024-11-01 11:03:58
            wal start/stop: 00000049000000030000004A / 00000049000000030000004A
            database size: 378.8MB, database backup size: 25.1MB
            repo1: backup set size: 378.8MB, backup size: 25.1MB
            backup reference list: 20241101-105946F

2. Standby backup

[kingbase@node202 bin]$ /home/kingbase/cluster/R6C8/HAC8/kingbase/bin/sys_rman --config=/home/kingbase/kbbr_repo/sys_rman.conf --stanza=kingbase info
WARN: set process-max 4 is too large, auto set to CPU core count 1
stanza: kingbase
    status: ok
    cipher: none

    db (current)
        wal archive min/max (V008R006C008B0014): 000000490000000300000040/0000004B000000030000004D

        full backup: 20241101-105544F
            timestamp start/stop: 2024-11-01 10:55:44 / 2024-11-01 10:55:53
            wal start/stop: 000000490000000300000046 / 000000490000000300000046
            database size: 378.4MB, database backup size: 378.4MB
            repo2: backup set size: 378.4MB, backup size: 378.4MB

II. Cluster switch tests

1. After a failover

[kingbase@node201 bin]$ ./repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | standby |   running | node2    | default  | 100      | 73       | 0 bytes | host=192.168.1.201 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
 2  | node2 | primary | * running |          | default  | 100      | 74       |         | host=192.168.1.202 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000

2. After a switchover

[kingbase@node201 bin]$ ./repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | primary | * running |          | default  | 100      | 75       |         | host=192.168.1.201 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
 2  | node2 | standby |   running | node1    | default  | 100      | 74       | 0 bytes | host=192.168.1.202 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000

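3. Timeline changes on the primary
The primary's timeline changes with each switch. In the repmgr output above, the primary is on timeline 74 (node2) after the failover and on timeline 75 (node1) after the switchover.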
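
The timeline values reported by repmgr can be cross-checked against the WAL file names in the backup listings, because the first eight hexadecimal digits of a WAL segment name or a .history file name are the timeline ID. The following is a minimal sketch. `wal_timeline` is a hypothetical helper name, not a KingbaseES tool, and the sketch assumes the PostgreSQL-style WAL naming that the file names in this article suggest.

```shell
#!/bin/sh
# wal_timeline (hypothetical helper): print the decimal timeline ID encoded in
# the first 8 hex digits of a WAL segment or timeline history file name.
wal_timeline() {
    hex=$(printf '%s' "$1" | cut -c1-8)
    # Reject names that do not start with exactly 8 hex digits.
    case "$hex" in
        ''|*[!0-9A-Fa-f]*) echo "unexpected file name: $1" >&2; return 1 ;;
    esac
    [ "${#hex}" -eq 8 ] || { echo "unexpected file name: $1" >&2; return 1; }
    printf '%d\n' "0x$hex"
}

wal_timeline 000000490000000300000048   # WAL start segment of the full backup
wal_timeline 0000004B.history           # a timeline history file name
```

With these two names the calls print 73 and 75: the backups were taken on timeline 73, and 75 is the timeline of node1 after the switchover.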

III. Primary data restore test

1. Simulate data loss

prod=# drop table t3;
DROP TABLE
prod=# drop table t2;
DROP TABLE
prod=# drop table t1;
DROP TABLE
prod=# \d
                       List of relations
 Schema |          Name           |       Type        | Owner
--------+-------------------------+-------------------+--------
 public | sys_roles               | table             | system
 public | sys_stat_statements     | view              | system
 public | sys_stat_statements_all | view              | system
 public | tb1                     | table             | system
 public | teachers                | table             | system
(15 rows)

2. Restore the data from the physical backup
1) Stop the cluster
[kingbase@node201 bin]$ ./sys_monitor.sh stop

2) Back up the original data directory
[kingbase@node201 kingbase]$ mv data data.bk

3) Run a full restore
As shown below, the database on the primary node is restored successfully:

[kingbase@node201 bin]$ /home/kingbase/cluster/R6C8/HAC8/kingbase/bin/sys_rman --config=/home/kingbase/kbbr_repo/sys_rman.conf --stanza=kingbase restore
2024-11-01 11:14:17.027 P00   INFO: restore command begin 2.27: --band-width=0 --config=/home/kingbase/kbbr_repo/sys_rman.conf --exec-id=56849-a18b87e7 --kb2-host=192.168.1.202 --kb1-path=/home/kingbase/cluster/R6C8/HAC8/kingbase/data --kb2-path=/home/kingbase/cluster/R6C8/HAC8/kingbase/data --link-all --log-level-console=info --log-level-file=info --log-path=/home/kingbase/cluster/R6C8/HAC8/kingbase/log --log-subprocess --non-archived-space=1024 --process-max=4 --repo1-path=/home/kingbase/kbbr_repo --stanza=kingbase
WARN: set process-max 4 is too large, auto set to CPU core count 1
2024-11-01 11:14:17.065 P00   INFO: repo1: restore backup set 20241101-105946F_20241101-110356I, recovery will start at 2024-11-01 11:03:56
2024-11-01 11:14:17.981 P00   INFO: Restore Process: FILE: 1 / 4572 0%       SZIE: 182419456 bytes / 397204747 bytes 174.0MB / 378.8MB 45%
........
2024-11-01 11:14:23.434 P00   INFO: Restore Process: FILE: 4572 / 4572 100%       SZIE: 397204747 bytes / 397204747 bytes 378.8MB / 378.8MB 100%
2024-11-01 11:14:23.435 P00   INFO: write updated /home/kingbase/cluster/R6C8/HAC8/kingbase/data/kingbase.auto.conf
2024-11-01 11:14:23.438 P00   INFO: restore global/sys_control (performed last to ensure aborted restores cannot be started)
2024-11-01 11:14:23.439 P00   INFO: restore size = 378.8MB, file total = 4572
2024-11-01 11:14:23.440 P00   INFO: restore command end: completed successfully (6417ms)

4) Check the restored data
As shown below, the three dropped tables are back on the primary node:

# Start the database service on the primary
[kingbase@node201 bin]$ ./sys_ctl start -D ../data
# Connect to the database
[kingbase@node201 bin]$ ./ksql -U system test
Type "help" for help.

test=# \c prod
You are now connected to database "prod" as userName "system".
prod=# \d
                       List of relations
 Schema |          Name           |       Type        | Owner
--------+-------------------------+-------------------+--------
 public | sys_roles               | table             | system
 public | sys_stat_statements     | view              | system
 public | sys_stat_statements_all | view              | system
 public | t1                      | table             | system
 public | t2                      | table             | system
 public | t3                      | table             | system
 public | tb1                     | table             | system
 public | teachers                | table             | system
(18 rows)

prod=# select count(*) from t3;
 count(*)
----------
    10000
(1 row)
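
The primary restore steps above can be collected into one small script. This is only a sketch: `run` and `restore_primary` are hypothetical helper names, the paths are the ones used in this article's environment, and by default the script only prints the commands (DRY_RUN=1) so they can be reviewed before anything is executed.

```shell
#!/bin/sh
# Paths taken from this article's environment; adjust for your own cluster.
KB_HOME=/home/kingbase/cluster/R6C8/HAC8/kingbase
RMAN_CONF=/home/kingbase/kbbr_repo/sys_rman.conf

# run (hypothetical helper): print the command when DRY_RUN=1 (the default),
# otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# restore_primary (hypothetical helper): stop the cluster, move the old data
# directory aside, restore from the backup repository, start the primary.
# Stops at the first step that fails.
restore_primary() {
    run "$KB_HOME/bin/sys_monitor.sh" stop || return 1
    run mv "$KB_HOME/data" "$KB_HOME/data.bk" || return 1
    run "$KB_HOME/bin/sys_rman" --config="$RMAN_CONF" --stanza=kingbase restore || return 1
    run "$KB_HOME/bin/sys_ctl" start -D "$KB_HOME/data" || return 1
}
```

Calling `restore_primary` prints the four commands, and `DRY_RUN=0 restore_primary` would execute them in order.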

IV. Restore the standby
As shown below, before cloning the standby, comment out the "restore_command" option in the primary's kingbase.auto.conf:

[kingbase@node201 bin]$ cat ../data/kingbase.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
.......
# Recovery settings generated by sys_rman restore on 2024-11-01 11:14:23
# restore_command = '/home/kingbase/cluster/R6C8/HAC8/kingbase/bin/sys_rman --config=/home/kingbase/kbbr_repo/sys_rman.conf --stanza=kingbase archive-get %f "%p"'
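
The edit above can also be scripted. The following sketch comments out an active restore_command line and keeps a copy of the original file. `comment_restore_command` is a hypothetical helper name, and the sketch assumes the setting sits on a single line, as in the listing above. The file header warns against manual edits, so treat this as an illustration of the step the article performs by hand.

```shell
#!/bin/sh
# comment_restore_command (hypothetical helper): comment out any active
# restore_command line in the given configuration file, keeping a .bak copy.
comment_restore_command() {
    conf=$1
    [ -f "$conf" ] || { echo "no such file: $conf" >&2; return 1; }
    cp -p "$conf" "$conf.bak" || return 1
    # Lines that already start with '#' do not match and are left untouched.
    sed 's/^[[:space:]]*restore_command[[:space:]]*=/# &/' "$conf.bak" > "$conf"
}

# Example (path from this article's environment):
# comment_restore_command /home/kingbase/cluster/R6C8/HAC8/kingbase/data/kingbase.auto.conf
```

Running the helper twice is harmless because an already commented line no longer matches, although the second run overwrites the .bak copy.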

1. Clone the standby
[kingbase@node202 bin]$ ./repmgr standby clone -h 192.168.1.201 -U esrep -d esrep

2. Start the standby database service

[kingbase@node202 bin]$ ./sys_ctl start -D ../data

3. Register the standby
[kingbase@node202 bin]$ ./repmgr standby register --force

4. Check the cluster status

[kingbase@node201 bin]$ ./repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | primary | * running |          | default  | 100      | 74       |         | host=192.168.1.201 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
 2  | node2 | standby |   running | node1    | default  | 100      | 74       | 0 bytes | host=192.168.1.202 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
You have mail in /var/spool/mail/kingbase

As shown above, the cluster has been restored.

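V. A restore failure case

1. Failure after cloning the standby
In this run the "restore_command" parameter in the primary's kingbase.auto.conf was not commented out. The standby was cloned and its database service was started, and the standby then began recovery from the archived WAL. As a result the primary and the standby ended up on different timelines, and streaming replication could not be established:

Standby sys_log output: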

[kingbase@node202 sys_log]$ tail -1000 kingbase-2024-11-01_113351.csv
2024-11-01 11:33:51.683 CST,,,32336,,67244c1f.7e50,1,,2024-11-01 11:33:51 CST,,0,LOG,00000,"ending log output to stderr",,"Future log output will go to log destination ""csvlog"".",,,,,,,""
2024-11-01 11:33:51.688 CST,,,32338,,67244c1f.7e52,1,,2024-11-01 11:33:51 CST,,0,LOG,00000,"database system was interrupted; last known up at 2024-11-01 11:33:21 CST",,,,,,,,,""
2024-11-01 11:33:52.247 CST,,,32338,,67244c1f.7e52,2,,2024-11-01 11:33:51 CST,,0,LOG,00000,"restored log file ""0000004B.history"" from archive",,,,,,,,,""
2024-11-01 11:33:52.259 CST,,,32338,,67244c1f.7e52,3,,2024-11-01 11:33:51 CST,,0,LOG,00000,"entering standby mode",,,,,,,,,""
2024-11-01 11:33:52.269 CST,,,32338,,67244c1f.7e52,4,,2024-11-01 11:33:51 CST,,0,LOG,00000,"restored log file ""0000004B.history"" from archive",,,,,,,,,""
2024-11-01 11:33:52.289 CST,,,32338,,67244c1f.7e52,5,,2024-11-01 11:33:51 CST,,0,FATAL,XX000,"requested timeline 75 is not a child of this server's history","Latest checkpoint is at 3/51000058 on timeline 74, but in the history of the requested timeline, the server forked off from that timeline at 3/4D0000A0.",,,,,,,,""
2024-11-01 11:33:52.290 CST,,,32336,,67244c1f.7e50,2,,2024-11-01 11:33:51 CST,,0,LOG,00000,"startup process (PID 32338) exited with exit code 1",,,,,,,,,""
2024-11-01 11:33:52.290 CST,,,32336,,67244c1f.7e50,3,,2024-11-01 11:33:51 CST,,0,LOG,00000,"aborting startup due to startup process failure",,,,,,,,,""
2024-11-01 11:33:52.297 CST,,,32336,,67244c1f.7e50,4,,2024-11-01 11:33:51 CST,,0,LOG,00000,"database system is shut down",,,,,,,,,""
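
When a freshly cloned standby refuses to start, the two timelines named in the FATAL message above can be pulled out of the csv log with a short filter. This is a minimal sketch. `timeline_mismatch` is a hypothetical helper name, and the pattern assumes the message wording shown in the log above.

```shell
#!/bin/sh
# timeline_mismatch (hypothetical helper): for each timeline FATAL line in the
# given sys_log csv files, print the timeline the server requested and the
# timeline of its latest checkpoint.
timeline_mismatch() {
    sed -n 's/.*requested timeline \([0-9][0-9]*\) is not a child.*on timeline \([0-9][0-9]*\),.*/requested=\1 checkpoint=\2/p' "$@"
}

# Example (file name from this article's environment):
# timeline_mismatch kingbase-2024-11-01_113351.csv
```

For the log shown above this prints `requested=75 checkpoint=74`: the cloned data is on timeline 74, while the history file fetched from the archive (0000004B.history) belongs to timeline 75.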

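2. Comment out the restore_command parameter on the primary

3. Clone the standby again

As shown below, after the standby is cloned again, the cluster returns to normal: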

ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string                                                                                                                       
----+-------+---------+-----------+----------+----------+----------+----------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1  | node1 | primary | * running |          | default  | 100      | 74       |         | host=192.168.1.201 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
2  | node2 | standby |   running | node1    | default  | 100      | 74       | 0 bytes | host=192.168.1.202 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000

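VI. Summary
This article records in detail how a cluster was restored from a physical backup taken in single-pro mode. It can serve as a reference for database data recovery.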
