Official site: Velero
Introduction
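Velero is an open-source, cloud-native disaster recovery and migration tool from VMware. It is written in Go and can safely back up, restore, and migrate Kubernetes cluster resources. Official site: https://velero.io/.
Velero is Spanish for "sailboat", which fits the Kubernetes community's nautical naming style. Velero was developed by Heptio, which has since been acquired by VMware. Velero works with standard Kubernetes clusters on both private and public clouds. Besides disaster recovery, it can also migrate resources, moving containerized applications from one cluster to another.
Velero works by backing up Kubernetes data to object storage for durability and availability, then downloading and restoring it when needed. By default, a backup is retained for 720 hours (30 days).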
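Velero consists of a client and a server:
- Client: a command-line tool that runs locally. It works as soon as kubectl and a kubeconfig credential file are configured.
- Server: runs inside the Kubernetes cluster and performs the actual backup and restore operations.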
Velero architecture
Velero backup workflow
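1. The Velero client calls the Kubernetes API server to create a Backup object.
2. The Backup controller, which watches the API server, picks up the new backup task.
3. The Backup controller starts the backup and queries the API server for the data to be backed up.
4. The Backup controller uploads the collected data to the configured object storage.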
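How Velero differs from etcd snapshot backups
A common way to back up a Kubernetes cluster is to take periodic etcd snapshots. Velero differs from that approach as follows: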
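- An etcd snapshot backs up the whole cluster at once (like a full MySQL backup). Even if you only need to restore one resource object (like restoring a single MySQL database), you must restore the entire cluster to the snapshot state (like a full MySQL restore), which affects the services running in pods in other namespaces (like affecting the data in other MySQL databases).
- Velero can back up selectively, for example a single namespace or individual resource objects, and can restore just one namespace or resource object from a backup without affecting pods running in other namespaces.
- Velero stores backups in object storage such as Ceph or OSS, whereas an etcd snapshot is a local file.
- Velero supports schedules for periodic backups, although etcd snapshots can also be automated with a CronJob.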
- Velero can create and restore AWS EBS snapshots: https://www.qloudx.com/velero-for-kubernetes-backup-restore-stateful-workloads-with-aws-ebs-snapshots/ (plugin: https://github.com/vmware-tanzu/velero-plugin-for-aws)
Installing Velero
Plan: MinIO serves as Velero's backend storage.
1. Set up MinIO
docker run --name minio \
-p 9000:9000 \
-p 9999:9999 \
-d --restart=always \
-e "MINIO_ROOT_USER=admin" \
-e "MINIO_ROOT_PASSWORD=12345678" \
-v /data/minio/data:/data \
minio/minio:RELEASE.2023-08-31T15-31-16Z server /data \
--console-address '0.0.0.0:9999'
This simply starts a MinIO container with docker run. Port 9000 serves the S3 API and port 9999 serves the web console.
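Log in to the web console and create the bucket Velero will use; the commands below use a bucket named velerodata.
2. Deploy Velero on a Kubernetes master node
1) Download the client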
# Download the client
wget https://github.com/vmware-tanzu/velero/releases/download/v1.13.1/velero-v1.13.1-linux-amd64.tar.gz
# Extract
tar -zxvf velero-v1.13.1-linux-amd64.tar.gz
# Copy the binary to /usr/bin
chmod +x velero-v1.13.1-linux-amd64/velero
cp velero-v1.13.1-linux-amd64/velero /usr/bin/
# Verify by viewing the help
velero --help
Usage:
  velero [command]

Available Commands:
  backup             Work with backups
  backup-location    Work with backup storage locations
  bug                Report a Velero bug
  client             Velero client related commands
  completion         Generate completion script
  create             Create velero resources
  debug              Generate debug bundle
  delete             Delete velero resources
  describe           Describe velero resources
  get                Get velero resources
  help               Help about any command
  install            Install Velero
  plugin             Work with plugins
  repo               Work with repositories
  restore            Work with restores
  schedule           Work with schedules
  snapshot-location  Work with snapshot locations
  uninstall          Uninstall Velero
  version            Print the velero version and associated image

Flags:
      --add_dir_header                  If true, adds the file directory to the header of the log messages
      --alsologtostderr                 log to standard error as well as files (no effect when -logtostderr=true)
      --colorized optionalBool          Show colored output in TTY. Overrides 'colorized' value from $HOME/.config/velero/config.json if present. Enabled by default
      --features stringArray            Comma-separated list of features to enable for this Velero process. Combines with values from $HOME/.config/velero/config.json if present
  -h, --help                            help for velero
      --kubeconfig string               Path to the kubeconfig file to use to talk to the Kubernetes apiserver. If unset, try the environment variable KUBECONFIG, as well as in-cluster configuration
      --kubecontext string              The context to use to talk to the Kubernetes apiserver. If unset defaults to whatever your current-context is (kubectl config current-context)
      --log_backtrace_at traceLocation  when logging hits line file:N, emit a stack trace (default :0)
      --log_dir string                  If non-empty, write log files in this directory (no effect when -logtostderr=true)
      --log_file string                 If non-empty, use this log file (no effect when -logtostderr=true)
      --log_file_max_size uint          Defines the maximum size a log file can grow to (no effect when -logtostderr=true). Unit is megabytes. If the value is 0, the maximum file size is unlimited. (default 1800)
      --logtostderr                     log to standard error instead of files (default true)
  -n, --namespace string                The namespace in which Velero should operate (default "velero")
      --one_output                      If true, only write logs to their native severity level (vs also writing to each lower severity level; no effect when -logtostderr=true)
      --skip_headers                    If true, avoid header prefixes in the log messages
      --skip_log_headers                If true, avoid headers when opening log files (no effect when -logtostderr=true)
      --stderrthreshold severity        logs at or above this threshold go to stderr when writing to files and stderr (no effect when -logtostderr=true or -alsologtostderr=false) (default 2)
  -v, --v Level                         number for the log level verbosity
      --vmodule moduleSpec              comma-separated list of pattern=N settings for file-filtered logging

Use "velero [command] --help" for more information about a command.
2) Configure Velero credentials
# Create a working directory and switch to it
mkdir -p /data/velero
cd /data/velero
# Create the credentials file for accessing MinIO
cat > velero-auth.txt << EOF
[default]
aws_access_key_id = admin
aws_secret_access_key = 12345678
EOF
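velero-auth.txt holds the username and password for the MinIO object store:
- aws_access_key_id: the object storage username (the MinIO root user here)
- aws_secret_access_key: the password
These key names are fixed and must not be changed.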
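The credentials file can also be produced by a small helper, which keeps the heredoc terminator on its own line and restricts the file's permissions. This is a minimal sketch; the function name write_velero_credentials is hypothetical, and admin/12345678 are simply the MinIO values used above.

```shell
#!/bin/bash
# Hypothetical helper (not part of Velero): write an AWS-style credentials
# file for velero-plugin-for-aws.
# Usage: write_velero_credentials <path> <access_key> <secret_key>
write_velero_credentials() {
  local path="$1" key="$2" secret="$3"
  # The [default] profile and both key names follow the AWS credentials
  # file format, so they must be written exactly like this.
  cat > "$path" <<EOF
[default]
aws_access_key_id = ${key}
aws_secret_access_key = ${secret}
EOF
  # The file contains a secret, so make it readable by the owner only.
  chmod 600 "$path"
}
```

For example, write_velero_credentials /data/velero/velero-auth.txt admin 12345678 produces the same file as the heredoc above.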
3) Quick install of the Velero server into the Kubernetes cluster
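# Print the bash completion script (redirect it to a file to enable completion)
velero completion bash
# View the generated script
cat velero.sh
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.0.0 \
  --bucket velerodata \
  --secret-file ./velero-auth.txt \
  --use-volume-snapshots=false \
  --namespace velero-system \
  --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://192.168.100.100:9000
# Check the server
kubectl -n velero-system get pod
The plugin tag must be compatible with the Velero version in use (v1.13.1 here). v1.0.0 is a very early release, so check the compatibility table in the velero-plugin-for-aws README before picking a tag; the same applies to the v1.5.5 tag used in the next step.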
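To keep the MinIO endpoint, bucket, and plugin tag in variables, the install command can be assembled by a helper and reviewed before it runs. This is a sketch under assumptions: minio_bsl_config and print_velero_install are hypothetical names, and the helper only prints the same flags used above rather than calling Velero.

```shell
#!/bin/bash
# Hypothetical helpers (not part of Velero) that assemble the install command above.

# Build the --backup-location-config value for a MinIO endpoint.
# MinIO is addressed path-style (http://host:9000/<bucket>), hence s3ForcePathStyle.
minio_bsl_config() {
  local url="$1" region="${2:-minio}"
  printf 'region=%s,s3ForcePathStyle="true",s3Url=%s' "$region" "$url"
}

# Print (do not run) the install command so it can be checked first.
print_velero_install() {
  local url="$1" bucket="$2" plugin_tag="$3" ns="${4:-velero-system}"
  printf 'velero install --provider aws --plugins velero/velero-plugin-for-aws:%s --bucket %s --secret-file ./velero-auth.txt --use-volume-snapshots=false --namespace %s --backup-location-config %s\n' \
    "$plugin_tag" "$bucket" "$ns" "$(minio_bsl_config "$url")"
}
```

Once the printed line looks right, it can be executed with print_velero_install http://192.168.100.100:9000 velerodata <plugin-tag> | bash.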
4) Create a certificate and kubeconfig file (optional)
# Prepare the CSR file with the information needed to issue the certificate
cat > awsuser-csr.json << EOF
{
  "CN": "awsuser",
  "hosts": [],
  "key": {"algo": "rsa", "size": 2048},
  "names": [{"C": "CN", "ST": "SiChuan", "L": "ChengDu", "O": "k8s", "OU": "System"}]
}
EOF
# Download the cfssl tools
wget https://github.com/cloudflare/cfssl/releases/download/v1.6.1/cfssl_1.6.1_linux_amd64
wget https://github.com/cloudflare/cfssl/releases/download/v1.6.1/cfssljson_1.6.1_linux_amd64
wget https://github.com/cloudflare/cfssl/releases/download/v1.6.1/cfssl-certinfo_1.6.1_linux_amd64
mv cfssl-certinfo_1.6.1_linux_amd64 cfssl-certinfo
mv cfssl_1.6.1_linux_amd64 cfssl
mv cfssljson_1.6.1_linux_amd64 cfssljson
chmod a+x cfssl-certinfo cfssl cfssljson
cp cfssl-certinfo cfssl cfssljson /usr/bin/
# Issue the certificate (first copy the ca-config.json used to deploy the cluster into /data/velero)
cfssl gencert -ca=/etc/kubernetes/ssl/ca.pem -ca-key=/etc/kubernetes/ssl/ca-key.pem -config=./ca-config.json -profile=kubernetes ./awsuser-csr.json | cfssljson -bare awsuser
# Copy the certificate and key into the API server certificate directory
cp awsuser-key.pem /etc/kubernetes/ssl/
cp awsuser.pem /etc/kubernetes/ssl/
# Generate the kubeconfig file for the cluster
export KUBE_APISERVER="https://192.168.100.111:6443"
# Set the cluster parameters
kubectl config set-cluster kubernetes \
--certificate-authority=/etc/kubernetes/ssl/ca.pem \
--embed-certs=true \
--server=${KUBE_APISERVER} \
--kubeconfig=./awsuser.kubeconfig
# Set the client certificate credentials
kubectl config set-credentials awsuser \
--client-certificate=/etc/kubernetes/ssl/awsuser.pem \
--client-key=/etc/kubernetes/ssl/awsuser-key.pem \
--embed-certs=true \
--kubeconfig=./awsuser.kubeconfig
# Set the context parameters
kubectl config set-context kubernetes \
--cluster=kubernetes \
--user=awsuser \
--namespace=velero-system \
--kubeconfig=./awsuser.kubeconfig
# Set the default context
kubectl config use-context kubernetes --kubeconfig=awsuser.kubeconfig
# Bind the awsuser user to the cluster-admin role in the cluster
kubectl create clusterrolebinding awsuser --clusterrole=cluster-admin --user=awsuser
# Verify that the certificate works
kubectl --kubeconfig ./awsuser.kubeconfig get nodes
kubectl --kubeconfig ./awsuser.kubeconfig get pods -n kube-system
Use the --kubeconfig option to point at the credential file. If you can list the cluster's nodes, pods, and so on, the file works.
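The manual check above can be wrapped into a function that returns a status code, which is handy at the top of backup scripts. A minimal sketch; kubeconfig_works is a hypothetical name, and it only runs the same two read-only kubectl calls (plus kubectl's --request-timeout flag so an unreachable API server does not hang the script).

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the given kubeconfig can reach the API server.
kubeconfig_works() {
  local kc="$1"
  [ -r "$kc" ] || { echo "kubeconfig not readable: $kc" >&2; return 1; }
  command -v kubectl >/dev/null 2>&1 || { echo "kubectl not found" >&2; return 1; }
  # Same read-only checks as above; output is discarded, only the exit code matters.
  kubectl --kubeconfig "$kc" --request-timeout=10s get nodes >/dev/null 2>&1 &&
    kubectl --kubeconfig "$kc" --request-timeout=10s get pods -n kube-system >/dev/null 2>&1
}
```

A script can then start with kubeconfig_works ./awsuser.kubeconfig || exit 1.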
kubectl create ns velero-system
# Install the Velero server
velero --kubeconfig ./awsuser.kubeconfig \
install \
--provider aws \
--plugins velero/velero-plugin-for-aws:v1.5.5 \
--bucket velerodata \
--secret-file ./velero-auth.txt \
--use-volume-snapshots=false \
--namespace velero-system \
--backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://192.168.100.100:9000
# Verify
kubectl get pod -n velero-system
Backing up data with Velero
1. Back up a specific namespace
# Get the current timestamp
DATE=$(date +%Y%m%d%H%M%S)
# Create a backup
velero backup create default-backup-${DATE} \
--include-cluster-resources=true \
--include-namespaces default \
--kubeconfig=./awsuser.kubeconfig \
--namespace velero-system
# Verify the backup (use the name printed by the create command)
velero backup describe default-backup-20240415133242 --kubeconfig=./awsuser.kubeconfig --namespace velero-system
Log in to the MinIO web console and check that the bucket now contains data.
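The create-then-describe steps above can be bundled so that each run gets a timestamped name and the script waits for the backup to finish. A sketch under assumptions: backup_name and backup_namespace are hypothetical names, while --wait is an existing flag of velero backup create that blocks until the backup completes.

```shell
#!/bin/bash
# Hypothetical helpers wrapping the commands above.

# Timestamped backup name, e.g. default-backup-20240415133242.
backup_name() {
  echo "${1}-backup-$(date +%Y%m%d%H%M%S)"
}

# Back up one namespace, wait for the backup to finish, then describe it.
backup_namespace() {
  local ns="$1" kc="${2:-./awsuser.kubeconfig}" name
  name=$(backup_name "$ns")
  velero backup create "$name" \
    --include-cluster-resources=true \
    --include-namespaces "$ns" \
    --kubeconfig="$kc" \
    --namespace velero-system \
    --wait &&
    velero backup describe "$name" --kubeconfig="$kc" --namespace velero-system
}
```

Calling backup_namespace default reproduces the steps above in one go.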
2. Back up specific pods or resources in namespaces
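DATE=$(date +%Y%m%d%H%M%S)
velero backup create pod-backup-${DATE} --include-cluster-resources=true \
  --ordered-resources 'pods=default/bash,magedu/ubuntu1804,magedu/mysql-0;deployments.apps=myserver/myserver-myapp-frontend-deployment,magedu/wordpress-app-deployment;services=myserver/myserver-myapp-service-name,magedu/mysql,magedu/zookeeper' \
  --namespace velero-system --include-namespaces=myserver,magedu,default
Note that --ordered-resources only sets the order in which the listed objects are backed up; the backup still contains every resource in the included namespaces. To restrict a backup to certain resource types, use --include-resources; to restrict it to specific objects, label them and use -l/--selector.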
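The --ordered-resources value is a semicolon-separated map of kind=namespace/name lists, which is easy to mistype on one long line. A small sketch that joins one entry per kind and rejects entries without a kind; ordered_resources is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: join "kind=ns/name[,ns/name...]" entries into the
# semicolon-separated mapping expected by --ordered-resources.
ordered_resources() {
  local entry
  for entry in "$@"; do
    case "$entry" in
      *=*) ;;
      *) echo "bad entry (expected kind=ns/name): $entry" >&2; return 1 ;;
    esac
  done
  # "$*" joins the arguments with the first character of IFS.
  local IFS=';'
  echo "$*"
}
```

For example: --ordered-resources "$(ordered_resources 'pods=default/bash,magedu/mysql-0' 'services=magedu/mysql')".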
3. Back up all namespaces in a batch
cat all-ns-backup.sh
#!/bin/bash
# List all namespace names (--no-headers drops only the header row)
NS_NAME=$(kubectl get ns --no-headers | awk '{print $1}')
DATE=`date +%Y%m%d%H%M%S`
cd /data/velero/
for i in $NS_NAME;do
velero backup create ${i}-ns-backup-${DATE} \
--include-cluster-resources=true \
--include-namespaces ${i} \
--kubeconfig=/root/.kube/config \
--namespace velero-system
done
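The script can also be split into a parsing step and a command-printing step, so the namespace list can be tested and the generated commands reviewed before anything runs. A sketch; namespace_names and print_ns_backups are hypothetical names, and the printed commands use the same flags as the script above.

```shell
#!/bin/bash
# Hypothetical variant of all-ns-backup.sh, split into testable pieces.

# Read `kubectl get ns` output on stdin and print one namespace per line.
# NR>1 drops only the header row (NAME STATUS AGE).
namespace_names() {
  awk 'NR>1 {print $1}'
}

# Read namespace names on stdin and print one backup command per namespace (dry run).
print_ns_backups() {
  local date="$1" ns
  while read -r ns; do
    [ -n "$ns" ] || continue
    echo "velero backup create ${ns}-ns-backup-${date} --include-cluster-resources=true --include-namespaces ${ns} --kubeconfig=/root/.kube/config --namespace velero-system"
  done
}
```

Usage: kubectl get ns | namespace_names | print_ns_backups "$(date +%Y%m%d%H%M%S)" prints the commands; append | bash to run them.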
4. Scheduled backups with Schedule
For reference only:
- velero create schedule NAME --schedule="0 */6 * * *": back up automatically every 6 hours.
- velero --kubeconfig=./kube_config_cluster_examination.yml schedule create k8s-all --schedule="0 0 * * *" --ttl 24h --namespace velero-system: back up all cluster resources every day at midnight and keep each backup for 24 hours.
- velero schedule create default-exclude-rancher-daily --schedule="@every 24h" --include-namespaces web: a schedule is itself a kind of backup, so every flag accepted when creating a backup can be used here too.
- velero --kubeconfig=./kube_config_cluster_examination.yml get schedule --namespace velero-system: list schedules.
- velero --kubeconfig=./kube_config_cluster_examination.yml delete schedule k8s-all --namespace velero-system: delete a schedule.
- Other subcommands include delete, describe, and logs.
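The --ttl flag takes a duration in hours, minutes, and seconds (the default, 720h0m0s, is 30 days), so longer retention periods have to be converted from days. A minimal sketch; ttl_from_days is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: convert a retention period in days into the
# duration string accepted by --ttl (e.g. 90 days -> 2160h0m0s).
ttl_from_days() {
  local days="$1"
  case "$days" in
    ''|*[!0-9]*) echo "days must be a non-negative integer: $days" >&2; return 1 ;;
  esac
  # 10# forces base 10 so inputs such as 090 are not read as octal.
  echo "$((10#$days * 24))h0m0s"
}
```

For example: velero schedule create k8s-all --schedule="0 0 * * *" --ttl "$(ttl_from_days 7)" --namespace velero-system.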
Restoring data with Velero
Simulate a failure by deleting a pod in the default namespace (for demonstration only):
kubectl delete pod bash -n default
# Restore
velero restore create --from-backup default-backup-20240415133242 --wait --kubeconfig=./awsuser.kubeconfig --namespace velero-system
# Verify
kubectl get pod
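When backups follow the name-timestamp pattern used in this article, the newest one can be picked by its trailing timestamp instead of copying the name by hand. A sketch under assumptions: latest_backup is a hypothetical helper, and it looks only at names, not at whether the backup completed successfully (for schedules, --from-schedule already selects the latest successful backup).

```shell
#!/bin/bash
# Hypothetical helper: read backup names ending in -YYYYmmddHHMMSS on stdin
# and print the one with the newest timestamp.
latest_backup() {
  # Prefix each line with its last dash-separated field, sort, keep the last line.
  # The timestamps are fixed-width digits, so a plain text sort orders them correctly.
  awk -F- '{print $NF "\t" $0}' | sort | tail -n 1 | cut -f2
}
```

Usage: velero backup get --namespace velero-system | awk 'NR>1 {print $1}' | latest_backup.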
Common Velero commands
1. Common backup commands
- velero backup get: list existing backups
- velero backup create <backupname>: create a backup containing all resources
- velero backup create <backupname> --include-namespaces ns1,ns2: back up the resources in namespaces ns1 and ns2
- velero backup create <backupname> --exclude-namespaces ns1,ns2: create a backup that excludes namespaces ns1 and ns2
- velero backup create <backupname> --include-resources resource1,resource2: back up only the specified resources
- velero backup create <backupname> --exclude-resources resource1,resource2: back up everything except the specified resources
- --storage-location <location-name>: store the backup in the named backup storage location (a BackupStorageLocation, not a local path)
- -l, --selector: back up only the resources that match the given label selector
Create a backup containing all resources:
velero backup create <backup_name>
List all backups:
velero backup get
Delete a backup:
velero backup delete <backup_name>
Create a weekly backup, each kept for 90 days (2160 hours):
velero schedule create <schedule_name> --schedule="@every 168h" --ttl 2160h0m0s
Create a restore from the latest successful backup triggered by a specific schedule:
velero restore create --from-schedule <schedule_name>
Other subcommands include delete, describe, and logs.
2. Common restore commands
- velero restore get: list restores
- velero restore create restore-1 --from-backup backup-1: restore from backup-1
- velero restore create --from-backup backup-2 --include-resources persistentvolumeclaims,persistentvolumes: restore only the specified resources; likewise, --exclude-resources skips the specified resources
- velero --kubeconfig=./kube_config_cluster_examination.yml restore create --from-backup dcb-k8s-all-backup-20230421145821 --include-namespaces 15-minutes --wait --namespace velero-system: restore only the 15-minutes namespace from the backup and wait for the restore to finish
- velero restore create --from-schedule schedule-1: restore from the latest backup created by the schedule
- Other subcommands include delete, describe, and logs.
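Notes
To back up a Pod's volume data with file system backup, annotate the Pod with backup.velero.io/backup-volumes=<volume-name>[,<volume-name>...] (opt-in mode); this also requires the node agent (velero install --use-node-agent).
To disable volume snapshots for a backup, pass --snapshot-volumes=false.
Volume snapshot plugins for the various cloud providers: https://velero.io/plugins/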
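The annotation lists volume names from the Pod spec (spec.volumes[].name), not PVC names. A sketch that prints the corresponding kubectl command; print_fsb_annotate is a hypothetical helper. Keep in mind that an annotation added directly to a Pod disappears when a controller recreates the Pod, so for Deployments or StatefulSets it is more durable to put it in the Pod template.

```shell
#!/bin/bash
# Hypothetical helper: print the kubectl command that opts a Pod's volumes
# into Velero file system backup.
# Usage: print_fsb_annotate <namespace> <pod> <volume> [<volume>...]
print_fsb_annotate() {
  local ns="$1" pod="$2"
  shift 2
  [ "$#" -gt 0 ] || { echo "at least one volume name is required" >&2; return 1; }
  # "$*" inside the string joins the volume names with the first IFS character.
  local IFS=','
  echo "kubectl -n ${ns} annotate pod/${pod} backup.velero.io/backup-volumes=$* --overwrite"
}
```

Review the printed line, then run it (or pipe it to bash) before creating the backup.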
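When migrating resources across clusters with Velero, check the following:
- Images can still be pulled in the target cluster after migration.
- The Kubernetes APIs of the two clusters are compatible; ideally, both clusters run the same version.
- Resources bound to infrastructure outside the cluster, such as LoadBalancer Services, cannot be migrated; exclude them when creating the backup.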
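A quick pre-migration check is to compare the major.minor server versions of the two clusters. This is a sketch with a hypothetical helper, same_minor_version; a matching minor version makes API compatibility likely but does not prove that every API your manifests use is served by the target.

```shell
#!/bin/bash
# Hypothetical helper: compare the major.minor part of two Kubernetes
# version strings such as v1.28.3 and v1.28.9; returns 0 when they match.
same_minor_version() {
  local a="${1#v}" b="${2#v}"
  [ "$(echo "$a" | cut -d. -f1,2)" = "$(echo "$b" | cut -d. -f1,2)" ]
}
```

The server versions can be obtained per cluster with, for example, kubectl --kubeconfig <file> version -o json | jq -r .serverVersion.gitVersion, assuming jq is installed.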