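I. Background
I received a disk space alert. The cause turned out to be too many local backups being retained. The fix itself was simple: I manually removed the old backups (they had already been copied automatically to a remote server) and the alert cleared.
While reviewing the backup script, however, I noticed that the keep-data-days parameter was set to just 1. So why were there three backups (three days' worth) on local disk?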
pg_rman backup -b ${BACKUP_TYPE} -s -C -Z -P --keep-data-days=1 --keep-arclog-files=… (command abbreviated)
I looked up the explanation in the official pg_rman documentation, but it did not really answer the question.
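II. Reading the Source
1. The mysterious third backup
I checked other servers configured with keep-data-days=1 and found that they all held only the two most recent days of backups. When I was handling the alert, however, a backup was in progress. This suggests that pg_rman purges expired backups only after the current backup completes, so three days of backups exist while a backup is running and only two remain once it finishes.
To verify this guess, you can simply run another backup, or you can read the pg_rman backup source code.
In the do_backup function in backup.c, the call to pgBackupDelete comes after all the backup steps have finished, which matches the conclusion above.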
int
do_backup(pgBackupOption bkupopt)
{
    parray *backup_list;
    parray *files_database;
    parray *files_arclog;
    parray *files_srvlog;
    int     ret;
    char    path[MAXPGPATH];

    /* repack the necesary options */
    int keep_arclog_files = bkupopt.keep_arclog_files;
    int keep_arclog_days = bkupopt.keep_arclog_days;
    int keep_srvlog_files = bkupopt.keep_srvlog_files;
    int keep_srvlog_days = bkupopt.keep_srvlog_days;
    int keep_data_generations = bkupopt.keep_data_generations;
    int keep_data_days = bkupopt.keep_data_days;
    …

    /*
     * Signal for backup_cleanup() that there may actually be some cleanup
     * for it to do from this point on.
     */
    in_backup = true;

    /* backup data */
    files_database = do_backup_database(backup_list, bkupopt);

    /* backup archived WAL */
    files_arclog = do_backup_arclog(backup_list);

    /* backup serverlog */
    files_srvlog = do_backup_srvlog(backup_list);

    pgut_atexit_pop(backup_cleanup, NULL);

    /* update backup status to DONE */
    current.end_time = time(NULL);
    current.status = BACKUP_STATUS_DONE;
    …

    /* Delete old backup files after all backup operation. */
    pgBackupDelete(keep_data_generations, keep_data_days);
    …

    return 0;
}
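2. What keep-data-days means
That explains the third backup. One question remains: why does keep-data-days=1 retain two days of backups rather than one? The relevant code is in the pgBackupDelete function in delete.c: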
/*
 * Delete backups that are older than KEEP_xxx_DAYS, or have more generations
 * than KEEP_xxx_GENERATIONS.
 */
void
pgBackupDelete(int keep_generations, int keep_days)
{
    int     i;
    parray *backup_list;
    int     existed_generations;
    bool    check_generations;
    …

    /* determine whether to check based on the given days */
    if (keep_days == KEEP_INFINITE)
    {
        check_days = false;
        strncpy(days_str, "INFINITE", lengthof(days_str));
    }
    else
    {
        check_days = true;
        snprintf(days_str, lengthof(days_str), "%d", keep_days);

        /*
         * Calculate the threshold day from given keep_days.
         * Any backup taken before this threshold day to be
         * a candidate for deletion.
         */
        tim = current.start_time - (keep_days * 60 * 60 * 24);
        ltm = localtime(&tim);
        ltm->tm_hour = 0;
        ltm->tm_min = 0;
        ltm->tm_sec = 0;
        keep_after = mktime(ltm);
        time2iso(keep_after_timestamp, lengthof(keep_after_timestamp), keep_after);
    }
    …
}
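The key is this comment: Calculate the threshold day from given keep_days. Any backup taken before this threshold day to be a candidate for deletion.
The threshold day is computed as tim = current.start_time - (keep_days * 60 * 60 * 24); the lines that follow then zero tm_hour, tm_min, and tm_sec, so keep_after is local midnight at the start of that day.
Take 20230809 as an example. With keep-data-days=1, the threshold day is the backup start time minus one day, i.e. 20230808. Only backups taken before the threshold day are expired, so the 20230808 backup does not qualify and is not deleted. The 20230807 backup is expired, so it is deleted once the backup completes.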
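The threshold arithmetic can be reproduced outside pg_rman to check the dates by hand. The following is a minimal sketch, not pg_rman code: the function names keep_after_threshold and local_time_at are hypothetical and introduced only for illustration, and the body simply mirrors the lines quoted from pgBackupDelete.

```c
#include <time.h>

/*
 * Mirror of the arithmetic quoted from pgBackupDelete: step back keep_days
 * whole days from the backup start time, then truncate to local midnight.
 * (Hypothetical standalone function, not part of pg_rman.)
 */
time_t
keep_after_threshold(time_t start_time, int keep_days)
{
    time_t    tim = start_time - ((time_t) keep_days * 60 * 60 * 24);
    struct tm ltm = *localtime(&tim);   /* copy: localtime() uses static storage */

    ltm.tm_hour = 0;
    ltm.tm_min = 0;
    ltm.tm_sec = 0;
    return mktime(&ltm);
}

/* Hypothetical helper: build a timestamp from a local calendar date and time. */
time_t
local_time_at(int year, int month, int day, int hour, int min)
{
    struct tm t = {0};

    t.tm_year = year - 1900;
    t.tm_mon = month - 1;
    t.tm_mday = day;
    t.tm_hour = hour;
    t.tm_min = min;
    t.tm_isdst = -1;                    /* let mktime() decide about DST */
    return mktime(&t);
}
```

For a backup started at 10:30 on 2023-08-09, keep_after_threshold(local_time_at(2023, 8, 9, 10, 30), 1) equals local_time_at(2023, 8, 8, 0, 0). A backup started at any time on 2023-08-08 is therefore not before the threshold and is kept.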
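3. How to keep only the current day's backup
With the analysis above, the answer is simple: set keep-data-days=0. The threshold day is then the backup start time minus zero days, i.e. 20230809, so every backup taken before today is expired and the 20230808 backup is deleted once the backup completes. A quick test: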
pg_rman backup -b ${BACKUP_TYPE} -s -C -Z -P --keep-data-days=0 --keep-arclog-files=… (command abbreviated)
The result matches the expectation.
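The effect of different keep-data-days values can also be modeled in a few lines. This is a simplified sketch under stated assumptions, not pg_rman's actual deletion logic: it assumes one backup per day at 10:00 local time up to and including 2023-08-09, treats a backup as retained when its start time is at or after the threshold, and ignores generations, backup status, and dependencies between full and incremental backups. All function names are hypothetical.

```c
#include <time.h>

/* Hypothetical helper: timestamp for a local calendar date at a given hour. */
static time_t
local_time_at(int year, int month, int day, int hour)
{
    struct tm t = {0};

    t.tm_year = year - 1900;
    t.tm_mon = month - 1;
    t.tm_mday = day;            /* mktime() normalizes values below 1 */
    t.tm_hour = hour;
    t.tm_isdst = -1;            /* let mktime() decide about DST */
    return mktime(&t);
}

/* Threshold arithmetic as quoted from pgBackupDelete. */
static time_t
keep_after_threshold(time_t start_time, int keep_days)
{
    time_t    tim = start_time - ((time_t) keep_days * 60 * 60 * 24);
    struct tm ltm = *localtime(&tim);

    ltm.tm_hour = 0;
    ltm.tm_min = 0;
    ltm.tm_sec = 0;
    return mktime(&ltm);
}

/*
 * Simplified model (assumption): daily backups at 10:00 on each of the last
 * history_days days, ending with the current backup on 2023-08-09. Returns
 * how many of them start at or after the threshold, i.e. are not candidates
 * for deletion.
 */
int
count_retained(int history_days, int keep_days)
{
    time_t current_start = local_time_at(2023, 8, 9, 10);
    time_t keep_after = keep_after_threshold(current_start, keep_days);
    int    retained = 0;

    for (int i = 0; i < history_days; i++)
    {
        time_t backup_start = local_time_at(2023, 8, 9 - i, 10);

        if (backup_start >= keep_after)
            retained++;
    }
    return retained;
}
```

With three days of history (20230807, 20230808, 20230809), count_retained(3, 1) gives 2 and count_retained(3, 0) gives 1. Before the purge runs, all three backups are present, which is the state observed while the backup was still in progress.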