
Remove (large) files from backup without generating negative diff

Posted by Marcin Zajączkowski 
June 11, 2017 03:59AM
Hi,

I have a long-established backup with some large files which I would
like to move to another directory within the same backup (or just
remove entirely from the backup) to clean up the original disk
space/organization. I would like to avoid many extra GBs in the next
incremental backup session when those files are moved/removed. I don't
need their history - they will be backed up in another way/place.

First, I was playing with removing just the related information from the
"increments" directory. Unfortunately, that made restore/verification
loop with 100% CPU utilization. I also needed to remove all information
about the affected files from extended_attributes.*, file_statistics.*,
etc. It feels like asking for trouble.

Two questions:
1. Do you know any better way of removing files from a (rdiff-)backup
without generating a "negative" diff in the next backup?
2. Do you see any additional risk with manual manipulation of
rdiff-backup files if --verify and --restore (checked selectively) seem
to work fine?

Marcin

_______________________________________________
rdiff-backup-users mailing list at rdiff-backup-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/rdiff-backup-users
Wiki URL: http://rdiff-backup.solutionsfirst.com.au/index.php/RdiffBackupWiki
This message was imported via the External PhorumMail Module
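
Before touching anything by hand, it can help to see which metadata files actually mention a path. The function below is only a read-only sketch: `list_metadata_refs` is a hypothetical helper name, and it assumes the files under rdiff-backup-data are gzip-compressed text that contains the path literally (file names with unusual characters may be stored in quoted form and would not match). It changes nothing in the repository.

```shell
#!/bin/bash
# Hypothetical read-only helper, not part of rdiff-backup.
# Prints the name of every *.gz file in <repo>/rdiff-backup-data whose
# decompressed content contains the given path as a fixed string.
# Assumption: the metadata files are gzip-compressed text.
list_metadata_refs() {
    local repo=$1 relpath=$2 f
    for f in "$repo"/rdiff-backup-data/*.gz; do
        [ -f "$f" ] || continue          # skip if the glob matched nothing
        if gzip -cd -- "$f" | grep -qF -- "$relpath"; then
            printf '%s\n' "${f##*/}"     # print the file name only
        fi
    done
}

# Example call (paths are placeholders):
#   list_metadata_refs /path/to/backup 'some/dir/large-file.iso'
```

The output only shows where references exist; it says nothing about whether removing them is safe.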
On 11 June 2017 at 11:13, Marcin Zajączkowski <mszpak@wp.pl> wrote:

> Hi,
>
> I have a long-established backup with some large files which I would
> like to move to another directory within the same backup (or just
> remove entirely from the backup) to clean up the original disk
> space/organization. I would like to avoid many extra GBs in the next
> incremental backup session when those files are moved/removed. I don't
> need their history - they will be backed up in another way/place.
>
> First, I was playing with removing just the related information from the
> "increments" directory. Unfortunately, that made restore/verification
> loop with 100% CPU utilization. I also needed to remove all information
> about the affected files from extended_attributes.*, file_statistics.*,
> etc. It feels like asking for trouble.
>
> Two questions:
> 1. Do you know any better way of removing files from a (rdiff-)backup
> without generating a "negative" diff in the next backup?
> 2. Do you see any additional risk with manual manipulation of
> rdiff-backup files if --verify and --restore (checked selectively) seem
> to work fine?
>


Regarding your second question, I don't know, but I would advise against it.
Regarding your first, if you haven't had these large files in your backup
for very long, it might be worth regressing your archive to a time before it
held these files using my script at
https://www.timedicer.co.uk/programs/help/rdiff-backup-regress.sh.php - it
must be run on the machine that holds the repository. Otherwise it might
be best to accept the bloat in your repository.
On 06/11/2017 05:13 AM, Marcin Zajączkowski wrote:
> Hi,
>
> I have a long-established backup with some large files which I would
> like to move to another directory within the same backup (or just
> remove entirely from the backup) to clean up the original disk
> space/organization. I would like to avoid many extra GBs in the next
> incremental backup session when those files are moved/removed. I don't
> need their history - they will be backed up in another way/place.

I have a rather horrific shell script that does that, at least for the case where both the source and the archive are on Linux filesystems and there is no long_filename_data in the archive. I'll attach the top comments and usage output. E-mail me directly if you want the script. I've been using it successfully for several years (hasn't needed any changes since 2012-12-22), but it's never been tested in any other environment, and I'd rather not distribute it widely.

--
Bob Nichols "NOSPAM" is really part of my email address.
Do NOT delete it.
#!/bin/bash
#####
# Removes files from an rdiff-backup set.
#
# Deletes files and directories from the backup set, including the current
# mirror and all increments, and removes references from metadata files.
# In the metadata directory, log files, session_statistics files, and backup
# files (names ending with ~) are left as-is.
#
# FIXME -- Metadata for Mac resource forks is not handled,
# long_filename_data is not handled.
#
# If the same full pathname has, at various times, been used both for
# a directory and for a non-directory, then removal of the
# non-directory file will not be possible unless the "-R" option is
# given to remove the directory and its contents as well. See the
# WARNING message below.
#
# When removing some, but not all, of the links to a file with
# multiple hard links, the link counts for the other hard links
# should be adjusted. That is extraordinarily difficult to do in a
# metadata diff chain, and the consequences of not adjusting the
# count are benign. (Those link counts are already too high if not
# all of the links to an inode were originally included in the backup
# set.)
#
# The commented-out "touch -r ..." lines would implement a semi-"stealth"
# mode, in which the modified metadata files retain their original timestamps.
#####


Usage: rmv-from-backup [-DRrnk] backup_dir path_glob [path_glob ...]
       rmv-from-backup {-h|--help}
    -D          (Debug) Never delete the temp directory
    -k          Keep the backup copies of changed metadata files (needs a
                lot of space in the rdiff-backup-data directory).
    -R          Recursively delete directories matched by a path_glob
    -r          Limited recursion. Allow a wildcard in the final
                portion of a path_glob to match both directory components
                and non-directories, but leave the directory structure
                intact.
    -n          Print commands that would be executed, but do nothing
    -h --help   Print full help.

Every path_glob must be absolute (begin with '/') and is rooted at
backup_dir. The '.' character is NOT special in a path glob, and with
the "-R" or "-r" option neither is '/', so "/xxx/foo*" would then match
all paths that begin with "/xxx/foo", including, for example,
"/xxx/foo23/dir1/dir2/somefile".
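
The matching rule described above (with "-R" or "-r") behaves like plain shell pattern matching in a `case` statement, where '*' also matches '/' and '.' is an ordinary character. The sketch below only illustrates those semantics; `glob_matches` is a hypothetical name, and how the script itself performs the match is not shown in this thread.

```shell
#!/bin/bash
# Hypothetical illustration of the glob semantics, not code from the script.
# Returns 0 if path $1 matches shell pattern $2, 1 otherwise.
# In a case pattern, '*' matches any string (including '/'),
# and '.' has no special meaning.
glob_matches() {
    case $1 in
        $2) return 0 ;;   # unquoted so that $2 is treated as a pattern
        *)  return 1 ;;
    esac
}

# '*' spans directory separators:
glob_matches /xxx/foo23/dir1/dir2/somefile '/xxx/foo*' && echo "match"
# '.' is literal, so it does not stand for "any character":
glob_matches /xxx/fooXbar '/xxx/foo.bar' || echo "no match"
```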

You do get a chance to review the list of files to be deleted and decide
whether to continue. Generating the new metadata files can take a long
time, but that occurs in a temporary directory and can be safely
interrupted. Once installation of the new files begins, the process
becomes non-interruptible. If something does kill the process before it
completes, recovery can usually be accomplished by re-running the exact
same command.
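
The two phases described above follow a common pattern: build the replacements in a temporary directory, then move them into place with interrupts ignored. The sketch below shows that general pattern on plain text files only. `stage_and_install` is a hypothetical name, the "*.txt" files stand in for real metadata, and none of this is taken from the script itself.

```shell
#!/bin/bash
# Hypothetical sketch of "generate in a temp dir, then install".
# $1 = directory holding plain-text files (*.txt), $2 = fixed string;
# every line containing $2 is dropped from every file.
stage_and_install() {
    local target_dir=$1 needle=$2 tmp f
    tmp=$(mktemp -d) || return 1

    # Phase 1: generate new files in the temp directory. Interrupting
    # here leaves the originals untouched.
    for f in "$target_dir"/*.txt; do
        [ -f "$f" ] || continue
        # grep exits 1 when no lines remain; that is not an error here.
        grep -vF -- "$needle" "$f" > "$tmp/${f##*/}" || [ "$?" -eq 1 ] || return 1
    done

    # Phase 2: install. INT and TERM are ignored while files are moved.
    trap '' INT TERM
    for f in "$tmp"/*; do
        [ -f "$f" ] || continue
        mv -- "$f" "$target_dir/"
    done
    trap - INT TERM   # resets to the default handlers (a simplification)
    rmdir -- "$tmp"
}
```

Because the filter is idempotent, running the same call again after a partial install produces the same end result, which matches the recovery advice above.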