file-by-file changes from one backup to the next
Hi there,

I've just recently switched backup system from rsnapshot
to rdiff-backup for a part of my data. Mostly because
of the space saved when large files have small changes,
and thus the ability to retain more backups (possibly
going back all the way to the original version of a file
when it first entered the system).

All in all I am happy with rdiff-backup. However I don't
have the same level of confidence that I quickly developed
when starting with rsnapshot.

The main reason is (as I perceive it) the lack of status
output, or maybe the wrong kind of status output.
Or maybe I just haven't found the right combination of
options to get what I need. I am after all pretty new
to rdiff-backup.

In order to get some more information I have switched
to "-v5" to get file by file output but I still hardly
see the changes that I am interested in, while the noise
level understandably goes up.

Look at this:
===========
Processing changed file 2011/05
Incrementing mirror file /backup/rdiff-backup/foto/2011/05
Processing changed file 2011/05/2011-05-14_233003_dsc02931.jpg
Incrementing mirror file /backup/rdiff-backup/foto/2011/05/2011-05-14_233003_dsc02931.jpg
...
Processing changed file 2011/05/2011-05-31_083803_img_8789.jpg
Incrementing mirror file /backup/rdiff-backup/foto/2011/05/2011-05-31_083803_img_8789.jpg
Processing changed file 2011/05/2011-05-31_235712_img_8790.jpg
Incrementing mirror file /backup/rdiff-backup/foto/2011/05/2011-05-31_235712_img_8790.jpg
===========

It tells me that those files have changed, but not in what way,
and it takes two lines per file to do so (and the output is
very hard to read, or rather to browse).
The output for added and removed files is similarly terse.


What I dearly miss is the "--itemize-changes" output that I have rsnapshot
(or rsync underneath) produce:

.d..t...... /etc/
>f.st...... /etc/rsnapshot.conf
.d..t...... /etc/cron.d/
>f+++++++++ /etc/cron.d/rdiff
.d...p.g... /home/data/static/foto/new/
.f....og... /home/data/static/foto/new/foo.txt
>f..tpog... /home/data/static/foto/2009/10/2009-10-31_082857_hla_0019.jpg
*deleting   home/data/static/foto/2002/12/2002-12-24_200200_xmas02.jpg

Here I see that
"etc" directory (t)imestamp has changed.
"rsnapshot.conf" file (s)ize and (t)imestamp have changed (seems I have edited that file).
"rdiff" is a new file (ahh, now I know why I have edited rsnapshot.conf).
"new" directory (p)ermission and (g)roup have changed.
"foo.txt" file has a new (o)wner and (g)roup.
"2009-10-31_082857_hla_0019.jpg" has (t)imestamp, (p)ermission, (o)wner and (g)roup changes.
"2002-12-24_200200_xmas02.jpg" has been deleted.

The format of one character per type of change at a fixed
position is great for parsing the output and helps to see
what changed at a glance.
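
A minimal sketch of how such fixed-position output could be parsed, assuming
the 11-character "YXcstpoguax" layout documented in the rsync man page. The
name decode_itemized and the shape of the returned tuple are made up for
illustration; they are not part of rsync or rsnapshot.

```python
# Attribute letters that can appear after the first two characters of
# rsync's itemize string (layout "YXcstpoguax", per the rsync man page).
ATTRS = {
    "c": "checksum", "s": "size", "t": "mtime", "T": "mtime",
    "p": "permissions", "o": "owner", "g": "group",
    "a": "acl", "x": "xattr",
}
KINDS = {"f": "file", "d": "directory", "L": "symlink",
         "D": "device", "S": "special"}


def decode_itemized(line):
    """Return (kind, status, changed_attributes, path) for one output line."""
    flags, path = line.rstrip("\n").split(None, 1)
    if flags.startswith("*"):            # message lines such as "*deleting"
        return ("unknown", flags[1:], [], path)
    kind = KINDS.get(flags[1], "unknown")
    attrs = flags[2:]
    if attrs and set(attrs) == {"+"}:    # all '+' marks a newly created item
        return (kind, "new", [], path)
    changed = [ATTRS[ch] for ch in attrs if ch in ATTRS]
    return (kind, "changed" if changed else "unchanged", changed, path)
```

Because each attribute has its own letter, the result can be filtered directly,
for example to list only files whose size changed.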

What neither rsnapshot nor rdiff-backup provides is a
quantitative measure of changed data. rsnapshot does
a summary output of the data sent and received and thus
gives a rough estimate on how much has changed, but it
doesn't say so for each file.

If rdiff-backup was to adopt a similar kind of output
it would be great to have a measure on the volume of
changes, too.

(I know that --list-increment-sizes does a summary output
of the rdiff size but having some more detail there on
number of changed/added/deleted files might be great too.)

I hope I don't sound like I am looking for nits to pick.
I just feel that rdiff-backup could be even greater if
it gave the user some more/different output.

cheers
-henrik

PS: Just in case you wonder:
My particular use case is my digital pictures collection.
I'd love to tag my pictures and rate them and maybe edit
some color profile here and there but I haven't
done it yet because I don't trust any photo management
software not to completely screw up my pictures. And
since we are talking about several thousand pictures
for e.g. a 2 week trip up some south american mountain
I can't review and rebuild thumbnails after each bulk
exif tagging operation. Thus a bug that eats my pictures
might go unnoticed for enough time so that the original's
rsnapshot backup got rotated out into data nirvana.

Also, having parseable output might make it possible to quickly
find out when a picture was first added to the collection,
and thus allow an automated copy of (mostly) unedited
files to a separate location.


_______________________________________________
rdiff-backup-users mailing list at rdiff-backup-users < at > nongnu.org
https://lists.nongnu.org/mailman/listinfo/rdiff-backup-users
Wiki URL: http://rdiff-backup.solutionsfirst.com.au/index.php/RdiffBackupWiki

On 06/14/2011 09:08 AM, Henrik wrote:
[big snip]

You can get a little of that from the file_statistics.{timestamp}.data.gz
file that is created for each backup. For each file or directory there is
a flag that indicates a change, plus the sizes of the source file, the file
in the mirror, and the increment. Because of the format and the
possibility of file names with spaces, it gets a bit messy to parse that
file, but the following will show all of the lines with the "change" flag
set:

zgrep -E '(# F)|(1 [^ ]+ [^ ]+ [^ ]+$)' \
    file_statistics.{timestamp}.data.gz | less

Alas, if you were using the "--null-separator" option for your backup,
that gets even worse:

zcat file_statistics.{timestamp}.data.gz | tr '\0' '\n' | \
    grep -E '(# F)|(1 [^ ]+ [^ ]+ [^ ]+$)' | less
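
For anyone who would rather script this, the same filter can be sketched in
Python. It reuses the grep pattern above and, like the tr variant, simply turns
NUL separators into newlines, so file names that contain newlines would still
be split incorrectly. The function name changed_lines is hypothetical.

```python
import gzip
import re

# Same expression as the grep commands: the header line, or any record whose
# "changed" flag is 1 followed by exactly three more fields.
PATTERN = re.compile(rb"(# F)|(1 [^ ]+ [^ ]+ [^ ]+$)")


def changed_lines(path, null_separator=False):
    """Return the header and all 'changed' records of a file_statistics file."""
    with gzip.open(path, "rb") as fh:
        data = fh.read()
    if null_separator:
        # Equivalent of: tr '\0' '\n'
        data = data.replace(b"\0", b"\n")
    return [
        line.decode("utf-8", "replace")
        for line in data.split(b"\n")
        if PATTERN.search(line)
    ]
```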

Also, you can always see what versions of a file exist in the backup:

rdiff-backup -l /backup_dir/path/to/some/filename




That's nothing like what you're used to, but at least it's something.

--
Bob Nichols "NOSPAM" is really part of my email address.
Do NOT delete it.


On Tuesday, June 14, 2011 4:18:25 pm Robert Nichols wrote:


> That's nothing like what you're used to, but at least it's something.

Maybe:

rdiff-backup --list-changed-since 1B /some/rdiff-backup/dir

where 1B equals last backup before current.

Does not show size changes, but does show changed,new,deleted.
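
If that output needs to be grouped by kind of change, a small wrapper might
look like the following sketch. It assumes that each output line starts with
one of the words changed, new or deleted, followed by whitespace and the path;
that layout is an assumption, so check it against your rdiff-backup version.
Both function names are hypothetical.

```python
import subprocess
from collections import defaultdict


def group_changes(text):
    """Group '<kind> <path>' lines by kind (changed, new, deleted)."""
    groups = defaultdict(list)
    for line in text.splitlines():
        kind, _, path = line.partition(" ")
        if kind in ("changed", "new", "deleted"):
            # Drop the padding between the keyword and the path.
            groups[kind].append(path.lstrip(" "))
    return dict(groups)


def list_changed_since(repo, when="1B"):
    """Run rdiff-backup --list-changed-since and group its output."""
    result = subprocess.run(
        ["rdiff-backup", "--list-changed-since", when, repo],
        check=True, capture_output=True, text=True,
    )
    return group_changes(result.stdout)
```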

---
Adrian Klaver
adrian.klaver < at > gmail.com

On Tue, Jun 14, 2011 at 04:36:38PM -0700, Adrian Klaver wrote:
> On Tuesday, June 14, 2011 4:18:25 pm Robert Nichols wrote:
>
> > You can get a little of that from the file_statistics.{timestamp}.data.gz
> > file that is created for each backup. For each file or directory there is
> > a flag that indicates a change, plus the sizes of the source file, the file
> > in the mirror, and the increment. Because of the format and the
> > possibility of file names with spaces, it gets a bit messy to parse that
> > file, but the following will show all of the lines with the "change" flag
> > set:
> >
> > zgrep -E '(# F)|(1 [^ ]+ [^ ]+ [^ ]+$)' \
> >     file_statistics.{timestamp}.data.gz | less

Ok, so file_statistics.* is the place to start, and the format, for my
Perl-conditioned mind, would be something like this:

^(.+) (0|1) (NA|\d+) (NA|\d+) (NA|\d+)$

It seems like the occurrence of NA and 0 values can easily tell me about
deleted or new files.


new file or directory:
foo.data 1 1613908 NA NA

removed file:
foo.txt 1 4 NA 0

removed directories:
foo/2009 1 0 NA NA
foo/2009/12 1 0 NA NA

changed file: (grew from 4 bytes to 5 bytes)
foo.txt 1 4 5 126

changed directory (in this backup some files were added):
2011/05 1 0 0 NA
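
A rough Python equivalent of that regular expression, as a sketch only. The
field names follow the column description earlier in the thread (source size,
mirror size, increment size); the Stat tuple and parse_stat_line are
hypothetical names, and NA is mapped to None.

```python
import re
from typing import NamedTuple, Optional

# The pattern from the post: name, changed flag, then three size fields.
LINE_RE = re.compile(r"^(.+) (0|1) (NA|\d+) (NA|\d+) (NA|\d+)$")


class Stat(NamedTuple):
    name: str
    changed: bool
    source_size: Optional[int]
    mirror_size: Optional[int]
    increment_size: Optional[int]


def _size(field):
    """Convert a size column to int, or None for 'NA'."""
    return None if field == "NA" else int(field)


def parse_stat_line(line):
    """Parse one file_statistics record; return None for non-matching lines."""
    match = LINE_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    name, changed, source, mirror, increment = match.groups()
    return Stat(name, changed == "1", _size(source), _size(mirror), _size(increment))
```

The greedy first group keeps spaces inside file names intact, because the
last four fields are matched from the end of the line.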


I guess I could throw together a Perl one-liner to go through this, but
what I'm missing is the information on the type of change, like here:

2007/04/2007-04-28_095048_s5001204.jpg 1 176308 176308 131

I see that the size didn't change, but I can't tell whether it was metadata or
file content. To do that I'd probably have to go through the two most recent
mirror_metadata.* files to see what changed, or find out the format of
increments/2007/04/2007-04-28_095048_s5001204.jpg.2011-06-13T02\:20\:02+02\:00.diff

> > Alas, if you were using the "--null-separator" option for your backup,
> > that gets even worse:

I don't plan to use anything outside the ASCII range for filenames, but good
to know that I could :-)

> > Also, you can always see what versions of a file exist in the backup:
> >
> > rdiff-backup -l /backup_dir/path/to/some/filename

Oh, I didn't know --list-increments worked on files too. Thanks!

> > That's nothing like what you're used to, but at least it's something.

It's a good start.

> Maybe:
>
> rdiff-backup --list-changed-since 1B /some/rdiff-backup/dir
>
> where 1B equals last backup before current.

I'll take a look at the code that handles this option. maybe it is easier
to hack my changes into the existing python code instead of writing my own
parser for the data.
...
I've taken a look at the code, and I guess that to do it in a reasonably
efficient manner I'll have to load it into an IDE to see whether the
ListChangedSince method in restore.py has access to the right kind of data,
and where its output is used besides ListChangedSince in Main.py.

Thank you both for the input!

cheers
-henrik


After taking another look at the file_statistics.*.data.gz files
and at the ListChangedSince method, I decided that it was
too much of a hassle to look for all the data after the change
has happened.

Instead I went to the source of my problem (the lack of detailed
logs about changes), and I think I've found _the_ place to fix it:
the function Increment(new, mirror, incpref) in increment.py.


There, during the backup, I have access to the old file in the current
backup (mirror), the new file that goes into the backup (new), and
the increment/snapshot file that has been created to record the
difference (incrp).

I added these lines just before the return statement:

log.Log(" mirror: " + str(mirror) + '\n', 5)
log.Log(" new: " + str(new) + '\n' , 5)
log.Log(" incrp: " + str(incrp) + '\n', 5)


and here's what I get when backing up after changing the
file "foo.txt" by adding one byte:

...
Incrementing mirror file /backup/rdiff-backup/foto/foo.txt
mirror: Path: /backup/rdiff-backup/foto/foo.txt
Index: ('foo.txt',)
Data: {'uid': 1000, 'perms': 420, 'type': 'reg', 'gname': 'users', 'ctime': 1308345824, 'devloc': 65025L, 'uname': 'hlangos', 'nlink': 1, 'gid': 100, 'mtime': 1308345695, 'atime': 1308346895, 'inode': 9043973L, 'size': 4L}

new: Path: /backup/rdiff-backup/foto/rdiff-backup.tmp.1
Index: ('rdiff-backup.tmp.1',)
Data: {'uid': 1000, 'perms': 420, 'type': 'reg', 'gname': 'users', 'ctime': 1308346896, 'devloc': 65025L, 'uname': 'root', 'nlink': 1, 'gid': 100, 'mtime': 1308346866, 'atime': 1308346895, 'inode': 9043974L, 'size': 5L}

incrp: Path: /backup/rdiff-backup/foto/rdiff-backup-data/increments/foo.txt.2011-06-17T23:22:56+02:00.diff.gz
Index: ('foo.txt.2011-06-17T23:22:56+02:00.diff.gz',)
Data: {'uid': 1000, 'perms': 420, 'type': 'reg', 'gname': 'root', 'ctime': 1308346896, 'devloc': 65025L, 'uname': 'root', 'nlink': 1, 'gid': 100, 'mtime': 1308345695, 'atime': 1308346896, 'inode': 13254662L, 'size': 125L}


So pretty much everything that I need is there.
I'll just have to write a function that goes through that data (and does some
extra checks for directories, devices, symlinks ... and the deletion and addition
of files.)
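
As a pointer for anyone attempting the same, the comparison could start from
something like the following sketch. It works on plain dictionaries shaped like
the Data output above, not on rdiff-backup's own objects, and the names CHECKS
and describe_change are invented for illustration. Directories, devices and
symlinks would still need the extra checks mentioned above.

```python
# Label to report, and the metadata keys that are compared for it.
CHECKS = [
    ("type", ("type",)),
    ("size", ("size",)),
    ("mtime", ("mtime",)),
    ("perms", ("perms",)),
    ("owner", ("uid", "uname")),
    ("group", ("gid", "gname")),
]


def describe_change(mirror, new):
    """List what differs between the old (mirror) and new metadata dicts.

    Pass None for a side that does not exist (added or deleted files).
    """
    if mirror is None:
        return ["new"]
    if new is None:
        return ["deleted"]
    return [
        label
        for label, keys in CHECKS
        if any(mirror.get(key) != new.get(key) for key in keys)
    ]


old = {"type": "reg", "size": 4, "mtime": 1308345695,
       "perms": 420, "uid": 1000, "gid": 100}
new = dict(old, size=5, mtime=1308346866)
print(describe_change(old, new))  # ['size', 'mtime']
```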

I'll probably not send a patch upstream as I will not have time
for more than the most basic functionality and I wouldn't want to
burden somebody with cleaning up my mess and adding all the error
handling that I will not need in my very limited use case.

(Just wanted to get this into the archive, in case somebody else
has a similar problem and needs a pointer to (hopefully) the right place)

cheers
-henrik


