
Unnecessary reads with rsync?

Posted by Anonymous 
Unnecessary reads with rsync?
August 04, 2016 01:18PM
Hello,

Right now, I am staring at the lsof output of the rsync process on
a backup client, spawned by BackupPC. It's processing a 3.5G file
that has not been touched in 5 years and has been backed up numerous
times. According to strace, the entire file is being read, and it's
taking a toll:

- using up time that the backup process takes
- straining the disks and consuming resources

Why?

rsync (unless passed -c / --checksum, which I don't use) knows not
to checksum a file whose metadata have not changed. Arguably, the
metadata of the pool file on the server have not changed either.
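For illustration, rsync's default "quick check" can be sketched as follows (a minimal sketch in Python; `FileMeta`, `quick_check_matches`, and the field names are my own illustration, not rsync's actual code):

```python
from typing import NamedTuple

class FileMeta(NamedTuple):
    size: int   # file size in bytes
    mtime: int  # modification time (seconds since epoch)

def quick_check_matches(src: FileMeta, dst: FileMeta) -> bool:
    """rsync's default 'quick check': a file is skipped when its size
    and modification time match on both sides; no data is read then."""
    return src.size == dst.size and src.mtime == dst.mtime

# A file untouched for years should match on both sides and be skipped:
old = FileMeta(size=3_500_000_000, mtime=1_300_000_000)
print(quick_check_matches(old, old))  # True: no content read needed
```

When both size and mtime match, rsync skips the file without reading its content, which is why an untouched 3.5G file should not normally be read at all.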

What's going (wr)on(g) here?

--
< at > martinkrafft | http://madduck.net/ | http://two.sentenc.es/

"for her, the dashed lines on the freeway were like grains of sand
slipping, through an hour glass, ticking away the seconds, the
minutes, and the hours of her life. if she got home a few minutes
early on any given afternoon, it gave her a thrill as if she had
stolen a little something back from death."
-- mc 900 ft jesus (http://stuff.madduck.net/pub/misc/fun/newmoon.txt)

spamtraps: madduck.bogus < at > madduck.net

------------------------------------------------------------------------------

_______________________________________________
BackupPC-users mailing list
BackupPC-users < at > lists.sourceforge.net
List: https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki: http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/
Unnecessary reads with rsync?
August 04, 2016 01:33PM
also sprach martin f krafft <madduck < at > madduck.net> [2016-08-04 22]:
[quote]Right now, I am staring at the lsof output of the rsync process on
a backup client, spawned by BackupPC. It's processing a 3.5G file
that has not been touched in 5 years and has been backed up numerous
times. According to strace, the entire file is being read, and it's
taking a toll:
[/quote]
I also can't help but notice that the pool file is open on the
server, and that the corresponding dump process does continuous
reading (according to strace) on a socket, presumably linked to the
SSH process connected to the client.

Maybe reliance on file metadata isn't good enough for a backup.
After all, a backup should care about file content, not metadata.

But instead of (what seems to be) chunk-wise checksum transmission,
why don't we (also) store the whole-file checksum on the server (it
can be computed in the same pass), and at least give people the
option to risk reading every file once to compute this checksum, if
it means being able to skip files thereafter without further ado or
large data transfers?
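The one-pass idea can be sketched like this (an illustration in Python; rsync and BackupPC use their own block sizes and digest algorithms, so the function name and parameters here are assumptions):

```python
import hashlib

def block_and_whole_checksums(path, block_size=64 * 1024):
    """Read the file once, producing per-block MD5 digests (stand-ins
    for rsync-style chunk checksums) and a whole-file MD5 computed in
    the same pass over the data."""
    whole = hashlib.md5()
    blocks = []
    with open(path, "rb") as f:
        while chunk := f.read(block_size):
            whole.update(chunk)                            # whole-file digest
            blocks.append(hashlib.md5(chunk).hexdigest())  # per-block digest
    return whole.hexdigest(), blocks
```

If the server stored the whole-file digest alongside the block checksums, a later run could compare that single digest first and skip the per-block exchange entirely when it matches.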

--
< at > martinkrafft | http://madduck.net/ | http://two.sentenc.es/

"he gave me his card
he said, 'call me if they die'
i shook his hand and said goodbye
ran out to the street
when a bowling ball came down the road
and knocked me off my feet"
-- bob dylan

spamtraps: madduck.bogus < at > madduck.net

Unnecessary reads with rsync?
August 04, 2016 04:15PM
On 05/08/16 06:31, martin f krafft wrote:
[quote]also sprach martin f krafft <madduck < at > madduck.net> [2016-08-04 22]:
[quote]Right now, I am staring at the lsof output of the rsync process on
a backup client, spawned by BackupPC. It's processing a 3.5G file
that has not been touched in 5 years and has been backed up numerous
times. According to strace, the entire file is being read, and it's
taking a toll:
[/quote]I also can't help but notice that the pool file is open on the
server, and that the corresponding dump process does continuous
reading (according to strace) on a socket, presumably linked to the
SSH process connected to the client.

Maybe reliance on file metadata isn't good enough for a backup.
After all, a backup should care about file content, not metadata.

But instead of (what seems to be) chunk-wise checksum transmission,
why don't we (also) store the whole-file checksum on the server (can
be computed in the same pass) and at least give people the option to
risk reading every file once to compute this checksum, if it means
being able to skip files without further ado or large data
transfers?
[/quote]A couple of possibilities:
a) you haven't enabled checksum caching (--checksum-seed=32761)
b) you haven't completed at least 2 full backups including this file
c) you haven't configured backuppc the way you want it
(RsyncCsumCacheVerifyProb)
d) something else, provide more information and we might be able to
comment further

PS, backuppc v4 does store full file checksums, but you probably still
want to verify the block checksums of the file from time to time on the
slim chance that the full file checksum matches but the content is
different.
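Concretely, the checksum-caching options above live in BackupPC's Perl config.pl; a sketch for illustration (the surrounding RsyncArgs entries are placeholders, adjust to your own setup):

```perl
# Hypothetical excerpt from a BackupPC v3.x config.pl, for illustration.
$Conf{RsyncArgs} = [
    # ... your existing rsync arguments ...
    '--checksum-seed=32761',   # enable rsync checksum caching
];
# Probability that cached block checksums are re-verified against the
# pool file during a full backup:
$Conf{RsyncCsumCacheVerifyProb} = 0.01;
```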

Regards,
Adam

--
Adam Goryachev Website Managers www.websitemanagers.com.au

Unnecessary reads with rsync?
August 05, 2016 12:43AM
Thanks Adam for your patience and insights! And everyone else for
putting up with me! ;)

also sprach Adam Goryachev <mailinglists < at > websitemanagers.com.au> [2016-08-05 01]:
[quote][quote]But instead of (what seems to be) chunk-wise checksum transmission,
why don't we (also) store the whole-file checksum on the server (can
be computed in the same pass) and at least give people the option to
risk reading every file once to compute this checksum, if it means
being able to skip files without further ado or large data
transfers?
[/quote]A couple of possibilities:
a) you haven't enabled checksum caching (--checksum-seed=32761)
[/quote]
I have…

[quote]b) you haven't completed at least 2 full backups including this file
[/quote]
I have…

[quote]c) you haven't configured backuppc the way you want it
(RsyncCsumCacheVerifyProb)
[/quote]
$Conf{RsyncCsumCacheVerifyProb} = '0.01';

So yeah, there is the (1%?) chance that BackupPC chose to revalidate
the file in question, but I kinda consider that unlikely.
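For scale, the expected number of re-verified files per full backup is just that probability times the file count (a trivial sketch; the 50,000-file count is an invented example):

```python
# With RsyncCsumCacheVerifyProb = 0.01, each cached file has a 1% chance
# of being re-read and verified during a full backup.
prob = 0.01
n_files = 50_000          # hypothetical number of files in the backup
expected = prob * n_files
print(round(expected))    # ~500 files re-read on an average full
```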

Is this recorded somewhere? I can't find anything in the logs. What
LogLevel would be required for this to show up?

[quote]PS, backuppc v4 does store full file checksums, but you probably
still want to verify the block checksums of the file from time to
time on the slim chance that the full file checksum matches but
the content is different.
[/quote]
I might have to look into that ;)

Has anyone managed to make BackupPC run from source?

--
< at > martinkrafft | http://madduck.net/ | http://two.sentenc.es/

"the ideas for which we would be ready to go through fire
are often just the reason for setting the fire."
-- jeannine luczak

spamtraps: madduck.bogus < at > madduck.net
