Millions of files
On Thu, 28 Apr 2005 18:46:46 -0700, Ben Escoto wrote:
>> rsync seems to have trouble with jobs operating on millions of files:
>> http://lists.samba.org/archive/rsync/2003-October/007546.html
>>
>> If that's the case, is it automatically also the case for
>> rdiff-backup? Does rdiff-backup build a complete list of the files to be
>> backed up before it starts working?
> [...]
> No, rdiff-backup does not have this problem :-P  rdiff-backup only makes
> one pass, so its memory use should not grow linearly with the number of
> files.
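The following minimal Python sketch makes the distinction concrete. It illustrates the general idea only. It is not rdiff-backup's or rsync's actual code, and all function names are hypothetical.

- `process_with_full_list` builds the complete file list before doing any work, so its memory use grows with the number of files.
- `process_one_pass` handles each path as soon as the directory walk produces it. Its memory use therefore depends on how many directories are still waiting to be visited, not on the total file count.

```python
import os
import tempfile
from typing import Iterator


def iter_files(root: str) -> Iterator[str]:
    """Yield non-directory paths under root one at a time (depth-first).

    Memory grows with the number of directories still waiting to be
    visited, not with the total number of files in the tree.
    """
    pending = [root]
    while pending:
        directory = pending.pop()
        with os.scandir(directory) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    pending.append(entry.path)
                else:
                    yield entry.path


def process_one_pass(root: str) -> int:
    """Hypothetical single-pass job: handle each file as soon as it is found."""
    handled = 0
    for path in iter_files(root):
        handled += 1  # a real tool would back up `path` here
    return handled


def process_with_full_list(root: str) -> int:
    """Hypothetical two-phase job: collect every path first, then process."""
    all_paths = list(iter_files(root))  # every path is held in memory at once
    for path in all_paths:
        pass  # a real tool would back up `path` here
    return len(all_paths)


if __name__ == "__main__":
    # Build a tiny throwaway tree: f1, a/f2, a/b/f3
    with tempfile.TemporaryDirectory() as tmp:
        os.makedirs(os.path.join(tmp, "a", "b"))
        for rel in ("f1", os.path.join("a", "f2"), os.path.join("a", "b", "f3")):
            with open(os.path.join(tmp, rel), "w") as fh:
                fh.write("x")
        print(process_one_pass(tmp), process_with_full_list(tmp))  # 3 3
```

Both functions return the same count. The difference only shows up as peak memory once the tree holds millions of entries.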

A new backup setup I'm working on isn't completely finished yet, but one
backup job with more than 13 million files (233GB in total, taking 45 hours)
ran fine. Nice.

233GB in 45 hours is around 1½ MB/sec, if I'm calculating right. I'm not
sure I'm satisfied with that. Both hard disks and modern LANs should, in
principle, be able to sustain much higher throughput. (?)
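The arithmetic can be checked with a few lines of Python. The sketch makes two assumptions:

- It uses decimal units, so 233GB is taken as 233 × 10^9 bytes.
- It uses exactly 13 million files, although the post says more than that. The files-per-second figure is therefore a lower bound and the mean file size an upper bound.

```python
# Rough throughput figures for the backup job described above.
# Assumptions: 233GB = 233 * 10**9 bytes; 13 million files (a lower bound).

TOTAL_BYTES = 233 * 10**9
FILE_COUNT = 13_000_000
DURATION_S = 45 * 3600  # 45 hours in seconds


def throughput_summary(total_bytes: int, file_count: int, duration_s: int) -> dict:
    """Return average data rate (MB/s), file rate (files/s) and mean file size (kB)."""
    return {
        "mb_per_s": total_bytes / duration_s / 10**6,
        "files_per_s": file_count / duration_s,
        "mean_file_kb": total_bytes / file_count / 10**3,
    }


if __name__ == "__main__":
    s = throughput_summary(TOTAL_BYTES, FILE_COUNT, DURATION_S)
    print(f"{s['mb_per_s']:.2f} MB/s, {s['files_per_s']:.0f} files/s, "
          f"{s['mean_file_kb']:.1f} kB per file on average")
```

This prints about 1.44 MB/s, 80 files/s and 17.9 kB per file. Files averaging under 18 kB, handled at roughly 80 per second, suggest one plausible reading, though it is not a measurement. Per-file overhead (metadata lookups, disk seeks, per-file bookkeeping) may weigh more heavily than raw disk or network bandwidth.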

I'm wondering what the most significant bottleneck(s) might be in a setup
like the following. (The servers mostly run Linux, mostly with reiserfs3
file systems.)

*Backup server (single, relatively modern P4 CPU, 1GB RAM)
   - 8 disks in RAID5 (not sure about specs)
     (no write caching)
           ^
   - Storage controller: Adaptec Serial ATA RAID 2810SA
     (no write caching)
           ^
   - Python+librsync+rdiff-backup
           ^
   - sshd
           ^
           |
   ---Gigabit network hardware and wires-----
           ^
           |
   - ssh
           ^
   - Python+librsync+rdiff-backup
           ^
   - Disks
*Production servers (various Intel-like CPUs), sometimes
 doing hard work.

One thing to note is that no swapping is happening on the backup server.

Of course, I can (and probably will) measure various parameters myself,
but I think it would be interesting to hear what others hypothesize.
As (hopefully) illustrated, the backup server pulls data from the
production servers via SSH. Several hosts are backed up in parallel.

I'm thinking:
- Could SSH's encryption work be significant, or is it really peanuts
  for today's fast CPUs?
- Does rdiff-backup perform calculations where Python's slowness could
  be a problem?
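On the second question, the heavy per-byte work in rsync-style differencing is the weak rolling checksum. It is updated at each byte offset where no matching block has been found yet. The sketch below implements that checksum as described for the rsync algorithm, with sums a and b taken mod 2^16, in plain Python for readability.

Some caveats on the sketch:

- librsync's own rollsum differs in details; for example, it adds a small constant offset to each byte.
- The function names are hypothetical.
- A pure-Python loop like this over hundreds of gigabytes would be very slow, which is why such work is normally done in C. Whether it is actually the bottleneck in this setup would need measuring.

```python
from typing import Iterator, Tuple

M = 1 << 16  # modulus for the two 16-bit halves of the weak checksum


def weak_checksum(block: bytes) -> Tuple[int, int]:
    """Compute (a, b) for one block from scratch: O(len(block)) work."""
    n = len(block)
    a = sum(block) % M
    # byte i (0-based) in a window of length n gets weight n - i
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, block_len: int) -> Tuple[int, int]:
    """Slide the window one byte in O(1): drop out_byte, append in_byte."""
    a = (a - out_byte + in_byte) % M
    b = (b - block_len * out_byte + a) % M  # uses the already-updated a
    return a, b


def rolling_checksums(data: bytes, block_len: int) -> Iterator[int]:
    """Yield the combined 32-bit checksum a + 2**16 * b for every window."""
    if len(data) < block_len:
        return
    a, b = weak_checksum(data[:block_len])
    yield a + (b << 16)
    for k in range(len(data) - block_len):
        a, b = roll(a, b, data[k], data[k + block_len], block_len)
        yield a + (b << 16)
```

To check the sketch, compare each rolled value with a fresh `weak_checksum` over the same window. The point of rolling is that each step costs O(1) instead of O(block length), but in pure Python even that constant per byte adds up.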


> rdiff-backup and rsync use completely different protocols, and they
> don't really share any code.

How about librsync? Isn't that code shared between rdiff-backup and
rsync?

--
Greetings from Troels Arvin, Copenhagen, Denmark
