Severe performance degradation
Post Severe performance degradation 
Hi there,

I'm looking for help with a very severe performance problem with
rdiff-backup. Basically, we've bought a new server with more and faster
resources to replace a 4-year-old one. However, rdiff-backup refuses to
perform on the new server.

Various tests show that generic disk accesses are much faster on the new
server, and it has more memory and faster CPUs.
Nevertheless, we see a very severe slowdown when rdiff-backup makes an
incremental backup: four to ten times slower, or worse, than it used to
be on the old server.

I gathered some numbers, but they differ wildly depending on the source
material/directory. So maybe it is better to set specific numbers aside
for now and focus on the big picture: our old server did an rdiff-backup
of a remote storage server, some 300 GB worth, typically in under an
hour. With the same source dataset, the new server typically starts at
night and is still running the next morning, sometimes even into the
afternoon(!).

When we run trials on a tiny subset of the data, we get varying results.
Some data takes eight times as long; some takes up to 80% longer. So
that is not very dependable, alas.
Still, what is observable is that any initial backup run (with --force)
runs significantly faster on the new server. Any differential run
afterwards is slower than on the original server. I feel this proves
there are no performance bottlenecks in the network, disks, filesystems,
etc. of the server.

This is fully repeatable, and a real-time tail of the log file shows
that no single file is to blame; the overall speed is simply slow.

The new server runs rdiff-backup 1.2.8, the old one 1.0.5. Downgrading
the new server to 1.0.5 makes things a bit interesting: it speeds things
up a little, but it is still a fair bit slower than the original.

During our investigation we experimented with different filesystems,
tested local versus remote backups, and looked at compile flags and
versions of librsync and Python, but we had no success there.
All versions use librsync 0.9.7. All OSes are 32-bit Gentoo.

We also tried workarounds such as spawning multiple parallel
rdiff-backup processes, each dealing with a separate directory, so as to
fully use the eight CPU cores. Sadly, even with that speedup the overall
speed is still not acceptable. We compared compilation flags, options
and parameters, but nothing obvious struck us in that regard.
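A minimal sketch of that parallel workaround in Python (the language rdiff-backup itself is written in), assuming one rdiff-backup process per top-level source directory, each with its own repository. The helpers `backup_jobs` and `run_parallel` and the directory layout are hypothetical; the default of eight workers simply mirrors the core count mentioned above.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def backup_jobs(source_root, repo_root):
    """Build one rdiff-backup command per top-level directory (hypothetical layout).

    Uses the rdiff-backup 1.x command-line form: rdiff-backup SOURCE DEST.
    """
    jobs = []
    for name in sorted(os.listdir(source_root)):
        src = os.path.join(source_root, name)
        if os.path.isdir(src):
            jobs.append(["rdiff-backup", src, os.path.join(repo_root, name)])
    return jobs


def run_parallel(commands, max_workers=8):
    """Run the commands as separate processes, at most max_workers at a time.

    Threads suffice here: each thread only waits on its child process, and
    the real work happens in the child processes, which the OS can spread
    over the available cores. Exit codes come back in input order.
    """
    def run(cmd):
        return subprocess.run(cmd).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run, commands))
```

Usage would look like `run_parallel(backup_jobs("/mnt/storage", "/backup/repos"))`, with both paths hypothetical. This only spreads the work over more cores; it does nothing about whatever makes each individual run slow.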


I'm basically out of ideas. I was tearing my hair out over this until a
couple of days ago (yes, well, I'm bald now). I turn to this list as a
last resort. Can anyone please help debug this strange problem?

Regards, thanks for listening,

Maarten


_______________________________________________
rdiff-backup-users mailing list at rdiff-backup-users < at > nongnu.org
http://lists.nongnu.org/mailman/listinfo/rdiff-backup-users
Wiki URL: http://rdiff-backup.solutionsfirst.com.au/index.php/RdiffBackupWiki

Post Severe performance degradation 
Did you do any testing on the local machine only (i.e. tests from one
folder to another)? Doing comparisons on the older server and then the
new server should show you whether it's just the server to blame. If
that proves solid, then I would look at network stuff. Maybe use a
different NIC, etc. I know you said that initial runs with --force were
faster, and that you therefore ruled out the network. But IMO that isn't
thorough enough; I want to remove the network completely from the
equation. Your NIC could be having problems with something that only
happens when running the diff algorithm (or something else weird).

Maybe this isn't the right track at all, or maybe you have even tried
this already, but it's an idea anyway. :)
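Randy's test amounts to timing the same incremental run twice: once from a local copy of the data and once straight from the network source. A minimal Python sketch under that assumption; both source paths, both repository paths and the helper names are hypothetical, and each repository is assumed to be populated already so that both runs are incremental rather than initial.

```python
import subprocess
import time


def time_command(cmd):
    """Run cmd to completion and return (exit_code, wall-clock seconds)."""
    start = time.perf_counter()
    result = subprocess.run(cmd)
    return result.returncode, time.perf_counter() - start


def compare_sources(local_src, network_src, local_repo, network_repo):
    """Time one rdiff-backup run from a local source and one from a network source.

    Each source gets its own (already populated) repository, so the only
    difference between the two runs is where the source data lives.
    """
    timings = {}
    for label, src, repo in (("local", local_src, local_repo),
                             ("network", network_src, network_repo)):
        code, seconds = time_command(["rdiff-backup", src, repo])
        timings[label] = (code, seconds)
        print(f"{label:8s} exit={code} {seconds:8.1f}s")
    return timings
```

If the local run is fast and the network run is slow on the same machine, the server itself is largely cleared and the network path becomes the prime suspect.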

--------------------------------------
Randy Syring
Intelicom
Direct: 502-276-0459
Office: 502-212-9913

For the wages of sin is death, but the
free gift of God is eternal life in
Christ Jesus our Lord (Rom 6:23)


On 12/21/2010 12:25 PM, Maarten J H van den Berg wrote:
[...]

Post Severe performance degradation 
On 21.12.2010 18:25, Maarten J H van den Berg wrote:

[...]

What about the kernel version? Perhaps barriers are now working as they
should, and you are paying the price for that. dpkg package management
had huge performance regressions due to this, AFAIR.

Other than that, I'd try to find out where rdiff-backup is spending all
the time. What does
iostat -dx 1
say in the %util column while an incremental is running?
You could even fire up sysprof to see what it is waiting for.
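For reference, the %util column of `iostat -x` is, roughly, the share of wall-clock time during which the device had I/O in flight. A minimal Python sketch that computes the same figure from Linux's /proc/diskstats, assuming the standard layout where, after the major, minor and device-name columns, the "milliseconds spent doing I/Os" counter is the 13th whitespace-separated field (index 12). The device name `sda` and the helper names are hypothetical.

```python
import time


def io_ticks(diskstats_text, device):
    """Return the 'milliseconds spent doing I/Os' counter for one device.

    Each /proc/diskstats line starts with: major minor name, followed by
    the per-device counters; the I/O-time counter is at field index 12.
    """
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) > 12 and fields[2] == device:
            return int(fields[12])
    raise KeyError(device)


def util_percent(ticks_before, ticks_after, interval_s):
    """Busy time as a percentage of the sampling interval, capped at 100."""
    busy_ms = ticks_after - ticks_before
    return min(100.0, 100.0 * busy_ms / (interval_s * 1000.0))


def sample_util(device="sda", interval_s=1.0):
    """Read /proc/diskstats twice, interval_s apart (Linux only)."""
    with open("/proc/diskstats") as f:
        before = io_ticks(f.read(), device)
    time.sleep(interval_s)
    with open("/proc/diskstats") as f:
        after = io_ticks(f.read(), device)
    return util_percent(before, after, interval_s)
```

A value near 100 during an incremental run would point at the local disks; a low value while the run crawls suggests the time is spent elsewhere (CPU, network, or waiting on the remote side).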

Regards,
Jakob

Post Severe performance degradation 
On 12/21/2010 09:25 AM, Maarten J H van den Berg wrote:

Hi there,




I'm basically out of ideas. I was tearing my hair out over this until a
couple of days ago (yes, well, I'm bald now). I turn to this list as a
last resort. Can anyone please help debug this strange problem?

Regards, thanks for listening,

Maarten



Have you tried running with increased verbosity (-v) over a small set
of files, to see what is going on? Probably something above 5.

--
Adrian Klaver
adrian.klaver < at > gmail.com

Post Severe performance degradation 
On 12/21/10 18:44, Randy Syring wrote:
Did you do any testing on the local machine only (i.e. tests from one
folder to another)? Doing comparisons on the older server and then the
new server should show you whether it's just the server to blame. If
that proves solid, then I would look at network stuff. Maybe use a
different NIC, etc. I know you said that initial runs with --force were
faster, and that you therefore ruled out the network. But IMO that isn't
thorough enough; I want to remove the network completely from the
equation. Your NIC could be having problems with something that only
happens when running the diff algorithm (or something else weird).

Hi Randy

Ahem. I guess sometimes one should indeed go back to basics and ask
these questions. As it turns out, a local-to-local test indicates there
isn't a problem; the new server outperforms the old server in both tests
by some 30-40%. Shame on us for not testing that. Admittedly, the test
was done with a tiny directory, so the results may not be representative.

I immediately ran other tests: I wanted to find out whether an rsync of
the network data to local storage, followed by an rdiff-backup of that
local data, runs *faster* than an rdiff-backup of the network data. If
so, that could be a suitable workaround, and it would also point to the
problem.

I'm still waiting for the results, but here is the obvious reason we
disregarded the network as a possible cause of the slowdowns: the rsync
of 125 GB of network data to local storage took no more than 2 minutes
40 seconds (syncing to pre-existing data, of course!). It is therefore
understandable that we said to ourselves, "Okay, there's certainly no
bottleneck there...".

I have the figures from these tests now.
First test: rsync of the data over the network, target dir already
populated (where "data" consists of 125 GB worth of files):

2m40.494s

Rdiff-backup of that local dir to a fresh, _empty_ repo:

58m17.936s

Rdiff-backup of that same dir to a pre-existing/populated repo:

2m25.840s

And, to compare apples to apples, a copy of the local source dir to an
empty local destination dir using rsync:

59m42.792s

So, I have yet to reach hard conclusions, but some things are obvious:
rdiff-backup performs well on local resources. It is on par with rsync
for unpopulated target dirs, and very, very fast for existing repos.

So, running rsync as step #1 and rdiff-backup as step #2 would get the
job done in around 5 minutes instead of multiple hours. Strange, but a
result we can probably live with. We have more than enough storage space
(12 TB) to justify storing this data twice.
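The "around 5 minutes" figure is the sum of the two timings for the already-populated case: the network rsync and the rdiff-backup into the existing repo. A small Python check, assuming the timings are in the `XmY.ZZZs` format printed by the bash `time` builtin; `parse_time` is a hypothetical helper.

```python
import re

_TIME_RE = re.compile(r"^(\d+)m(\d+(?:\.\d+)?)s$")


def parse_time(text):
    """Convert an 'XmY.ZZZs' duration string to seconds."""
    m = _TIME_RE.match(text.strip())
    if not m:
        raise ValueError(f"unrecognised duration: {text!r}")
    return int(m.group(1)) * 60 + float(m.group(2))


rsync_step = parse_time("2m40.494s")    # rsync over the network, target populated
rdiff_step = parse_time("2m25.840s")    # rdiff-backup into the populated repo
two_phase = rsync_step + rdiff_step     # about 306 s, i.e. roughly 5 minutes
fresh_rdiff = parse_time("58m17.936s")  # rdiff-backup into an empty repo
fresh_rsync = parse_time("59m42.792s")  # local rsync into an empty dir

print(f"two-phase total: {two_phase / 60:.1f} min")
print(f"fresh rdiff-backup vs fresh rsync: {fresh_rdiff / fresh_rsync:.2f}x")
```

The second ratio backs up the "on par with rsync" observation for unpopulated targets: the two fresh copies differ by only a few percent.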

Maybe this isn't the right track at all, or maybe you have even tried
this already, but it's an idea anyway. :)

As it turns out, it was a very, very good idea!

Thanks,
Maarten

--

Maarten J H van den Berg
Kratz business solutions bv

Some people, when confronted with a problem, think "I know, I'll use
regular expressions." Now they have two problems. -- Jamie Zawinski


Post Severe performance degradation 
On 12/21/10 21:56, Adrian Klaver wrote:
Have you tried running with increased verbosity (-v) over a small set
of files, to see what is going on? Probably something above 5.

Hi Adrian,

Yes, we did; nothing out of the ordinary was found. Then we started using
-v0 to make sure that excessive logging wasn't what was holding it back.

Regards,
Maarten

Post Severe performance degradation 
On 12/21/10 18:51, Jakob Unterwurzacher wrote:


What about the kernel version? Probably barriers are now working as they
should and you pay the price for that. dpkg package management had huge
performance regressions due to this AFAIR.

I googled a bit, but I'm left confused: are you talking about
memory barriers, ext4 barriers, or some other type?

Other than that, I'd try to find out where rdiff-backup is spending all
the time. What does
iostat -dx 1
say in the %util column while an incremental is running?
You could even fire up sysprof to see what it is waiting for.

We did monitor that a couple of weeks back. I don't have the numbers
at hand now, but I believe we did not find a specific cause.

For now I'll probably work around it by making the process a two-phase
one: first rsync, then rdiff-backup...
I may still repeat the tests with iostat later, if only to find out what
is causing this...
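A minimal sketch of that two-phase job in Python, assuming the network data is reachable as a locally mounted path and that a local staging directory serves as rdiff-backup's source. All paths and the `two_phase_commands`/`run_two_phase` helpers are hypothetical. `rsync -a --delete` keeps the staging copy an exact mirror (including deletions), and the optional `verbosity` becomes rdiff-backup's `-v` level, such as the `-v0` mentioned earlier in the thread.

```python
import subprocess


def two_phase_commands(network_src, staging_dir, repo_dir, verbosity=None):
    """Build the two commands: mirror the share locally, then back up the copy."""
    # Trailing slash on the source: copy the *contents* of network_src into staging_dir.
    rsync = ["rsync", "-a", "--delete", network_src.rstrip("/") + "/", staging_dir]
    # rdiff-backup 1.x form: rdiff-backup [options] SOURCE DEST
    rdiff = ["rdiff-backup"]
    if verbosity is not None:
        rdiff.append(f"-v{verbosity}")
    rdiff += [staging_dir, repo_dir]
    return [rsync, rdiff]


def run_two_phase(network_src, staging_dir, repo_dir, verbosity=None):
    """Run both steps in order; raise (and skip step 2) if step 1 fails."""
    for cmd in two_phase_commands(network_src, staging_dir, repo_dir, verbosity):
        subprocess.run(cmd, check=True)
```

Because `check=True` raises `subprocess.CalledProcessError` on a non-zero exit status, rdiff-backup is never run against a staging copy whose rsync step failed.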

Cheers!
Maarten

Post Severe performance degradation 
On 22.12.2010 15:02, Maarten J H van den Berg wrote:
On 12/21/10 18:51, Jakob Unterwurzacher wrote:


What about the kernel version? Probably barriers are now working as they
should and you pay the price for that. dpkg package management had huge
performance regressions due to this AFAIR.

I googled a bit, but I'm left confused: are you talking about
memory barriers, ext4 barriers, or some other type?

I meant hard-disk write barriers, as in http://lwn.net/Articles/283161/.
But as the problem seems to be CIFS, I think we can forget about iostat.
The CIFS client is built into the kernel, so if you want to get rid of
the rsync stage, I'd try the older kernel or experiment with CIFS mount
options (rsize=130048 is often recommended).

Regards,
Jakob

Post Severe performance degradation 
RE: Severe performance degradation thread

This is a long shot, but I have had issues like this on new servers
with certain RAID hardware, until I forced an initial sync of the RAID
volume. Once the initial sync completed the server performed as
expected.

Best,

Chuck

Post Severe performance degradation 
chuck odonnell wrote:
RE: Severe performance degradation thread

This is a long shot, but I have had issues like this on new servers
with certain RAID hardware, until I forced an initial sync of the RAID
volume. Once the initial sync completed the server performed as
expected.

Are we perhaps talking about... Dell servers with PERC controllers??
If so, that's certainly worth investigating. Mine are R510s, IIRC.

cheers,
Maarten
