Incredibly slow i/o to NAS server

Posted by Anonymous 
November 29, 2016 06:53AM
Hi,

I'm running an rdiff-backup script to back up my laptop to my local NAS.

I love rdiff-backup and it normally works great, but at the moment I'm having a problem with one specific directory.

That directory is where I store my docker images, so it contains a lot of files.

To be precise:
sudo find /mnt/vms/docker/ -type f|wc -l
852443
sudo du -hs /mnt/vms/docker
4.9G    /mnt/vms/docker

So, it's nearly a million files, but less than 5GB.

When I ran the backup the first time, it didn't take too long (can't remember how long exactly, but less than a couple of hours).

Then I updated some docker images and removed the old ones (clearly lots of changes).

Now I have re-run rdiff-backup and it has been running for nearly 24 hours on that specific folder. There is no problem with any of the other folders.

After some investigation, it turns out the bottleneck is the i/o on the NAS.

This NAS is a more than reasonable machine for the job (Dell workstation running 2 Xeon 2.0GHz dual-core CPUs), all on a headless Debian 7.

The storage for this backup is 4 SATA3 disks on a (mdadm) RAID6, all partitioned with LVM for flexibility.

Here is the iostat output:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.39   58.62   12.03    3.99    0.00   16.98

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda              15.40   448.00    4.60   94.40     0.08     1.87    40.29     0.70    7.12    6.13    7.17   6.26  62.02
sdb              22.40   442.00    5.60   93.00     0.11     1.84    40.50     0.41    4.13    5.18    4.07   3.35  33.06
sdc              20.20   443.00    3.60   95.20     0.10     1.85    40.34     0.35    3.54    5.44    3.47   2.61  25.76
sdd              27.40   432.20    4.60   93.60     0.13     1.80    40.26     0.49    4.97   10.17    4.72   3.61  35.46
md127             0.00     0.00    2.00  257.80     0.02     3.33    26.41     0.00    0.00    0.00    0.00   0.00   0.00
dm-6              0.00     0.00    0.40  168.20     0.00     0.66     8.00     1.61    9.36   11.50    9.35   5.77  97.24

sda, sdb, sdc and sdd are the disks that make up the RAID6 array, md127 is the RAID6 device (not sure why everything is zero there), and dm-6 is the logical volume where I'm saving my backup to.

It's writing 0.66MB/s and the volume is 97.24% utilised! Wow!

I suppose it's due to the number of writes per second.
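
As a rough check of that guess, the dm-6 row of the iostat output can be turned back into a throughput figure. iostat reports avgrq-sz in 512-byte sectors, so 8.00 sectors is 4 KiB per request, and 168.20 writes per second of that size comes to about the reported 0.66 MB/s. This is only a sketch: the function name is made up for illustration, it assumes iostat's "MB" means 1024*1024 bytes, and it ignores the 0.40 reads per second that are also part of the avgrq-sz average.

```shell
#!/bin/sh
# Throughput implied by a write rate and an average request size.
# $1 = writes per second (iostat "w/s")
# $2 = average request size in 512-byte sectors (iostat "avgrq-sz")
# Assumption: 1 MB = 1024 * 1024 bytes.
implied_mb_per_s() {
    awk -v wps="$1" -v sectors="$2" \
        'BEGIN { printf "%.2f\n", wps * sectors * 512 / (1024 * 1024) }'
}

# dm-6 row from the iostat output: 168.20 w/s, avgrq-sz 8.00
implied_mb_per_s 168.20 8.00
```

If that reading is right, the volume is busy with many small 4 KiB writes rather than with bulk data, which would fit a tree of roughly 850,000 small files.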

This i/o problem is also confirmed by vmstat (look at the "bo" column under "io"):
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 5  0 1902128 1188960 428720 12141168    0    0   346  1880 11644 21770 72 14 10  3
 6  0 1902128 1190248 429044 12147940    0    0    50  1221 10136 24383 67 12 17  3
 3  0 1902128 1183644 429328 12154196    0    0    38  1038 13587 25388 66 15 15  4
 6  0 1902128 1179204 429668 12158132    0    0   239   692 11868 22070 66 11 19  4
 5  0 1902128 1171128 429968 12166432    0    0    51  1138 12609 25194 68 15 13  3
 4  0 1902128 1165516 430264 12171088    0    0    45  1438 12365 27973 64 13 18  4
 4  0 1902128 1158788 430576 12177560    0    0    42   958 15849 27693 61 16 18  5
 5  1 1902128 1152560 430780 12183864    0    0   348  1135 14911 26943 65 13 18  4
 3  1 1902128 1145700 431008 12191060    0    0    41   990 16313 27469 62 16 18  4
 3  0 1902128 1139508 431168 12197124    0    0    40   806 11108 25448 65 12 19  3
 3  0 1902128 1133748 431688 12202564    0    0    82   880 9026 25443 63 15 18  4
 3  1 1902128 1124528 432024 12209916    0    0   238  5042 12518 24546 66 12 19  3
 5  0 1902128 1120960 432272 12213644    0    0    16   799 13788 22166 72 14 11  2
 2  0 1902128 1115848 432960 12218472    0    0    96  1192 12731 24136 66 12 18  3

In short, is there a flag I can pass to rdiff-backup or anything else I can do to minimise this problem (number of writes?)?

Thanks,

Andrea
November 29, 2016 09:17AM
On 11/28/2016 07:31 PM, Andrea Bolandrina wrote:
[quote]Hi,

I'm running an rdiff-backup script, to backup my laptop to my local NAS.

I love rdiff-backup and it normally works great, but at the moment I'm having problem with one specific directory.
Such directory is where is store my docker images, therefore it's a lot of files.
To be precise:
sudo find /mnt/vms/docker/ -type f|wc -l
852443
sudo du -hs /mnt/vms/docker
4.9G /mnt/vms/docker
So, it's nearly a million files, but less than 5GB.

When I run the backup the first time, it didn't take too long (can't remember how long exactly, but less than a couple of hours).
Then I've updated some docker images, and removed the old ones (clearly lots of changes).
Now I re-run the rdiff-backup and it has been running for nearly 24 hours on that specific folder, no problem with all the other folders.
[/quote]
First, note that the files in those removed images (except for those with a suffix known to be uncompressible) are going to be compressed and stored as .snapshot.gz files. That can take quite a while. Even something as simple as deleting an old kernel adds 15 minutes or so to my next backup, as all the deleted files under /usr/src/kernels and /lib/modules get compressed and saved away.

Second, are there a large number of files with more than one hard link ("find /mnt/vms/docker/ -type f -links +1 | wc -l")? Those can be an issue if the device number (st_dev) changes ("Device:" in the output from the "stat" command). When that happens, every one of those multi-link files will be seen as changed and have a zero-diff file created in the increments directory. You can avoid that by using the "--no-compare-inode" option, but doing that aggravates the problems that rdiff-backup already has with hard-linked files.
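
The two checks above (how many files have more than one hard link, and which device number the source currently reports) could be wrapped up roughly as follows. This is a sketch only: the function names are made up, `stat -c` is the GNU coreutils form, and the count assumes no file name contains a newline.

```shell
#!/bin/sh
# Count regular files under a directory that have more than one hard link.
count_multilink() {
    find "$1" -type f -links +1 | wc -l | tr -d ' '
}

# Print the device number (st_dev) that stat reports for a path.
# GNU stat: -c selects a format, %d is the device number in decimal.
device_number() {
    stat -c '%d' "$1"
}

# Example (path taken from the original post; needs read access to it):
#   count_multilink /mnt/vms/docker/
#   device_number /mnt/vms/docker/
```

Running `device_number` before each backup and comparing the values between runs would show whether the device number is stable.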

--
Bob Nichols "NOSPAM" is really part of my email address.
Do NOT delete it.

_______________________________________________
rdiff-backup-users mailing list at rdiff-backup-users < at > nongnu.org
https://lists.nongnu.org/mailman/listinfo/rdiff-backup-users
Wiki URL: http://rdiff-backup.solutionsfirst.com.au/index.php/RdiffBackupWiki
December 03, 2016 07:36PM
I disabled file compression and it seems to be a bit better, although it is still fairly slow.

Another problem, still related to the i/o slowness: I tried to run the same rdiff-backup (with file compression off) to a USB drive (formatted with btrfs with compression enabled), and on the second or third run it crashed with some weird i/o error [read only device].

I suspect the high i/o activity made the "slow" USB disk crash. It would probably work if it was an SSD.

It's a shame that rdiff-backup is so i/o hungry but I suppose that's the way it is for now.

Otherwise I really like the tool!

Bob - yes, lots of files under /mnt/vms/docker/ have more than one hard link.

I might try your suggestion (--no-compare-inode) if I run into trouble again.

What problems does rdiff-backup have with hard linked content?

As far as I know hard links should be supported...

Regards,

Andrea
December 04, 2016 06:59AM
On 12/03/2016 09:32 PM, Andrea Bolandrina wrote:
[quote]Bob - yes, lots of files under /mnt/vms/docker/ have more than one hard link.
I might try your suggestion (--no-compare-inode) if I run into trouble again.
What problems does rdiff-backup have with hard linked content?
As far as I know hard links should be supported...
[/quote]
Supported? Yes, but with plenty of bugs. If links are added and removed from a set, you can end up with two or more separate subsets (i.e., what should be a set of 10 links to a single file becomes 3 files with link counts of 3, 5, and 2), and the link arrangement in the metadata files won't always match the link arrangement in the mirror. The checksum is stored only for the first link in the collating sequence. If that first link gets deleted, the checksum is lost. If a link with a path that comes earlier in the collating sequence is added, it sometimes does not inherit the checksum. I have a massive and time-consuming audit that I run after every backup session to patch that up.
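
One way to look for the "split set" symptom described above is to check whether paths that share an inode in the source tree also share an inode in the mirror. The sketch below is not the audit mentioned in this post, just a minimal illustration under stated assumptions: both trees are readable from one machine, GNU find is available for -printf, no path contains a tab or newline, and the function name is made up.

```shell
#!/bin/sh
# Report hard-link sets from a source tree that ended up as more than one
# file (inode) in a mirror tree. Needs GNU find for -printf.
# Assumption: no path contains a tab or a newline.
find_split_sets() {
    src=$1
    mirror=$2
    src_list=$(mktemp) || return 2
    mir_list=$(mktemp) || return 2

    # One "inode<TAB>relative path" line per regular file.
    find "$src" -type f -links +1 -printf '%i\t%P\n' > "$src_list"
    find "$mirror" -type f -printf '%i\t%P\n' > "$mir_list"

    awk -F '\t' '
        # First file: remember which mirror inode each path points at.
        FILENAME == ARGV[1] { mirror_inode[$2] = $1; next }
        # Second file: count distinct mirror inodes per source inode.
        ($2 in mirror_inode) {
            pair = $1 SUBSEP mirror_inode[$2]
            if (!(pair in seen)) { seen[pair] = 1; pieces[$1]++ }
        }
        END {
            for (ino in pieces)
                if (pieces[ino] > 1)
                    print "source inode " ino " maps to " pieces[ino] " mirror files"
        }
    ' "$mir_list" "$src_list"

    rm -f "$src_list" "$mir_list"
}

# Example (both paths are placeholders):
#   find_split_sets /mnt/vms/docker /srv/backups/docker
```

No output means every multi-link set found in the source is still a single file in the mirror. Paths that exist only in the mirror, such as the rdiff-backup-data directory, are ignored because only source paths are looked up.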

Verification always complains about missing checksums for all the links that do not have one stored. I have to filter out all the verbose 2-line messages for those from the verification report.

And of course there is the issue I mentioned with huge numbers of zero-diff increment files if the device number changes, and that device number can vary randomly when LVM and/or encrypted devices are involved. My only solution for that is a script that refuses to run the backup if the device numbers don't match what was previously recorded.
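
A guard of the kind described above might look roughly like this. It is a sketch, not the actual script referred to in this post: the function name, the state file location and the example paths are all hypothetical, and `stat -c` is the GNU coreutils form.

```shell
#!/bin/sh
# Guard: succeed only if the device number of a source directory
# matches the one recorded on a previous run.
# $1 = directory to be backed up
# $2 = state file holding the previously seen device number (hypothetical)
check_device_number() {
    src=$1
    state=$2
    current=$(stat -c '%d' "$src") || return 2

    if [ ! -f "$state" ]; then
        # First run: nothing to compare against, so just record it.
        printf '%s\n' "$current" > "$state"
        return 0
    fi

    recorded=$(cat "$state")
    if [ "$current" != "$recorded" ]; then
        echo "device number of $src changed: $recorded -> $current" >&2
        return 1
    fi
    return 0
}

# Example wiring (paths and host are placeholders):
#   check_device_number /mnt/vms/docker /var/lib/backup/docker.dev \
#       && rdiff-backup /mnt/vms/docker nas::/backups/docker
```

On a mismatch the function returns non-zero, so the backup command after `&&` is simply not started. Deciding what to do next (fix the device mapping, or accept the new number by deleting the state file) is left to the operator.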

I really want to look into SafeKeep http://safekeep.sourceforge.net/ as an alternative, but I haven't had a chance to do that.

--
Bob Nichols "NOSPAM" is really part of my email address.
Do NOT delete it.

December 05, 2016 02:53AM
On Sun, 2016-12-04 at 08:58 -0600, Robert Nichols wrote:
[quote][quote]On 12/03/2016 09:32 PM, Andrea Bolandrina wrote:
[quote]
Bob - yes, lots of files under /mnt/vms/docker/ have more than one hard link.
I might try your suggestion (--no-compare-inode) if I run into trouble again.
What problems does rdiff-backup have with hard linked content?
As far as I know hard links should be supported...
[/quote]

Supported? Yes, but with plenty of bugs. If links are added and removed from a set, you can end up with two or more separate subsets (i.e., what should be a set of 10 links to a single file becomes 3 files with link counts of 3, 5, and 2), and the link arrangement in the metadata files won't always match the link arrangement in the mirror. The checksum is stored only for the first link in the collating sequence. If that first link gets deleted, the checksum is lost. If a link with a path that comes earlier in the collating sequence is added, it sometimes does not inherit the checksum. I have a massive and time-consuming audit that I run after every backup session to patch that up.

Verification always complains about missing checksums for all the links that do no have one stored. I have to filter out all the verbose 2-line messages for those from the verification report.

And of course there is the issue I mentioned with huge numbers of zero-diff increment files if the device number changes, and that device number can vary randomly when LVM and/or encrypted devices are involved. My only solution for that is a script that refuses to run the backup if the device numbers don't match what was previously recorded.

I really want to look into SafeKeep http://safekeep.sourceforge.net/ as an alternative, but I haven't had a chance to do that.

[/quote][/quote]However, SafeKeep uses rdiff-backup under the covers; it is just that we have spent a lot of time trying to optimise the options that we pass to rdiff-backup.

Regards
Frank
December 05, 2016 07:04AM
On 12/05/2016 04:01 AM, Frank Crawford wrote:
[quote]On Sun, 2016-12-04 at 08:58 -0600, Robert Nichols wrote:
[quote]I really want to look into SafeKeep http://safekeep.sourceforge.net/ as an alternative, but I haven't had a chance to do that.

[/quote]However, SafeKeep uses rdiff-backup under the covers, it is just we have spent a lot of time trying to optimise the options that we pass to rdiff-backup.
[/quote]
I guess I can scratch that off my todo list, then. I find it astounding that anyone would build a new tool around a product that has been unsupported for over 7 years now.

--
Bob Nichols "NOSPAM" is really part of my email address.
Do NOT delete it.

December 05, 2016 07:14AM
On Mon, Dec 5, 2016 at 10:02 AM, Robert Nichols <rnicholsNOSPAM < at > comcast.net> wrote:
[quote]On 12/05/2016 04:01 AM, Frank Crawford wrote:
[quote] On Sun, 2016-12-04 at 08:58 -0600, Robert Nichols wrote:
[quote] I really want to look into SafeKeep http://safekeep.sourceforge.net/ as an alternative, but I haven't had a chance to do that.

[/quote] However, SafeKeep uses rdiff-backup under the covers, it is just we have spent a lot of time trying to optimise the options that we pass to rdiff-backup.
[/quote]
I guess I can scratch that off my todo list, then.  I find it astounding that anyone would build a new tool around a product that has been unsupported for over 7 years now.

[/quote]

Mostly because the alternatives to rdiff-backup are worse. Even if rdiff-backup has some bugs and is old and stagnant, it's more robust than many other backup tools for Linux. We did our homework here and rdiff-backup was the best tool for our needs.

The best thing we can do is maintain rdiff-backup and fix the bugs we are facing from time to time.

December 05, 2016 07:31AM
On 12/4/2016 9:58 AM, Robert Nichols wrote:
[quote]On 12/03/2016 09:32 PM, Andrea Bolandrina wrote:
[quote]What problems does rdiff-backup have with hard linked content?
As far as I know hard links should be supported...
[/quote]
Supported? Yes, but with plenty of bugs. If links are added and
removed from a set, you can end up with two or more separate subsets
(i.e., what should be a set of 10 links to a single file becomes 3 files
with link counts of 3, 5, and 2), and the link arrangement in the
metadata files won't always match the link arrangement in the mirror.
The checksum is stored only for the first link in the collating
sequence. If that first link gets deleted, the checksum is lost. If a
link with a path that comes earlier in the collating sequence is added,
it sometimes does not inherit the checksum. I have a massive and
time-consuming audit that I run after every backup session to patch that
up.

Verification always complains about missing checksums for all the links
that do no have one stored. I have to filter out all the verbose 2-line
messages for those from the verification report.

[/quote]
I believe the hard link problems you describe are identified in this bug
report (that I submitted in 2009):

http://savannah.nongnu.org/bugs/?func=detailitem&item_id=26848

Attached to the report are patches to fix the issue. Some distros
(which I think include Debian, Ubuntu, & Suse) have since been including
the patches as part of their packaging.

--Joe

December 05, 2016 08:39AM
On 12/05/2016 09:13 AM, Patrik Dufresne wrote:
[quote]

On Mon, Dec 5, 2016 at 10:02 AM, Robert Nichols <rnicholsNOSPAM < at > comcast.net> wrote:

On 12/05/2016 04:01 AM, Frank Crawford wrote:

On Sun, 2016-12-04 at 08:58 -0600, Robert Nichols wrote:

I really want to look into SafeKeep http://safekeep.sourceforge.net/ as an alternative, but I haven't had a chance to do that.

However, SafeKeep uses rdiff-backup under the covers, it is just we have spent a lot of time trying to optimise the options that we pass to rdiff-backup.

I guess I can scratch that off my todo list, then. I find it astounding that anyone would build a new tool around a product that has been unsupported for over 7 years now.

Mostly because the alternative to rdiff-backup are worst. Even if rdiff-backup has some bugs, is old and stagnant, it's more robust then many other backup tool for Linux. We did our homework here and rdiff-backup was the best tool for our needs.
[/quote]
Those are exactly the reasons that I continue to use it, despite its faults.

--
Bob Nichols "NOSPAM" is really part of my email address.
Do NOT delete it.

December 05, 2016 01:28PM
On 12/05/2016 09:28 AM, Joe Steele wrote:
[quote]On 12/4/2016 9:58 AM, Robert Nichols wrote:
[quote]On 12/03/2016 09:32 PM, Andrea Bolandrina wrote:
[quote]What problems does rdiff-backup have with hard linked content?
As far as I know hard links should be supported...
[/quote]
Supported? Yes, but with plenty of bugs. If links are added and
removed from a set, you can end up with two or more separate subsets
(i.e., what should be a set of 10 links to a single file becomes 3 files
with link counts of 3, 5, and 2), and the link arrangement in the
metadata files won't always match the link arrangement in the mirror.
The checksum is stored only for the first link in the collating
sequence. If that first link gets deleted, the checksum is lost. If a
link with a path that comes earlier in the collating sequence is added,
it sometimes does not inherit the checksum. I have a massive and
time-consuming audit that I run after every backup session to patch that
up.

Verification always complains about missing checksums for all the links
that do no have one stored. I have to filter out all the verbose 2-line
messages for those from the verification report.

[/quote]
I believe the harlink problems you describe are identified in this bug report (that I submitted in 2009):

http://savannah.nongnu.org/bugs/?func=detailitem&item_id=26848

Attached to the report are patches to fix the issue. Some distros (which I think include Debian, Ubuntu, & Suse) have since been including the patches as part of their packaging.
[/quote]
Those patches look like they address the problems with checksums being misplaced or lost, but they don't appear to have anything to do with what I see as the greater problem of sets of hard links being broken up and inconsistencies between the hard link counts in the metadata and the mirror. I finally got rdiff-backup to build (needed a patch for librsync >= 1.0.0), so I can do some testing later.

--
Bob Nichols "NOSPAM" is really part of my email address.
Do NOT delete it.
