inode effect on tape drive speed?

George Sinclair - NOAA Federal
inode effect on tape drive speed?
October 04, 2019 02:59AM
A few questions on inode impact to tape drive write speed.

1. Why does a file system with a gazillion inodes (many, many tiny
files) often result in super slow write speeds to the tape drive?

I see speeds down in the 200 KB/sec range, whereas almost all file systems
with a more manageable number of inodes write at significantly higher
speeds (60-100+ MB/sec), even if it's just a single save set.

I can see that it would take much longer to walk such a file system
(we're assuming no block level backup in this case), but if it's running
a full, wherein it doesn't need to check to see if the file needs to be
backed up, then why would the speed slow down to such a crawl whereas
other backups (not a bunch of tiny files) keep the drive streaming just
fine, even with a single save set?

Indexing is enabled for the pool. I tried to run a full backup of /path
(single save set), and after 8 days, it had only completed 86 GB. I
stopped it. I then changed the save set configuration on the resource to
something like:
/path/subdir1
/path/subdir2
...
/path/subdir30

with a client parallelism of 8, but a group parallelism of 4. This
breakdown accounts for the bulk of the data, but, yes, another resource
would be required to capture the residual data and would need a
directive to null the above entries. Anyway, I reran the backup, and
this time, it completed all thirty save sets (240 GB) in 5 days. This
ran significantly faster, overtaking the first run's progress after just two days.
However, the write speeds were still miserably slow.
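
For what it's worth, here's a quick back-of-envelope from those figures (rough Python, rounding freely and taking 1 GB as 10^9 bytes):

# Average aggregate rates implied by the two runs described above.
GB = 1e9
DAY = 86_400  # seconds

runs = {
    "single save set, stopped after 8 days": (86 * GB, 8 * DAY),
    "thirty save sets, finished in 5 days":  (240 * GB, 5 * DAY),
}
for label, (nbytes, secs) in runs.items():
    print(f"{label}: ~{nbytes / secs / 1e3:.0f} KB/sec average")
# -> roughly 124 KB/sec and 556 KB/sec aggregate -- still nowhere near
#    tape streaming speed.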

Next, I ran a test from the client side (level=manual) by launching a
save command, one at a time, against four of the paths, but I specified
'-S' to disable saving the index entries. This data does not require
browsable recovery. The speed started out at 246 KB/sec for the first
one.  After the second one was started, it jumped to 900+ KB/sec, by the
third it was up to 1600+ KB/sec and after the fourth one was started it
averaged around 2200+ KB/sec. All four save sets were writing to a
single drive. I canceled the backups after about an hour. The speed is
still very bad, but way better than before. At this rate, it would be
possible to complete the backups in maybe two days versus five.

2. Does turning off indexing usually create an appreciably faster speed,
particularly with save sets that have millions of tiny files? Or is this
just coincidence?

3. In terms of speed, in this case, is there any difference between
listing four save sets, and running the backup from the server, versus
running four separate saves from the client?

4. Is this a situation wherein the 'parallel save streams per save set'
option would help?

I'm curious whether it would be better to set that option and list just
the single save set (/path), versus keeping that option disabled and
scripting the saves from the client instead so that I always have four
running at a time (/path/subdir1 - /path/subdir30).
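
If I did script it from the client, I'm picturing something like this rough sketch (the bare 'save -S <path>' invocation mirrors my manual test above; the subdir names are placeholders, and any server/pool options your site normally passes to save are left out):

# Keep exactly four 'save' commands in flight at a time across the
# thirty subdirectories. Placeholder paths; add whatever server/pool
# options your site normally uses with 'save'.
import subprocess
from concurrent.futures import ThreadPoolExecutor

SUBDIRS = [f"/path/subdir{i}" for i in range(1, 31)]

def run_save(path):
    # -S: skip client file index entries, as in the manual test above
    return path, subprocess.run(["save", "-S", path]).returncode

with ThreadPoolExecutor(max_workers=4) as pool:
    for path, rc in pool.map(run_save, SUBDIRS):
        print(f"{path}: exit {rc}")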

George

--
George Sinclair
Voice: (301) 713-4921
- The preceding message is personal and does not reflect any official or unofficial position of the United States Department of Commerce -
- Any opinions expressed in this message are NOT those of the US Govt. -


--
This list is hosted as a public service at Temple University by Stan Horwitz
If you wish to sign off this list or adjust your subscription settings, please do so via http://listserv.temple.edu/archives/emc-dataprotection-l.html
If you have any questions regarding management of this list, please send email to owner-emc-dataprotection-l@listserv.temple.edu
Preston de Guise
Re: inode effect on tape drive speed?
October 04, 2019 02:59AM
Hi George,

Sent from my iPad

> On 4 Oct 2019, at 10:20, George Sinclair - NOAA Federal <000001bd925b8f5e-dmarc-request@listserv.temple.edu> wrote:
>
> A few questions on inode impact to tape drive write speed.
>
> 1. Why does a file system with a gazillion inodes (many, many tiny files) often result in super slow write speeds to the tape drive?
>
> I see speeds down in the 200 KB/sec range, whereas almost all file systems with a more manageable number of inodes write at significantly higher speeds (60-100+ MB/sec), even if it's just a single save set.
>
> I can see that it would take much longer to walk such a file system (we're assuming no block level backup in this case), but if it's running a full, wherein it doesn't need to check to see if the file needs to be backed up, then why would the speed slow down to such a crawl whereas other backups (not a bunch of tiny files) keep the drive streaming just fine, even with a single save set?

This is a problem for any backup product performing a filesystem walk. Some filesystems handle the "dense filesystem problem" better than others, but a single-threaded walk of the filesystem will still take a long time. (For example, creating a non-compressing tar or zip of a dense directory will equally take a while. Create 15,000 files in a single directory and you'll notice even a "dir" or "ls" takes a while to respond.)
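
To see that per-file overhead in isolation (nothing NetWorker-specific; just a rough Python sketch with made-up file counts and sizes):

# Create N tiny files, then time a single-threaded walk that opens and
# reads each one -- roughly what a full backup has to do per inode.
import os, tempfile, time

N, SIZE = 20_000, 1024  # made-up numbers: 20k files of 1 KB each
tmp = tempfile.mkdtemp(prefix="dense_fs_demo_")
payload = b"x" * SIZE
for i in range(N):
    with open(os.path.join(tmp, f"f{i:06d}"), "wb") as f:
        f.write(payload)

start, total = time.time(), 0
for root, _dirs, files in os.walk(tmp):
    for name in files:
        with open(os.path.join(root, name), "rb") as f:
            total += len(f.read())
elapsed = time.time() - start

# Effective throughput looks terrible even on fast disk, because every
# file costs an open/stat/close regardless of how small it is.
print(f"{N} files, {total / 1e6:.1f} MB in {elapsed:.2f}s "
      f"= {total / elapsed / 1e6:.2f} MB/s effective")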

For most products, the way around this is to use parallel reads (e.g., your previous question about PSS), or to bypass the filesystem entirely via block based backup. The latter has considerable speed advantages, but managing block based backups often works differently from managing regular filesystem backups.
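
As a purely conceptual illustration of the parallel-read idea (this is not how PSS is implemented internally), splitting the walk across the top-level subdirectories looks something like this:

# Walk several top-level subdirectories at once instead of one long
# single-threaded pass. Files sitting directly under 'top' are ignored
# here -- the same residual-data caveat as the per-subdirectory save
# sets above.
import os
from concurrent.futures import ThreadPoolExecutor

def walk_bytes(top):
    """Open and read every file under 'top'; return the byte count."""
    total = 0
    for root, _dirs, files in os.walk(top):
        for name in files:
            with open(os.path.join(root, name), "rb") as f:
                total += len(f.read())
    return total

def parallel_walk(top, workers=4):
    subdirs = [e.path for e in os.scandir(top) if e.is_dir()]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(walk_bytes, subdirs))

# e.g. parallel_walk("/path", workers=4)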


> Indexing is enabled for the pool. I tried to run a full backup of /path (single save set), and after 8 days, it had only completed 86 GB. I stopped it. I then changed the save set configuration on the resource to something like:
> /path/subdir1
> /path/subdir2
> ...
> /path/subdir30
>
> with a client parallelism of 8, but a group parallelism of 4. This breakdown accounts for the bulk of the data, but, yes, another resource would be required to capture the residual data and would need a directive to null the above entries. Anyway, I reran the backup, and this time, it completed all thirty save sets (240 GB) in 5 days. This ran significantly faster, overtaking the first run's progress after just two days. However, the write speeds were still miserably slow.
>
> Next, I ran a test from the client side (level=manual) by launching a save command, one at a time, against four of the paths, but I specified '-S' to disable saving the index entries. This data does not require browsable recovery. The speed started out at 246 KB/sec for the first one. After the second one was started, it jumped to 900+ KB/sec, by the third it was up to 1600+ KB/sec and after the fourth one was started it averaged around 2200+ KB/sec. All four save sets were writing to a single drive. I canceled the backups after about an hour. The speed is still very bad, but way better than before. At this rate, it would be possible to complete the backups in maybe two days versus five.

Yes, indexing on or off will make a difference to performance in that scenario, but in reality the key performance limitation is reading from the filesystem.

> 2. Does turning off indexing usually create an appreciably faster speed, particularly with save sets that have millions of tiny files? Or is this just coincidence?

There's less communication going on. When you're doing an indexed backup, the client obviously has to send the metadata for each file to the server as well for index storage (or hold it temporarily - I can't remember which way it's done; it's been a long time since I've looked at that low a level).
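
Purely illustrative arithmetic, with both numbers below assumed rather than measured, but it shows how even a small per-file index cost adds up on a dense filesystem:

# Assumed values, not measurements: the point is the per-file cost, not
# the exact figures.
files = 5_000_000          # assumed inode count for the dense filesystem
extra_ms_per_file = 1.0    # assumed extra index handling per file

extra_hours = files * extra_ms_per_file / 1000 / 3600
print(f"~{extra_hours:.1f} extra hours from {extra_ms_per_file} ms "
      f"of index overhead per file")
# -> about 1.4 hours per millisecond of per-file overhead, before any
#    data is even written to tape.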

> 3. In terms of speed, in this case, is there any difference between listing four save sets, and running the backup from the server, versus running four separate saves from the client?

I'd generally advocate running from the server so you have better control in NetWorker of the resource allocations, etc.

> 4. Is this a situation wherein the 'parallel save streams per save set' option would help?

Yes, PSS will help in this situation, even when writing to tape. (BBB can't go to tape; it has to go to an AFTD or DD, though it can be cloned to tape if required.)

> I'm curious if it would be better to set that option and list just the single save set (/path) versus keeping that option disabled and scripting the saves from the client instead wherein I always have four running at a time (/path/subdir1 - /path/subdir30).

I'd genuinely recommend using PSS for this. It'll avoid issues where someone adds a directory and doesn't let you know, etc., and it lets NetWorker balance the performance.

Cheers,
Preston.

