A few questions on the impact of inode count on tape drive write speed.
1. Why does a file system with a gazillion inodes (many, many tiny
files) often result in super slow write speeds to the tape drive?
I see speeds down in the 200 KB/sec range, whereas almost all file systems,
with a more manageable number of inodes, write at significantly higher
speeds (60-100+ MB/sec), even if it's just a single save set.
I can see that it would take much longer to walk such a file system
(we're assuming no block-level backup in this case), but if it's running
a full, where it doesn't need to check whether each file needs to be
backed up, why does the speed slow to such a crawl while other backups
(not a bunch of tiny files) keep the drive streaming just fine, even
with a single save set?
Indexing is enabled for the pool. I tried to run a full backup of /path
(single save set), and after 8 days, it had only completed 86 GB. I
stopped it. I then changed the save set configuration on the resource to
something like:
/path/subdir1
/path/subdir2
....
/path/subdir30
with a client parallelism of 8, but a group parallelism of 4. This
breakdown accounts for the bulk of the data, but, yes, another resource
would be required to capture the residual data and would need a
directive to null the above entries. Anyway, I reran the backup, and
this time, it completed all thirty save sets (240 GB) in 5 days. This
ran significantly faster, passing the first scenario in just two days.
However, the write speeds were still miserably slow.
Next, I ran a test from the client side (level=manual) by launching a
save command, one at a time, against four of the paths, but I specified
'-S' to disable saving the index entries. This data does not require
browsable recovery. The speed started out at 246 KB/sec for the first
one. After the second one was started, it jumped to 900+ KB/sec, by the
third it was up to 1600+ KB/sec and after the fourth one was started it
averaged around 2200+ KB/sec. All four save sets were writing to a
single drive. I canceled the backups after about an hour. The speed is
still very bad, but way better than before. At this rate, it would be
possible to complete the backups in maybe two days versus five.
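As a sanity check on that estimate, the arithmetic works out, assuming roughly 240 GB total and the ~2200 KB/sec aggregate seen with four streams:

```shell
# Back-of-the-envelope: days to move ~240 GB at ~2200 KB/sec aggregate.
awk 'BEGIN {
    gb = 240; rate_kb = 2200
    secs = gb * 1024 * 1024 / rate_kb   # total KB divided by KB/sec
    printf "%.1f days\n", secs / 86400  # prints "1.3 days"
}'
```

So "maybe two days" is plausible once per-file overhead, tape mounts, and positioning are added back in.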
2. Does turning off indexing usually yield an appreciable increase in speed,
particularly with save sets that have millions of tiny files? Or is this
just coincidence?
3. In terms of speed, in this case, is there any difference between
listing four save sets and running the backup from the server, versus
running four separate saves from the client?
4. Is this a situation wherein the 'parallel save streams per save set'
option would help?
I'm curious whether it would be better to set that option and list just
the single save set (/path), versus keeping that option disabled and
scripting the saves from the client so that I always have four running
at a time (/path/subdir1 through /path/subdir30).
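For the scripted alternative, the "four at a time" pacing could be sketched roughly like this. This is a hypothetical sketch, not tested against NetWorker: run_save is a stub standing in for the real client-side command (something like 'save -S "$dir"' plus whatever server/pool flags your environment needs), and it requires bash 4.3+ for 'wait -n'.

```shell
#!/bin/bash
# Keep four client-side saves in flight at once across the 30 subdirs.
LOG=$(mktemp)

run_save() {
    # Stand-in for the real command, e.g.:  save -S "$1"
    echo "saved $1" >> "$LOG"
}

MAXJOBS=4
for n in $(seq 1 30); do
    # If four jobs are already running, block until one of them finishes.
    while [ "$(jobs -rp | wc -l)" -ge "$MAXJOBS" ]; do
        wait -n
    done
    run_save "/path/subdir$n" &   # launch the next save in the background
done
wait    # drain the last jobs
```

Where a suitable xargs is available, 'xargs -P 4' would give the same pacing with less script.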
George
--
George Sinclair
Voice: (301) 713-4921
- The preceding message is personal and does not reflect any official or unofficial position of the United States Department of Commerce -
- Any opinions expressed in this message are NOT those of the US Govt. -
--
This list is hosted as a public service at Temple University by Stan Horwitz
If you wish to sign off this list or adjust your subscription settings, please do so via http://listserv.temple.edu/archives/emc-dataprotection-l.html
If you have any questions regarding management of this list, please send email to owner-emc-dataprotection-l@listserv.temple.edu