Hi all,
Thinking I could actually use LTFS (HP's LTFS 1.2.0) for production, I began rsync'ing 1.2TB of data onto an LTFS-formatted Maxell LTO5 tape in this HP SAS LTO5 drive:
Vendor: HP Model: Ultrium 5-SCSI Rev: Z39D
Type: Sequential-Access ANSI SCSI revision: 06
I don't know how many gigs had been written to the tape, but eventually this showed up in messages:
messages-20110731:Jul 30 03:41:42 mobymc kernel: INFO: task ltfs:18133 blocked for more than 120 seconds.
messages-20110731:Jul 30 03:41:42 mobymc kernel: ltfs D ffff88021fe70e00 0 18133 1 0x00000000
messages-20110731:Jul 30 03:43:42 mobymc kernel: INFO: task ltfs:18133 blocked for more than 120 seconds.
messages-20110731:Jul 30 03:43:42 mobymc kernel: ltfs D ffff88021fe70e00 0 18133 1 0x00000000
messages-20110731:Jul 30 03:45:42 mobymc kernel: INFO: task ltfs:18133 blocked for more than 120 seconds.
messages-20110731:Jul 30 03:45:42 mobymc kernel: ltfs D ffff88021fe70e00 0 18133 1 0x00000000
messages-20110731:Jul 30 03:47:42 mobymc kernel: INFO: task ltfs:18133 blocked for more than 120 seconds.
messages-20110731:Jul 30 03:47:42 mobymc kernel: ltfs D ffff88021fe70e00 0 18133 1 0x00000000
There was nothing wrong with the FC-connected EonStor that the data was being read from, but the rsync process was stuck. I had run rsync with --progress, so I know it really was stuck: I left it for more than an hour and not a single additional byte was transferred. I couldn't umount the tape (without -l), nor run lsof (hang), nor run "ps aux | grep something" (hang); ltfs had seriously fsck'd the system. I rebooted and planned to use good old tar to make my backup instead.
I started the tar with:
mt -f /dev/nst0 rewind
mt -f /dev/nst0 compression 1
tar -b 1024 -cvWf /dev/nst0 directory
but then tar gave an I/O error after transferring what I think was several gigs, and dmesg showed:
st0: Block limits 1 - 16777215 bytes.
I then recalled that ltfs makes two partitions on the tape, one for "metadata" and one for data. Rather than kill the next few hours with an "mt erase", I ran:
mt -f /dev/nst0 rewind
dd if=/dev/zero of=/dev/nst0 bs=4k count=6291456
and wrote 24GB worth of zeros to the tape. I was then able to write the first dataset of 1.2TB and, after an "mt fsf 1" following the first tar, a second dataset of ~200GB. I also ran a script that dumps the output of "tar -b 1024 -tvf /dev/nst0" for each tar position on the tape. Another script of mine uses those tar -tvf dumps to tell me how much I have stored on the tape, in case I need to use the tape again later. Everything was going great.
Then I came in on Monday, ran an mt -f /dev/nst0 rewind, tried to eject the tape with "mt -f /dev/nst0 eject" and got this beautiful error:
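For anyone wanting to do the same kind of bookkeeping, here is a minimal sketch of the totalling step. It is not the actual script mentioned above. It assumes GNU tar's verbose listing layout, where the third whitespace-separated field is the size in bytes, and the function name listing_total_bytes is made up for illustration.

```shell
#!/bin/sh
# Sum the size column of a saved "tar -tvf" listing and print the total in bytes.
# Assumption: GNU tar verbose layout, e.g.
#   -rw-r--r-- user/group 1048576 2011-07-29 18:02 path/to/file
# so field 3 is the size. Directories and symlinks are listed with size 0.
# Device nodes ("major,minor" in field 3) and owner names containing spaces
# would be miscounted; a real script should filter those out.
listing_total_bytes() {
    # $1: path to a file holding the saved listing
    awk '{ total += $3 } END { printf "%.0f\n", total }' "$1"
}
```

It would be called as listing_total_bytes followed by the path of one saved dump (one dump per tar position), and the per-position totals can then be added up for the whole tape.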
messages:Aug 1 12:30:58 mobymc kernel: st0: Add. Sense: Medium removal prevented
I could still rewind the tape and fsf the tape, but it wouldn't eject. I tried pushing the eject button on the drive several times, but it sounded like the drive was trying to push the tape out, couldn't, and so was re-seating the tape back onto the motor. I finally held down the eject button for a few seconds, lifted the plastic door cover, and slightly nudged the tape into the drive, and it came out. To make sure the drive had just had a random hiccup, I put the tape back in and was again able to run mt eject without problems. The tape drive and tapes are practically new and have had almost no real use.
Now I wanted to make sure the tape still had the tar'd data, but to my surprise the tar archives on the tape were gone! Every attempt at tar -b 1024 -tvf returned an error saying the data didn't look like a tar archive. I had used the W flag with tar, so tar itself had verified the data, and I had then read through all 1.4TB of tar data on the tape to generate the full tvf listings the night before, but now I couldn't retrieve anything from the tape!
Thinking that I'd have to redo all the tars, I opted for a full mt -f /dev/nst0 erase. Again to my surprise, rather than taking several hours, the full erase finished in about 15 minutes. I then tried tar'ing the 1.2TB data set onto the tape, but it again stopped prematurely with an I/O error (tape full) after only a few gigs had been written, as if LTFS's initial metadata partition was still on the tape. This time, however, the st block limits message didn't show up in dmesg. Showing no mercy to the tape, I ran this:
mt -f /dev/nst0 erase
dd if=/dev/zero of=/dev/nst0 bs=524288
524288 is the block size mentioned in this LTFS user guide (hence the -b 1024 blocking factor used with tar, since tar counts in 512-byte records): http://bizsupport2.austin.hp.com/bc/docs/support/SupportManual/c02262008/c02262008.pdf . dd reported having written 3.4TB of data at 285MB/s, so I was convinced that the tape was "clean". I tried my tar of the 1.2TB data set again, but still got the same tape full / tar I/O error after only a few GB were written.
I had another LTO5 tape untouched by LTFS, so I started tar'ing the data to that tape a few hours ago and everything is going well so far. Has anyone had similar experiences with LTFS? Is there any way to rescue the misbehaving LTO5 tape? How did my tar archives get corrupted?
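As a sanity check on the numbers in this post, the arithmetic can be sketched in plain shell. The variable names are made up for illustration, and dd's figures are taken as decimal units (1 TB = 1000000 MB), which is an assumption. Worth noting: an LTO5 tape holds 1.5TB native, so 3.4TB of zeros would only fit if the drive's hardware compression was squeezing them. That is an inference from the figures, not something the logs confirm.

```shell
#!/bin/sh
# tar's -b blocking factor counts 512-byte records.
tar_block_bytes=$((1024 * 512))
echo "tar -b 1024 block size: $tar_block_bytes bytes"

# The earlier zero-fill: bs=4k is 4 KiB per block, times count=6291456.
# Worked in KiB to keep the intermediate values small.
zero_fill_gib=$((4 * 6291456 / 1024 / 1024))
echo "dd bs=4k count=6291456: $zero_fill_gib GiB"

# Rough duration of the second dd: 3.4 TB = 3400000 MB at 285 MB/s
# (decimal units assumed, integer division).
dd_seconds=$((3400000 / 285))
echo "3.4 TB at 285 MB/s: about $((dd_seconds / 3600)) h $((dd_seconds % 3600 / 60)) min"
```

So the 524288-byte dd block size and tar -b 1024 do describe the same block size, and the zero-fill ran for a bit over three hours.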
Thanks,
Sabuj Pattanayek
