Tape/media errors/HP LTO-3
Post Tape/media errors/HP LTO-3 
Hi,

We're running the latest firmware for HP LTO-3 tape drives (M6BS), for
the 4 Gb/s FC drives.

As noted in the previous conversation, we are seeing errors such as:
1323762270 1 386 16 media-server 0 0 0 *NULL* bptm error unloading media, TpErrno = Robot operation failed
1322549252 1 388 16 media-server 1136618 1136513 0 client-hostname bptm FREEZING media id XAC228, External event caused rewind during write, all data on media is lost
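
For anyone triaging lines like the two above, here is a minimal Python sketch that splits such a line into fields and converts the leading number, which looks like a Unix epoch timestamp, into a readable UTC time. The column names are an assumption about the layout (nothing in this thread documents them), and `parse_line` is a hypothetical helper, not part of NetBackup.

```python
from datetime import datetime, timezone

# Assumed labels for the ten leading columns; the thread does not document
# them, so treat these names as guesses.
FIELDS = ["time", "version", "type", "severity", "server",
          "job_id", "job_group_id", "unused", "client", "who"]

def parse_line(line):
    """Split one log line into the assumed fields plus the free-form text."""
    parts = line.split(None, len(FIELDS))
    if len(parts) <= len(FIELDS):
        raise ValueError("line has too few fields: %r" % line)
    record = dict(zip(FIELDS, parts))
    record["text"] = parts[len(FIELDS)]
    # Interpret the first column as seconds since the Unix epoch (assumption).
    record["when"] = datetime.fromtimestamp(int(record["time"]), tz=timezone.utc)
    return record

sample = ("1322549252 1 388 16 media-server 1136618 1136513 0 client-hostname "
          "bptm FREEZING media id XAC228, External event caused rewind during "
          "write, all data on media is lost")
rec = parse_line(sample)
print(rec["when"].isoformat(), rec["who"], rec["text"])
```

Having the timestamps in readable form makes it easier to line these events up against drive or robot service visits.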

When these errors occur in your environments (on multiple tapes), do
you get the drives replaced in advance or wait for them to fail
completely? In the past I had been getting them replaced regularly,
but it's getting problematic: they used to be servicing components
multiple times per week.

Justin.
_______________________________________________
Veritas-bu maillist - Veritas-bu < at > mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu

Post Tape/media errors/HP LTO-3 
Justin,

Are you sharing the tape drive between systems?
Is there any chance one system could have the tape drive in use and another system could issue a rewind command to the tape drive?

Are you using SSO?

len

-----Original Message-----
From: veritas-bu-bounces < at > mailman.eng.auburn.edu [mailto:veritas-bu-bounces < at > mailman.eng.auburn.edu] On Behalf Of Justin Piszcz
Sent: Wednesday, December 14, 2011 7:07 AM
To: veritas-bu < at > mailman.eng.auburn.edu
Subject: [Veritas-bu] Tape/media errors/HP LTO-3


Post Tape/media errors/HP LTO-3 
Hi,

That looks like a robotic arm problem rather than the tape drive or tapes.

I'd be checking the robotics firmware (there's a command for it, or the
library panel normally shows it as well) and requesting an engineer onsite
to health-check the robotic arm.
But it's often one of the components associated with the gripper (robotics)
that's out of alignment and needs realigning or replacing.

Robyn

--
Robyn Hirano
Rodd Consulting Pty Ltd
M: +61 412 352 725
E: robyn.hirano < at > roddconsulting.com.au



Post Tape/media errors/HP LTO-3 
Hi,

Thanks for the reply. We are at the current revision for the robot that
they recommend. We have replaced arms in the past but cannot confirm
or deny whether that has fixed any of the problems. Normally (again,
normally...) when there are robot arm issues there are reach/put errors
etc.; we have not seen them in this case.

Justin.

Post  
The External Event Caused Rewind error is not what it appears to be at first blush, and it's a particularly bad error. A couple of years ago we noticed we had disappearing backups: data was lost and it wasn't readily apparent why. I worked on this for almost a year in total because it can get very bad in some instances. A SCSI error generally returns a normal write error or hardware error OpCode, but sometimes either the event is so bad that there is a natural reason, or there is a protocol error that triggers an unnatural event, and a full rewind OpCode is sent instead. This rewinds the tape and kicks it out. The big deal here is that the next time that cartridge is loaded by a drive, the images on it are overwritten. So, if that was the last backup being written to that cartridge, you could have lost another terabyte of other data at the same time it occurred. Not to mention that any images that spanned multiple tapes are now defunct, because the parts of them on that cartridge are gone the instant the rewind occurs.
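
To make the blast radius concrete, here is a small Python sketch. The catalog is entirely made-up illustration data (the image names and the second media ID XAC229 are hypothetical), and `images_lost` is not a NetBackup function. It only shows the reasoning above: every image with any fragment on the rewound cartridge has to be treated as lost, including images that continue onto other tapes.

```python
# Hypothetical catalog: backup image name -> media IDs holding its fragments.
catalog = {
    "img_monday": ["XAC228"],
    "img_tuesday": ["XAC228", "XAC229"],  # spans two tapes
    "img_wednesday": ["XAC229"],
}

def images_lost(catalog, rewound_media):
    """Return sorted names of images with at least one fragment on the rewound tape."""
    return sorted(name for name, media in catalog.items()
                  if rewound_media in media)

print(images_lost(catalog, "XAC228"))
```

Note that `img_tuesday` is reported even though part of it sits on a healthy tape, which is the spanning case described above.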

A rewind event can be a protocol error. In our case the #1 cause was a problem in certain SAN card firmwares that triggered a protocol problem; I won't mention a vendor because it's been fixed. Another cause can be a drive whose main board is going bad and, rather than throwing write errors, it's throwing protocol errors instead. A lot of things can cause this problem, but it always occurs between the interface and the drive (including the drivers on the host, which can be a part of the problem). In some cases, the only way to track down the real cause is with a sniffer, if it's on a fabric. A good diagnostic step is to have the drive vendor examine the drive's log buffer after a rewind and before the drive's error buffer is overwritten.

But, yes, if there's a chance it could be the drive, then replace it right away. The error is causing you data loss during your backup period, so it's a no-brainer.

K-
--Tape is dead. Long live the tape.

Post Tape/media errors/HP LTO-3 
Hi,

You don't *have* to have reach/put errors for it to be a robot arm issue, there are more points of failure in a robotic arm than this. Reach/put errors are just the obvious alignment ones.
(When I was level 2, if I had an error with robot in it, it was pretty easy to ask a hw engineer to do an onsite just to health check it - even if it had been checked recently.)


I also agree with Kevin's comment that you have a data loss situation if the frozen tape is put back into circulation, so it's not a situation to treat lightly. I'd work all angles.


Given you've already:
  • confirmed firmware is up-to-date
  • had the robotic arm replaced



I'd then:
  • Make sure that tape is pulled out of circulation, so that no one accidentally unfreezes it
  • Start collecting iostat stats
  • Get level 3 to do a diagnostic dump and analyse it before replacing the tape drive, because once you pull the drive you lose that history, and not all errors are reported to syslog/bptm (definitely do this if you've already been replacing LTO-3 drives)
  • Check what was replaced: was it the whole robotic component or a portion, how many tape drives have been replaced, and what were the serial IDs?
  • Get someone to check robotic arm operation, in case the wrong component was replaced



If this comes up blank, as a level 2, I'd be escalating to level 3 so that they are across the decision to replace and can fully investigate the pending and previous replacements.


Again, hope this helps. I'm mainly just pulling from my collective memory of lots of tape support cases. I've even seen replaced tape drives being diverted to support for stress testing when there was a silent error, but that was only the once; normally it was possible to get a reason if you dug.


Robyn


Post  
After reading all that, I am still confused regarding media errors in HP LTO-3... anyhow, thanks for sharing the knowledge.

HP LTO 3
