TSM 8.1.1 on Linux crash

Posted by Remco Post 
Remco Post
TSM 8.1.1 on Linux crash
January 09, 2018 09:59AM
Hi All,

over here we have a few TSM servers on Linux (RHEL 7.4) running TSM 8.1.1.021 (to be upgraded to .100 soon), and we see something new that we never saw on AIX. If for some reason a tape gets left in a drive (IBM 3592), the only way to get the drive working again is to reboot the drive. Until then TSM is unable to open the drive (error 16). Unfortunately, we seem to be able to reliably cause severe issues in TSM by rebooting a tape drive:

kernel: NMI watchdog: BUG: soft lockup - CPU#6 stuck for 22s! [dsmserv:6021]

The only way out is to reboot the entire server.
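
For anyone who wants to watch the system log for this condition, here is a minimal Python sketch that picks the CPU number, stall time, process name and PID out of a line like the one above. The names `SoftLockup` and `parse_soft_lockup` are made up for illustration, and the pattern is only modelled on the single message quoted here, so other kernels may format it differently.

```python
import re
from typing import NamedTuple, Optional


class SoftLockup(NamedTuple):
    cpu: int       # CPU number reported by the watchdog
    seconds: int   # how long the CPU was reported stuck
    process: str   # name of the task that was running
    pid: int       # PID of that task


# Pattern modelled on the one message quoted in this thread (an assumption).
_PATTERN = re.compile(
    r"soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<proc>[^:\]]+):(?P<pid>\d+)\]"
)


def parse_soft_lockup(line: str) -> Optional[SoftLockup]:
    """Return the parsed fields, or None if the line is not a soft lockup report."""
    m = _PATTERN.search(line)
    if m is None:
        return None
    return SoftLockup(int(m["cpu"]), int(m["secs"]), m["proc"], int(m["pid"]))


line = "kernel: NMI watchdog: BUG: soft lockup - CPU#6 stuck for 22s! [dsmserv:6021]"
print(parse_soft_lockup(line))
```

Feeding it lines from the system log and alerting when the process is `dsmserv` would at least tell you which instance is affected before the box has to be rebooted.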

We never had such issues with TSM on AIX… Is this something Linux-specific, that we can hang TSM in non-interruptible routines (kernel space) simply by rebooting a tape drive?

--

Met vriendelijke groeten/Kind Regards,

Remco Post
r.post@plcs.nl
+31 6 248 21 622
This message was imported via the External PhorumMail Module
Martin Janosik
Re: TSM 8.1.1 on Linux crash
January 09, 2018 09:59AM
Hello there,

errno 16 usually corresponds to 'device busy'. I have observed this error
mostly on systems with a shared library (shared=yes) and with storage agents
that had problems communicating with the TSM server.
Are you facing the issue only on a single tape drive, or on all of them? Are they
by any chance also zoned to other systems, i.e. a 2nd cluster node?
Can you send a few lines from the actlog related to the drive with the error, from
before the failure occurred?
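
As a quick cross-check of the errno mapping, here is a minimal Python sketch using only the standard `errno` and `os` modules. The helper name `describe_errno` is made up for illustration, and the message text returned by `os.strerror` depends on the platform. It says nothing about why the drive is busy, only what the number means.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'NAME: message' for a numeric errno such as the 16 reported here."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


# On Linux, 16 is EBUSY, which matches the 'device busy' reading above.
print(describe_errno(16))
```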

M. Janosik
Remco Post
Re: TSM 8.1.1 on Linux crash
January 09, 2018 03:59PM
> On 9 Jan 2018, at 18:14, Martin Janosik <martin.janosik@CZ.IBM.COM> wrote:
>
> Hello there,
>
> errno 16 usually corresponds to 'device busy'. I have observed this error
> mostly on systems with a shared library (shared=yes) and with storage agents
> that had problems communicating with the TSM server.
> Are you facing the issue only on a single tape drive, or on all of them? Are they
> by any chance also zoned to other systems, i.e. a 2nd cluster node?
> Can you send a few lines from the actlog related to the drive with the error, from
> before the failure occurred?
>

Hi Martin,

you are absolutely right, the drives are shared among multiple systems. The issue arises when the TSM instance using the drive crashes. To free up the drive, it then has to be rebooted. But apparently, rebooting a tape drive is not a safe operation, given the fall-out it sometimes has.

--

Met vriendelijke groeten/Kind Regards,

Remco Post
r.post@plcs.nl
+31 6 248 21 622
Harris, Steven
Re: TSM 8.1.1 on Linux crash
January 09, 2018 05:59PM
Remco

Can you please explain what the fall-out is?

I'm using TSM 7.1.0 on AIX and have issues with LTO6 drives emulating LTOs. Sometimes I just power cycle the drive and that clears the problem, but that does not always work.

Whilst I don't have any Linux servers or storage agents in that particular mix, I'd like to understand what you are seeing for future reference.

Thanks

Steve

Steven Harris
TSM Admin/Consultant

Canberra Australia



Remco Post
Re: TSM 8.1.1 on Linux crash
January 10, 2018 01:59AM
> On 10 Jan 2018, at 02:05, Harris, Steven <steven.harris@BTFINANCIALGROUP.COM> wrote:
>
> Remco
>
> Can you please explain what the fall-out is?

The fall-out is TSM locking up a CPU in an uninterruptible routine, with only one solution: reboot the entire Linux box.

--

Met vriendelijke groeten/Kind Regards,

Remco Post
r.post@plcs.nl
+31 6 248 21 622
Remco Post
Re: TSM 8.1.1 on Linux crash
January 12, 2018 08:59AM
Today we checked the lin_tape driver versions. Level 3.0.23 contains a number of fixes for issues introduced in our current level, 3.0.20… We also found another nice issue: we’re on Red Hat 7.4, while lin_tape doesn’t seem to support RHEL 7.4.
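
A small Python sketch of the level comparison, for anyone scripting this check across several servers. The function names `parse_level` and `needs_update` are made up for illustration, and how you obtain the installed level (for example from the package manager) is left out on purpose.

```python
def parse_level(level: str) -> tuple:
    """Turn a dotted level such as '3.0.20' into a tuple of ints."""
    return tuple(int(part) for part in level.split("."))


def needs_update(installed: str, fixed_in: str) -> bool:
    """True if the installed level is older than the level carrying the fixes."""
    return parse_level(installed) < parse_level(fixed_in)


print(needs_update("3.0.20", "3.0.23"))  # True
```

Comparing integer tuples rather than raw strings matters here: as strings, "3.0.9" would sort after "3.0.23".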


--

Met vriendelijke groeten/Kind Regards,

Remco Post
r.post@plcs.nl
+31 6 248 21 622
Zoltan Forray
Re: TSM 8.1.1 on Linux crash
January 12, 2018 10:59AM
>we’re at redhat 7.4, while lin_tape doesn’t seem to support RHEL 7.4

I wouldn't be too concerned about this. We have installed the "latest"
lin_tape drivers on our RHEL Linux systems waaay before the readme said it
supported the OS level we were installing it on. I wouldn't suspect a
world of difference between 7.3 and 7.4.....

--
*Zoltan Forray*
Spectrum Protect (p.k.a. TSM) Software & Hardware Administrator
Xymon Monitor Administrator
VMware Administrator
Virginia Commonwealth University
UCC/Office of Technology Services
www.ucc.vcu.edu
zforray@vcu.edu - 804-828-4807
Zoltan Forray
Re: TSM 8.1.1 on Linux crash
February 02, 2018 10:59AM
Remco,

I don't know if you noticed that IBM updated the lin_tape.fixlist file to
include RHEL 7.4. I got a notice about an update but didn't see anything
changing, version-wise. This was the file from November 2017:

Fixlist for Linux IBM Tape Device Driver (lin_tape)

(C) Copyright IBM Corporation 2007-2017

Level    Date         Description
---------------------------------------------------------------
3.0.23   11/03/2017   - Initial support for RHEL 6.9 (min. kernel version 2.6.32-696)
                      - Support for LTO8
                      - Fix for reservation conflict when reserve_6 used
                      - Fix for failover after device rediscovery (introduced at 3.0.20)
                      - Fix for device name after removal (introduced at 3.0.20)
                      - Fix for read with resid through join interface

This is the file, today.

Fixlist for Linux IBM Tape Device Driver (lin_tape)

(C) Copyright IBM Corporation 2007-2017

Level    Date         Description
---------------------------------------------------------------
3.0.23   11/03/2017   - Initial support for RHEL 7.4 (min. kernel version 3.10.0-693)
                      - Initial support for RHEL 6.9 (min. kernel version 2.6.32-696)
                      - Support for LTO8
                      - Fix for reservation conflict when reserve_6 used
                      - Fix for failover after device rediscovery (introduced at 3.0.20)
                      - Fix for device name after removal (introduced at 3.0.20)
                      - Fix for read with resid through join interface
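
If you want to check a server against the minimum kernel version listed for RHEL 7.4, a minimal Python sketch could look like the one below. The function names `kernel_tuple` and `meets_minimum` are made up for illustration, and the sample release strings are just examples in the usual RHEL format. The check only compares numbers; it does not tell you whether a given driver build was actually tested on that kernel.

```python
import re


def kernel_tuple(release: str) -> tuple:
    """Extract (major, minor, patch, build) from e.g. '3.10.0-693.el7.x86_64'."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(g) for g in m.groups())


def meets_minimum(release: str, minimum: str = "3.10.0-693") -> bool:
    """True if the release is at or above the minimum from the fixlist."""
    return kernel_tuple(release) >= kernel_tuple(minimum)


print(meets_minimum("3.10.0-514.el7.x86_64"))       # False
print(meets_minimum("3.10.0-693.11.6.el7.x86_64"))  # True
```

On a live system the release string would typically come from `platform.release()` or `uname -r`.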


and they thought we wouldn't notice
Remco Post
Re: TSM 8.1.1 on Linux crash
February 03, 2018 02:59PM
Hi,

how nice of them. Now if only they would fix the bug that crashes the kernel if you remove/reboot/access a tape drive at the wrong time…
