Load-balanced tape drives?

I keep running into a particular problem at customers and I'm curious if any backup software products have addressed it.  Do any backup products load-balance their use of tape drives across multiple Fibre Channel ports?

The problem starts with having more tape drives behind each Fibre Channel port than the port can actually support.  This often happens when you're sharing a bunch of tape drives between several servers.  For example, say you're sharing 20 tape drives between 10 servers, and 10 are available via one FC port, and 10 are available via another FC port. 

If each server gets two tape drives, everything's fine.  Even if a given server grabs two tape drives on one FC port, that port can usually handle streaming two tape drives. 

BUT, what if a server wants four tape drives, and is able to grab them?  What happens if it grabs all four tape drives from the same port?  Now the four tape drives are sharing a single FC port that can only handle two tape drives.

But if the backup software were SMART, it would attempt (if it could) to grab two tape drives from each of the two FC ports.  But I don't see any backup software products doing this.

Do you? 
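
To make the idea concrete, here is a minimal sketch of the difference between grabbing the first drives seen and spreading the picks across ports. It is not taken from any backup product; the function names (`first_fit`, `balanced`, `per_port`), the drive names, and the port labels "A" and "B" are all hypothetical, and it assumes the software knows which port each drive sits behind.

```python
from collections import defaultdict
from itertools import zip_longest

def first_fit(free_drives, wanted):
    """Take the first `wanted` drives in discovery order."""
    return free_drives[:wanted]

def balanced(free_drives, wanted):
    """Interleave the free drives by port, then take the first `wanted`."""
    by_port = defaultdict(list)
    for drive, port in free_drives:
        by_port[port].append((drive, port))
    # zip_longest walks the ports in lockstep: one drive from each port per round
    interleaved = [
        d for group in zip_longest(*by_port.values()) for d in group if d is not None
    ]
    return interleaved[:wanted]

def per_port(picked):
    """Count how many of the picked drives sit behind each port."""
    counts = defaultdict(int)
    for _, port in picked:
        counts[port] += 1
    return dict(counts)

# Hypothetical layout from the post: 20 free drives, 10 behind port A, 10 behind port B
free = [(f"drive{i}", "A" if i < 10 else "B") for i in range(20)]

print(per_port(first_fit(free, 4)))  # {'A': 4}
print(per_port(balanced(free, 4)))   # {'A': 2, 'B': 2}
```

If one port has fewer free drives than the other, the interleaving simply runs out of drives on that port and takes the rest from whichever port still has some, which matches the "if it could" caveat above.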

Written by W. Curtis Preston (@wcpreston), four-time O'Reilly author, and host of The Backup Wrap-up podcast. I am now the Technology Evangelist at Sullivan Strickler, which helps companies manage their legacy data.

13 comments
  • On the software side, I know the IBM Atape driver can handle multipathing… not sure if it can round robin or how it handles things. It’s a start at least…

    Might want to look into actual appliances to do this type of thing. I think you could set something like this up via StorageTek SN6000 (which I think is EOL, and highly unreliable IME). You would present a set of virtual drives to the system via ACSLS… say, 2 per FC port… and the SN6000 would send the data to any physical drive that was available behind it. There may be something from Falconstor that would be appropriate as well.

    Neither of these exactly fits the bill for what you want, but with some ingenuity you can combine technologies to achieve this goal, I believe.

  • haha, agreed. I had a party when I got rid of my SN6000 3 years ago, in favor of EMC CDL.

    The best part was whenever I would put in a support call, and none of the Sun/STK guys had much of a clue as to how the thing worked, it was just voodoo magic. The solution to every problem was rebooting the SN6000 and ACSLS in the right order.

  • I don’t know of anything like that, but on NBU you can limit the maximum number of drives a server uses, so that way you could cap how many drives it uses at once.

  • I can’t count the number of times we have asked Symantec to make NetBackup HBA-aware and load balance across HBAs. To date there is nothing you can do but limit the number of drives the server has access to so as to not overload an HBA.

    The problem with limiting the maximum number of drives is that you end up having to set it to the maximum number you would allow on any one HBA.

  • Maybe I am strange, but I hate backup. Traditional Backup. As we all know, it is about restoration, not backup, and recovering anything from tape is brutal. We spend a majority of our time helping to design solutions that eliminate “backup” by building on the ability to restore files, systems, and sites. Obviously, files must be restored instantly, and systems in a boot cycle, and then sites in a time frame that makes fiscal sense. All this can be done, thereby relegating tape to archive, not primary restoration. The larger the amount of data, the more this approach makes sense.

    If I am not mistaken, the amount of data continues to grow at incredible rates making this type of approach even more imperative.

    Paul Clifford
    Davenport Group
    http://www.davenportgroup.com

  • First, I agree that it’s not about backup; it’s about restore. I disagree that backup = tape, or even that backup = what you would think of as backup software. I’d also say it’s about balancing restore speed (RTO) to system cost. Not everyone has the requirements to drive them to the types of systems you’re talking about. Traditional backup meets the requirement of 90+% of applications. Near-CDP & CDP are great, but why use the more expensive, more complex solutions if the customer has no requirement to do so? Finally, a traditional backup system can also be a framework for controlling all the different ways you are protecting your data.

    I hope you enjoyed the free plug for your company. Maybe I’ll go over to your user group and post a free plug for my site. 😉

  • Hi,

    Surely this is a capacity planning thing… If you have 10 tape drives that could all be used at the same time then you’d need to distribute them across multiple fibre channel ports to ensure you have enough bandwidth. For LTO3 drives, for example, at 78MB/sec native, a 2Gbit FC port can stream about two drives, so you’d need 5 2Gbit FC ports for 10 drives and 10 ports for all 20 drives. I’ve come across a similar situation recently and had to advise the customer to install more HBAs as they simply couldn’t handle the bandwidth.
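
The port arithmetic in this comment can be sketched as follows. The 78 MB/s drive rate comes from the comment; the 200 MB/s figure for a 2Gbit FC port is an assumption (the nominal one-direction throughput of 2GFC), and `ports_needed` is a hypothetical helper name. Compression or a different port speed would change the result.

```python
import math

def ports_needed(drives, drive_mb_s, port_mb_s):
    """Ports required if every drive must be able to stream at full rate.

    A drive cannot be split across ports, so first work out how many whole
    drives fit on one port, then divide the drive count by that.
    """
    drives_per_port = int(port_mb_s // drive_mb_s)
    if drives_per_port < 1:
        raise ValueError("a single drive already exceeds the port bandwidth")
    return math.ceil(drives / drives_per_port)

DRIVE_MB_S = 78   # LTO3 native rate quoted in the comment
PORT_MB_S = 200   # assumed usable throughput of one 2Gbit FC port

print(ports_needed(10, DRIVE_MB_S, PORT_MB_S))  # 5
print(ports_needed(20, DRIVE_MB_S, PORT_MB_S))  # 10
```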

  • As I said in the post, the problem comes when you’re sharing tape drives between servers, such as having 20 tape drives shared between 10 servers. All 10 servers have access to all 20 drives, but we’ll configure backups so that no server will ask for more than two to four drives at a time. Therefore, a given server doesn’t need enough bandwidth for 20 drives; he only needs enough bandwidth for 2-4 drives, so two FC connections should be enough. The problem is that with only two connections, we’re going to configure 10 of the 20 drives on one connection and 10 on the other. Then when that server’s backups kick off, we WANT him to grab two drives from one FC connection and two drives from the other FC connection. BUT, with most backup software, he’ll grab the first four drives he sees, which are probably on the first FC connection.

    Sure, we could solve the problem in a number of ways:

    1. We could make sure that each server has as many FC connections as we want its drive fanout to be, but that means a lot of HBAs that we really don’t need.

    2. We could only configure the drives on each server that we want it to use. But one of the reasons we’re sharing 20 drives with 10 servers is to allow each of them to grab as many drives as they need without having to specify which drives to grab. If server A is only allowed to see 4 of the 20 drives, what happens when some of the 4 it can see are not available? Bummer.

    I just think it wouldn’t be that hard for backup software to make sure that when it grabs the next tape drive, it grabs one on a different port.
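
As a rough illustration of that "next drive on a different port" rule, here is a sketch that picks one drive at a time. It assumes the software tracks which drives this server already holds and which shared drives are still free; `next_drive`, the drive names, and the port labels are hypothetical and not any vendor's API.

```python
def next_drive(free_drives, held):
    """Pick a free drive on the port where this server holds the fewest drives.

    free_drives: list of (drive, port) tuples currently available
    held:        list of (drive, port) tuples this server already grabbed
    Returns None when nothing is free.
    """
    if not free_drives:
        return None
    load = {}
    for _, port in held:
        load[port] = load.get(port, 0) + 1
    # min() keeps discovery order among ties, so behavior stays predictable
    return min(free_drives, key=lambda dp: load.get(dp[1], 0))

# Other servers already hold most of port B, so only one drive is free there
free = [("drive1", "A"), ("drive2", "A"), ("drive3", "A"), ("drive10", "B")]
held = []
for _ in range(4):
    pick = next_drive(free, held)
    free.remove(pick)
    held.append(pick)

print([drive for drive, _ in held])  # ['drive1', 'drive10', 'drive2', 'drive3']
```

The server alternates ports while it can, then falls back to whatever is left, so it still gets its four drives even when the other port is mostly busy.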

  • Interesting. Would setting up multipathing across the two HBAs (in a load-balance config) help with this?

    When done right, the tape drives are assigned under the ‘virtual’ controller. Cross-connect the two FC switches that have tape drives on them, and place half the drives on each switch.

    Yes, I know that’s a lot of redundant hardware (we use this in SAN disk environments for HA), but in theory it would overcome the vendor’s failure to load-balance by offloading the logic to the OS and FC switches.

    BTW – am I the only one who would like to see tape drives with the option of having two FC ports?

    –TSK

  • This is already in place in 2 backup applications that I know of right now, both from Novastor. One is the NovaNET product, which will shed load over multiple devices concurrently. The other is the Hiback iXT application, which will utilize multiple tape devices concurrently.

  • I have a question related to the optimal data transfer rate of a path itself.

    Is there a way to determine the optimal number of tape mounts from a standard Windows client transferring files of fixed sizes (20GB, 35GB, and 73GB) over Gb copper to a specific TSM/CDL Environment (4Gb trunked P595 TSM5.5 AIX5.4 LPAR connected to EMC CDL 710 over 4x 2Gb Paths over MCdata director presenting a scalar i2000 with 16x sdlt320 drives)?