Tech Field Day Post 1: Compellent and Nimble Storage

I spent two and a half days last week with a bunch of miscreants collected from around the globe (USA, Scotland, Australia, Nigeria, Holland, and — of all places — Ohio). We called it Seattle Tech Field Day, and it was organized by none other than my friend Stephen Foskett (and, of course, his right hand, Claire Chaplais). For two exhausting days we experienced death by PowerPoint, listened to several vendor pitches, and grilled said vendors about the strengths and weaknesses of their various approaches.

I was not paid to attend this event, but it did not cost me anything to attend, either. I had free meals and drinks on these guys, and I got a few tchotchkes, but I am under no obligation to blog about what I saw. So please consider any posts I write about this event to be about products I found genuinely interesting.

This is the first of a few Tech Field Day blog posts that I will write, and it will cover Compellent and Nimble Storage. Both companies offer block-based arrays, which means they are SAN offerings, not NAS offerings. (Compellent does offer two NAS options, using either Windows Storage Server and CIFS or Nexenta and ZFS, but in the end that’s just a NAS head in front of a SAN array.) Both systems support thin provisioning, redirect-on-write snapshots (more on that later), and replication. Nimble is an iSCSI-only array, while Compellent offers both iSCSI and Fibre Channel interfaces. Both companies believe they are offering less expensive ways to store your data while increasing performance.

Compellent’s main emphasis is on automated storage tiering. They support SSD, SATA, SAS, and Fibre Channel drives natively within their enclosure, and they see each of these as a tier of storage to which they can move 0.5 MB, 2 MB, or 4 MB blocks of data depending on how frequently those blocks are being used. The most-used blocks would be on SSD, the least-used blocks would be on SATA, and so on. Their belief is that by automatically putting the appropriate blocks on the appropriate tiers of storage, you can store much more data for much longer periods of time for much less money – without having to worry about what to put where.
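
To make the idea concrete, here is a toy sketch of the kind of decision a tiering engine makes. The tier names, access thresholds, and data structures below are hypothetical, for illustration only – not Compellent’s actual algorithm or code.

```python
# Toy sketch of automated storage tiering: a periodic sweep promotes or
# demotes pages based on recent access frequency. Tier names and thresholds
# are made up for illustration.

def choose_tier(accesses_per_day: int) -> str:
    """Pick a tier for a page based on how hot it is."""
    if accesses_per_day > 1000:
        return "ssd"
    if accesses_per_day > 100:
        return "fc"
    if accesses_per_day > 10:
        return "sas"
    return "sata"

def rebalance(pages: dict) -> list:
    """Return (page_id, from_tier, to_tier) moves for pages on the wrong tier."""
    moves = []
    for page_id, info in pages.items():
        target = choose_tier(info["accesses_per_day"])
        if target != info["tier"]:
            moves.append((page_id, info["tier"], target))
    return moves

# One hot page gets promoted to SSD, one cold page gets demoted to SATA.
pages = {
    "page-001": {"tier": "sata", "accesses_per_day": 5000},
    "page-002": {"tier": "ssd", "accesses_per_day": 2},
}
print(rebalance(pages))  # [('page-001', 'sata', 'ssd'), ('page-002', 'ssd', 'sata')]
```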

Nimble’s story is very different. They feel that the I/O requirements of all middle-enterprise customers can be met using SATA drives that are front-ended with an SSD cache. Where Compellent sees SSD as a tier on which to put the most recently used blocks, Nimble sees SSD as a place to store a copy of the most recently used blocks. Since they’re not using SSD as a tier, they can use less expensive SSD drives, and they do not have to put them in a RAID array and suffer the performance and capacity overhead associated with that. (If they lose an SSD drive, they have simply lost a cached copy of data and can reload it, as opposed to Compellent, which would need to restore any data stored on a failed SSD volume.) While their competitors might argue that this means they’re storing two copies – which costs money – they argue that their second copy is on SATA disk – the cheapest of all disks.
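
Here is a minimal sketch of that cache-versus-tier distinction. The class and method names are invented for illustration and are not Nimble’s actual design; the point is simply that the SSD holds disposable copies, while SATA always holds the authoritative data.

```python
# Toy model of SSD-as-cache: flash holds only copies of blocks whose
# authoritative home is SATA, so losing the SSD never loses data.

class CachedArray:
    def __init__(self):
        self.sata = {}       # authoritative copy of every block
        self.ssd_cache = {}  # disposable copies of recently used blocks

    def write(self, lba, data):
        self.sata[lba] = data        # data always lands on SATA
        self.ssd_cache[lba] = data   # cache a copy for fast re-reads

    def read(self, lba):
        if lba in self.ssd_cache:    # cache hit: served from flash
            return self.ssd_cache[lba]
        data = self.sata[lba]        # cache miss: read from SATA...
        self.ssd_cache[lba] = data   # ...and repopulate the cache
        return data

    def lose_ssd(self):
        # Simulates an SSD failure: only cached copies disappear;
        # nothing has to be restored or rebuilt.
        self.ssd_cache.clear()

array = CachedArray()
array.write(42, b"hello")
array.lose_ssd()
print(array.read(42))  # b'hello' -- still on SATA, cache simply refills
```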

The other thing that makes Nimble’s story different is that they support inline compression of all blocks before they’re even written to the flash cache. This increases performance as the blocks are written to cache, and even more when written to the SATA disks – and it saves capacity too.
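A small sketch of what an inline-compression write path looks like conceptually: the block is compressed once, up front, so both the flash cache and the SATA back end move fewer bytes. Here zlib stands in for whatever compression algorithm the array actually uses, and the data structures are purely illustrative.

```python
# Inline compression sketch: compress before the block touches cache or disk.

import zlib

def inline_write(lba, data, ssd_cache, sata):
    compressed = zlib.compress(data)   # compress once, up front
    ssd_cache[lba] = compressed        # cached copy is already compressed
    sata[lba] = compressed             # on-disk copy is compressed too
    return len(data), len(compressed)

def inline_read(lba, ssd_cache, sata):
    blob = ssd_cache.get(lba) or sata[lba]
    return zlib.decompress(blob)

ssd_cache, sata = {}, {}
raw = b"A" * 4096                      # highly compressible example block
logical, physical = inline_write(7, raw, ssd_cache, sata)
print(f"wrote {logical} logical bytes as {physical} physical bytes")
assert inline_read(7, ssd_cache, sata) == raw
```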

Now let’s move on to the important stuff – backups! Anyone who reads this blog knows that I’m a fan of storage systems that offer snapshots and replication. I believe that, with proper management and reporting software, they can meet the backup and recovery needs of many (if not most) of today’s businesses – without requiring them to resort to traditional backup software. (I also call the combination of snapshots and replication “near-CDP,” but not everyone likes it when I do that.)

In order for snapshot-based backup to work, I firmly believe that the array vendor must not use copy-on-write snapshot technology. Copy-on-write is fine for creating a snapshot that you’re going to back up using other methods and then expire, but you cannot use a copy-on-write snapshot system to create hundreds of snapshots and keep them for multiple months, which is a requirement if snapshot-based backup is to replace traditional backup software. If you were to do this with a copy-on-write array, you’d find your I/O performance significantly degraded over time. (One SE from a copy-on-write array vendor suggested that if a customer of mine kept 90 days of snapshots on their array, its performance would decrease by 50%.)
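
To see where that overhead comes from, here is a simplified, worst-case model of a copy-on-write write path: every retained snapshot that hasn’t yet captured a block forces an extra copy before the new data can land. Real implementations differ in how much of that copying they can share across snapshots, but the first-overwrite penalty is the common thread.

```python
# Simplified copy-on-write (COW) model: each snapshot that hasn't captured
# this block yet gets a copy of the old data before the overwrite proceeds.

def cow_write(volume, snapshots, lba, new_data):
    extra_copies = 0
    for snap in snapshots:
        if lba not in snap:              # first overwrite since this snapshot
            snap[lba] = volume.get(lba)  # extra read+write before the real write
            extra_copies += 1
    volume[lba] = new_data
    return extra_copies

volume = {0: b"old"}
snapshots = [dict() for _ in range(90)]  # e.g., 90 days of daily snapshots
print(cow_write(volume, snapshots, 0, b"new"))  # 90 extra copies for one write
```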

Redirect-on-write snapshots, however, do not have this performance issue, which is why an array that uses them can keep hundreds of snapshots for hundreds of days without suffering a performance penalty. This is why I was excited to hear that both the Compellent and Nimble Storage arrays use redirect-on-write snapshots, as this really fits into my way of thinking about backup. The two snapshot systems are very similar, but Nimble claims that its snapshots are created at a much finer level of granularity, which significantly reduces the amount of storage the snapshots consume. They showed a use case comparing 90 days of snapshots on their system to 90 days of snapshots on another “leading iSCSI vendor” (gee, I wonder who that could be), and the difference was startling, resulting in significant capacity savings.
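
For contrast with the copy-on-write sketch above, here is a toy redirect-on-write model: new data goes to a fresh location and only the live volume’s block map is repointed, so a snapshot costs pointer copies rather than data copies. Again, this is an illustration, not either vendor’s actual implementation.

```python
# Toy redirect-on-write (ROW) model: snapshots are frozen block maps;
# overwrites never trigger extra data copies, no matter how many snapshots exist.

class RowVolume:
    def __init__(self):
        self.store = {}        # physical blocks, keyed by an ever-growing id
        self.block_map = {}    # live volume: lba -> physical block id
        self.snapshots = []    # each snapshot is just a frozen copy of the map
        self.next_id = 0

    def write(self, lba, data):
        self.store[self.next_id] = data     # redirect: write to a new block
        self.block_map[lba] = self.next_id  # repoint the live map only
        self.next_id += 1

    def snapshot(self):
        self.snapshots.append(dict(self.block_map))  # cheap: copy pointers only

    def read(self, lba, snap=None):
        mapping = self.snapshots[snap] if snap is not None else self.block_map
        return self.store[mapping[lba]]

vol = RowVolume()
vol.write(0, b"v1")
vol.snapshot()                    # keep as many of these as you like
vol.write(0, b"v2")               # still exactly one physical write
print(vol.read(0), vol.read(0, snap=0))  # b'v2' b'v1'
```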

Compellent is an eight-year-old company with many customers. They said Gartner said that they were the 3rd fastest growing storage array vendor, but what that says to me is that they’re losing the “new storage array vendor” race. What I mean is that EMC, NetApp, HDS, IBM, and HP are not going to be growing at the same rate as a storage array startup, and there aren’t that many storage startups. So I think that being the 3rd fastest-growing storage array vendor means you’re towards the end of the pack, not in the front. (Update: That’s what I get for editing video while listening to a vendor pitch. As Compellent clarifies in the comments below, they are eight years old, and Gartner has them as the fastest growing SAN company three years in a row, not the 3rd fastest.)

Nimble, on the other hand, is a brand new company that announced its product at Tech Field Day last week. The product looks interesting, but we don’t know yet if anyone’s going to buy it.

I wish both products the best of luck!

Written by W. Curtis Preston (@wcpreston), four-time O'Reilly author, and host of The Backup Wrap-up podcast. I am now the Technology Evangelist at Sullivan Strickler, which helps companies manage their legacy data.

11 comments
  • Hi Curtis, it was good seeing you in Seattle. Thanks for the great feedback. Though I wanted to just clarify a couple things – Compellent is eight years old (founded 2002, product shipping since ’04). And we’re the fastest growing SAN company 3 years in a row according to Gartner market reports, not the 3rd fastest growing. Not sure if this changes anything but I wanted to point this out. I guess I should have hovered even closer during our session! 🙂

    Liem Nguyen, Compellent corp. communications director

  • Curtis,

    Thanks for the writeup! I enjoyed your wisdom and stories in Seattle last week (I do have a note to myself to avoid you at all costs when I need to go through International Customs – I hope you understand 🙂).

    One minor nit:

    Compellent’s main emphasis is on automated storage tiering. They support SSD, SATA, SAS, and Fibre Channel drives natively within their enclosure, and they see each of these as a tier of storage to which they can move .5K, 2K, or 4K blocks of data depending on how much they are being used.

    I think you meant .5MB, 2MB or 4MB pages (not KB).

    Dan Leary

  • Feel free to knock this comment if you’d prefer not to have hearsay comments….but at a recent event I was talking with a Compellent salesman whose blunt take was that the company was just holding out long enough to find a buyer….that matches up with your analysis overall I’d say.

  • Hey Curtis, I’ve been testing the Nimble array for a while now and it’s worked well for my limited mixed workloads. I haven’t had the time to really stress the box too much (stupid day job) but all of the bits are working as expected. The abuse will also be ramping up shortly using production workloads and that will really start to give some usable information. I need to talk to the Nimble guys before I say any more but it is pretty exciting stuff 🙂

  • Hi Curtis,
    I was struck by your comment about the performance penalties of copy-on-write if used for persistent snapshots. Perhaps this is off-topic, but that’s always been a question for me about Microsoft’s Volume Shadow Copy functionality. If you have a lot of snapshots building up, there is surely going to be a performance hit on the system. If you offload the snapshot to another server, as Data Protection Manager does, that is surely going to exacerbate the problem.

    Yours,
    Mark Alexander

  • I’m not sure what you mean when you say that transferring it to another server would exacerbate the problem.

  • The performance limitation of copy-on-write is that any writes have to wait for the old data to be copied to the snapshot before the new data gets written. Once there are multiple snapshots to maintain that overhead grows. My understanding of DPM is that the snapshot resides on a centralised server, the DPM server. I’m not familiar with the internals of DPM, but there would have to be some sort of local cache of the snapshot to avoid the additional overhead of copying the old data over the network to the DPM server before writes can be completed.
    Hope that’s clear.

  • Many folks tend to forget that redirect snapshot models invariably cause data layouts to become fragmented over time. As snapshots are deleted and changes are made to existing data blocks, holes of free space start to emerge and the data layout can start looking like Swiss cheese. The free space is still there, but it’s randomly scattered across the system and no longer contiguous on disk. If all your workloads are random in nature, you’d probably never notice any difference in performance.

    Usually said storage vendors have to kick off some kind of post process to attempt to realign data segments and blocks in a more contiguous manner. These processes can take a long time to complete, need to be scheduled and in some cases can cause snapshots to bloat.

    The key trade-off that folks tend to forget about (with redirect models vs. COFW approaches) is with workloads that are sequential in nature – i.e., backup streams, data warehouses + DB indexes, some OLTP databases, parts of Exchange 2010, transaction logs, etc. With redirect models, sequential reads from a host can very easily end up turning into random reads when the back-end storage system has to do a lot of heavy lifting (think: random seeks) to put together a bunch of blocks into a large stripe that would otherwise be incredibly fast on a traditional storage system.

    It would be nice if SSD helped with large block sequential I/O but it doesn’t really perform much better than spinning rust for sequential I/O.

  • Jonas, your concern that redirect-on-write (ROW) can degrade sequential performance is valid, at least with how ROW has been implemented traditionally. However, Nimble is architected to avoid this drawback while retaining the advantage of ROW over COW. (COW does not support frequent snapshots efficiently, because it does an extra copy whenever a block is first written after a snapshot).

    ROW systems can indeed develop free blocks that are randomly distributed on the physical disk, resulting in the Swiss cheese pattern you mentioned.

    Traditional implementations of ROW write by filling these holes, thus translating logically sequential writes to physically random writes. Furthermore, a logically sequential read of this data will also translate into physically random reads.

    However, in the Nimble architecture, writes are _always_ physically sequential. It is truly “log structured”, and runs a cleaning process in the background. This process squeezes out the holes, aggregating small holes into large ones, which can then be written sequentially.

    Traditional ROW systems also run a cleaning process occasionally, but, as you noted, the process is generally heavyweight. On the other hand, Nimble is designed with efficient cleaning in mind; for example, its internal indexes are optimized for cleaning. Cleaning is the rule, not the exception. And writes are guaranteed to be sequential.
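
    A toy sketch of a log-structured write path with a hole-compacting cleaner, along these lines. The structures and names are illustrative only, not Nimble’s actual implementation.

    ```python
    # Log-structured sketch: writes always append sequentially; a background
    # cleaner compacts the holes left behind by overwritten or deleted blocks.

    class LogStructuredStore:
        def __init__(self):
            self.log = []       # append-only physical log of (lba, data)
            self.index = {}     # lba -> position in the log (live blocks only)

        def write(self, lba, data):
            # Writes are always physically sequential: append to the log's tail.
            self.index[lba] = len(self.log)
            self.log.append((lba, data))

        def read(self, lba):
            return self.log[self.index[lba]][1]

        def clean(self):
            # Background cleaning: copy only live blocks into a fresh log,
            # squeezing out the holes so free space is contiguous again.
            new_log, new_index = [], {}
            for lba, pos in self.index.items():
                new_index[lba] = len(new_log)
                new_log.append(self.log[pos])
            self.log, self.index = new_log, new_index

    store = LogStructuredStore()
    for i in range(4):
        store.write(i, f"v1-{i}".encode())
    store.write(2, b"v2-2")          # overwrite leaves a hole at the old position
    store.clean()                    # cleaner compacts the log; reads still work
    print(store.read(2), len(store.log))  # b'v2-2' 4
    ```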