Sunday, February 24, 2013

Driving Efficiency in Hybrid Storage Models

There is no doubt about it: flash technology has completely changed the storage landscape over the past few years, and notably for the better. Although crystal balls are cloudy, it looks like disk-based storage will still serve a purpose in the datacenter going forward. NetApp's Larry Freeman has his own thoughts on the future of disks here, but with the density and economics of flash today, and the need to drive cost efficiencies throughout the datacenter, hybrid storage arrays make a lot of sense for the vast majority of workloads.

Although the concept of a hybrid array is simple, the technical details are not. In the simplest terms, hybrid arrays use large-capacity hard drives to hold inactive data that needs to remain accessible, while high-speed flash handles performance for actively accessed data. It is important to pay attention to the term "actively accessed", since a lot of our competitors don't do real-time data promotion to flash. Even technologies they claim are real-time may require several subsequent reads of the same data before promoting it to the fast access tier. Post-process promotion, or pre-staging data to flash, requires scheduled jobs that run based on data access patterns.

When we start talking about staging and de-staging data between flash and HDD, it is critical to do it efficiently. If you can't take the data blocks in flash and lay them down rapidly on disk, all you have done is move the point at which your IO bottleneck occurs. The challenge most customers face is deciding when to run that deferred data movement. ETL processes, batch jobs, data backups, and report generation are all tasks scheduled for times when user load is low. Moving large amounts of data from flash to disk is non-trivial, and storage systems are often at their busiest after hours, which makes it very difficult to find a good window for the extra load.
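To make the promotion point concrete, here is a toy model comparing promote-on-first-read with promotion that waits for several reads of the same block. It is only a sketch under stated assumptions: the `FlashTier` class, the `promote_after` threshold, and the read trace are all hypothetical and do not represent any vendor's actual caching logic.

```python
from collections import Counter


class FlashTier:
    """Toy flash tier: a block is promoted once it has been read
    `promote_after` times from disk (hypothetical policy knob)."""

    def __init__(self, promote_after):
        self.promote_after = promote_after
        self.read_counts = Counter()
        self.in_flash = set()

    def read(self, block):
        """Return True if the read is served from flash, False if from disk."""
        if block in self.in_flash:
            return True
        self.read_counts[block] += 1
        if self.read_counts[block] >= self.promote_after:
            self.in_flash.add(block)  # later reads of this block hit flash
        return False


def flash_hits(tier, trace):
    """Count how many reads in the trace were served from flash."""
    return sum(tier.read(block) for block in trace)


# Made-up read trace touching two blocks.
trace = ["a", "b", "a", "a", "b", "a"]
immediate = flash_hits(FlashTier(promote_after=1), trace)
delayed = flash_hits(FlashTier(promote_after=3), trace)
print(f"promote on first read: {immediate} flash hits")
print(f"promote after 3 reads: {delayed} flash hits")
```

On a short trace like this one, the delayed policy serves most reads from disk before the data ever reaches flash, which is the behavior to watch for when a product is described as "real-time".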

One of the ways NetApp drives efficiency to disk is by coalescing random IO operations and making them sequential on disk. We have been doing this for decades, and quite simply, our algorithms for data placement, block optimization, and maintaining locality of reference for data over time are the best in the industry. 20 years of engineering gives us a lot of experience in this arena, and our competitors, both old and new, have been feverishly trying to duplicate our model for intelligent data handling. This quick video below shows (at a very high level) how we create "write chains" in order to achieve SSD-like write speeds to disk.
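The general idea of coalescing can be sketched in a few lines. This is a simplified illustration, not NetApp's actual algorithm: the function name `build_write_chains`, the 4K block size, and the round-robin disk assignment are assumptions made for the example, while the 256K-per-disk chain size is the figure mentioned in the comments on this post.

```python
import random

BLOCK_SIZE = 4 * 1024      # assumed block size for this sketch
CHAIN_BYTES = 256 * 1024   # per-disk chain size quoted in the comments


def build_write_chains(logical_blocks, num_disks,
                       blocks_per_chain=CHAIN_BYTES // BLOCK_SIZE):
    """Group buffered random writes into per-disk chains.

    Each chain is laid down as one sequential write on its disk,
    regardless of where the blocks live logically.
    Returns a list of (disk_index, blocks_in_chain) tuples.
    """
    chains = []
    for start in range(0, len(logical_blocks), blocks_per_chain):
        # Hypothetical placement: rotate through the disks in order.
        disk = (start // blocks_per_chain) % num_disks
        chains.append((disk, logical_blocks[start:start + blocks_per_chain]))
    return chains


random.seed(0)
# 640 writes to random logical block addresses, buffered in memory.
writes = random.sample(range(1_000_000), 640)
chains = build_write_chains(writes, num_disks=10)
print(f"{len(writes)} random block writes -> {len(chains)} sequential chain writes")
```

The point of the sketch is the ratio: hundreds of logically random writes turn into a handful of sequential disk operations once placement is decoupled from the logical address.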

These read and write chains happen on a per-disk basis, and since we pool our disks together in large groups, we can drive tremendous amounts of IO to spinning media. Imagine the process above across 50 disks. Where we generate ~50 back-end IOPS at RAID-6 levels of data protection, the vast majority of our competitors will need around 4,500 to do the same (50 drives * 15 IOPS per drive * 6 for the RAID-6 write penalty). That is a massive difference, and one that will require overpurchasing disks in order to meet data movement windows. Given the increased external pressure from cloud-based service providers, overbuying infrastructure is not something internal IT departments can afford.
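The arithmetic behind those two numbers can be spelled out as follows. The sketch reads the ~50 figure as one chained write per disk across 50 disks, which is an interpretation of the figures above rather than something stated outright; the function name is made up for illustration.

```python
def backend_write_iops(disks, ios_per_disk, write_penalty):
    """Back-end IOPS = disks * IOs issued per disk * RAID write penalty."""
    return disks * ios_per_disk * write_penalty


DISKS = 50

# Assumed reading: one chained sequential write per disk, no extra penalty.
chained = backend_write_iops(DISKS, ios_per_disk=1, write_penalty=1)

# Figures from the paragraph above: 15 IOs per drive, RAID-6 penalty of 6.
traditional = backend_write_iops(DISKS, ios_per_disk=15, write_penalty=6)

print(f"chained writes: {chained} back-end IOPS")
print(f"traditional:    {traditional} back-end IOPS")
print(f"ratio:          {traditional // chained}x")
```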

Ray Lucchesi at Silverton Consulting has a great post here where he charts some SPC-1 results for hard-drive-based arrays. One of the interesting things the article discusses is how sophistication matters in storage arrays. The results for the very high-performing workloads (greater than 250,000 IOPS) showed the need for at least 1,000 drives in order to hit those impressive numbers with traditional disk arrays. The NetApp submission, which used our hybrid approach of virtual storage tiering and a slew of other disk optimization technologies, including the write coalescing goodness above, achieved 250,039 IOPS with only 432 drives.

I don't have all the nice data Ray used to generate the graph, but I did eyeball the NetApp submission for our 6-node cluster and added it to the chart. It is very striking where our system appears on the list. We are the only hybrid array vendor to do over 250,000 IOPS with fewer than 1,000 drives, and we actually did it with 62% fewer drives than our nearest competitor; that's efficiency!


Anonymous said...

Note from Terry at ACL:

Fix line two. It says:
Although crystal ball's are cloudy it looks disk

it should say:
Although crystal balls are cloudy it looks like disk

Anonymous said...

Erick, I have a question on Chains. Would a Chain be considered a 256k chunk of data written to one disk, or multiple 256k chunks of data written to multiple disks in an aggregate?

Erick said...

Thanks Terry! Adjusted. I hope all is going well!

Anon, it is 256K per disk that gets written. There is a field that can be tuned to adjust the amount of data that gets written per CP operation to disk.
