What is the primary purpose of an All Flash Array?

Vaughn Stewart at Pure got me thinking. He posted a table attempting to compare the capabilities of a range of competing All Flash Arrays (AFAs), including the Pure 400 Series array. Really it tries to paint Pure in the best competitive light (see Vaughn’s Post).

Some things struck me about the table, apart from its inaccuracies (which I will cover later).

Vaughn has two categories of AFAs: Performance Focused and Storage Efficient.

In the first category were our Violin 6000 Series Array and the IBM 820; in the second were EMC XtremIO and Pure. 3PAR subsequently volunteered their 7XXX series array for this category in a follow-on posting.

Perhaps Vaughn could help me here! Yes, I know Tier 1 AFAs need to store data reliably and cost effectively, but the primary function of an AFA is to provide high performance storage, ideally as fast as possible.

Has he just created a category of AFA to describe devices that are not that wonderful at their primary function?

Are Pure, EMC and 3PAR like a car design team who set out to build a Ferrari-beating supercar but ended up with a slightly sporty red family sedan, with a big spoiler and go-faster stripes down the side, at a Ferrari price point?

[Image: Not this] [Image: That]

Storage Efficient vendors are doing what they can to make the best of a bad deal, but they are in the same position as the red sedan design team: “I know, let’s stop talking about our baby’s track credentials and emphasise its ability to transport more than two people plus luggage from A to B.”

This may seem harsh, but it is performance that solves customer problems, not capacity, and performance should be the primary design goal for AFAs. It is performance without hotspots that:

  • Improves application response times.
  • Allows customers to support more VDI clients and server VMs per core.
  • Allows customers to consolidate DBMS instances.
  • Allows customers to run servers closer to full capacity, because with no hotspots performance is more consistent.

This is the transformational opportunity AFAs offer to IT organisations, but it is conditional on the AFA providing low and consistent response times at high IOPS.

So what’s a few msec amongst friends? Does it matter that a Violin array can sustain sub-1 msec response times under load, without any significant distortions due to garbage collection (GC), while a Capacity Focused array may average 2+ msec with occasional bumps up to 20 msec, and in some cases 50 msec, when GC kicks in?

After all, 2+ msec is probably better than what you could do with enough spinning disks!

But high performance storage is good because:

  • It improves application response times. <1 msec vs 2+ msec can make the difference between success and failure: low-latency trading, telecoms fraud analysis and a rapidly growing range of real-time applications generate revenue or reduce cost for a business based in part on how fast they are. For these kinds of apps, 2+ msec costs them more and/or generates less revenue than <1 msec.
  • It allows customers to consolidate their infrastructure more effectively: more VDI users per core, etc. Again, 2+ msec vs <1 msec matters, because at 2+ msec with lumps your consolidation effect is smaller.
  • If your AFA is highly consistent you can reduce headroom, further consolidating your server infrastructure. If your array indulges in unpredictable GC-related bumps, then you need to allow more headroom (the sketch below puts rough numbers on this).
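
To put rough numbers on that headroom point, here is a back-of-envelope sketch in Python. Every figure in it is an assumption chosen to match the shape of the claims above, not a measurement:

```python
# A rough sketch, with assumed numbers, of how occasional GC pauses
# inflate both average and tail latency.
spike_fraction = 0.02   # assume 2% of IOs land on a GC pause
base_ms = 2.0           # assumed steady-state latency of the "efficient" array
spike_ms = 50.0         # the worst-case GC bump cited above
steady_ms = 0.8         # a sub-1 msec array with no GC distortion

avg_ms = (1 - spike_fraction) * base_ms + spike_fraction * spike_ms
print(f"Effective average: {avg_ms:.2f} ms vs {steady_ms} ms")  # 2.96 vs 0.8

# Tail latency is the headroom killer: with 2% of IOs hitting a 50 ms
# pause, the worst 1% of IOs are all pauses, so the 99th percentile is
# 50 ms even though the average still looks like ~3 ms. Sizing for that
# tail is exactly the extra headroom the bullet above describes.
```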

So a Storage Efficient array saves you less, and if you need real-time data access it also reduces your potential revenue.

Sadly, Storage Efficient arrays don’t cost you less: customers pay similar $/GB for both types of array. If your measure is $/IO, then Storage Efficient arrays are more expensive. And they save you less.
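
A quick illustration of the $/GB vs $/IO point. The price and IOPS figures below are placeholders to show the arithmetic, not vendor quotes:

```python
# Illustrative only: why similar $/GB can hide very different $/IO.
price_usd = 250_000    # assume both arrays carry the same list price
perf_iops = 500_000    # assumed sustained IOPS, performance-focused array
eff_iops = 150_000     # assumed sustained IOPS, storage-efficient array

print(f"Performance focused: ${price_usd / perf_iops:.2f} per IO/s")  # $0.50
print(f"Storage efficient:   ${price_usd / eff_iops:.2f} per IO/s")   # $1.67
```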

This might be OK if the storage efficiencies claimed by the Storage Efficient vendors were actually realisable for the target data sets being hosted on Tier 1 arrays. But here’s the rub: mostly they are not.

There is a solution of sorts to the 50 msec response times caused by Storage Efficient AFAs making no attempt to manage flash garbage collection: just make sure you never fill your array over about 80%. This reduces the problem a bit, but it plays merry hell with the Storage Efficient story.
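
To see what that 80% rule does to the efficiency claim, here is a quick worked example. The raw capacity and reduction ratio are assumptions; only the 80% fill limit comes from above:

```python
# A back-of-envelope sketch of what "stay under 80% full" costs you.
raw_gb = 10_000        # assumed raw capacity purchased
claimed_ratio = 3.0    # assumed vendor data-reduction claim
fill_limit = 0.80      # free space kept back to soften GC (from above)

claimed_gb = raw_gb * claimed_ratio
usable_gb = raw_gb * fill_limit * claimed_ratio
print(f"Claimed: {claimed_gb:,.0f} GB effective, usable: {usable_gb:,.0f} GB")

# The 20% you must leave empty comes straight off the headline ratio:
# a "3x" array behaves like a 2.4x array before you store a single byte.
```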

As for data types, unstructured data is a great use case for de-duplication, and this is where many of the headline-grabbing reduction ratios come from. But would you put unstructured data on Tier 1 storage? I don’t think so!

Some types of VDI, but not all, and a smaller subset of server virtualisation workloads are good fits for de-duplication.

After that you start running out of Tier 1 use cases, with DBMSs, probably the most compelling use case for All Flash Arrays, being the least amenable to de-duplication.

Hidden in all the de-dupe hype, Pure themselves admit that simple compression is 66% more effective than de-dupe at optimising both DBMS and virtual server storage, and it is available in the DBMS. They also admit that thin provisioning more than doubles the effectiveness of their storage efficiency story.

Alarmingly for Pure, their own collateral shows that a Violin 6000 Series array, which has thin provisioning, coupled with DBMS compression could achieve, by their own measures, close to 6x storage reduction, without the obvious performance disadvantages of always-on de-duplication.
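
The arithmetic behind a “close to 6x” figure is simple enough to sketch. The individual ratios here are my assumptions, chosen only to be consistent with the claims above rather than taken from Pure’s collateral:

```python
# A hedged back-of-envelope for the "close to 6x" figure.
dbms_compression = 2.5   # assumed DBMS-level compression ratio
thin_provisioning = 2.2  # assumed thin-provisioning gain ("more than doubles")

combined = dbms_compression * thin_provisioning
print(f"Combined reduction: ~{combined:.1f}x")  # ~5.5x, i.e. close to 6x

# Both techniques also sit outside the array's write path (one in the
# DBMS, one in the allocation layer), so neither adds the per-write
# overhead that always-on de-duplication does.
```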

Bizarrely, Pure et al’s strategy mirrors that of the disk drive manufacturers they are trying to replace. After the 2000 release of the 15K drive, faced with difficulties in getting more IOs from spinning disks, they focused mainly on increasing capacity. Customers now throw away increasing amounts of storage to allow for short stroking and other space-wasting performance techniques as drives get denser.

Given the IO limits inherent in Storage Efficient arrays (lumpy GC, unpredictable performance, heavy write penalties due to de-dupe), you can see scenarios where, as with spinning disks, customers will have to buy more and more capacity they don’t fully utilise. Short stroking, but for flash!

To be fair, the “Storage Efficient” suppliers don’t try to make the case that their arrays are high performance devices; if they had, we would be able to download proof-point benchmark reports (TPC, VMware etc.) to support their case, but you cannot. But then they are also failing to make a case for their devices’ primary purpose!

By focusing on the wrong thing, Pure, EMC et al have missed the opportunity to really transform IT platform infrastructure, and that’s a shame.

About lightspeedstorage
In my current life I am a Violin employee and flash storage specialist; also a father, skier and fly fisherman. Previous lives include incarnations at Sun and Symantec.

One Response to What is the primary purpose of an All Flash Array?

  1. Ian Morgan-Russell says:

    We have corrupted the purpose of storage due to the increase in size of the storage device.
    An application requires an injection of data to work and provide business information. How fast the data stream is once it starts now matters less than how long it takes to start the stream! LATENCY is the third storage parameter, alongside capacity and IOPS, and now, with virtualisation, the most critical.

