This is a cross-post of my Metis IT blog post, which you can find here.
VMware VSAN 6.2
On February 10, VMware announced Virtual SAN version 6.2. A lot of Metis IT customers are asking about the Software Defined Data Center (SDDC) and how products like VSAN fit into this new paradigm. Let’s look at what VMware VSAN is, what value it brings, and what the new features in version 6.2 are.
VSAN and Software Defined Storage
In the data storage world, we all know that data growth is explosive (to say the least). In the last decade, the biggest challenge for most companies was that people just kept making copies of their own data and that of their co-workers. Today we not only have this problem, but storage also has to deliver the performance needed for data analytics and more.
First, the key components of Software Defined Storage:
Abstraction: Abstracting the hardware from the software provides greater flexibility and scalability
Aggregation: In the end it shouldn’t matter which storage solution you use; it should all be managed through a single interface
Provisioning: The ability to provision storage in the most effective and efficient way
Orchestration: Make use of all of the storage platforms in your environment through orchestration (VVols, VSAN)
VSAN and Hyper-Converged Infrastructure
So what about Hyper-Converged Infrastructure (HCI)? Hyper-converged systems allow the integrated resources (compute, network and storage) to be managed as one entity through a common interface, and the infrastructure can be expanded simply by adding nodes.
VSAN is hyper-convergence in its purest form. You don’t have to buy a complete stack, and you’re not bound to specific hardware configurations from specific vendors. Of course, you still need to follow the VSAN HCL (Hardware Compatibility List) to make sure you reach the full potential of VSAN.
VMware VSAN 6.2 new features
With version 6.2 of VSAN, VMware introduced a couple of really nice features, some of which are only available on All-Flash VSAN clusters:
Data Efficiency (Deduplication and Compression / All-Flash only)
RAID-5/RAID-6 – Erasure Coding (All-Flash only)
Quality of Service (QoS) (Hybrid and All-Flash)
Software Checksum (Hybrid and All-Flash)
IPv6 (Hybrid and All-Flash)
Performance Monitoring Service (Hybrid and All-Flash)
Deduplication and compression happen during destaging from the caching tier to the capacity tier. You enable “space efficiency” at the cluster level, and deduplication happens on a per-disk-group basis, so larger disk groups result in a higher deduplication ratio. After the blocks are deduplicated, they are compressed. Compression alone is a significant saving, but combined with deduplication the space reduction can reach up to 7x, of course fully dependent on the workload and type of VMs.
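To make the per-disk-group behaviour concrete, here is a minimal Python sketch of dedupe-then-compress during destaging. It is not VMware’s implementation: the 4 KiB block size, SHA-1 fingerprints and zlib compression are assumptions chosen as stand-ins, and `DiskGroup` is a hypothetical name.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # assumption: 4 KiB blocks, for illustration only


class DiskGroup:
    """Toy capacity tier for one disk group: deduplicate first, then compress."""

    def __init__(self):
        self.store = {}          # fingerprint -> compressed unique block
        self.logical_bytes = 0   # bytes written by VMs before any reduction

    def destage(self, data: bytes) -> None:
        """Move data from the (simulated) cache tier into the capacity tier."""
        for off in range(0, len(data), BLOCK_SIZE):
            block = data[off:off + BLOCK_SIZE]
            self.logical_bytes += len(block)
            fp = hashlib.sha1(block).digest()
            if fp not in self.store:                   # keep only unique blocks...
                self.store[fp] = zlib.compress(block)  # ...and compress them

    def physical_bytes(self) -> int:
        return sum(len(b) for b in self.store.values())

    def reduction_ratio(self) -> float:
        return self.logical_bytes / max(self.physical_bytes(), 1)
```

Destaging the same VM template twice into one `DiskGroup` stores its blocks only once, while destaging it into two different disk groups stores it twice. That is why the dedupe scope, and therefore disk group size, matters.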
New is RAID-5 and RAID-6 support over the network, also known as erasure coding. RAID-5 requires at least 4 hosts, as it uses a 3+1 layout (three data segments plus one parity segment); with 4 hosts, 1 can fail without data loss. This significantly reduces the required disk capacity compared to RAID-1: normally a 20GB disk would require 40GB of disk capacity with FTT=1 (two full mirrors), but with RAID-5 over the network the requirement is only ~27GB (20GB × 4/3). RAID-6 (a 4+2 layout, so at least 6 hosts) is an option if FTT=2 is desired.
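The capacity figures above are simple arithmetic, and the parity idea behind RAID-5 fits in a few lines. A minimal sketch follows; the RAID-6 figure assumes a 4+2 layout, and the XOR parity shown covers only single-parity RAID-5 (RAID-6 needs a second, differently computed parity, such as Reed-Solomon, which is not shown). The function names are hypothetical.

```python
from functools import reduce


def raid1_capacity(vmdk_gb: float, ftt: int) -> float:
    """RAID-1 mirroring keeps FTT + 1 full copies of the data."""
    return vmdk_gb * (ftt + 1)


def erasure_capacity(vmdk_gb: float, data: int, parity: int) -> float:
    """Erasure coding stores `parity` extra segments per `data` segments."""
    return vmdk_gb * (data + parity) / data


def xor_parity(segments: list) -> bytes:
    """Single (RAID-5 style) parity: byte-wise XOR across equal-length segments."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*segments))


def rebuild(surviving: list, parity: bytes) -> bytes:
    """Recover one lost data segment by XOR-ing the parity with the survivors."""
    return xor_parity(surviving + [parity])
```

With these helpers, `raid1_capacity(20, 1)` gives 40GB, `erasure_capacity(20, 3, 1)` about 26.7GB (the ~27GB mentioned above) and `erasure_capacity(20, 4, 2)` 30GB.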
Quality of Service
QoS enables per-VMDK IOPS limits. These are applied through Storage Policy-Based Management (SPBM), tying them into existing policy frameworks. Service providers can use this to create differentiated service offerings on the same cluster/pool of storage, and customers who want to mix diverse workloads can keep those workloads from impacting each other.
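SPBM applies the limit, but how VSAN enforces it internally isn’t covered here. A token bucket is one common way to enforce an IOPS cap, so the sketch below uses it purely as an illustration. `IopsLimiter` is hypothetical, and time is passed in explicitly to keep the example deterministic.

```python
class IopsLimiter:
    """Token-bucket IOPS cap for a single (hypothetical) VMDK."""

    def __init__(self, iops_limit: int):
        self.rate = iops_limit          # tokens refilled per second
        self.capacity = iops_limit      # allow at most one second of burst
        self.tokens = float(iops_limit)
        self.last = 0.0                 # time of the last refill, in seconds

    def allow(self, now: float) -> bool:
        """Return True if one IO may be issued at time `now`, False to throttle it."""
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Because every VMDK gets its own bucket, a bursty VMDK exhausts only its own tokens and cannot push past its cap at the expense of its neighbours.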
Software Checksum enables customers to detect corruption caused by faulty hardware or software components, including memory, drives, etc., during read or write operations. For drives, there are two basic kinds of corruption. The first is “latent sector errors”, which are typically the result of a physical disk drive malfunction. The other is silent data corruption, which can happen without any warning. Undetected or completely silent errors can lead to lost or inaccurate data and significant downtime, and there is no effective way to detect them without end-to-end integrity checking.
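End-to-end checking boils down to storing a checksum when data is written and verifying it on every read. The sketch below is illustrative only: VSAN is commonly described as using CRC-32C, but Python’s standard library ships only plain CRC-32 (`zlib.crc32`), which is used here as a stand-in, and `ChecksummedDisk` is a hypothetical class.

```python
import zlib


class ChecksummedDisk:
    """Toy disk that stores a CRC-32 alongside every block it holds."""

    def __init__(self):
        self.blocks = {}  # block number -> (data, checksum computed at write time)

    def write(self, lba: int, data: bytes) -> None:
        self.blocks[lba] = (data, zlib.crc32(data))

    def read(self, lba: int) -> bytes:
        data, expected = self.blocks[lba]
        if zlib.crc32(data) != expected:
            # A real system would now repair the block from another replica
            raise OSError(f"checksum mismatch on block {lba}: silent corruption")
        return data
```

Flipping a byte behind the disk’s back (simulating silent corruption) leaves the stored checksum stale, so the next read raises instead of returning bad data.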
Virtual SAN now supports IPv4-only, IPv6-only, and dual-stack (IPv4 and IPv6 both enabled) configurations. This addresses the requirements of customers moving to IPv6, and the mixed mode supports migrations.
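As a small illustration of the three supported modes, the sketch below classifies a set of VMkernel addresses with Python’s `ipaddress` module. The `network_mode` function and the idea of checking addresses this way are assumptions for illustration, not a VSAN tool.

```python
import ipaddress


def network_mode(addresses: list) -> str:
    """Classify a (hypothetical) list of VSAN VMkernel addresses by IP version."""
    versions = {ipaddress.ip_address(a).version for a in addresses}
    if not versions:
        raise ValueError("no addresses given")
    if versions == {4}:
        return "IPv4-only"
    if versions == {6}:
        return "IPv6-only"
    return "mixed IPv4/IPv6"
```

During a migration, a cluster reporting the mixed mode is exactly the transitional state that dual-stack support is meant to cover.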
Performance Monitoring Service
The Performance Monitoring Service lets customers monitor existing workloads from vCenter, so customers who need tactical performance information no longer have to go to vRealize Operations. It includes macro-level views (cluster latency, throughput, IOPS) as well as granular views (per disk, cache hit ratios, per-disk-group stats) without leaving vCenter. The performance monitor aggregates stats across the cluster into a “quick view” that shows what load and latency look like, and it can share that information with 3rd-party monitoring solutions through an API. The Performance Monitoring Service runs on a distributed database that is stored directly on Virtual SAN.
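Rolling granular stats up into a cluster “quick view” is less trivial than it looks, because averages must be weighted. A minimal sketch follows; the `DiskStats` fields and `cluster_quick_view` function are hypothetical and do not mirror the actual Performance Monitoring Service API.

```python
from dataclasses import dataclass


@dataclass
class DiskStats:
    """One per-disk sample, as a granular view might report it (hypothetical fields)."""
    iops: float
    throughput_mbps: float
    avg_latency_ms: float
    cache_hits: int
    cache_lookups: int


def cluster_quick_view(disks: list) -> dict:
    """Roll granular per-disk stats up into one cluster-level summary."""
    total_iops = sum(d.iops for d in disks)
    lookups = sum(d.cache_lookups for d in disks)
    return {
        "iops": total_iops,
        "throughput_mbps": sum(d.throughput_mbps for d in disks),
        # Latency is IOPS-weighted, not a plain mean of per-disk averages
        "latency_ms": (sum(d.avg_latency_ms * d.iops for d in disks) / total_iops
                       if total_iops else 0.0),
        "cache_hit_ratio": sum(d.cache_hits for d in disks) / lookups if lookups else 0.0,
    }
```

Averaging two disks at 1 ms and 3 ms naively gives 2 ms, but if the slower disk serves three times the IOPS, the IOPS-weighted cluster latency is 2.5 ms.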
VMware is making clear that the old way of doing storage is obsolete. Companies need the agility, efficiency and scalability that the best of all worlds provides, and VSAN is one way to get there. Although it has a short history, it has grown up pretty fast. For more information, make sure to read the following blogs, and if you’re looking for an SDDC/SDS/HCI consultant to help you solve your challenges, make sure to look at Metis IT.
I’m really excited to see the VMware VSAN team during Storage Field Day 9, where they will probably dive deep into the new features of VSAN 6.2. It will be an open discussion, and I’m certain the delegates will have some awesome questions. I would also advise you to watch our earlier visit to the VMware VSAN team in Palo Alto about a year ago, at Storage Field Day 7.
During Storage Field Day 7 we had the privilege of getting a presentation from the founders of Springpath. Springpath is a start-up that came out of stealth a couple of weeks ago and is trying to solve one of the major problems in the datacenter, storage, through a software-only solution. Of course it still needs hardware, but Springpath is one of the few companies that provide an excellent piece of software to put on top of the hardware you choose, although there is still an HCL for supported hardware. Please watch the Springpath HALO Architecture Deep Dive below for a closer look at this solution (I promise it’s worth your time):
In datacenters around the world, companies are struggling with data growth and its related costs. Many companies are used to buying server hardware separately from storage, and the cost of scaling both silos independently creates a lot of friction between the people managing these silos within the IT department. A lot of the older SANs are purely scale-up, and we all know that might be efficient enough for capacity, but problems arise when there is a need for more storage performance.
The solution is in the software!?
For the last two years or so, we’ve been hearing that the solution to all our datacenter problems lies in software. Software Defined Everything (which of course includes Software Defined Bacon :D) is the credo these days. Building on this belief, Springpath chose to provide only software to their customers, who can then leverage their own hardware, either already in place or newly bought. For now (and to be honest, I don’t know whether this will ever change) the HCL includes Cisco, HP, Dell and Supermicro, which is a large piece of the datacenter pie, if you ask me…
To leverage the full potential of hardware, we have always needed the versatility that software can give us. Only in the last couple of years does there finally seem to be a synergy between the two. Let’s be honest: a great Software Defined Data Center can only be built with great software that leverages great hardware. Otherwise, why would almost all software suppliers still have HCLs in place?
Back to Springpath
Springpath is the next in an ever-growing line of vendors trying to solve storage problems through software. Although not many offer a software-only solution, a couple of companies provide something similar. Services like inline deduplication, inline compression and the option to use 7200 RPM SATA disks alongside flash and DRAM are becoming more and more common in the industry, so you have to bring other or better solutions to differentiate yourself from competitors. Offering a software-only solution already sets Springpath apart from most other players in this market, although Maxta does exactly the same.
Looking at the Data Platform at a high level gives you a feeling for its great potential:
If you look at the whole picture, you’ll see a solution that serves both legacy and future applications, as well as legacy and future storage protocols. Again, this is where Springpath takes a different approach from many of its competitors. Let’s dive a little deeper into the HALO architecture:
All application data is striped across the servers in a server pool, not just stored on the server where the application runs. This way applications can use all compute resources within the Springpath Platform Software (SPS). This kind of data distribution lets both performance and capacity scale when servers are added, and it removes I/O bottlenecks on any single server.
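The striping idea can be sketched as a placement function that spreads an object’s stripe units over every server in the pool. This is a generic illustration, not Springpath’s placement algorithm; the 64 KiB stripe unit and `place_chunks` are assumptions.

```python
import hashlib

STRIPE_UNIT = 64 * 1024  # assumption: 64 KiB stripe unit, illustrative only


def place_chunks(object_id: str, size: int, servers: list) -> list:
    """Map each stripe unit of an object to a server in the pool.

    The starting server comes from a hash of the object id, so different
    objects start on different servers; consecutive chunks then rotate
    round-robin, spreading one object's I/O over every server.
    """
    start = int.from_bytes(hashlib.sha256(object_id.encode()).digest()[:4], "big")
    n_chunks = -(-size // STRIPE_UNIT)  # ceiling division
    return [servers[(start + i) % len(servers)] for i in range(n_chunks)]
```

Simple modulo placement like this reshuffles data when a server is added; real scale-out systems typically use consistent hashing or a placement map to limit data movement.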
As with competitors such as VSAN and Maxta, reads and writes are cached in the flash layer, which delivers high performance. A write is acknowledged as soon as it lands on flash and has been replicated to other flash resources in the SPS cluster, so written data is protected. Hot data sets are kept in cache (flash and DRAM) and are only written to the capacity tier (which can be any type of disk, even 7200 RPM SATA) when they become cold.
With HALO you’re able to separate performance from capacity. Being able to scale these tiers independently is a big gain that comes with these hyper-converged storage pools: you can add capacity when you run out of space, and performance when that’s the resource you’re short on.
HALO does inline deduplication as well as inline compression, and the inline compression is done in variable-sized blocks. Doing inline variable-sized block compression is one of Springpath’s competitive edges, made possible by the sequential data layout used in the HALO architecture.
HALO provides many data services, such as snapshots and clones. As you probably know, these services can be very efficient, and in the HALO architecture you can create very large numbers of them. These services help companies recover data quickly and deliver applications rapidly.
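The write path and hot/cold tiering described above can be modelled in a few lines. This is a simplified, sequential sketch under stated assumptions (LRU as the “coldness” policy, in-memory dicts as tiers, hypothetical `Node` and `write` names); it is not how Springpath implements replication.

```python
from collections import OrderedDict


class Node:
    """One (hypothetical) server: a small flash cache in front of a capacity tier."""

    def __init__(self, name: str, cache_blocks: int):
        self.name = name
        self.cache_blocks = cache_blocks
        self.cache = OrderedDict()   # block id -> data, least recently used first
        self.capacity_tier = {}      # cold data, e.g. on 7200 RPM SATA disks

    def write_to_flash(self, block_id: str, data: bytes) -> None:
        self.capacity_tier.pop(block_id, None)  # drop any stale cold copy
        self.cache[block_id] = data
        self.cache.move_to_end(block_id)        # mark as most recently used
        while len(self.cache) > self.cache_blocks:
            cold_id, cold_data = self.cache.popitem(last=False)  # evict the LRU block
            self.capacity_tier[cold_id] = cold_data               # destage it

    def read(self, block_id: str) -> bytes:
        if block_id in self.cache:
            self.cache.move_to_end(block_id)    # reads keep hot data in cache
            return self.cache[block_id]
        return self.capacity_tier[block_id]


def write(block_id: str, data: bytes, primary: Node, replicas: list) -> str:
    """Acknowledge only once the block is in flash on the primary and every replica."""
    for node in [primary, *replicas]:
        node.write_to_flash(block_id, data)
    return "ack"
```

A real system would replicate in parallel over the network; the loop here only captures the ordering rule that the acknowledgement comes after every copy has landed on flash.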
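Snapshots are cheap in designs where data blocks are never overwritten in place: a snapshot only needs to freeze the block map. The sketch below shows that generic redirect-on-write idea; it is an assumption that HALO works along these lines, and `Volume` is a hypothetical class.

```python
from typing import Optional


class Volume:
    """Redirect-on-write volume: data blocks are immutable, only pointers change."""

    def __init__(self):
        self.blocks = []      # append-only list of immutable data blocks
        self.block_map = {}   # logical block address -> index into self.blocks
        self.snapshots = {}   # snapshot name -> frozen copy of the block map

    def write(self, lba: int, data: bytes) -> None:
        self.blocks.append(data)                   # never overwrite in place
        self.block_map[lba] = len(self.blocks) - 1

    def snapshot(self, name: str) -> None:
        # Copies pointers only; no data blocks are duplicated
        self.snapshots[name] = dict(self.block_map)

    def read(self, lba: int, snapshot: Optional[str] = None) -> bytes:
        mapping = self.block_map if snapshot is None else self.snapshots[snapshot]
        return self.blocks[mapping[lba]]
```

Since a snapshot costs one copy of the map rather than a copy of the data, keeping very large numbers of them stays affordable; a clone can be built the same way by starting a new writable map from a snapshot.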
Log Structured Distributed Object
As already mentioned, the data layout within the HALO architecture packs data into smaller objects, which in turn are laid out sequentially across a pool of servers. This kind of layout provides better endurance for the flash layer as well as better performance throughout the system. Replication is done in the same manner to make sure data is written securely.
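A log-structured layout holding variable-sized compressed objects can be sketched as an append-only segment store. The 1 MiB segment size, zlib, and `LogStore` are assumptions for illustration; Springpath’s on-disk format is not described here.

```python
import zlib

SEGMENT_SIZE = 1 << 20  # assumption: 1 MiB log segments, illustrative only


class LogStore:
    """Append-only log: variable-sized compressed objects packed back to back."""

    def __init__(self):
        self.segments = [bytearray()]   # sealed segments plus the open one at the end
        self.index = {}                 # key -> (segment number, offset, length)

    def put(self, key: str, data: bytes) -> None:
        packed = zlib.compress(data)    # compressed size varies with the content
        current = self.segments[-1]
        if current and len(current) + len(packed) > SEGMENT_SIZE:
            self.segments.append(bytearray())   # seal the full segment, open a new one
        seg = self.segments[-1]
        self.index[key] = (len(self.segments) - 1, len(seg), len(packed))
        seg.extend(packed)              # always a sequential append, no padding

    def get(self, key: str) -> bytes:
        seg_no, offset, length = self.index[key]
        return zlib.decompress(bytes(self.segments[seg_no][offset:offset + length]))
```

Overwriting a key simply appends a new copy and repoints the index; reclaiming the stale bytes (segment cleaning) is left out of this sketch. Writing large sequential segments instead of small random updates is what helps flash endurance.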
Where to use Springpath technology
There are a lot of ways to use this solution. But (I know, there is always a but) as this is a 1.0 solution, you may want to wait a bit before deploying it in your production environment. That doesn’t mean you can’t leverage the great benefits the solution brings by spinning the software up in parts of your datacenter that aren’t as critical as your production environment. Springpath sees their solution as a good fit for the following environments:
Test and Dev
Remote office/Branch office
Virtualized Enterprise Applications
Big Data analytics
I’m not sure all of these would be the best fit for the software, but I can see a couple of them being a great place to explore the Springpath software.
Call home functions
The last thing I want to mention is the call home function, which Springpath calls autosupport (and the Springpath support cloud that leverages it). I have a strong feeling they’ve looked at Nimble Storage’s InfoSight, which in my opinion is a good thing. Although I hope you have the opportunity to opt out, I think this is a very strong feature: it gives Springpath the power to proactively monitor your system, and thus solve problems you didn’t even know you had, or that would occur if no action was taken. Through their big data analytics engine, it also gives you insights into configurations, trends and best practices. This gives you a much better view of your environment, making sure it always performs at its best and never runs out of capacity.
Make sure to watch the entire #SFD7 Springpath presentation here, and read these great blogs by my fellow SFD7 delegates:
During Storage Field Day 6 we visited Coho Data HQ for the second time, and if you want to learn more, you really should watch the videos recorded during the event. First, a bit of history: Coho Data was founded by Andrew Warfield, Keir Fraser and Ramana Jonnala. If you didn’t know, these guys are known for a small thing called XenSource (later acquired by Citrix).
Coho Data introduced their new scale-out hybrid storage solution (NFS for VM workloads) during Storage Field Day 4 a year ago, and they are a regular Tech Field Day sponsor, having also presented during Virtualization Field Day 3. Hybrid in the Coho Data product means they use a mix of PCIe flash and SATA disks. The flash devices Coho Data uses are PCIe-based (Intel 910 800 GB, to be exact). Thanks to the Coho Data architecture these can easily be changed: the Intel devices are already the second kind of flash device Coho uses in the array, the first being Micron.
As you can see in the picture above, the Coho Data system is built from a 2U box holding 2 “MicroArrays”, each with 2 CPUs, 2 x 10GbE NIC ports and 2 Intel PCIe flash cards. In this configuration a 2U block provides 39TB of capacity and around 180K IOPS (random 80/20 read/write, 4K block size). The Coho Data product offers deduplication and compression as well as replication, high availability and snapshot technology. Last but certainly not least, it comes with an OpenFlow-enabled 10GbE switch (Arista) for ease of management, scalability and the opportunity to streamline the data streams.
Diving deeper into the Coho Data DataStream architecture reveals the IO lane technology it uses: 10GbE NIC <-> CPU <-> PCIe flash. Every IO lane has its own CPU, 10GbE NIC port and 800 GB Intel PCIe flash card. With this architecture Coho Data created an easy-to-scale, high-performance storage system. Combine the OpenFlow-enabled SDN switch, which manages the streams within the whole DataStream environment, with the Coho Data MicroArray as an SDS solution, and this is storage at its best.
I can hear you thinking: “What about setting up and managing the Coho Data offering? It’s probably extremely hard to set up and manage this system.” But it isn’t. You can set up the Coho Data system in about 15 minutes, and once you’re done you can use the UI to manage and maintain the system easily. Just take a look at the picture below, and make sure to watch the Tech Field Day videos to see more of the UI.
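The switch’s role can be pictured as a flow table that pins each client connection to one IO lane. The sketch below is hypothetical: the least-loaded policy, `IoLane` and `steer` are assumptions for illustration, not Coho Data’s actual placement logic or any OpenFlow API.

```python
from dataclasses import dataclass, field


@dataclass
class IoLane:
    """One IO lane: a 10GbE port, a CPU and a PCIe flash card."""
    name: str
    flows: list = field(default_factory=list)


def steer(flow: tuple, lanes: list) -> IoLane:
    """Pick a lane for a new client flow and record it (hypothetical policy).

    Least-loaded placement: a switch rule would then forward this flow's
    packets straight to the chosen lane's NIC port.
    """
    lane = min(lanes, key=lambda ln: len(ln.flows))  # ties go to the first lane
    lane.flows.append(flow)
    return lane
```

For example, `{flow: steer(flow, lanes).name for flow in flows}` builds a flow table that spreads new connections evenly over the lanes, so each lane’s NIC, CPU and flash card serve their own share of the traffic.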
What’s the future for Coho Data?
During the presentation there were a couple of questions going around in my head, but because listening to Andy present was taking almost all of my brain resources, I didn’t ask them then. That shouldn’t be too big of a problem, so I asked the questions by email when I was back in the Netherlands. Here are the questions and answers:
Q. You mentioned that with 1 PCIe flash device you were able to saturate a 10 Gig NIC. I understand PCIe performance is more than sufficient for the Coho Data product, but are you already looking at things like Diablo’s MCS? I know it’s still new technology with its own pros and cons, but I thought in some cases it might be a great solution for Coho. What’s your opinion?
A. The reason that I talked about NVDIMM in the second part of my presentation is that I really see RAM-speed memories starting to become more and more practical in storage systems from about 2016/2017 onwards. The data path work that we are doing is really focused on these: PCIe flash is fast enough to saturate the 10Gb interface, but mostly with large requests on today’s hardware. As we move to NVDIMM and related technologies like Diablo’s stuff (which is really, really cool BTW), the biggest overheads will be the (software) data path processing to do file system layout, replication, snapshots, placement, recovery, etc.
The work that Coho is doing here, both on the host and in the network, is one of the biggest differences between us and other companies. I think it’s really going to start to show over the next couple of years.
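The “mostly with large requests” remark is easy to check with back-of-the-envelope arithmetic. This sketch assumes the raw 10 Gbit/s line rate and ignores Ethernet, TCP and NFS overhead, so real numbers will be somewhat lower.

```python
LINK_GBPS = 10  # 10GbE line rate, ignoring Ethernet/TCP/NFS protocol overhead


def iops_to_saturate(request_bytes: int, link_gbps: float = LINK_GBPS) -> float:
    """IOPS needed to fill the link at a given request size."""
    bytes_per_second = link_gbps * 1e9 / 8
    return bytes_per_second / request_bytes


for size in (4 * 1024, 64 * 1024, 1024 * 1024):
    print(f"{size // 1024:>5} KiB requests: {iops_to_saturate(size):>9,.0f} IOPS")
```

Filling a 10GbE port takes roughly 305K IOPS at 4 KiB but only about 19K IOPS at 64 KiB. For comparison, the ~180K IOPS at 4K quoted earlier for a 2U block works out to roughly 5.9 Gbit/s, spread over four 10GbE ports.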
If you look at the left picture (taken last year) and the right picture (taken during the SFD6 presentation), it seems AFA and cold data systems will be added…
Q. One of the slides showed a cluster of Coho arrays, and it was interesting to see normal arrays alongside an all-HDD (archiving/object store??) array. Is this what you’re looking at? And going even further, are you also looking at AFAs for demanding workloads, or is this not needed at all with Coho?
A. Ah — you found the (unintentional) easter egg! I totally forgot to mention this in my presentation!
In 2015 we will roll out 2 new appliance versions. One will be a “hybrid flash” chassis that combines PCIe flash with SAS flash. It will be performance focused and still have all the transparent scale-out properties of our existing boxes. It will also be able to install into an existing hybrid-disk/flash based Coho install.
The second new box, which we are planning for 2H 2015, is a capacity box that is a 2-server, 70-disk 4U. It will have between 250 and 500 TB raw capacity, and serve as bulk storage for cold data.
For large installs, these two boxes will allow customers to scale capacity and performance completely independently of one another.
There is so much more to be told about Coho Data, but that’s for a later time. For now… let’s have a weekend!!! Have a great one and CU again soon!!