VCF on VxRAIL Design Part 2: Compute Considerations
Updated: Dec 21, 2020
In my previous VCF post, I covered some of the high-level considerations for designing a VCF on VxRAIL solution, based on VCF version 3.10. I want to write additional posts covering each sub-section of the platform.
In this post, we will look at VxRAIL and some of the things to consider for the underlying compute part of the system. I am not going to go into too much detail about VxRAIL itself, as there are already many blogs and articles out there covering the platform.
The main thing you need to know is that VxRAIL is the only hardware platform validated end to end by VMware for VCF. Of course, you do not need VxRAIL to run VCF; some customers really love other hardware vendors, such as HPE or Pure Storage, and may want to use those solutions instead.
The main hardware requirement is that you will always need vSAN in the Management WLD, and as long as those nodes are on the vSAN HCL, they are supported by VCF.
The use of VMFS on FC storage as principal storage in workload domains is available starting with VMware Cloud Foundation 3.9, and vVols with VCF 4.1. vVols is another interesting technology, and we will touch on it in a future post.
Some things to consider when it comes to VCF on VxRAIL:
1.) Node Type & Form Factor
VxRAIL comes in different series, and each one has its own use case. Some customers might have space constraints and want a small form factor, such as the G or E series. Other customers might want VDI- or storage-optimised nodes, which are the V and S series respectively.
It is important to determine what kinds of workloads will be migrated or deployed (classic workloads, containers, virtual desktops, etc.) and then use that information to determine the capacity and specifications required.
Smaller nodes can provide better building blocks for availability, as you can isolate failures to a smaller domain; in a cluster of fewer, larger servers, each node effectively becomes a larger failure domain.
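The trade-off above can be sketched with a little arithmetic (a hypothetical illustration, not DellEMC sizing guidance): the share of cluster capacity lost when a single node fails shrinks as the node count grows.

```python
def failure_domain_fraction(node_count: int) -> float:
    """Share of cluster resources lost when one node fails."""
    if node_count < 1:
        raise ValueError("cluster needs at least one node")
    return 1 / node_count

# Same total capacity built from many small nodes vs. a few large ones:
print(f"{failure_domain_fraction(16):.1%}")  # 16 smaller nodes -> 6.2% lost per failure
print(f"{failure_domain_fraction(4):.1%}")   # 4 larger nodes  -> 25.0% lost per failure
```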
It is important to standardize your deployment and keep a consistent hardware configuration across your nodes; this is a VxRAIL best practice from a DellEMC perspective.
2.) Hosts & Clusters
Use All Flash whenever possible. The cost of flash has fallen dramatically over the last few years, and the general industry trend has been wide adoption of all-flash systems; this should be taken into consideration when specifying VxRAIL.
System availability should be considered from the beginning of the design process. How will your clusters be laid out? How do we survive a rack failure?
Generally speaking, each rack is a single point of failure, as the rack itself usually shares power and networking. Yes, we do have sexy SDN technology at play here, such as NSX-T, that makes the fabric more dynamic, but these systems are still dependent on the physical environment for success.
For clusters of 5 nodes or more, always assume 1 host in maintenance mode (planned, such as a software upgrade) and 1 additional host failure (unplanned).
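That N+2 guidance can be expressed as a quick sketch (the helper name is mine, not part of any VMware tool): reserve one host for planned maintenance and one for an unplanned failure, and plan workload capacity only against what remains.

```python
def usable_hosts(total_hosts: int, maintenance: int = 1, failures: int = 1) -> int:
    """Hosts whose capacity you can actually plan workloads against."""
    reserved = maintenance + failures
    if total_hosts <= reserved:
        raise ValueError("cluster too small to honour the reservation")
    return total_hosts - reserved

print(usable_hosts(5))  # 3 -> a 5-node cluster yields 3 hosts of plannable capacity
print(usable_hosts(8))  # 6
```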
Consider a vSAN stretched cluster if only 2 racks are possible. A VCF platform in a stretched cluster configuration is effectively a single availability zone with the ability to lose a complete site and still keep your applications up and running.
vSAN stretched clusters give you an RPO of zero, when you are using the dual site mirroring storage policies on the datastore.
But remember that it does not give you an RTO of zero; an RTO of zero is very difficult to achieve. VCF relies on vSphere HA to restart failed VMs on other hosts in the case of a host failure, or on the surviving site in the case of a site failure.
An RTO of a few minutes (depending on how many VMs need to restart) is achievable with a vSAN stretched cluster and vSphere HA.
These factors should be considered for both the MGMT and VI WLDs. Always remember that your management services are hosted in the MGMT WLD, so it is just as critical as the VI WLDs.
Always make sure that vSAN for all WLDs is sized correctly and that you have the correct drive configurations. You do not want to find yourself with insufficient capacity, ESPECIALLY after the system has been installed.
The vSAN Sizer tool and Live Optics can help you make those sizing estimations. Always double-check with DellEMC to ensure that you have not missed anything. https://vsansizer.vmware.com/ https://www.liveoptics.com/
Determine what level of host protection and rack protection you need based on your requirements. You should design your vSAN for failures within a site as well as for the loss of an entire site.
As stated above, if you want vSAN stretched clusters, you will need SPBM policies that effectively double your raw storage consumption (a mirror copy of the data at each site).
These are all capacity-planning design decisions that should be made at the very beginning of the design process, as they determine how many nodes and drives you will buy.
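As a back-of-the-envelope sketch of that capacity math (the 30% free-capacity slack is my own assumption; always confirm real numbers with the vSAN Sizer and DellEMC): a dual-site mirroring policy keeps a full copy of the data at each site, so raw capacity roughly doubles before slack is even added.

```python
def raw_capacity_needed(usable_tb: float,
                        site_mirroring: bool = True,
                        slack_fraction: float = 0.30) -> float:
    """Raw TB to purchase for a given usable capacity (assumed slack value)."""
    copies = 2 if site_mirroring else 1   # dual-site mirror doubles the data
    return usable_tb * copies / (1 - slack_fraction)

# 100 TB usable in a stretched cluster:
print(round(raw_capacity_needed(100), 1))  # 285.7 TB raw across both sites
```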
There are of course many other things to consider in VCF, and I simply do not have the time to list each and every one of them here. I will be working on additional posts that go into further detail. I hope some of you find this useful.