VCF on VxRAIL Design Part 1: Platform Considerations
Updated: Dec 21, 2020
VMware Cloud Foundation (VCF) is a very interesting platform. I had the great opportunity to implement version 3.10 over the summer, and it has been an amazing learning experience from start to finish. The more I use it, the more I like it and the more I see the clear value proposition that the platform delivers, as well as the challenges related to building an enterprise class software defined datacenter in the year 2020.
Being involved with a large VCF project from the very beginning of the design process has changed my entire mindset when it comes to private cloud technology, and how to design such a system with state of the art applications and workloads in mind. Implementing a VCF platform will make you a pretty rock-solid SDDC admin across all of the sub-areas (Storage, Compute, Networking and, in the case of VCF, AUTOMATION).
The design selected was VCF on VxRAIL. The main reasons for selecting this design were to support a few hundred VMs from a legacy system, and to provide a next generation platform capable of supporting containers with the added assistance of deep self service automation capabilities, which the platform is well positioned to provide.
Any cloud implementation, be it on premises or in the cloud, requires planning and design, and ideally this should all be completed prior to deployment.
When we started the design process earlier in the year, we were told by some of the consulting engineers that VCF "made the design decisions for you", and that you as the implementer would basically provide some IP addresses and rack elevations, rack and stack, click "next, next, next" and be on your way. This of course is presales marketing fluff and does not get into the details needed to deploy the solution. VCF has some very different design characteristics, guardrails and requirements that I think anyone who is looking to deploy this solution should be aware of. But when these guidelines are implemented correctly, you will end up with a state of the art system built on vendor best practices and designed for a specific use case (or cases).
It has been a learning experience for everyone involved. I have also come to find that even though VMware has been around for many years, it might be a completely new technology for some customers. So I wanted to make a few posts related to my experience with VCF, and hopefully help others in need that might be looking to implement this solution.
1): Automation is the SOUL of the platform
The core of the VCF platform is the software. The VCF platform is a suite of software components that are unified under a central point of management. Of course vSphere, vSAN and other components make up the muscle behind the system, but the SDDC Manager is the soul of the core infrastructure components.
The SDDC Manager may be one of the most unfamiliar concepts in VCF. It provides lifecycle management, capacity monitoring, off-site software depot access and security access management for the rest of the "core" components. The SDDC Manager is a virtual machine, which runs in the Management Workload Domain (WLD).
"Core" components in VCF are the ESXi hosts, vCenters, PSCs, NSX Edges and other management virtual machines, which are all hosted in the Management WLD. All of these components interact with each other in various ways.
One of the main challenges with operating private cloud infrastructure in the past has been software updates. In a traditional ESXi environment, applying updates would be a human driven process, with steps such as downloading patches and updates, checking hardware compatibility lists, checking software interoperability matrices, etc. This is now a fully automated process controlled by the SDDC Manager.
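To make the manual "check the software matrix" step concrete, here is a minimal sketch of the kind of comparison an admin used to do by hand. This is not how the SDDC Manager is implemented. The component names, the `find_outdated` helper and all version numbers are hypothetical placeholders, not a real VCF bill of materials.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One installed piece of the stack and its current version."""
    name: str
    version: str


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted version string such as '6.7.0' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def find_outdated(installed: list[Component], target: dict[str, str]) -> list[str]:
    """Return the names of installed components older than the target manifest."""
    outdated = []
    for component in installed:
        wanted = target.get(component.name)
        if wanted is None:
            continue  # component is not covered by this manifest
        if parse_version(component.version) < parse_version(wanted):
            outdated.append(component.name)
    return outdated


# Placeholder versions for illustration only (assumption, not real release data).
installed = [
    Component("esxi", "1.0.0"),
    Component("vcenter", "1.2.0"),
    Component("nsx", "2.0.1"),
]
target_manifest = {"esxi": "1.1.0", "vcenter": "1.2.0", "nsx": "2.0.3"}

print(find_outdated(installed, target_manifest))  # ['esxi', 'nsx']
```

In a real environment this comparison also has to account for hardware compatibility and upgrade ordering, which is exactly the work the platform takes off your hands.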
It should also be noted that all certificate management operations should be done through the SDDC Manager, NOT MANUALLY.
If you change the certificates manually on each component, it will break the backend automation in the system and will cause you all kinds of problems.
Many infrastructure component passwords can also be managed by the SDDC Manager, with some exceptions such as the vSphere passwords, which are still changed via the vSphere HTML page under the Administration section.
Each WLD has its own vCenter appliance, and all vCenters are physically hosted in the Management WLD, along with the external PSCs (external if you are using pre-4.X VCF).
2): Reference architectures are a good place to start
With VCF being a software solution, there are several physical architectures that are readily available and are pre-validated by both VMware and the hardware supplier. Using a reference design can remove the need to perform a lot of the component level design and validation normally required when you are trying to build your own private cloud.
In my use case here, I implemented a VCF on VxRAIL version 3.10 solution. You can check this reference architecture out here: https://www.delltechnologies.com/resources/en-us/asset/technical-guides-support-information/products/converged-infrastructure/vmware_cloud_foundation_on_vxrail_architecture_guide.pdf
Here is the link to the 4.1 version: https://www.delltechnologies.com/resources/en-us/asset/technical-guides-support-information/products/converged-infrastructure/vmware-cloud-foundation-4-x-on-vxrail-architecture-guide.pdf
3): Understand the bill of materials
There are a few options available for VCF, and it is always a good idea to verify the bill of materials based on your use case. For example, if you are going down the application automation path you should ensure that your release has the vRealize Suite components included.
You might also find that some extras are included in your order, such as the VMware HCX product, which was included in the Enterprise Edition of VCF 3.10. In VCF 4.X, VMware's container management platform, Tanzu, is also included in some of the packages.
Always consult your VMware Account Manager before making a purchase in order to determine what software options are available, and to confirm that your bill of materials is appropriate for your use case.
4): Understand the road map
VCF is on an aggressive software release cycle and each new version contains new features or critical security patches that have been pre-validated and are ready to install in your environment.
It is important to understand the implications of the version that you choose to deploy initially. VCF is currently on version 4.1. We decided to go with VCF 3.10 earlier in the year because 4.0 was still bleeding edge and we wanted to run the most reliable code version, which was 3.10 at the time.
The consequence of this was that NSX-V was installed in the Management WLD. NSX-V is deprecated and is an end of service, end of life technology which is replaced by NSX-T from version 4.0 onward. The main issue with this is that there is currently no upgrade path from version 3.10 to 4.X, and this will not be available until early 2021.
VMware and DellEMC have stated that they will indeed have an upgrade path that will allow customers to go from 3.X to 4.X starting sometime in early 2021. This might utilize the HCX product to do the actual migration, but this has not been formally announced yet.
So to summarize, it is very important to consult your VMware account team when choosing which version is best for you.
5): Design Decisions
In VCF, you sacrifice customization for the sake of automation. The VCF system deployment follows configuration parameters and design decisions that are based on VMware recommendations, so it effectively makes a lot of design decisions for you when it comes to the core infrastructure components.
VMware has had its VMware Validated Designs (VVD) for some years now. These are reference architectures that give architects pre-validated blueprints for building VMware platforms according to best practice.
VCF version 3.10 is loosely based on the VVD version 5.1 reference design. But there are some deviations from this architecture, and the design changes even more when you (for example) introduce a technology such as VxRAIL, where special considerations and constraints are introduced to accommodate the underlying hardware platform.
The good news is that starting in VCF version 4.0, they have aligned the platform with the VVD version 6.0 design. This allows you to roll out a best practice design based on the VMware VVD standards, which really cuts out a lot of the guesswork that goes into many aspects of the core platform implementation. It also gives you the opportunity to opt out of some of the build options available when you plan the initial design for the system, allowing you to deploy those at a later time if that is more appropriate for your use case.
VCF requires user defined architecture decisions at each layer of the system, and you need to understand your requirements and the target architecture and then make the changes needed to the VCF designs in order to fit your use case.
You also have to keep the big picture in mind across all of the technology domains and understand their dependencies and how they fit together. There are mistakes that you can make from not understanding the platform, and those could come back to haunt you later on in the project.
It should also be noted that when you deviate from the VCF design, you are essentially "breaking the automation path" of the system. The SDDC Manager keeps a configuration state of the platform, a sort of Infrastructure as Code functionality. It does this to ensure that the system configuration state complies with the baseline automation path defined by VCF.
This means that if I were to enable a feature such as vCenter HA, it would break the VCF automation path and cause errors, including potential system downtime. So when you deploy VCF, you give up that kind of customization to the system. vCenter HA in this example is NOT supported by VCF. VCF instead uses vSphere HA to mitigate management component failures.
Another example of this is VMware Update Manager on VCF systems running on top of VxRAIL. VUM is not supported for host upgrades or patches and those should be done through the SDDC Manager, because when VxRAIL comes into the picture, all host patching is done via the VxRAIL Manager, which is controlled by the SDDC Manager.
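The idea of a tracked configuration state can be illustrated with a small drift check. This is only a sketch of the concept, not how the SDDC Manager actually stores or validates its state. The setting names (`vcenter_ha_enabled`, `host_patching_tool`) and the `detect_drift` helper are hypothetical and exist only to mirror the two examples above.

```python
from __future__ import annotations


def detect_drift(
    baseline: dict[str, object], actual: dict[str, object]
) -> dict[str, tuple[object, object]]:
    """Map each setting that differs from the baseline to (expected, found)."""
    drift = {}
    for key, expected in baseline.items():
        found = actual.get(key)  # None if the setting is missing entirely
        if found != expected:
            drift[key] = (expected, found)
    return drift


# Hypothetical baseline: the state the automation expects (assumption).
baseline = {"vcenter_ha_enabled": False, "host_patching_tool": "sddc-manager"}

# Hypothetical actual state after someone enabled vCenter HA by hand.
actual = {"vcenter_ha_enabled": True, "host_patching_tool": "sddc-manager"}

for setting, (expected, found) in detect_drift(baseline, actual).items():
    print(f"{setting}: expected {expected!r}, found {found!r}")
```

Any entry returned by a check like this represents a manual change that the automated workflows do not know about, which is why such changes tend to surface later as failed upgrades or errors.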
To summarize: Even though VCF provides many of the guard-rails and software needed to build an SDDC, you need to understand the implications for each decision that you make.
VCF has the ability to make your life a LOT easier, and the self service automation and lifecycle management capabilities make it a contender for playing in an increasingly public cloud dominated world.
I will be making some additional posts to detail each section (Compute, Storage, Networking, Automation) and will post them here.
Please feel free to reach out to me on this site if you have any questions, comments or recommendations. I would love to discuss.