Design Considerations for Multiple Availability Zones in VCF3.11 (NSX-T 3.0.3)
Updated: Oct 30, 2022
NSX-T provides software-defined networking for the VCF platform. Prior to VCF4.0, NSX-V was used in the Management domain; starting with VCF4.0, it has been replaced by NSX-T.
Edge node appliances provide overlay networking for workload VMs. When deploying VCF in a stretched cluster configuration, there are several things to consider in order to ensure performance and availability.
Overlay segments attach to Tier-1 gateways, which connect to an edge cluster that provides gateway services. A Tier-1 gateway attached to an edge cluster designates one node as active and the other as standby. When the Tier-1 uses the "Auto Allocated" setting, the system automatically selects which edge node is active and which is standby.
With this configuration, traffic flows out of one of the edge nodes and will switch to the standby node in the event of an edge failure.
It is possible to manually set the "active" edge node instead of using the auto-allocated feature. The disadvantage is that you then cannot use the "Standby Relocation" feature, which automatically switches the active Tier-1 back to the originally designated active edge after that edge goes down and comes back up. My recommendation is to use the auto-allocated feature.
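For anyone driving this through the NSX-T Policy API rather than the UI, here is a minimal sketch of what the auto-allocated setup looks like as payloads. The display names and paths are hypothetical placeholders; only the field names come from the Policy API:

```python
# Sketch of NSX-T Policy API payloads for a Tier-1 gateway that keeps
# edge-node allocation automatic. All names and paths are placeholders.
def tier1_payloads(name, tier0_path, edge_cluster_path):
    """Return (gateway_body, locale_services_body) for PATCH calls to
    /policy/api/v1/infra/tier-1s/<id> and its locale-services child."""
    gateway = {
        "resource_type": "Tier1",
        "display_name": name,
        "tier0_path": tier0_path,
        # Standby Relocation only works with automatic edge allocation.
        "enable_standby_relocation": True,
        "route_advertisement_types": ["TIER1_CONNECTED"],
    }
    locale_services = {
        "resource_type": "LocaleServices",
        # Omitting preferred_edge_paths leaves active/standby selection
        # to the system ("Auto Allocated").
        "edge_cluster_path": edge_cluster_path,
    }
    return gateway, locale_services
```

Setting preferred_edge_paths on the locale-services child is what pins the gateway to specific nodes, which is exactly the manual option described above.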
With VCF stretched clusters comes the concept of regions and availability zones (AZs). Stretched clusters are an option that customers can use to implement a solution that allows the system to withstand the loss of a physical datacenter site. Workloads can be run across both AZs, and in the event of a site outage, the VMs on the failed site will failover to the surviving site.
In VCF3.X, two edge nodes are installed in each AZ (a total of four in an edge cluster) and are added to an active-active Tier-0.
When using a multi-rack design along with stretched clusters (which provides the system with the ability to withstand physical rack failures within a site) you can use affinity rules to place a dedicated edge node in each physical rack. This design decision can ensure that overlay networking is still available even if a physical rack fails within a site.
In my example above, I am using a single BGP ASN for the physical network. You can also use a dedicated ASN for each AZ. This could come in handy when it comes to troubleshooting network issues and help with deterministic routing. Tier-1s added to the edge cluster in this design select edge nodes at random which load balances traffic across the cluster.
But starting in VCF3.11, some design modifications were introduced, the most significant being the upgrade of NSX-T from version 2.5.X to 3.0.3. Supporting documentation can be found here:
In order to upgrade to VCF3.11, a pretty major design change is required.
Basically, VMware says we need to create a new edge cluster with two edge nodes that are active on one site. BGP uplinks now use "stretched" networks, where the same networks are available across all TOR routers. In this design, the edge nodes have active connections to all four routers.
If AZ1 goes down for some reason, the nodes do an HA restart to AZ2. This design change is more of a disaster recovery solution rather than an "active-active" solution as used in the earlier versions of VCF. Here is a high level overview of what this looks like:
But why did VMware implement such a major change? There are a couple of reasons that I am aware of:
In NSX-T 2.5.X, the edge nodes use a "3-NVDS" design: one N-VDS for overlay and one for each uplink. In NSX-T 3.X, this changes to a single, converged N-VDS, as the multi-NVDS edge design is deprecated and will be removed in future releases
Enables two TEPs per edge node for redundancy
As traffic is only passing through one site, you avoid asymmetric routing (in our use case, asymmetric routing has never been an issue, but this might affect some customers)
vSphere 6.7 and NSX-T 2.5.X have reached end of service and end of life, and VMware does not currently have an upgrade path that takes a stretched VCF3.X system to VCF4.X. To keep the current setup in software support, we needed to upgrade VCF to 3.11 and implement these design changes in NSX-T. To stay in support long term, we also ended up building a new VCF4.4 platform. VMs will be migrated off of VCF3 and over to VCF4.
But there were a few challenges in planning this upgrade, the first being a lack of documentation. The only procedure available to the public is this KB: https://kb.vmware.com/s/article/87426. Perhaps we were one of the first or only customers to be in this situation (which I find strange). The KB is a bit incomplete; I think some of its steps are unnecessary and others are missing. After performing the upgrade, I have a good understanding of what needs to be done.
The KB outlines the procedure in these steps:
1.) Change the Teaming Policy in the Uplink Profile
2.) Create an Overlay Uplink Profile
3.) Create the Transport Zones for Uplink Traffic
4.) Create Uplink Segments
5.) Deploy the NSX-T Edge Appliances
6.) Create Anti-Affinity Rule for Edge Nodes
7.) Move the NSX-T Edge Nodes to a Dedicated Resource Pool
8.) Create an NSX-T Edge Cluster Profile
9.) Create an NSX-T Edge Cluster
10.) Create and Configure a New Tier-0 Gateway
11.) Configure IP Prefixes in the New Tier-0 Gateway for Availability Zone 2
12.) Configure Route Maps in the New Tier-0 Gateway for Availability Zone 2
13.) Configure BGP in the Tier-0 Gateway for Availability Zone 2
14.) Migrate the Existing T1-Gateway to the New Edge Cluster
15.) Remove the Legacy Edge Cluster and Nodes
I had several issues getting this solution to work using the KB. But I wanted to outline what worked for us in the following steps:
1.) Create an Overlay Uplink Profile
Use the Overlay VLAN ID that is available across both sites.
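As a sketch, the same profile can be expressed as a management-plane API body (POST /api/v1/host-switch-profiles). The profile name, uplink names, and VLAN ID below are placeholders for your own values:

```python
def overlay_uplink_profile(transport_vlan):
    # Two active uplinks with a source-ID teaming policy gives each edge
    # node two TEPs, one of the stated motivations for the new design.
    # Name and uplink names are hypothetical placeholders.
    return {
        "resource_type": "UplinkHostSwitchProfile",
        "display_name": "edge-overlay-profile",
        "transport_vlan": transport_vlan,  # overlay VLAN reachable from both AZs
        "teaming": {
            "policy": "LOADBALANCE_SRCID",
            "active_list": [
                {"uplink_name": "uplink-1", "uplink_type": "PNIC"},
                {"uplink_name": "uplink-2", "uplink_type": "PNIC"},
            ],
        },
    }
```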
2.) Create a Transport Zone for Uplink Traffic
Create a VLAN transport zone for the new edge cluster. Specify the uplinks with the same names given in the uplink profile above.
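As a minimal sketch, the transport zone body (POST /api/v1/transport-zones) looks like this; your deployment may require additional fields, and the name is a placeholder:

```python
def vlan_transport_zone(name):
    # A VLAN transport zone to carry the edge uplink segments.
    # Named teaming policies from the uplink profile, if used, would go
    # in the uplink_teaming_policy_names list.
    return {
        "display_name": name,
        "transport_type": "VLAN",
    }
```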
3.) Create Uplink Segments
Create two logical segments for the edge nodes. Specify the TZ created above and set the VLAN to 0-4094.
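Sketched as a Policy API body (PUT /policy/api/v1/infra/segments/<id>), with the segment name and transport zone path as placeholders:

```python
def edge_uplink_segment(name, tz_path):
    # A VLAN trunk segment for the edge uplink vNICs. The 0-4094 range
    # makes it a trunk, so the edge node can tag per-uplink VLANs itself.
    return {
        "resource_type": "Segment",
        "display_name": name,
        "transport_zone_path": tz_path,
        "vlan_ids": ["0-4094"],
    }
```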
4.) Deploy the NSX-T Edge Appliances
In our case, I was not able to deploy the edge nodes from the UI, so I deployed them directly in vCenter. After downloading the Edge OVA from https://customerconnect.vmware.com/, deploy the Edges in vCenter. Ensure that the edge nodes have DNS entries and IPs designated. During the installation, select the uplink segment interfaces created in the step above.
After deploying the VMs, collect the NSX-T Manager certificate thumbprint:

nsx1> get certificate api thumbprint
d948723968c72d6ab5a75b8120270c4e417cc8272101875ce03f17998c410240

Run the following command on each edge node to join it to the NSX-T Manager:

NSX-Edge1> join management-plane NSX_MANAGER_PRIMARY_VM_IP thumbprint d948723968c72d6ab5a75b8120270c4e417cc8272101875ce03f17998c410240 username admin password NSX_MANAGER_ADMIN_PASSWORD
In the edge node configuration, add the existing overlay TZ and the newly created VLAN uplink TZ, and set the uplink profile. In our setup, I was not able to create a new NVDS due to a bug, so I simply selected the existing one. Set the fp-eth0 port to uplink-1 and fp-eth1 to uplink-2.
5.) Create Anti-Affinity Rule for Edge Nodes
Self-explanatory, but also create rules to place the nodes in specific racks if you are using a multi-rack design.
6.) Move the NSX-T Edge Nodes to a Dedicated Resource Pool
Optional; I just used the resource pool already used by the existing edge nodes.
7.) Create an NSX-T Edge Cluster Profile
8.) Create an NSX-T Edge Cluster
9.) Create and Configure a New Tier-0 Gateway
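As a sketch, the new gateway can be expressed as a Policy API body (PATCH /policy/api/v1/infra/tier-0s/<id>). The KB step does not spell out the HA mode, so it is left as a parameter here, and the name is a placeholder:

```python
def tier0_payload(name, ha_mode):
    # ha_mode is "ACTIVE_ACTIVE" or "ACTIVE_STANDBY" depending on your
    # design. NON_PREEMPTIVE failover avoids a second traffic
    # interruption when a failed node comes back (applies to
    # active-standby gateways).
    return {
        "resource_type": "Tier0",
        "display_name": name,
        "ha_mode": ha_mode,
        "failover_mode": "NON_PREEMPTIVE",
    }
```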
10.) Configure IP Prefixes in the New Tier-0 Gateway for Availability Zone 2
This is not needed in our setup as deterministic routing is not a requirement.
11.) Configure Route Maps in the New Tier-0 Gateway for Availability Zone 2
This is not needed in our setup as deterministic routing is not a requirement.
12.) Configure BGP in the Tier-0 Gateway
Create uplink segments for the service interfaces. These are used to bridge the BGP uplink networks into the system. Our setup uses three VRFs in the physical network that need to access the VCF environment, so each edge node should have redundant uplinks for each network.
Under the Tier-0 configuration, set the External and Service Interfaces for the uplink segments that we created above. The thing to note here is that the uplink networks must be available across all TOR routers.
Under the BGP configuration, set the BGP neighbors.
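A sketch of the neighbor bodies (PUT /policy/api/v1/infra/tier-0s/<t0>/locale-services/<ls>/bgp/neighbors/<id>), one per ToR. All addresses and AS numbers below are documentation placeholders, not values from our environment:

```python
def bgp_neighbor(address, remote_as, source_addresses):
    # One BGP neighbor entry per ToR router, peered over the stretched
    # uplink networks.
    return {
        "resource_type": "BgpNeighborConfig",
        "neighbor_address": address,
        "remote_as_num": str(remote_as),
        "source_addresses": source_addresses,
    }

# Example: two of the four ToR neighbors (placeholder addressing).
neighbors = [
    bgp_neighbor("192.0.2.1", 65001, ["192.0.2.10"]),
    bgp_neighbor("192.0.2.2", 65001, ["192.0.2.11"]),
]
```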
After verifying that BGP is up, create a test segment and Tier-1 that is connected to the existing edge cluster and switch it over to the new cluster to test connectivity.
13.) Migrate the Existing T1-Gateway to the New Edge Cluster
Create a switchover plan to migrate your Tier-1s over to the new edge cluster.
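The switchover itself amounts to repointing each Tier-1's locale-services at the new edge cluster. A sketch of the PATCH body for /policy/api/v1/infra/tier-1s/<t1>/locale-services/<ls> (the cluster path is a placeholder):

```python
def migrate_tier1_body(new_edge_cluster_path):
    # Applied per Tier-1 in the switchover plan; changing
    # edge_cluster_path moves the gateway to the new edge cluster.
    return {
        "resource_type": "LocaleServices",
        "edge_cluster_path": new_edge_cluster_path,
    }
```

Migrating one low-impact Tier-1 first, as described with the test segment above, is a good way to validate the plan before moving production gateways.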
14.) Remove the Legacy Edge Cluster and Nodes
Simply power off the nodes and delete the edge cluster from the NSX-T Manager.
And what about the SDDC Manager? I was concerned that the edge cluster information would be "hard coded" into the SDDC Manager somehow, requiring a database update. But as it turns out, it automatically updated with the new cluster information and removed the old one after I deleted the cluster from the NSX-T Manager.
After performing these steps, I was able to successfully run the upgrade, which enabled us to get the system to a supported version and buy us some time while we build our new environments. NSX-T is an amazing and versatile product that we are very happy with and will continue to use to achieve our business needs. I am very excited about the new innovations coming out with each release. VCF just makes infrastructure management and upgrades easy.
Always make sure to whiteboard and collect requirements in order to form the basis for any infrastructure project.
Of course, you should consult with VMware or a verified partner before making any major design changes unless you know exactly what you are doing. If anyone would like to know more about this project or anything else, you can reach me at firstname.lastname@example.org.