vSphere with Tanzu - Troubleshooting HAProxy deployment
Introduction
You may have already heard and read about our latest changes regarding our Kubernetes offering(s) vSphere with Tanzu also former known as vSphere with Kubernetes. Personally, I was totally excited and full of anticipation to make my first hands-on experience with this new deployment option –> vSphere Networking as an alternative to NSX-T and how HAProxy is doing it’s job within this construct. I don’t know how you are dealing with installations of the “NEW” but I always read the documentation first… … not … 🙊 … but I should!
Honestly! Doesn’t matter of which kind of implementation we are talking, is it a Homelab (test environment), a Proof of Concept implementation or in production, our final goal is a working solution and this is why we should be prepared best. Through this article, I will even more stress this point because I did a mistake which did cost me time in troubleshooting the failure I’ve received. On the other hand and “as always”, it enlightened me and enriched my wealth of experience.
Preperations
This article will not describe the vSphere with Tanzu installation itself. For this kind of details, I would like to point you to VMware’s official documentation or to this vSphere with Tanzu Quick Start Guide V1a (for evaluation purposes) which VMware is maintaining on the Cloud Platform Tech Zone.
Also! For being prepared better as well as for documentation purposes, we are providing a checklist-file which can be downloaded through the Workload Management
section in the vSphere Client and then shared or forwarded to the networking engineer of your confidence (Figure I).
Resources
- VMware Docs - vSphere with Tanzu Basics
- VMware Cloud Platform Tech Zone - vSphere with Tanzu Quick Start Guide V1a
- - HAProxy Direct Download Link
- - HAProxy Powershell/PowerCLI Deployment Script
- - HAProxy Build the Appliance from scratch
- Blogs:
The Frontend Network
Step 6 in the Quick Start Guide describes the deployment of the HAProxy Virtual Appliance on vSphere. OVA deployment step 7 gives us the ability to deploy HAProxy with an additional network interface to seperate the Kubernetes nodes of our clusters from the network used by clients or services to access these clusters (Figure II).
The configuration of the additional Frontend network is not covered through the aforementioned Quick Start Guide (for simplicity reasons) but if you are going to deploy vSphere with Tanzu into production, it definitely should be used.
I tried to simplify the traffic flow coming from the client and/ or service through the Kubernetes clusters roughly via the following chart:
It must be a valid Subnetwork
OVA deployment step 2.7 requests the static IP address for the Frontend interface and this IP must be outside of the Load Balancer IP Range which you have to define in the next step 3.1. Furthermore, this IP range should not overlap with the assigned IP address of the Frontend network interface as well as with any other VMs in this range!
I’ve misinterpreted the description for Loadbalancer IP ranges
in step 3.1 (see Figure III) which I should have “painfully” realised after the complete deployment.
At this point, I’d like to make a little excursion in networking 101 to make sure you also see my mistake which I’ve made through this misinterpretation. I entered 10.10.18.18/28
as an IP range which isn’t a valid Subnetwork. A valid subnetwork configuration for my evaluation deployment should have looked like:
Description | Address |
---|---|
Network ID | 10.10.18.16 |
Broadcast | 10.10.18.31 |
Netmask | 255.255.255.240 |
Host IP starts with | 10.10.18.17 |
Host IP ends with | 10.10.18.30 |
10.10.18.16/28
is valid and because I decided to use a /28
network, the available IPs for the virtual server range, which you have to provide in Step 5 in the Workload Management
configuration (Figure IV) later on, starts with .17
and ends with .30
.
The following table gives you a quick overview of available Host IPs for a specific subnetwork.
Subnet Mask | /29 | /28 | /27 | /26 | /25 | /24 |
---|---|---|---|---|---|---|
Host IPs | 6 | 14 | 30 | 62 | 162 | 254 |
* I excluded Network-ID and Broadcast-ID
Don’t be confused
The following has nothing to do with the actual problem but the hint could avoid an uncertainty. Figure V shows the configuration summary of Step 5 and Ingress CIDR/ IP Ranges: 10.10.18.19/11
(in my example) could be confusing because of the CIDR notation.
What it shows, is the first IP - .19
- and the maximum number of remaining IPs - /11
- for your defined IP Ranges for Virtual Servers
.
Connect: No route to host.
At first sight, the installation seems to be successful and the Cluster Config Status in the vSphere client doesn’t indicate the opposite (Figure VI).
But appearances are deceptive! I created a vSphere Namespace to deploy a Tanzu Kubernetes Cluster (Guest Cluster) and clicked on the OPEN button/ link to the CLI tools (Figure VII) to check the reachability of my Supervisor Cluster and got an ERR_ADDRESS_UNREACHABLE
back.
Of course that made me suspicious and I tried to login via kubectl vsphere login
:
kubectl vsphere login --insecure-skip-tls-verify --vsphere-username administrator@jarvis.lab --server=10.10.18.19
and got the following error:
HAProxy Frontend VIP bind service
Having a look at my above drawing it shows us that our incoming request tries to reach the Frontend VIP 10.10.18.19
trough HAProxy. Because HAProxy is providing those VIP’s, it’s logical to start troubleshooting the HAProxy appliance first. SSH is enabled by default.
I started with a simple status check of all services by executing systemctl status
. The output was surprisingly degraded
.
Let`s narrow it down!
|
|
Three services have failed to start. I’m starting from the top with verifying the anyip-routes.service
.
|
|
The mentioned script in this line Process: 783 ExecStopPost=/var/lib/vmware/anyiproutectl.sh down (code=exited, status=0/SUCCESS)
seems to do something and by reviewing it, I found the following hint to the configuration file location for the broken service.
|
|
|
|
Verifying the file and thus my made configuration finally brought me enlightenment. I changed it consequently from 10.10.18.18/28
to a valid Subnetwork range, which is the already mentioned 10.10.18.16/28
and rebooted the HAProxy appliance (reboot
). Just restarting the service with systemctl restart anyip-routes.service
does not apply all necessary changes.
Finally, the validation of the restored functionality:
|
|