This time I got lucky with the outcome of my recent homelab crash. If I hadn’t been able to fix my broken vCenter Server, as described in my previous article, I would have had to reinstall my vSphere (+ vSAN, + Tanzu) environment basically from scratch.
Is this actually true?
Actually, NO! Had I configured my vSphere environment correctly, vCenter Server file-based backup would have been in place and I wouldn’t have had to worry about the consequences.
But I hadn’t configured backup, so I wanted to do it the right way now.
VMware Postgres Archiver Service stopped
I logged in to the vCenter Appliance Management Interface (VAMI) and started configuring my backup. When I kicked off the first backup operation manually, an error was thrown immediately. It told me that something was wrong with the Backup Service. Consequently, I checked the responsible VMware Postgres Archiver service.
It turned out that mine was in a stopped state. Simply starting the service manually via service-control --start vmware-postgres-archiver didn’t do the trick (too easy!).
Do we have chatty service logs? Yes, the Postgres Archiver Service has a dedicated stderr log which can be queried.
What caught my eye was the boolean in the active column of the replication slot. It was false.
In PostgreSQL, the “active” column shows whether a replication slot is currently in use. An “f” in this column means that no process is using the slot at the moment. Conversely, a “t” means that the slot is in use.
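To make the check on the active column repeatable, here is a minimal sketch of how the slot could be queried from the appliance shell. The psql path, the postgres user and the VCDB database name are assumptions about a typical VCSA and may differ on yours. The helper names slot_query, describe_active and show_slot are made up for illustration.

```shell
#!/bin/sh
# Assumption: location of psql on the appliance; override via the PSQL variable.
PSQL="${PSQL:-/opt/vmware/vpostgres/current/bin/psql}"

# Build the query for a single replication slot (hypothetical helper).
slot_query() {
    printf "SELECT slot_name, slot_type, active, restart_lsn FROM pg_replication_slots WHERE slot_name = '%s';\n" "$1"
}

# Translate the t/f value of the active column into words.
describe_active() {
    case "$1" in
        t) echo "active" ;;
        f) echo "inactive" ;;
        *) echo "unknown" ;;
    esac
}

# Run the query. User and database name are assumptions, see above.
show_slot() {
    "$PSQL" -U postgres -d VCDB -c "$(slot_query "$1")"
}
```

On the appliance, show_slot vpg_archiver would then print the row for the archiver slot, including the active flag discussed above.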
I continued gathering information and found the final hints in VMware-internal resources.
Removing the PG_Replication_Slot
Disclaimer
You should raise a ticket with VMware support first, before interfering with critical components of the vCenter Server. The next steps are therefore for non-production environments only.
The solution here would be to remove the vpg_archiver replication slot using pg_drop_replication_slot.
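A minimal sketch of how that call could be issued from the appliance shell follows. pg_drop_replication_slot is a standard PostgreSQL function. The psql path, the postgres user and the VCDB database name are assumptions about the appliance, and the helper names drop_slot_sql and drop_slot are hypothetical.

```shell
#!/bin/sh
# Assumption: location of psql on the appliance; override via the PSQL variable.
PSQL="${PSQL:-/opt/vmware/vpostgres/current/bin/psql}"

# Build the SQL statement. Slot names may only contain lower-case letters,
# digits and underscores, so anything else is rejected before it reaches SQL.
drop_slot_sql() {
    case "$1" in
        ""|*[!a-z0-9_]*)
            echo "invalid slot name: $1" >&2
            return 1
            ;;
    esac
    printf "SELECT pg_drop_replication_slot('%s');\n" "$1"
}

# Execute the statement. User and database name are assumptions.
drop_slot() {
    sql="$(drop_slot_sql "$1")" || return 1
    "$PSQL" -U postgres -d VCDB -c "$sql"
}
```

Calling drop_slot vpg_archiver would remove the slot. PostgreSQL refuses to drop a slot that is still active, so this only works while the slot is unused, as it was in my case.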
The final step before trying to restart the vmware-postgres-archiver service is to delete the existing segments within /storage/archive/vpostgres/.
root@vcsa [ ~ ] df -h /storage/archive/vpostgres/
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/archive_vg-archive 49G 4.8G 42G 11% /storage/archive
Remove the segments: rm /storage/archive/vpostgres/*
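Since rm with a wildcard is unforgiving, a more cautious variant could list the files first and delete only on request. The following is a sketch and not part of the original procedure. The function name purge_segments and its delete argument are made up for illustration.

```shell
#!/bin/sh
# List the regular files in a directory. They are only removed when the
# second argument is "delete". Anything else is a dry run.
purge_segments() {
    dir="$1"
    mode="${2:-dry-run}"
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi
    for f in "$dir"/*; do
        # Skips subdirectories and the unexpanded pattern of an empty directory.
        [ -f "$f" ] || continue
        if [ "$mode" = "delete" ]; then
            rm -- "$f"
        else
            echo "would remove: $f"
        fi
    done
}
```

A first call with purge_segments /storage/archive/vpostgres shows what would go away. Adding delete as the second argument actually removes the files.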
Start the vmware-postgres-archiver service:
service-control --start vmware-postgres-archiver
Operation not cancellable. Please wait for it to finish...
Performing start operation on service vmware-postgres-archiver...
Successfully started service vmware-postgres-archiver
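To double-check the result, the service state can be read back with service-control --status. The sketch below assumes that the command lists service names under a "Running:" or "Stopped:" heading. Treat that format as an assumption and compare it with the output of your own appliance. The function name service_state is hypothetical.

```shell
#!/bin/sh
# Read status text on stdin and print "running", "stopped" or "unknown"
# for the service name given as the first argument.
service_state() {
    awk -v svc="$1" '
        {
            start = 1
            # A heading switches the section; names may follow on the same line.
            if ($1 == "Running:")      { section = "running"; start = 2 }
            else if ($1 == "Stopped:") { section = "stopped"; start = 2 }
            for (i = start; i <= NF; i++)
                if ($i == svc && section != "") state = section
        }
        END { print (state == "" ? "unknown" : state) }
    '
}
```

On the appliance this could be used as service-control --status vmware-postgres-archiver | service_state vmware-postgres-archiver.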
After bringing the service back into an operating state, I started a new backup.
As you can see in Figure II, it took a few attempts, but ultimately it worked again. Maybe the VCSA needed a moment to sort things out.