ssh admin@avi-ctrl.jarvis.lab
Avi Cloud Controller
Avi Networks software, Copyright (C) 2013-2017 by Avi Networks, Inc.
All rights reserved.
Management: 10.10.70.10/24 UP
Gateway: 10.10.70.1 UP
admin@avi-ctrl.jarvis.lab's password:
# The copyrights to certain works contained in this software are
# owned by other third parties and used and distributed under
# license. Certain components of this software are licensed under
# the GNU General Public License (GPL) version 2.0 or the GNU
# Lesser General Public License (LGPL) Version 2.1. A copy of each
# such license is available at
# http://www.opensource.org/licenses/gpl-2.0.php and
# http://www.opensource.org/licenses/lgpl-2.1.php
Last login: Tue Feb 6 16:19:05 2024
I ran journalctl to check the systemd journal and my terminal filled up with errors like:
$ journalctl
[...]
-- Logs begin at Tue 2024-02-06 18:18:43 UTC, end at Wed 2024-02-07 07:06:22 UTC. --
Feb 06 18:18:43 10-10-70-10 sshd[902]: error: connect_to localhost port 5025: failed.
[...]
error: connect_to localhost port 5025: failed.
The port number varied from entry to entry. I searched the internet for the error but couldn't find any useful information.
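To get a quick sense of how widespread these messages were, a filter like the following can help (a minimal sketch using grep against the journal):

# Count the distinct ports that show up in the connect_to errors
sudo journalctl --no-pager | grep -oE 'connect_to localhost port [0-9]+' | sort | uniq -c | sort -rn | head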
Next, I used systemd, the system and service manager, to check the state of the units (services, etc.) that are supposed to be up and running, but that was a dead end too. Only one unit was in the failed state, and I put it aside since it was only the motd-news.service:
systemctl list-units --state=failed
UNIT LOAD ACTIVE SUB DESCRIPTION
● motd-news.service loaded failed failed Message of the Day
LOAD= Reflects whether the unit definition was properly loaded.
ACTIVE= The high-level unit activation state, i.e. generalization of SUB.
SUB= The low-level unit activation state, values depend on unit type.
1 loaded units listed.
I kept digging into the issue, read more logs, checked VMware's Knowledge Base (KB) and other internal resources, but unfortunately in vain. I started to realize that I was facing a low-level corruption of the system that couldn't be fixed in a simple fashion.
My suspicion was reinforced even further when I tried accessing the Avi Shell:
admin@10-10-70-10:~$ shell --address 10.10.70.10
Login: admin
Password:
The controller at ['https://10.10.70.10'] is not active. If this is not the controller address, please restart the shell using the --address flag to specify the address of the controller.
{"error": "Bad service"}
Bonus - Docker run Avi Shell
In the section Bonus: Docker run Avi Shell I provide brief guidance on how to run the Avi shell inside a container, which you can easily start and use from basically any remote system that provides a container runtime.
The problem is that simply deploying a new instance and starting from scratch (without restoring data) isn't really an option, because other lab (business 😉) critical systems rely on the Avi controller - on the Load Balancer service.
My Supervisor cluster (vSphere with Tanzu (TKGS)) for example:
Backups FTW!
Well, fixing the Avi controller manually at the system level wasn't an option anymore, therefore restoring a known-good state from a backup was next on the list.
By default, the Avi controller runs backups periodically and stores the files locally in /var/lib/avi/backups. You might remember that you configured the passphrase for the backups as one of the first steps when you deployed the controller.
In a working instance, the configuration in terms of frequency, retention and remote backup location can be adjusted under Administration - Configuration Backup.
By default, the backup is configured to run every 10 minutes with a retention of 4 files. Best practice is of course to additionally store (copy) the backup files on a remote server location using either SCP or SFTP.
I recommend looking up the official documentation for more options on backing up the Avi configuration. The documentation contains great examples of how to configure the Avi backup using e.g. the avi_shell, the Avi REST API or even Python.
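To illustrate the API route, here is a minimal sketch that reads the backup configuration with curl; the /api/backupconfiguration endpoint and the X-Avi-Version header follow the usual Avi REST API conventions, but the exact fields and version string should be verified against the documentation for your release:

# Read the current backup configuration (adjust controller address, credentials and API version)
curl -sk -u admin:'<password>' -H 'X-Avi-Version: 22.1.3' \
  https://10.10.70.10/api/backupconfiguration
# The returned object carries the retention and remote-copy settings, which can be
# changed with a PUT/PATCH to /api/backupconfiguration/<uuid> instead of using the GUI.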
As mentioned earlier, the backup files are stored locally in /var/lib/avi/backups:
ls -rtl
total 10904
-rw-r--r-- 1 root root 3040701 Jan 26 11:00 backup_Default-Scheduler_20240126_110041.json
-rw-r--r-- 1 root root 3044239 Feb 1 11:00 backup_Default-Scheduler_20240201_110048.json
-rw-r--r-- 1 root root 2536819 Feb 2 11:00 backup_Default-Scheduler_20240202_110041.json
-rw-r--r-- 1 root root 2533528 Feb 3 11:00 backup_Default-Scheduler_20240203_110044.json
Since I was still able to connect to my Avi controller via ssh, I was also able to remote-copy (scp) the files.
I started by packaging the files into a tarball:
sudo tar -czf /tmp/backup.tar *.json
ls -rtl /tmp/backup.tar
-rw-r--r-- 1 root root 18042880 Feb 29 10:28 /tmp/backup.tar
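Before copying the archive off the controller, a quick sanity check of its contents doesn't hurt; note that the -z flag gzip-compresses the file even though it is named .tar:

# List the archived backup files without extracting them
tar -tzf /tmp/backup.tar | head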
Next was “downloading” the files to my computer:
scp admin@10.10.70.10:/tmp/backup.tar ~/Downloads/avibackup
Avi Cloud Controller
Avi Networks software, Copyright (C) 2013-2017 by Avi Networks, Inc.
All rights reserved.
Management: 10.10.70.10/24 UP
Gateway: 10.10.70.1 UP
admin@10.10.70.10's password:
backup.tar 100% 17MB 80.6MB/s 00:00
BONUS - Powershell and Bash Scripts available
I've created a PowerShell as well as a Bash script to simplify the download of the backup files from the Avi Controller. Both scripts can be copied from here.
The Restore Process
Now that I had a backup of my backups, I was brave enough to execute the restore_config.py script with the --flushdb option.
BTW! If you don't use the option, the restore script will immediately exit with the error:
Restore config must be run on a clean controller. Please run with the --flushdb option or do a reboot clean to clear all config before restoring.
Executing restore_config.py:
sudo /opt/avi/scripts/restore_config.py --config /var/lib/avi/backups/backup_Default-Scheduler_20240203_110044.json --passphrase 'VMware1!' --flushdb
Traceback (most recent call last):
  File "/opt/avi/scripts/restore_config.py", line 98, in <module>
    user, token = get_username_token()
  File "/opt/avi/python/lib/avi/util/login.py", line 136, in get_username_token
    rsp = json.loads(response.stdout)
  File "/usr/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.8/json/decoder.py", line 340, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 1 column 9 (char 8)
restore_config.py on node (10-10-70-10) crash -- Pls investigate further.
core_archive.20240207_152903.tar.gz not collected due to similar stack-trace seen=restore_config.py.20240207_152728.stack_trace
🤔 Not what I expected!
I tried it again with other backup files, but without success. I also tried executing the flushdb.sh script on its own:
admin@10-10-70-10:/opt/avi/scripts$ sudo ./flushdb.sh
rm: cannot remove '/etc/luna/cert/server/*': No such file or directory
rm: cannot remove '/etc/luna/cert/client/*': No such file or directory
debconf: unable to initialize frontend: Dialog
debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debconf/FrontEnd/Dialog.pm line 76.)
debconf: falling back to frontend: Readline
Creating SSH2 RSA key; this may take some time ...
3072 SHA256:OZDiQ2SqCXaWR5+Dd1PLc95ekDY6Ix5el0JPAHoc9Ck root@10-10-70-10 (RSA)
rescue-ssh.target is a disabled or a static unit, not starting it.
But it also failed.
Well, if the --flushdb option erases the controller config anyway, then deploying a fresh instance is an option too.
Deploying a fresh Instance
So, I deployed a new instance of the Avi controller (OVA), scp'd the existing backup files to the new controller, connected via ssh and executed restore_config.py once again.
sudo /opt/avi/scripts/restore_config.py --config /tmp/backup_Default-Scheduler_20240203_110044.json --passphrase VMware1! --flushdb
Restoring config will duplicate the environment of the config. If the environment is active elsewhere there will be conflicts.
Please confirm if you still want to continue(yes/no)yes
======== Configuration Restored for 1-node cluster========
Note: The documentation mentions options like:
--do_not_form_cluster
--vip
--followers
These options are no longer supported from version 22.1.3 onwards.
If you are running the Avi controller in cluster mode (three nodes), roll out the other two nodes as well, configure the same IP addresses you used before and simply add them to the Controller Cluster via the GUI under Administration -> Controller -> Nodes -> Edit.
It is not intended to execute the restore process on the follower nodes! They'll get the configuration mirrored from the leader.
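If you prefer to script the cluster formation instead of clicking through the GUI, a hedged sketch against the cluster API could look like the following; the /api/cluster endpoint and the payload shape are assumptions based on the Avi API documentation, so double-check them for your version:

# Read the current cluster configuration
curl -sk -u admin:'<password>' -H 'X-Avi-Version: 22.1.3' https://10.10.70.10/api/cluster

# Add the follower nodes by PUTting the updated node list back (IPs are examples)
curl -sk -u admin:'<password>' -H 'X-Avi-Version: 22.1.3' -H 'Content-Type: application/json' \
  -X PUT https://10.10.70.10/api/cluster -d '{
    "name": "cluster-0-1",
    "nodes": [
      {"ip": {"type": "V4", "addr": "10.10.70.10"}},
      {"ip": {"type": "V4", "addr": "10.10.70.11"}},
      {"ip": {"type": "V4", "addr": "10.10.70.12"}}
    ]
  }'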
Back on the Job
Restoring the Avi controller is really a simple and effective procedure. Ultimately, troubleshooting my broken instance cost me more time than simply restoring a new one.
But we wouldn't be engineers if we didn't try to fix what's broken, right? On the bright side, in addition to this article, the outcome includes two scripts and a way to run the avi_shell in a container on your computer.
Here we go!
Bonus: Docker run Avi Shell
In the first section of my article, I mentioned that I wasn’t able to open the Avi Shell from within the controller. When I entered the command shell in order to start it, I received the error:
admin@10-10-70-10:~$ shell
Login: admin
Password:
The controller at ['https://10.10.70.10'] is not active. If this is not the controller address, please restart the shell using the --address flag to specify the address of the controller.
{"error": "Bad service"}
I also tried it with the --address option, but unfortunately with the same result. Interestingly, when you enter shell on the controller, a container spins up in the background and provides the shell.
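You can see this for yourself while a shell session is open in another terminal; the container is based on the avinetworks/cli image that the controller ships (shown in the image listing further below):

# On the controller, while a shell session is open elsewhere
sudo docker ps | grep avinetworks/cli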
This got me thinking: it can come in handy to be able to quickly make the shell available in a container without having to install it on a system.
Therefore, here’s a way to run a container on your system which will host the Avi shell:
Identify the container image on the Avi controller using docker images:
admin@10-10-70-10:~$ sudo docker images
[sudo] password for admin:
REPOSITORY TAG IMAGE ID CREATED SIZE
avinetworks/cli 22.1.5.9093 8fd02452dac3 4 months ago 484MB
avinetworks/controlscript 22.1.5.9093 ecc7d8244a83 7 months ago 72.8MB
avinetworks/grafana influxdb3 4a4cc4a8f5e6 3 years ago 159MB
avinetworks/grafana latest 38e8ff91a07e 4 years ago 246MB
Package the container image into a tar-ball using docker save:
admin@10-10-70-10:~$ sudo docker save avinetworks/cli -o /tmp/avinetworks-cli.tar
admin@10-10-70-10:~$ ls -l /tmp/avinetworks-cli.tar
-rw------- 1 root root 500762624 Feb 28 14:50 /tmp/avinetworks-cli.tar
Give the Avi default user admin ownership of the tarball:
admin@10-10-70-10:~$ sudo chown admin /tmp/avinetworks-cli.tar
admin@10-10-70-10:~/avicli-image$ ls -l
total 489032
-rw------- 1 admin root 500762624 Feb 28 14:45 avinetworks-cli.tar
Download the tar-ball from the Avi controller:
scp admin@10.10.70.10:/tmp/avinetworks-cli.tar ~/Downloads/
Avi Cloud Controller
Avi Networks software, Copyright (C) 2013-2017 by Avi Networks, Inc.
All rights reserved.
Management: 10.10.70.10/24 UP
Gateway: 10.10.70.1 UP
admin@10.10.70.10's password:
avinetworks-cli. 100% 478MB 35.8MB/s 00:13
Import the container image to your system using docker load:
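This last step would look something like the following sketch; the tag matches the earlier docker images output, and the shell --address invocation mirrors what is used on the controller itself, so adjust both if your image or its entrypoint differs:

# Load the saved image into the local Docker engine
docker load -i ~/Downloads/avinetworks-cli.tar

# Run the Avi shell from the container and point it at the controller
# (command/entrypoint assumed; check `docker inspect avinetworks/cli:22.1.5.9093` if the shell doesn't start)
docker run -it --rm avinetworks/cli:22.1.5.9093 shell --address 10.10.70.10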
The Avi shell can be very useful when you'd like to obtain specific information or make adjustments to the existing configuration of the Avi controller without using the GUI.
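For example, once connected you can run read-only commands straight from the containerized shell (command names as found in the Avi CLI guide):

# Inspect objects without touching the GUI
show virtualservice
show serviceengine
show pool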
PowerShell:
# Define the variables
$remoteServer = "admin@10.10.70.10"
$remotePath = "/var/lib/avi/backups/*"
$baseLocalPath = "C:\avibackups"
# Download pscp from: https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html
$pscpPath = "C:\avibackups\pscp.exe"
# Generate a timestamp
$timestamp = Get-Date -Format "yyyy-MM-dd_HH-mm-ss"
# Create a new directory with the timestamp
$newLocalPath = Join-Path -Path ${baseLocalPath} -ChildPath ${timestamp}
New-Item -ItemType Directory -Force -Path ${newLocalPath}
# Optional: Add your SSH key path if you're using one
# $sshKeyPath = "C:\path\to\your\private_key.ppk"
# Download files from remote server into the newly created directory using $sshKeyPath
# & $pscpPath -i ${sshKeyPath} ${remoteServer}:${remotePath} ${newLocalPath}
# Download files from remote server into the newly created directory
& $pscpPath ${remoteServer}:${remotePath} ${newLocalPath}
Bash:
#!/bin/bash
set -euo pipefail
# Define variables
REMOTE_SERVER="admin@10.10.70.10"
REMOTE_PATH="/var/lib/avi/backups/*"
BASE_LOCAL_PATH="/Users/rguske/Downloads/avibackup"

# Generate a timestamp
TIMESTAMP=$(date +"%Y-%m-%d_%H-%M-%S")

# Create a new directory with the timestamp
NEW_LOCAL_PATH="$BASE_LOCAL_PATH/$TIMESTAMP"
mkdir -p "$NEW_LOCAL_PATH"

# Download files from the remote server into the newly created directory
scp -r "${REMOTE_SERVER}:${REMOTE_PATH}" "${NEW_LOCAL_PATH}"