VxRAIL upgrade COMPLETED_WITH_FAILURE
Updated: Dec 7, 2020
I was recently involved in a VCF on VxRAIL environment upgrade from 184.108.40.206 to 220.127.116.11. As part of that upgrade path, a VxRAIL upgrade from version 4.7.511-26539430 to 4.7.515-26640584 was included. The upgrades were initiated from the SDDC Manager.
The management workload domain (MGMT WLD) upgraded without issue, but the upgrade of the VI domain failed, even though the precheck had passed in SDDC Manager. We also noticed unresponsive behaviour in the VxRAIL Manager plugin.
Step 1: Clicking the Updates button in the VxRAIL Manager plugin shows more detail about what the upgrade is doing at each stage.
Step 2: After downloading the VxRAIL support bundle from vCenter, you can locate lcm.log, which contains the lifecycle manager logs for the cluster.
There were a few errors under the lcm.log vSAN sections for the hosts listed above, but further investigation showed that the vSAN cluster was working just fine with no errors.
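A quick way to pull the interesting lines out of lcm.log is a case-insensitive grep. The following is only a sketch: the function name scan_lcm_log and the search pattern are illustrative assumptions, and the path to the log depends on where you extract the support bundle.

```shell
#!/bin/sh
# Hypothetical helper: print numbered lines from a log file that
# mention an error or a failure.
# Usage: scan_lcm_log /path/to/extracted/lcm.log
scan_lcm_log() {
    log_file="$1"
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 1
    fi
    # -n prefixes each match with its line number, -i ignores case,
    # -E enables the alternation in the pattern.
    grep -n -i -E 'error|fail' "$log_file"
}
```

The line numbers make it easier to jump to the surrounding context in the full log, which matters here because some of the reported vSAN errors turned out not to reflect a real problem.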
Step 3: The next step was verifying the upgrade process state in the /var/lib/vmware-marvin/bundle_state.json file, which showed an error:
Step 4: We took a snapshot of the VxRAIL Manager VM, then used vi to change the upgrade state in that file from "UPGRADE_ERROR" to "NONE":
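If you prefer a scripted edit over vi, something like the sketch below could work. It is an assumption, not what we ran: the function name reset_upgrade_state is hypothetical, and it replaces every occurrence of the quoted string "UPGRADE_ERROR" in the file, so inspect the file first to confirm that is what you want. It keeps a .bak copy next to the original, which complements the VM snapshot rather than replacing it.

```shell
#!/bin/sh
# Hypothetical helper: back up a state file, then replace the quoted
# value "UPGRADE_ERROR" with "NONE" wherever it appears.
# Usage: reset_upgrade_state /var/lib/vmware-marvin/bundle_state.json
reset_upgrade_state() {
    state_file="$1"
    # Refuse to touch the file if the error state is not present.
    if ! grep -q '"UPGRADE_ERROR"' "$state_file"; then
        echo "no UPGRADE_ERROR state found in $state_file" >&2
        return 1
    fi
    # Keep a copy of the original, preserving mode and timestamps.
    cp -p "$state_file" "$state_file.bak" || return 1
    # Rewrite the original in place from the backup copy.
    sed 's/"UPGRADE_ERROR"/"NONE"/g' "$state_file.bak" > "$state_file"
}
```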
Step 5: After this, we verified the upgrade status, current version and upgrade version in VxRAIL Manager:
mystic@vxm01:> psql -U postgres mysticmanager -c "select id, upgrade_status, current_version, upgrade_version from virtual_appliance;"
The output shows that the latest release, 4.7.515-26640584, should be in the "id 1" slot, which marks it as the next patch in the upgrade path.
Step 6: We can manually set the upgrade status to HAS_NEWER by running the following SQL statement, which should report one updated row:
update virtual_appliance set upgrade_status='HAS_NEWER' where id=1;
UPDATE 1
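The same statement can also be passed to psql non-interactively, in the same way as the select in Step 5. The sketch below reuses the user, database and table names shown above; the function name set_has_newer and the PSQL override variable are hypothetical and only there so the command can be dry-run against a stub.

```shell
#!/bin/sh
# Hypothetical helper: mark one virtual_appliance row as HAS_NEWER.
# Usage: set_has_newer 1
set_has_newer() {
    row_id="$1"
    # Accept only a plain non-negative integer as the row id.
    case "$row_id" in
        ''|*[!0-9]*) echo "row id must be a number" >&2; return 1 ;;
    esac
    # PSQL may point at a stub for a dry run; it defaults to psql.
    psql_cmd="${PSQL:-psql}"
    "$psql_cmd" -U postgres mysticmanager -c \
        "update virtual_appliance set upgrade_status='HAS_NEWER' where id=${row_id};"
}
```

Rerunning the select from Step 5 afterwards confirms that the row now shows the expected status.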
Step 7: Then restart both the runjars and vmware-marvin services:
systemctl restart runjars (Updates information in the web interface)
systemctl restart vmware-marvin (Updates marvindb)
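To confirm that both services actually came back, the restarts can be wrapped with a status check. This is a minimal sketch rather than the exact procedure we followed: the function name restart_vxm_services and the SYSTEMCTL override variable are hypothetical, while restart and is-active --quiet are standard systemctl subcommands.

```shell
#!/bin/sh
# Hypothetical helper: restart the two services in the order used
# above, then confirm each one reports as active.
restart_vxm_services() {
    # SYSTEMCTL may point at a stub for a dry run; it defaults to systemctl.
    ctl="${SYSTEMCTL:-systemctl}"
    for svc in runjars vmware-marvin; do
        "$ctl" restart "$svc" || { echo "restart of $svc failed" >&2; return 1; }
        # is-active exits non-zero if the unit is not running.
        "$ctl" is-active --quiet "$svc" || { echo "$svc is not active" >&2; return 1; }
    done
}
```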
Step 8: After this, we reran the upgrade from SDDC Manager, and this time it completed successfully.
Do not forget to delete any old snapshots in the environment, and be sure to open a support case with Dell Support if your systems are in a data loss or data inaccessible state.