COLO Servers Are Unavailable
April 14, 2023 10:20 a.m.
Websites and services on the affected hosts were unavailable.
After the patch was applied to the first COLO host, esxi1, network and NFS storage issues appeared on that host. These were resolved after reviewing the configuration changes made by the patch, but the bug within the patch undid the fixes about 90 minutes later, while the CCSO team was applying the patch to the remaining two ESXi hosts for COLO.
Users of these systems were affected, including Law School students performing registration.
During normal monthly ESXi patching for the COLO environment, per RFC 254730, a patch was applied that was intended to fix a bug in the VMware code, described in https://kb.vmware.com/s/article/88875. The patch did not fix the issue, however, and applying it triggered the bug: the Standard Virtual Switch configuration was changed, undoing the normal configuration. CCSO engineers fixed the configuration (specifically the firewall rules within VMware and the Virtual Switch Port Group settings) and moved on to patching the remaining hosts. The bug then presented itself again and undid those fixes.
CCSO engineers then worked with VMware to resolve this issue.
We will patch a single host first and move on to the others only after that host has been upgraded successfully, and we will do the work in the following order: DR NODUS, Fairfax NODUS, COLO, CUI.
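The rollout rule above can be sketched in code. This is a minimal illustration only, not the CCSO team's actual tooling: the function name `staged_rollout`, the `patch_host` and `host_is_healthy` callbacks, and any host names other than esxi1 are hypothetical. It checks every host before moving to the next, which is slightly stricter than checking only the first host.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Order taken from the report: least critical environments first.
ENVIRONMENT_ORDER = ["DR NODUS", "Fairfax NODUS", "COLO", "CUI"]


def staged_rollout(
    hosts_by_env: Dict[str, List[str]],
    patch_host: Callable[[str], None],        # hypothetical: applies the patch
    host_is_healthy: Callable[[str], bool],   # hypothetical: post-patch check
) -> Tuple[List[str], Optional[str]]:
    """Patch one host at a time, in environment order.

    Returns (verified_hosts, failed_host). The rollout stops at the first
    host that fails its health check, so later hosts are never touched.
    """
    verified: List[str] = []
    for env in ENVIRONMENT_ORDER:
        for host in hosts_by_env.get(env, []):
            patch_host(host)
            if not host_is_healthy(host):
                # Halt here: do not move on to the remaining hosts.
                return verified, host
            verified.append(host)
    return verified, None
```

In this incident the fixes were undone about 90 minutes after they were made, so a real health check would probably need to re-verify the network and NFS storage configuration after a waiting period rather than only immediately after the patch.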