Outage category: Applications
Location: All users of Red Hat Satellite, NetMRI, and several authentication services. Downstream applications that depend on these services (such as Zoom) may have experienced slowness.
Status: Closed
Resolved Alert:
Initial Symptoms
While work was being done with a Dell technician, the technician rebooted the only working DR Storage Controller.
Root Cause Analysis
Cause
A mistake was made during a fix performed under RFC 273521. Dell had a typographical error regarding which server needed to be repaired, which led the technician to insist on rebooting the working Controller. During the reboot, the previously working Controller encountered errors while loading, and these took time to resolve.
Resolution
Worked with Dell support to replace the faulty part within the Storage Controller, and then to get both DR Controllers upgraded, running, and in sync with one another.
Prevention
Require multiple parties to agree on each step before it is carried out when performing work that involves a single point of dependency.