Outage category: Website
Location: All Users
Status: Closed
Resolved Alert:
Initial Symptoms
Users were not able to access the website to do their daily health checks.
Root Cause Analysis
Cause
An issue with SQL caused CPU and memory utilization to spike on the SQL database back end. Once utilization spiked, user sessions froze and users were unable to use the service.
Resolution
The maximum number of sessions was limited from “unlimited” to 600, RAM was increased from 20 GB to 32 GB, the number of CPUs was increased from 4 to 8, and the memory that SQL can use was capped at 28 GB.
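As a rough sanity check on the new limits, the arithmetic can be sketched in Python. The figures come from the Resolution; the variable and function names are illustrative only, and the per-session number assumes memory is shared evenly across sessions, which is a simplification and not a measured value.

```python
# Values taken from the Resolution; all names here are illustrative.
TOTAL_RAM_GB = 32        # server RAM after the upgrade (was 20 GB)
SQL_MEMORY_CAP_GB = 28   # maximum memory SQL is allowed to use
MAX_SESSIONS = 600       # session limit (was unlimited)


def os_headroom_gb(total_ram_gb, sql_cap_gb):
    """RAM left for the operating system and other processes."""
    if sql_cap_gb > total_ram_gb:
        raise ValueError("SQL memory cap exceeds installed RAM")
    return total_ram_gb - sql_cap_gb


def memory_per_session_mb(sql_cap_gb, max_sessions):
    """Average SQL memory per session when every session slot is in use."""
    return sql_cap_gb * 1024 / max_sessions


headroom = os_headroom_gb(TOTAL_RAM_GB, SQL_MEMORY_CAP_GB)
per_session = memory_per_session_mb(SQL_MEMORY_CAP_GB, MAX_SESSIONS)
print(f"Headroom outside SQL: {headroom} GB")
print(f"Average memory per session at the limit: {per_session:.1f} MB")
```

With these settings, 4 GB of RAM remains outside the SQL memory cap, and the session limit bounds how thinly the capped memory can be spread.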
Prevention
The Service Team is currently investigating the SQL responsible for the CPU and memory utilization spikes. Limiting the maximum number of sessions, providing more resources, and capping the resources that SQL can use should help prevent this issue from recurring.