VPN Outage

Outage category: 
VPN
Location: 
All VPN Users
Status: 
Closed
Resolved alert: 
10/08/2020 10:30 am

Users were unable to log in to the VPN or CUI between 10:17 and 10:30 AM on October 8, 2020.

Initial symptoms: 

Users were unable to log in to the VPN.

Duration: 
10/08/2020 10:17 am - 10/08/2020 10:30 am
Impact to Mason: 

Users were unable to log in to the VPN or CUI for approximately 13 minutes.

Affected Services: 
Virtual Private Network (VPN)
Other Affected Services: 
duoproxy.gmu.edu vpn.gmu.edu CUI
ROOT CAUSE ANALYSIS
Cause: 

A new RADIUS client was added to one of the Duo proxy servers. When the service was reloaded, an error was reported due to an issue with the client. That error was corrected and the process was reloaded again, this time with no errors reported.

Had we checked at this point, we would have seen that the duoproxy process was no longer running, but we did not check.

Because the reload appeared to succeed, the new RADIUS client was added to the second Duo proxy server. The process was reloaded, again with no errors reported.

At this point, neither Duo proxy server had a running Duo proxy process, so the service was down.

Resolution: 

NET notified us of a service outage. The new client information was removed and the processes were restarted on both Duo proxy servers.

Prevention: 

After making any change, we will verify that the Duo proxy service is actually running, since we now know that in some scenarios it fails silently.
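
As an illustration only, a post-change check along these lines could be scripted. The sketch below is not our actual tooling. It assumes the proxy runs as a systemd unit (the unit name would be passed in by the caller and is hypothetical here), and the function names reload_and_verify and roll_out are made up for this example. The idea is that a reload counts as successful only if the process is still running across several checks, and that a rollout stops at the first server that fails so the remaining servers keep serving.

```python
import subprocess
import time
from typing import Callable, Iterable, List


def systemd_unit_is_active(unit: str) -> bool:
    """Return True if systemd reports the unit as active.

    Assumes a systemd host; the unit name is supplied by the caller.
    """
    result = subprocess.run(
        ["systemctl", "is-active", "--quiet", unit], check=False
    )
    return result.returncode == 0


def reload_and_verify(
    reload: Callable[[], None],
    is_running: Callable[[], bool],
    checks: int = 3,
    delay_seconds: float = 2.0,
) -> bool:
    """Reload the service, then confirm it stays running.

    A reload that prints no errors is not treated as success. The
    process must be observed running on every one of `checks` probes.
    """
    reload()
    for i in range(checks):
        if i > 0:
            time.sleep(delay_seconds)
        if not is_running():
            return False
    return True


def roll_out(
    servers: Iterable[str],
    change_and_verify: Callable[[str], bool],
) -> List[str]:
    """Apply a change one server at a time, stopping at the first failure.

    Returns the servers that were changed and verified. Servers after a
    failure are left untouched.
    """
    verified: List[str] = []
    for server in servers:
        if not change_and_verify(server):
            break
        verified.append(server)
    return verified
```

With a gate like this, a failed verification on the first server would have stopped the change before it reached the second one. A service that is slow to start would need a short wait before the first probe, which this sketch does not include.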

We are upgrading the software, which may fix the silent-failure bug.

Longer term, we are moving the Duo proxy processes to run in Docker containers in Kubernetes. All changes will go through a CI/CD pipeline that performs automated testing, including acceptance testing to ensure that a misconfigured client definition that prevents the Duo proxy process from running can never be promoted to production.
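
One possible shape for such an acceptance check is sketched below. It is a minimal example under stated assumptions, not the pipeline itself. It assumes the proxy can be started as a foreground process with a candidate configuration; the actual start command is not shown and would be supplied by the pipeline, and the function name stays_running is hypothetical. The check passes only if the process is still alive after a settle window, so a process that exits quietly, even with a zero exit code, fails the test.

```python
import subprocess
from typing import Sequence


def stays_running(command: Sequence[str], settle_seconds: float = 5.0) -> bool:
    """Start `command` and report whether it is still alive after a settle window.

    Returns False if the process exits within `settle_seconds`, whatever
    its exit code, because a quiet exit is exactly the failure to catch.
    The process is terminated before returning if it is still running.
    """
    proc = subprocess.Popen(
        command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        proc.wait(timeout=settle_seconds)
    except subprocess.TimeoutExpired:
        return True  # still running after the settle window
    else:
        return False  # exited on its own: treat as a failed start
    finally:
        if proc.poll() is None:
            proc.terminate()
            proc.wait()
```

A pipeline stage would call this with the command that launches the proxy against the candidate configuration and block promotion when it returns False.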

STATISTICS
Service Team: 
Network & Security Operations and Advanced Technology