ResNet (Wi-Fi) Outage

Outage category: 
Wireless Network
Location: 
Fairfax Campus Residence Halls
Status: 
Closed
Resolved alert: 
10/25/2021 12:00 am

Approximately 50 students experienced a Wi-Fi interruption or outage. Apogee Engineering noted a common thread among the affected users and identified that the root issue appeared to be on our wireless controller. After emergency maintenance, we continued to monitor and noted that Wi-Fi was not improving because of a bug in the software. Two more maintenance windows were performed to fix the software bug, balance the load on the nodes, and stop the cascade failover described below.

Initial symptoms: 

50+ students on DP03 were not able to consistently connect to ResNet.

Duration: 
10/20/2021 4:01 pm - 10/25/2021 12:00 am
Impact to Mason: 

Approximately 50 students on DP03 had an interruption in internet service.

Affected Services: 
Network Access
ROOT CAUSE ANALYSIS
Cause: 

DP03 was failing, causing loss of service to users. The cluster was in a cascade failure state. The cause was that the HA time was set too low. As DPs added the new VLANs, their control planes became unresponsive for too long, and the cluster would fail over to the next DP in line, restarting the process.
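
The failure loop described above can be illustrated with a minimal sketch. It is not the vendor's HA logic: the function name, the node names other than DP03, and the timing values are hypothetical, chosen only to show why a stall longer than the HA time keeps passing the failure to the next DP.

```python
def simulate_failover(nodes, stall_seconds, ha_time_seconds, max_failovers=10):
    """Return the failover events (from_node, to_node) in order.

    Hypothetical model: each DP that takes over must add the VLANs,
    which stalls its control plane for stall_seconds. If the stall
    exceeds ha_time_seconds, the cluster fails over to the next DP,
    which then hits the same stall.
    """
    events = []
    active = 0
    while len(events) < max_failovers:
        if stall_seconds <= ha_time_seconds:
            break  # stall is tolerated, the cluster stays put
        next_node = (active + 1) % len(nodes)
        events.append((nodes[active], nodes[next_node]))
        active = next_node
    return events


# Hypothetical values: an 8 s stall against a 5 s HA time cascades,
# while the same stall against a 15 s HA time does not.
cluster = ["DP01", "DP02", "DP03"]
cascade = simulate_failover(cluster, stall_seconds=8, ha_time_seconds=5, max_failovers=4)
stable = simulate_failover(cluster, stall_seconds=8, ha_time_seconds=15)
```

In this model, raising the HA time above the longest expected stall is what ends the loop, which matches the prevention step recorded for this outage.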

Resolution: 

DP03 was removed from the cluster and upgraded during the reboot. The software bug was fixed, and DP03 was re-added to the cluster, with an additional maintenance on 10/25 to readjust the load balance among VLANs.

Prevention: 

HA times for the entire cluster have been increased to avoid cascade failovers.

STATISTICS
Service Team: 
Apogee