ResNet (Wi-Fi) OutageJanuary 21, 2022 10:09 a.m.
Approximately 50 students had WiFi interruption/outage. Apogee Engineering noted there was a common thread amongst affected users and identified that the root issue appeared to be on our wireless controller. After emergency maintenance we continued to monitor and noted that WiFi was not improving due to a bug in the software . Two more maintenance were done for to fix the bug in the software and to balance the load on the nodes and stop cascade fail over noted below
50+ students on the DP03 were not able to consistently connect to ResNet
Approximately 50 students on dp3 had interruption in internet service.
DP03 was failing causing lose in service to users. The cluster was in a cascade failure stated. Cause was the HA was set too low. As DPs add the new VLANS, their control plane became unresponsive too long and the cluster would fail over to the next DP in line restarting the process.
DP03 was removed from the cluster and upgrade during reboot. Bug in software was fixed and The DP03 was re-added to cluster with an additional maintenance on 10/25 to readjust load balance among VLANS.
Entire cluster HA times have been has increased to avoid cascade fail-overs.