50+ students on DP03 were unable to connect to ResNet consistently.
Approximately 50 students on DP03 experienced interruptions in internet service.
DP03 was failing, causing a loss of service to users. The cluster was in a cascading-failure state. The cause was that the HA timers were set too low: as the DPs added the new VLANs, their control planes stayed unresponsive for too long, so the cluster failed over to the next DP in line, which then restarted the process.
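The failover loop described in the cause can be illustrated with a toy simulation. This is a minimal sketch, not the cluster's actual HA logic: the DP names other than DP03, the stall and timeout values, and the assumption that every DP stalls for the same time while adding the VLANs are all hypothetical.

```python
def simulate_failover(dps, vlan_stall_s, ha_timeout_s, max_rounds=10):
    """Toy model of HA failover (hypothetical, for illustration only).

    dps: DP names in failover order.
    vlan_stall_s: assumed seconds a newly active DP's control plane is
        unresponsive while it adds the new VLANs.
    ha_timeout_s: seconds the cluster waits before declaring the DP dead.
    Returns (list of DPs that became active, whether the cluster settled).
    """
    history = []
    index = 0
    for _ in range(max_rounds):
        active = dps[index % len(dps)]
        history.append(active)
        if vlan_stall_s <= ha_timeout_s:
            # Control plane responds again before the HA timer fires: stable.
            return history, True
        # Stall outlasts the HA timer: fail over to the next DP in line,
        # which starts adding the VLANs again and stalls the same way.
        index += 1
    return history, False


dps = ["DP01", "DP02", "DP03"]  # hypothetical cluster membership
print(simulate_failover(dps, vlan_stall_s=8.0, ha_timeout_s=3.0))
print(simulate_failover(dps, vlan_stall_s=8.0, ha_timeout_s=15.0))
```

With the stall longer than the HA timeout, the active role keeps rotating through every DP and never settles, which matches the cascading failure above. Once the timeout exceeds the stall, the first DP stays active, which is the effect the increased HA timers are meant to have.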
DP03 was removed from the cluster and upgraded during a reboot, which fixed a software bug. DP03 was then re-added to the cluster, with an additional maintenance window on 10/25 to readjust load balancing among the VLANs.
HA timers across the entire cluster have been increased to prevent cascading failovers.