Network Outage – Merten Hall 3rd Floor
October 04, 2022, 3:29 p.m.
Degraded performance (intermittent slowness and interruptions) when accessing websites or using network-dependent applications. The network was unavailable for about 10 minutes after a system crash and reboot near the end of the incident.
Network & Security Engineering received secondhand reports, via email, of a cluster of users on the third floor of Merten Hall experiencing problems. On investigation, the Network Engineering team observed intermittent, abnormal operation of the local campus edge router.
All users connected to the Blue Ridge Core Campus Edge Router were affected, primarily users in Merten and Peterson Halls.
The primary cause was repeated spanning tree topology changes and re-convergence across the different sets of links in the multi-chassis link aggregation groups (MLAGs) connecting the downstream access switches to the router. This is abnormal, since each MLAG should operate as a single logical link to its access switch. All problems observed were associated with line module 3 in switch chassis 1.
Network Engineering worked around the problem by shutting down the MLAG interfaces on line module 3 of switch chassis 1.
Based on an earlier failure (18 weeks before this incident), the manufacturer has determined that there is likely a hardware defect in module 3 of switch chassis 1. They will send a replacement unit to be installed during a later scheduled maintenance window. Following the crash, the manufacturer is also reviewing the crash report to determine whether additional equipment needs to be replaced.
Replace hardware (module 3 of switch chassis 1 and any other components the manufacturer recommends replacing). If the replacement hardware is not installed before November 13, schedule a maintenance restart.