Network Outage – Merten Hall 3rd Floor

Outage category: 
Wired Network, Wireless Network
Location: 
Merten Hall 3rd floor
Status: 
Closed
Resolved alert: 
09/28/2022 3:27 pm

Users experienced degraded performance (intermittent slowness and interruptions) when accessing websites or using network-dependent applications. The network was unavailable for about 10 minutes after a system crash and reboot near the end of the incident.

Initial symptoms: 

Network & Security Engineering received secondhand reports via email of a cluster of users experiencing problems on the third floor of Merten Hall. On investigation, the Network Engineering team observed intermittent, abnormal operation of the local campus edge router.

Duration: 
09/28/2022 3:25 pm - 09/28/2022 3:27 pm
Impact to Mason: 

All users connected to the Blue Ridge Core Campus Edge Router were affected, primarily users in Merten and Peterson Halls.

Affected Services: 
Network Access
Other Affected Services: 
All network-dependent services were impacted (initially with degraded performance, then unavailable for the duration of the system crash and reboot near the end of the incident).
ROOT CAUSE ANALYSIS
Cause: 

The primary cause was spanning tree topology changes and re-convergence across the different sets of links in the multi-chassis link aggregation groups (MLAGs) connecting the downstream access switches to the router. This is abnormal operation, because each MLAG should operate as a single logical link to its switch. All problems observed were associated with line module 3 in switch chassis 1.
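
As an illustration only, the kind of instability described above could be flagged by counting spanning tree topology change events in a sliding time window. The sketch below is not the tooling used during this incident. The function name, the 5-minute window, and the threshold of 3 events are all hypothetical, and it assumes event timestamps have already been collected from the switches by some other means.

```python
from datetime import datetime, timedelta

def exceeds_change_rate(events, window=timedelta(minutes=5), threshold=3):
    """Return True if more than `threshold` events fall inside any window.

    `events` is a list of datetime objects, one per observed topology
    change (hypothetical input; how they are collected is out of scope).
    """
    events = sorted(events)
    start = 0
    for end, ts in enumerate(events):
        # Advance the left edge until the window spans at most `window`.
        while ts - events[start] > window:
            start += 1
        if end - start + 1 > threshold:
            return True
    return False

if __name__ == "__main__":
    base = datetime(2022, 9, 28, 15, 0)
    burst = [base + timedelta(minutes=m) for m in (0, 1, 2, 3)]
    print(exceeds_change_rate(burst))
```

A stable MLAG would normally produce few or no such events, so any threshold chosen in practice would depend on the baseline for the site.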

Resolution: 

Network Engineering created a workaround by shutting down the MLAG interfaces in line module 3 in switch chassis 1.

Prevention: 

Based on an earlier failure (18 weeks before this incident), the manufacturer has determined that there is likely a hardware defect in module 3 of switch chassis 1. They will send a replacement unit to be installed during a later scheduled maintenance window. Following the crash, the manufacturer is also reviewing the crash report to determine whether additional equipment needs replacement.

Replace hardware (module 3 of switch chassis 1 and any other replacements recommended by the manufacturer). Schedule a maintenance restart if the replacement hardware is not installed before November 13.
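
For reference, the dates mentioned above can be worked out with simple date arithmetic. This is a rough sketch under two assumptions: the "18 weeks" is counted back from the 09/28/2022 resolution date, and the November 13 deadline falls in 2022 (the year is not stated in this report).

```python
from datetime import date, timedelta

incident = date(2022, 9, 28)                      # resolution date from this report
earlier_failure = incident - timedelta(weeks=18)  # approximate date of the earlier failure
deadline = date(2022, 11, 13)                     # assumption: the deadline is in 2022
days_to_deadline = (deadline - incident).days

print(earlier_failure.isoformat())  # 2022-05-25
print(days_to_deadline)             # 46
```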

STATISTICS
Service Team: 
NSENG and NSOPS