Outage category: Wired Network, Wireless Network
Location: Merten Hall 3rd floor
Status: Closed
Resolved Alert:
Initial Symptoms
Network & Security Engineering received secondhand reports by email that a cluster of users on the third floor of Merten Hall were experiencing network problems. On investigation, the Network Engineering team observed intermittent, abnormal operation of the local campus edge router.
Root Cause Analysis
Cause
The primary cause was repeated spanning tree topology changes and re-convergence across the different sets of links in the multi-chassis link aggregation groups (MLAGs) connecting the downstream access switches to the router. This behavior is abnormal because each MLAG should operate as a single logical link to its switch. All problems observed were associated with line module 3 in switch chassis 1.
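As an illustration only, repeated topology changes of this kind could be spotted by counting change events per port within a sliding time window. The sketch below assumes the events have already been extracted as (timestamp, port) pairs. The function name, the five-minute window, and the threshold of three events are hypothetical choices for this example, not values taken from the investigation.

```python
from datetime import datetime, timedelta


def flag_flapping_ports(events, window=timedelta(minutes=5), threshold=3):
    """Return ports with at least `threshold` topology changes inside any `window`.

    `events` is an iterable of (timestamp, port) pairs; the port labels are
    whatever the log source provides (hypothetical here).
    """
    # Group event timestamps by port.
    by_port = {}
    for timestamp, port in events:
        by_port.setdefault(port, []).append(timestamp)

    flagged = set()
    for port, stamps in by_port.items():
        stamps.sort()
        start = 0
        for end in range(len(stamps)):
            # Advance the window start until the span fits inside `window`.
            while stamps[end] - stamps[start] > window:
                start += 1
            # Count the events currently inside the window.
            if end - start + 1 >= threshold:
                flagged.add(port)
                break
    return sorted(flagged)
```

A port that is flagged repeatedly would be a candidate for closer inspection. The thresholds would need tuning against what is normal for the network in question.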
Resolution
Network Engineering implemented a workaround by shutting down the MLAG interfaces on line module 3 in switch chassis 1.
Prevention
Based on an earlier failure (18 weeks before this one), the manufacturer has determined that module 3 of switch chassis 1 likely has a hardware defect. The manufacturer will send a replacement unit, to be installed during a later scheduled maintenance window. Following the crash, the manufacturer is also reviewing the crash report to determine whether additional equipment needs to be replaced.
Replace hardware (module 3 of switch chassis 1 and any other replacements recommended by the manufacturer). Schedule a maintenance restart if the replacement hardware is not installed before November 13.