After Action Report - Network Service Interruption

Outage category: 
Wired Network, Wireless Network
Location: 
All Campuses
Status: 
Closed
Resolved alert: 
05/03/2024 9:38 am
Users experienced dropped connections and performance issues when connecting to Internet sites, services, and applications.
Initial symptoms: 
Dropped connections and performance issues when connecting to Internet sites, services, and applications.
Duration: 
05/02/2024 3:30 pm - 05/03/2024 9:38 am
Impact to Mason: 

All network users were impacted.

Affected Services: 
Network Access
Other Affected Services: 
All services accessed via Mason networks were affected, and all users on those networks would have experienced degraded performance.
ROOT CAUSE ANALYSIS
Cause: 
Mason’s Internet router, L3CR-01-A-IR01, received too many Internet routing advertisements from our Internet2 connector, MARIA, during RFC 489141. The volume of routes overwhelmed the router’s operating memory and caused performance problems for all routed traffic.
Resolution: 
The initial workaround was to shut down our connection to MARIA, which removed the excess routes and stopped that connection from attracting traffic. The resolution was to restart the router to restore its operating memory and for MARIA to apply a route filter that limits the number and types of routes it sends to a manageable level.
Prevention: 

Procedurally, ITS will coordinate with our upstream carriers and providers to determine the approximate number of expected routes before a peering configuration is turned up. If the number of routes is too large, we’ll work with the provider to filter routes to a manageable level before turning up the peering. ITS will also request a live call with the provider while the work is being done so that any problems can be caught and addressed more quickly.

Technically, the new network equipment (to be installed this summer) has higher-capacity routing tables that will prevent similar service-impacting problems in the future.

STATISTICS
Service Team: 
NSENG