Zoom Outage

Outage category: 
Applications
Location: 
All Zoom Users
Status: 
Closed
Resolved alert: 
04/18/2022 9:37 am

Affected users saw a Mason-branded ADFS page that stated:
An error occurred
You are not authorized to access this site. Click here to sign out and sign in again or contact your administrator for permissions.

Initial symptoms: 

Two separate users emailed ecinfo@gmu.edu with similar login issues. EC verified the incident and notified the ITS Active Directory Engineer. The ITS Support Center also notified EC of reports of Mason Zoom login issues when using SSO.

Duration: 
04/18/2022 9:00 am - 04/18/2022 9:37 am
Impact to Mason: 

Any new login to Mason Zoom using SSO received an error stating that the user was not authorized to access the site, and these users could not finish logging in to Zoom. Guests could still join a Zoom meeting unless the meeting required Mason authentication for entry. Applications and web browsers that were already logged in were not affected because they had cached logins.

Affected Services: 
Zoom, Conferencing, Collaboration & Calling
ROOT CAUSE ANALYSIS
Cause: 

ADFS allows ad-hoc changes to the Access Control rules, which control access and 2FA. ITS developed the updated policy using our testing application. When ITS changes the access control rules, the changes go live almost immediately; this has always been the case when changing access control rules. Once the new rules were fully tested and verified, we scheduled the update to Zoom for the 15th via ITS RFC # 232822. The change should have been seamless, as it has been for every ADFS trust.

On April 15th the rules were updated, and CCSE and EC completed independent testing to ensure everything was working as expected. At that time the AD groups were not removed, so that if there was a problem with the new rules it would be easy to roll back to the old rules. On the morning of the 16th the Duo AD groups were deleted, which caused the Zoom ADFS trust to deny all users. Within minutes of the issue being reported to CCSE, the AD groups were restored and service was restored.
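
The sequence above suggests a pre-deletion check: before removing AD groups, confirm that no trust's access control rules still depend on them. The following is a minimal sketch in Python, assuming the denial arose because rules in effect on the Zoom trust still referenced the deleted groups. The snapshot dictionary, the group names, and the function name are hypothetical and for illustration only; they are not ITS tooling or actual Mason group names.

```python
# Hypothetical snapshot: relying party trust name -> AD groups that its
# access control rules reference. All names here are illustrative only.
TRUST_GROUP_REFERENCES = {
    "Zoom": {"duo-group-a", "duo-group-b"},
    "Claims X-Ray": set(),
}


def trusts_blocking_deletion(references, groups_to_delete):
    """Return {trust: sorted groups} for each trust whose rules still
    reference at least one group that is slated for deletion."""
    doomed = set(groups_to_delete)
    blocking = {}
    for trust, groups in references.items():
        still_used = groups & doomed
        if still_used:
            blocking[trust] = sorted(still_used)
    return blocking


if __name__ == "__main__":
    blocking = trusts_blocking_deletion(
        TRUST_GROUP_REFERENCES, ["duo-group-a", "duo-group-b"]
    )
    if blocking:
        for trust, groups in blocking.items():
            print(f"Do not delete yet: {trust} still references {', '.join(groups)}")
    else:
        print("No trust references these groups.")
```

An empty result only shows that the snapshot contains no references, so the check is only as reliable as the snapshot it is given.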

Resolution: 

Within minutes of the issue being reported to CCSE, the AD groups were restored and service was fully functional.

Prevention: 

ITS will attempt to make changes to access control for Zoom after hours.

ITS completed the following preventative measures prior to the outage and will continue to complete them:
ITS will confirm any new rules in place for the Zoom trust via the GUI and also through PowerShell commands.
ITS will confirm that there is no replication issue between ADFS servers, in which the primary server did not replicate out to the secondary server.
Testing will be done in the Production ADFS environment with a Microsoft trust called Claims X-Ray, which is specifically used for testing access control rules and claims. Development and testing will be done against the Claims X-Ray trust without affecting other trusts.
Once testing is complete, ITS will copy over or apply the Access Control rules to a production trust.
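
The replication check between the primary and secondary ADFS servers can be illustrated with a small comparison. This is only a sketch in Python, assuming the rule text for each trust has already been exported from both servers into dictionaries (for example, from the output of the PowerShell commands mentioned above). The function names and the placeholder rule strings are hypothetical and do not represent real ADFS rule syntax.

```python
import hashlib


def rule_fingerprint(rule_text):
    """Hash the rule text after collapsing whitespace, so formatting
    differences alone do not register as a mismatch."""
    normalized = " ".join(rule_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def unreplicated_trusts(primary, secondary):
    """Return sorted names of trusts whose rule text on the secondary
    server is missing or differs from the primary server."""
    mismatched = []
    for trust, rule_text in primary.items():
        other = secondary.get(trust)
        if other is None or rule_fingerprint(other) != rule_fingerprint(rule_text):
            mismatched.append(trust)
    return sorted(mismatched)


if __name__ == "__main__":
    # Placeholder exports; real rule text would come from each server.
    primary = {"Zoom": "placeholder rule v2", "Claims X-Ray": "placeholder rule v2"}
    secondary = {"Zoom": "placeholder rule v1", "Claims X-Ray": "placeholder  rule v2"}
    print(unreplicated_trusts(primary, secondary))
```

Comparing fingerprints rather than raw text is a design choice here so that whitespace-only differences between exports are ignored.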

STATISTICS
Service Team: 
Enterprise Collaboration, Cloud Computing and Storage Engineering