Zoom Outage – August 24th

Outage category: 
Web Conferencing
Location: 
All Locations
Status: 
Closed

Users were unable to start or join meetings, primarily when launching via web browsers.

Initial symptoms: 

Users were unable to access the Zoom web portal to start or join meetings and webinars, and were instead directed to billing access errors.

Duration: 
08/24/2020 8:43 am
Impact to Mason: 

Users were unable to start or join meetings, primarily when launching via web browsers.

Affected Services: 
Zoom
ROOT CAUSE ANALYSIS
Cause: 

(From Zoom Communications)

To scale globally, Zoom has developed a geographically distributed platform for delivery of services, which is housed across Zoom’s 19 data centers and other well-established cloud service providers. Zoom periodically performs updates within this infrastructure to resolve existing issues and introduce new features and enhancements to services. This infrastructure also integrates with third party providers as needed, for services such as billing and subscription management.

On August 23, during a planned change control window, Zoom deployed a web backend update to introduce additional features and bug fixes to Zoom’s services. This update introduced an inconsistency in the code, which triggered calls to our third party billing system each time a user logs in through the web. Since the code was deployed over the weekend, the concurrency of logins was low and the update passed the post deployment verification. As the traffic started increasing early in the morning of August 24, our third party billing system was unable to handle the increased volume of requests. The billing system is integrated with our Zoom web portal for signed-in web users to start or join meetings. As a result, a subset of signed-in users was unable to access the Zoom web portal or join meetings and webinars.

Resolution: 

(From Zoom Communications)

Zoom’s engineering team deployed an emergency hotfix to roll back this change in the code and initially bypass the billing system, which allowed signed-in users to access the Zoom web portal to join meetings and webinars. Subsequently, with the normalized call requests to the billing system, we were also able to restore customer access to the billing page for subscription updates.
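
As an illustration only, the bypass described above can be sketched as a sign-in path that treats the billing lookup as optional rather than as a precondition. This is not Zoom's code: the names `sign_in`, `billing_lookup`, `bypass_billing`, and `BillingUnavailable` are hypothetical and chosen for the example.

```python
class BillingUnavailable(Exception):
    """Raised by the hypothetical billing client when the provider cannot respond."""


def sign_in(user_id, billing_lookup, bypass_billing=False):
    """Return a session dict; billing status is optional, not a precondition."""
    session = {"user": user_id, "signed_in": True, "billing": None}
    if bypass_billing:
        # Hotfix-style bypass: skip the third-party call entirely.
        return session
    try:
        session["billing"] = billing_lookup(user_id)
    except BillingUnavailable:
        # Degrade gracefully: the user can still start or join meetings.
        pass
    return session


def overloaded_billing(user_id):
    """Stand-in for a billing provider that cannot keep up with requests."""
    raise BillingUnavailable("provider overloaded")


calls = []


def counting_billing(user_id):
    """Stand-in for a healthy provider; records every call it receives."""
    calls.append(user_id)
    return "active"


degraded = sign_in("alice", overloaded_billing)
bypassed = sign_in("bob", counting_billing, bypass_billing=True)
calls_after_bypass = list(calls)
normal = sign_in("carol", counting_billing)
```

In this sketch a billing failure leaves `billing` unset but does not block sign-in, and the bypass flag prevents any request from reaching the provider, which is what lets request volume return to normal before billing access is restored.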

Prevention: 

(From Zoom Communications)

Zoom is performing a thorough analysis of our engineering process to uncover any unexpected gaps between our change and implementation requirements to prevent this from happening again. Zoom is also looking to further improve the functionality testing along with performance testing for any web changes that could impact our overall service. Zoom will investigate if we can further improve our monitoring process to identify surges in third party calls for early warnings.
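
One way to picture the early-warning monitoring mentioned above is a sliding-window counter on outbound third-party calls. The following is a minimal sketch under assumed values, not a description of Zoom's monitoring: the class name `CallRateMonitor`, the 60-second window, and the threshold of 3 are hypothetical.

```python
from collections import deque


class CallRateMonitor:
    """Counts calls to an external service within a sliding time window."""

    def __init__(self, window_seconds, threshold):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._timestamps = deque()

    def record_call(self, now):
        """Record one call at time `now` (seconds); return True if the
        number of calls in the window exceeds the threshold."""
        self._timestamps.append(now)
        # Drop calls that have fallen out of the window.
        while self._timestamps and self._timestamps[0] <= now - self.window_seconds:
            self._timestamps.popleft()
        return len(self._timestamps) > self.threshold


# Hypothetical values: warn when more than 3 calls occur within 60 seconds.
monitor = CallRateMonitor(window_seconds=60, threshold=3)
alerts = [monitor.record_call(t) for t in (0, 10, 20, 30, 100)]
```

Here the fourth call arrives while the first three are still inside the window, so it triggers a warning; by the fifth call the earlier ones have expired and the warning clears. A check like this alerts on call volume itself, so a surge can be caught as traffic ramps up rather than after the provider fails.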

STATISTICS
Service Team: 
Enterprise Collaboration