A post-incident review by CrowdStrike has shed light on the cause of the massive Windows crash that affected 8.5 million machines worldwide. According to the report, the outage was caused by a faulty content update that slipped through because of a bug in the company’s Content Validator tool.
The issue occurred when two new Template Instances were released on July 19. One of them, a file of only about 40KB, contained “problematic data” that nonetheless passed validation. When this update was received by the Falcon Sensor, a kernel-level endpoint security tool designed to detect and block attacks, it triggered an out-of-bounds memory read that crashed the Windows operating system, producing the so-called blue screen of death (BSOD).
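To illustrate the class of bug described here, the following is a minimal Python sketch of a content check and a bounds-checked read. It is not CrowdStrike’s code: `TemplateInstance`, `validate`, and `read_field` are hypothetical names, and the data layout (a table of values plus indices into it) is an assumption made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class TemplateInstance:
    # Hypothetical layout: each entry in field_indices points into `values`.
    name: str
    values: list[str]
    field_indices: list[int]


class ValidationError(Exception):
    """Raised when content refers to data that does not exist."""


def validate(instance: TemplateInstance) -> None:
    """Reject an instance whose indices fall outside its value table."""
    for idx in instance.field_indices:
        if not 0 <= idx < len(instance.values):
            raise ValidationError(
                f"{instance.name}: index {idx} is out of range "
                f"for {len(instance.values)} values"
            )


def read_field(instance: TemplateInstance, idx: int):
    """Bounds-checked read: return None rather than read past the end."""
    if 0 <= idx < len(instance.values):
        return instance.values[idx]
    return None
```

In this sketch the two checks are independent: even if a bad instance slipped past `validate`, `read_field` would return `None` instead of reading memory it does not own. In kernel-level code written in a language without automatic bounds checks, an unchecked read of this kind can bring down the whole operating system.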
The incident had far-reaching consequences, impacting companies worldwide, including airlines, broadcasters, and financial institutions. Many organizations, such as Delta Air Lines, are still recovering from the outage, which forced Windows machines into a boot loop and required local access to each affected machine to recover.
To prevent similar incidents in the future, CrowdStrike has promised to implement a series of new measures, including more thorough testing of Rapid Response Content, additional validation checks, and enhanced error handling. The company will also adopt a staggered deployment strategy for Rapid Response Content, give customers more control over the delivery of updates, and provide release notes for each update.
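A staggered deployment can be pictured as releasing an update to a small group of machines first and widening the release only if that group stays healthy. The Python sketch below shows the general idea under stated assumptions: `staged_rollout`, the ring names, and the 1% failure threshold are all hypothetical and do not describe CrowdStrike’s actual rollout process.

```python
from typing import Callable, Sequence


def staged_rollout(
    rings: Sequence[Sequence[str]],
    deploy: Callable[[str], bool],
    max_failure_rate: float = 0.01,  # assumed threshold, for illustration only
) -> list[str]:
    """Deploy ring by ring and return every host that received the update.

    `deploy` returns True if the host is healthy after updating. If the
    failure rate within a ring exceeds the threshold, later rings are
    never touched.
    """
    updated: list[str] = []
    for ring in rings:
        failures = 0
        for host in ring:
            healthy = deploy(host)
            updated.append(host)
            if not healthy:
                failures += 1
        if ring and failures / len(ring) > max_failure_rate:
            break  # halt the rollout before it reaches the next ring
    return updated
```

With rings such as `[["canary-1", "canary-2"], ["early-1"], ["broad-1", "broad-2"]]`, an update that crashes the canary hosts would stop there, leaving the remaining three machines on the previous version.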
However, some experts have questioned why these measures were not in place from the start. “CrowdStrike should have been aware of the potential risks associated with these updates,” said engineer Florian Roth. “Implementing a staggered deployment strategy from the beginning would have prevented this global outage.”