Friday, November 22, 2024

CrowdStrike update leads to disruption across critical infrastructure environments


A software update from cybersecurity firm CrowdStrike has caused significant outages in Microsoft’s Windows systems, leading to widespread disruptions across the transportation, banking, and healthcare sectors. These systems have faced issues such as Blue Screens of Death (BSOD), sudden shutdowns, and other usability problems due to the faulty update, impacting operations extensively.

The Australian Signals Directorate’s Australian Cyber Security Centre (ASD’s ACSC) said in an alert that it is monitoring the situation and can provide assistance and advice as required. “Organisations or individuals that have been impacted or require assistance can contact us via 1300 CYBER1 (1300 292 371).”

The U.S. Department of Homeland Security (DHS) and the Cybersecurity and Infrastructure Security Agency (CISA) said in a statement on X, formerly Twitter, that they “are working with CrowdStrike, Microsoft and our federal, state, local and critical infrastructure partners to fully assess and address system outages.”

The Federal Aviation Administration (FAA) said in a statement on X that “The FAA is closely monitoring a technical issue impacting IT systems at U.S. airlines. Several airlines have requested FAA assistance with ground stops until the issue is resolved.” 

“We continue to work closely with airlines as they work to resume normal operations. Ground stops and delays will be intermittent at various airports as the airlines work through residual technology issues,” the FAA added. “Currently FAA operations are not impacted by the global IT issue. We continue to monitor the situation closely.”

CrowdStrike said in a statement that it is actively working with customers impacted by a defect found in a single content update for Windows hosts. Mac and Linux hosts are not impacted. “This is not a security incident or cyberattack. The issue has been identified, isolated and a fix has been deployed. We refer customers to the support portal for the latest updates and will continue to provide complete and continuous updates on our website.

“We further recommend organizations ensure they’re communicating with CrowdStrike representatives through official channels,” it added. “Our team is fully mobilized to ensure the security and stability of CrowdStrike customers.”

CrowdStrike Engineering has identified a content deployment related to this issue and reverted those changes.

If hosts are still crashing and unable to stay online to receive the channel file changes, the workaround for individual hosts is to reboot the host to give it a chance to download the reverted channel file. If the host crashes again, boot Windows into Safe Mode or the Windows Recovery Environment; navigate to the ‘%WINDIR%\System32\drivers\CrowdStrike’ directory, locate the file matching ‘C-00000291*.sys’, and delete it. Then, boot the host normally.
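For administrators who script the cleanup, the manual workaround above can be sketched in Python. This is a minimal illustration, not CrowdStrike tooling: the function names and the dry-run default are assumptions made for this example, the file pattern is written as ‘C-00000291*.sys’, and the sketch presumes a Python interpreter is available in the recovery context, which may not be the case in Safe Mode or the Windows Recovery Environment.

```python
import os
from pathlib import Path
from typing import List

# File pattern named in the workaround steps.
CHANNEL_FILE_PATTERN = "C-00000291*.sys"


def default_driver_dir() -> Path:
    """Build %WINDIR%\\System32\\drivers\\CrowdStrike (hypothetical helper)."""
    windir = os.environ.get("WINDIR", r"C:\Windows")
    return Path(windir) / "System32" / "drivers" / "CrowdStrike"


def remove_channel_files(driver_dir: Path, dry_run: bool = True) -> List[Path]:
    """Return the files matching the pattern; delete them unless dry_run is set.

    The dry-run default is an assumption of this sketch, so that nothing is
    deleted until the list of matches has been reviewed.
    """
    matches = sorted(driver_dir.glob(CHANNEL_FILE_PATTERN))
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches
```

Calling `remove_channel_files(default_driver_dir())` only lists the matching files; passing `dry_run=False` deletes them, after which the host can be booted normally.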

CrowdStrike provides two options for public cloud or similar environments, including virtual ones. The first is to detach the operating system disk volume from the impacted virtual server; create a snapshot or backup of the disk volume before proceeding, as a precaution against unintended changes; and attach or mount the volume to a new virtual server. Then, navigate to the ‘%WINDIR%\System32\drivers\CrowdStrike’ directory, locate the file matching ‘C-00000291*.sys’, and delete it; detach the volume from the new virtual server; and reattach the fixed volume to the impacted virtual server.

The second option is to roll back to a snapshot before 0409 UTC.
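For the rollback option, choosing a snapshot amounts to picking the most recent one taken before the cutoff. The following is a minimal Python sketch, assuming snapshot names and timezone-aware creation times have already been pulled from the cloud provider into a dictionary. The function name and the cutoff date of July 19, 2024 are assumptions for illustration, since the guidance quoted here gives only the time.

```python
from datetime import datetime, timezone
from typing import Dict, Optional

# Assumption: the date is July 19, 2024; the quoted guidance states only 0409 UTC.
CUTOFF = datetime(2024, 7, 19, 4, 9, tzinfo=timezone.utc)


def latest_snapshot_before(
    snapshots: Dict[str, datetime], cutoff: datetime = CUTOFF
) -> Optional[str]:
    """Return the name of the newest snapshot taken strictly before cutoff.

    Timestamps must be timezone-aware; returns None when no snapshot qualifies.
    """
    eligible = {name: ts for name, ts in snapshots.items() if ts < cutoff}
    if not eligible:
        return None
    return max(eligible, key=eligible.get)
```

How the snapshot list is obtained and how the rollback itself is performed depend on the cloud provider and are outside the scope of this sketch.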

“CrowdStrike is actively working with customers impacted by a defect found in a single content update for Windows hosts. Mac and Linux hosts are not impacted. This is not a security incident or cyberattack,” George Kurtz, president and CEO of CrowdStrike, wrote in an X message. “The issue has been identified, isolated and a fix has been deployed. We refer customers to the support portal for the latest updates and will continue to provide complete and continuous updates on our website.” 

He added, “We further recommend organizations ensure they’re communicating with CrowdStrike representatives through official channels. Our team is fully mobilized to ensure the security and stability of CrowdStrike customers.”

Researchers from the Institute for Critical Infrastructure Technology told Industrial Cyber that they have outlined the direct impacts of recent events on critical infrastructure. The D.C. Metro system was also impacted, telling ABC News some of its internal systems were down and IT teams are working to resolve the issue. New York City’s mass transit system, the largest in the U.S., says that bus and train operations are not impacted by the global outage, though some MTA customer information systems are temporarily offline.

Additionally, Department of Justice computers were affected by the outage, though there was no indication that it was affecting law enforcement activities in the field. A notice issued by the DOJ Office of the Chief Information Officer said the issue is ‘significant’ and that there is no estimated restoration time. The Canadian Press said that the CrowdStrike Windows outage had disrupted some of its services, including wire content and all audio and photo delivery.

“The global technology incident involving Crowdstrike’s IT shutdown underscores critical lessons in interconnected IT-OT environments. Dependence on single systems magnifies risks, highlighting the need for diversified infrastructure,” Parham Eftekhari, founder and chairman at the Institute for Critical Infrastructure Technology, wrote in an emailed statement. “Swift adaptation to manual processes underscores the importance of robust backup systems and disaster recovery plans.” 

Eftekhari added that “the scale of disruption we are seeing from a single update emphasizes the necessity of thorough testing and staged deployments. This incident exposes vulnerabilities in global interconnectedness, impacting sectors like transportation, healthcare, and government services worldwide. Moving forward, the tech industry must prioritize resilience in cloud infrastructure and update management to mitigate future disruptions and safeguard against potential threats in our increasingly digitized world.”

“Developing a robust patch management strategy in ICS environments is crucial for maintaining security while ensuring system availability and operational integrity,” Greg Valentine, SVP of solutions engineering at Industrial Defender, told Industrial Cyber. “Key practices include conducting regular risk assessments, classifying and prioritizing assets based on their criticality, and thoroughly testing patches in a controlled environment.”

He added that security teams must partner with operations, and OT must collaborate with IT. “Effective patching requires strong collaboration and planning, with all sides of the organization understanding what’s involved in each maintenance window and also what the protocols are for responding to emergency out-of-band vulnerabilities. Various parts of the organization must be involved in understanding the associated security, business, operational, and compliance risks.”

Commenting on the global outage, Omer Grossman, chief information officer (CIO) at CyberArk, wrote in an emailed statement that the damage to business processes at the global level is dramatic. “The glitch is due to a software update of CrowdStrike’s EDR product. This is a product that runs with high privileges that protect endpoints. A malfunction in this can, as we are seeing in the current incident, cause the operating system to crash.”

“News of a global IT outage that has caused problems at airlines, media, and banks is a timely reminder that operational resilience should be at the forefront of the business agenda,” Alan Stephenson-Brown, CEO of Evolve, wrote in an emailed statement. “Demonstrating that even large corporations aren’t immune to IT troubles, this outage highlights the importance of having distributed data centers and rerouting connectivity that ensures business can continue functioning when cloud infrastructure is disrupted.”

Josh Thorngren, security strategist at ForAllSecure, wrote in an emailed statement that “allowing third-party software to self-update in your own environment can save time on testing and verification, but it also means that new bugs or issues are rolled out without any warning or checks and balances. For critical systems like emergency management and aviation – automated updates should be heavily restricted and a good process with human-in-the-loop verification should occur.”

Addressing some of the changes that need to be made, Thorngren said, “Software vendors need to integrate functional regression testing – does my application still work the way it did before this change – into every security or dependency upgrade, not just feature work. That’s the biggest single difference-maker here. If you’re not testing the behavior of your application under expected (and unexpected) conditions with every update – this type of issue will always be a risk.”

“As far as long-term impacts go – hopefully this spurs regulation,” Thorngren noted. “We’ve already seen changes in industries like automotive, medical software, and others – where new regulation and guidance drives software vendors to do functional testing on a more regular cadence – and test the safety and security of their updates hand in hand. It’s time that the same approach comes for broader enterprise software – this isn’t an industry-specific problem. Vendors like CrowdStrike have their software across dozens of critical industries.”
