Cyber firms need to centre their own resilience

The Computer Weekly Security Think Tank panel considers incident response in the wake of the July CrowdStrike incident, sharing their views on what CrowdStrike got wrong, what it did right, and next steps

Niel Harper

Published: 04 Sep 2024

Information security is essentially an information risk management discipline. By rendering many information systems inoperable, the global outage precipitated by CrowdStrike caused unplanned and extended downtime that prevented several companies from accessing critical business information.

The outage affected not only information systems but also the information processing that depends on them. It was both an information risk event and an information security incident, and its impact was high from operational, financial, reputational, legal, technological and even regulatory perspectives.

This CrowdStrike security incident most definitely represents a breakdown in digital trust between vendors and their customers. So what needs to be done to restore this trust?

The wrong

Automation of testing is an area where many organisations are routinely failing. For startups especially, it's not uncommon to automate testing internally and then to quickly release updates to remediate any bugs – essentially using their customers as quality assurance (QA).

However, in recent times, more companies are incorporating this practice as part of their 'agile methodology' or to drive CI/CD pipeline efficiency and quickly scale the business.

More and more security companies are promising 'Swiss army knife' solutions whereby they provide automated updates and take over the ongoing maintenance cycles from businesses.

The potential for disaster arises when an update is pushed out automatically after partially or fully automated testing that failed to pick up a defect, resulting in mass outages at businesses in critical sectors and threatening public safety.

The right

From the outset, many companies experiencing a major security incident are being generally transparent with all their stakeholders, including customers, partners, staff, and investors.

CrowdStrike, for example, did a solid job in this respect. They admitted they were at fault, and quickly worked with their own teams and Microsoft’s teams to develop a fix. CrowdStrike executive management also led the charge in reaching out to several customers offering assistance in terms of remediation and recovery. They conducted a thorough root cause analysis (RCA) and have been somewhat transparent in terms of where their security controls failed.

The company has committed to an action plan, which includes enhancements to people, process, and technology. They also seem prepared for increased regulatory intervention and oversight, especially given the incident had a material impact on critical sectors. For example, in the European Union (EU), legislation such as the Digital Operational Resilience Act (DORA), the Network and Information Security (NIS2) Directive, and the Cyber Resilience Act will demand that CrowdStrike provide assurances to lawmakers that such incidents will not happen in the future.

What can we do better?

Going forward, organisations need to put a greater focus on business resilience, specifically around business continuity management (BCM), disaster recovery, third-party risk management (TPRM), and incident management.

Risk monitoring for multiple scenarios, including supply chain issues, is a first step. These scenarios can be added to existing risk registers, business impact analyses (BIAs), and risk and control self-assessments (RCSAs).

Broader risk treatment plans can encompass, but are not limited to, greater scrutiny around product security in third-party risk assessments, more rigorous testing for vendor updates (yes, even updates from endpoint detection tools), disabling of auto-updates where feasible, staggered deployments of vendor updates, updates to incident playbooks and disaster recovery plans to address third-party risks, and the inclusion of risk simulations for third-party security incidents in cybersecurity exercises.
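To make the idea of staggered deployments concrete, here is a minimal Python sketch of a ring-based rollout that stops as soon as too many hosts in a ring fail a health check. It is an illustration only, not a description of how CrowdStrike or any other vendor deploys updates. The names `Ring`, `staged_rollout`, `apply_update` and `health_check`, and the 5% failure threshold, are all assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Ring:
    """A hypothetical deployment ring: a named group of hosts updated together."""
    name: str
    hosts: List[str]


def staged_rollout(
    rings: List[Ring],
    apply_update: Callable[[str], None],
    health_check: Callable[[str], bool],
    max_failure_rate: float = 0.05,  # assumed threshold, tune to risk appetite
) -> dict:
    """Update one ring at a time; halt before later rings if a ring is unhealthy."""
    completed = []
    for ring in rings:
        failures = 0
        for host in ring.hosts:
            apply_update(host)
            if not health_check(host):
                failures += 1
        rate = failures / len(ring.hosts) if ring.hosts else 0.0
        if rate > max_failure_rate:
            # Stop here so the faulty update never reaches the remaining rings.
            return {"status": "halted", "halted_at": ring.name, "completed": completed}
        completed.append(ring.name)
    return {"status": "complete", "halted_at": None, "completed": completed}
```

In practice the first ring would be a small set of non-critical test machines, with business-critical systems placed in the last ring, so that a faulty vendor update is caught before it reaches them.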

Niel Harper is vice chair of the ISACA board of directors.
