3 things industrial control system enterprises should do to boost cyber-resilience

Here's an action plan for industrial control system cyber-resilience

Published: 11/04/2024 12:25 PM


Enterprises in smart critical infrastructure-driven sectors such as manufacturing, energy, water, and transportation (among many others) rely upon the cyber-resilience of industrial control system (ICS) infrastructure to sustain business continuity. Business continuity here not only encompasses the feature of non-disruptive minimal service quality to customers but also the feature of ensuring the safety properties of that service. As Brian Deken, Business Development Manager of ICS giant Rockwell Automation, put it: "As a citizen, I'd like to know whether my drinking water is safe or whether a cyber-attack is affecting it or could possibly affect it". Imagine an event in a smart city with about a million people accessing maliciously targeted non-potable drinking water. How much could this event negatively affect society's economic, health, and lifestyle welfare? Is there a strategy by which the management in such enterprises can maintain business continuity in the event of inevitable cyber-attacks to mitigate these repercussions?

In this article, we provide a brief overview of how ICSs operate, their security challenges, and examples of cyber-attacks that have significant adverse repercussions on society. Subsequently, we provide a three-point action plan for ICS management to boost cyber resilience and maintain business continuity in the event of inevitable cyber-attacks.

An Overview of ICS Operation and its Security Challenges

Smart critical infrastructure usually consists of operational technology (OT) devices, including legacy and proprietary mechanical gadgets (e.g., controllers, chillers, elevators) fitted with sensors and actuators. These are networked with each other via OT-specific wired or mobile/wireless technology (e.g., Modbus, CC-Link, and so on), where communication between interoperable gadgets takes place via custom software drivers. This OT network (e.g., spanning an industrial plant floor) is then managed via distributed and/or centralised IT and OT control using HMIs, application software, cloud computing, and human-driven dashboards. This popular IT/OT convergence in ICSs governs business process management in such systems.

The increased connectivity and interoperability with IT/OT convergence via connecting OT systems, networks, and applications to enterprise IT amplifies the cybersecurity attack surface. In other words, the recent IT/OT convergence has minimised the traditional air gap between the IT and OT parts of an ICS enterprise that was the cornerstone of ICS cybersecurity. Legacy OT was designed and implemented before cybersecurity was even a concern in ICSs. Hence, the modern ICS in smart cities has a "patched-in" cybersecurity rather than the much-needed "baked-in" cybersecurity for their networks and applications. Consequently, this increases the risk of cyber-criminals accessing sensitive ICS data and making unauthorised changes to the ICS controls of industrial operations in critical infrastructure.

To drive home this point, according to Rockwell Automation, the number of cybersecurity incidents on ICSs between 2021 and 2022 alone is about one-third the number of similar incidents between 1980 and 2010. In addition, ransomware attacks on ICSs rose by 50 percent in 2023 compared to 2022, according to Dragos, an ICS cybersecurity market leader. These figures indicate that the number of ICS cyber-incidents is rising steeply year over year. Christopher Wray, the director of the US Federal Bureau of Investigation (FBI), says that in 2024, Beijing's efforts to covertly plant offensive malware inside US critical infrastructure were greater than ever before.

Examples of ICS Cyber-Attacks

The most common forms of OT/ICS cyber attacks are operational disruption, unauthorised access or data exposure, or broader supply chain attacks. To showcase the importance of cyber resilience and business continuity in ICSs, we provide examples of ICS cyber-attacks in these forms and those that had a significant impact on society.

The LockerGoga ransomware attack of 2019 on Norsk Hydro, a multinational aluminium manufacturer, compromised the firm's IT systems, including networked servers and PCs, and the business functions reliant on them. The attack affected all 35,000 Norsk employees, led to multiple plants going offline, and eventually cost the firm approximately $75 million.

In another incident, Mondelez, a multinational food and beverage company (and maker of the popular Oreo cookies), fell prey to the NotPetya cyber-attack in 2017. The NotPetya malware encrypted and permanently damaged Mondelez's 1700 servers and 24,000 laptops. This disrupted Mondelez's production facilities and other operations across the globe and resulted in them incurring business losses amounting to $100 million because they could not complete customer orders.

In a more recent incident from 2021, Colonial Pipeline, an oil and gas company controlling nearly half of the gasoline, jet fuel, and diesel flowing along the East Coast of the USA, fell prey to the DarkSide ransomware cyber-attack. Colonial Pipeline took an immediate precautionary step to shut down all its operational technology to prevent the ransomware infection from spreading into its OT networks. As a result, Colonial Pipeline experienced business discontinuity, as 5500 miles of pipeline had to be shut down. Despite paying hackers a ransom of around $4.5 million, Colonial Pipeline took about a week to restore its operational technology networks that drive the pipeline operation.

When it comes to data breach cyberattacks, the state of New York's critical infrastructure in the year 2023 was subject to nine incidents in health care and public health, eight incidents in financial services and seven incidents in both commercial and government facilities, co-contributing to a massive $775 million cyber loss for the state. According to DiNapoli (comptroller of the state of New York), "Data breaches at companies and institutions that collect large amounts of personal information expose New Yorkers to potential invasions of privacy, identity theft and fraud."

In an incident related to a cyber-attack on a broader supply chain, Japanese car-manufacturing giant Toyota suspended operations on 28 production lines across 14 plants for at least 24 hours in 2022 because Kojima Industries, one of Toyota's key supply chain partners and a manufacturer of plastic parts and electronic components, was hit by a malware cyber-attack. The world's top-selling carmaker incurred a business disruption cost of about $375 million. The attack was possibly caused by the Emotet malware, which enters ICSs through the IT side via social engineering. More importantly, it took Kojima months to get operations close to old routines.

In all the above examples, compromising the IT wing of the ICS adversely impacted the performance of the OT wing of the ICS. This type of 'indirect' compromise of ICS OT accounts for approximately 84 percent of the ways adversaries disrupt critical infrastructure.


Takeaways from the Cyber-Attacks

There are three primary takeaways from all these major cyber incidents. First, ICS environments with thousands of actuator/sensor devices create ever-increasing opportunities for enterprises to become cyber-vulnerable. The fact that IT system compromise is sometimes sufficient to disrupt OT system functioning (as is currently the trend) implies that the business disruption impact can be far greater than currently reported. Second, protecting ICS OT networks is mandatory but never sufficient to avoid cyber threats and their implications for business continuity; there is no such thing as perfect security. Finally, the bigger the OT network in an enterprise, the broader the cyber-risk terrain, and the greater the chance that the enterprise will eventually be cyber-breached, significantly impacting business continuity.

The only rational thing to do then for OT-driven enterprises is to accept this risk and design cyber resilience in their OT networks, business policies, structures, and operations. Cyber resilience will ensure that the IT/OT-converged ICS will be fault tolerant to always enable sustained continuity of core business processes at the minimum service guarantees. It is, however, practically and economically infeasible for cyber resilience to ensure cyber-protection of all vulnerable points in the OT network.

Three Things ICS Enterprises Should Do to Boost Cyber-Resilience

Given that cybersecurity investments within OT networked ICS enterprises are budget-constrained, we conducted systems/operational thinking research at MIT CAMS on such enterprises to derive three network-dependent strategic insights for ICS system managers to boost cyber resilience.

Figure 1: An Illustration of an Industrial Control System Network

1) Identify Cyber Vulnerabilities and Their Severity in an ICS Network

ICS infrastructure networks, by nature, have both a virtual and a physical dimension. The virtual network captures software/process/information dependencies among IT/OT processes via communication links. On the other hand, the physical network captures physical dependencies between IT/OT and cyber-physical system devices over WiFi/Bluetooth communication links. The physical network is usually an IT network (e.g., an enterprise management and control network) communicating with an OT network (e.g., a field network). A figurative illustration of the physical ICS network is shown in Figure 1 (borrowed from Hu et al). Boosting cyber-resilience is about managing the adverse system impact of exploitable major vulnerabilities in the virtual and physical ICS networks.

As part of the management task, these vulnerabilities should be identified first. Multiple major vulnerabilities can exist on any link connecting parts of an ICS network, and they can be exploited at the software, network, and hardware levels. Publicly known vulnerabilities are catalogued as Common Vulnerabilities and Exposures (CVEs). An example is CVE-2022-46680, which affects commercial ION and PowerLogic power meters. Managers should first identify these CVEs (from the CVE list maintained by MITRE or from NIST's National Vulnerability Database) for each link of an ICS network. Each CVE carries a Common Vulnerability Scoring System (CVSS) score denoting its severity (e.g., CVE-2022-46680 has a CVSS score of 8.8). For unknown vulnerabilities (e.g., those resulting in zero-day attacks), the managers should rely on their domain expertise to derive approximate CVSS scores. Managers should then collect the CVSS scores for each link. For a given enterprise, not all CVEs on a link are equally likely to be exploited. Managers should consequently derive a single weighted CVSS score for each link in an ICS network.
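One way to picture the weighted score is as a likelihood-weighted average of the CVSS scores on a link. The sketch below is an illustration under that assumption, not the authors' exact formula; the link names, the extra CVSS values, and the likelihood weights are all hypothetical manager estimates.

```python
from typing import Dict, List, Tuple

# Hypothetical data: for each network link, a list of
# (CVSS score, relative likelihood weight) pairs estimated by the manager.
link_cves: Dict[str, List[Tuple[float, float]]] = {
    "plc1-hmi": [(8.8, 0.6), (5.3, 0.3), (9.8, 0.1)],
    "hmi-historian": [(7.5, 0.5), (4.0, 0.5)],
}

def weighted_cvss(entries: List[Tuple[float, float]]) -> float:
    """Likelihood-weighted mean of CVSS scores; weights need not sum to 1."""
    total_weight = sum(weight for _, weight in entries)
    if total_weight <= 0:
        raise ValueError("likelihood weights must sum to a positive value")
    return sum(score * weight for score, weight in entries) / total_weight

# One weighted score per link, rounded to two decimals for reporting.
link_scores = {link: round(weighted_cvss(e), 2) for link, e in link_cves.items()}

for link, score in sorted(link_scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{link}: weighted CVSS {score}")
```

A weighted mean keeps the result on the familiar 0 to 10 CVSS scale, which makes links directly comparable; other aggregation choices (for example, taking the maximum) are equally plausible.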

2) Prioritise What Assets to Protect in the ICS Network

A lack of sufficient protection on critical ICS network assets can bring down most of the network following an adverse attack on these assets. An example is an advanced persistent threat (APT) ransomware attack that first targets critical OT devices (via a network discovery process) through CVEs in the Remote Desktop Protocol and, after gaining administrative access, locks a significant fraction of OT infrastructure.

We simulated ICS network breakdown when a few critical network assets were compromised by analysing multiple publicly available case studies on service non-availability in ICSs after a cyber incident. We observed a tipping point phenomenon wherein a few critical ICS assets became operationally unavailable to provide service, bringing down most of the ICS network. This tipping point phenomenon will amplify incident response times considerably, diminishing the chances that business continuity of core processes will be sustained.

Thus, an important action item for ICS network managers is determining which special network assets, i.e., the 'crown jewels', should be prioritised for cyber-protection to prevent a tipping point phenomenon. After all, hardly any ICS enterprise has an unlimited budget to protect all the IT and OT assets. Examples of OT 'crown jewels' include critical data, logical/physical assets, and/or OT control processes, whereas IT 'crown jewels' include HMIs, data servers, and OPC/application servers.

ICS system managers should first rank the importance of network assets. A popular scientific way to do this is by ranking the assets based on a network centrality (importance) measure adopted from network science theory. A centrality measure showcases how much influence a given asset's proper functioning has on the proper functioning of the other assets in the ICS network. The higher the centrality of an ICS network asset, the greater its criticality. However, there are multiple centrality measures in practice, each providing managers with a different listing order.

Our research simplifies the listing dilemma for ICS managers. It recommends managers work on a single asset criticality listing that combines multiple centrality measures in a manner specific to a particular OT environment. Our research generates this single list via a novel managerial decision-making framework derived from applying the seminal Analytic Hierarchy Process (AHP) theory in operations management research.
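To make the idea of combining centrality measures concrete, here is a generic AHP-style sketch. It is not the authors' framework, whose details are not given here: it assumes a manager fills in a pairwise comparison matrix over three centrality measures, approximates the AHP priority weights with the row geometric-mean method, and then ranks assets by the weighted sum. The asset names, comparison values, and centrality values are all hypothetical.

```python
import math

# Hypothetical pairwise comparison matrix on Saaty's 1-9 scale:
# entry [i][j] says how much more important measure i is than measure j.
criteria = ["degree", "katz", "betweenness"]
pairwise = [
    [1.0, 1 / 3, 1 / 2],
    [3.0, 1.0, 2.0],
    [2.0, 1 / 2, 1.0],
]

def ahp_weights(matrix):
    """Approximate AHP priority weights via the row geometric-mean method."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]

# Hypothetical centrality values per asset, already scaled to [0, 1].
centralities = {
    "plc1": {"degree": 0.2, "katz": 0.3, "betweenness": 0.1},
    "hmi": {"degree": 0.8, "katz": 0.9, "betweenness": 0.7},
    "historian": {"degree": 0.5, "katz": 0.4, "betweenness": 0.6},
}

weights = dict(zip(criteria, ahp_weights(pairwise)))

# Single criticality score per asset: weighted sum over the measures.
combined = {
    asset: sum(weights[c] * values[c] for c in criteria)
    for asset, values in centralities.items()
}

# Single asset criticality listing, most critical first.
ranking = sorted(combined, key=combined.get, reverse=True)
print(ranking)
```

The pairwise matrix is where OT-environment-specific judgement enters: a different plant could weight path-based measures more heavily and obtain a different listing from the same centrality values.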

Figure 2: An Illustration of Incident Response (IR) Durations

3) Strategically Invest in Network Asset Cyber-Protection

Identifying the critical ICS network assets in a ranked order is just half the work in improving the cyber-resilience of ICS networks. Equally important is strategically allocating a limited protection budget among the assets in proportion to their criticality in the ICS network. This action will reduce the time to incident response (IR) (see Figure 2). Incident response time acts as a performance indicator of ICS network cyber-resilience. Our research proposes a quantitative cyber-resilience metric that correlates with this performance indicator and accounts for the ICS network structure. The shorter this time duration, the more cyber-resilient an ICS network is.

Our quant-based research recommends ICS system managers proportionately allocate a cyber-protection budget among ICS assets in decreasing order of their centrality values in the network. This recommendation boosts/optimises our proposed quantitative cyber-resilience measure that correlates with the time to incident response (IR).
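A proportional allocation of this kind can be sketched in a few lines. The example assumes centrality values have already been computed and are non-negative; the asset names, centrality values, and budget figure are hypothetical.

```python
def allocate_budget(centrality, budget):
    """Split a protection budget across assets in proportion to centrality.

    Returns a dict ordered from the most to the least central asset.
    """
    total = sum(centrality.values())
    if total <= 0:
        raise ValueError("centrality values must sum to a positive number")
    ordered = sorted(centrality.items(), key=lambda kv: kv[1], reverse=True)
    return {asset: budget * value / total for asset, value in ordered}

# Hypothetical centrality values and an illustrative budget of 100,000.
asset_centrality = {"plc1": 0.2, "hmi": 0.5, "historian": 0.3}
allocation = allocate_budget(asset_centrality, 100_000)

for asset, amount in allocation.items():
    print(f"{asset}: {amount:,.0f}")
```

Swapping in a different centrality measure changes only the input dictionary, which is what distinguishes one allocation policy from another.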

Multiple managerial cyber-protection allocation policies (MCAPs) can be drawn up depending on the type of asset centrality measure deployed. Our research observes and compares MCAP effectiveness insights to reduce IR time based on whether a network of ICS assets is fragile (i.e., individual assets have low/zero recovery rates after becoming non-operational due to a cyber-attack) or non-fragile (i.e., individual assets have medium/high recovery rates). It also accounts for whether an ICS network structure is balanced or unbalanced (i.e., a few assets are connected to far more assets than the others).

We summarise our network-specific quant-based MCAP recommendations to boost ICS cyber-resilience in Figure 3. These recommendations are validated via extensive simulations run atop real-world OT asset-driven ICS network structures. The recommendations suggest budget allocation proportional to decreasing order of (a) the network influence-based centrality measure (e.g., Katz centrality that computes the centrality for a network node based on the centrality of its neighbours) of an asset if individual ICS assets are not fragile to boost cyber-resilience, and (b) the node influence-based centrality measure (e.g., degree centrality that is the fraction of network nodes a given node is connected to) of an asset if ICS assets are fragile to boost cyber-resilience.
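The two centrality measures named above can be computed with a short pure-Python sketch. The toy topology, the asset names, and the Katz parameters `alpha` and `beta` are assumptions chosen for illustration; `alpha` must stay below the reciprocal of the largest eigenvalue of the adjacency matrix for the iteration to converge.

```python
# Hypothetical undirected ICS topology given as a list of links.
edges = [
    ("hmi", "plc1"), ("hmi", "plc2"), ("hmi", "plc3"),
    ("hmi", "historian"), ("historian", "erp"),
]

def build_adjacency(edge_list):
    """Build an undirected adjacency map: node -> set of neighbours."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(adj)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}

def katz_centrality(adj, alpha=0.1, beta=1.0, tol=1e-10, max_iter=1000):
    """Iterate x_i = alpha * sum of neighbours' x_j + beta until it settles."""
    x = {node: 0.0 for node in adj}
    for _ in range(max_iter):
        new_x = {
            node: alpha * sum(x[nbr] for nbr in adj[node]) + beta
            for node in adj
        }
        if max(abs(new_x[node] - x[node]) for node in adj) < tol:
            return new_x
        x = new_x
    raise RuntimeError("Katz iteration did not converge; try a smaller alpha")

adjacency = build_adjacency(edges)
degree = degree_centrality(adjacency)
katz = katz_centrality(adjacency)

for node in sorted(adjacency, key=katz.get, reverse=True):
    print(f"{node}: degree={degree[node]:.2f}, katz={katz[node]:.3f}")
```

In this toy network the three PLCs and the ERP server all have the same degree, yet Katz centrality ranks the PLCs above the ERP server because they sit next to the highly connected HMI. That difference is why the two measures can yield different allocation orders.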

Figure 3: Resilience Boosting MCAPs for Various (Asset Fragility, Network Structure) Scenarios

Likewise, in scenarios when an OT-driven ICS network structure is unbalanced, budget allocation based upon path-based centrality measures (e.g., betweenness centrality, which is the sum of the fraction of shortest network paths that go through a given node for all source-destination paths) is recommended to boost ICS cyber-resilience.
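Betweenness centrality, as defined above, can be computed for small unweighted networks with Brandes' algorithm. The sketch below is a plain-Python illustration on a hypothetical five-asset topology; the asset names are invented, and the values are unnormalised counts over unordered source-destination pairs.

```python
from collections import deque

def betweenness_centrality(adj):
    """Brandes' algorithm for an unweighted, undirected graph."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies, farthest nodes first.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted twice (once from each endpoint).
    return {v: value / 2 for v, value in bc.items()}

# Hypothetical topology: an HMI reaches two PLCs through a gateway and a switch.
adjacency = {
    "hmi": ["gateway"],
    "gateway": ["hmi", "switch"],
    "switch": ["gateway", "plc1", "plc2"],
    "plc1": ["switch"],
    "plc2": ["switch"],
}

bc = betweenness_centrality(adjacency)
for node in sorted(bc, key=bc.get, reverse=True):
    print(f"{node}: {bc[node]}")
```

Here the switch carries the shortest paths of five asset pairs and the gateway three, so a path-based allocation would direct most of the budget to those two chokepoints rather than to the endpoints.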

Summary

Critical IT/OT converged ICS network assets must be protected to sustain business continuity, which is a notion of cyber-resilience. Not doing so will lead to a cascade of assets becoming non-operational after a cyber-attack, making the network non-resilient. A three-point action plan is proposed that makes it possible for ICS network managers to systematically rank the criticality of ICS assets and strategically invest in protecting these assets.

The action plan first involves identifying the major cyber vulnerabilities and their severity in an ICS network that can considerably impact ICS cyber resilience. The second action point involves ICS management considering this information and prioritising the assets to protect in the IT/OT converged ICS network to boost/optimise a quantified cyber-resilience metric that correlates with the time to incident response (IR) after a cyber-attack. The final action point requires ICS management to invest in asset protection in proportion to asset criticality within the ICS network.

 Ranjan Pal, Michael Siegel, MIT Sloan School of Management and Bodhibrata Nag, Indian Institute of Management Calcutta

Last Updated: April 11, 2024 12:36:45 PM IST