The Department of Homeland Security (DHS), as part of its new “Shields Up!” program, recently asked private sector businesses to take steps to protect their infrastructure from Russian cyberattacks. The resources available on the DHS website highlight the risks to the pharmaceutical sector in particular, since it has already been a focus of Russian attacks and is a critical component of U.S. infrastructure. Cyber-warfare attacks on infrastructure commonly target control systems, because control systems are often poorly secured and compromising them can have an immediate, destructive impact.
A targeted, coordinated, and purposeful attack will eventually succeed. For this reason, most cybersecurity experts recommend a “defense in depth” strategy, which layers defensive barriers meant to slow or prevent system penetration. But, as the United Kingdom’s National Cyber Security Centre (NCSC) points out in the article linked at the bottom of this post, mitigation should also cover the period after an attack has succeeded and is actively affecting systems.
So what should we do? Even the DHS’s website doesn’t give clear, technical guidance. Many vendors offer security services, but it sometimes feels like they are fear-peddling without any actionable plans. I have put together some recommendations that I hope will be useful (or at least thought-provoking) for making your Building Automation Systems more resilient.
System-Level Changes:
When making changes to your automation system, consider the following:
- System Partitions: Your fire system, HVAC, security, IT, and production infrastructure should not all be unified under one system. Such a system would be an ideal target for an attacker. Once you lose your only mechanism for control and visibility, all the other steps you have taken to protect your infrastructure are wasted. Determine boundaries for each system, and enforce them. This will make the next step possible.
- Checks and Balances: Systems can monitor each other’s performance without data connections. For example, process/production systems can be set up to alarm when utilities supplied by non-validated systems are out of range. Similarly, non-validated systems might use sub-metering to detect unusually high or low loads from production systems. These checks should be included in the site Fault Detection and Diagnostic (FDD) policy and periodically reviewed.
- Semi-Automatic Control States: Design systems on the assumption that they may have to run with the controller offline. The design should define the default values that components such as VFDs, valves, chillers, and boilers revert to, so that operation can continue with only periodic human intervention.
- Reduce External Dependencies: In building automation systems, it is common to share sensors between controllers. Generally, this is done with high-cost sensors or in systems where the control element is far away from the sensing element. While this is sometimes unavoidable, the design process should explicitly justify each of these choices. Every shared sensor becomes unavailable to the controllers that depend on it if a Distributed Denial of Service (DDoS) attack floods the network.
- Move Control Further to the Edge: VFDs like the ABB ACH550/ACH580 and the Siemens BT300 can monitor an analog input and modulate an output using PID logic to maintain a setpoint. There is also a growing number of “intelligent” valve actuators with similar capabilities. If malware affects a primary controller, or a DDoS attack prevents communication, these components can continue operating regardless of the state of the other components.
- Stop Building Monoliths: Putting every building on campus under a single front end, or using a single alarm interface for every system, is risky. As I mentioned under “System Partitions”, it allows the attacker to focus on a single piece of infrastructure or access point. Monolithic infrastructure also introduces complexities that otherwise might not exist, like the need to perform large-scale protocol conversion between two systems that can’t perform checks on each other. Many front-end packages can be reached from a web browser at a URL. A set of direct links to each front end is just as easy to navigate as a single URL that links to all other systems, but without the vulnerability of a single access point.
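The fallback behavior described under Semi-Automatic Control States and Move Control Further to the Edge can be sketched as a local PI loop that holds a pre-engineered default setpoint whenever the supervisory command goes stale. This is a hypothetical illustration, not the firmware of any VFD or actuator named above: the `EdgeLoop` class, its gains, the 60-second staleness timeout, and the 0 to 100 percent output range are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EdgeLoop:
    """Hypothetical edge PI loop with a design-time fallback setpoint."""
    kp: float                      # proportional gain (illustrative)
    ki: float                      # integral gain (illustrative)
    default_setpoint: float        # value chosen during design for semi-automatic operation
    stale_after_s: float = 60.0    # assumed lifetime of a supervisory command
    out_min: float = 0.0
    out_max: float = 100.0         # e.g. VFD speed in percent
    integral: float = 0.0
    remote_setpoint: Optional[float] = None
    remote_time_s: float = float("-inf")

    def write_remote(self, setpoint: float, now_s: float) -> None:
        """Record a setpoint written by the supervisory controller."""
        self.remote_setpoint = setpoint
        self.remote_time_s = now_s

    def active_setpoint(self, now_s: float) -> float:
        """Use the remote setpoint only while it is fresh; otherwise fall back."""
        fresh = (self.remote_setpoint is not None
                 and now_s - self.remote_time_s <= self.stale_after_s)
        return self.remote_setpoint if fresh else self.default_setpoint

    def step(self, measurement: float, now_s: float, dt_s: float) -> float:
        """Run one PI update and return the clamped output."""
        error = self.active_setpoint(now_s) - measurement
        candidate = self.integral + error * dt_s
        output = self.kp * error + self.ki * candidate
        # Conditional integration (anti-windup): keep the new integral
        # only when the output is not saturated.
        if self.out_min < output < self.out_max:
            self.integral = candidate
        return min(max(output, self.out_min), self.out_max)
```

The design choice worth noting is that the loop never needs the network to keep running: losing the supervisory controller only changes which setpoint is tracked, which is the behavior the design documents should specify in advance.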
Operational Changes:
The day-to-day operation of systems should be modified to emphasize:
- Practice Taking Manual and Semi-Automatic Control: This item is already pretty well accepted. The facilities team should periodically practice manually operating systems. The goal of the exercise should be to gain proficiency and update the SOP for manual operation.
- Making Alarms More Meaningful and Actionable: If a cyberattack is causing equipment to malfunction, operators must be able to quickly identify the faulty controller(s) and take them offline. This is easier if there isn’t a flood of alarms distracting them from regaining control. This is also a recognized best practice, though it is not widely implemented; ANSI/ISA-18.2 is the reference standard for alarm management.
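One way to make “flood” measurable is to track the annunciated-alarm rate over a sliding window. The sketch below is hypothetical: `AlarmFloodMonitor` is an illustrative name, and its default thresholds follow the commonly cited ISA-18.2 flood metric (a flood starts above 10 alarms in 10 minutes and ends below 5), which you should confirm against your own alarm philosophy document.

```python
from collections import deque


class AlarmFloodMonitor:
    """Sliding-window alarm-rate monitor with start/end hysteresis (sketch)."""

    def __init__(self, window_s: float = 600.0, start_above: int = 10,
                 end_below: int = 5) -> None:
        self.window_s = window_s        # window length in seconds (assumed 10 minutes)
        self.start_above = start_above  # flood begins when count exceeds this
        self.end_below = end_below      # flood ends when count drops below this
        self._times: deque = deque()    # timestamps of alarms inside the window
        self.in_flood = False

    def _expire(self, now_s: float) -> None:
        # Drop alarms that have aged out of the window.
        while self._times and now_s - self._times[0] >= self.window_s:
            self._times.popleft()

    def record(self, now_s: float) -> bool:
        """Register one annunciated alarm and return the current flood state."""
        self._times.append(now_s)
        return self.update(now_s)

    def update(self, now_s: float) -> bool:
        """Re-evaluate the flood state at time now_s without adding an alarm."""
        self._expire(now_s)
        count = len(self._times)
        if not self.in_flood and count > self.start_above:
            self.in_flood = True
        elif self.in_flood and count < self.end_below:
            self.in_flood = False
        return self.in_flood
```

The separate start and end thresholds keep the indicator from flickering on and off while the alarm rate hovers near a single limit.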
Project Process Changes:
Consider the following changes to the project delivery process:
- Commission Semi-Automatic Mode: The commissioning agent should observe the building engineers as they place the system in semi-automatic mode using the SOP developed by the project, to confirm that the procedure is correct.
- Documentation for Non-Validated Systems: To know if your system is behaving improperly, you have to know what it does when it is behaving properly. Each system needs documented control sequences that explain which points each control program uses, and how to interact with them.
- Controller Backups and Programming in the Turnover Package (TOP): “Programming” means different things in different systems. Just because you have the “programming” in the turnover package doesn’t mean you can re-create the system if the controller fails. Make sure that everything that can be exported or backed up (graphics files, controller images, programming, point configurations, etc.) is provided at turnover and maintained on-site. Ideally, this would be in a centralized database AND in cold storage.
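Keeping backups in two places only helps if both copies are complete and unaltered. A minimal sketch, assuming the turnover files are plain files on disk: record a SHA-256 manifest at turnover, then periodically check every copy (for example, the central server and a cold-storage mount) against it. The function names and directory layout here are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large controller images don't exhaust memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 at turnover."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_copies(manifest: dict[str, str], *roots: Path) -> list[str]:
    """Check every archive copy against the manifest; empty list means all match."""
    problems = []
    for rel_path, expected in sorted(manifest.items()):
        for root in roots:
            candidate = root / rel_path
            if not candidate.is_file():
                problems.append(f"missing: {candidate}")
            elif sha256_of(candidate) != expected:
                problems.append(f"hash mismatch: {candidate}")
    return problems
```

The manifest itself should be stored with the turnover documentation; a mismatch does not say which copy is wrong, only that the copies can no longer be trusted to restore the same system.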