GCP – How to build a digital twin to boost resilience
“There’s no red teaming on the factory floor” isn’t an OSHA safety warning, but it should be, and for good reason. Adversarial testing in most, if not all, manufacturing production environments is prohibited because the safety and productivity risks outweigh the value.
If resources were not a constraint, the security team would build another factory with identical equipment and systems and use it to conduct proactive security testing. Almost always, though, the costs outweigh the benefits, and most businesses simply cannot support the expense.
This is where digital twins can help. Digital twins are essentially IT stunt doubles: cloud-based replicas of physical systems that use real-time data to create a safe environment for security and resilience testing. The digital twin environment can be used to test essential subsystem interactions and the repercussions as systems transition from secure to insecure states.
Security teams can operationalize digital twins and resilience analysis using the following approach:
- Gain a deep understanding of the correlations between the leading indicators of cyber resilience and the role of digital twins in becoming resilient. The table below offers this mapping.
- Get buy-in from business leaders, including the CMO, CIO, and CTO. Security teams should be able to demonstrate the strategic value to the organization of using digital twins for adversarial security testing without disrupting production.
- Identify the right mix of engineers and security experts, as well as appropriate technologies to execute the strategy. Google Cloud’s security and infrastructure stack is positioned to help security teams achieve operational digital twins for security (see table below).
| Cyber resilience leading indicator | Role of digital twins |
| --- | --- |
| Hard-restart recovery time | Simulate various system failure scenarios on the digital twins and discover the subsequent rebuild processes. Identify areas of improvement, optimal recovery procedures, and bottlenecks. (An illustrative sketch follows this table.) |
| Cyber-physical modularity | Use digital twins to quantify the impact of single-point failures on the overall production process. Use the digital twin environment to measure metrics such as the mean operational capability of a service in a degraded state, and to track the number of modules impacted by each single-point failure. |
| Internet denial and communications resilience | Simulate the loss of internet connectivity to the digital twins and measure the proportion of critical services that continue operating successfully. Assess the effectiveness of backup communication systems and the speed of response. This process can also be applied to the twins of non-internet-facing systems. |
| Manual operations | Disrupt the automation controls on the digital twins and measure the degree to which simulated manual control can sustain a minimum viable operational delivery objective. Incorporate environmental and operational constraints, such as the time it takes personnel to assume manual control. |
| Control pressure index (CPI) | Model the enablement of security controls and their dependencies on the digital twins to calculate CPI. Then simulate failures of individual controls, or combinations of controls, to assess the impact. Discover defense-in-depth improvement opportunities. |
| Software reproducibility | Not applicable. |
| Preventative maintenance levels | Explore and test simulated failures to optimize and measure preventative maintenance effectiveness. Simulate the impact of maintenance activities, measure downtime reduction, and evaluate return on investment (ROI). |
| Inventory completeness | Inventory completeness will become apparent during the digital twin construction process. |
| Stress-testing vibrancy | Conduct red teaming, apply chaos engineering principles, and stress-test the digital twin environment to assess the overall impact. |
| Common mode failures | In the twin environment, discover and map critical dependencies and identify potential common mode failures that could impact the production process. Identify and test, in a measurable way, methods of reducing the risk of cascading failures during disruption events. |
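As a concrete illustration of the first row, here is a minimal Python sketch of how a team might script repeated failure injections against a twin and summarize the hard-restart recovery-time indicator. The `DigitalTwin` class, scenario names, and recovery-time values are hypothetical stand-ins rather than a Google Cloud API; a real harness would drive the simulated subsystems and record measured rebuild times.

```python
import random
import statistics
from dataclasses import dataclass


# Hypothetical stand-in for a digital twin endpoint; in practice this would
# wrap calls to the simulated subsystems running on Compute Engine or GKE.
@dataclass
class DigitalTwin:
    name: str

    def inject_failure(self, scenario: str) -> None:
        print(f"[{self.name}] injecting failure: {scenario}")

    def rebuild(self) -> float:
        # Simulated rebuild duration in minutes; a real twin would report
        # the measured time to restore a known-good state.
        return random.uniform(10, 90)


SCENARIOS = ["plc-firmware-corruption", "historian-db-loss", "hmi-ransomware"]


def hard_restart_recovery_times(twin: DigitalTwin, runs_per_scenario: int = 5):
    """Run each failure scenario several times and summarize recovery time."""
    results = {}
    for scenario in SCENARIOS:
        times = []
        for _ in range(runs_per_scenario):
            twin.inject_failure(scenario)
            times.append(twin.rebuild())
        results[scenario] = {
            "mean_minutes": statistics.mean(times),
            "worst_minutes": max(times),
        }
    return results


if __name__ == "__main__":
    print(hard_restart_recovery_times(DigitalTwin("packaging-line-twin")))
```

The per-scenario worst case and mean give a starting point for spotting the slowest rebuild paths and bottlenecks before attempting the same recovery on production systems.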
What digital twin architecture can look like with Google Cloud
To build an effective digital twin, the physics of the electrical and mechanical systems must be represented with sufficient accuracy.
The data needed to construct the twin should either come from physical sensors or be computed using mathematical representations of the physical process. The twin should be modeled across three facets (a minimal data-model sketch follows the list):
- Subsystems: Modeling the subsystems of the system, and pertinent interactions between the subsystems (such as a robotic arm, its controller, and software interactions).
- Networks: Modeling the network of systems and pertinent interactions (such as plant-wide data flow and machine-to-machine communication).
- Influencers: Modeling the environmental and operational parameters, such as temperature variations, user interactions, and physical anomalies causing system and network interruptions.
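The Python sketch below shows one way these three facets could be captured as a simple data model before wiring in live telemetry. All class and field names are illustrative assumptions, not a Google Cloud or MDE schema.

```python
from dataclasses import dataclass, field


# Illustrative data model for the three modeling facets; every name here is an
# assumption made for the example, not a defined product schema.
@dataclass
class Subsystem:
    name: str                                   # e.g., "robotic-arm-01"
    controller: str                             # e.g., "plc-arm-01"
    software_interfaces: list = field(default_factory=list)


@dataclass
class NetworkLink:
    source: str                                 # subsystem or system name
    target: str
    protocol: str                               # e.g., "OPC UA", "Modbus/TCP"


@dataclass
class Influencer:
    name: str                                   # e.g., "ambient-temperature"
    kind: str                                   # "environmental", "operational", or "anomaly"
    affects: list = field(default_factory=list)


@dataclass
class TwinModel:
    subsystems: list
    network: list
    influencers: list


# Minimal example: one robotic arm, its link to the plant historian, and a
# thermal influencer that perturbs the arm.
model = TwinModel(
    subsystems=[Subsystem("robotic-arm-01", "plc-arm-01", ["motion-planner-api"])],
    network=[NetworkLink("plc-arm-01", "plant-historian", "OPC UA")],
    influencers=[Influencer("ambient-temperature", "environmental", ["robotic-arm-01"])],
)
```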
Developing digital twins in diverse OT environments requires secure data transmission, compatible data storage and processing, and digital engines using AI, physics modeling, applications, and visualization. This is where comprehensive end-to-end monitoring, detection, logging, and response processes using tools such as Google Security Operations and partner solutions come in.
The following outlines one potential architecture for building and deploying digital twins with Google Cloud:
- Compute Engine to replicate physical systems on a digital plane.
- Cloud Storage to store data and simulate backup and recovery.
- Cloud Monitoring to emulate on-prem monitoring and evaluate recovery processes.
- Manufacturing Data Engine (MDE) to securely transfer live data from the manufacturing/OT systems.
- Cloud Pub/Sub as the real-time messaging service for streaming data from systems and sensors; MDE uses Pub/Sub. (An illustrative ingest sketch follows this section.)
- Google Kubernetes Engine (GKE) to run failure scenarios in a modular, isolated fashion.
- Google Cloud VPN to simulate secure and insecure connections to the twins and to simulate connectivity failure scenarios.
- Network Intelligence Center to gain network performance metrics during failure and recovery scenarios.
- Cloud Logging to perform retrospective analysis and live detection.
- Cloud Armor to evaluate defenses against simulated DDoS attacks.
- Security Command Center, which offers two key tools: attack path simulation, which can emulate realistic cyberattacks in the digital twin environment; and web and vulnerability scanning, to tailor attack scenarios to simulated exploitation of existing production-system vulnerabilities.
- BigQuery to store, query, and analyze the data streams received from MDE and to perform post-mortem analysis of adversarial testing.
- Spanner Graph and partner solutions such as Neo4j to build and enumerate the industrial process based on graph-based relationship modeling.
- Machine learning services (including Vertex AI, Gemini in Security, and partner models through Vertex AI Model Garden) to rapidly generate relevant failure scenarios and discover opportunities for secure, customized production optimization. Similarly, use Vision AI tools to enhance the digital twin environment, bringing it closer to the real-world physical environment.
- Cloud Run functions, a serverless compute platform that can run failure-event-driven code and trigger actions based on digital twin insights.
- Looker to visualize and create interactive dashboards and reports based on digital twin and event data.
- Apigee to securely expose and manage APIs for the digital twin environment. This allows for controlled access to real-time data from on-prem OT applications and systems. For example, Apigee can manage APIs for accessing building OT sensor data, controlling HVAC systems, and integrating with third-party applications for energy management.
- Google Distributed Cloud to run digital twins in an air-gapped, on-premises, containerized environment.
An architectural reference for building and deploying digital twins with Google Cloud.
Security and engineering teams can use the above Google Cloud services illustration as a foundation and customize it to their specific requirements. While building and using digital twins, both security of the twins and security by the twins are critical. To ensure that the lifecycle of the digital twins is secure, cybersecurity hardening, logging, monitoring, detection, and response should be at the core of the design, build, and execution processes.
This structured approach enables modelers to identify essential tools and services, define in-scope systems and their data capabilities, map communication and network routes, and determine applications needed for business and engineering functions.
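For example, the Pub/Sub-to-BigQuery path in the architecture above could be wired with a small subscriber like the following Python sketch, which streams twin telemetry into a BigQuery table. The project, subscription, and table names are placeholders and would need to match however MDE and the twin environment are actually provisioned.

```python
import json

from google.cloud import bigquery, pubsub_v1

# Placeholder resource names; substitute the project, subscription, and table
# provisioned for your digital twin environment.
PROJECT_ID = "my-twin-project"
SUBSCRIPTION = "projects/my-twin-project/subscriptions/twin-sensor-telemetry"
BQ_TABLE = "my-twin-project.twin_dataset.sensor_readings"

bq_client = bigquery.Client(project=PROJECT_ID)
subscriber = pubsub_v1.SubscriberClient()


def handle_message(message):
    """Write one JSON sensor reading from the twin's telemetry stream into BigQuery."""
    reading = json.loads(message.data.decode("utf-8"))
    errors = bq_client.insert_rows_json(BQ_TABLE, [reading])
    if errors:
        # Leave the message unacknowledged so Pub/Sub redelivers it.
        print(f"BigQuery insert failed: {errors}")
        message.nack()
    else:
        message.ack()


if __name__ == "__main__":
    streaming_pull = subscriber.subscribe(SUBSCRIPTION, callback=handle_message)
    print(f"Listening on {SUBSCRIPTION} ...")
    try:
        streaming_pull.result()  # Blocks until the stream is cancelled or fails.
    except KeyboardInterrupt:
        streaming_pull.cancel()
```

This assumes the twin telemetry arrives as one JSON object per message with fields that match the BigQuery table schema; production pipelines would typically add schema validation and batching.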
Getting started with digital twins
Digital twins are a powerful tool for security teams. They help us better understand and measure cyber-physical resilience through the safe application of cyber-physical resilience leading indicators. They also allow for adversarial testing and analysis of subsystem interactions, and of the effects of systems moving between secure and insecure conditions, without compromising safety or output.
Security teams can begin right away to use Google Cloud to build and scale digital twins for security:
1. Identify the purpose and function that security teams would like to simulate, monitor, optimize, design, and maintain for resilience.
2. Select the right physical or industrial object, system, or process to replicate as the digital twin.
3. Identify pertinent data flows, interfaces, and dependencies for data collection and integration.
4. Be sure to understand the available IT and OT, cloud, and on-premises telemetry across the physical or industrial object, system, or process.
5. Create the virtual model that accurately represents its physical counterpart in all necessary aspects.
6. Connect the replica to its physical counterpart to facilitate real-time data flow to the digital twin. Use a secure on-premises connector such as MDE to make the secure connection between the physical and digital environments running on Google Cloud VPC.
7. To operationalize the digital twin, build the graph-based entity relationship model using Spanner Graph and partner solutions like Neo4j. This uses the live data stream from the physical system and represents it on the digital twin.
8. Use a combination of Cloud Storage and BigQuery to store discrete and continuous IT and OT data, such as system measurements, states, and file dumps from the source and the digital twin.
9. Discover common mode failures based on the mapped processes, including internal and external dependencies (a graph-analysis sketch follows this list).
10. Use at least one leading indicator with Google Threat Intelligence to perform threat modeling and evaluate the impact on the digital twin model.
11. Run Google’s AI models on the digital twins to further advance the complexity of cyber-resilience studies.
12. Look for security and observability gaps. Improve model fidelity. Recreate and update the digital twin environment. Repeat step 10 with a new leading indicator, new threat intelligence, or an updated threat model.
13. Based on the security discoveries from the resilience studies on the digital twin, design and implement security controls and risk mitigations in the physical counterpart.
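As referenced in step 9, the sketch below shows one way the mapped dependencies from step 7 could be analyzed for common mode failures and single points of failure. It uses networkx purely as an illustrative stand-in for the graph queries a team would run against Spanner Graph or Neo4j, and the edges themselves are hypothetical.

```python
import networkx as nx

# Hypothetical dependency map; in practice these edges come from the
# graph-based entity relationship model built in step 7 (Spanner Graph / Neo4j).
dependencies = [
    ("packaging-line", "plc-arm-01"),
    ("packaging-line", "labeling-cell"),
    ("plc-arm-01", "plant-historian"),
    ("labeling-cell", "plant-historian"),
    ("plant-historian", "site-dns"),
    ("mes", "site-dns"),
]

graph = nx.Graph(dependencies)

# Nodes whose removal disconnects the graph: candidate single points of failure.
single_points = sorted(nx.articulation_points(graph))
print("Potential single points of failure:", single_points)

# Nodes that more than one service depends on: candidate common mode failures
# worth stress-testing in the twin environment.
shared_dependencies = sorted(n for n in graph.nodes if graph.degree(n) > 1)
print("Shared dependencies:", shared_dependencies)
```

The same idea scales to the full plant model: any node that shows up as an articulation point or as a widely shared dependency is a natural target for the failure scenarios described in the table earlier in this post.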
To learn more about how to build a digital twin, you can read this ebook chapter and contact Google Cloud’s Office of the CISO.