OT Change and Incident Management: How to avoid the Titanic moment
Why OT needs structure more than ever and what ITIL can (and can’t) teach us about handling change, failure and recovery in industrial systems
Let’s start in the real world
Everyone saw the iceberg. The problem was that the ship slowed down too late. The Titanic didn’t sink because of bad luck; it sank because known risks were ignored, warnings went unheeded and there were no real processes for dealing with a crisis. In OT environments, it’s often the same: we know the risks, we see the signs, but we’re missing structure.
Picture this: It’s a regular Thursday morning at a high-speed bottling plant in southern Germany. Production is humming. Operators monitor multiple lines. Everything runs like clockwork until someone schedules a firmware update to the PLC¹ that controls Line 3’s conveyor system. The update is approved informally and applied during a shift handover, on the assumption that it won’t interfere with production.
Within 30 seconds, bottles start piling up and falling over. The line jams. A sensor fails to react. The entire cell halts. Operators respond quickly, but the issue isn’t immediately clear. There’s no detailed change log. No formal rollback strategy. No incident response plan tailored to this scenario. Downtime ticks on, with every minute costing thousands.
Now rewind the story, but imagine it in a typical IT context. A software patch is applied to a CRM² server. Users can’t log in. But IT has a structured response:
The issue is logged.
An incident is opened.
A change record links to the patch.
A rollback plan exists.
The team meets later to analyze what went wrong and how to prevent it.
Same logic, different world. This article explores why OT (Operational Technology) needs its own approach and how to borrow the best of ITIL³ without forcing a square peg into a round hole.
Why It Matters
In IT, a failed change might mean frustrated users. In OT, it could mean production losses, safety risks or regulatory violations.
OT systems run:
Manufacturing lines
Water treatment facilities
Power generation
Railway signals
Building automation systems
And unlike IT, these systems were not designed with frequent change or rapid incident response in mind. Many still run legacy hardware or software not touched in over a decade.
But change is coming:
Digital transformation is connecting OT systems to networks and clouds.
Cyberattacks on OT environments are rising sharply.
New compliance and safety standards demand structured operations.
You can’t wing it anymore. Even if your OT team isn’t big, you still need a playbook.
ITIL in a nutshell (for comparison)
ITIL (Information Technology Infrastructure Library) is the de-facto standard in IT for managing services, systems and processes. It defines:
A. Incident Management
Restore normal service as quickly as possible.
Log, prioritize, escalate, resolve and close.
Examples: Server crash, user login failure, application error.
B. Change Management (now Change Enablement in ITIL 4)
Ensure that changes are assessed, approved and implemented with minimal disruption.
Risk assessment, CAB⁴ meetings, rollback plans.
C. Problem Management
Identify and remove the root cause of recurring incidents.
Conduct root cause analysis, track known errors.
It also introduces:
Defined roles (e.g., Incident Manager, Change Manager)
Workflows supported by ITSM tools (like ServiceNow or Jira)
Focus on service quality, risk reduction and continuous improvement
The OT World: What's different? (simplified)
OT doesn’t lack discipline. But it has different priorities, constraints and historical context.
Key Differences:
Story Example: An IT team deploys weekly updates to improve user experience. In a paper mill, even rebooting a PLC may require halting production, draining tanks and obtaining safety clearance.
Frameworks that help in OT
Let’s look at the major frameworks and how they support structured incident, change and problem processes in OT environments.
A. IEC 62443 (especially part 2-1)
This is the leading global standard for OT cybersecurity and operational governance. Think of it as the OT equivalent of ISO/IEC 27001, but with safety and reliability front and center.
What it offers:
Requirements for a Cybersecurity Management System (CSMS)
Guidance on secure system design, integration and maintenance
Maturity models for assessing process consistency
How it maps to ITIL-style processes:
Change Management:
Documented change process
Impact and risk assessment required
Mandatory testing and rollback
Incident Management:
Incident response plans
Defined escalation paths
Role-specific responsibilities (e.g. integrator, asset owner)
Problem Management:
Root cause analysis as part of continuous improvement
Structured documentation of learnings
Use Case Example: A machine builder wants to update SCADA software at a client site. Under a change process aligned with IEC 62443, they would submit a documented change request including:
Purpose
Risk assessment
Test results
Backup strategy
Rollback plan
Communication with all involved parties
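The six items above translate naturally into a structured record. The Python sketch below shows one way to do that; IEC 62443 does not prescribe it. The `ChangeRequest` class and its field names are hypothetical. Its only logic is a completeness check that lists which items are still empty before a request goes to review.

```python
from dataclasses import dataclass, fields


@dataclass
class ChangeRequest:
    """Hypothetical change request mirroring the six items listed above."""
    purpose: str = ""
    risk_assessment: str = ""
    test_results: str = ""
    backup_strategy: str = ""
    rollback_plan: str = ""
    communication: str = ""  # who was informed, how and when

    def missing_items(self) -> list[str]:
        """Return the names of items that are empty or whitespace-only."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def is_complete(self) -> bool:
        return not self.missing_items()


cr = ChangeRequest(
    purpose="Update SCADA software to the vendor's current release",
    risk_assessment="Medium: HMI restart needed, PLCs keep running",
    test_results="Tested on a spare HMI station, no issues found",
    backup_strategy="Full image of the SCADA server taken before the change",
)
print(cr.missing_items())  # ['rollback_plan', 'communication']
```

An incomplete request is returned to the submitter rather than silently rejected, so the list of missing items becomes the feedback.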
B. NIST SP 800-82 Rev. 3 + NIST Cybersecurity Framework (CSF)
SP 800-82 is a U.S. publication, but globally respected. It applies the NIST CSF to OT systems.
Core functions:
Identify: Asset inventory, roles, data flows
Protect: Access control, maintenance, data integrity
Detect: Anomalies, log monitoring
Respond: Contain, analyze, report incidents
Recover: Plans to restore systems and learn
How it aligns with ITIL-style logic:
Clear steps for incident response and containment
Encourages planning around configuration changes
Emphasizes feedback loops for process improvement
Example: An oil pipeline control center detects odd valve behavior. Following NIST:
Detect abnormal behavior (via logs or operators)
Trigger incident response
Isolate affected segment
Notify stakeholders
Analyze root cause
Update system rules and training
C. ISO/IEC 27001 (with OT integration)
While not OT-specific, ISO 27001 can support security governance in OT, especially in regulated industries (e.g. pharma, energy).
Supports structured risk management
Can be combined with IEC 62443 for certification
Detailed comparison: OT vs. ITIL processes
Challenges in Bridging IT and OT
Language Barrier
IT talks services, SLAs, apps
OT talks loops, sensors, safety interlocks
Tooling Gap
IT has mature platforms; OT often lacks centralized tooling
Change Aversion in OT
Not cultural laziness, but safety-first thinking
Incident Visibility
Many OT incidents (e.g. unplanned stops) aren’t logged as such
Problem Follow-up
“We fixed it” often replaces “we understood why it happened”
Getting started (without overengineering it)
Starting structured OT processes doesn’t mean implementing ITIL overnight. It means building lightweight, useful habits that create clarity and reduce risk.
A. Build a simple Asset Inventory
List your key control systems, their location, owner and purpose
Track firmware/software versions
Include critical dependencies (e.g. power, cooling, network links)
Why? You can’t manage changes or respond to incidents if you don’t know what you have.
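To make the dependency column useful during impact assessment, you can treat the inventory as a small graph. The Python sketch below is a minimal version that assumes a hypothetical `Asset` record with made-up asset names and versions. The function `impacted_by` walks the dependencies in reverse (breadth-first) and lists every asset that directly or indirectly relies on a failed component.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Asset:
    name: str
    location: str
    owner: str
    purpose: str
    firmware: str  # track versions, as suggested above
    depends_on: list[str] = field(default_factory=list)  # power, network, upstream controllers


def impacted_by(failed: str, inventory: dict[str, Asset]) -> set[str]:
    """Return all assets that directly or transitively depend on `failed`."""
    # Reverse the edges: for each dependency, which assets rely on it?
    dependents: dict[str, list[str]] = {}
    for asset in inventory.values():
        for dep in asset.depends_on:
            dependents.setdefault(dep, []).append(asset.name)
    # Breadth-first walk starting from the failed component
    impacted: set[str] = set()
    queue = deque([failed])
    while queue:
        for name in dependents.get(queue.popleft(), []):
            if name not in impacted:
                impacted.add(name)
                queue.append(name)
    return impacted


# Hypothetical inventory for Line 3 of the bottling plant example
inventory = {a.name: a for a in [
    Asset("UPS-2", "Hall B", "Facilities", "Backup power for Line 3", "3.1"),
    Asset("SW-L3", "Hall B", "OT network team", "Line 3 cell switch", "15.2", ["UPS-2"]),
    Asset("PLC-L3", "Line 3", "Control engineering", "Conveyor control", "4.5", ["SW-L3", "UPS-2"]),
    Asset("HMI-L3", "Line 3", "Production", "Operator panel", "17.0", ["PLC-L3", "SW-L3"]),
]}
print(sorted(impacted_by("SW-L3", inventory)))  # ['HMI-L3', 'PLC-L3']
```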
B. Define roles and responsibilities (lightweight RACI)
Create a one-page matrix answering:
Who approves a change?
Who can execute it?
Who needs to be informed?
Who leads during an incident?
Tip: Keep it visual and post it near operator stations or control rooms.
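If the matrix lives in a spreadsheet, a quick consistency check catches the most common gaps. The sketch below assumes the widely used RACI convention of exactly one Accountable (A) and at least one Responsible (R) per activity. The role names and activities are illustrative and loosely based on the questions above.

```python
# R = Responsible (does the work), A = Accountable (approves / owns the outcome),
# C = Consulted, I = Informed
VALID_CODES = {"R", "A", "C", "I"}


def raci_problems(matrix: dict[str, dict[str, str]]) -> list[str]:
    """Check each activity: known codes only, exactly one A, at least one R."""
    problems = []
    for activity, roles in matrix.items():
        codes = list(roles.values())
        unknown = sorted(set(codes) - VALID_CODES)
        if unknown:
            problems.append(f"{activity}: unknown codes {unknown}")
        if codes.count("A") != 1:
            problems.append(f"{activity}: expected exactly one A, found {codes.count('A')}")
        if "R" not in codes:
            problems.append(f"{activity}: nobody is Responsible")
    return problems


# Hypothetical matrix; the last row deliberately has two Accountable roles
raci = {
    "Approve change": {"Asset Owner": "A", "Control System Engineer": "R", "Shift Lead": "I"},
    "Execute change": {"Control System Engineer": "R", "Asset Owner": "A", "Shift Lead": "I"},
    "Lead incident response": {"Shift Lead": "R", "Maintenance Lead": "A", "Asset Owner": "A"},
}
print(raci_problems(raci))  # ['Lead incident response: expected exactly one A, found 2']
```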
C. Create a basic change checklist
Use paper, Excel or a digital form. For each planned change:
What is being changed?
Why is it needed?
What is the expected impact?
Who tested it, where and when?
What’s the rollback plan?
When will it happen?
Who approved it?
Bonus: Add a checkbox: “Will this impact safety, quality or regulatory compliance?”
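The checklist can also act as a simple go/no-go gate. This Python sketch uses hypothetical field names for the questions above. One rule in it is an assumed local policy, not a requirement from any standard: a change flagged for safety, quality or compliance impact needs two approvers instead of one.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class ChangeChecklist:
    what: str                           # What is being changed?
    why: str                            # Why is it needed?
    expected_impact: str                # What is the expected impact?
    tested_by_where_when: str           # Who tested it, where and when?
    rollback_plan: str                  # What's the rollback plan?
    scheduled_for: Optional[datetime]   # When will it happen?
    approved_by: list[str] = field(default_factory=list)  # Who approved it?
    affects_safety_quality_or_compliance: bool = False     # the "bonus" checkbox


def go_no_go(c: ChangeChecklist) -> list[str]:
    """Return blocking reasons; an empty list means the change may proceed."""
    reasons = []
    for name in ("what", "why", "expected_impact", "tested_by_where_when", "rollback_plan"):
        if not getattr(c, name).strip():
            reasons.append(f"'{name}' is not answered")
    if c.scheduled_for is None:
        reasons.append("no scheduled time")
    # Assumed local policy: flagged changes need a second, independent approver
    required = 2 if c.affects_safety_quality_or_compliance else 1
    approvers = len(set(c.approved_by))
    if approvers < required:
        reasons.append(f"needs {required} approver(s), has {approvers}")
    return reasons


c = ChangeChecklist(
    what="Firmware update, PLC for Line 3 conveyor",
    why="Vendor fix for intermittent sensor timeout",
    expected_impact="Line 3 stopped for about 20 minutes",
    tested_by_where_when="Control engineer, spare PLC in workshop, two days before",
    rollback_plan="",
    scheduled_for=datetime(2025, 3, 15, 6, 0),
    approved_by=["Asset Owner"],
    affects_safety_quality_or_compliance=True,
)
print(go_no_go(c))  # ["'rollback_plan' is not answered", 'needs 2 approver(s), has 1']
```

Applied to the bottling plant story, this gate would have stopped the informal update at the shift handover.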
D. Establish a simple Incident Response Playbook
For example:
Detection: Alarm, operator report, unusual behavior
Containment: Isolate the system if possible (e.g. unplug the network cable)
Notification: Call pre-defined contacts
Documentation: What happened? When? What was done?
Recovery: Apply fix, validate functionality
Review: Document learnings
Deliverable: Print this as a laminated card and mount it near SCADA workstations.
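Kept digitally, the playbook doubles as a checklist for closing an incident. The sketch below is minimal and assumes a hypothetical `IncidentLog` class. Entries are recorded per phase with a timestamp, and an incident can only be closed once every phase has at least one entry. The phases are not forced into strict order, because in practice containment and notification often overlap.

```python
from datetime import datetime
from enum import Enum
from typing import Optional


class Phase(Enum):
    DETECTION = 1
    CONTAINMENT = 2
    NOTIFICATION = 3
    DOCUMENTATION = 4
    RECOVERY = 5
    REVIEW = 6


class IncidentLog:
    """Hypothetical incident record following the six playbook phases."""

    def __init__(self, title: str):
        self.title = title
        self.entries: list[tuple[datetime, Phase, str]] = []

    def record(self, phase: Phase, note: str, when: Optional[datetime] = None) -> None:
        # Default to the current time so operators only need to type the note
        self.entries.append((when or datetime.now(), phase, note))

    def missing_phases(self) -> list[Phase]:
        done = {phase for _, phase, _ in self.entries}
        return [p for p in Phase if p not in done]  # Enum iterates in definition order

    def can_close(self) -> bool:
        return not self.missing_phases()


log = IncidentLog("Line 3 conveyor jam after PLC update")
log.record(Phase.DETECTION, "Bottles piling up at infeed, reported by operator")
log.record(Phase.CONTAINMENT, "Line 3 stopped, PLC disconnected from plant network")
log.record(Phase.NOTIFICATION, "Shift lead and control engineer called")
print([p.name for p in log.missing_phases()])  # ['DOCUMENTATION', 'RECOVERY', 'REVIEW']
```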
E. Introduce "Lessons Learned" Rituals
After any incident or failed change:
Host a 20-minute team huddle
Ask: What happened? Why? What can we improve?
Write it down, even in a shared Word file or notebook
Cultural tip: Celebrate the fact that you reviewed, not that it was perfect.
F. Train using real incidents
Once a month, review a real event:
Was it handled well?
What would we do differently?
Are our procedures clear?
Benefit: Makes documentation practical, not bureaucratic.
G. Build from the bottom up
Don’t wait for enterprise software. Use what you have:
Excel or Google Sheets for tracking
Printed forms for incident logs
Shared folders for documentation
Once habits form, you can layer better tools later (e.g. asset management platforms, change tracking systems).
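As an example of starting with what you have, a combined change and incident log can be a plain CSV file that both Excel and Google Sheets open directly. The column layout below is hypothetical. The file is written with the `utf-8-sig` encoding, which puts a byte-order mark at the start of a new file. That mark helps Excel recognize the file as UTF-8, which matters for umlauts in German site or line names.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical column layout for a combined change/incident log
FIELDS = ["date", "line", "kind", "summary", "owner", "status"]


def append_entry(path: Path, entry: dict) -> None:
    """Append one row; write the header only if the file is new or empty."""
    new_file = not path.exists() or path.stat().st_size == 0
    # utf-8-sig writes the BOM only at the start of the file, not on later appends
    with path.open("a", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)  # unknown keys raise ValueError
        if new_file:
            writer.writeheader()
        writer.writerow(entry)


def read_entries(path: Path) -> list[dict]:
    with path.open(newline="", encoding="utf-8-sig") as f:
        return list(csv.DictReader(f))


# Demo in a temporary folder; in practice this would be a path on a shared drive
with tempfile.TemporaryDirectory() as folder:
    log_path = Path(folder) / "ot_log.csv"
    append_entry(log_path, {"date": "2025-03-13", "line": "Line 3", "kind": "change",
                            "summary": "PLC firmware update", "owner": "Control engineering",
                            "status": "done"})
    append_entry(log_path, {"date": "2025-03-13", "line": "Line 3", "kind": "incident",
                            "summary": "Conveyor jam after update", "owner": "Shift lead",
                            "status": "open"})
    rows = read_entries(log_path)
print(len(rows), rows[1]["kind"])  # 2 incident
```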
Why it’s more critical than ever to formalize OT processes (with verified data)
1. Rising Cyber Threats in OT amid IT/OT convergence
OT systems no longer run in isolation; they’re increasingly connected to IT and cloud environments. According to ITPro, ransomware and wiper malware incidents in OT environments rose sharply from 32% in 2023 to 56% in 2024. This trend reflects the growing vulnerability as barriers between OT and IT vanish.
Additionally, the 2025 Security Navigator Report noted a 39% increase in cyberattacks targeting OT systems between 2023 and 2024.
2. Critical infrastructure under fire
A Semperis survey, cited by Infosecurity Magazine, found that 62% of water and electricity operators in the US and UK were targeted by cyberattacks in the past year. Of those, 80% were attacked multiple times, 59% experienced operational disruption and 54% suffered permanent data or system damage.
3. Legacy OT Systems heighten risk
These environments often rely on decades-old hardware and software that lack patching, encryption or modern security controls and were never designed for today’s connected threat landscape.
4. Regulatory pressure and need for resilience
Governance expectations now include proactive security and process documentation. In the EU, this pressure is mounting through the NIS 2 Directive (Network and Information Security Directive) and the Cyber Resilience Act (CRA), an EU regulation.
NIS 2 requires essential and important entities to establish structured cybersecurity risk management, incident reporting and governance processes, including for OT systems.
The CRA mandates secure-by-design principles and lifecycle cybersecurity controls for products with digital elements placed on the EU market, including many connected devices and industrial products.
In the U.S., the EPA reports that as of late 2024, 97 drinking water systems serving 26.6 million people have critical or high-risk vulnerabilities, raising the bar for OT resilience planning.
5. Reputational and Business Continuity costs
It’s no longer just about downtime: it’s about trust, safety and financial fallout. And the stakes are only rising, as recent breaches at utilities and other critical infrastructure operators have shown.
6. Building resilience through process discipline
Structured processes (change control, incident response, root-cause reviews) become resilience enablers. Organizations that manage without them risk repeating mistakes.
Summary table: Why process maturity in OT is non-negotiable today
Final Thoughts
You don’t need to force ITIL into OT. But you can (and should) build a process culture that reflects the same ideas:
Plan before you change.
React quickly and safely when things go wrong.
Learn and improve after every event.
Frameworks like IEC 62443 and NIST SP 800-82 give you guidance adapted to the realities of OT, where systems are physical, risks are real and failure isn't just an error message.
Start small. Be pragmatic. Involve your people. And over time, bring structure to the chaos without getting in the way of the work.
What else should be considered? (important additions)
To round out the picture, here are additional key areas that make your OT process model robust and future-proof:
A. OT-Specific roles and responsibilities
While ITIL has clearly defined roles, OT environments often operate with lean teams. Still, clarity is critical.
Typical OT Roles:
Asset Owner - Responsible for lifecycle decisions and approvals
Control System Engineer - Designs and maintains the automation logic
Maintenance Lead - Manages availability and response to technical failures
OT Security Lead - Aligns security controls with process needs
System Integrator - Executes changes across multiple systems or vendors
Why this matters: Clear roles avoid confusion in high-pressure situations and improve coordination with IT.
B. Change testing and validation in OT
Unlike IT, OT systems can rarely be duplicated for test environments. That doesn’t mean testing is optional.
Options in OT:
Simulation mode: Some controllers offer a virtual run-through
Offline testing: Use spare hardware or cloned PLCs
Shadowing: Monitor a proposed change in read-only mode first
Digital Twin: Advanced but effective for complex plants
Goal: Always validate logic and side-effects before deploying to production.
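Shadowing can be illustrated as a replay. Recorded input values go to both the current and the proposed logic, the outputs are compared, and nothing is written to the real process. Real controller logic is typically written in IEC 61131-3 languages inside the vendor's engineering tool. This Python sketch only shows the compare-and-report idea, using hypothetical signal names (`jam_sensor`, `table_fill_pct`) and a made-up logic change.

```python
def shadow_compare(current, proposed, recorded_scans):
    """Replay recorded inputs through both logic versions (read-only) and
    return (index, inputs, old_outputs, new_outputs) wherever they differ."""
    diffs = []
    for i, scan in enumerate(recorded_scans):
        old, new = current(scan), proposed(scan)
        if old != new:
            diffs.append((i, scan, old, new))
    return diffs


def current_logic(scan):
    # Existing behavior: run the conveyor unless the jam sensor trips
    return {"conveyor_run": not scan["jam_sensor"]}


def proposed_logic(scan):
    # Hypothetical change: also stop when the accumulation table is nearly full
    return {"conveyor_run": not scan["jam_sensor"] and scan["table_fill_pct"] < 90}


# Input snapshots recorded from the running line (values are illustrative)
scans = [
    {"jam_sensor": False, "table_fill_pct": 40},
    {"jam_sensor": False, "table_fill_pct": 95},
    {"jam_sensor": True, "table_fill_pct": 95},
]
for i, scan, old, new in shadow_compare(current_logic, proposed_logic, scans):
    print(i, old, "->", new)  # 1 {'conveyor_run': True} -> {'conveyor_run': False}
```

Each reported difference is something to confirm as intended before the change reaches production.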
C. Cross-functional integration
Change or incident processes in OT don’t happen in isolation. They must align with:
Production planning: To schedule changes without hurting OEE (Overall Equipment Effectiveness)
Quality assurance: To validate product integrity post-change
Health & Safety: To ensure safe work procedures during interventions
Tip: Involve these functions early in your templates or workflows.
D. OT-specific KPIs and metrics
If you can’t measure it, you can’t improve it. But classic IT metrics often miss the mark in OT.
Useful OT Metrics:
Change Success Rate (CSR) - % of changes without rollback or incident
Mean Time to Repair (MTTR) - Average recovery time from incidents
Unplanned Downtime per Line/Asset - Tracks stability over time
Recurring Incident Rate - Helps identify weak spots in root cause work
Use: Simple Excel dashboards can suffice at first.
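Before building the Excel dashboard, it helps to pin down the formulas:
CSR = changes without rollback or incident / all changes × 100
MTTR = total repair time / number of incidents
Recurring incident rate = incidents that repeat an earlier asset-and-cause pair / all incidents × 100

The Python sketch below computes these figures from hypothetical records. It makes two simplifying assumptions that a real dashboard would refine: repair time stands in for downtime, and a root-cause category is used to detect recurrence.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Change:
    asset: str
    rolled_back: bool
    caused_incident: bool


@dataclass
class Incident:
    asset: str
    cause: str              # root-cause category, e.g. "sensor", "firmware"
    repair_minutes: float   # assumption: also used as downtime
    unplanned: bool = True


# All functions below expect non-empty input lists
def change_success_rate(changes):
    ok = sum(1 for c in changes if not c.rolled_back and not c.caused_incident)
    return 100 * ok / len(changes)


def mttr_minutes(incidents):
    return sum(i.repair_minutes for i in incidents) / len(incidents)


def unplanned_downtime_by_asset(incidents):
    totals = defaultdict(float)
    for i in incidents:
        if i.unplanned:
            totals[i.asset] += i.repair_minutes
    return dict(totals)


def recurring_incident_rate(incidents):
    """Share of incidents (in time order) that repeat an earlier asset+cause pair."""
    seen, repeats = set(), 0
    for i in incidents:
        key = (i.asset, i.cause)
        repeats += key in seen
        seen.add(key)
    return 100 * repeats / len(incidents)


changes = [Change("PLC-L3", False, False), Change("PLC-L3", True, False),
           Change("HMI-L1", False, False), Change("SCADA", False, True)]
incidents = [Incident("PLC-L3", "firmware", 45), Incident("Filler-L1", "sensor", 20),
             Incident("Filler-L1", "sensor", 30), Incident("PLC-L3", "network", 25)]
print(change_success_rate(changes))           # 50.0
print(mttr_minutes(incidents))                # 30.0
print(unplanned_downtime_by_asset(incidents)) # {'PLC-L3': 70.0, 'Filler-L1': 50.0}
print(recurring_incident_rate(incidents))     # 25.0
```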
E. Tools and practical workarounds
Not every OT environment has a ServiceNow license. But you don’t need one to get started.
Practical Tools:
Excel/SharePoint: Use structured templates with dropdowns
Paper-based logs: Still useful in field operations
Shift logs: Expand them to capture incidents and changes
Low-code apps: Use tools like Microsoft Power Apps for forms
The key is consistency, not sophistication.
F. Maturity Model for OT process adoption
Where are you today and where do you want to be?
Advice: Aim for Level 3 before considering software or certification.
A note for Experts: What this article is and isn’t
This article is not a blueprint for compliance or a substitute for deep technical security architecture. It’s a practical guide for OT leaders, plant managers and technical decision-makers who need to bring more structure into environments where safety, reliability and legacy constraints shape reality.
Yes, every plant is different. Yes, not all IEC 62443 parts apply equally to every sector. And yes, ITIL wasn’t made for PLCs or real-time loops. But the core message stands: in modern OT environments, lack of structured processes is no longer defensible. Not operationally, not legally and not in front of your customers.
The goal here is clarity, not completeness. The analogies are deliberate simplifications, because a well-run shopfloor needs clear thinking more than it needs buzzword fluency.
If you're already operating at Maturity Level 4+, this article probably isn’t for you. But if you're somewhere between tribal knowledge and Excel sheets, this might just help get the next conversation started.
Further reading & resources
IEC 62443 Overview (Wikipedia)
General introduction to the standard series, structure and terminology.
Understanding IEC 62443 (IEC Blog)
Official blog explaining the scope and applications of the standard.
NIST SP 800-82 Rev. 3 (Final Guide)
The latest NIST guide to securing OT systems, aligned with the NIST CSF.
General introduction to ITIL and its processes
¹ PLC (Programmable Logic Controller): A rugged industrial computer used to control machines and processes. Like the brain of a production line.
² CRM (Customer Relationship Management): Software that manages interactions with customers (e.g., Salesforce, Microsoft Dynamics).
³ ITIL (Information Technology Infrastructure Library): A best-practice framework for managing IT services, including how to handle changes, incidents and problems in a structured way.
⁴ CAB (Change Advisory Board): A team that reviews and approves proposed changes based on risk and impact.