When IBM i Skills Become A Resilience Risk
May 11, 2026 Ash Giddings
For many IBM i organizations, availability planning has traditionally focused on technology failure. Hardware faults, storage issues, site outages and, more recently, cyber incidents have shaped how high availability and disaster recovery strategies are designed.
What is less frequently acknowledged is that people have become one of the most significant availability risks in modern IBM i environments.
Skills shortages, retirement trends, and reliance on a small number of highly experienced individuals are changing the risk profile of the platform. In many cases, the greatest threat to continuity is no longer whether systems can fail over, but whether the right people are available, confident, and able to act when something goes wrong.
The Quiet Concentration of IBM i Knowledge
IBM i environments are remarkably stable. That stability has allowed organizations to retain systems and processes for decades, which is both a strength and a challenge.
Over time, deep platform knowledge tends to become concentrated in a small number of individuals. These people understand the nuances of the environment, the dependencies between applications, and the steps required to recover from unusual scenarios. They are frequently relied upon during outages, upgrades, and recovery events. Every business has these types of people.
As those individuals retire, change roles, or simply become unavailable, operational risk increases. Documentation rarely captures everything and is often out of date. Recovery procedures that depend on an individual’s memory or experience become fragile. Availability strategies that assume expert intervention begin to break down.
In this context, people risk becomes continuity risk.
Why Skills Shortages Directly Affect Availability
High availability and disaster recovery plans are often technically sound but operationally fragile. They work well when executed by the right people under the right conditions.
In a real incident, stress levels are high, decisions must be made quickly, and staff unfamiliar with the situation or technology may be required to act. If recovery processes are complex, manual, or rarely exercised, the likelihood of delay or error increases.
This is where many organizations discover that their HA or DR strategy is overly dependent on individual expertise. The technology may be capable, but the organization’s ability to execute is constrained by who is available at the time.
From a business continuity perspective, this is a significant gap.
Downtime Is More Expensive When Recovery Depends On People
The financial impact of downtime continues to rise. Industry estimates regularly place the cost of IT downtime in the tens or hundreds of thousands of dollars per hour, with the true impact often higher for systems supporting core financial and operational processes.
When recovery depends heavily on a small number of specialists, delays become more likely. Even short outages can extend while the right person is located, context is re-established, or confidence is built to execute recovery steps.
In these situations, the cost of downtime is no longer driven solely by technical failure, but by human dependency. Reducing that dependency is increasingly becoming a core objective of modern availability planning.
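To make that concrete, here is a rough sketch of the arithmetic. The hourly cost and the durations are hypothetical figures chosen for illustration, not industry data, and the function name `downtime_cost` is simply a label for this example. It assumes cost accrues evenly for every hour the system is down, which real outages rarely do.

```python
def downtime_cost(hourly_cost, technical_recovery_hours, people_delay_hours=0.0):
    """Return the total outage cost, assuming cost accrues evenly per hour.

    people_delay_hours is the time spent locating the right person and
    re-establishing context before recovery steps can begin.
    """
    if min(hourly_cost, technical_recovery_hours, people_delay_hours) < 0:
        raise ValueError("inputs must be non-negative")
    return hourly_cost * (technical_recovery_hours + people_delay_hours)


# Hypothetical figures: $100,000 per hour, 30 minutes of technical recovery.
recovery_only = downtime_cost(100_000, 0.5)
with_people_delay = downtime_cost(100_000, 0.5, people_delay_hours=1.5)

print(f"Recovery only:     ${recovery_only:,.0f}")
print(f"With people delay: ${with_people_delay:,.0f}")
```

Under these assumed numbers, the recovery itself accounts for a quarter of the total cost. The rest comes from waiting for someone who can perform it.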
Designing HA And DR For Fewer Hands And Less Tribal Knowledge
Addressing skills risk does not mean removing people from the equation. It means designing availability strategies that are easier to operate, easier to validate, and less reliant on individual expertise.
Modern HA solutions increasingly reflect this reality. Features that simplify operations, reduce manual intervention, and make recovery behavior predictable are essential in environments where skills are scarce.
Maxava HA is designed with this operational reality in mind. Capabilities such as simulated role swap allow organizations to validate recovery behavior safely and regularly, helping teams build confidence without disrupting the production environment. Coupled with this, a clever utility called Command Scripting Function allows customers to automate many of the traditional manual tasks associated with role swaps. This moves recovery readiness from a theoretical exercise into a repeatable operational practice that more than one individual can own.
Readiness Is Not Assumed, It Is Verified
One of the challenges in environments with limited skills is knowing whether systems are genuinely ready to swap.
This is where structured validation becomes important. Services such as the Maxava Swap Ready Audit provide an objective assessment of whether an HA environment is truly prepared for a role swap. Organizations that meet the criteria receive formal vendor confirmation that their environment is swap ready, giving both IT teams and business stakeholders greater confidence in their continuity posture.
Importantly, this kind of assessment shifts readiness away from individual judgment and toward independent, repeatable standards.
Ongoing Monitoring And Expert Support Reduce Operational Risk
Even well-designed HA environments evolve over time. Configuration drift, infrastructure changes, and operational shortcuts can quietly erode readiness.
For organizations with limited IBM i expertise, continuous monitoring of the HA and DR environment can provide early warning when issues arise. Proactive remediation support helps address problems before they become incidents, reducing the need for reactive firefighting during outages.
Access to expert assistance for planned and unplanned role swaps further reduces reliance on internal specialists. Knowing that experienced support is available during critical events allows organizations to respond faster and with greater confidence, even when key personnel are unavailable.
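The idea of catching configuration drift early can be illustrated with a minimal, generic sketch. It does not represent how any particular monitoring product works. The setting names (`replicated_libraries`, `journaling_enabled`, `apply_paused`) and the `find_drift` function are hypothetical, and the sketch assumes the agreed baseline and the current state are both available as simple key and value pairs.

```python
def find_drift(baseline, current):
    """Return {setting: (expected, actual)} for every setting that differs.

    A setting present on only one side is reported with None on the other.
    """
    drift = {}
    for key in sorted(baseline.keys() | current.keys()):
        expected = baseline.get(key)
        actual = current.get(key)
        if expected != actual:
            drift[key] = (expected, actual)
    return drift


# Hypothetical baseline agreed at the last successful role swap test.
baseline = {
    "replicated_libraries": ["ORDERS", "PAYROLL"],
    "journaling_enabled": True,
}
# Hypothetical state observed today.
current = {
    "replicated_libraries": ["ORDERS"],
    "journaling_enabled": True,
    "apply_paused": True,
}

for setting, (expected, actual) in find_drift(baseline, current).items():
    print(f"DRIFT {setting}: expected {expected!r}, found {actual!r}")
```

A check like this, run on a schedule, turns readiness into something that is reported automatically rather than something one person has to remember to inspect.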
Flexibility Matters As Teams And Workloads Change
Skills shortages often coincide with changing infrastructure strategies. IBM i workloads are increasingly portable, running on premises, in private cloud environments, and in public cloud infrastructure.
Availability solutions must be flexible enough to support this reality. Support for multiple topologies, including one-to-one, one-to-many, many-to-one, and cascade configurations, allows organizations to design resilience around how they actually operate.
Adding a secondary node in an alternative location, whether on premises or in the cloud, can significantly improve continuity without notably increasing operational complexity or staffing requirements. That flexibility becomes especially valuable when teams are stretched thin.
Replacing Legacy HA As An Opportunity To Reduce People Risk
Many organizations encounter these challenges when replacing legacy HA solutions. Competitive product replacements often expose how much operational knowledge has been embedded in older tools and informal processes.
This moment of change provides an opportunity to reassess not just technology, but dependency on individuals. Moving to a modern HA solution combined with supporting services can significantly reduce the operational burden on internal teams, make continuity more sustainable over time, and often reduce costs.
A More Sustainable Model For IBM i Continuity
IBM i remains a trusted platform for mission-critical workloads. Protecting its availability requires recognizing that resilience is as much about people as it is about systems.
By treating skills shortages as a continuity risk, and by adopting HA and DR strategies supported by validation, monitoring, and expert assistance, organizations can build a more sustainable availability model.
In an environment where experienced IBM i professionals are increasingly scarce, combining modern HA technology with flexible services is no longer just a convenience. It is a practical way to ensure that business continuity does not depend on a single individual being available at the right moment.
For more information, check out https://www.maxava.com/services
Ash Giddings is a product manager at Maxava and an IBM Champion.
This content is sponsored by Maxava.