When IBM i Skills Become A Resilience Risk

May 11, 2026 Ash Giddings

For many IBM i organizations, availability planning has traditionally focused on technology failure. Hardware faults, storage issues, site outages and, more recently, cyber incidents have shaped how high availability and disaster recovery strategies are designed.

What is less frequently acknowledged is that people have become one of the most significant availability risks in modern IBM i environments.

Skills shortages, retirement trends, and reliance on a small number of highly experienced individuals are changing the risk profile of the platform. In many cases, the greatest threat to continuity is no longer whether systems can fail over, but whether the right people are available, confident, and able to act when something goes wrong.

The Quiet Concentration of IBM i Knowledge

IBM i environments are remarkably stable. That stability has allowed organizations to retain systems and processes for decades, which is both a strength and a challenge.

Over time, deep platform knowledge tends to become concentrated in a small number of individuals. These people understand the nuances of the environment, the dependencies between applications, and the steps required to recover from unusual scenarios. They are frequently relied upon during outages, upgrades, and recovery events. Every business has people like this.

As those individuals retire, change roles, or simply become unavailable, operational risk increases. Documentation rarely captures everything and is often out of date. Recovery procedures that depend on an individual’s memory or experience become fragile. Availability strategies that assume expert intervention begin to break down.

In this context, people risk becomes continuity risk.

Why Skills Shortages Directly Affect Availability

High availability and disaster recovery plans are often technically sound but operationally fragile. They work well when executed by the right people under the right conditions.

In a real incident, stress levels are high, decisions must be made quickly, and staff unfamiliar with the situation or technology may be required to act. If recovery processes are complex, manual, or rarely exercised, the likelihood of delay or error increases.

This is where many organizations discover that their HA or DR strategy is overly dependent on individual expertise. The technology may be capable, but the organization’s ability to execute is constrained by who is available at the time.

From a business continuity perspective, this is a significant gap.

Downtime Is More Expensive When Recovery Depends On People

The financial impact of downtime continues to rise. Industry estimates regularly place the cost of IT downtime in the tens or hundreds of thousands of dollars per hour, with the true impact often higher for systems supporting core financial and operational processes.

When recovery depends heavily on a small number of specialists, delays become more likely. Even short outages can extend while the right person is located, context is re-established, or confidence is built to execute recovery steps.

In these situations, the cost of downtime is no longer driven solely by technical failure, but by human dependency. Reducing that dependency is increasingly becoming a core objective of modern availability planning.
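The arithmetic behind that point can be sketched in a few lines of Python. This is an illustration only: the function name, the hourly cost, and the minute values are hypothetical figures chosen for the example, not measurements from any real environment.

```python
def downtime_cost(hourly_cost, technical_minutes, people_delay_minutes=0.0):
    """Estimate outage cost in the same currency as hourly_cost.

    technical_minutes: time the recovery itself takes once someone starts it.
    people_delay_minutes: time spent locating a specialist, re-establishing
    context, or building confidence before recovery begins.
    """
    total_minutes = technical_minutes + people_delay_minutes
    return hourly_cost * total_minutes / 60.0


# Hypothetical figures, chosen only to illustrate the calculation.
HOURLY_COST = 100_000.0

rehearsed = downtime_cost(HOURLY_COST, technical_minutes=30)
specialist_dependent = downtime_cost(
    HOURLY_COST, technical_minutes=30, people_delay_minutes=45
)

# The portion of the cost that comes from human dependency alone.
human_dependency_cost = specialist_dependent - rehearsed
print(f"Rehearsed recovery:        {rehearsed:,.0f}")
print(f"Specialist-dependent:      {specialist_dependent:,.0f}")
print(f"Cost of human dependency:  {human_dependency_cost:,.0f}")
```

Under these assumed numbers, the technical failure is identical in both cases. The difference in cost comes entirely from the time spent waiting for the right person.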

Designing HA And DR For Fewer Hands And Less Tribal Knowledge

Addressing skills risk does not mean removing people from the equation. It means designing availability strategies that are easier to operate, easier to validate, and less reliant on individual expertise.

Modern HA solutions increasingly reflect this reality. Features that simplify operations, reduce manual intervention, and make recovery behavior predictable are essential in environments where skills are scarce.

Maxava HA is designed with this operational reality in mind. Capabilities such as simulated role swap allow organizations to validate recovery behavior safely and regularly, helping teams build confidence without disrupting the production environment. Alongside this, the Command Scripting Function utility allows customers to automate many of the manual tasks traditionally associated with role swaps. Together, these turn recovery readiness from a theoretical exercise into a repeatable operational practice that more than one individual can own.

Readiness Is Not Assumed, It Is Verified

One of the challenges in environments with limited skills is knowing whether systems are genuinely ready to swap.

This is where structured validation becomes important. Services such as the Maxava Swap Ready Audit provide an objective assessment of whether an HA environment is truly prepared for a role swap. Organizations that meet the criteria receive formal vendor confirmation that their environment is swap ready, giving both IT teams and business stakeholders greater confidence in their continuity posture.

Importantly, this kind of assessment shifts readiness away from individual judgment and toward independent, repeatable standards.

Ongoing Monitoring And Expert Support Reduce Operational Risk

Even well-designed HA environments evolve over time. Configuration drift, infrastructure changes, and operational shortcuts can quietly erode readiness.

For organizations with limited IBM i expertise, continuous monitoring of the HA and DR environment can provide early warning when issues arise. Proactive remediation support helps address problems before they become incidents, reducing the need for reactive firefighting during outages.
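As a rough illustration of what drift detection means in practice, the idea is to compare the current configuration against a known-good baseline and report anything that differs. The Python sketch below is generic and assumes configuration can be represented as simple key/value pairs. The setting names are hypothetical and do not correspond to any Maxava or IBM i interface.

```python
def find_drift(baseline, current):
    """Return {setting: (expected, actual)} for every setting that differs.

    A setting missing from either side is reported with None on that side.
    """
    drift = {}
    for key in sorted(set(baseline) | set(current)):
        expected = baseline.get(key)
        actual = current.get(key)
        if expected != actual:
            drift[key] = (expected, actual)
    return drift


# Hypothetical settings, used only to show the comparison.
baseline = {
    "replicated_libraries": ["APPLIB", "DATALIB"],
    "apply_jobs": 4,
    "target_site": "DR1",
}
current = {
    "replicated_libraries": ["APPLIB"],  # a library was dropped at some point
    "apply_jobs": 4,
    "target_site": "DR1",
    "temporary_exclusion": "DATALIB",    # a shortcut that was never reverted
}

for setting, (expected, actual) in find_drift(baseline, current).items():
    print(f"{setting}: expected {expected!r}, found {actual!r}")
```

Running a check like this on a schedule, rather than relying on someone remembering what changed, is one way early warning can be made independent of any single person.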

Access to expert assistance for planned and unplanned role swaps further reduces reliance on internal specialists. Knowing that experienced support is available during critical events allows organizations to respond faster and with greater confidence, even when key personnel are unavailable.

Flexibility Matters As Teams And Workloads Change

Skills shortages often coincide with changing infrastructure strategies. IBM i workloads are increasingly portable, running on premises, in private cloud environments, and in public cloud infrastructure.

Availability solutions must be flexible enough to support this reality. Support for multiple topologies, including one-to-one, one-to-many, many-to-one, and cascade configurations, allows organizations to design resilience around how they actually operate.

Adding a secondary node in an alternative location, whether on premises or in the cloud, can significantly improve continuity without notably increasing operational complexity or staffing requirements. That flexibility becomes especially valuable when teams are stretched thin.

Replacing Legacy HA As An Opportunity To Reduce People Risk

Many organizations encounter these challenges when replacing legacy HA solutions. Competitive product replacements often expose how much operational knowledge has been embedded in older tools and informal processes.

This moment of change provides an opportunity to reassess not just technology, but dependency on individuals. Moving to a modern HA solution combined with supporting services can significantly reduce the operational burden on internal teams, make continuity more sustainable over time, and often reduce costs.

A More Sustainable Model For IBM i Continuity

IBM i remains a trusted platform for mission-critical workloads. Protecting its availability requires recognizing that resilience is as much about people as it is about systems.

By treating skills shortages as a continuity risk, and by adopting HA and DR strategies supported by validation, monitoring, and expert assistance, organizations can build a more sustainable availability model.

In an environment where experienced IBM i professionals are increasingly scarce, combining modern HA technology with flexible services is no longer just a convenience. It is a practical way to ensure that business continuity does not depend on a single individual being available at the right moment.

For more information, check out https://www.maxava.com/services

Ash Giddings is a product manager at Maxava and an IBM Champion.

This content is sponsored by Maxava.
