CCSS Takes Aim at IBM i Availability in New Guide

March 22, 2011 Alex Woodie

The recent earthquakes in New Zealand and Japan are stark reminders that normal business operations can end in the blink of an eye. With that in mind, systems management software vendor CCSS last week released a new guide that lays the basic groundwork for achieving a highly available Power Systems environment.

The first step in achieving availability is to define one’s terms. What does availability mean to you? Is it the same as a service level agreement (SLA)? Without a clear understanding of when a server or application has crossed the magical line and becomes “unavailable,” it’s impossible to create a path to higher availability.
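
One way to make that line concrete is to express availability as the percentage of scheduled time a system was usable, then compare it with a target. The Python sketch below is illustrative only and is not taken from the CCSS guide; the function names and the 99.9 percent default target are assumptions.

```python
def availability_pct(scheduled_minutes: float, downtime_minutes: float) -> float:
    """Return the percentage of scheduled time the system was usable."""
    if scheduled_minutes <= 0:
        raise ValueError("scheduled_minutes must be positive")
    if not 0 <= downtime_minutes <= scheduled_minutes:
        raise ValueError("downtime_minutes must be between 0 and scheduled_minutes")
    return 100.0 * (scheduled_minutes - downtime_minutes) / scheduled_minutes


def meets_target(scheduled_minutes: float, downtime_minutes: float,
                 target_pct: float = 99.9) -> bool:
    """Check measured availability against a target (assumed default: 99.9)."""
    return availability_pct(scheduled_minutes, downtime_minutes) >= target_pct
```

A 30-day month has 43,200 minutes, so a 99.9 percent target leaves room for roughly 43 minutes of downtime. Calling meets_target(43200, 60) would therefore report a miss.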

The second step in CCSS’ best-practices guide is identifying threats to availability. Users can draw on their own history of near-misses to create their own top 10 lists. The key here is to be detailed. Common threats cited by CCSS include an inactive journal receiver and an inactive job scheduler, but there are many others.

The third step is to create automated responses to the threats listed in step two, according to CCSS. For example, receiving immediate notification of an inactive journal receiver will reduce the chance of a failure during a high availability (HA) switchover or during an attempt to recover the server on a disaster recovery (DR) box.
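
The idea of pairing each listed threat with an automated response can be sketched as a simple lookup. The Python below is a hypothetical illustration, not CCSS’ implementation and not an IBM i API: the threat names are taken from the two examples above, while the messages, the RESPONSES table, and the respond function are assumptions.

```python
from typing import Dict, List

# Hypothetical mapping from a detected threat to the notification it triggers.
RESPONSES: Dict[str, str] = {
    "inactive_journal_receiver": "Notify operator: journal receiver is inactive",
    "inactive_job_scheduler": "Notify operator: job scheduler is inactive",
}


def respond(threat: str, sent: List[str]) -> bool:
    """Queue the notification for a known threat.

    Returns False when no response is defined, and queues a warning instead
    so that unlisted threats are still visible to an operator.
    """
    message = RESPONSES.get(threat)
    if message is None:
        sent.append(f"Unmapped threat detected: {threat}")
        return False
    sent.append(message)
    return True
```

Here the sent list stands in for whatever delivery channel a shop actually uses, such as email or a pager. Flagging unmapped threats, rather than ignoring them, keeps the top 10 list from step two honest as new problems appear.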

The fourth step builds on step three, and asks readers to consider what else they can do to bolster automation. This could include adding a systems management tool to further automate the delivery of notifications. It’s understandable that CCSS would position its own tools, including QSystem Monitor, QMessage Monitor, and QRemote Control, as solutions for step four.

The final step in CCSS’ best-practices guide is an extension of step four, and involves identifying all the server components that must be continually monitored, including IBM i objects, communication devices, and jobs. CCSS’ claim that “high visibility” is a “prerequisite to high availability” will really hit home with any IBM i shop that has experienced a disaster.

Holistic monitoring is essential to having a good availability strategy. “Applications, jobs, subsystems, and communication elements are all areas where a particular problem could quickly escalate into a problem of availability,” states CCSS product manager Paul Ratchford. “Monitoring these areas with a pro-active approach reduces the risk of a true downtime situation and virtually eliminates those circumstances where a problem may be replicated in a new environment following an unforeseen downtime event.”

The new best-practices guide is the third of the year for CCSS, which also published a guide on security in January and on job monitoring in February. The new guide can be viewed at CCSS’ website at www.ccssltd.com.