Volume 10, Number 15 -- May 12, 2010

Admin Alert: Diary of a Production System Upgrade, Part 2

Published: May 12, 2010

by Joe Hertvik

Last issue, I began discussing a Power i/Power 6 upgrade that was recently completed in my shop. This review served as a case study for discussing some techniques and pitfalls in bringing up new hardware. I'm offering this examination to document some lessons I learned while upgrading in order to help other i/OS administrators who are installing new hardware. This week, let's continue the story.

I previously documented how we started swapping out an existing production System i/Power 5 machine for an upgraded model 8204 IBM i Power 6 box with 96 GB of memory and 4 processors. Over a single weekend, we attempted to switch processing to our Capacity BackUp (CBU) machine, rebuild our development and production partitions, and then switch processing back to the new production partition.

What became obvious during the install was that we were trying to do too much in 48 hours. In short order, we had a production delay while switching over to the CBU, struggled to keep the staff rested and alert, and experienced other delays that slowed our progress. By Saturday evening, we had switched production processing to the CBU, and we had migrated our development partition to the new machine. At 11 p.m., we called it a night before starting the next step on Sunday: bringing the production partition up live.

Meanwhile, Back at the Production Partition

On Sunday, we finished restoring our existing system from a full system backup tape and made the necessary adjustments to account for the new hardware and to activate the partition. Since we had activated our CBU machine to fill in for production, making the new partition live was a three-step process.

  1. Bringing the new production partition up as the target CBU system rather than as the source production box.
  2. Synchronizing the production partition's data with the off-site CBU that was currently running production. We had to ensure that all live data processed on the CBU was replicated back to the new partition.
  3. Switching processing roles between the CBU and the production machine, so that the new partition could take over servicing our users.

Here's how each step shook out.

Bringing up the new production partition as the CBU.

After creating the production partition and restoring its data, we IPLed the machine into restricted mode by doing the following.

1. We changed the Startup Program system value to *NONE so that on an IPL, it wouldn't fire up the production startup program. We did this by running the following Change System Value (CHGSYSVAL) command.

CHGSYSVAL SYSVAL(QSTRUPPGM) VALUE(*NONE)

2. We changed the partition's IPL attributes so that if we IPLed the system, it would automatically come up in Restricted Mode. For instructions on how to do this, see my article You Can Re-IPL an AS/400 into Restricted State.
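As a rough sketch, and not necessarily the exact commands used in this migration, the two preparation steps above could look like the following in CL. The DSPSYSVAL call is an added precaution, and using CHGIPLA with STRRSTD(*YES) for step 2 is an assumption here; check the linked article and your release's documentation before relying on it.

```
/* Record the current startup program so it can be restored later. */
DSPSYSVAL SYSVAL(QSTRUPPGM)

/* Step 1: do not run a startup program on the next IPL. */
CHGSYSVAL SYSVAL(QSTRUPPGM) VALUE(*NONE)

/* Step 2 (assumed approach): bring the next IPL up in restricted state. */
CHGIPLA STRRSTD(*YES)
```

QSTRUPPGM stays at *NONE until someone changes it back, so restoring the recorded value belongs on the go-live checklist.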

We then tested and verified that the new partition worked. With the system ready to go, we reconfigured the production box as the target CBU system by following the switch instructions in our CBU run book.

Switching the CBU Back to the New System

By Sunday afternoon, we had the new production partition running as our CBU system. However, before we could switch production processing over to the new partition, we had to wait until the new production box finished synchronizing its data with the real CBU, which was functioning as our production partition.

Three hours later, we started investigating why the new machine wasn't finished synchronizing with the CBU. The delay was caused by running our regular Sunday morning maintenance jobs on the CBU, a process that reorganizes several large files. One job reorganized the largest file on our system, a 160 GB sales history file with 141 million records and 30 million deleted records. And it was taking forever to synchronize this file between machines.

We spent several hours on the phone with technical support. At midnight, as Sunday turned into Monday, we decided to delay bringing the new partition up for production processing that weekend. Instead, we would fix the replication issue to allow the production box to synchronize its data with the CBU and then switch processing over to the new machine the next weekend.

Running Live on the CBU For a Week

Come Monday morning, we were still live on our CBU system, and the new production partition was functioning as the target CBU system. We were running our systems as if it were an actual disaster. This produced a few hiccups that we dealt with during the week, including:

  • Problems with running our check printing software. We discovered this problem over the weekend, but the vendor didn't have weekend support hours, so it couldn't be resolved until Monday.
  • Problems with printing critical documents such as invoices, which had been generated on Friday night and were still on our old production machine. We saved those documents off the replaced box and restored them to the CBU for printing.
  • Our automated job scheduling software stopped working on Tuesday because the vendor had only given us a three-day key to run the software on the CBU during the weekend. We obtained a new extended key so that we could keep running our scheduled batch jobs.
  • Minor configuration issues that slowed down processing until we fixed them or created workarounds.

Overall, running production on the CBU for a week went fairly smoothly. It turned out to be an unexpected disaster recovery drill that we wound up passing.

Switching Over to the New Production Partition. . . Finally

A week after starting the migration, we were ready to switch processing back from the CBU to the new partition. Before doing that, we held the weekend reorganization jobs so that the replication software wouldn't have to struggle to keep current again before switching back.
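How jobs are held depends on the scheduler in use. We run a vendor scheduling product, which has its own commands, so the following is only a minimal sketch that assumes the reorganization job is an entry in the native IBM i job scheduler. The job name RGZSLSHST is hypothetical.

```
/* RGZSLSHST is a hypothetical job schedule entry name. */

/* Review the scheduled entries to find the reorganization jobs. */
WRKJOBSCDE

/* Hold the entry before the switch so it does not run. */
HLDJOBSCDE JOB(RGZSLSHST) ENTRYNBR(*ONLY)

/* Release it once the switch is complete and replication is current. */
RLSJOBSCDE JOB(RGZSLSHST) ENTRYNBR(*ONLY)
```

Holding the entry rather than removing it keeps the schedule definition intact, so nothing has to be rebuilt after the switch.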

Our replication software was working correctly so that the data on the new production partition matched the CBU to within 30 seconds of generation. Since we weren't overwhelming the replication software with reorganizations, we were confident that we could switch production processing over to the new partition without unnecessary delays.

The next Sunday, we began switching production processing from the CBU to the new production partition. But there was one more surprise. Our replication software wouldn't let us reverse the replication flow to make the new production partition the new source machine. After a three-hour call to the replication software vendor, we discovered there was one replication setting that was not set up correctly on the new box. After correcting the mistake, we were finally able to switch and the new production machine started servicing our users.

Lessons Learned

While our new hardware migration certainly had its share of ups and downs, it provided the following lessons that can be applied to other shops.

  1. If you're going to switch processing over to your CBU before a migration, do it before migration weekend so that you don't delay the migration.
  2. Don't plan more than you can handle in a limited time. When migrating multiple partitions to a new machine, do it over two weekends to provide adequate time. The more you schedule for a particular weekend, the more likely you are to encounter delays.
  3. When working an insane number of hours over a 48-hour weekend, schedule regular breaks for the crew. This helps keep them alert, which cuts down on mistakes.
  4. Print all critical documents such as invoices and checks before starting the migration. Don't risk holding up company processing while changing hardware.
  5. Make sure CBU keys and settings are correct. You don't want to be caught off-guard when you have to switch over.
  6. Before switching processing from a production machine to a CBU and vice versa, hold any automated jobs that produce a high number of transactions that will have to be replicated between the machines during switch back.
  7. Before the migration, contact key vendors and make sure you have off-hours contact information. High availability vendors in particular must be notified whenever there is a switch test or a migration.

While these won't be the only issues you can run into, I hope that my experience will help your next hardware migration run more smoothly.


RELATED STORIES

Diary of a Production System Upgrade, Part 1

You Can Re-IPL an AS/400 into Restricted State




Senior Technical Editor: Ted Holt
Technical Editor: Joe Hertvik
Contributing Technical Editors: Erwin Earley, Brian Kelly, Michael Sansoterra
Publisher and Advertising Director: Jenny Thomas
Advertising Sales Representative: Kim Reed
Contact the Editors: To contact anyone on the IT Jungle Team
Go to our contacts page and send us a message.


Copyright © 1996-2010 Guild Companies, Inc. All Rights Reserved.
Guild Companies, Inc., 50 Park Terrace East, Suite 8F, New York, NY 10034
