The X Factor: Why Maintenance Is Worth the Money–Sometimes
September 2, 2008 Timothy Prickett Morgan
There are a lot of checks that IT managers and chief financial officers do not like to write when it comes to the IT budget, and hardware maintenance is one of the checks they increasingly dislike writing. There are good reasons for this–and there are also good reasons for writing that check. Equally important, IT shops have to be smart about what they put on hardware maintenance, and how they deal with break-fix issues in general as they relate to hardware.
But before getting into why hardware maintenance is something of value and therefore worth a piece of the IT budget at all, it probably makes some sense to talk about why people are annoyed about having to pay for maintenance in the first place, which can be anywhere from hundreds of dollars to hundreds of thousands of dollars a year, depending on the server.
One or two decades ago, your server hardware platform provider–we called them systems back then, you will remember–was almost certainly the provider of the operating system and much of the systems software for that platform. You paid for the hardware and the software separately, but a single maintenance bill covered break-fix support for the hardware, onsite replacement and installation of parts (in many but not all cases), and technical support for the systems software on the machine. You weren’t really paying so much for hardware maintenance as system maintenance, and as a percentage of the overall system cost–maybe a few percent per month–it didn’t seem like an economic burden.
Fast forward 10 or 15 years, and the performance of systems has grown by a factor of five to 10, depending on the architecture, and the base price of machines has dropped by about a factor of 10. But the cost of providing onsite technical support has not come down–it has gone up by about the rate of inflation. And now, the hardware is generally sold separately from the software, and both hardware and software have their own maintenance fees and warranties. To make matters worse, a server a decade ago had components that were considerably less reliable–by modern standards–and were much more likely to fail in the field. Most of the vendors who make raw components give the manufacturers warranties that last five years or more, but until recently servers and PCs were being sold with only a one-year warranty, and three-year warranties have only lately become more standard. (Yes, this annoys customers a great deal.) All of these factors have tended to make hardware maintenance seem very expensive compared to the value it delivers.
That doesn’t mean that companies should not spend the dough on hardware maintenance. For one thing, when and if things go wrong within a system, individual components are still expensive to replace–a failed SCSI disk can cost anywhere from $400 to $1,200, depending on the server and the disk speed and disk capacity, for example. It doesn’t take too many failures on a machine for hardware maintenance to make economic sense. Hardware failures are, in general, a lot less frequent than one or two decades ago, to be sure, but vendors still inadvertently use unreliable components from time to time and, as is the case with all hardware (both inside the computer industry and outside of it), components have a nasty habit of figuring out when the vendor’s initial warranty has expired. The reason machines come with warranties at all is the infant mortality problem–most hardware failures take place as a machine is initially being used and a key component that has a defect fails once it comes under the strain of a real workload; as systems age, hardware failure is mainly due to components just plain wearing out.
In any event, if a component fails and the machine is not under a support contract, then you have to fix it yourself. That takes up employee time to identify the failure, order new parts, and install them, on top of the cost of the components themselves. The time to get new parts is also money, as expressed in the downtime of your machine and the cost of that downtime to your business. In an IT environment that is being asked to do more with less every year, no one has time to waste on figuring out if a hard drive is failing and how to replace it in the RAID array. It may be hard to quantify, but there is real value in making that the vendor’s problem.
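For those who want to put rough numbers on that trade-off, here is a minimal back-of-the-envelope sketch in Python. It adds up the three self-repair costs named above (parts, staff time, and downtime) and works out how many failures per year it would take for a maintenance contract to pay for itself. The function names are made up for illustration, and every figure except the disk price (which falls inside the $400 to $1,200 range cited earlier) is a hypothetical placeholder, not data from any vendor; plug in your own.

```python
def cost_per_failure(part_cost, staff_hours, hourly_rate,
                     downtime_hours, downtime_cost_per_hour):
    """Cost of handling one hardware failure yourself:
    the part, the staff time, and the business cost of downtime."""
    return (part_cost
            + staff_hours * hourly_rate
            + downtime_hours * downtime_cost_per_hour)


def break_even_failures(annual_contract_cost, per_failure_cost):
    """Failures per year at which the contract costs the same as self-repair."""
    return annual_contract_cost / per_failure_cost


# All inputs below are illustrative assumptions.
per_failure = cost_per_failure(
    part_cost=800,               # replacement disk, within the $400-$1,200 range
    staff_hours=4,               # diagnose, order, install (assumed)
    hourly_rate=50,              # loaded staff cost per hour (assumed)
    downtime_hours=8,            # waiting for the part to arrive (assumed)
    downtime_cost_per_hour=250,  # cost of downtime to the business (assumed)
)
threshold = break_even_failures(annual_contract_cost=1500,
                                per_failure_cost=per_failure)

print(f"Self-repair cost per failure: ${per_failure:,}")
print(f"Contract breaks even at {threshold} failures per year")
```

With these placeholder inputs, a single failure every two years is enough to cover the contract, and the downtime term dominates the total. That is why the arithmetic tends to favor maintenance on mission-critical machines and not on boxes whose downtime costs little.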
The other reason companies need to pay for hardware maintenance, for their mission-critical machines at the very least, is that this is the only way to get access to vendor tech support to figure out what is going wrong when something is in the process of going wrong. If you don’t have a maintenance contract, you can’t get help once the initial warranty on the hardware has expired. You might be able to troll support forums and knowledge bases, as techies who self-support open source software often do these days, but that is a nerve-wracking experience.
So when something is going wrong–and as iron ages, things will definitely go wrong–the hardware maintenance contract allows the hardware support people to fight it out with the support people you have for your systems and applications software and figure out who is at fault. You get to be righteous and tell everyone to get the problem fixed, and to get it fixed now. And you get to tell your boss that the industry experts are on the problem, fixing it as fast as they can. Your, er, bases are covered.
Which brings me to the next point. Hardware and software maintenance support is about being in the loop. For instance, when a RAID controller fails at one company, then 10 more, then dozens more, the vendor selling the server that has that RAID controller will work like crazy to identify not only the problem and a fix, but also the companies who have this controller in their systems. (This happened in the early 2000s with IBM’s iSeries line, remember? And with the single-level storage architecture of the machine, which treats memory and disk as one giant virtual memory space, having RAID sets crash meant killing the computers, not just losing some storage.) Having discovered the problem, the IT vendor tells customers who might experience the problem, but probably have not done so yet, that something bad could happen and they can deal with the situation before it gets ugly. If you don’t have a maintenance contract, you get no warning because as far as the vendor is concerned you have taken it upon yourself to maintain your own iron.
And then you have to explain this to your boss, whether you are the system administrator or the chief information officer. It is probably easier to argue to put key systems and their hardware and software components on paid maintenance.
Now, arguing for maintenance does not mean everything should be on maintenance. For redundant systems, like Web servers or application servers, this doesn’t make a lot of sense. If you have spare capacity, you are in effect already paying maintenance. And on aging iron that is already paid for and is still technically viable even if it is economically dead, you have the option of buying secondhand systems and components and keeping them onsite, ready as spare parts or redundant systems in the event of a hardware failure. This is often a much more economically prudent option, particularly on vintage equipment. Here at IT Jungle, we are on a tight budget and we have hot and cold spares for systems, as well as redundant spares sitting in a closet if key systems fail. I suspect we are not the only ones who bought some spares of our vintage systems–which work just fine, thank you very much–on eBay.