How Does 800,000 CPWs in a 2U Server Grab You?
November 30, 2009 Timothy Prickett Morgan
At the SC09 supercomputing trade show in Portland, Oregon, last week, I got a tour of the very first Power7-based server that IBM is talking publicly about. It is called the Power7 IH supercomputer node, and it is one of the 2,048 nodes in the “Blue Waters” massively parallel machine that Big Blue is building for the University of Illinois.
There is not much of an i5/OS angle to this machine, I will admit, but some of the nifty chip and networking technologies that IBM is deploying in Blue Waters can, and probably will, be deployed in commercial servers running AIX, Linux, and i operating systems.
As I explained back in September, IBM is taking four eight-core Power7 chips and integrating them on a single piece of ceramic to create a monster chip with 32 cores and 1 teraflop of aggregate number-crunching power running at 4 GHz. In the picture to the right, the chip package on the left is this multichip module. This MCM package has 512 GB/sec of aggregate memory bandwidth and 192 GB/sec of I/O bandwidth. Each MCM has 16 DDR3 memory modules allocated to it, according to Alan Brenner, a senior technical staff member of the server and network architecture team within IBM’s Systems and Technology Group, who walked me through this monster node. With 1 TB of main memory spread across the eight MCMs in a drawer, that works out to 4 GB per core. IBM is able to cram a 16-socket drawer (eight processor MCMs, eight hub/switches, and 1 TB of main memory) into a space that is 2U high (3.5 inches, just like most workhorse servers), but which is 39 inches wide (not 19 inches like normal servers) and 72 inches deep.
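For readers who want to check the per-core memory figure, here is a back-of-the-envelope sketch using only the numbers quoted above. It assumes that 1 TB means 1,024 GB and that memory is spread evenly across MCMs and DIMM slots; the per-DIMM capacity it prints is my own inference from those assumptions, not something IBM stated.

```python
# Back-of-the-envelope check of the drawer figures quoted in the article.
# Assumptions (not from IBM): 1 TB = 1,024 GB, and memory is spread
# evenly across the MCMs and their DIMM slots.
CHIPS_PER_MCM = 4          # four Power7 chips per multichip module
CORES_PER_CHIP = 8         # eight-core Power7
MCMS_PER_DRAWER = 8        # eight processor MCMs per drawer
DIMMS_PER_MCM = 16         # 16 DDR3 memory modules per MCM
DRAWER_MEMORY_GB = 1024    # 1 TB of main memory per drawer

cores_per_mcm = CHIPS_PER_MCM * CORES_PER_CHIP
cores_per_drawer = cores_per_mcm * MCMS_PER_DRAWER
memory_per_mcm_gb = DRAWER_MEMORY_GB / MCMS_PER_DRAWER
memory_per_core_gb = memory_per_mcm_gb / cores_per_mcm
implied_dimm_size_gb = memory_per_mcm_gb / DIMMS_PER_MCM  # inferred, not quoted

print(f"Cores per MCM:      {cores_per_mcm}")
print(f"Cores per drawer:   {cores_per_drawer}")
print(f"Memory per MCM:     {memory_per_mcm_gb:.0f} GB")
print(f"Memory per core:    {memory_per_core_gb:.0f} GB")
print(f"Implied DIMM size:  {implied_dimm_size_gb:.0f} GB")
```

The 4 GB per core falls straight out of the drawer totals, which suggests the figures Brenner gave are at least internally consistent.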
The other neat part of the Blue Waters machine is a hub/switch module, which is next to the Power7 MCM in the picture above. This hub/switch fits into the same-sized socket as the Power7 IH processor MCM (they both have 5,336 pins). The module takes the best ideas from IBM’s “Federation” supercomputing switches (which were used in the ASCI Blue and ASCI Purple supers) and InfiniBand, and mixes them up to create a new switch interconnect that can link 2,048 drawers together into a reasonably flat and low-latency network. The hub/switch provides connectivity between the eight Power7 nodes on the board through the motherboard underneath it all, and provides optical links out to adjacent drawers, which are clustered into four-drawer units called supernodes. Blue Waters is 512 of these supernodes linked together.
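The supernode count can be sanity-checked the same way. This is a minimal sketch that simply multiplies out the figures in this article; the machine-wide MCM and core totals it prints are derived from those figures and are not counts that IBM or the University of Illinois quoted to me.

```python
# Multiply out the Blue Waters topology figures given in the article.
# The machine-wide totals below are derived, not quoted by IBM.
DRAWERS = 2048               # Power7 IH drawers in the machine
DRAWERS_PER_SUPERNODE = 4    # four-drawer supernodes
MCMS_PER_DRAWER = 8          # processor MCMs per drawer
CORES_PER_MCM = 32           # four eight-core chips per MCM

# The drawers should divide evenly into supernodes.
assert DRAWERS % DRAWERS_PER_SUPERNODE == 0

supernodes = DRAWERS // DRAWERS_PER_SUPERNODE
total_mcms = DRAWERS * MCMS_PER_DRAWER
total_cores = total_mcms * CORES_PER_MCM

print(f"Supernodes:  {supernodes}")
print(f"Total MCMs:  {total_mcms:,}")
print(f"Total cores: {total_cores:,}")
```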
Take a look at the Power7 IH drawer, which weighs just over 300 pounds fully loaded and has 16 PCI-Express 2.0 x16 slots off to the left. Power supplies and water-cooling links are on the right, with memory banks wrapping around the processors, and the hub/switches sandwiched in between the left memory bank and the I/O slots.
While I like processing as much as anybody, it was the hub/switch that made me stop and think. Brenner explained that the hub/switch module delivered a total of 1,128 GB/sec of aggregate bandwidth. The host connection between the Power7 MCMs inside a single drawer is rated at 192 GB/sec, with another 336 GB/sec of connectivity to the seven other local nodes on the drawer. There is also 240 GB/sec of bandwidth between the nodes in a four-drawer supernode, and 320 GB/sec dedicated to linking nodes to remote nodes in the entire Blue Waters machine. And because there needs to be a way to talk to disks and such, IBM tossed in another 40 GB/sec of general purpose I/O bandwidth.
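The five bandwidth figures Brenner gave should add up to the 1,128 GB/sec total, and a few lines of Python confirm that they do. The dictionary labels are my own shorthand, and the per-link figure assumes the 336 GB/sec is split evenly across the seven local links, which is my assumption rather than anything IBM specified.

```python
# Sum the hub/switch bandwidth components quoted in the article (GB/sec).
# Labels are shorthand; the even per-link split is an assumption.
hub_bandwidth_gb_s = {
    "host connection to Power7 MCM": 192,
    "links to seven other local nodes": 336,
    "links within four-drawer supernode": 240,
    "links to remote nodes": 320,
    "general purpose I/O": 40,
}

total_gb_s = sum(hub_bandwidth_gb_s.values())
per_local_link_gb_s = hub_bandwidth_gb_s["links to seven other local nodes"] / 7

for name, gb_s in hub_bandwidth_gb_s.items():
    share = gb_s / total_gb_s
    print(f"{name:36s} {gb_s:4d} GB/sec  {share:6.1%}")

print(f"{'total':36s} {total_gb_s:4d} GB/sec")
print(f"Per local link (assuming an even split): {per_local_link_gb_s:.0f} GB/sec")
```

Seen this way, nearly 80 percent of the module's bandwidth is devoted to talking to other nodes rather than to its own host processor or to I/O.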
Imagine taking a single drawer of this box and using the hub/switch as the backbone of an SMP/NUMA server. You’d have 1 TB of memory and around 800,000 CPWs of aggregate transactional processing capacity. Cut the memory in half and slap in some flash storage to interface with external disk storage that links in through the substantial bandwidth leading out of the drawer, and you could create an incredibly powerful transaction processing engine.
I can imagine one of those Power7 MCMs replacing an entire Power 570 box using Power6+ chips. It might fit into a 1U server with lots of room to spare and deliver around 100,000 CPWs of AS/400-style power.
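The two CPW guesses line up with each other, as this rough sketch shows. Both inputs are my own ballpark estimates from this article, not IBM benchmark results, and the per-core number assumes throughput scales linearly across cores, which real SMP/NUMA machines rarely manage.

```python
# Relate the drawer-level and MCM-level CPW estimates from the article.
# These are rough estimates, not IBM-published ratings, and the
# per-core figure assumes perfectly linear scaling.
DRAWER_CPW_ESTIMATE = 800_000   # "around 800,000 CPWs" per drawer
MCMS_PER_DRAWER = 8
CORES_PER_MCM = 32

cpw_per_mcm = DRAWER_CPW_ESTIMATE / MCMS_PER_DRAWER
cpw_per_core = cpw_per_mcm / CORES_PER_MCM

print(f"Estimated CPW per MCM:  {cpw_per_mcm:,.0f}")
print(f"Estimated CPW per core: {cpw_per_core:,.0f}")
```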