PA Semi Samples Homegrown Dual-Core Power Chip
Published: February 6, 2007
by Timothy Prickett Morgan
Startup clone Power chip maker PA Semi announced yesterday that it has completed the design of its first dual-core Power variant, the PWRficient PA6T-1628M, and that its fabrication partner is now sampling chips to potential customers. PA Semi has also put together a reference board so potential customers can put the chip through its paces with a number of different operating environments, including Linux, VxWorks, and Neutrino.
PA Semi is a 3-year-old chip designer that burst onto the scene in November 2005, when it revealed that it was working on a line of chips that were compatible with IBM's and Motorola's 64-bit PowerPC and Power processors, but which would consume a lot less energy and have a lot of the features normally put on motherboards pulled down into the chip. The company was founded by Dan Dobberpuhl, the lead chip designer on Digital Equipment's Alpha line of RISC processors and the StrongARM line of power-efficient processors, and Jim Keller, also a DEC Alpha chip designer and the co-author of the "K8" or Opteron 64-bit architecture from Advanced Micro Devices. Dobberpuhl is PA Semi's president and chief executive officer, and until last September, Keller was vice president of engineering; he has left to pursue other interests. PA Semi hired Pete Bannon, who worked on the Alpha EV5 and EV7 chips then moved to Intel to work on the design for the "Tukwila" four-core Itaniums that are due in 2008, to be vice president of architecture. Venture capitalists have pumped $86 million into PA Semi to date.
The PWRficient family of chips is exactly the kind of chip that might have kept Apple Computer from jumping to Intel's Core 2 processors for its laptops, desktops, and servers. But unfortunately, IBM's PowerPC designs did not push power efficiency the way that PA Semi's did, and PA Semi's first chip, the dual-core PA6T-1628M, will not be shipping in volume until the fourth quarter of 2007. It will take PA Semi even more time to get quad-core and low-cost single-core PWRficient chips into the field. Apple could not wait that long to get more efficient chips into its machines.
PA Semi is a member of the Power.org development consortium, which was founded by IBM and now includes Freescale Semiconductor, the Power chip arm that was spun out of Motorola. The PWRficient chips are binary compatible with the current Power instruction set architecture version 2.03, and also include some features of the version 2.04 and 2.05 specifications, including future memory management and hardware-assisted virtualization that will debut in IBM's own Power6 processors later this year.
Given all of this, there is a slight chance that Apple could adopt PA Semi's devices at some point in the future, since Mac OS X will run on them provided that PA Semi gives Apple a tweaked FreeBSD kernel. Given the performance and power efficiency that PA Semi is talking about, Apple and any vendor of Linux machines would do well to take a look--and IBM should probably think about using PWRficient chips in AIX and Linux machines. The PWRficient chip architecture does not include the tagged memory feature used in the PowerPC AS superset of the Power instruction set, which means the i5/OS operating system that runs on IBM's System i5 midrange servers cannot run on the chip.
"Eighteen months ago, we made a PowerPoint announcement," says Bannon. "Today, we have samples and we will meet targets for production silicon."
While the first PWRficient chip is a dual-core processor, the family will eventually scale from one to eight cores in a single chip. The PWRficient design scales from one to four integrated DDR2 main memory controllers on chip, as well as 64 KB data and instruction caches on each core and on-chip, shared L2 caches that are configured as arrays of up to 8 MB. The chips will also have 128-bit VMX vector coprocessors (just like the Power6 will) and one floating point unit per core. The PWRficient chip has an intelligent I/O bridge called Envoi and an on-chip interconnection fabric (sometimes called a crossbar) called Conexium that links the cores, L2 cache, main memory, and integrated I/O bridges together. The Conexium crossbar can be extended to support up to eight cores, four 2 MB L2 caches, and four DDR2 main memory controllers. To create a four-core chip, two dual-core chips are put on a single piece of silicon and their Conexium fabrics link together to create what is essentially a baby SMP server. The PWRficient chips support IBM's Virtualization Engine hypervisor with the same electronics that will be in the Power6 chips, and also include accelerators for RAID, iSCSI, PCI Express I/O, and TCP/IP Ethernet workloads. Like other PowerPC processors, the PWRficient chips support both current 64-bit and legacy 32-bit mode operations. The initial dual-core PA6T-1628M processor has around 200 million transistors.
The PWRficient chips are implemented in a 65 nanometer process (with copper, low-K, SOI, and other technologies) and are manufactured by an unknown chip foundry in the United States, which PA Semi will not divulge. Since Freescale is a competitor, it is probably not Freescale. It could be IBM (and probably is, since IBM wants to ramp up production in its East Fishkill, New York, fab as much as possible), and it could be Texas Instruments. Intel is almost certainly not the foundry for PA Semi--but that would be funny, and given the founders' strong ties to Intel, it is even possible. The PA-6T chip uses its own socket design--PA Semi had looked at using an Opteron socket, but decided against it--that has 1,156 pins.
The dual-core PA-6T chip supports four logical partitions per core, and has features that will allow the chip to be hard partitioned into two separate processors, complete with their own I/O and auxiliary co-processors. That support could allow integrated hot-standby clustering for embedded workloads, which in turn would allow operating systems and applications running on the chips to be upgraded on one side, then brought back online and resynchronized so the other half can be upgraded. Such a rolling upgrade of software would keep the system online, which is important in many embedded applications.
According to initial tests performed on the PA6T-1628M, the chip has some significant thermal advantages over the current generations of 64-bit, dual-core chips from IBM, AMD, and Intel using 65 nanometer technologies. Depending on the workload, the dual-core PA6T chip itself runs at between 5 watts and 13 watts at a 2 GHz clock speed. This is four to five times more efficient than other PowerPC or X64 processors running at the same 2 GHz clock speed, according to Bannon. When you add in the heat from the I/O chips that other PowerPC or X64 processors need, the differences are substantial. The dual-core PA6T consumes about 25 watts peak, including I/O, which is a lot lower than the dual-core PowerPC 970MP from IBM at 100 watts. AMD's dual-core Athlon X2, rated at 68 watts, gets close to 100 watts once you add in I/O chips, and even the Core 2 Duo from Intel, which has very good thermals at 35 watts, burns 80 watts when you add in the I/O chips. In embedded markets, a dual-core PA6T chip scaled down to 1.5 GHz would burn 13.7 watts, compared to around 68 watts for Freescale's future dual-core 8641D processor (including heat from 10 Gigabit Ethernet ports).
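To put the quoted wattages side by side, here is a minimal Python sketch that simply divides each rival's figure by the PA6T's. The numbers are the ones cited above; the Athlon X2's "close to 100 watts" is rounded to 100, the 970MP figure is taken as quoted without assuming whether it includes I/O, and the variable and function names are illustrative only.

```python
# Peak power in watts, as quoted in the article.
PA6T_PEAK_WATTS = 25.0  # dual-core PA6T at 2 GHz, including I/O

RIVALS_WATTS = {
    "IBM PowerPC 970MP": 100.0,                # as quoted
    "AMD Athlon X2 (with I/O chips)": 100.0,   # "close to 100", rounded here
    "Intel Core 2 Duo (with I/O chips)": 80.0,
}


def power_ratio(rival_watts, baseline_watts):
    """How many times more power the rival burns than the baseline."""
    return rival_watts / baseline_watts


ratios = {
    name: power_ratio(watts, PA6T_PEAK_WATTS)
    for name, watts in RIVALS_WATTS.items()
}

# Embedded comparison: Freescale 8641D (about 68 W) vs. a 1.5 GHz PA6T (13.7 W).
embedded_ratio = power_ratio(68.0, 13.7)

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x the PA6T's peak power")
print(f"Freescale 8641D vs. 1.5 GHz PA6T: {embedded_ratio:.1f}x")
```

On these figures the system-level gap is roughly three to four times on the desktop parts and close to five times in the embedded comparison, which is in the same ballpark as the "four to five times" claim Bannon makes for the chips alone.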
How did PA Semi get such good numbers? By gating the clocks in over 25,000 features on the chip, compared to hundreds in other chip designs. In simple English, the PA Semi design allows the clock signal to be shut off at a very fine grain for registers and other features that are not in use, which means overall power consumption is pulled way down. Even running full out on a workload, CPU cores do not stress all components of the chip.
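Why gating idle circuits saves so much can be seen in a toy model built on the standard CMOS dynamic power estimate, P = activity x C x V^2 x f. The Python sketch below is not PA Semi's method or data: the per-block capacitance, the 1.0 volt supply, the 25 percent busy fraction, and the function names are all hypothetical values chosen for illustration. Only the 25,000 gated features and the 2 GHz clock come from the article.

```python
def dynamic_power(capacitance_f, voltage_v, freq_hz, activity):
    """Standard CMOS dynamic power estimate: P = activity * C * V^2 * f."""
    return activity * capacitance_f * voltage_v ** 2 * freq_hz


def chip_clock_power(num_blocks, busy_fraction, gated,
                     c_per_block=1e-12, v=1.0, f=2e9):
    """Clock power summed over equal-sized blocks (all values hypothetical).

    Without gating, every block's clock toggles every cycle (activity 1.0).
    With gating, only the busy fraction of blocks sees a toggling clock.
    """
    activity = busy_fraction if gated else 1.0
    return num_blocks * dynamic_power(c_per_block, v, f, activity)


# 25,000 gated features at 2 GHz, with an assumed 25% of them busy at any time.
ungated = chip_clock_power(25_000, busy_fraction=0.25, gated=False)
gated = chip_clock_power(25_000, busy_fraction=0.25, gated=True)
savings = 1 - gated / ungated

print(f"ungated: {ungated:.1f} W, gated: {gated:.1f} W, saved: {savings:.0%}")
```

In this simplified model the savings equal the idle fraction of the chip, so the more finely the design can identify and switch off idle blocks, the closer the power draw tracks the work actually being done.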
While this is simple to say, it is hard to do, according to Bannon, which is why no one else has done it--yet. "Clock gating is the starting point of our design, and it is at the core of how the chip is built," says Bannon. "We designed the chip around this idea, and it is really hard to do."
Designing a chip, even one based on an already-developed instruction set, is difficult, but getting customers for it is perhaps the hardest part. PA Semi has seen some traction on this front. This month, it will ship a reference board code-named "Electra" that includes one dual-core PA-6T chip on a Micro ATX motherboard with three PCI-Express slots, one PCI-X slot, and two ports linking into the on-chip Gigabit Ethernet ports. This board will cost $8,500.
Once you have hardware, you need software, so PA Semi has tweaked the Linux 2.6.17 and 2.6.18 kernels so they can support the PA6T-1628M processor and its on-chip features. The company has already loaded the Ubuntu distribution on top of these tweaked kernels on the Electra reference boards. The Neutrino real-time operating system from QNX Software Systems and its related Momentics Development Suite have also been ported to the PA-6T platform, and so have the VxWorks 6.2 and 6.3 real-time operating systems from Wind River and its Workbench development tools.
Bannon says that PA Semi has had 100 customer engagements since launching the chip 18 months ago, and that it has 10 customers who are going to use the board, including a lot of embedded systems designers who work on defense contracts, telecom systems, storage products, embedded controllers, and, yes, game consoles.
Looking ahead, PA Semi will get a single-core PA-6T processor out the door with samples by the end of 2007 for production in 2008. This chip will include on-chip support for SATA disk drives and USB ports, and will run in the range of 5 watts to 10 watts. The quad-core PA-6T chip will sample in 2008 and will be pin compatible with the dual-core PA-6T chip. Further out, PA Semi will bring out an even less expensive single-core variant of its PA-6T. For any of these chips, to get a working system, all you need to do is add a flash drive, some DDR2 main memory, a PA-6T chip, and a graphics chip and it is ready to go as a complete system.
Bannon thinks that taking on the embedded market is the best place to start, but is hopeful that Linux support will encourage server and workstation designers to opt for the chip. "I think it is going to take some time to get there, though," he says. Given the power advantages of the PWRficient chips, a Linux or Mac OS laptop might be a good fit, and small form factor blade servers for compute farms might make good sense, too.