But Wait, There's More
Some More Thoughts on the Future Power6 Chips from IBM
Last week, I told you that I had heard a rumor that Power5+ processors were imminent in September or October and that IBM had achieved first silicon on the future Power6 processors about a month ago. I went on to speculate a heck of a lot about what the Power6 chips might look like. (You can read that original story by clicking here.)
The more I think about Power6, the more I think that the design will incorporate not just a new cache hierarchy, as I described, but also dedicated processors for handling special functions, and that it will not include more than two Power cores. I am not entirely sure that i5/OS has the software features activated yet, but the Power5 processors incorporate a feature called Fast Path that allows TCP/IP networking, as well as the Message Passing Interface (MPI) protocol used in supercomputing clusters, to be offloaded from the Power cores to specialized circuits. AIX and Linux support these features, which allow those boxes to do more work than they might otherwise do. IBM could add other circuits to accelerate common routines that bog down the operating system--which is something it can do because it owns OS/400 and AIX, and because Linux is open source. Mainframes have had hardware-assisted database processing for years, and it seems like a good idea, particularly for the iSeries. Accelerating Java virtual machine functions is probably also a good idea.
A few years back, there was some talk about IBM adding vector processors to augment the already substantial floating point capability of the Power chips. (Each Power chip has two cores, each with two floating point units that are each capable of two instructions per clock cycle. That's eight floating point ops per clock per socket, peak performance, which ain't all that shabby.) And as the "Cell" PowerPC hybrid processor that IBM has created in conjunction with Sony and Toshiba demonstrates, such an approach is not only possible, but has already been done. The AS/400 was the first asymmetric multiprocessing system in the world, using what were essentially modified Motorola 68K processors as its main processors and distributing I/O and other functions to auxiliary processors, called I/O processors, or IOPs. Such an approach with Power6 would do inside the chip package what the AS/400 did at the system bus level. Same idea, slightly different implementation.
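The per-socket peak math in the parenthetical above is simple enough to sketch out. Note that the 1.9 GHz clock in this example is an assumed figure for illustration, not one from this story:

```python
# Peak floating point throughput per Power5 socket, per the figures above.
cores_per_chip = 2
fpus_per_core = 2
ops_per_fpu_per_clock = 2

flops_per_clock = cores_per_chip * fpus_per_core * ops_per_fpu_per_clock
print(flops_per_clock)  # 8 floating point ops per clock per socket

# At a hypothetical 1.9 GHz clock (an assumption, not from the article),
# that works out to:
clock_hz = 1.9e9
peak_gflops = flops_per_clock * clock_hz / 1e9
print(peak_gflops)  # 15.2 GFLOPS peak
```

That is peak, of course; sustained performance on real code is another matter entirely.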
There is also a slight--ever so slight--possibility that IBM will goose simultaneous multithreading (SMT) with the Power6 chips, perhaps going from two virtual threads per core to four. IBM's implementation of SMT is delivering 35 to 40 percent more performance than non-SMT cores, which is stunning when you consider that Intel could only get at best about 20 percent improvement with its SMT implementation, which it calls HyperThreading. Intel's HyperThreading was a boon for marketing and for certain workloads, but not so much for performance on applications with limited threads, and that is why Intel's future chip architecture will not have HyperThreading. However, Sun Microsystems is going the other way with its "Niagara" processors, putting four threads on each of eight cores to create a 32-threaded chip. These Niagara cores--as well as the future "Rock" processor cores--are based on a stripped-down UltraSparc-II core, not the UltraSparc-III or UltraSparc-IV. As Thoreau once said, "Simplify." I think there is a very good chance that Power6 may have, and Power7 will have, many threads per core--maybe four--and maybe two or four cores. By late 2007 or early 2008, software, including operating systems, might be able to take advantage of more threads, whether they come virtually through SMT or through increasing core counts. If IBM can virtualize the threads in the Power6 and Power7 cores again as it did with the initial Power5s, doubling the thread count to four virtual threads per core, it could get one SMT core to do the work of two real, non-SMT cores. That would be a neat trick, indeed.
No matter what IBM does with Power6 and Power7, let me give you some good advice: thread your applications like crazy. Because clock speeds will go down long before they will go much higher than 4 GHz. The thermal ceiling is creating a clock ceiling for processors, and we had all better get used to it.
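As a toy illustration of that advice, here is a minimal partition-and-merge sketch using Python's standard thread pool. The workload and chunk count here are made up, and in CPython the global interpreter lock means pure-Python arithmetic won't actually run in parallel, but the structure is what a threaded application looks like:

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    # Each worker thread reduces its own slice of the data.
    return sum(chunk)

data = list(range(1_000_000))
chunks = [data[i::4] for i in range(4)]  # four interleaved slices, one per thread

with ThreadPoolExecutor(max_workers=4) as pool:
    total = sum(pool.map(partial_sum, chunks))

print(total == sum(data))  # True: the threaded result matches the serial one
```

The point is that work split into independent chunks like this can soak up however many hardware threads a future chip offers; a single serial loop cannot.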
pSeries Customers Are Adopting Logical Partitions in Droves
Sources at IBM say that the advent of more sophisticated, dynamic logical partitioning, which became available last summer on its Unix servers with the rollout of the Power5-based p5 servers and the AIX 5L V5.3 operating system, is fostering broad adoption of the partitioning technology among its Unix customers.
To give some sense of the adoption of the logical partitioning technology, called the Virtualization Engine on both the p5 and the related i5 platform (which runs the proprietary OS/400 operating system), IBM has divulged some market statistics from its North American sales of the p5 boxes. Virtualization Engine microcode allows up to 10 logical partitions to be created on a single processor, and allows a logical partition to span from one processor to the maximum number of processors in a particular server. On entry-level p5 machines (by which IBM means the p5 510, 520, and 550), about 20 percent of the machines shipped are going out of IBM's factories in Rochester, Minnesota, and Dublin, Ireland, with logical partitions set up. On the midrange p5 570 server, which scales from two to 16 processors in a single system image, the adoption rate for partitioning is running at about 45 percent, and on the bigger p5 590 and 595 boxes, the penetration is 100 percent. Customers who do not have Virtualization Engine activated at the factory can activate it in their own shops; IBM doesn't have to preconfigure it. There are nominal licensing fees for the partitioning capability, which vary depending on the system.
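The arithmetic behind that partitioning rule is straightforward. This sketch ignores any system-wide partition ceiling the firmware may impose, which is a detail the sales figures above do not address:

```python
def max_lpars(processors, lpars_per_processor=10):
    # Virtualization Engine allows up to 10 logical partitions per processor.
    return processors * lpars_per_processor

print(max_lpars(2))   # 20: a two-way entry machine
print(max_lpars(16))  # 160: a fully loaded p5 570, by this rule alone
```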
IBM Consolidates Grand Slam Systems Down to pSeries Boxes
IBM has been a sponsor of the Grand Slam tennis events for the past 14 years, and this year, once again, it is rejiggering the systems in use by Wimbledon, the US Open, the French Open, and the Australian Open. While this week the big news was the upset of number four seed Andy Roddick (a former US Open champion) by number 58 seed Gilles Muller in straight sets (7-6, 7-6, 7-6, with three tiebreakers), IBM is bragging about the big iron in small packages that it is bringing to bear to support the US Open's Web sites. I was one of the lucky few who watched the Roddick-Muller match as part of a tour of the IBM systems behind the US Tennis Association's facilities in Flushing Meadows, Queens.
IBM sponsors the Grand Slam events, the Masters golf tournament, and the Tony Awards for a number of reasons, just as it used to do with the Olympics before the Olympics Committee started making demands that Big Blue felt were not justified by the benefits of being the technology supplier to the summer and winter games every four years. Beyond lots of direct and indirect advertising--the IBM serve speedometer is in just about every shot, and prominent IBM ads are all over the US Open's Web sites and TV coverage--the sponsorship gives IBM a chance to use its latest technological wonders in a production environment that really stresses the systems. In 2003, the Grand Slam cycle showcased IBM's Tivoli Intelligent Orchestrator, which IBM used to dynamically provision Linux server capacity on the various xSeries systems that drive the scoring systems and Web sites. In 2004, IBM consolidated the scoring systems and Web content management systems that had been running on various Linux servers onto a loosely clustered pair of i5 520 servers with three Linux partitions each. These servers and their associated storage and networking gear are shipped around the world in metal racks on wheels and shared by the Grand Slam events. Because of the bandwidth required to link the scoring and content management systems to local users at the tennis events, IBM cannot deliver these functions remotely, although it has thought about how it might do so. But the whole server front end of the US Open sites can be run elsewhere, and this year, it is.
IBM has carved out three areas in three data centers in the United States to act as the Web farms for the Grand Slams; they are located in the East, West, and Central parts of the country, and IBM won't be more specific about where they are. In each of those centers, there are clusters of Linux and AIX servers running the WebSphere middleware that drives the US Open Web sites, which had 2.8 million visitors (spending an average of 90 minutes on the site) and 15.4 million page views in the two weeks of the US Open last year. Traffic is up maybe 50 percent so far this year, and IBM's internal budget for the event sure isn't. So server consolidation is the theme this year.
You can't take server consolidation too far, however. Although this may seem illogical at first, having three data centers is better than two, because you can split the aggregate workload between two centers and keep the third as a hot spare for failover in the event one fails; with only two data centers, you would need two fully configured setups that act as hot spares for each other. Last year, these data centers each had eight xSeries Web servers, four pSeries Web logging servers, and two pSeries application servers. This year, the workloads of one of these centers were moved to a four-core p5 550 and an eight-core p5 570, which each have Linux and AIX partitions dynamically allocated as workloads change. The new workload at the US Open is called Point Tracker, an animated presentation of the tennis ball's movement as each game is played, so people who cannot watch a match on TV can see what the ball is doing on their PC screens and what the shot-by-shot game looks like. It is akin to an old radio broadcast, with text and pretty pictures instead of a voice. Whether or not fans think this is great (and most tennis fans seem to so far), one thing is for sure: Point Tracker burns up a lot of computing power. So do the polling, player stats, and video production applications that run on these machines. The i5 520s store the database behind Point Tracker, which is built in real time using cameras mounted in the two tennis arenas that literally track the ball, while the p5 machines display the graphical representation of the ball's course throughout a game. Any excess capacity in these systems is being pumped into the World Community Grid, which IBM founded a few years ago to give free capacity on its systems to medical research organizations fighting cancer, AIDS, and other diseases.
"Cell" PowerPC Partners Try to Round Up Support for the Chip
In an effort to drum up licensees for the "Cell" derivative of the PowerPC architecture, IBM, Sony, and Toshiba, the three partners that co-developed the Cell chip, have released a flurry of technical specs explaining in detail the chip's architecture, its application binary interfaces, and how its C++ and Assembler compilers work. (If you want to read stuff that is way over my head, take a look at the documents at http://cell.scei.co.jp/e_download.html.) The official name of the architecture is the Cell Broadband Engine Architecture (CBEA), and the three are pitching the Cell chip for game consoles, multimedia devices, and perhaps workstations and clusters for supercomputing jobs.
The 64-bit PowerPC core at the heart of Cell is somewhat simplified in that, unlike the past many generations of PowerPC and Power processors from IBM, it does not support out-of-order instruction execution. It has 32 KB each of instruction and data cache, a 512 KB L2 cache, and a Rambus XDR memory controller and I/O interface, all implemented on the chip. The chip also includes eight synergistic processor units (SPUs), which are in effect vector math co-processors that boost the performance of streaming media and video calculations; they can do integer and floating point math, by the way. The Power core and the SPUs are connected to each other through a high-bandwidth element interconnect bus (EIB). Multiple Cell chips can be glued together to create compute clusters.
A single Cell chip has 235 million transistors, but thanks to the 90 nanometer SOI process IBM is using to make it, it only cranks out about 80 watts as it delivers 256 gigaflops of number-crunching power running at about 4 GHz and 1.2 volts. Cutting the clock speed back to 3 GHz could drop the voltage and the heat way back--maybe as low as 50 to 60 watts running at 0.9 volts--while still delivering 192 gigaflops.
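Those gigaflops figures scale linearly with clock speed. Working backward from the numbers above, each of the eight SPUs would be doing eight floating point ops per clock; that per-SPU figure is an inference from this story's numbers, not a stated spec:

```python
spus = 8
flops_per_spu_per_clock = 8  # inferred: 256 GFLOPS / 4 GHz / 8 SPUs

def peak_gflops(clock_ghz):
    # Peak throughput is just units x ops per clock x clock rate.
    return spus * flops_per_spu_per_clock * clock_ghz

print(peak_gflops(4.0))  # 256.0, matching the 4 GHz figure
print(peak_gflops(3.0))  # 192.0, matching the 3 GHz figure
```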
The question is this: What can you do with a Cell chip? Well, it would make a very good encryption co-processor or a special co-processor for handling any streaming media, that is for sure. There are undoubtedly other possibilities. Like running AIX and OS/400 natively on it and using the vector units to somehow goose database performance. But the chip may not have support for the special memory tags that OS/400 requires, and IBM has not said that it will port AIX to Cell, even though it technically can.
Usage of Perl, Python, and PHP Tools Declines in EMEA, Says Evans Data
In a stunning development that signals either a shift in open source Web programming in Europe or a bad sampling of survey subjects, a new report by Evans Data indicates a dramatic drop-off among European companies in the deployment of PHP, Perl, and Python in their software development projects.
According to the survey, the number of developers using PHP for application development dropped by 25 percent in the past year, and the number of developers who said that they would not use or evaluate PHP in their software projects rose by 40 percent. Perl usage also dropped by 20 percent, and the number of developers who said they would not evaluate or use Perl in future projects also rose by 20 percent compared to this time last year. The Python programming language did not fare any better, with usage down 25 percent in the past 12 months and 17 percent fewer developers saying that they intend to use or evaluate Python in future projects. John Andrews, Evans Data's chief operating officer, attributed the declines to a lack of enterprise-class support from platform providers (server, operating system, and middleware) for the Three Ps of open source. The survey polled 400 developers in EMEA in June.
It could be that what Web developers want is an integrated development environment that weaves together PHP, Perl, and Python in a single tool with enterprise-class support. Any takers to make the Three P Integrated Development Environment? Start with Eclipse and weave it in? Before you jump in, though, consider this: These tools have overlapping functions, are not necessarily easy to use, and perhaps cannot be integrated in an intelligent way. Maybe European developers really want .NET or Java? Or Cold Fusion/Flash/PDF? Or to work in management? Longer vacations than they already have?
FujiFilm Goes WORM with New SuperDLTtape II Cartridge
Longtime computer tape cartridge manufacturer FujiFilm announced a new line of FujiFilm-branded Super DLTtape II media cartridges last week. The media, which is designed for use with Quantum's SDLT 600 tape drive and retails for around $100, offers a native capacity of 300GB (or 600GB with 2:1 compression) and supports drive transfer speeds of up to 36MB per second. When used with Quantum's DLTIce functionality, any new, unused Super DLTtape II cartridge can be turned into a Write Once Read Many (WORM) cartridge, which many companies are finding necessary to comply with new industry mandates. FujiFilm says its new Super DLTtape II cartridge is based on its proprietary ATOMM (Advanced Super Thin-layer & High Output Metal Media) technology, which incorporates a nonmagnetic lower layer and an ultra-thin upper layer of high-energy metal particles applied simultaneously to a base film. This manufacturing technique results in extremely low self-demagnetization, increased high-frequency output, and significantly higher recording density, according to the company. FujiFilm and Quantum also announced a new initiative to adapt the SDLT 600A cartridge for use in the broadcasting industry, where high-definition television programming is driving a big demand for storage.