MetaRAM Quadruples DDR2 Memory Capacity in Servers

March 3, 2008 Timothy Prickett Morgan

There are a lot of different ways to skin a system design, but the one that has been favored by processor and system designers alike for the past three decades is to boost the clock speed on the processor as high as possible, make each clock do more work, and add layers and layers of progressively faster memory to the box to keep those faster processors fed. There are a number of problems with this approach, and they have caused a performance and capacity gap between central processors and memory subsystems that is still not being properly addressed. But a startup company called MetaRAM, which has just come out of stealth mode, thinks it has come up with a clever solution that will get CPUs and memory subsystems more in sync.

The first issue that modern systems have is that clock speeds on the CPU, which can range from 2 GHz to 3 GHz or so on X64 architectures to over 4 GHz on IBM’s latest dual-core Power6 and quad-core z6 mainframe processors, are far outstripping the speeds of DDR2 main memory, which runs at 400 MHz, 533 MHz, or 667 MHz. This gap between processor speed and main memory speed is why chip makers started adding expensive Level 1 cache memories to their chip designs in the 1990s, first off the chip inside the packaging, and why bigger L2 caches were then added to most chips. These caches keep data that would normally be fetched from main memory, out on the other side of the main memory controller, very close to the chip. Most servers today employ an L3 cache, either on the processor itself or near it on a ceramic package, and some even have L4 caches. These caches make up the lion’s share of the transistor budget on a processor chip, which decreases yields and makes chips more expensive to manufacture.

The other issue that system designers have is that clock speeds have hit a wall on processors because of power and cooling issues, which has necessitated a shift toward multicore processors. The basic idea is that if you cannot get a single processor to go even faster because it generates more heat than work, then the obvious thing to do is to keep processor clock speeds constant, add as many cores on the chip as possible given the transistor technology of the day, and then figure out how to make that multicore chip do more work. With server utilization very low and a lot of commercial applications being relatively monolithic (they only use a few threads, so there is a limit to how many cores a job can actually use), the main selling point for multicore processors so far in the 21st century is that you can use a virtualization hypervisor and consolidate multiple, virtualized application stacks onto a single physical server.

There’s only one problem with this. While server processors are historically underutilized, perhaps running at anywhere from 5 percent to 20 percent on infrastructure or application server workloads on X86 and X64 boxes and maybe as high as 50 percent to 60 percent on big RISC boxes running central databases, no matter what kind of machine, no matter what workload, memory systems on the box are almost always running at 100 percent. The reason is simple: Memory is the most expensive part of the system, and companies deploy as little as possible to get something approximating balanced performance out of their boxes. And here’s the rub. When you can consolidate two, three, or four virtualized server images onto a single X64 or RISC machine, and thereby drive utilization up into the mainframe-class range of 80 percent or higher, you can’t do it on a box that doesn’t have two, three, or four times the memory. (And to be honest, the overhead of the virtualized environment itself usually requires some memory of its own to boost performance acceptably.) And that means you have to buy a much heftier server to virtualize than you might otherwise do based on processor performance because of memory capacity issues.

This, as you might imagine, has server makers and memory suppliers giggling quietly with glee. You can reduce your server footprint, but boy is it ever going to cost you. Well, unless you start pressing your server maker to certify MetaRAM’s new MetaSDRAM DDR2-alike main memory, that is. And simply put, MetaRAM is doing for main memory what symmetric multiprocessing did for processing decades ago. The company has created a chipset that allows gangs of low-density DDR2 DRAM chips to be piled up and simulate denser and radically more expensive memory modules; SMP electronics allowed multiple processors to be linked through their memory buses to present a single, virtual, and more powerful processor on which applications could run. What MetaRAM has done sounds simple enough–perfectly obvious in hindsight, like a lot of technologies–but DDR main memory controllers are designed to talk directly to DRAMs and they do not have a lot of tolerance for a go-between. The MetaSDRAM chipset has to look like a DRAM chip to the memory controller and look like a memory controller to the DRAM, even though it is neither; and it has to present a profile of an 8 GB or 16 GB DDR2 DIMM to memory controllers and do so perfectly.

MetaRAM was founded two years ago by Fred Weber, the former chief technology officer at chip maker Advanced Micro Devices and one of the key designers of the 64-bit Opteron processor and its HyperTransport interconnection. Weber is the chief executive officer at MetaRAM, which just came out of stealth mode last week and which has done a very short ramp from slideware presentations to venture capitalists to finished memory products. According to Suresh Rajan, vice president of marketing at the company and Weber’s co-founder, the MetaRAM engineering team had specifications for MetaSDRAM within six months of founding the company and had samples back from its fabrication partners within a year. Rajan says that MetaRAM has secured $20 million in two rounds of venture capital. The first round came from Kleiner Perkins Caufield & Byers (the VC firm in Silicon Valley), Khosla Ventures (established by Vinod Khosla, one of the four founders of Unix server giant Sun Microsystems), and Storm Ventures; in the second round of funding in May 2007, Intel Capital kicked in some dough. The company demonstrated the first MetaSDRAM in July 2007 and went into initial A0 stepping on production units with its partners in November 2007.

Like Violin Scalable Systems, which introduced a memory appliance at the end of 2007, MetaRAM has seen the wisdom of trying to make fatter memory capacity for servers out of cheaper and less dense DRAM chips. The reason is simple: money. The volume DRAM chip in the DDR2 memory space today is the 1 Gbit device, which costs somewhere between $2 and $3. This is the memory chip that is used in 1 GB memory sticks these days, which cost hardly anything; ganging these chips up to make 2 GB and 4 GB modules is more costly, and therefore the resulting DIMMs are more expensive. But if you want 8 GB DIMMs, you have to use 2 Gbit DRAM chips, and these cost somewhere between $30 and $40 a pop. That makes an 8 GB DIMM cost about $15,000 or so, if you can find one. MetaRAM instead moves back to 1 Gbit DRAMs, lashing them together with a chipset and plunking them into sophisticated packaging that allows them to fit into the same DDR2 main memory slots on servers and workstations. The company is working with Taiwan Semiconductor Manufacturing Company and Amkor Technology to make the three-dimensional DRAM packages, chipsets, and associated chips that go on the MetaSDRAM package, and with Hynix Semiconductor and Smart Modular Technologies to make the DIMMs. The result, Rajan says, is a memory module that looks exactly like an 8 GB module to a DDR2 main memory controller and that MetaRAM can deliver for $1,500 apiece. This is very cool, economically speaking.

Here’s the effect of such economics. Take a four-socket X64 server using quad-core Intel Xeon processors and load it up with the maximum main memory of 256 GB. That will set you back about $500,000, and $480,000 of that will come from main memory because you have to use 8 GB DIMMs to get there. Now, take the same server and plunk in the MetaSDRAM 8 GB modules distributed by Smart Modular–and probably soon by Avnet and Arrow Electronics and maybe the tier one and tier two server vendors, is my guess–and the memory bill for the resulting machine comes in under $50,000. That’s some big savings.
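For readers who want to check the math, here is a rough sketch of that comparison in Python. The prices and capacities are the ones quoted above; the count of 32 DIMMs is not quoted anywhere and is simply what 256 GB divided by 8 GB works out to, and the function and variable names are made up for illustration.

```python
# Figures quoted in the article. The DIMM count is derived, not quoted.
TOTAL_MEMORY_GB = 256
DIMM_CAPACITY_GB = 8
CONVENTIONAL_DIMM_PRICE = 15_000   # 8 GB DIMM built from 2 Gbit DRAMs
METASDRAM_DIMM_PRICE = 1_500       # 8 GB MetaSDRAM module
LOADED_SERVER_PRICE = 500_000      # server plus conventional memory


def memory_cost(total_gb, dimm_gb, price_per_dimm):
    """Return (number of DIMMs, total memory cost in dollars)."""
    dimms = total_gb // dimm_gb
    return dimms, dimms * price_per_dimm


dimms, conventional = memory_cost(
    TOTAL_MEMORY_GB, DIMM_CAPACITY_GB, CONVENTIONAL_DIMM_PRICE)
_, metasdram = memory_cost(
    TOTAL_MEMORY_GB, DIMM_CAPACITY_GB, METASDRAM_DIMM_PRICE)

# Whatever is left of the loaded price after memory is the rest of the box.
base_server = LOADED_SERVER_PRICE - conventional

print(f"DIMMs needed:         {dimms}")
print(f"Conventional memory:  ${conventional:,}")
print(f"MetaSDRAM memory:     ${metasdram:,}")
print(f"Rest of the server:   ${base_server:,}")
```

By this arithmetic the memory alone drops from $480,000 to $48,000, which is where a figure just under $50,000 comes from; the roughly $20,000 for the rest of the server is on top of that.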

I know what you are thinking. What about power consumption? Isn’t the whole point of using denser memory chips that they can increase memory capacity and stay within the same power budget? Well, if MetaRAM engineers had not invented a little algorithm called WakeOnUse, their modules would not be able to plug into servers. The WakeOnUse algorithm, which is embedded in the MetaRAM chipset, allows DRAMs that are not in use to be set to a lower power state and then awakened as they are needed. The net effect of this more efficient use of DRAMs is that a MetaSDRAM module with 72 1 Gbit DRAM chips can stay in the same power budget as an 8 GB module based on 2 Gbit DRAM chips.
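MetaRAM has not published how WakeOnUse actually decides when to put a DRAM to sleep, so the following is only a toy model of the general idea described above: devices that sit idle go to a low-power state and are woken when an access arrives. The class name, the idle-tick threshold, and the one-access-per-tick timing are all assumptions made for illustration, not details of the real chipset.

```python
class WakeOnUseModel:
    """Toy model: a device idle for more than `idle_limit` ticks is put
    into a low-power state, and is woken again when it is accessed."""

    def __init__(self, num_devices, idle_limit=2):
        self.idle_limit = idle_limit            # hypothetical threshold
        self.idle_ticks = [0] * num_devices
        self.awake = [True] * num_devices
        self.wake_events = 0

    def access(self, device):
        """Advance one tick during which `device` is accessed."""
        if not self.awake[device]:
            # The target was asleep: wake it before servicing the access.
            self.awake[device] = True
            self.wake_events += 1
        for d in range(len(self.awake)):
            if d == device:
                self.idle_ticks[d] = 0
            else:
                self.idle_ticks[d] += 1
                if self.idle_ticks[d] > self.idle_limit:
                    self.awake[d] = False

    def awake_count(self):
        """Number of devices currently in the full-power state."""
        return sum(self.awake)


model = WakeOnUseModel(num_devices=4, idle_limit=2)
for device in (0, 0, 0):          # a burst of accesses to device 0
    model.access(device)
awake_after_burst = model.awake_count()
model.access(1)                   # touching device 1 wakes it up
print(awake_after_burst, model.awake_count(), model.wake_events)
```

In this model, a workload that keeps hitting the same few devices leaves most of the others asleep, which is the behavior that would let a module with many DRAMs stay inside the power budget of a module with fewer, denser ones.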

MetaRAM has created two chipsets for its memory modules. The MR08G2 chipset has a single AM150 access manager and five FC540 flow controllers; using 1 Gbit DDR2 registered DRAM, this chipset can look like an 8 GB memory module. This collection of electronics is in production now. Memory modules are currently available through Smart Modular and have been certified on “Santa Rosa” dual-core Opteron processors and “Clovertown” and “Harpertown” quad-core Xeons, which support DDR2 main memory thanks to Intel’s new 5100MCH “San Clemente” chipset for servers and workstations. MetaRAM is also putting the finishing touches on the MR16G2 chipset, which has two AM160 access manager chips and nine FC540 flow controllers, and which presents an image of a 16 GB DDR2 memory module to machines. This latter chipset is being qualified right now, and stacks up 144 DRAMs on a single DIMM.
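The chip counts line up with the module capacities if you assume the extra DRAMs are there for ECC. That assumption is mine, not something MetaRAM's numbers state: a registered ECC DIMM is typically 72 bits wide, with 64 bits of data and 8 bits of error correction. A quick check in Python, with a made-up helper name:

```python
GBIT_PER_GBYTE = 8


def module_capacity_gb(chips, gbit_per_chip=1, data_bits=64, bus_bits=72):
    """Return (raw GB, usable GB) for a DIMM built from `chips` DRAMs.

    Assumes a 72-bit ECC bus carrying 64 data bits, so 8/9 of the raw
    capacity is visible to the operating system.
    """
    raw_gb = chips * gbit_per_chip / GBIT_PER_GBYTE
    usable_gb = raw_gb * data_bits / bus_bits
    return raw_gb, usable_gb


for chips in (72, 144):
    raw, usable = module_capacity_gb(chips)
    print(f"{chips} x 1 Gbit DRAMs: {raw:g} GB raw, {usable:g} GB usable")
```

Under that assumption, 72 chips give 9 GB raw and 8 GB usable, and 144 chips give 18 GB raw and 16 GB usable, matching the two module sizes.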

Now, this technology is not limited to X64 processors, according to Rajan, and this is where it gets interesting. This is a matter of qualification, of course, but any server that uses DDR2 main memory and has a controller that supports high-capacity memory modules can take MetaSDRAM. Rajan says that MetaRAM has an eight-socket Sun Fire server from Sun, running Solaris and using its memory modules, up and running in its labs. Similarly, the very pricey DDR2 memory cards made by IBM, Hewlett-Packard, and others could be fitted with MetaSDRAM modules, radically increasing their capacity while at the same time radically reducing the price of main memory. “As long as the memory controller can talk to DDR2 registered DIMMs, it should be possible to plug them right in to any machine,” says Rajan. Considering the price that server makers charge for their memory, someone should fork over a few grand and give MetaSDRAM a test run on big, fat, expensive server memory cards. They could end up making a killing on upgrades.

Considering that main memory is the profit margin on servers these days, you gotta wonder if the memory makers and the server makers, who want to be able to charge big bucks for memory modules based on 2 Gbit DRAMs, do not have a bounty out on Weber and Rajan. But a number of vendors have seen the light. Appro, Colfax International, Rackable Systems, and Verari Systems are all expected to have MetaSDRAM available for their servers in the first quarter.

The question now is this: Does this really screw up the 2 Gbit DRAM memory market?

Violin Announces Memory Appliance Server
