Friday, 29 August 2008

Multicore Multifiasco

Dave Patterson is one of the most respected names in computer architecture, and rightly so, having been one of the co-founders of the RISC movement, as well as being one of the prime movers behind RAID, among other things.

Nonetheless, when he posted an article on the potential and pitfalls of multicore designs on the Computing Community Consortium blog, I couldn't help but post the following, repeated here with a few additions.

It’s my experience that every decade or so, a packaging breakthrough allows some previously forgotten or abandoned approach to be resurrected at a lower price point, and all the previous lessons are forgotten.

Here are a few of those lessons:

  1. Parallel programming is inherently hard, and tools and techniques claiming to avoid the problems never work as advertised.
  2. Heterogeneous and asymmetric architectures are much harder to program effectively than homogeneous symmetric architectures.
  3. Programmer-managed memory is much harder to use than system-managed memory (whether by the operating system or by hardware).
  4. Specialist instruction sets are much harder to use effectively than general-purpose ones.

I nearly fell out of my chair laughing when the Cell processor was launched. It contained all four of these errors in one design. Despite the commercial advantage of bringing out a game that fully exploited its features, as I recall only one game available when the PlayStation 3 was launched came close.

I hereby announce Machanick’s Corollary to Moore’s Law: any rate of improvement in the number of transistors you can buy for your money will be matched by erroneous expectations that programmers will become smarter.

Unfortunately there is no Moore’s Law for IQ.

The only real practical advantage of multicore over discrete chip multiprocessors (aside from the packaging and cost advantages) is a significant reduction of interprocess communication costs — provided IPC is core-to-core, i.e., if you communicate through shared memory, you’d better make sure that the data is cached before the communication occurs. That makes the programming problem harder, not easier (see Lesson 3 above).
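To make that point concrete, here is a toy back-of-the-envelope model. The latency figures are illustrative assumptions, not measurements of any real chip, but the shape of the result holds on real hardware: if the producer leaves the shared data in an on-chip cache, core-to-core communication is cheap; if every word must come from off-chip memory, the multicore packaging advantage evaporates.

```python
# Toy model of core-to-core communication through shared memory.
# Latency figures are illustrative assumptions, not measurements.

ON_CHIP_CACHE_LATENCY = 20     # cycles to a shared on-chip cache (assumed)
OFF_CHIP_MEMORY_LATENCY = 200  # cycles to off-chip DRAM (assumed)

def ipc_cost(words, fraction_cached):
    """Cycles to communicate `words` words between two cores,
    given the fraction already resident in on-chip cache."""
    cached = words * fraction_cached
    uncached = words - cached
    return cached * ON_CHIP_CACHE_LATENCY + uncached * OFF_CHIP_MEMORY_LATENCY

# Producer leaves the data cached: communication is cheap.
print(ipc_cost(64, 1.0))   # 1280 cycles
# Nothing cached: an order of magnitude worse.
print(ipc_cost(64, 0.0))   # 12800 cycles
```

The catch, of course, is that keeping `fraction_cached` close to 1 is the programmer's problem, which is exactly Lesson 3 again.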

Good luck with transactional memories and all the other cool new ideas. Ask yourself one question: do they make parallel programming easier, or do they add one more wrinkle for programmers to take care of — that may be different in the next generation or on a rival design?

Putting huge numbers of cores on-chip is a losing game. The more you add, the smaller the fraction of the problem space you are addressing, and the harder you make programming. I would much rather up the size of on-chip caches to the extent that they effectively become the main memory, with off-chip accesses becoming page faults (I call this notion RAMpage). Whether you go multicore or aggressive uniprocessor, off-chip memory is a major bottleneck.

As Seymour Cray taught us, the thing to aim for is not peak throughput, but average throughput. 100 cores each running at 1% of full speed because of programming inefficiencies, inherent nonparallelism in the workload, and bottlenecks in the memory system is hardly an advance on two to four cores each running at at least 50% of full speed.
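The arithmetic behind that comparison is worth spelling out, in single-core equivalents:

```python
def effective_throughput(cores, utilization):
    """Aggregate throughput in single-core equivalents."""
    return cores * utilization

# 100 cores each at 1% of full speed:
many_slow = effective_throughput(100, 0.01)   # about 1 core-equivalent
# Two to four cores each at 50% of full speed:
few_fast_low = effective_throughput(2, 0.5)   # 1 core-equivalent
few_fast_high = effective_throughput(4, 0.5)  # 2 core-equivalents
```

The 100-core design, for all its peak-throughput bragging rights, delivers no more useful work than two conventional cores at modest efficiency, and half what four deliver.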

In any case all of this misses the real excitement in the computing world: turn Moore’s Law on its head, and contemplate when something that cost $1,000,000 will cost $1. That’s the point where you can do something really exciting on a small, almost free device.
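A rough sketch of the timescale involved, assuming (as one common reading of Moore's Law) that the cost of a fixed amount of computing halves roughly every two years; the halving period is an assumption, and the answer scales with it:

```python
import math

# Assume the cost of a fixed capability halves every two years (assumption).
START_COST = 1_000_000.0
TARGET_COST = 1.0
YEARS_PER_HALVING = 2.0

halvings = math.log2(START_COST / TARGET_COST)  # just under 20 halvings
years = halvings * YEARS_PER_HALVING            # roughly 40 years
print(halvings, years)
```

So yesterday's million-dollar machine becomes an almost-free component within a working lifetime, which is where the genuinely new applications appear.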

Further reading

My PhD thesis, completed in 1996, but now relevant to a wider audience since multicore has become mainstream, is available on Amazon. See Other Links.

Some interesting stuff here: The Perils of Parallel: Vive la (Killer App) Révolution!


Greg Pfister said...

Philip, you may want to take a look at my blog "The Perils of Parallel", particularly the posts titled "101 Programming Languages" (multi-part) and "Clarifying the Black Hole". I think you and I are in fairly strong agreement.

Philip Machanick said...

Thanks. I added a link to your blog in the main article. Something I think is missing: looking for the killer app in a mass-market low-cost application. The long-term trend in the computer industry has been for packaging designed for low cost to overtake packaging designed for performance first on price:performance and later on pure performance.

Peter N. Glaskowsky said...

Funny, seeing this now. I recall making similar points in articles for Microprocessor Report before I left in 2004.

But I never said "don't do these things." I just said "these are difficult things to do." The difference is crucial, since yesterday's difficult problems are tomorrow's billion-dollar opportunities.

The smartphone business sure has been showing us the truth of that observation, hasn't it? There isn't a high-end smartphone on the market that doesn't make "mistakes" 1, 2, and 4, but they work pretty darn well. There will never again be a competitive smartphone that doesn't rely on these features. In fact, the smartphones of 2028 will be vastly more complex, implementing architectural features that would have been insane to consider for the 2018 products now in design.

In the same way, today's most complicated software products couldn't have been created in 2008 because development tools didn't exist to support the large teams and subtle interdependencies they required, but they work pretty darn well too.

There is, in fact, a Moore's Law for IQ in exactly the sense that matters here, and surprise, it's the same Moore's Law. Ted Nelson understood this truth 45 years ago: computers make people smarter. More complex computer systems enable more complex development systems, which let us manage more complex projects, which create more complex computer systems. Around and around and around.

The most difficult thing to figure out is how best to use the additional complexity we get from Moore's Law. I figured that out in the early 1990s and have been devoting a portion of my attention to it ever since. I think I've made some useful progress, but the industry's been going in a different direction.

That direction has worked, but it hasn't worked perfectly. The adverse consequences to computer security alone waste billions of dollars a year, not to mention the opportunity costs for users who have to settle for walled-garden environments just to get weak-but-adequate security in return. Personally, I think the costs of security may actually be small compared with the losses in programmer productivity and project scalability, but I have no idea how to quantify those losses, nor time to explain entire other categories of costs in this message.

Suffice it to say that it isn't too late to start solving our problems at the right level, a more fundamental level; the value of the future computer industry is vastly larger than the value of the current industry. But we're leaving billions on the table with every month that goes by.