Wednesday, December 05, 2007

Erratum 298

The error in K10 that has generated this flurry of controversy is called Erratum 298. I will explain what it is about below, but I first want to put this problem on what I think is its due context.

In "terrible news" I spoke about AMD launching Phenom knowing about the existence of this bug. Because of technical characteristics of the bug and the Linux patch that works around it, I think that the BIOS patch can also work around the problem, therefore, this is not a problem that grants a product recall. Nevertheless, the performance hit of patching a system through the BIOS may be very significant, AMD claims around 10%, independent testers claim around 20%; but it seems that if the Operating System can be patched too, it only hits 1%. In practical terms this bug and the patch are as if AMD would have launched processors 10% slower.

According to "Daily Tech"'s Kristopher Kubicki, AMD halted shipping of K10 pending "application screening", that is, AMD is checking whether the applications of a customer would likely trip the bug or not before shipping. It seems that the bug may only occur when the operating system needs to set the "Accessed" or "Dirty" bits of the page table entry [ I found this article for the people interested in learning about Paging, the meaning of the accessed and dirty bits is explained there ]; like I mentioned in "terrible news", some workloads like supercomputing may not trip the bug, the reason seems to be that supercomputing doesn't do very sophisticated virtual memory management, at least not as complex as virtualization, so the simultaneous conditions required to trigger data corruption or system crash may not occur.

This means that the flow of Barcelona processors to the market is slower than anticipated, and some other customers that chose AMD because the specific advantages of AMD processors for workloads like virtualization are not receiving any product at all. In the case of "consumers", it seems that the company will give the chance to disable any patch and have a buggy system, or take the 10% to 20% performance hit.

Now that we come to that, the choice of disabling key functionality of the L3, Kubicki also quotes AMD saying that some tri-cores will have the L3 disabled. This makes sense, so, I guess it may be interpreted as good news. Let me explain why:

Caches have been sort of a "loose cannon" in the world of µarchitectures, for instance:

  • The original 266MHz Celerons without any cache were so slow that Pentium MMX 233 were noticeable faster,
  • then Intel solved the problem a bit overkill and launched the cheap and very overclocking-friendly Celeron 300A that became famous because its half-size, full core speed L2 cache made it faster than the much more expensive Pentium II's with double size, off-die, half core speed L2 caches, especially while overclocked allowing 100 MHz memory rather than 66MHz (I owned a Celeron 300A for years, it ran at 450MHz with 100Mhz bus without a hitch and outperformed Pentium II and Katmai Pentium III of the same speed).
  • the problem that killed the hyperthreading feature of top of the line Netburst processors was the cache contention, despite the large sizes of Netburst caches (they were that sensitive to cache misses),
  • one the reasons for the superiority of AMD's Durons (in their price/value space) was their supersized L1 caches,
  • and one of the great reasons why AMD's K8 could compete with Intel processors of FOUR times the total amount of L2 cache was the very efficient "exclusive" architecture of L1/L2 caches (here exclusive means that the data in L2 is not "repeated" in L1)
so, I can understand that the L3 cache in K10 could have been a good idea in the designing stages, but the test in real life conditions demonstrated that the extra memory latency and higher manufacturing costs wasn't really compensated by how much it helped performance. Still, AMD expended lots of money, opportunity costs, time to market, and risk exposure to bugs to develop this feature in K10 that ultimately was proven of dubious value. This highlights, once again, that AMD shouldn't have skipped the intermediate steps between K8 and the "triple challenge", or that the "business exploration" is very important.

Another positive lesson about the Erratum 298 is how much more responsive the Open Source software is when compared to proprietary offerings. Linux already has a patch that emulates the "Accessed" and "Dirty" bits of page descriptors, so, the performance penalty gets reduced to much more numerous page fault exceptions; on the other hand, Microsoft isn't even bothering to patch around the K10 problem; it is true that the patch performs nothing short of "major surgery" in memory subsystem of the Linux kernel, but while AMD can actually make a patch for Linux, I guess that it is unthinkable for Microsoft something as radical. For the same reason, I expect the Open Source virtualization projects Xen and VirtualBox to be much more agile than, let's say, VMWare, to tend a helping hand to AMD to still allow early K10 to run virtualization without an extreme performance hit.

I received an anonymous comment that pointed to "andikleen"'s comment that leads to the code in x86-64.org of the patch and the explanation. [ Thanks to whoever posted the comment, but please, leave a name, there is no need to sign in to anything, just overwrite "anonymous" with a name of your choosing and that'll do ]. Cyril Kowaliski @ TechReport also comments on the bug and the Linux patch finishing with a very important thing, the apparent contradiction that AMD says that few customers will be affected by this problem, but at the same time it strongly advises Phenom motherboard manufacturers to enable the BIOS fix that zaps at least 10% performance without giving the option to disable the fix. By now the whole world knows that the bug is severe, I honestly don't understand what is AMD trying to do by insisting on minimizing it...

According to the Kubicki's article we have been talking about, AMD will continue to ship defective processors until the next stepping, B3, of both Phenom and Barcelona, gets launched in March... although the "2.6 GHz Phenom model 9900 is not affected", so, presumably, the Phenom 9900 would be the first B3 K10.

There are more K10-related news: AMD is re-emphasizing 65nm K8, "Brisbane" [ DailyTech ], "of course!" is what I say. There never actually was any need for AMD to forget about K8, the world is barely moving to the dual core wave and AMD should have focused on improving their dual core offerings rather than the "triple challenge" foolish adventure that led to slower processors than what is acceptable, hotter, and buggier too. K8, on the other hand, still actually has untapped potential. Unfortunately, since AMD has such bad 65nm process, it just can't go for the 3.0 GHz and 3.2 GHz speeds, currently manufactured processors at that speed are all 90nm and will be discontinued.

In any case, AMD will have to ride three more months and more on the back of the architecture it has been slighting for over a year now, K8...

3 comments:

Anonymous said...

Chicagrafo,

Tech Report has benchmarked the impact of the errata workaround.

See following link

http://techreport.com/articles.x/13741

Kristopher said...

Neat stuff: I don't agree with all your analysis on cache, but I think you detailed everything out in an overview very well.

Great job! Looking forward to reading more of these

Eddie said...

Thanks mike for the link and kristopher for the criticism/encouragement, I am into posting a followup to this article