Wednesday, December 12, 2007

The important thing about K10

I have been reading articles like "Has Intel Crushed AMD?" by Jon Fortt in Fortune's BigTech blog, the Mario Rivas interview [he is the Computing Product Group Executive Vice-President] by Damon Poeter in ChannelWeb, as well as many other numerous discussions in message Boards about AMD, and I grew increasingly frustrated at how people is losing sight of the truly important things about K10.

There is the perception that if AMD solves the k10 problems of bugs, slow clocks, manages to produce them in quantities, then AMD may continue to consolidate its duopoly player status, and continue to be a force in the industry that must be taken into account. I think that all this optimism, very unfortunately, is unfounded.

Let us suppose AMD had launched K10 processors at 3.0 GHz, for both servers and desktops, around June of this year, and without any bugs of importance. Still AMD would be headed down, only not so fast. That is my point. Why? because the K10 design itself proved to be a dud at so many levels that it is exhausting just to mention all of them. I think that the important thing of K10 is that it proved inferior in IPC (instructions per clock) to Intel's existing double duals. This is a fact in a context of two extremely alarming things: Intel's double duals are handicapped by the front side bus (they can't communicate die to die directly, and every new core or processor on the same and the same bus diminishes the effective memory bandwidth per core) and external memory controller delays. Still, despite the handicaps, the double duals beat fair and square any K10 quad at same-clock comparisons in the vast majority of workloads.

This will get much worse, because Intel is already enjoying the advantages of 45nm, over the horizon looms the new advances in transistors, the high dielectric and the metal gates (* see note at the foot), as well as their QuickPath implementation of P2P that does away with the handicaps. I have demonstrated that there is no need to do something as good as AMD's DCA/Hyptertransport because for the vast majority of applications they were used only minimally (just to save a bit of money on the external memory controller, and to reduce the points of failure, helping speed to market), so, the industry has every reason to expect a much more competitive Intel in the short, medium and longer terms.

Does K10 has the room for future improvement? I was very wrong regarding Core, I honestly thought that the P6 line didn't have room for improvement, but Intel proved me wrong. I will try again to formulate predictions, though:

  • I don't think the cache hierarchy in K10 works. The independent L2 caches, half of the total cache space, are inefficient. The L3 level is too small compared to the L2 level to be justified (according to my simulations, for the latency steps on cache level of typical architectures, the sizes should be at least four times bigger than the previous level. These are numbers that I use in high performance optimizations where I try to adapt my software so that the "working sets" maximize the cache hierarchy performance. Also, I have said many times that the L1/L2 hierarchy of K8 behaves more like a "one and a half levels", that's why it is so size-efficient), thus, unless AMD changes this radically, I see K10 underperforming in memory-intensive applications. While the whole L3 is of dubious merit, it still occupies a significant fraction of the processor area, and consumes a significant fraction of its power... Some might say that I am saying this in hindsight, but in reality it is just that the actual performance numbers of K10 have given me confidence to go public with reservations I had from the beginning. I don't find the lackluster performance problem of K10 in any of the important advancements of this architecture (ask Ron, "Cove3" in the InvestorVillage message board for a complete list), but it has to be something, and I think the cache hierarchy may be a partial answer.
  • I don't think the migration towards quadcores will happen fast, not anything close to the migration from single core to dual core. A second core really adds usable computing power for normal Windows usage, as valuable as 70% of the first core, but the third core adds computing power that is hard to use, so it is only 35% as valuable. The fourth core is even less valuable. That's why I am so interested in three-cores, I can really think of ways to use a third core, but the fourth is still too far. This has to do with software engineering and the principle of combinatorial complexity. From the design perspective, the problem with these facts, is that while the single-die principle of K10 is oriented towards maximizing the efficiency of the four cores, it does so at very steep bin-split, yield and complexity penalties. Intel's existing double duals have the priorities reversed: inefficient multicore performance but with quick to market times, ease of manufacture and capable of top clock speeds. By the time this situation reverses, Intel will already be in the market with single-die designs, so, I am afraid K10 won't ever have the chance to be the adequate design for its time, at least from the perspective of multi-cores.
  • The problems we have seen of K10 are not accidental, I fear they are fundamental: The architecture is single die/four core, thus complex, thus requires time to develop, it is error prone, difficult to produce, and hard to make it run at top clock speeds.
I hope to have explained with sufficient detail why I think this is not a circumstantial crisis in the processor business of AMD, but an structural crisis that will aggravate.

The words of Rivas are very contradictory: He implies that the total performance of the processor really doesn't matter for the enthusiast, which is a lie by itself; but yet, the architecture he sells, optimized for multicore performance, is as enthusiast-directed as it gets. He minimizes the performance penalty of the BIOS fix of the TLB bug, contradicting the Tech Report benchmarking (in an article by Cyril Kowaliski that "Chico" asked me to read in his latest comment, Tech Report pounds on Rivas for this), and of course, it is literally brimming with promises of improvements that I don't see how to justify. Rivas is the same AMD official that acknowledged in March that the single die quadcore had been a mistake (Ashlee Vancee @ "The Register"), and digging a little bit more, Rivas, in an interview exactly one year ago, promised a place in heaven regarding Fusion (EETimes, Junko Yoshida), when the company was still trying to justify the ATI acquisition. Read the contradictions of Rivas, that will lead you to conclude that AMD is in full lying mode, presumably because the officials can not say the truth, that is, the news are to become much worse.

I never agreed with the Opteron/K10 comparison. It is true that both are monumental challenges, but that's about all their similarity. Opteron was revolutionary in ways that the industry was prepared to embrace, like the P2P connectivity, the emphasis on setting the way for single-die dual cores; and it was conservative and evolutionary on things the market wasn't willing to change: A true upgrade path for the x86 instruction set architecture for 64 bits, AMD64, while Intel was at the apex of their attempt at consolidating the Itanium ISA. K8 wasn't "marketing driven engineering", that's why it insisted in the technically superior approach of slow clocks of highly optimized execution rather than marketing gigahertz of idle instructions, represented by Netburst. Today, K10 tries to "innovate" in what is not necessary, like the single-die quadcore, the third cache level, etc., rather than innovating in things the industry is desperate for, revolutionary coprocessors for the consumer market, for example; on the other hand, today the risks associated to the K10 challenge are not at all mitigated by Intel's insistence on the incorrect approach, like at the times of the K8 challenge, but quite the contrary, the risks are heightened by Intel's practical and effective approach. Finally, AMD, at the times of the K8 challenge enjoyed the momentum of the superior product design, the Athlon, while today AMD suffers the negative momentum of having the inferior design (thus calling for a more practical approach).

My advice is to be suspicious of the theory that AMD just had a bad streak of problems and mistakes, at least regarding K10, it is very clear that AMD exposed itself to great suffering, and now that the gamble failed, the real pain is about to begin.

(*) AMD, unsurprisingly, is downplaying the silicon process race. Of course, it is so much behind already and getting ever further behind that it has to resort to deny the negative; but this subject is better left for another article.