Saturday, April 08, 2006

Core Microarchitecture for AMD investors

These are the points of the summary:

1) Complexity: It is very ambitious. I am sure that Intel will experience development delays, because a number of its features interact directly with most of the µ-arch. This means that these features may generate combinatorially large numbers (millions) of pairwise interaction problems and operating conditions that are extremely hard to nail down.
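To give a feel for how quickly such interactions pile up, here is a back-of-the-envelope count. It is a minimal sketch that assumes every pair of features has to be validated under every operating condition; the feature and condition counts are hypothetical placeholders, not Intel figures, and only pairwise interactions are counted (three-way interactions grow even faster).

```python
from math import comb


def interaction_cases(features: int, conditions: int) -> int:
    """Count pairwise feature interactions, each checked under every
    operating condition. Both inputs are hypothetical illustration values."""
    # n features form n*(n-1)/2 distinct pairs
    pairs = comb(features, 2)
    # every pair must be exercised under every operating condition
    return pairs * conditions


# Hypothetical numbers: 100 interacting features, 500 operating conditions
print(comb(100, 2))                 # 4950 distinct pairs
print(interaction_cases(100, 500))  # 2475000 pair/condition cases
```

Even with modest, made-up inputs the count lands in the millions, which is the point of the argument: the pair count grows with the square of the number of interacting features.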

2) All that complexity only produces marginal benefits. If you take Intel's claims at face value, widening the µ-arch. from 3-issue to 4-issue gives, in the very best case, a 33% performance improvement over Banias (4/3 ≈ 1.33); but it can be shown that this theoretical maximum is not achieved in practice, not even with superlinear features such as instruction fusion.
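The arithmetic behind the 33% ceiling can be sketched as follows. This assumes the clock speed stays the same and that instructions per cycle (IPC) scale exactly with issue width, which is the ideal case; the sustained IPC values in the second call are hypothetical numbers chosen only to show how a realized gain compares with the ceiling.

```python
def issue_width_ceiling(old_width: int, new_width: int) -> float:
    """Best-case speedup from widening issue, assuming IPC scales
    exactly with issue width and the clock is unchanged (ideal case)."""
    return new_width / old_width - 1.0


def realized_gain(old_ipc: float, new_ipc: float) -> float:
    """Gain actually realized, given sustained IPC values
    (the values used below are hypothetical, not measurements)."""
    return new_ipc / old_ipc - 1.0


print(f"{issue_width_ceiling(3, 4):.1%}")  # 33.3%
# Hypothetical sustained IPC values, for illustration only
print(f"{realized_gain(1.2, 1.4):.1%}")    # 16.7%
```

Real code rarely keeps every issue slot busy, so the realized gain sits below the width ratio; that gap is what the point above refers to.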

3) All that complexity also translates into lots of transistors to support the advanced features, which makes the processor larger, costlier to manufacture, and hotter when running.

4) The extra features can't possibly all be used 100% of the time; therefore they reduce the overall utilization of resources, so expect lower power efficiency too.

5) Seven years ago, the obsession with higher clock speeds that gave birth to the infamous Pentium 4 ran opposite to where the industry was moving, and that eventually proved Intel catastrophically wrong. The newest "reordering" obsession (trying to extract parallelism from sequential instruction flows instead of following the current industry focus on easing explicit parallelism) is, again, the opposite of what the rest of the industry is trying to do.

6) see below

7) Even Hannibal agrees with me that this µ-architecture just doesn't have room to grow. (Although he mentions that Intel may get 5 more years out of it, he is very explicit that the improvements won't come from the architecture but from miniaturization.)

8) This emphasis on reordering (to extract parallelism from serial instruction flows) suits single-threaded execution. Intel is supposedly designing future processors that are fine-tuned and optimized to run old-fashioned serial, single-threaded code, while the software community is trying its best to go parallel. This reinforces my view that Intel's NGMA is really a stopgap to contain AMD in 2007-2008 until Intel can come up with a truly new architecture, with an integrated memory controller and all.

9) There is no possible way to integrate the memory controller into this mess; it is far too complicated already. So AMD's main advantage will continue to make all of Intel's efforts futile for the foreseeable future.

Remember that AMD64 brings two different things: a) the architectural difference of an extended register set (16 general-purpose registers instead of 8), which is more important than b) computation on 64-bit quantities.

6) The introduction of Instruction Fusion already implies a major change in all the decoders and most execution units, so there is every reason for Intel to handle decoding of the REX prefixes essential for EM64T/AMD64 in the simple instruction decoders rather than in the complex sequencer. Thus, the AMD64 architecture won't be "emulated" through microcode, as it currently seems. Also, since Intel widened so many components, it is entirely reasonable to expect full-speed 64-bit computation; therefore AMD will not have a specific advantage here.

10) But there is one issue: AMD64 defines an architectural register file twice the size (16 general-purpose registers instead of 8), which greatly multiplies the opportunities for reordering. For that reason alone one could get performance benefits of over 50%. Thus, on Core, AMD64/EM64T code could actually run much faster than on today's Opterons.

I suspect that Intel may be up to something radical, such as extending EM64T with another 16 general-purpose registers, or extending the instruction set with three-address operations to make use of the enhanced width. Today, the x86 architecture is essentially two-address (it is too technical to explain this issue here in greater detail).
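A rough illustration of the two-address versus three-address distinction: in a two-address instruction one source operand doubles as the destination (x86 `add eax, ebx` computes eax = eax + ebx), so computing c = a + b while keeping a intact needs an extra copy first. The toy lowering function below is a hypothetical sketch with made-up, simplified assembly syntax, not real compiler or assembler output, and the three-address form is an assumed extension, not an existing x86 instruction (x86 does have exceptions such as `lea ecx, [eax+ebx]`, which is why the text says "essentially").

```python
def lower_add(dst: str, src1: str, src2: str, three_address: bool) -> list[str]:
    """Lower dst = src1 + src2 into a toy assembly listing.
    Hypothetical helper with simplified syntax, for illustration only."""
    if three_address:
        # Assumed three-address form: destination separate from both sources
        return [f"add {dst}, {src1}, {src2}"]
    if dst == src1:
        # Two-address form: the first operand is both a source and the destination
        return [f"add {dst}, {src2}"]
    if dst == src2:
        # Addition is commutative, so reuse src2's register as the destination
        return [f"add {dst}, {src1}"]
    # src1 must survive, so copy it into dst first, then add in place
    return [f"mov {dst}, {src1}", f"add {dst}, {src2}"]


print(lower_add("ecx", "eax", "ebx", three_address=False))
# ['mov ecx, eax', 'add ecx, ebx']
print(lower_add("ecx", "eax", "ebx", three_address=True))
# ['add ecx, eax, ebx']
```

The extra `mov` in the two-address case is the kind of overhead a three-address extension would remove when both source values must be preserved.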

Don't panic if Core processors run EM64T code much faster than AMD's chips do; this is not a hard problem for the "Grand Masters" to solve in future generations.

In conclusion, I seriously doubt that Intel can succeed by being contrarian to very evident design principles; beyond that, though, there is an obvious performance improvement when you follow the brute-force approach of putting "more of everything" into a µ-processor.

But with larger caches and bloated execution units, Intel is multiplying costs to obtain marginal performance benefits.

That is the nature of their price war: they will build brute-force chips to (try to) outperform AMD's well-balanced offerings.

Just one trick in store from Mr. Dirk Meyer's team and the whole game is over for Intel; if things stay the same, AMD will still have very competitive products for the mid-performance (and value) segments, and undisputed leadership in 2-way processors and up.

The architectural situation is extremely important because it determines which company is going to make successful products; it is also an issue with long-term, far-reaching consequences, which is why I think it deserves careful examination. To benefit from the insight some posters on the message board can provide, and also to stir up some discussion, I have posted early drafts of most of what you see here over the last few weeks; those posts garnered good discussion threads that are still worth a look. This summary was posted here on the Yahoo AMD message board.

Also, I posted my impressions of the information that came out of IDF here, although I have since changed my opinion on aspects such as 64 bits. And, very importantly, there is Bob Colwell's famous Stanford lecture (video link here; he is the father of the P6 architecture), which I commented on here.

I have already settled on a stance about Core, summarized in this article, but I encourage you to leave comments or questions (with a callsign at least, please!); they may help.


Anonymous said...

Pros and Cons of having an integrated memory controller:

Anonymous said...

Thanks for that brilliant summary of the Conroe architecture. A very objective analysis.

Anonymous said...

I tried logging on as plantlife and could not, so I am posting above.

Good analysis.

Eddie said...

Another post, check the thread

Anonymous said...

You lead off with Complexity being a problem.

Do you have any evidence to present here or are we supposed to just take your word for it?

What evidence do you have to support your claim that Intel will never be able to add an integrated memory controller?

Why didn't you mention Core's ability to reorder loads, which helps significantly to negate the effects of its extra memory-access latency?

What makes you think there is no room for growth in this architecture?

Eddie said...

I led off with complexity because that's the root evil in this design. You ask for evidence that complexity is an important problem; if you need that evidence, then you don't know much about design, so, sorry, but you are not up to the level of the article if you need such basic explanations.

Nevertheless, let me try: a simple design invites improvements; a complex design severely discourages them. If you make one change in one part of a system, it may have repercussions where you don't expect them, and it may generate problems. It isn't something concrete; that's why fools try over and over to overcomplicate designs and crash when the reality of the disadvantages shows up.

About the memory controller: I don't say that it is impossible for Intel to do that; I say that it is impossible to put it into the Core 2 micro-arch. Why? Well, let me answer with an analogy: how would you convert your city car into a four-wheel drive?