Sunday, March 18, 2007

Is nVidia AMD's reaper?

I began by assuming that acquiring ATI was a good strategic move by AMD and tried to follow all the possibilities of this very complex subject. One of the conclusions you always reach if you suppose that ATI was a sound acquisition is that nVidia is in severe jeopardy:

Intel is doing mediocre integrated graphics and chipsets to claim over half the market, AMD is doing respectable integrated graphics, chipsets and discrete graphics cards, and ATI was moving into the consumer electronics markets faster than nVidia, taking all the low-hanging fruit before nVidia had a chance; then only one high-margin possibility remains for nVidia: boutique graphics cards.

Have you seen nVidia's financials? The market valuation of nVidia requires the company to continue to have very profitable businesses, which is a given today because the most important issue for high-spending consumers of computer-related products is graphics acceleration capability, and nVidia has unquestionable leadership here, although there certainly are worthy products from ATI (AMD).

While the need for graphics acceleration will keep increasing faster than Moore's law, will the market continue to need discrete high-end graphics cards to satisfy it? -- I think it won't.

Role Inversion

An interesting inversion of roles has happened: today, gamers use and require more processing power in the graphics card than in the rest of the system, so the graphics accelerators have become the most important computing engines, relegating the host processor to a secondary role, that of system butler. But this organization is very wasteful.

This inversion of roles happened because graphics accelerators cannot use host resources such as memory with the required efficiency, so the accelerator must replicate them, to the great fortune of nVidia, which is not in the declining market of whole systems but is the master of the rising market of graphics computers.

But AMD's technologies allow for two ways to make the processors retake their primary role in the system: 1) the Torrenza graphics/physics coprocessor, and 2) Fusion.

What is a graphics card nowadays? It is video electronics that has been standardized to the point of becoming a non-issue (that's why you can find graphics cards with two video connectors, just in case you want dual monitors). It is a frame buffer (around 8 MB max) plus a digital-to-analog converter, which are also non-issues; it is a bit blitter (2D acceleration) that is not difficult to leave next to the framebuffer; and it is a computing engine, complete with its own memory and all, of an ad-hoc, proprietary architecture, made available through APIs like DirectX or OpenGL. The computing engine is the whole issue of the graphics card. But since the 3D engine requires memory and cannot delegate general processing to the host processor, you end up with a "computer on an expansion card" with the following disadvantages:

  • due to the constraints of the expansion card form factor, it is ridiculously hard to cool, which also makes it much harder to overclock
  • it sits so far from the system data that it requires its own RAM, a wasteful redundancy (see the sketch after this list):
    • about half the memory size of the whole system
    • where you also generate heat powering it
    • whose size you CAN'T UPGRADE at will
    • which you lose entirely whenever you upgrade the accelerator
  • sitting across the latencies and (comparatively) narrow bandwidth of PCI Express, the coprocessor also has to execute computation that is better suited to general-purpose processors, making it larger, more expensive and more power hungry than it needs to be
  • the coprocessor is needlessly attached to video electronics:
    • What if all you want is "SLI"/"Crossfire", that is, two coprocessors, but not twice the video electronics?
    • What if you are fine with a single coprocessor but want two (or four, or N) monitors? You still have to purchase N coprocessors.
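
To make the memory redundancy and the PCI Express round trips concrete, here is a minimal sketch in C against nVidia's own CUDA runtime API (brand new at the time of writing); the buffer size and the offloaded work are made up for illustration, and the only point is that the accelerator's memory is a separate pool that everything must be copied into and out of:

    #include <stdlib.h>
    #include <cuda_runtime.h>   /* CUDA runtime API, callable from plain C */

    int main(void)
    {
        const size_t n = 1 << 20;                 /* 1M floats, arbitrary size   */
        float *host = malloc(n * sizeof(float));  /* ordinary system RAM         */
        float *card = NULL;                       /* lives in the card's own RAM */

        /* 1. The card cannot see host memory, so a duplicate buffer must be
              carved out of the fixed, non-upgradable video RAM.                */
        cudaMalloc((void **)&card, n * sizeof(float));

        /* 2. Every input crosses the PCIe bus before any acceleration starts.  */
        cudaMemcpy(card, host, n * sizeof(float), cudaMemcpyHostToDevice);

        /* ... the accelerated computation would be launched here ...           */

        /* 3. Every result crosses the bus again on the way back.               */
        cudaMemcpy(host, card, n * sizeof(float), cudaMemcpyDeviceToHost);

        cudaFree(card);
        free(host);
        return 0;
    }

The duplicate allocation and both copies are pure overhead compared to a coprocessor that could simply read the host's memory coherently, which is exactly what the two AMD approaches promise.
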
Through #1 (Torrenza coprocessors), there won't be any need for graphics expansion cards to carry anything beyond the video electronics: certainly not graphics computation, nor video memory. Through AMD's DCA, a graphics coprocessor will be able to share its memory with that of the other processors, for great gains in resource usage efficiency and easier development of cooling strategies, overclocking techniques, standards, etc. Also, the coprocessors will desist from replicating general computation capabilities and specialize in vectorial (SIMD), very data-wide computation, leaving serial general computation to the host processors [1]. Fusion, #2, is a step further: to integrate that into the processor package [2]. In summary:
  • There is room for as much 3D acceleration memory as the owner wants, shared with the whole system, cooled independently, memory that you don't lose when you upgrade the coprocessor
  • There is actually enough room to properly cool and overclock
  • Both processors and coprocessors can COOPERATE, leading to better specialization (smaller, cooler accelerators, increased efficiency)
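
For contrast, here is a purely hypothetical sketch of the same kind of offload under a Torrenza/DCA-style coherent coprocessor; none of these function names exist anywhere, they are invented only to illustrate that with a shared, coherent address space the staging copies and the duplicate memory pool simply disappear:

    #include <stdlib.h>

    /* Hypothetical interface to a coherent (DCA / Torrenza-style) coprocessor.
       These declarations are illustrative only; no such API has shipped.      */
    typedef struct coproc coproc_t;
    coproc_t *coproc_open(void);
    void      coproc_run(coproc_t *c, const char *kernel,
                         float *data, size_t n);  /* works on host memory in place */
    void      coproc_close(coproc_t *c);

    int main(void)
    {
        const size_t n = 1 << 20;
        float *data = malloc(n * sizeof(float));  /* ordinary, upgradable system RAM */

        coproc_t *c = coproc_open();

        /* The coprocessor shares the host's coherent address space, so it
           operates directly on `data`: no second allocation, no PCIe staging. */
        coproc_run(c, "transform", data, n);

        coproc_close(c);
        free(data);
        return 0;
    }
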
Then it is clear that if the ATI purchase has any merit, nVidia is in very serious trouble. Nevertheless, nVidia has recovered from a 52-week minimum of $17.17 per share, it has even more than doubled, and in general its market valuation has followed a path not antagonistic to AMD's; thus the market so far seems to be dismissing this peril.

[1]: AMD has in its power the possibility to "pull the plug" on both Intel and nVidia at the same time: Intel is too far behind in processor interconnects to contain an onslaught of Torrenza coprocessors for everything under the sun, including not just graphics but physics, Java, search, artificial intelligence, cryptography, XML, communications, chess, PowerPC-x86 or Sparc-x86 hybrids... and nVidia's products can't execute x86 instructions. To me, it is proof of AMD's bad management that rather than nuking its competitors by using DCA to its fullest, having these Vergeltungswaffen, it insists on playing the underdog, using DCA's advantages to a minimal degree, and that only in multiprocessors. This gives its competitors every chance, using technically uninteresting brute-force designs (both the Core architecture and the MCM double-duals AKA quadcores), to hack away at the market share gains AMD won with so much effort, to undercut its financial evolution and, worse, to actually push it into losses! AMD does this only to try to protect a model of computing based on general processors which, as the role inversion demonstrates vividly, is obsolete. I fear that by the time AMD is ready to exploit the DCA possibilities to any significant degree, Intel will already be there with an alternative, albeit an inferior one; just as it happened with AMD64/EM64T, where AMD64 never became an important differentiator.

But since AMD chose to pursue only the "removed in time" approach #2 [2], both Intel and nVidia know what is coming and have enough time to prevent being displaced from their dominant positions.

Will Intel counter the threat? I see they are not doing it: their strategic plans are about the war over the number of cores. Perhaps it couldn't be any other way: at 75% unit market share and with historical near-monopolistic market dominance, Intel represents the status quo, thus Intel's greatest priority is to protect existing businesses. That explains why they went so far into the "gigahertz obsession", and why they have insisted so much on the out-of-processor memory controller despite its terrible disadvantages when it comes to virtualization and processor interconnects. The coprocessor model of computing will only erode Intel's dominance. I recently read news that a company says it can design search coprocessors for Microsoft, each of which would do the job of five processor cores with power consumption savings. I am sure that Intel is not interested in having five Xeon cores substituted by one coprocessor; as far as they are concerned, they would rather fix it so that they could sell five Xeon cores to replace one coprocessor (an interesting digression here).

But nVidia's position is very different: it is absent from the market for general processors while it already makes the best graphics processors; if it could leverage its graphics leadership into general computation, it stands to make enormous gains.

The question is, can they do it?

A 3D accelerator is a very wide instruction processor with lots of internal parallelism, in a proprietary architecture with very little visibility of its specifics from the outside. In principle, it is almost the exact opposite of x86 computing. Nevertheless (and this is the best idea in this article), Transmeta demonstrated back in the 20th century that you don't need an x86 architecture to run x86 instructions efficiently; it is entirely feasible to emulate an x86 architecture with on-the-fly translation, in fact so feasible that it actually has ADVANTAGES like better POWER EFFICIENCY! This layer of translation turns the exact architecture of the physical processor into a black box. Interestingly enough, the easiest way to execute emulated x86 efficiently is precisely VERY WIDE INSTRUCTION MACHINES, which nVidia is the undisputed leader of!
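
To give an idea of what on-the-fly translation involves, here is a toy sketch in C of the translate-once, run-from-cache loop at the heart of a Transmeta-style layer. Everything in it is invented for illustration: the "guest" is a made-up four-opcode machine instead of x86, and the "translation" just picks a host routine instead of emitting wide native instructions, but the shape of the loop, where hot code is translated once and then reused from a cache, is the essential part:

    #include <stdio.h>
    #include <stdint.h>

    /* Toy guest ISA: 0 = increment, 1 = double, 2 = print, 3 = halt. */
    enum { OP_INC, OP_DBL, OP_PRINT, OP_HALT };

    static int64_t acc;                      /* toy guest machine state */

    static void do_inc(void)   { acc += 1; }
    static void do_dbl(void)   { acc *= 2; }
    static void do_print(void) { printf("acc = %lld\n", (long long)acc); }

    typedef void (*native_fn)(void);         /* a "translated" guest instruction */

    #define PROG_LEN 7
    static const uint8_t guest_program[PROG_LEN] = {
        OP_INC, OP_INC, OP_DBL, OP_DBL, OP_INC, OP_PRINT, OP_HALT
    };

    /* Translation cache indexed by guest program counter. */
    static native_fn cache[PROG_LEN];

    /* "Translate" the guest instruction at pc into host code.  A real code
       morphing layer decodes x86 bytes and emits optimized wide-instruction
       blocks; picking a function pointer stands in for that step here.      */
    static native_fn translate(uint32_t pc)
    {
        switch (guest_program[pc]) {
        case OP_INC:   return do_inc;
        case OP_DBL:   return do_dbl;
        case OP_PRINT: return do_print;
        default:       return NULL;          /* OP_HALT */
        }
    }

    int main(void)
    {
        for (int run = 0; run < 2; run++) {  /* second run executes from the cache */
            for (uint32_t pc = 0; pc < PROG_LEN; pc++) {
                if (guest_program[pc] == OP_HALT)
                    break;
                if (cache[pc] == NULL)       /* translate only on a cache miss */
                    cache[pc] = translate(pc);
                cache[pc]();                 /* execute the translated code    */
            }
        }
        return 0;
    }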

Jen-Hsun Huang knows this. As a matter of fact, nVidia has made inroads into the general computation market because it is needed for some high-end consumer electronics products it is developing, and therefore it is acquiring expertise in general-purpose computing.

A digression about virtualization: our fellow message boarder "jag24" has been insisting on the dramatic synergies due to consolidation of services through virtualization, because he seems to think that I dispute them. Anyway, he speaks of successful consolidation ratios of 15 to 1 or even more. If true, this is awful news for server makers, including, naturally, both AMD and Intel.

But such figures are entirely plausible. He says that in his datacenters the pre-consolidated services ran on computers with an average utilization of 20%. If we assume that the 20% average utilization was there to leave room for peak utilization of five times the average, then to consolidate 10 servers whose peak utilization patterns are independent of each other, as little as 2.8 times the hardware of one independent server may suffice, for economies of 10/2.8, roughly 3.6 to 1 (if the peaks occur independently and rarely, then above the combined normal utilization of 10 times 20% you only need to spare 80% of one individual server's capacity for whichever service is peaking, which gives 280%).
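
A back-of-the-envelope check of that figure, in C, under the stated assumptions (10 services at 20% average utilization each, peaks of five times the average, and at most one service peaking at any given moment):

    #include <stdio.h>

    int main(void)
    {
        const double services    = 10.0;   /* services to consolidate          */
        const double avg_util    = 0.20;   /* average load per service         */
        const double peak_factor = 5.0;    /* peak load = 5x the average       */

        /* Steady-state load of all services together, in units of one server. */
        double base = services * avg_util;                    /* 10 * 0.20 = 2.0 */

        /* With rare, independent peaks, budget headroom for a single service
           peaking at a time: its peak minus the average already counted.      */
        double headroom = avg_util * (peak_factor - 1.0);     /* 0.20 * 4 = 0.8  */

        double capacity = base + headroom;                    /* 2.8 servers     */
        printf("capacity: %.1f servers, ratio %.1f : 1\n",
               capacity, services / capacity);                /* 2.8, ~3.6 : 1   */
        return 0;
    }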

So, how come Intel (and AMD, for that matter) consented to develop the market for virtualization by offering Vanderpool while the whole thing erodes its businesses?

This is entirely speculative on my part, but yet, it may be true: What is the "charm" that allows Intel to have such very high profit margins? What is the "moat" that protects its profit engine "castle" from the competition? -- it is the x86 instruction set architecture. IBM, Motorola, DEC, and Sun each had the capacity to make processors vastly more powerful than Intel's, but not to run x86 as fast, and this is why they all yielded to the economies of scale behind x86.

By 1998 Intel had already very clearly won the CISC/RISC wars with the Pentium Pro: keep the CISC x86 instructions on the outside and convert them to RISC operations on the inside (by the way, using techniques pioneered by NexGen (not Nextgen), which was eventually acquired by AMD). But the x86 ISA had created closer, more dangerous x86 competitors like Cyrix and NexGen, AMD kept lingering on, and, oh, end of the world!, Transmeta promised on-the-fly translation to make x86 irrelevant. So Intel had to leverage its dominance into creating a completely new instruction set architecture to leave those competitors behind for good, and the push towards what was then called Merced, now called Itanium, happened. It failed miserably, while across the street AMD succeeded triumphantly by providing an upward migration path from x86, AMD64, and Intel had to go back to concentrating on x86. Once Transmeta was flushed down the toilet, Intel could once again devote its energies to protecting x86 from external attacks.

But then the push towards virtualization emerged. Since x86 does not satisfy the Popek and Goldberg virtualization requirements, x86 virtualization can only occur through either some sort of emulation of the processor or modification of the operating system and applications (an approach called "paravirtualization").
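
The textbook example of the problem is an instruction like SGDT: it is sensitive in Popek and Goldberg's sense, because it exposes privileged machine state (the location of the descriptor tables), yet on the x86 processors of the day it executes in user mode without trapping, so a classical trap-and-emulate hypervisor never gets a chance to intercept and fake it. A minimal sketch with GCC inline assembly, just to show the instruction running unprivileged:

    #include <stdio.h>
    #include <stdint.h>

    /* SGDT writes the base and limit of the Global Descriptor Table into
       memory.  It reveals privileged state but does not require privilege,
       which is exactly what breaks the Popek and Goldberg conditions on x86.
       Build with gcc on an x86 or x86-64 machine.                            */
    struct __attribute__((packed)) descriptor_table {
        uint16_t  limit;
        uintptr_t base;
    };

    int main(void)
    {
        struct descriptor_table gdtr;

        __asm__ volatile ("sgdt %0" : "=m" (gdtr));  /* no privilege fault */

        printf("GDT base = %#lx, limit = %u\n",
               (unsigned long)gdtr.base, (unsigned)gdtr.limit);
        return 0;
    }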

Intel (as well as Microsoft) is strategically committed to dissuading the market from successful x86 emulation, so it had to offer a way to complete the Popek and Goldberg requirements, or virtualization would have become the single greatest incentive to emulate x86.

One last thing which needs to be explained is why Transmeta had the right technical idea and nevertheless went belly up: it was a problem of not being able to exploit the idea themselves (Transmeta never had any experience in very high performance processors, so Crusoe was never a very good processor itself), while being too jealous of the idea, which they named "Code Morphing", to let others use it; they ended up with a mediocre processor showcasing Code Morphing, which quickly turned into being empty-handed. This should be a reminder for AMD's DCA, or cache-coherent HyperTransport: AMD alone cannot make full use of it, so it makes a lot of sense to license it rather liberally so that the industry can fully embrace it, for the indirect benefit of AMD.

This emphasizes the case: AMD should be the first to break new ground with Torrenza coprocessors; any general computing business lost will eventually be more than compensated by the indirect businesses around the DCA technology that makes it all possible!

Going back to nVidia: since AMD has declined to go the coprocessor way and is pursuing only #2, Fusion, which is about the same thing as the coprocessor approach, only less general and more distant, nVidia (and Intel) have literally years to prepare a strategy; and as explained, nVidia has at least one concrete path to follow: x86 emulation, which would extend its leadership indefinitely.

Acquiring the rights to do x86 should be cheap: there are a number of companies that have them, including VIA.

If this happens, AMD's cost structure doesn't stand a chance. EGGNOG324 gave us a magnificent account of how 3dfx's missed opportunities and mistakes allowed nVidia to slaughter it, as an analogy for what may be happening today with AMD, so I don't feel the need to explain exactly how, in the end, it may be nVidia and not Intel that turns out to be AMD's executioner.

2 comments:

Anonymous said...

there are plenty of case studies and success stories on these two sites to show people getting 15:1 or better consolidation ratios using vmware.

http://www.virtualizeasap.com/about/resources/
http://www.vmware.com/customers/stories/

here are the comments from the investorvillage amd forum you refer to:

“i am not a datacenter either, but i happen to be responsible for a couple at work. we have found that more cores (getting them from more processors and/or more cores) along with enough memory increases the number of virtual machines we can host from one physical server. what we are hosting is a combination of in-house code, web servers, third party applications and databases. of course high i/o databases don't virtualize well but most other stuff does. across the two data centers we had a couple of years ago we had over 800 1ghz-2.8ghz single core dp machines. most of these machines were running on average ~20% utilized. had we simply moved them to new servers as part of our hw / os (2000->2003) upgrade we would have then had 800 windows servers running at less than 10% utilized.

We did a small pilot with two teams, one tried to consolidate physical servers on fewer servers and one tried to virtualize. After a few months we gave up on consolidation and decided to virtualize. this was before dual core processors showed up for x86. virtualization is much more flexible and the tools that come with vmware made the project much simpler.

we completed virtualizing half our systems before the first dual core servers landed; when they did we found we could get more virtual machines on them. on average our DP single core boxes have 7 virtual machines on them and our DP dual core boxes have 12. In the end we went to ~140 virtual servers and ~30 non-virtual servers. we also added some new servers during this time, so it would have been fewer total boxes if we were just counting the original servers.”

Eddie said...

The connection between the strength nVidia has in very data-wide computing engines and the possibility of emulating x86 is, AFAIK, an original idea of mine. I would very much appreciate any reference to analysis along these lines.