Thursday, February 16, 2006

65nm is just Intel Marketing

There has been a lot of nonsense talk about the supposed advantages of Intel processors at 65nm, and I am fed up with watching my friends at AMD repeat the mistake. Read on if you want to get out of the darkness, and bear with me through the technicalities:

First fallacy: it is cheaper to manufacture the same number of transistors at 65nm than at 90nm.

It is clear that a 65nm wafer has the potential to hold about 92% more transistors than a wafer of the same size with 90nm features. But wafers at 65nm are much, much harder to make. In particular, to obtain the same yields (the percentage of perfect circuits), you have to spend much more. A quick and easy way to do 65nm, or 45nm, or whatever, is to just do it and let the yields crash through the floor. If AMD wanted to rush to 65nm, it could do so by sacrificing yields. That is why I am not impressed by Intel's 65nm technology before knowing the corresponding yields.

I see something very interesting: Intel's dual cores were dual dies before Yonah/Sossaman, that is, two different circuits glued together. That tells me Intel's yields should be significantly worse than AMD's 95% at 90nm. A bit of technicality: the larger a circuit is, the greater the probability of a manufacturing defect. Specifically, for a single-die dual core to be perfect, both cores must be perfect, so the probability of a good die is the single-core yield squared. I don't know the exact numbers, but Intel's yields at 65nm shouldn't be more than 60%. That means single-die dual cores would yield about 36%, compared to 60% for a single core. There goes the 65nm size advantage, and the process is still much more expensive. And because Intel has huge architectural deficiencies, such as the Front Side Bus, it has to avoid going out to memory and is forced to compensate with large caches, which once again make the yields drop quadratically.

[Full analysis: say a cache has a yield of 3/4 and unit cost. A cache of double the size costs twice as much in silicon at the same yield per unit of area, but its yield becomes (3/4)^2 = 9/16 ~= 1/2, a cost increase of (3/4)/(9/16) = 4/3 due to yield; so the total cost of a cache twice the size is 2 x 4/3 ~= 2.66. With 50% cache yields, (1/2)^2 = 1/4 and (1/2)/(1/4) = 2, for a total cost of 4, four times a unit-sized cache.]
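As a back-of-the-envelope check, here is the same arithmetic as a tiny Python sketch; the 3/4 and 1/2 yields are just the illustrative figures used above, not real fab data:

```python
# Back-of-the-envelope cache cost model from the paragraph above.
# Doubling a cache doubles the silicon area and squares the yield,
# so the effective cost per *good* cache grows faster than 2x.

def relative_cost_of_doubled_cache(base_yield):
    """Cost of a double-size cache relative to a unit-size cache,
    counting only the good parts coming out of the fab."""
    unit_cost = 1.0 / base_yield             # effective cost of one good unit cache
    doubled_cost = 2.0 / (base_yield ** 2)   # twice the area, yield squared
    return doubled_cost / unit_cost

print(relative_cost_of_doubled_cache(0.75))  # ~2.67, the "2.66" above
print(relative_cost_of_doubled_cache(0.50))  # 4.0, four times the unit cache
```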

Things get really interesting when you consider QUAD cores: in AMD's case, a 95% yield becomes a bearable 81%; in Intel's, about 13%. The fact that Intel demonstrated a "quad core" that was in reality a double die of single-die dual cores proves that a) Intel wants to fool the industry with public-relations marketing shows, b) very probably they don't have the capability to do real quad cores, so the yields must be very small, and c) commercial products must be very far off. Really, what did Intel do? They admitted defeat!
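The same compounding can be tabulated in a few lines; again, the 95% and 60% per-core yields are the assumed figures from above, not measured numbers:

```python
# If a die only works when every one of its n cores is defect-free,
# and defects hit cores independently, the die yield is (core yield)^n.

def die_yield(core_yield, cores):
    return core_yield ** cores

for name, y in [("AMD, assumed 95%", 0.95), ("Intel, guessed 60%", 0.60)]:
    print(name,
          "single:", round(die_yield(y, 1), 2),
          "dual:",   round(die_yield(y, 2), 2),
          "quad:",   round(die_yield(y, 4), 2))
# AMD:   single 0.95, dual 0.90, quad 0.81
# Intel: single 0.60, dual 0.36, quad 0.13
```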

Thermal dissipation: it used to be that smaller transistors consumed less power, but not anymore. Before, the supporting infrastructure of an integrated circuit consumed negligible power compared with the transistors themselves. But nowadays, because there are so many millions of gates in a microprocessor and the internal buses are at least 64 bits wide, the consumption of the tiny wires has become significant. There is no way out of this: the thinner the wires, the higher the resistance, the greater the power lost. Thus the brute-force approach, miniaturization, is becoming less and less attractive, because there is less to be gained. Comparatively speaking, and most importantly, Intel does Strained Silicon while AMD does Silicon on Insulator. SOI gives additional protection against another plague of the sub-100nm world: leakage currents. The gate, source, and drain of a transistor are so close together that the transistor has trouble really cutting off current. This is a self-reinforcing problem, because the only way to compensate for the background electrical noise is to raise the voltage, which raises the leakage, a vicious cycle.
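To see why the wires matter, here is a minimal sketch of the classic R = ρ·L/(w·h) relation; the dimensions below are invented and only the scaling is the point:

```python
# R = rho * L / (w * h): shrink a wire's width and height and its
# resistance per unit length goes up, and so do the I^2 * R losses.
# Dimensions below are invented; only the ratio matters.

def wire_resistance(rho, length, width, height):
    return rho * length / (width * height)

RHO_COPPER = 1.7e-8   # ohm * metre, approximate bulk value
LENGTH = 1e-3         # a 1 mm run of interconnect

r_90nm = wire_resistance(RHO_COPPER, LENGTH, 90e-9, 90e-9)
r_65nm = wire_resistance(RHO_COPPER, LENGTH, 65e-9, 65e-9)
print(r_65nm / r_90nm)   # ~1.9x the resistance for the same length of wire
```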

Switching speed: oh yes, at 65nm transistors switch much faster. But the additional speed has a price. That's why every other processor manufacturer chose to improve the architecture rather than insist on miniaturization, while Intel did roughly the opposite: it simplified the architecture to streamline it and make it more suitable for high clock speeds. The result? Eunuchs such as the Pentium 4, which doesn't even have a barrel shifter to help with multiplication, or super-deep pipelines that are simply inefficient. For Intel this was all a marketing game: chase the higher clock speed and the smaller feature size, because performance is hard to evaluate, while a megahertz or nanometer number is catchy.

Intel has a manufacturing approach of cloning factories down to the cafeteria that is simply primitive, as Sharikou himself detailed in one of his posts. AMD has APM, a robotic technology for chip production that is leaps and bounds ahead of Intel's. That's why when I hear Hector Ruiz saying that, basically, they will do 65nm as soon as AMD feels like it, I believe it will be a success.

The bottom line is: why would AMD want to disrupt sold-out production of 90nm products to fiddle with immature yields at 65nm? Let the competition rush and crash.

Thursday, February 09, 2006

Sun Microsystems' multiple personality complex

[Update April 21: I no longer subscribe to the point of view expressed here; for historical value I leave the rest as it was when published. I would now say that Sun is managing to walk the sharp edges of Sparc/Opteron and Java adequately, and has thus become a company with long-term potential.]

Sun Microsystems is a technology leader and its products rock and fly, but it is not advisable at all to invest in it, because it is a company with a multiple personality complex.

Sun tries to sustain its business of selling servers at a premium because of their unique architecture, Sparc; but at the same time it owns Java, the technology that makes the hardware an application runs on irrelevant. This conflict hurts them a lot, especially the Sparc line of business, which is the bread-and-butter one. Now they are building the best AMD Opteron servers, which can be explained by the special AMD processors Andy Bechtolsheim mentioned, so in effect they are becoming yet another x86 server company. What about Sparc? Isn't it true that if they were capable of making low-cost (commodity, zero-premium) Sparcs they wouldn't need Opterons for entry-level servers? Or, conversely, what makes them think Opterons will not be able to scale to the heights of the best Sparcs? The same duality/dilemma applies to Solaris/Linux/Windows.

Java is a technology that makes much more sense for IBM; that's why they have embraced it so strongly as to become its second leaders. IBM has computing hardware ranging from big mainframes to smaller-than-PC devices, so Java, just as Linux does, actually helps them focus. Sun's reaction to IBM's leadership challenge in Java has been to extend Java with ever more absurd things, turning what was once small and beautiful into indigestible bloatware. That's why Java has lost relevance.

I am not the CEO of Sun, just a potential investor, so all I can see are too many contradictions in that company and no solutions.

I agree with Sharikou: Journal of Pervasive 64bit Computing: SUN drives me crazy. In the meantime, the Ultra 20 makes me salivate.

Tuesday, February 07, 2006

Pacífica vs. Vanderpool

Virtualization is becoming a very important issue in the server market. One of the reasons is that to ensure the availability of services, every computer has to be configured exactly as the service application recommends. If you have an Oracle database holding critical information, you can't be fooling around installing every other nice feature on that server; everything in it has to conform to Oracle's specification. Beyond that, it is actually desirable to run every critical service on a computer with a minimum of extra features, so that in case of a problem, locating and correcting it is greatly simplified. But at the same time, it is impractical to install three physical machines in a data center just so that web serving, databases, and file serving each get their own box. Virtualization is what enables the same physical infrastructure to be shared safely among different services, each of which thinks it has the whole computer to itself.

Try to understand this by extending the concept of multitasking to whole computers: in multitasking, no task is aware of the existence of the other tasks; they just do their job, and the operating system is the one switching tasks back and forth onto the machine's processors for execution. With multitasking the computer seems to do many things at once. Equally, in virtualization, every operating system/service is unaware of the existence of the others; they just do their job, but from the outside the same machine behaves as if it were many different computers.

Virtualization can be done 100% in software, provided there is a little assistance from the operating systems that will run on that computer. Examples are VMware's line of virtualization software and the excellent open-source Xen. But of course there are limitations, and help from the hardware is so crucial that it can determine whether a particular server machine can successfully be partitioned into as many virtual servers as needed or not.

Intel offers its Virtualization Technology, VT, codenamed "Vanderpool", and AMD offers "Pacífica". Since this is an issue of relevance for the medium- and long-term x86 server markets, it is worth comparing the two.
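As a practical aside, on a Linux box you can check which of the two, if any, the processor advertises: Intel VT shows up as the vmx CPU flag in /proc/cpuinfo and AMD's hardware virtualization as svm. A minimal sketch:

```python
# Quick check of what a Linux host advertises: Intel VT appears as the
# "vmx" flag in /proc/cpuinfo, AMD's hardware virtualization as "svm".

def hw_virtualization(cpuinfo_path="/proc/cpuinfo"):
    try:
        with open(cpuinfo_path) as f:
            text = f.read()
    except OSError:
        return "unknown (no /proc/cpuinfo on this system)"
    flags = set()
    for line in text.splitlines():
        if line.startswith("flags"):
            flags.update(line.split(":", 1)[1].split())
    if "vmx" in flags:
        return "Intel VT (Vanderpool)"
    if "svm" in flags:
        return "AMD hardware virtualization (Pacifica)"
    return "no hardware virtualization flag found"

print(hw_virtualization())
```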

There is an excellent set of articles on this subject published at TheInquirer.net, in three parts: 1, 2, and 3.

In part 1, we find the following:
"VT['s memory management] is a software solution [...]. As with most software virtualisation techniques, it is quite costly compared to doing the same thing in hardware. [...] VT manages memory in software."
Pacífica, by contrast, is a hardware solution that allows interesting advanced features with no equivalent in Vanderpool, such as the Shadow Page Tables and Nested Page Tables modes.

The second part explains Pacifica's Shadow Page Tables (SPT): all accesses to the processor's page directory base register, CR3 (which points to the table where the addresses tasks use are translated into physical addresses), are shadowed by Pacifica, which invokes the Virtual Machine Manager (VMM) to put the right value behind the CR3 that each virtual machine sees.
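A toy sketch of that shadowing idea, in Python-as-pseudocode; this is only the control flow described above, with invented names, not how a real VMM is implemented:

```python
# Toy model of the shadowing described above: the guest believes it
# controls CR3, but each write is intercepted and the VMM installs a
# shadow table of its own in the real register.

class ToyVMM:
    def __init__(self):
        self.real_cr3 = None     # what the hardware would actually walk
        self.guest_cr3 = {}      # per-VM value each guest believes it wrote

    def build_shadow_table(self, vm_id, guest_table):
        # A real VMM merges the guest's table with its own
        # guest-physical -> host-physical mapping; here it is just a tag.
        return ("shadow-of", vm_id, id(guest_table))

    def on_guest_cr3_write(self, vm_id, guest_table):
        # Intercepted "mov cr3, ..." coming from the guest kernel.
        self.guest_cr3[vm_id] = guest_table
        self.real_cr3 = self.build_shadow_table(vm_id, guest_table)

    def on_guest_cr3_read(self, vm_id):
        # The guest reads back the value it wrote, never the shadow.
        return self.guest_cr3[vm_id]
```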

The third part explains that Nested Page Tables (NPT) add another level of indirection, in hardware, to account for the virtual machine layer. So there are three levels: applications, which manage their own address space; the operating system, which manages the memory of many applications; and the VMM. Since this is done in hardware, the performance hit can be made negligible, for instance through caching.
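The nested scheme is easiest to see as two table walks chained together; here is a toy sketch with made-up page numbers, assuming 4 KB pages:

```python
# Toy nested translation: guest-virtual -> guest-physical (the guest's
# table), then guest-physical -> host-physical (the VMM's nested table).
# With NPT the hardware chains both walks; here two dicts stand in.

PAGE = 4096  # assuming 4 KB pages

guest_page_table  = {0: 7, 1: 3}      # guest virtual page -> guest physical page (made up)
nested_page_table = {7: 42, 3: 19}    # guest physical page -> host physical page (made up)

def translate(guest_virtual_addr):
    vpn, offset = divmod(guest_virtual_addr, PAGE)
    gpn = guest_page_table[vpn]        # level one: the guest's own walk
    hpn = nested_page_table[gpn]       # level two: the VMM's nested walk
    return hpn * PAGE + offset

print(hex(translate(0x1234)))  # virtual page 1 -> guest page 3 -> host page 19
```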

It also mentions the Device Exclusion Vector, a table that allows or forbids devices from writing data into memory via DMA when the wrong virtual machine is running. This is another payoff of HyperTransport/Direct Connect Architecture.
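Conceptually it is just a lookup consulted before a DMA write is allowed to land; a toy sketch with invented device names and page numbers:

```python
# Toy version of the idea: before a device's DMA write lands, check
# whether that device may touch the page it is aiming at. Device names
# and page numbers are invented; the default is to deny.

allowed_pages = {
    "nic0":  {0x100, 0x101},   # pages belonging to the VM that owns nic0
    "disk0": {0x200},
}

def dma_allowed(device, target_page):
    return target_page in allowed_pages.get(device, set())

print(dma_allowed("nic0", 0x100))   # True: the write goes through
print(dma_allowed("nic0", 0x200))   # False: excluded, the write is blocked
```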

The article summarizes:
"Looking back over the Pacifica spec, it is clear that it is indeed a bigger body of water than a Vanderpool. The basic architecture of the K8 gives AMD more toys to play with, the memory controller and directly connected devices. AMD can virtualise both of these items directly while Intel has to do so indirectly if it can do so at all."