Intel Woodcrest, AMD's Opteron and Sun's UltraSparc T1: Server CPU Shoot-out
by Johan De Gelas on June 7, 2006 12:00 PM EST - Posted in IT Computing
The Official SPEC Numbers
SPEC FP and Int 2000 are the standard benchmarks for evaluating CPU performance. However, the benchmark numbers are highly dependent on the compiler. SPECfp and SPECint show best-case performance, as the CPU runs aggressively compiled and highly optimized code. In the real world, code is compiled in a more conservative, less optimized way.
In practice this means that Intel's SPEC numbers - thanks to its highly capable compiler team - are (slightly) higher than what you would see in real applications. Nevertheless, SPEC CPU 2000 is a good starting point to understand what a CPU is capable of. As mentioned earlier, the Xeon 5100 is the Xeon Woodcrest, based on the new Core architecture.
SPECfp
CPU | Clockspeed (MHz) | SPEC fp 2000
POWER5+ | 2200 | 3271
Itanium 2 | 1666 | 2851
Xeon 5160 | 3000 | 2783
Opteron | 2800 | 2256
Pentium 4 E | 3733 | 2232
The new Woodcrest is about 20-25% faster than the fastest dual-core Opteron in SPECfp. Its 7% clockspeed advantage is most likely a result of Woodcrest being built on a newer 65nm process. If AMD manages to keep up with Intel when it comes to clockspeed, the advantage of Intel's newest CPU might shrink to 15% or less. However, Woodcrest will have a much bigger advantage in all applications that make heavy use of 64-bit and 128-bit SSE.
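The "15% or less" estimate can be checked with a little arithmetic on the SPECfp table. The sketch below is only illustrative: it assumes the score scales linearly with clock speed, which is our simplification and not something the benchmark guarantees, and the variable and function names are made up for this example.

```python
# Figures copied from the SPECfp table: clock speed in MHz and SPEC fp 2000 score.
xeon_5160 = {"mhz": 3000, "specfp": 2783}
opteron = {"mhz": 2800, "specfp": 2256}

# Raw lead of the Xeon 5160 over the Opteron, expressed as ratios.
score_ratio = xeon_5160["specfp"] / opteron["specfp"]
clock_ratio = xeon_5160["mhz"] / opteron["mhz"]

# Assumption (ours, not measured): the score scales linearly with clock speed,
# so dividing out the clock ratio estimates the lead at equal clocks.
equal_clock_ratio = score_ratio / clock_ratio


def as_percent(ratio):
    """Express a ratio such as 1.15 as a percentage lead such as 15.0."""
    return (ratio - 1.0) * 100.0


print(f"score lead:          {as_percent(score_ratio):.1f}%")
print(f"clock lead:          {as_percent(clock_ratio):.1f}%")
print(f"lead at equal clock: {as_percent(equal_clock_ratio):.1f}%")
```

Under that linear-scaling assumption, the lead at equal clocks comes out at roughly 15%, which is in line with the estimate above. Real scaling is usually less than linear, so this is an upper bound on what a clockspeed bump alone would recover for AMD.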
SPECint
CPU | Clockspeed (MHz) | SPEC Int 2000
Xeon 5160 | 3000 | 3057
Pentium 4 E | 3733 | 1870
Opteron | 2800 | 1837
Pentium 4 Xeon | 3733 | 1813
POWER5+ | 2200 | 1705
Itanium 2 | 1666 | 1502
When it comes to integer performance, the Woodcrest numbers are simply stunning: at 3057, the Xeon 5160 scores about 63% higher than the next-best result, the Pentium 4 E at 1870. Let us find out if this vastly superior integer performance in SPEC Int 2000 pays off in server applications.
Latencies...
LMBench is a set of micro-benchmarks that can be helpful for determining memory latency and instruction latencies. We tested with LMBench 3.0a-5. It must be said that LMBench is usually right, but not always. If the benchmark is not aware of some of the particularities of a certain architecture, it can report wrong values. So we have to double-check whether the measured values make sense.
LMBench
CPU | Clockspeed (MHz) | L1 (ns) | L1 (cycles) | L2 (ns) | L2 (cycles) | RAM (ns) | RAM (cycles)
Xeon 5160 3 GHz | 3000 | 1.01 | 3 | 4.7 | 14 | 117.3 | 345
Pentium-M 1.6 GHz | 1593 | 2 | 3 | 6 | 10 | 92.1 | 147
Sun T1 1 GHz | 980 | 3 | 3 | 22.1 | 22 | 107.5 | 105
Opteron 275 | 2209 | 1 | 3 | 5.5 | 12 | 73 | 161
Xeon Irwindale 3.6 GHz | 3594 | 1 | 4 | 8 | 28 | 48.8 | 175
The massive 4 MB L2 cache has an amazingly low latency of 14 cycles. This seems to be the worst case, as we have measured 12 cycles with other benchmarking tools such as ScienceMark. Nevertheless, even 14 cycles at 3 GHz (about 4.7 ns) is pretty amazing. The Core Duo, a.k.a. Yonah, accesses a shared cache that's half as large in 14 cycles at a substantially lower 2.33 GHz.
On the other hand, the memory latency is very high; luckily the 4 MB L2 cache will minimize that effect. The problem seems to be the FB-DIMMs. The Advanced Memory Buffer introduces extra latency, and of course the registered DDR2-533 chips with a CAS latency of 4 have a higher latency by themselves. This results in a memory subsystem with a pretty high latency of 115 ns, while the Opteron can access its RAM in only 73 ns.
ScienceMark didn't agree completely and reported about 65-70 ns latency on the Opteron system and 70-76 ns (230 cycles) on the Woodcrest system. We have reason to believe that Woodcrest's latency is closer to what LMBench reports: the excellent prefetchers are hiding the true latency numbers from ScienceMark. It must also be said that the Opteron measurements are only for local memory, not remote memory.
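The cycle counts and nanosecond figures quoted in this section are tied together by the clock speed, so one can be sanity-checked against the other. The following is a minimal sketch of that conversion; the helper names are our own and the inputs are the figures quoted above.

```python
def cycles_to_ns(cycles, mhz):
    """Convert a latency in clock cycles to nanoseconds at the given clock (MHz)."""
    return cycles * 1000.0 / mhz


def ns_to_cycles(ns, mhz):
    """Convert a latency in nanoseconds to clock cycles at the given clock (MHz)."""
    return ns * mhz / 1000.0


# L2 latency: 14 cycles on Woodcrest (3000 MHz) versus 14 cycles on Yonah (2330 MHz).
print(f"Woodcrest L2: {cycles_to_ns(14, 3000):.2f} ns")
print(f"Yonah L2:     {cycles_to_ns(14, 2330):.2f} ns")

# RAM latency: 115 ns on Woodcrest versus 73 ns on the Opteron 275 (measured at 2209 MHz).
print(f"Woodcrest RAM: {ns_to_cycles(115, 3000):.0f} cycles")
print(f"Opteron RAM:   {ns_to_cycles(73, 2209):.0f} cycles")

# ScienceMark's 230 cycles on Woodcrest, expressed in nanoseconds.
print(f"ScienceMark Woodcrest RAM: {cycles_to_ns(230, 3000):.1f} ns")
```

The same cycle count costs more wall-clock time on a slower clock, which is why Yonah's 14-cycle L2 access takes longer in nanoseconds than Woodcrest's.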
91 Comments
Questar - Thursday, June 8, 2006 - link
Why? Because AMD got creamed?
ashyanbhog - Thursday, June 8, 2006 - link
and Intel woodcrest may have fantastic performance when compared to earlier xeons, but Intel is 3 years late to the party, Opteron was here in 2003!
also remember, woodcrest is a brand new design from PIII base, manufactured on 65nm process. It is still to make its debut in the market and be available in volumes. And it's indeed nice to see it being compared to a 3 year old design manufactured on 90nm process.
AMD still has two product launches to come this year. Move to DDR2 for opterons which should cut some power usage for the total system AND introduction of products manufactured on 65nm at the fag end of the year. Will woodcrest and conroe still retain their performance margins then? if not, for how many months or weeks has Intel grabbed this "performance crown"?
zsdersw - Thursday, June 8, 2006 - link
Consider the following:
- If comparisons could be made between new products from both companies (i.e., Woodcrest versus K8L), they would be made. In the game of leapfrog that we have between AMD and Intel, the comparisons will always be between existing tech and new tech. Will you be pointing out how AMD is "late to the party" when they release their new stuff?
- Making its debut and availability in volume is an issue for both AMD and Intel. It's not a valid point unless you make it across the board.
- 65nm will allow clock speeds of Opterons/A64's to increase.. but Conroe/Woodcrest speeds will be increasing as well.
ashyanbhog - Thursday, June 8, 2006 - link
not because AMD got creamed!
a $35 billion turnover company (Intel) is bound to make a comeback one day.
it's Anandtech's review setup, it's full of holes
the mysql benchmark on dual dual-core opterons, where they see a 30% drop against single core dual processor numbers, contradicts their own earlier benchmark where they see a 10% performance increase.
http://www.anandtech.com/IT/showdoc.aspx?i=2447&am...
they also use a substandard MSI motherboard in one of the Opteron systems and fail to mention which system was used for the benchmarks
mistakes like this, genuine or intentional, are rife throughout the review report
the whole thing looks like the rig was setup to push the performance diff b/w woodcrest and Opterons to the max,
why would anybody need two months to tweak settings before they publish the review!
Questar - Thursday, June 8, 2006 - link
Why? Because AMD got creamed?
duploxxx - Thursday, June 8, 2006 - link
yeah right its a workstation motherboard. it uses an nforce controller so maybe they rate it as server board; it still is a budget board used for workstations, not a real server board or server chipset like they used on the intel woodcrest.
check the servers like sun galaxy and hp dl385 they have amd chipsets... big difference.
the nforce has a shared memory bus...
zsdersw - Thursday, June 8, 2006 - link
Yeah, that's one of the 3 Opteron servers. At any rate, the MSI board is a basic server board.. it's still a server board.
duploxxx - Thursday, June 8, 2006 - link
yeah they have done 1 real bench with an hp. all other benches were done with the 2 MSI basic boards...still waiting for the wintel benches
wolaris - Thursday, June 8, 2006 - link
In corporate environments, no-one with any hardware budget at all runs webserver and database on the same machine, as it hurts both performance and reliability. This affects T1 most, as its low clock speed and simple cores are not meant for database workloads.
I think that you should run web serving tests using common, high-performance Opteron DB server and separate webservers, as it would be the case in real-world scenarios.
MrKaz - Thursday, June 8, 2006 - link
So the power consumption of the new Intel processor on 65nm at an already high clock speed of 3.0GHz is already higher than that of the older AMD Opteron on 90nm at 2.8GHz with DDR.
When AMD releases socket F it will go DDR2 (less power) and better 90nm samples (lower power). So then the "new" Intel is already getting beaten...
And were those tests done with Cool'n'Quiet?
Also don’t forget these tests were done with Woodcrest 3.0GHz vs Opteron 2.2GHz and 2.4GHz, so when AMD releases the 2.8GHz and 3.0GHz with socket F the performance lead of Intel will vanish…
I think the biggest surprise here is how bad Xeon (P4) was (IS!!), and people keep buying it.