Intel Woodcrest: the Birth of a New King
by Jason Clark & Ross Whitehead on July 13, 2006 12:05 AM EST - Posted in IT Computing
The New Benchmark Suite
We've made some changes to our benchmarks to accommodate the multiple load scenarios we used in this article. The first benchmark we overhauled was the Dell DVD Store test (http://linux.dell.com/dvdstore/). In the last article (the first time we used Dell DVD Store), we used the stock Dell SQL driver along with a medium-sized database (approximately 3GB). This time around we wanted a larger database to model a more enterprise-class e-commerce scenario. To get one, we took the medium database and raised the customer count from 2 million to 20 million and the product count from one hundred thousand to 1 million. This resulted in a 14GB database.
We modified the driver code as well. We started with the included C# driver source code and changed the way it creates the threads (users). In stock form, the driver creates all the threads and users in one shot and then starts executing orders. Since we wanted to add threads dynamically to reach specific load levels, we added a method to the class to add users. At the same time we added a few properties so that we could host the class in a Windows Forms application and report back various performance counters. This allows us to graph CPU usage and orders per minute over the duration of the test, and to save the graphs for historical reporting. The Forum benchmark also got an overhaul: it now uses the same GUI driver, and we made a few changes to the way the queries are executed against the database.
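The idea of adding users while the test is already running can be illustrated with a minimal sketch. This is not the modified C# driver; it is a Python stand-in written for illustration, and the names `LoadDriver`, `add_users`, `user_count` and `orders_per_minute` are hypothetical.

```python
import threading
import time


class LoadDriver:
    """Hypothetical stand-in for a driver that can grow its user count."""

    def __init__(self, place_order):
        self._place_order = place_order   # callable that executes one order
        self._stop = threading.Event()
        self._lock = threading.Lock()
        self._threads = []
        self._orders = 0
        self._started = time.monotonic()

    def add_users(self, count):
        """Start `count` additional user threads during the test."""
        for _ in range(count):
            t = threading.Thread(target=self._user_loop, daemon=True)
            t.start()
            self._threads.append(t)

    def _user_loop(self):
        # Each simulated user places orders until the test is stopped.
        while not self._stop.is_set():
            self._place_order()
            with self._lock:
                self._orders += 1

    @property
    def user_count(self):
        return len(self._threads)

    @property
    def orders_per_minute(self):
        """Average order rate since the driver was created."""
        elapsed = time.monotonic() - self._started
        with self._lock:
            return self._orders * 60.0 / elapsed if elapsed > 0 else 0.0

    def stop(self):
        self._stop.set()
        for t in self._threads:
            t.join()


if __name__ == "__main__":
    driver = LoadDriver(lambda: time.sleep(0.01))  # fake order taking 10 ms
    driver.add_users(2)   # first load point
    time.sleep(0.2)
    driver.add_users(2)   # step up to the next load point
    time.sleep(0.2)
    driver.stop()
    print(driver.user_count, round(driver.orders_per_minute))
```

A GUI or logging front end would poll the two properties on a timer to draw the load and throughput graphs.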
Both benchmark applications record their results to a database server, where we average the results over N runs for our graphs. The GUI also accepts command line parameters, which lets us set up batch files to run an entire platform. On average it takes almost 20 hours to run a platform, because we run 5 iterations of each load point. It is important to look at the deviations between benchmark runs to ensure scores are consistent and representative of typical performance. The deviations are all relatively low, which is very good; the average deviation is 1.6%.
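As a rough illustration of how a per-load-point average and deviation might be computed, here is a minimal Python sketch. The article does not define "deviation"; this sketch assumes it means the mean absolute deviation of the runs, expressed as a percentage of their average. The function name `summarize` and the sample scores are made up for the example.

```python
from statistics import mean


def summarize(runs):
    """Return (average score, mean absolute deviation as % of the average)."""
    avg = mean(runs)
    deviations = [abs(r - avg) for r in runs]
    return avg, mean(deviations) / avg * 100.0


# Five hypothetical orders-per-minute scores for one load point.
avg, dev_pct = summarize([1000, 1020, 980, 1010, 990])
print(f"average={avg:.0f} deviation={dev_pct:.1f}%")
```

A standard deviation would work equally well here; the point is only that each load point gets one averaged score plus a measure of run-to-run spread.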
Dell & Forum SQL Trace Analysis
The Dell and Forum benchmarks are quite different workloads, which you will see in the benchmark results. Dell executes approximately 10 times more queries during the test, and its query durations are roughly one quarter of those in the Forum benchmark. To summarize, Dell is a workload with a high transaction volume in which each query executes in a very short amount of time. The Forum workload has a medium transaction volume; its queries execute in a reasonable amount of time but are much more read intensive (larger datasets are returned).
Test Configuration
Below are the configurations of the test machines. We should note that the Opteron system memory was set to 1T and NUMA was enabled.
Client
Dual AMD Opteron 256
4GB Memory
Gigabit Ethernet
Windows 2003 x64 Server
Woodcrest/Dempsey System
Intel OEM System (Pre-Production)
8GB 533MHz FB-DIMM
Windows 2003 x64 Enterprise Server SP1
SQL 2005 Enterprise SP1 x64
14 x Ultra 320 SCSI Drives in RAID 0
LSI Logic 320-2 Controller
Opteron 280/285 System
Tyan S2891 Motherboard
8GB PC3200 DDR 400MHz
Windows 2003 x64 Enterprise Server SP1
SQL 2005 Enterprise SP1 x64
14 x Ultra 320 SCSI Drives in RAID 0
LSI Logic 320-2 Controller
59 Comments
Kiijibari - Thursday, July 13, 2006
Due to the integrated memory controller, the scaling of Opterons is "nearly" linear: 10% more frequency gives you around 8% better benchmark results. That is true for SMP setups too, because you also add more memory channels with each CPU. Of course you have to set up NUMA correctly then (SRAT enabled, node interleave disabled). By using SRAT it may also be possible to raise the performance of a 2-way system. I am not sure whether that was done for the article; it just stated that NUMA was "enabled", not which kind ... :(
Anyways, I doubt that there will be a 3 GHz S940 Opteron. It will be S1207, i.e. it will feature DDR2 memory. Hence the performance scaling will be even better than "linear" (if you are willing to compare S940 vs. S1207) ;-)
cheers
Kiijibari
Accord99 - Thursday, July 13, 2006
Maybe if your benchmark is heavily CPU bound, but not every test is, especially ones dealing with multi-gigabyte databases where the storage subsystem becomes more important.
Too bad there weren't more Opteron scores, but a simple linear extrapolation from the two Opteron results would indicate that it would take a 3.4GHz Opteron to match the Woodcrest at saturation in the Dell DVD Store benchmark, while a 3GHz Opteron would match the Woodcrest in the Forum benchmark at saturation. At the lower load points, it would probably take 4+GHz.
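The kind of linear extrapolation described in this comment can be sketched in a few lines of Python. The clock speeds are those of the Opteron 280 (2.4GHz) and 285 (2.6GHz); the scores and the target are placeholder numbers chosen for illustration, not results from the article, and the function name `clock_to_match` is hypothetical.

```python
def clock_to_match(f1, s1, f2, s2, target):
    """Clock (GHz) at which the line through (f1, s1) and (f2, s2) hits target."""
    slope = (s2 - s1) / (f2 - f1)   # score gained per GHz
    return f2 + (target - s2) / slope


# Placeholder scores: 1000 at 2.4GHz, 1060 at 2.6GHz, competitor at 1240.
needed = clock_to_match(2.4, 1000.0, 2.6, 1060.0, 1240.0)
print(f"{needed:.2f} GHz")
```

This assumes performance keeps scaling along the same straight line well outside the measured range, which is the weak point of any two-point extrapolation.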
Kiijibari - Thursday, July 13, 2006
Ah OK, sorry, if you were referring to database tests; of course I meant CPU-bound applications.
However, I can't see your point. If you are looking at databases, then the most important part is the I/O subsystem, as long as the workload does not stress the CPU too much. Thus I don't understand why a Woodcrest should be better than an Opteron or a Netburst setup.
As long as they feature the same hard disks and controllers and the CPU load is low, performance should be the same.
But the two tests here were both CPU bound. You can see that in the DVD test: all systems tie until load 3 or 4, and after that the Woodcrests pull ahead due to their higher processing power.
With the Forum benchmark, well, I guess there were some problems with "throttling", as mentioned in the text, so the benchmark already benefits from the higher Woodcrest performance in stage 1.
cheers
Kiijibari
Kiijibari - Thursday, July 13, 2006
Yes indeed, the test is (mostly) crap. On the one hand it is OK with me, because you can (somehow) get a Woodcrest system and nothing better from AMD.
However, I would expect something more in the conclusion then, but there is just the "we don't know" sentence: "How those parts will compete with future AMD products is unknown".
Dear people at AnandTech, I'll give you a hint concerning that topic:
AMD will introduce 65nm technology. That will enable them to raise core clocks while lowering power consumption. This is really no big speculation, it's a well-known fact.
In addition to that, there is also an error in the article: Socket F won't add FB-DIMM, it will add DDR2. Download and read the updated BIOS guide from the AMD webpage. (For your convenience, here is the link: http://www.amd.com/us-en/assets/content_type/white... )
*No*, I repeat, *no* mention of FB-DIMM, but of course a lot of information about DDR2.
Hence you can easily draw the conclusion that AMD will have the better wattage package in 2007, as they lower CPU wattage with 65nm and lower RAM wattage with DDR2, too.
Maybe the lower DDR2 wattage will already be enough to even out the Woodcrest wattage advantage with a Socket F 90nm CPU, but that is speculation. I don't know the absolute wattage differences between DDR1 and DDR2.
Anyways, the current Intel advantage is just due to the formerly "mobile" Core 2 CPU. Everybody knows that Netburst was/is a power-hungry monster and that FBDs draw more power than any other kind of memory nowadays. Thus, any wattage advantage is due to the CPU.
cheers
Kiijibari
defter - Thursday, July 13, 2006
Just a few months ago, when there were Conroe samples and benchmarks available, some people were saying: "we know nothing about real performance, let's wait for final benchmarks". Now when talking about 65nm Opterons these people are saying: "it's a fact that 65nm Opteron will be much faster", even though there aren't even any samples available. Funny how things change...
How about some real facts:
- Fastest 130nm K8 reached 2.6GHz
- First 90nm K8s became available in about October 2004
- First faster-than-130nm 90nm K8 (2.8GHz model) became available in June 2005
With the 130nm->90nm transition it took AMD 9 months until the newer process (90nm) achieved higher clock speeds than the older process (130nm). Now, you seem to think that this kind of situation is impossible with 65nm and that K8 will get a sudden and major boost immediately?
Well, the last part is pretty obvious. DDR2 consumes significantly less power than DDR1 at equal bandwidth. However, I would guess that AMD fans would scream bloody murder if somebody benchmarked a Socket F Opteron with DDR2-400 :) When comparing DDR1-400 against DDR2-667, I doubt that there will be a significant difference in power consumption.
Kiijibari - Thursday, July 13, 2006
Hi, you can't compare it to Conroe. Conroe is a new architecture; the first 65nm AMD chips will be a simple die shrink, nothing to worry about as long as AMD does not have major problems.
I haven't checked your introduction dates, but I remember something similar. AMD always introduces mid-range CPUs first. Because of that, I did not state that a 3 GHz part will be out around Christmas (this year). It will be some time in 2007; maybe they will skip higher-clocked parts and move to lower-clocked quad-core parts. I don't know. However, Intel will have a clock advantage until then. It could also be that they hold that advantage. But there is too much speculation in that: AMD is using SOI and adding SiGe with 65nm, Intel is not.
On the RAM power consumption issue:
FB-DIMMs run with DDR2 memory chips, too. Thus the additional +5W FBD wattage (source: http://www.techreport.com/etc/2006q2/woodcrest/ind... ) is (only) due to the controller.
I don't think that there is a big wattage difference between different DDR2 speed grades. Well, there is surely some, but the important thing is the voltage, and that is always lower with DDR2 than DDR1 (1.8V vs. 2.5V).
Concerning that topic:
http://download.micron.com/pdf/pubs/designline/dl1...
There they calculate 2.7W for moderate use of a DDR2-533 module and 3W for a DDR2-400 module under high load.
Compared to DDR1 that is 40-50% less power usage (http://www.kingston.com/press/2006/memory/05b.asp).
Anyways, FBD adds +5W with every additional memory module. That's bad, because normally you have a lot of them in servers :(
But hopefully that will change with new, better controllers and/or bigger modules.
cheers
Kiijibari
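The memory power figures quoted in this comment can be sanity-checked with a little arithmetic. The sketch below assumes the common first-order CMOS approximation that dynamic power scales with the square of the supply voltage at equal frequency and capacitance, which is a simplification and not something the comment states. The function names and the module count of 8 are hypothetical.

```python
def voltage_scaling_saving(v_new, v_old):
    """Fractional dynamic-power saving if power scales with voltage squared."""
    return 1.0 - (v_new / v_old) ** 2


def fbd_overhead_watts(modules, watts_per_module=5.0):
    """Extra power for the FB-DIMM buffer chips, at +5W per module as quoted."""
    return modules * watts_per_module


saving = voltage_scaling_saving(1.8, 2.5)   # DDR2 at 1.8V vs. DDR1 at 2.5V
print(f"DDR2 vs. DDR1 saving from voltage alone: {saving:.0%}")
print(f"FBD overhead for 8 modules: {fbd_overhead_watts(8):.0f} W")
```

Under that assumption the voltage drop alone accounts for a saving of a bit under half, which is in line with the 40-50% range quoted above.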
JarredWalton - Thursday, July 13, 2006
Could be wrong (replying to Kiijibari), but DDR2 is pin compatible with FB-DIMMs; you just need to implement the correct memory controller. Initial Socket F should be registered DDR2, but the rumor mill has it that later revisions will include FB-DIMM support.
Kiijibari - Thursday, July 13, 2006
Sorry, I didn't see your post soon enough, otherwise I would have answered it in my other post :)
FBD is *not* pin compatible with DDR2 modules; they have their own interface. But they use normal, off-the-shelf DDR2 chips, the same ones that are used on DDR2 modules.
The advantage of this is that you can change the memory type used on the modules and the mainboard stays compatible with the new modules. That is because the system only sees the FBD module controller; what is behind it is of no interest.
Because of this, there will be compatible DDR3 FBDs for current Bensley platforms.
But there is also a rumour that DDR3 modules will be compatible with DDR2 ones, too. There is very little difference (only voltage and 8x prefetch) between DDR2 and DDR3, and the modules will feature the same 240-pin interface.
Anyways, AMD will go to FBD later (~2008), maybe after the wattage problem has been solved by Intel et al ;-) (source: http://pc.watch.impress.co.jp/docs/2006/0623/kaiga... )
But as I said earlier, Socket F is DDR2 only, no FBD in the BIOS guide.
cheers
Kiijibari
JarredWalton - Thursday, July 13, 2006
Er, sorry, but by "pin compatible" I mean it also uses 240 pins. Same size DIMMs, but you need the right type of memory controller, just like registered vs. unbuffered RAM. One way or another, I'm sure AMD will support FBD in the future; the question is when? Then again, since they link RAM to each CPU socket, it's certainly not as big a concern: two sockets already support four memory channels. If they can get 4 registered DIMMs per channel, they're already up to 16 DIMM sockets.