Intel Woodcrest: the Birth of a New King
by Jason Clark & Ross Whitehead on July 13, 2006 12:05 AM EST - Posted in IT Computing
The New Benchmark Suite
We've made some changes to our benchmarks to accommodate the multiple load scenarios used in this article. The first benchmark we overhauled was the Dell DVD Store test (http://linux.dell.com/dvdstore/). In the last article (the first time we used Dell DVD Store), we used the stock Dell SQL driver along with a medium-sized database (approximately 3GB). This time around we wanted a larger database to represent a more enterprise-oriented e-commerce scenario. To get it, we took the medium database and increased the customers from 2 million to 20 million and the products from one hundred thousand to 1 million. This resulted in a 14GB database.
We modified the driver code as well. We started with the included C# driver source code and changed the way it creates the threads (users). In stock form, the driver creates all the threads and users in one shot and then starts executing orders. Since we wanted to add threads dynamically to reach specific load levels, we added a method to the class to add users. At the same time we added a few properties so that a Windows Forms application could host the class and report back various performance counters. This allows us to graph CPU usage and orders per minute over the duration of the test, and to save the graphs for historical reporting. The Forum benchmark also got an overhaul: it now uses the same GUI driver, and we made a few changes to the way its queries are executed against the database.
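The idea of adding users while a test is already running can be sketched as follows. This is a minimal illustration in Python, not the modified C# driver itself: the class name `LoadDriver`, the `add_users` method, the two properties, and the `place_order` callable are all hypothetical stand-ins.

```python
import threading
import time


class LoadDriver:
    """Hypothetical load driver whose simulated users can be added mid-run."""

    def __init__(self, place_order):
        self._place_order = place_order   # callable standing in for one order transaction
        self._stop = threading.Event()
        self._lock = threading.Lock()
        self._threads = []
        self._orders = 0

    def add_users(self, count):
        """Start `count` additional user threads without restarting the test."""
        for _ in range(count):
            t = threading.Thread(target=self._user_loop, daemon=True)
            with self._lock:
                self._threads.append(t)
            t.start()

    def _user_loop(self):
        # Each simulated user keeps placing orders until the driver is stopped.
        while not self._stop.is_set():
            self._place_order()
            with self._lock:
                self._orders += 1

    @property
    def user_count(self):
        """Number of user threads started so far (readable by a GUI host)."""
        with self._lock:
            return len(self._threads)

    @property
    def orders_completed(self):
        """Total orders finished so far (readable by a GUI host)."""
        with self._lock:
            return self._orders

    def stop(self):
        """Signal all users to finish and wait for them."""
        self._stop.set()
        with self._lock:
            threads = list(self._threads)
        for t in threads:
            t.join()


if __name__ == "__main__":
    driver = LoadDriver(lambda: time.sleep(0.001))  # fake 1 ms "order"
    driver.add_users(2)       # first load point
    time.sleep(0.05)
    driver.add_users(3)       # ramp up to the next load point
    time.sleep(0.05)
    driver.stop()
    print(driver.user_count, "users,", driver.orders_completed, "orders")
```

A host application could poll `orders_completed` on a timer to derive orders per minute; how the real driver exposes its counters is not described in the article.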
Both benchmark applications record their results to a database server, where we average the results over the N runs for our graphs. The GUI also accepts command line parameters, which lets us set up batch files to run an entire platform. On average it takes almost 20 hours to run a platform, because we run 5 iterations of each load point. It is important to look at the deviations between benchmark runs to ensure scores are consistent and representative of typical performance. The deviations are all relatively low, which is very good; the average deviation is 1.6%.
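As a rough illustration of that averaging step, the sketch below computes the mean of the runs at each load point and a relative deviation. The article does not say how "deviation" is calculated, so the population standard deviation divided by the mean is an assumption here; the function names and the sample numbers are also made up for illustration and are not measured results.

```python
from statistics import mean, pstdev


def summarize(runs):
    """Return (average score, relative deviation in percent) for one load point.

    Assumption: "deviation" is taken to be the population standard deviation
    expressed as a percentage of the mean.
    """
    avg = mean(runs)
    return avg, 100.0 * pstdev(runs) / avg


def average_deviation(load_points):
    """Average the per-load-point relative deviations across a whole platform."""
    return mean(summarize(runs)[1] for runs in load_points.values())


# Illustrative, invented scores: 5 iterations at each of two load points.
sample = {
    "load_1": [1000, 1010, 990, 1005, 995],
    "load_2": [2000, 2040, 1960, 2010, 1990],
}

if __name__ == "__main__":
    for name, runs in sample.items():
        avg, dev = summarize(runs)
        print(f"{name}: average={avg:.1f} deviation={dev:.2f}%")
    print(f"average deviation: {average_deviation(sample):.2f}%")
```

A low value means the iterations at a load point agree closely, which is the property the deviation check is meant to confirm.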
Dell & Forum SQL Trace Analysis
The Dell and Forum benchmarks are quite different workloads, as you will see in the benchmark results. Dell executes approximately 10 times as many queries during the test, and its query durations are roughly one quarter of those in the Forum benchmark. To summarize, Dell is a workload with a high transaction volume in which each query executes in a very short amount of time. The Forum workload has a medium transaction volume; its queries take longer to execute and are much more read intensive (larger datasets are returned).
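A back-of-the-envelope calculation shows what those two ratios imply together. It assumes "durations" means the average duration per query, which the trace summary does not state explicitly; the variable names are illustrative.

```python
# Approximate ratios quoted from the trace analysis (Dell relative to Forum).
query_count_ratio = 10.0      # Dell runs about 10x as many queries
duration_ratio = 1.0 / 4.0    # each Dell query takes about 1/4 as long (assumed per-query)

# Aggregate time spent executing queries, Dell relative to Forum.
total_query_time_ratio = query_count_ratio * duration_ratio

if __name__ == "__main__":
    print(f"Dell spends about {total_query_time_ratio:.1f}x the total query time of Forum")
```

Under that assumption, the Dell workload still spends more total time in queries than Forum, even though each individual query is much shorter.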
Test Configuration
Below are the configurations of the test machines. Note that the Opteron system's memory was set to a 1T command rate and NUMA was enabled.
Client
Dual AMD Opteron 256
4GB Memory
Gigabit Ethernet
Windows 2003 x64 Server
Woodcrest/Dempsey System
Intel OEM System (Pre-Production)
8GB 533MHz FB-DIMM
Windows 2003 x64 Enterprise Server SP1
SQL 2005 Enterprise SP1 x64
14 x Ultra 320 SCSI Drives in RAID 0
LSI Logic 320-2 Controller
Opteron 280/285 System
Tyan S2891 Motherboard
8GB PC3200 DDR 400MHz
Windows 2003 x64 Enterprise Server SP1
SQL 2005 Enterprise SP1 x64
14 x Ultra 320 SCSI Drives in RAID 0
LSI Logic 320-2 Controller
59 Comments
Lonyo - Thursday, July 13, 2006 - link
What's the fastest Opteron dual core CPU you can buy?
What's the fastest Woodcrest CPU that will be released?
AMD don't make anything faster than 2.6GHz, so it doesn't really matter what speed Intel have to be at to beat it, they beat it with their top end part. And the Opteron is nearing its end (at 90nm), Woodcrest is new, so it will go faster probably, same as 65nm Opterons will go faster.
Woodcrest is not behind Opteron, it is better per watt, and the high end Woodcrest beats the high end Opteron. Enough said. Whether Intel is clock for clock better or not still doesn't matter. They are better, and if they are not better clock for clock, it doesn't seem to matter because, again, they have higher clocks.
Spoonbender - Thursday, July 13, 2006 - link
"What's the fastest Woodcrest CPU that will be released?"
Umm.... None?
Does that mean AMD beats Intel by an infinite margin then?
True, if Intel has a 3ghz part out, and AMD only has 2.6, then it makes sense to compare these two.
But for now, let's just keep in mind that Intel doesn't have a 3ghz part out. They don't have a 2.6GHz part either. We are still comparing an unreleased product to one that has been out for a while.
Cooler - Thursday, July 13, 2006 - link
They're on Newegg right now... http://www.newegg.com/Product/ProductList.asp?Subm...
xtremejack - Thursday, July 13, 2006 - link
You should note the way the two processors are compared here. Both are dual-CPU systems. Intel's FSB-based system architecture means lower system bandwidth than AMD's DirectConnect architecture. The Opterons have an on-die memory controller and a point-to-point interconnect. I am sure if you put Woodcrest on a Paxville system, you would see significantly worse performance. The 3.0GHz Woodcrest is probably capable of a bit more performance, but the lower bandwidth FSB does not help it reach its full potential. Add the fact that FB-DIMMs have more latency than standard DDR2, and the Woodcrest isn't at a serious advantage compared to the Opteron system.
Bottom-line system performance for the Woodcrest processor is still 5-20% better than Opteron. But that's way better than being 30% lower during the Paxville days.
Now Conroe does not have all these complications that Woodcrest has, which is why you may see a bigger performance advantage there. Also, since it is a single-CPU solution, the system architecture is much simpler.
swtethan - Thursday, July 13, 2006 - link
how many people running servers are going to overclock their system? :D
fitten - Thursday, July 13, 2006 - link
Zero. I'd fire any IT person on the spot if I found out they had overclocked a production server.
FesterOZ - Thursday, July 13, 2006 - link
I find this article somewhat surprising in tone. My company is a Fortune 500 and a big Dell shop so we have had access to Woodcrest workstations and servers for testing for a while. We have also tested these vs HP 9300 Athlon based Workstations and vs Sun x4100 servers and HP DL385s. Based on our tests which involve business applications, trading applications, etc., the performance of Woodcrest vs the Athlons is slightly better (about 5-10%). Nothing to really rave about, especially when it's the latest Intel design on 65nm. This actually disappointed our in-house Dell groupies, especially since they were comparing the top of the line new CPU design from Intel vs AMD's older platform. As a result we are moving away from Dell, simply because they do not offer a choice of CPUs at the moment, and into HP's world, with our first purchase being 3 full chassis of AMD blade servers.
IMHO, it's now a two baron world with a missing king, each with strengths and weaknesses.
Kiijibari - Thursday, July 13, 2006 - link
>IMHO, it's now a two baron world with a missing king, each with strengths and weaknesses.
Yes, I back that opinion.
Woodcrest is hampered by its FBDs. While it delivers much better bandwidth, it has worse latencies. Furthermore, the 4 MB L2 cache & Core2 prefetch do not help as much in a multithreaded server environment as in the average desktop application area.
What I want to say is that the performance difference between Conroe/Athlon64 will be bigger than that between Woodcrest/Opteron.
First "tests" (I was told by an administrator of a huge financial institute) also showed a Woodcrest performance deficit with gcc-compiled 64-bit applications. Are some of your applications 64-bit, too? It would be interesting to get more 64-bit statements. For some reason, there are none from Intel so far ...
cheers
Kiijibari
Spoonbender - Thursday, July 13, 2006 - link
Yes, I've been wondering about 64-bit performance too. Intel hasn't mentioned it with a word, but I hope they've made a decent implementation this time around.
duploxxx - Thursday, July 13, 2006 - link
no they didn't, still the same as in the Netburst. some small 64bit testing has been done on XS forums seeing a core architecture gaining 17-18% performance on a 64bit os + program like 64bit cinebench. the opty 940 gained 31-38%