Server Guide Part 1: Introduction to the Server World
by Johan De Gelas on August 17, 2006 1:45 PM EST, posted in IT Computing
TCO
Originally described by the Gartner Group, TCO (Total Cost of Ownership) sounds like something that does not belong on a hardware enthusiast site. Managers and financial people who understand very little about IT have frequently abused it to delay necessary IT investments, so many view it as a pejorative term.
However, it is impossible to make a well-thought-out server buying decision without understanding TCO, and many typical server hardware features exist precisely to lower it. Hardware enthusiasts mostly base their buying decisions on TCA, or Total Cost of Acquisition. The enthusiast motherboard and chipset business is a typical example of ignoring TCO. Because products are refreshed every six months, many of the new features don't work properly, and you find yourself flashing the BIOS, installing new drivers and tweaking configurations before you (hopefully) get that RAID controller, firewall or sound chip to work properly. Luckily, you don't have to pay yourself for all the hours you spend...
TCO is a financial estimate of the total cost of buying and using a server. Think of it as everything it costs to buy, deploy, support and adapt a given server during its lifecycle. When evaluating servers, you should therefore look at the following costs:
- The total cost of buying the server
- The time you will spend installing it in your network
- The time you will spend on configuring the software and remote management
- Facility management: the space it takes in your datacenter and the electricity it consumes
- The hours you spend on troubleshooting, reconfiguring, securing and repairing the server
- The costs associated with users waiting for the system to respond
- The costs associated with outages and failures, when users cannot reach your server
- The upgrade costs and the time you spend upgrading your server to meet new demands
- The cost of security breaches, etc.
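Because TCO is simply the sum of these categories over the server's lifetime, it is easy to model. The sketch below is a minimal illustration, assuming every cost has already been converted to money in one currency; the class name `ServerCosts`, its field names and all the numbers are hypothetical, chosen only to mirror the list above.

```python
from dataclasses import dataclass, fields


@dataclass
class ServerCosts:
    """Hypothetical lifetime cost components for one server (same currency unit).

    Fields mirror the cost categories listed above; the values used below are made up.
    """
    purchase: float           # total cost of buying the server (the TCA)
    installation: float       # time spent installing it in the network
    configuration: float      # configuring software and remote management
    facilities: float         # datacenter space and electricity
    administration: float     # troubleshooting, reconfiguring, securing, repairing
    slow_response: float      # users waiting for the system to respond
    outages: float            # users unable to reach the server
    upgrades: float           # upgrade parts and labour
    security_breaches: float  # cost of security incidents

    def tco(self) -> float:
        # TCO is the sum of every cost category over the server's lifecycle.
        return sum(getattr(self, f.name) for f in fields(self))

    def tca_share(self) -> float:
        # Fraction of the TCO that is the purchase price (Total Cost of Acquisition).
        return self.purchase / self.tco()


# Hypothetical example figures, not taken from any study.
example = ServerCosts(
    purchase=9000, installation=500, configuration=700, facilities=2500,
    administration=3000, slow_response=800, outages=1500, upgrades=1200,
    security_breaches=800,
)
print(f"TCO: {example.tco():.0f}")              # TCO: 20000
print(f"TCA share: {example.tca_share():.0%}")  # TCA share: 45%
```

The hard part in practice is not the addition but putting honest numbers on the "soft" items, such as the cost of users waiting or of an outage.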
Because the purchase price is only one item on this list, it is tempting to conclude that the hardware choice does not matter much. There are two big problems with that reasoning. The first is that TCA is still a big part of TCO. For example, one study [1] estimates that buying the server still accounts for about 40-50% of TCO, while maintenance comprises a bit more than 10% and operation costs take about 40% of the pie. We therefore can't help but be wary when a vendor claims that a high price is justified because maintenance on his product costs so much less than on competing products.
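A quick break-even calculation shows why that vendor argument deserves suspicion. The sketch below assumes the rough shares quoted above (purchase about 45% and maintenance about 10% of TCO) and a hypothetical 20% price premium; it also assumes the pricier server saves only on maintenance, so operation costs are treated as equal. The function name `required_maintenance_cut` is ours, for illustration only.

```python
def required_maintenance_cut(price_premium: float,
                             tca_share: float,
                             maintenance_share: float) -> float:
    """Fraction of maintenance cost a pricier server must save to break even on TCO.

    price_premium: extra purchase price relative to the alternative, e.g. 0.20 for +20%.
    tca_share, maintenance_share: shares of the baseline TCO, between 0 and 1.
    Assumes all other cost categories are identical for both servers.
    """
    # Extra spending on the purchase, expressed as a fraction of baseline TCO.
    extra_cost = price_premium * tca_share
    # That amount has to be recovered from the maintenance slice alone.
    return extra_cost / maintenance_share


# Shares roughly as in the study cited above; the 20% premium is hypothetical.
cut = required_maintenance_cut(price_premium=0.20, tca_share=0.45, maintenance_share=0.10)
print(f"Maintenance must drop by {cut:.0%} to break even")  # 90%
```

A result above 1.0 would mean that even free maintenance could not pay back the higher purchase price.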
Second, certain hardware choices have an enormous impact on the rest of the TCO picture. One example is RAID arrays with hot-spare and hot-swappable drives, which on average significantly reduce the time that a server is unreachable. This will become clearer as we dig deeper into the different hardware features of modern servers and the choices you will have to make.
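One way to put a rough number on this is the common steady-state availability formula, A = MTBF / (MTBF + MTTR), where MTTR is the mean time to repair. The sketch below is a simplification under stated assumptions: the MTBF and both MTTR values are hypothetical, and we assume a disk failure without hot-swap means shutting down, replacing the drive and restoring data, while a hot spare limits the disruption to a short window (in many setups there is no disruption at all).

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the server is reachable."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_hours_per_year(mtbf_hours: float, mttr_hours: float) -> float:
    # Unavailable fraction of the time, scaled to one year.
    return (1.0 - availability(mtbf_hours, mttr_hours)) * HOURS_PER_YEAR


# Hypothetical: one disk failure that affects the server every 20,000 hours.
MTBF = 20_000
# Assumed: cold swap (shut down, replace, restore) takes 8 hours.
cold = downtime_hours_per_year(MTBF, mttr_hours=8)
# Assumed: with a hot spare the array rebuilds online; 0.25 h of disruption.
hot = downtime_hours_per_year(MTBF, mttr_hours=0.25)
print(f"cold swap: {cold:.2f} h/year, hot spare: {hot:.2f} h/year")
```

Since MTBF is the same in both cases, the whole gain comes from shrinking the repair window, which is exactly what serviceability features target.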
RAS features
Studies by IBM indicate that about 50% of hardware failures are related to hard disks and 25% to power supplies. Fans, at 8%, are a distant third, so you clearly need highly reliable power supplies and hard disks: the R (Reliability) of RAS. You also want to increase availability, the A of RAS, by adding redundancy for the most vulnerable parts of your server. RAID, redundant power supplies and redundant fans are a must for a critical server. The S in RAS stands for Serviceability, which covers hot-swappable/hot-pluggable drives and other components. Do you need to shut down the server to perform maintenance? Which items can be replaced or repaired while the system keeps running? All three areas are intertwined, and higher-end (and more expensive) servers have features designed to improve all of them.
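The benefit of redundancy can be sketched with the standard series/parallel availability model: parts the server needs simultaneously multiply their availabilities, while a redundant pair is only down when both halves are down. The per-component availabilities below are hypothetical (they are not from the IBM studies), and the model assumes independent failures and instant failover, which real RAID arrays and power supply failover only approximate.

```python
from math import prod


def series(*availabilities: float) -> float:
    # Every component must be up, so the availabilities multiply.
    return prod(availabilities)


def redundant(a: float, n: int = 2) -> float:
    # n identical components, any one suffices: down only if all n are down.
    # Assumes independent failures and instantaneous failover.
    return 1.0 - (1.0 - a) ** n


# Hypothetical per-component availabilities, for illustration only.
DISK, PSU, FAN, REST = 0.999, 0.9995, 0.9998, 0.9999

single = series(DISK, PSU, FAN, REST)
# Mirrored disks, dual power supplies and dual fans; the rest stays single.
with_redundancy = series(redundant(DISK), redundant(PSU), redundant(FAN), REST)
print(f"no redundancy:   {single:.5f}")
print(f"with redundancy: {with_redundancy:.5f}")
```

In this model the non-redundant remainder quickly dominates, which is why the failure statistics above are useful for deciding where redundancy pays off first.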