Generic performance metrics mean less and less in servers

The computing industry has long promoted single performance metrics as a way of summarising how fast a system is. Back in the day it was MIPS (millions of instructions per second), then for a while it was the processor’s clock frequency. AMD was the first major chip maker to move away from frequency, offering “performance ratings” on its processors, an approach Intel adopted later on. Frequency and performance ratings are useful tools in the consumer market, where buyers generally have to trust the numbers as a guide to relative performance within a single chip vendor’s product line. Of course, a performance rating from AMD cannot be compared with one from Intel (or, for that matter, any other chip vendor).

In servers, however, it is a different story. About a decade ago, if my memory serves me correctly, AMD again tried to shift the metric away from frequency and started talking about performance-per-watt. This mattered a lot to those of us buying racks of servers back then (I was one of those people), because by 2005 datacenter energy prices had started to rise considerably, and the question was no longer how much processing power you had but whether enough amps were being supplied to your rack to fill it with kit. I remember seeing many racks left half full because the company had hit its power budget and the datacenter couldn’t cool any more servers, even if the company renting the rack was willing to pay for extra power.
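To make that constraint concrete, here is a minimal back-of-the-envelope sketch. The feed rating, derating factor and per-server draw are all invented figures for illustration; the point is simply that the power feed, not the number of rack units, decides how many servers go in.

```python
# Back-of-the-envelope maths with invented numbers: the rack's power feed,
# not the available rack units, caps how many servers can be installed.
RACK_FEED_AMPS = 32        # assumed single-phase feed to the rack
FEED_VOLTS = 208           # assumed supply voltage
DERATE = 0.8               # common practice: load a circuit to ~80% of its rating
SERVER_DRAW_WATTS = 250    # assumed draw per 1U server under load

budget_watts = RACK_FEED_AMPS * FEED_VOLTS * DERATE
servers_supported = int(budget_watts // SERVER_DRAW_WATTS)

print(f"Usable power budget: {budget_watts:.0f} W")
print(f"Servers the feed supports: {servers_supported} out of 42 rack units")
```

With these assumed figures the feed is exhausted with the rack only half populated, which is exactly the situation described above.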

These days companies such as Google, Microsoft, Amazon, Rackspace, Facebook and numerous other large cloud providers are the ones placing the large orders for servers, and they really drive the server industry through their collective buying power. Beyond that, some of these companies build their own datacenters, or at the very least have significant input into how the infrastructure is delivered to them, whether that is the shape of a rack or the source of the electricity.

The point here is that these customers strike enough deals with various stakeholders – local governments, utility companies and so on – that their bottom lines are affected in different ways, with each input carrying a different weighting. For example, a $1 rise in power prices may hit one datacenter operator harder than it hits another. Then there’s the definition of performance.

Performance ratings are generated using a wide range of benchmarks, which is how it should be. But a particular user may want to run just one workload on a server for its entire lifetime, say a relational database, and couldn’t care less how the chip performs when transcoding video. This ties back to the point made in the previous paragraph about power: it’s all about personalising the metric for the particular user.

While metrics such as performance-per-watt are a good way to get a quick read on a chip’s power efficiency, you can bet your bottom dollar that high-volume server purchases are not based on that figure alone. Rather, they are based on months of testing on very specific workloads, and on how computation, power, space, maintenance and even the chip vendor’s roadmap feed into the buyer’s bottom-line calculation.
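As a rough illustration of how such a calculation might be personalised, the sketch below combines workload-specific test results with weights that reflect one buyer’s priorities. The candidate figures, category names and weights are all invented for the example, not real benchmark data.

```python
# Minimal sketch of "personalising the metric": combine workload-specific
# test results with weights that reflect one buyer's priorities.
# All figures, categories and weights below are invented for illustration.

def personalised_score(measured, weights):
    """Weighted sum of normalised results; higher is better."""
    return sum(weights[k] * measured[k] for k in weights)

# Hypothetical in-house test results, normalised so 1.0 == the current fleet.
candidate_a = {"db_queries_per_sec": 1.30, "perf_per_watt": 1.10,
               "rack_density": 1.00, "maintenance": 0.90}
candidate_b = {"db_queries_per_sec": 1.05, "perf_per_watt": 1.40,
               "rack_density": 1.20, "maintenance": 1.00}

# A database-heavy shop might weight things like this; a power-constrained
# operator would shift weight towards perf_per_watt instead.
weights = {"db_queries_per_sec": 0.5, "perf_per_watt": 0.2,
           "rack_density": 0.2, "maintenance": 0.1}

for name, measured in (("A", candidate_a), ("B", candidate_b)):
    print(f"Candidate {name}: {personalised_score(measured, weights):.2f}")
```

Crudely simple, but it captures why a chip that tops a generic performance-per-watt chart can still lose the order to one that is better at the single workload the buyer actually cares about.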

Ultimately it’s all about personalising performance, which is why you are starting to see a wider range of servers. Two-socket servers with “big core” chips will continue to sell by the bucketload, but they will be joined by so-called small-core servers that provide the personalised performance some companies are looking for. The question then becomes: how do you mix these big-core and small-core servers at the datacenter level? The answer is scalable fabrics, something I will touch on soon.