Web Performance, Part II: What are you calling average?

For a decade, the holy grail of Web performance has been a low average performance time. Every company wants to have the lowest time, in some kind of chest-thumping, testosterone-pumped battle for supremacy.

Well, I am here to tell you that the numbers you have been using for the last decade have been lying. Well, lying is perhaps to strong a term. Deeply misleading is perhaps the more accurate way to describe the way that an average describes a population of results.
Now before you call your Web performance monitoring and measurement firms and tear a strip off them, let’s look at the facts. The numbers that everyone has been holding up as the gospel truth have been averages, or, more correctly, Arithmetic Means. We all learned these in elementary school: the sum of X values divided by X produces a value that approximates the average value for the entire population of X values.

Where could this go wrong in Web performance?

We wandered off course in a couple of fundamental ways. The first is based on the basic assumption of Arithmetic Mean calculations, that the population of data used is more or less Normally Distributed.

Well folks, Web performance data is not normally distributed. Some people are more stringent than I am, but my running assumption is that in a population of measurements, up to 15% are noise resulting from “stuff happens on the Internet”. This outer edge of noise, or outliers, can have a profound skewing effect on the Arithmetic Mean for that population.

“So what?”, most of you are saying. Here’s the kicker: As a result of this skew, the Arithmetic Mean usually produces a Web performance number that is higher than the real average of performance.

So why do we use it? Simple: Relational databases are really good at producing Arithmetic Means, and lousy at producing other statistical values. Short of writing your own complex function, which on most database systems equates to higher compute times, the only way to produce more accurate statistical measures is to extract the entire population of results and produce the result in external software.
If you are building an enterprise class Web performance measurement reporting interface, and you want to calculate other statistical measures, you better have deep pockets and a lot of spare computing cycles, because these multi-million row calculations will drain resources very quickly.

So, for most people, the Arithmetic Mean is the be all and end all of Web performance metrics. In the next part of this series, I will discuss how you can break free of this madness and produce values that are truer representations of average performance.

Leave a Reply Cancel reply