Category: Culture of Performance

Web Performance: Nice Display. Now Show Me the Data.

Today’s Web interfaces are all about the Flash (literally). Smooth charting, cool effects, callouts to references — ways to try and simplify complex data collections.
Problem-solving and diagnosis require a far deeper dive than the flashiest interface could ever provide, because they come down to the numbers: the actual measurements that make up the flashy chart. If you compare the data used by a professional trader with what someone at home sees when looking at stock charts, there is a substantial difference.

When you get down to that level of analysis, the interface becomes irrelevant. Any analyst worth her or his salary (or salt – same thing) can tell you more from a spreadsheet full of relevant numbers than they can from any pretty graphic. This is true in any field.

When do traders or Web performance analysts use pretty charts? When they have to explain complex issues to non-technical or non-specialist audiences. When these analysts work on solving the sticky problems faced in the everyday world, they always fall back on the numbers.

Web performance data consists of the same few components, regardless of which company is providing the data. In effect, beyond a few key pieces of information about how the measurement data is captured, all Web performance data is the same.

Just because the components that make up the data are the same does not guarantee that the data from two different providers is of the same quality. In an imaginary system, Web performance data from all the major providers could flow into a centralized repository and be transformed using XSLT or some other mangler so that, in most cases, it would be impossible to tell which firm was the source.

But a skilled analyst would quickly learn to recognize the data that can be trusted. That would be the data that quickly and accurately represented the issues he was trying to diagnose. The data that flowed with the known patterns of the Web site.

The data that helped him do his job more effectively.

In the end, a pretty interface can go a long way to hide the quality of the data that is being represented. A shiny gloss on poor data does not make it better data. It is critical that the data that underlies that pretty chart is able to live up to the quality demands of the people who use it every day.

Selling the interface is selling the brand. Trust in the data builds the reputation.
Which one sold you when you chose your Web performance measurement provider?

The Dog and The Toolbox: Using Web Performance Services Effectively

The Dog and The Toolbox

One day, a dog stumbled upon a toolbox left on the floor. There was a note on it, left by his master, which he couldn’t read. He was only a dog, after all.

He sniffed it. It wasn’t food. It wasn’t a new chew toy. So, being a good dog, he walked off and lay on his mat, and had a nap.

When the master returned home that night, the dog was happy and excited to see him. He greeted his master with joy, and brought along his favorite toy to play with.
He was greeted with yelling and anger and “bad dog”. He was confused. What had he done to displease his master? Why did the master keep yelling at him, and pointing at the toolbox? He had been good and left it alone. He knew that it wasn’t his.

With his limited understanding of human language, he heard the words “fix”, “dishwasher”, and “bad dog”. He knew that the dishwasher was the yummy cupboard that all of the dinner plates went into, and came out of less yummy and smelling funny.

He also knew that the cupboard had made a very loud sound that had scared the dog two nights ago, and then had spilled yucky water on the floor. He had barked to wake his master, who came down, yelling at the dog, then yelling at the machine.
But what did fix mean? And why was the master pointing at the toolbox?

The Toolbox and Web Performance

Far too often, I encounter companies that have purchased a Web performance service that they believe will fix their problems. They then pass the day-to-day management of this information on to a team that is already overwhelmed with data.

What is this team supposed to do with this data? What does it mean? Who is going to use it? Does it make my life easier?

When it comes time to renew the Web performance services, the company feels cheated. And they end up yelling at the service company that sold them this useless thing, or at their own internal staff for not using this tool.

To an overwhelmed IT team, Web performance tools are another toolbox on the floor. They know it’s there. It’s interesting. It might be useful. But it makes no sense to them, and is not part of what they do.

Giving your dog the toolbox does not fix your dishwasher. Giving an IT team yet another tool does not improve the performance of a Web site.

Only in the hands of a skilled and trained team does the Web performance of a site improve, or the dishwasher get fixed. As I have said before, a tool is just a tool. The question that all organizations must face is what they want from their Web performance services.

Has your organization set a Web performance goal? How do you plan to achieve your goals? How will you measure success? Does everyone understand what the goal is?

After you know the answers to those questions, you will know that, as amazing as he is, your dog will never be able to fix your dishwasher.

But now you know who can.

Managing Web Performance: A Hammer is a Hammer

Give almost any human being a hammer, and they will know what to do with it. Modern city dwellers, ancient jungle tribes, and most primates would all look at a hammer and understand instinctively what it does. They would know it is a tool to hit other things with. They may not grasp some of the subtleties, such as that it is designed to drive nails into other things and not beat other creatures into submission, but they would know that this is a tool that is a step up from the rock or the tree branch.

Simple tools produce simple results. This is the foundation of a substantial portion of the Software-as-a-Service (SaaS) model. SaaS is a model which allows companies to provide a simple tool in a simple way to lower the cost of the service to everyone.
Web performance data is not simple. Gathering the appropriate data can be as complex as the Web site being measured. The design and infrastructure that supports a SaaS site is usually far more complex than the service it presents to the customer. A service that measures the complexity of your site will likely not provide data that is easy to digest and turn into useful information.

As any organization that has purchased a Web performance measurement service, a monitoring tool, or a corporate dashboard expecting instant solutions will tell you, there are no easy solutions. These tools are the hammer, and just having a hammer does not mean you can build a house or craft fine furniture.

In my experience, there are very few organizations that can craft a deep understanding of their own Web performance from the tools they have at their fingertips. And the Web performance data they collect about their own site is about as useful to them as a hammer is to a snake.

Web Performance: Blogs, Third Party Apps, and Your Personal Brand

The idea that blogs generate a personal brand is as old as the “blogosphere”. It’s one of those topics that rages through the blog world every few months. Inexorably the discussion winds its way to the idea that a blog is linked exclusively to the creators of its content. This makes a blog, no matter what side of the discussion you fall on, the online representation of a personal brand that is as strong as a brand generated by an online business.

And just as corporate brands are affected by the performance of their Web sites, a personal brand can suffer just as much when something causes the performance of a blog Web site to degrade in the eyes of the visitors. For me, although my personal brand is not a large one, this happened yesterday when Disqus upgraded to multiple databases during the middle of the day, causing my site to slow to a crawl.

I will restrain my comments on mid-day maintenance for another time.

The focus of this post is the effect that site performance has on personal branding. In my case, the fact that my blog site slowed to a near standstill in the middle of the day likely left visitors with the impression that my blog about Web performance was not practicing what it preached.

For any personal brand, this is not a good thing.
In my case, I was able to draw on my experience to quickly identify and resolve the issue. Performance returned to normal when I temporarily disabled the Disqus plugin (it has since been reactivated). However, if I hadn’t been paying attention, this performance degradation could have continued, increasing the negative effect on my personal brand.

As on many blogs, Disqus is only one of the outside services I have embedded in my site design. Sites today rely on AdSense, Lookery, Google Analytics, Statcounter, Omniture, Lijit, and on goes the list. These services have become as omnipresent in blogs as the content. What needs to be remembered is that these add-ons are often overlooked as performance inhibitors.

Many of these services are built using the new models of the over-hyped and mis-understood Web 2.0. These services start small, and, as Shel Israel discussed yesterday, need to focus on scalability in order to grow and be seen as successful, rather than cool, but a bit flaky. As a result, these blog-centric services may affect performance to a far greater extent than the third-party apps used by well-established, commercial Web sites.

I am not claiming that any one of these services in and of itself causes any form of slowdown. Each has its own challenges with scaling, capacity, and success. It is the sheer number of services used by blog designers and authors that poses the greatest potential problem when attempting to debug performance slowdowns or outages. The question in these instances, in the heat of a particularly stressful moment in time, is always: Is it my site or the third party?

The advice I give is that spoken by Michael Dell: You can’t manage what you can’t measure. Yesterday, I initiated monitoring of my personal Disqus community page, so I could understand how this service affected my continuing Web performance. I suggest that you do the same, but not just of this third-party. You need to understand how all of the third-party apps you use affect how your personal brand performance is perceived.

Why is this important? In the mind of the visitor, the performance problem is always with your site. As with a corporate site that sees a sudden rise in response times or decrease in availability, it does not matter to the visitor what the underlying cause of the issue is. All they see is that your site, your brand (personal or corporate), is not as strong or reliable as they had been led to believe.

The lesson that I learned yesterday, one that I have taught to so many companies but not heeded myself, is that monitoring the performance of all aspects of your site is critical. And while you as the blog designer or writer might not directly control the third-party content you embed in your site, you must consider how it affects your personal brand when something goes wrong.

You can then make an informed decision on whether the benefit of any one third-party app is outweighed by the negative effect it has on your site performance and, by extension, your personal brand.

Web Performance, Part IX: Curse of the Single Metric

While this post is aimed at Web performance, the curse of the single metric affects our everyday lives in ways that we have become oblivious to.

When you listen to a business report, the stock market indices are an aggregated metric used to represent the performance of a set group of stocks.

When you read about economic indicators, these values are the aggregated representations of complex populations of data, collected from around the country, or the world.

Sport scores are the final tally of an event, but they may not always represent how well each team performed during the match.

The problem with single metrics lies in their simplicity. When a single metric is created, it usually attempts to factor in all of the possible and relevant data to produce an aggregated value that can represent a whole population of results.
These single metrics are then portrayed as a complete representation of this complex calculation. The presentation of this single metric is usually done in such a way that their compelling simplicity is accepted as the truth, rather than as a representation of a truth.

In the area of Web performance, organizations have fallen prey to this need for the compelling single metric: the need to represent a very complex process in terms that can be quickly absorbed and understood by as large a group of people as possible.

The single metrics most commonly found in the Web performance management field are performance (end-to-end response time of the tested business process) and availability (success rate of the tested business process). These numbers are then merged and transformed by data from a number of sources (external measurements, hit counts, conversions, internal server metrics, packet loss), and this information is bubbled up in an organization. By the time senior management and decision-makers receive the Web performance results, they are likely several steps removed from the raw measurement data.

An executive will tell you that information is a blessing, but only when it speeds, rather than hinders, the decision-making process. A Web performance consultant (such as myself) will tell you that basing your decisions on a single metric that has been created out of a complex population of data is madness.

So, where does the middle ground lie between the data wonks and the senior leaders? The rest of this post is dedicated to introducing a small set of metrics that will give senior leaders better information to work from when deciding what to do next.

A great place to start this process is to examine the percentile distribution of measurement results. Percentiles are known to anyone who has children. After a visit to the pediatrician, someone will likely state that “My son/daughter is in the XXth percentile of his/her age group for height/weight/tantrums/etc”. This means that XX% of the population of children that age, as recorded by pediatricians, report values at or below the same value for this same metric.

Percentiles are great for a population of results like Web performance measurement data. Using only a small set of values, anyone can quickly see how many visitors to a site could be experiencing poor performance.

If at the median (50th percentile), the measured business process is 3.0 seconds, this means that 50% of all of the measurements looked at are being completed in 3.0 seconds or less.

If the executive then looks up to the 90th percentile and sees that it’s at 16.0 seconds, it can be quickly determined that something very bad has happened to affect the response times collected for the 40% of the population between these two points. Immediately, everyone knows that for some reason, an unacceptable number of visitors are likely experiencing degraded and unpredictable performance when they visit the site.
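The comparison between the median and the 90th percentile can be sketched in a few lines of Python. This is a minimal illustration only: the response times are made-up values chosen to match the numbers above, and the nearest-rank `percentile` helper is an assumption of this sketch, not the method any particular measurement provider uses.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile: the smallest measurement that has at
    least pct% of all measurements at or below it."""
    if not values:
        raise ValueError("no measurements supplied")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical response times (seconds) for one business process.
response_times = [2.1, 2.4, 2.6, 2.8, 3.0, 3.4, 5.2, 9.8, 16.0, 41.5]

median = percentile(response_times, 50)
p90 = percentile(response_times, 90)
print(f"median={median:.1f}s  90th={p90:.1f}s  gap={p90 - median:.1f}s")
```

A large gap between the two values is the signal to go and look at what is happening to the measurements in between.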

A suggestion for enhancing averages with percentiles is to use the 90th percentile value as a trim ceiling for the average. The untrimmed and trimmed averages can then be compared side by side. For sites with a larger number of response time outliers, the average will decrease dramatically when it is trimmed, while sites with more consistent measurement results will find their average response time is similar with and without the trimmed data.
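A rough sketch of this trimmed average in Python follows. Both data sets are invented for illustration, and `trimmed_mean` and the nearest-rank `percentile` helper are hypothetical names; other percentile definitions would shift the ceiling slightly.

```python
import math
from statistics import mean

def percentile(values, pct):
    """Nearest-rank percentile (an assumption of this sketch)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def trimmed_mean(values, ceiling_pct=90):
    """Average only the measurements at or below the chosen percentile."""
    ceiling = percentile(values, ceiling_pct)
    return mean(v for v in values if v <= ceiling)

# Hypothetical response times in seconds.
consistent = [1.0, 1.1, 1.2, 1.0, 1.1, 1.2, 1.0, 1.1, 1.2, 1.3]
heavy_tail = [1.0, 1.1, 1.2, 1.0, 1.1, 1.2, 1.0, 1.1, 1.2, 40.0]

for name, data in (("consistent", consistent), ("heavy tail", heavy_tail)):
    print(f"{name}: untrimmed={mean(data):.2f}s  trimmed={trimmed_mean(data):.2f}s")
```

With these made-up numbers, the consistent site barely moves when trimmed, while the heavy-tailed site drops sharply, which is the side-by-side contrast described above.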

It is also critical to examine the application’s response times and success rates throughout defined business cycles. A single response time or success rate value eliminates

  • variations by time of day
  • variations by day of week
  • variations by month
  • variations caused by advertising and marketing

An average is just an average. If at peak business hours, response times are 5.0 seconds slower than the average, then the average is meaningless, as business is being lost to poor performance that has been hidden by the focus on the single metric.
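One way to expose the variation a single average hides is to group the same measurements by hour of day. The sketch below assumes measurements arrive as (timestamp, seconds) pairs; the data and the `mean_by_hour` name are hypothetical, and the same grouping could be done by day of week or by month.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical (timestamp, response time in seconds) measurements.
measurements = [
    (datetime(2024, 1, 8, 3, 0), 1.9),
    (datetime(2024, 1, 8, 3, 30), 2.1),
    (datetime(2024, 1, 8, 9, 0), 2.0),
    (datetime(2024, 1, 8, 13, 0), 8.8),
    (datetime(2024, 1, 8, 13, 30), 9.2),
    (datetime(2024, 1, 8, 22, 0), 2.0),
]

def mean_by_hour(samples):
    """Average response time for each hour of the day that has data."""
    buckets = defaultdict(list)
    for timestamp, seconds in samples:
        buckets[timestamp.hour].append(seconds)
    return {hour: mean(values) for hour, values in sorted(buckets.items())}

overall = mean(seconds for _, seconds in measurements)
by_hour = mean_by_hour(measurements)
for hour, avg in by_hour.items():
    print(f"{hour:02d}:00  {avg:.1f}s  ({avg - overall:+.1f}s vs overall average)")
```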

All of these items have also fallen prey to their own curse of the single metric. All of the items discussed above aggregate the response time of the business process into a single metric. The process of purchasing items online is broken down into discrete steps, and different parts of this process likely take longer than others. And one step beyond the discrete steps are the objects and data that appear to the customer during these steps.

It is critical to isolate the performance for each step of the process to find the bottlenecks to performance. Then the components in those steps that cause the greatest response time or success rate degradation must be identified and targeted for performance improvement initiatives. If there are one or two poorly performing steps in a business process, focusing performance improvement efforts on these is critical, otherwise precious resources are being wasted in trying to fix parts of the application that are working well.
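As a sketch of isolating the slow step, the following Python ranks each step of a process by its average response time and its share of the total. The step names and timings are hypothetical, as is the `rank_steps` helper.

```python
from statistics import mean

# Hypothetical per-step response times (seconds) for an online purchase.
step_timings = {
    "home": [0.8, 0.9, 1.0],
    "search": [1.2, 1.1, 1.3],
    "add_to_cart": [0.9, 1.0, 1.1],
    "checkout": [6.5, 7.0, 7.5],
}

def rank_steps(timings):
    """Return (step, average seconds, share of total), slowest step first."""
    averages = {step: mean(values) for step, values in timings.items()}
    total = sum(averages.values())
    ranked = sorted(averages.items(), key=lambda item: item[1], reverse=True)
    return [(step, avg, avg / total) for step, avg in ranked]

for step, avg, share in rank_steps(step_timings):
    print(f"{step:12s} {avg:4.1f}s  {share:5.1%} of the process")
```

The same ranking could be repeated one level down, on the objects loaded within the slowest step.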

In summary, a single metric provides a sense of false confidence, the sense that the application can be counted on to deliver response times and success rates that are nearly the same as those simple, single metrics.

The average provides a middle ground, a line that says this is the approximate mid-point of the measurement population. There are measurements above and below this average, and you have to plan around the peaks and valleys, not the open plains. It is critical never to fall victim to the attractive charms that come with the curse of the single metric.

Web Performance Concepts Series – Revisited

Two years ago I created a series of five blog articles, aimed at both business and technical readers, with the goal of explaining the basic statistical concepts and methods I use when analyzing Web performance data in my role as a Web performance consultant.

Most of these ideas were core to my thinking when I developed GrabPERF in 2005-2006, as I determined that it was vital that people not only receive Web performance measurement data for their site, but they receive it in a way that allows them to inform and shape the business and technical decisions they make on a daily basis.

While I come from a strong technical background, it is critical to be able to present the data that I work with in a manner that can be useful to all components of an organization, from the IT and technology leaders who shape the infrastructure and design of a site, to the marketing and business leaders who set out the goals for the organization and interact with customers, vendors and investors.

Providing data that helps negotiate the almost religious dichotomy that divides most organizations is crucial to providing a comprehensive Web performance solution to any organization.

These articles form the core of an ongoing series of discussions focused on the pitfalls of Web performance analysis, and how to learn from and avoid the errors others have already discovered.

The series went over like a lead balloon and this left me puzzled. While the basic information in the articles was technical and focused on the role that simple statistics play in affecting Web performance technology and business decisions inside an organization, they formed the core of what I saw as an ongoing discussion that organizations need to have to ensure that an organization moves in a single direction, with a single purpose.

I have decided to reintroduce this series, dredging it from the forgotten archives of this blog, to remind business and IT teams of the importance of the Web performance data they use every day. It also serves as a guide to interpreting the numbers that arise from all the measurement methodologies that companies use, a map to extract the most critical information in the raging sea of data.

The five articles are:

  1. Web Performance, Part I: Fundamentals
  2. Web Performance, Part II: What are you calling average?
  3. Web Performance, Part III: Moving Beyond Average
  4. Web Performance, Part IV: Finding The Frequency
  5. Web Performance, Part V: Baseline Your Data

I look forward to your comments and questions on these topics.

Web Performance: Optimizing Page Load Time

Aaron Hopkins posted an article detailing all of the Web performance goodness that I have been advocating for a number of years.

To summarize:

  • Use server-side compression
  • Set your static objects to be cacheable in browser and proxy caches
  • Use keep-alives / persistent connections
  • Turn your browsers’ HTTP pipelining feature on
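The three server-side items in this list can be spot-checked from a response's HTTP headers. The Python sketch below is a deliberately simple heuristic under stated assumptions: `audit_response_headers` is a hypothetical helper, the header dictionaries are invented, and real caching behavior depends on more directives than are checked here.

```python
def audit_response_headers(headers):
    """Return a list of findings based on a dict of response headers."""
    h = {name.lower(): value.lower() for name, value in headers.items()}
    findings = []

    # Server-side compression shows up as a Content-Encoding header.
    if h.get("content-encoding") not in ("gzip", "deflate", "br"):
        findings.append("no server-side compression")

    # Treat an explicit lifetime (max-age or Expires) as cacheable,
    # unless the response also forbids storage.
    cache_control = h.get("cache-control", "")
    has_lifetime = "max-age" in cache_control or "expires" in h
    if not has_lifetime or "no-store" in cache_control:
        findings.append("not cacheable")

    # HTTP/1.1 connections persist unless the server says otherwise.
    if h.get("connection") == "close":
        findings.append("persistent connections disabled")

    return findings

# Hypothetical headers for a well-configured and a poorly configured object.
good = {"Content-Encoding": "gzip", "Cache-Control": "public, max-age=86400",
        "Connection": "keep-alive"}
poor = {"Cache-Control": "no-store", "Connection": "close"}
print(audit_response_headers(good))
print(audit_response_headers(poor))
```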

These ideas are not new, and neither are the findings in his study. To someone who has worked in the Web performance field for nearly a decade, these are old hat. However, it’s always nice to have someone new inject some life back into the discussion.

Web Performance, Part IV: Finding The Frequency

In the last article, I discussed the aggregated statistics used most frequently to describe a population of performance data.
[Figure: aggregated statistics for the sample measurement population]
The pros and cons of each of these aggregated values has been examined, but now we come to the largest single flaw: these values attempt to assign a single value to describe an entire population of numbers.

The only way to describe a population of numbers is to do one of two things: Display every single datapoint in the population against the time it occurred, producing a scatter plot; or display the population as a statistical distribution.

The most common type of statistical distribution used in Web performance data is the Frequency Distribution. This type of display breaks the population down into measurements of a certain value range, then graphs the results by comparing the number of results in each value container.
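A frequency distribution of this kind can be sketched in Python as follows. The measurements and the half-second bucket width are illustrative assumptions, not the population used in this series, and `frequency_distribution` is a hypothetical helper name.

```python
from collections import Counter

def frequency_distribution(values, bucket_width=0.5):
    """Count measurements per value range, keyed by the range's lower bound."""
    counts = Counter(int(v // bucket_width) for v in values)
    return {round(i * bucket_width, 3): counts[i] for i in sorted(counts)}

# Hypothetical response times (seconds), including one far-out measurement.
samples = [0.7, 0.9, 0.9, 1.0, 1.0, 1.1, 1.2, 1.2, 1.4, 2.6, 40.0]

distribution = frequency_distribution(samples)
for lower_bound, count in distribution.items():
    upper_bound = lower_bound + 0.5
    print(f"{lower_bound:5.1f} to {upper_bound:5.1f}s  {'#' * count}")
```

Only the non-empty buckets are printed, so the long gap before the last measurement is compressed; a plotted chart would show that gap as the tail.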

So, taking the same population data used in the aggregated data above, the frequency distribution looks like this.
[Figure: frequency distribution of the sample measurement population]
This gives a deeper insight into the whole population, by displaying the whole range of measurements, including the heavy tail that occurs in many Web performance result sets. Please note that a statistical heavy tail is essentially the same as Chris Anderson’s long tail, but in statistical analysis, a heavy tail represents a non-normally distributed data set, and skews the aggregated values you try and produce from the population.

As was noted in the aggregated values, the ‘average’ performance likely falls between 0.88 and 1.04 seconds. Now, when you take these values and compare them to the frequency distribution, these values make sense, as the largest concentration of measurement values falls into this range.

However, the 85th Percentile for this population is at 1.20 seconds, where there is a large secondary bulge in the frequency distribution. After that, there are measurements that trickle out into the 40-second range.

As can be seen, a single aggregated number cannot represent all of the characteristics in a population of measurements. They are good representations, but that’s all they are.

So, to wrap up this flurry of a visit through the world of statistical analysis and Web performance data, always remember the old adage: Lies, Damn Lies, and Statistics.
In the next article, I will discuss the concept of performance baselining, and how this is the basis for Web performance evolution.

Web Performance, Part II: What are you calling average?

For a decade, the holy grail of Web performance has been a low average performance time. Every company wants to have the lowest time, in some kind of chest-thumping, testosterone-pumped battle for supremacy.

Well, I am here to tell you that the numbers you have been using for the last decade have been lying. Well, lying is perhaps too strong a term. Deeply misleading is perhaps the more accurate way to describe the way that an average describes a population of results.
Now before you call your Web performance monitoring and measurement firms and tear a strip off them, let’s look at the facts. The numbers that everyone has been holding up as the gospel truth have been averages, or, more correctly, Arithmetic Means. We all learned these in elementary school: the sum of X values divided by X produces a value that approximates the average value for the entire population of X values.

Where could this go wrong in Web performance?

We wandered off course in a couple of fundamental ways. The first is based on the basic assumption of Arithmetic Mean calculations, that the population of data used is more or less Normally Distributed.

Well folks, Web performance data is not normally distributed. Some people are more stringent than I am, but my running assumption is that in a population of measurements, up to 15% are noise resulting from “stuff happens on the Internet”. This outer edge of noise, or outliers, can have a profound skewing effect on the Arithmetic Mean for that population.

“So what?”, most of you are saying. Here’s the kicker: As a result of this skew, the Arithmetic Mean usually produces a Web performance number that is higher than the real average of performance.
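The skew is easy to reproduce. In this small Python sketch, a single outlier is added to an otherwise steady set of made-up response times, and the Arithmetic Mean is compared with the median.

```python
from statistics import mean, median

# Hypothetical response times in seconds for a steady site.
typical = [0.9, 1.0, 1.0, 1.1, 1.0, 0.9, 1.1, 1.0, 1.0]

# The same population plus one "stuff happens on the Internet" outlier.
with_noise = typical + [38.0]

print(f"steady:     mean={mean(typical):.2f}s  median={median(typical):.2f}s")
print(f"with noise: mean={mean(with_noise):.2f}s  median={median(with_noise):.2f}s")
```

One noisy measurement in ten is enough to pull the mean far above what nine out of ten measurements actually recorded, while the median stays where it was.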

So why do we use it? Simple: Relational databases are really good at producing Arithmetic Means, and lousy at producing other statistical values. Short of writing your own complex function, which on most database systems equates to higher compute times, the only way to produce more accurate statistical measures is to extract the entire population of results and produce the result in external software.
If you are building an enterprise-class Web performance measurement reporting interface, and you want to calculate other statistical measures, you had better have deep pockets and a lot of spare computing cycles, because these multi-million row calculations will drain resources very quickly.
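The extract-and-compute approach described above can be sketched with Python's built-in `sqlite3` module. SQLite is used here only because it ships with Python; the table name, column name, and values are all hypothetical, and the tiny data set says nothing about the cost at multi-million row scale.

```python
import sqlite3
from statistics import median

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (response_seconds REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?)",
    [(value,) for value in (0.9, 1.0, 1.0, 1.1, 1.2, 35.0)],
)

# The Arithmetic Mean is a single built-in aggregate in SQL.
(sql_mean,) = conn.execute(
    "SELECT AVG(response_seconds) FROM measurements"
).fetchone()

# For the median, this sketch pulls every row out of the database
# and computes the value in external software.
rows = [row[0] for row in conn.execute(
    "SELECT response_seconds FROM measurements"
)]
external_median = median(rows)
conn.close()

print(f"mean from SQL: {sql_mean:.2f}s  median computed externally: {external_median:.2f}s")
```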

So, for most people, the Arithmetic Mean is the be all and end all of Web performance metrics. In the next part of this series, I will discuss how you can break free of this madness and produce values that are truer representations of average performance.

Copyright © 2024 Performance Zen
