Effective Web Performance: An Introduction and A Manifesto

Every so often, you wake up and realize that the world has changed around you. Or, to say it better, your view of the world has changed so profoundly, but also so subtly and slowly that it is imperceptible unless you take the time to look back at where you came from.
Six years ago, if you had asked me what the most important problems in Web performance were, I would have reeled off a list that was focused on technology and configuration: HTTP compression, HTTP persistent connections, caching, etc. In fact, six years on, these are still the concepts that dominate Web performance conversations.
Slowly, glacially, shaped by six years of working with customers and clients, listening to the Web performance conversations that flow across the Web and within companies, I realize that technology is only one component of the Web performance solution.

Web Performance is NOT Just Technology

Most organizations focus too much of their efforts on solving the technical problems because they are discrete, easy to track, and produce quantifiable results.
Fair enough.
But a highly tuned engine with a rusted chassis, four flat wheels, and a voided warranty still has a Web performance problem, even if it is technically sound.
Web Performance VennThe complexity of the issue arises from the terminology used. Web performance, in current parlance, refers almost completely to the delivery of the site in an appropriate and measurable manner.
Web performance is not simply the generation and delivery of HTML and other objects. Web performance is conversation that defines the basic nature of any Web site.
Approaching Web performance, as I had for so many years, as a technical problem with a discrete solution overlooks the true nature of Web performance. A culture of effective Web performance absorbs a number of different inputs, and then ensures that the site performs across many different vectors, not just the two-dimensional response/success over time graph.

Web Performance is Culture and Communication

Web performance is an issue of culture. And at the root of all cultures lies communication.
The Web performance conversation has three components, each one shaping the potential response to the problem and providing elements of the solution.

1. Technical Capabilities

Technical organizations spend a great deal of their time defining what they can’t do. In an organization that has a culture of effective Web performance, the technical teams provide clear definitions of the current capabilities, and clearly demonstrate how far they can take the organization down the chosen path, hopefully without spending all of the company’s treasure.

2. Business Objectives

Just as the technical organization has to define what they can do with what they have, the business organization has to come to the table with a clear definition of what they want to achieve. If a business goal is clearly stated to the technical team, then a conversation about where there may be challenges and opportunities can occur. When business and IT talk and listen, a company is becoming far more effective at delivering the best site they can.

3. Customer Expectations

Neglected, forgotten, nay, even ignored, the role of the customers’ expectations in the Web performance equation is just as critical as the other two participants. With clear business objectives and defined technical capabilities, a site can still be seen as a Web performance failure if the expectations of the customer are not met. And it is not simply listening to customer and providing everything they want. It’s understanding why they need a feature/function/option in order to be more successful at what they do, and balancing that against the other two players in the conversation.

Now What?

But where does an organization that wants to take Web performance beyond the technical problem, and into the realm of the strategic solution go?
Do a search on any search engine and you will find page upon page of technical solutions to a supposedly technical problem. Web performance is not solely a technical problem. In many cases, the site is configured and tweaked and tuned and accelerated to such a degree that you have to wonder if is under-performing out of spite more than any other reason.
Scratch the surface. Look beyond the shiny toys and massively-scaled infrastructure and you will find that technology is not the issue. The demand placed on the site by the business are bogging the site down in ways that no amount of tuning could improve.
Perhaps the business goals of the site, the need to support the business, have pushed the technology to its breaking point or beyond, but the technology team cannot clearly articulate what the problem or solution is.
Maybe customers, used to competitors delivering one level of Web performance and experience are simply not happy with the site, no matter how tuned it is and how clearly the call to action may be.
Making a Web site perform effectively means stepping back and asking some key questions:

  • Why do we have a site?
  • How does this site help our business?
  • Why do our customers use our site?
  • Do we like using our site?
  • What are our competitors doing?
  • What are the best Web companies doing?

These seem like silly questions. But you may be surprised by the differing answers you get.
And from there, the conversation can start.

Takeaways

Simply put, Web performance is not about understanding how to make your site faster. Web performance is about understanding what you can do to make your site betterAn effective Web site is one that is shaped by a culture of effective Web performance.
Striving to make a better, more effective Web site may lead to such profound cultural and organizational changes that the process ends up making a better company. A company where the Web site is seen as an active conversation shared with employees, shareholders, investors, and customers.
A conversation where you explain what can be done, why you are doing it, and how you will do it. A conversation where you listen to what must be done, how it is expected to work, and what the customer defines as success.
So when you wake up six years from now, and realize that the day you stopped treating your Web site as a technical problem that needed to be fixed, and started seeing it as an opportunity to create a more effective business, I hope you smile.

Effective Web Performance: Third-Party Providers (Or Why Herding Cats is Fun)

It’s a rare Web site these days that hosts all of its own content. From the smallest blog to the largest retailer, Web sites farm out their images, streams, and pages to CDNs, and absorb feeds, ads, and data streams from any number of outside providers.
Effective Web performance demands that a site take responsibility for the entire site, not just the parts under direct local management. Why? Because customers see a problem with your site, not with a provider.
How can the performance all of the third-party content on a site be managed? Using the exactly the same strategies already place to manage the performance of local content.

Measure from the outside-in

Customers come from the Internet. That measuring the performance of a site from the perspective of visitors is being mentioned here should not be a surprise. Critical to this part of managing third-parties is the ability to see into the page and determine if there are performance issues requesting and transmitting data from third-parties.
In the first article of this series, I detailed a number of approaches to actively gathering performance data. This method, whether from the datacenter or the last mile, will provide the early warning signs that there is an issue with a third-party, and feeding this data into the performance issue management plan.

Measure from inside the browser

The network and application performance of a third-party page component is just the start of the process, as this is what it takes to get the object to the browser. But what if this object then launches a number of actions, or starts to render on the screen. This may lead to a whole different range of issues that are a blind spot when analyzing Web performance.
Measuring the performance of discrete page elements from within the visitors browser will provide deeper insight into what effects the customer sees and which third-parties will need to be approached in order to improve the overall Web performance of the site.

Have clear and useful SLOs and SLAs

Service level objectives and service level agreements are often thrown about whenever there is the suspicion that there is a Web performance issue. Using these documents and frameworks as a club to beat up partners with is counter-productive.
SLOs and SLAs should clearly detail:

  • the performance expectations of the Web site owner
  • the performance and delivery capabilities of the third-party provider

Guess what? Arriving at this in a way that doesn’t lead to resentment and mistrust on both sides requires open and honest discussion.

Share data

If Web site owners and third-parties are going to work together to ensure the most effective Web performance strategy possible, then data must flow freely. Vendors will need access to the same data that Web site owners have (and vice versa) in order to ensure that if an issue is detected, everyone can examine all of the available data, and solve the problem quickly.

Communication

A recurring and critical theme when establishing a culture of effective Web performance is communication. When working with third-parties, this is even more critical, as the performance culture of one organization may be completely different from another. The Web site owner may have one site of criteria that determines a Web performance issue, while the vendor has another, and unless these are understood, problems will occur.
Clear communication paths must be baked into the SLA. Named contacts or contact paths will be there, as will expected response times for inbound requests, and escalation procedures.
When there is a performance issue, both sides will need to be very clear about how each other will respond.

Takeaways

Third-party content on Web sites is a fact. It shouldn’t be a headache. Effective Web performance measurement strategies, shared sources of Web performance data, and clearly understood paths and methods of communication will make using third-party content less stress-inducing to everyone.

Effective Web Performance: The Culture of Performance

A quote from Avinash Kaushik (Occam’s Razor and @avinashkaushik) to start this post.

I have a 10/90 rule . If your budget is $100 then spend $10 on tools and professional services to implement them, and spend $90 on hiring people to analyze data you collect on your website.

The web is quite complex, you are going to access multiple sources of data, you are going to have to do a lot of leg work. Blood, sweat and tears. You don’t just need tools for that (remember 85% of the data you get from any tool, free or paid is essentially the same). You need people!

Hire the best people you can find, tools will never be a limitation for them.

from This I Believe [A Manifesto for Web Marketers & Analysts]

Staring at this as I sipped my coffee stopped me dead.

Beside me I have two full pages of notes on what makes up the Web performance culture of company, and here is one of the most succinct points summed up for me in two short paragraphs.

Web performance is not just about tools and methodologies. Effective Web performance requires dedicated and trained human resources. And those people need to be able to work in a culture that values and understands the importance of Web performance to the business. Without a culture of Web performance, any tool, technology, and methodology purchased to make things better is useless.

In a previous post I touched on the question of whether an organization sees Web performance as a technology or business issue. Answering this question is key to understanding a company’s perspective on Web performance issues.

Start by asking Who is responsible for Web performance? at a company. Is there a cross-functional team that meets regularly to discuss current performance, long-term trends, the competitive landscape, effects on customer experience, and how performance concerns are shaping and guiding upcoming development efforts?

Or is Web performance a set of anonymous charts and tables that have no context ,originating from the inscrutable measurement system, bundled up into an executive report by an unnamed staff member for a once a month meeting?

Most companies understand Web performance is crucial. They understand it affects the bottom line and customer experience. They understand all of the ideas and concepts of Web performance. But like the proverbial horse and water, they don’t drink from the stream in front of them. They don’t drink because they are too busy watching for cougars, wolverines, and poachers. They have too much going on to make Web performance a priority.

Part of developing a strong culture of Web performance is creating a business culture that is customer-centric. When a company turns their perspective around and makes delighting the customer a part of everything they do, the customer experience on the Web becomes a critical component of the culture.

The key to making Web performance a part of a customer-centric culture is to shift Web performance discussions from the abstract (full of numbers and charts representing the potential of Web performance to affect customers) to the real (effect of Web performance on towns and cities and people and the bottom line). Attaching a name, a place, or a value to every number on a Web performance chart makes it easier for people in an organization to absorb the effect it has on them as an employee.

Moving the discussion about Web performance from the testing lab and NOC to the breakroom and the hallway takes a greater effort. It starts by making Web performance data available to all, not just those who are tasked with monitoring it.

A culture of Web performance means that the $90 you spent on people is supplemented by a team of avid amateurs who notice changes and trends that may slip through the cracks. These amateurs are encouraged to participate in Web performance discussions, where the experts are encouraged to listen then contribute.

Why listen to avid amateurs? In many cases, they are the people who work directly with customers and use the products on a daily basis. Their feedback comes from real experience, set alongside abstract values. Once a measurement has a story, it makes it easier to understand the problem.

An example of the success of amateurs is Wikipedia. A population of amateur contributors, as well as a core of experts in certain fields, have ensured that this is a useful resource. A Web performance culture full of avid amateurs allows comments and stories to flow from the customer-centric parts of an organization into the technology and business parts of the organization. These stories and inputs make the Web performance more real, and make a chart in a report more important.

A culture of Web performance is one that is adopted by an entire company. It is a way of examining the reality of a site in a way that is customer-centric and customer-driven. A strong Web performance culture absorbs information from many sources, and filters the data through a customer filter, and makes every measurement count.

Web Performance: How long can you ignore the money?

Web performance is everywhere. People intuitively understand that when a site is slow, something’s wrong. Web performance breeds anecdotal tales of lost carts, broken catalogs, and searches gone wrong. Web performance can get you name in lights, but not in the way you or your company would like.
It’s a mistake to consider Web performance a technology problem. Web performance is really a business problem that has a technological solution.
Business problems have solutions that any mid-level executive can understand. A site that can’t handle the amount of traffic coming in requires tuning and optimization, not the firing of the current VP of Operations and a new marketing strategy.
Can you imagine the fate of the junior executive who suggested that a new marketing strategy was the solution to brick-and-mortar stores that are too small and crowded to handle the number of prospective customers (or former prospective customers) coming in the door?
Every Web performance event costs a company money, in the present and in the future. So when someone presents your company with the reality of your current Web performance, what is your response?
Some simple ideas for living with the reality that Web performance hurts business.

  1. Be able to explain the issue to everyone in the company and to customers who ask. Gory details and technical mumbo-jumbo make people feel like there is something being hidden from them. Tell the truth, but make it clear what happened.
  2. Do not blame anyone in public. A great way to look bad to everyone is to say that someone else caused the problem. Guess what? All that the people who visited your site during the problem will remember is that your site had the problem. Save frank discussions for behind closed doors.
  3. Be able to explain to the company what the business cost was. While everyone is pointing fingers inside your company, remind them that the outage cost them $XX/minute. Of course, you can only tell them that if you know what that number is. Then gently remind everyone that this is what it cost the whole company.
  4. Take real action. I don’t mean things like “We will be conducting an internal review of our processes to ensure that this is not repeated”. I mean things like listening and understanding what technology or business process failed and got you into this position in the first place. Was it someone just hitting the wrong switch? Or was it a culture of denial that did not allow the reality of Web performance to filter up to levels where real change could be implemented?
  5. Demand quantitative proof that this will never happen again. Load test. Monitor. Measure. Correlate data from multiple sources. Decide how Web performance information will be communicated inside your company. Make the data available so people can ask questions. Be prepared to defend your decisions with real information.

The most successful Web companies have done thing very well. It is the core of their success and it is what makes them ruthlessly strive for Web performance excellence.
These companies understood that in order to succeed they needed to create a culture where business performance and Web performance are the same thing.

Web Performance, Part IV: Finding The Frequency

In the last article, I discussed the aggregated statistics used most frequently to describe a population of performance data.
stats-articles
The pros and cons of each of these aggregated values has been examined, but now we come to the largest single flaw: these values attempt to assign a single value to describe an entire population of numbers.
The only way to describe a population of numbers is to do one of two things: Display every single datapoint in the population against the time it occurred, producing a scatter plot; or display the population as a statistical distribution.
The most common type of statistical distribution used in Web performance data is the Frequency Distribution. This type of display breaks the population down into measurements of a certain value range, then graphs the results by comparing the number of results in each value container.
So, taking the same population data used in the aggregated data above, the frequency distribution looks like this.
stats-articles-frequency
This gives a deeper insight into the whole population, by displaying the whole range of measurements, including the heavy tail that occurs in many Web performance result sets. Please note that a statistical heavy tail is essentially the same as Chris Anderson’s long tail, but in statistical analysis, a heavy tail represents a non-normally distributed data set, and skews the aggregated values you try and produce from the population.
As was noted in the aggregated values, the ‘average’ performance like falls between 0.88 and 1.04 seconds. Now, when you take these values and compare them to the frequency distribution, these values make sense, as the largest concentration of measurement values falls into this range.
However, the 85th Percentile for this population is at 1.20 seconds, where there is a large secondary bulge in the frequency distribution. After that, there are measurements that trickle out into the 40-second range.
As can be seen, a single aggregated number cannot represent all of the characteristics in a population of measurements. They are good representations, but that’s all they are.
So, to wrap up this flurry of a visit through the world of statistical analysis and Web performance data, always remember the old adage: Lies, Damn Lies, and Statistics.
In the next article, I will discuss the concept of performance baselining, and how this is the basis for Web performance evolution.

Web Performance, Part II: What are you calling average?

For a decade, the holy grail of Web performance has been a low average performance time. Every company wants to have the lowest time, in some kind of chest-thumping, testosterone-pumped battle for supremacy.
Well, I am here to tell you that the numbers you have been using for the last decade have been lying. Well, lying is perhaps to strong a term. Deeply misleading is perhaps the more accurate way to describe the way that an average describes a population of results.
Now before you call your Web performance monitoring and measurement firms and tear a strip off them, let’s look at the facts. The numbers that everyone has been holding up as the gospel truth have been averages, or, more correctly, Arithmetic Means. We all learned these in elementary school: the sum of X values divided by X produces a value that approximates the average value for the entire population of X values.
Where could this go wrong in Web performance?
We wandered off course in a couple of fundamental ways. The first is based on the basic assumption of Arithmetic Mean calculations, that the population of data used is more or less Normally Distributed.
Well folks, Web performance data is not normally distributed. Some people are more stringent than I am, but my running assumption is that in a population of measurements, up to 15% are noise resulting from “stuff happens on the Internet”. This outer edge of noise, or outliers, can have a profound skewing effect on the Arithmetic Mean for that population.
“So what?”, most of you are saying. Here’s the kicker: As a result of this skew, the Arithmetic Mean usually produces a Web performance number that is higher than the real average of performance.
So why do we use it? Simple: Relational databases are really good at producing Arithmetic Means, and lousy at producing other statistical values. Short of writing your own complex function, which on most database systems equates to higher compute times, the only way to produce more accurate statistical measures is to extract the entire population of results and produce the result in external software.
If you are building an enterprise class Web performance measurement reporting interface, and you want to calculate other statistical measures, you better have deep pockets and a lot of spare computing cycles, because these multi-million row calculations will drain resources very quickly.
So, for most people, the Arithmetic Mean is the be all and end all of Web performance metrics. In the next part of this series, I will discuss how you can break free of this madness and produce values that are truer representations of average performance.