New Server – New Fun!

This is the amazing “new” server that this blog is hosted on. Using a dynamic DNS service and Ubuntu on a 9-year-old MacBook, I have taken control of my blog again, trying to revive it from the untouched archive it has been for the last few years.

[Oh, and if you have arrived here from a Newest Industry link, don’t panic. Just search for the article you want – they’re all here.]

I’ve just recently changed companies (through an acquisition), so I will have new web performance items to hook into as I get a chance to work with new customers.

All raise your glass – PerformanceZen is back.

Performance Trends for ???? – Smarter Systems

[Image: IF-repair by Yo Mostro (Flickr)]

Most of the trending items that I have discussed in the last two weeks are things that can be done today – problems we are aware of and know need to be resolved. One item on my trend list, the appearance of smarter performance measurement systems, is something the WPO industry may have to wait a few years for.

A smarter performance measurement system is one that can learn what, when, and from where items need to be monitored by analyzing the behavior of your customers/employees and your systems. A hypothetical scenario of a smarter performance measurement system at work would be in the connection between RUM and synthetic monitoring. All of the professionals in WPO claim that these must be used together, but the actual configuration relies on humans to deliver the advantages that come from these systems. If RUM/analytics know where your customers are, what they do, and when they do it, then why can’t these same systems deploy (maybe even create and deploy!) synthetic tests to those regions automatically to capture detailed diagnostic data?
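
To make that idea concrete, here is a minimal sketch of what the RUM-to-synthetic handoff could look like. Everything in it is hypothetical – the RUM event format and the deploy_test provisioning call are stand-ins for whatever your analytics and synthetic-monitoring vendors actually expose.

```python
from collections import Counter

def top_visitor_regions(rum_events, min_share=0.05):
    """Rank the regions seen in RUM data, keeping those above a traffic-share threshold."""
    counts = Counter(event["region"] for event in rum_events)
    total = sum(counts.values())
    return [region for region, n in counts.most_common() if n / total >= min_share]

def sync_synthetic_tests(rum_events, existing_test_regions, deploy_test):
    """Deploy synthetic tests in regions where real users are but no test exists yet."""
    for region in top_visitor_regions(rum_events):
        if region not in existing_test_regions:
            deploy_test(region)  # provisioning call supplied by your monitoring vendor

# Hypothetical data: RUM sees Toronto and Sydney traffic, but only Toronto is tested today.
events = [{"region": "toronto"}] * 60 + [{"region": "sydney"}] * 40
sync_synthetic_tests(events, existing_test_regions={"toronto"},
                     deploy_test=lambda region: print(f"deploy synthetic test in {region}"))
```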

Why do measurement systems rely on us to manually configure the defaults for measurements? Why can’t we answer a short survey when we start with a system (and again every month or so after that) that helps the system determine the what/when/where/why/how of the data and information we are looking to collect, and then have the system create a set of test deployment defaults and information displays that match our requirements?

The list of questions goes on, but it doesn’t have to. Measurement systems have, for too long, been built to rely on expert humans to configure them and interpret the results. Now we have a chance to step back and ask “If we built a performance measurement system for a non-expert, what would it look like?”

More data isn’t the goal of performance measurement systems – more information is what we want.

Performance Trends 2013 – Employees are Customers too

Peter Levine from Andreessen Horowitz wrote an article on The Renaissance of Enterprise Computing yesterday that finally sprouted the seed of an idea that has been dormant at the back of my brain for a few months. While the ideas of enterprise computing and web/mobile performance seem disconnected, they’re not.

When companies begin to rely on outside services (Levine mentions Box, Google Docs, and others in his article), they hand part of their infrastructure over to an outside organization. And when they do, any performance hiccup that affects us as consumers can have a major effect on us as employees.

Even if your company decides to purchase and deploy an enterprise application within your own infrastructure or datacenters, the performance and experience your employees encounter when using it on their desktops or mobile devices can affect productivity and effectiveness in the workplace. An unmanaged (read: unmonitored) solution can shut down groups in the company for minutes or hours.

Think of the call center. No matter the industry you’re in, what increases customer calls? Slow performance or a poor experience with the web/mobile application. Now, if your employees rely on a variant of that same web application to answer questions in the call center, have you actually improved the customer experience or increased employee productivity?

Some considerations when managing, designing, or buying an enterprise application in the coming year:

  • What do your peers tell you about their experience implementing the solution or using an outside service – has it made employees more effective and efficient?
  • Are employees already using a “workaround” that makes them more effective and efficient? Why aren’t they using the internal or mandated solution?
  • Is performance and experience a driving factor in the lack of adoption of the mandated solution?
  • Do you have clear and insightful performance information that shows when employees are experiencing issues performing critical tasks? Can you clearly understand what the root cause is?
  • Are employees experiencing issues using the application in certain browsers or on certain mobile devices? How quickly can your design or your outside service respond to these issues?
  • Are you reviewing the chosen solution regularly to understand how usage is changing and how this could affect the performance of the application in the future?

Performance issues do not affect only the customers you serve. Your own employees use many of the same systems and applications in their day-to-day tasks, so a primary goal of managing these applications – whether they are developed in-house, purchased as software, or delivered as SaaS – should be to ensure that they deliver performance and an experience that encourages employees to use them.

Web Performance Trends 2013 – Third Party Services

Every site has them. Whether they’re for analytics, advertising, customer support, or CDN services, third-party services are here to stay. However, for 2013, I believe that these services will face a level of scrutiny that many have avoided up until now.

Recent performance trends indicate that while web site content has been tested and scaled to meet even the highest levels of traffic, the third-party services that these sites have come to rely on (with a few exceptions) are not yet prepared to handle the largest volumes of traffic that occur when many of their customers experience a peak on the same day.

In 2013, I see web site owners asking their third-party service providers to verify that their systems can handle the highest volumes of traffic on their busiest days, with an additional amount of overhead – I suggest 20% – available for growth and to absorb “super-spikes”. Customer experience is built on the performance of the entire site, so leaving one component of site delivery untested (and definitely unmonitored!) leaves companies exposed to brand and reputation degradation as well as performance degradation.
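
As a back-of-the-envelope way to frame that verification request, the arithmetic might look like the sketch below. The traffic figure and growth assumption are invented for illustration; substitute the provider’s real busiest-day numbers.

```python
def capacity_to_verify(peak_rps_busiest_day, expected_growth=0.25, headroom=0.20):
    """Traffic level a third-party provider should prove it can handle:
    last year's busiest day, plus expected growth, plus ~20% headroom for super-spikes."""
    return peak_rps_busiest_day * (1 + expected_growth) * (1 + headroom)

# Hypothetical: a tag that peaked at 12,000 requests/second on its busiest day.
print(round(capacity_to_verify(12_000)))  # 18000 requests/second to verify against
```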

In your own organizations, make 2013 the year you:

  • Implement tight controls over how outside content is deployed and managed
  • Implement tight change control policies that clearly describe the process for adding third-party content to your site, including the measurement of performance impacts
  • Define clear SLAs and SLOs for your third-party content providers, including the performance levels at which their content will be disabled or removed from the site (a simple check along these lines is sketched below).
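
A minimal sketch of what that last point could look like in practice – mapping each provider’s measured response times to an action based on placeholder SLO thresholds:

```python
def third_party_actions(measured_p95_ms, slo_warn_ms=500, slo_disable_ms=1500):
    """Map each third-party tag's measured 95th-percentile response time to an action.
    The thresholds here are placeholders; set them from your own SLAs/SLOs."""
    actions = {}
    for tag, p95 in measured_p95_ms.items():
        if p95 >= slo_disable_ms:
            actions[tag] = "disable"    # breach: pull the tag from the site
        elif p95 >= slo_warn_ms:
            actions[tag] = "escalate"   # warn the provider, start the SLA clock
        else:
            actions[tag] = "ok"
    return actions

print(third_party_actions({"analytics": 320, "ad-network": 1800, "chat-widget": 700}))
# {'analytics': 'ok', 'ad-network': 'disable', 'chat-widget': 'escalate'}
```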

When you speak to your third-party content and service providers about their plans for 2013, ask them to:

  • Explicitly detail how they handled traffic on their busiest days in 2012, and what they plan to do to effectively handle growth in 2013
  • Clearly demonstrate how they are invested in helping their customers deliver successful mobile sites and apps in 2013
  • Lay out how they will provide more transparent access to system performance metrics and what the goals of their performance strategy for 2013 are.

Take control of your third-party content. Don’t let it control you.

Web Performance Trends for 2013 – Performance Optimization

As we approach the end of 2012, I will be looking at a few trends that will become important in 2013. In a previous post, I identified optimization as an important performance trend to watch. It is one of the items on a performance checklist that companies can directly influence through the design and implementation of their web and mobile sites.

The key to optimization in any organization is to think of objects transmitted to customers, regardless of where they originate, as having a cost to you and to the customer. So, a site that makes $100,000 in a day and transfers 10 million objects to customers has an object-to-revenue ratio of 100. But if the site is optimized and only 7.5 million objects are transferred to make $100,000, that ratio goes down to 75; and if the reduction in objects causes revenue to go up to $150,000, the ratio drops to 50.
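
Written out as code, the arithmetic from that example is nothing more than this (the object counts and dollar figures are simply the ones used above):

```python
def object_to_revenue_ratio(objects_delivered, revenue):
    """Objects transferred per dollar of revenue; lower is better."""
    return objects_delivered / revenue

print(object_to_revenue_ratio(10_000_000, 100_000))  # 100.0 - the starting point
print(object_to_revenue_ratio(7_500_000, 100_000))   # 75.0  - after trimming objects
print(object_to_revenue_ratio(7_500_000, 150_000))   # 50.0  - fewer objects, more revenue
```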

This approach is simplistic and does not include the actual cost to deliver each object, which includes costs for bandwidth, CDN services, customer service providers, etc. as well as revenue generated by third-party ads and services you present to customers. The act of balancing the cost of the site (to develop and manage), the performance you measure, the revenue you generate, the experience your customers have, and the reputation of your brand is an ongoing process that must be closely considered every time someone asks, “And if we add this to the site/app…”.

There is no optimal figure for site optimization. But there are some simple rules:

  • Use sprites where you can. Combining multiple small images into aggregated image maps that you can display with CSS gives you a double-plus good improvement – fewer objects to download and more text (HTML, JavaScript, and CSS) that can be delivered to visitors in a compressed format
  • Combine JavaScript and CSS files. Listen to your designers – they will likely try to convince you that each file needs to be separate for some arcane reason. Listen, and then ask if this is the most efficient way to deploy this particular function or formatting. Ask the developer to produce a cost/benefit analysis of doing it their way versus using something that is already in place
  • Control your third-party services. This means having a sane method for managing these services, and shutting them off if necessary. Have every team that is responsible for the site meet to approve (or deny) the addition of new third-party services. And anyone who wants a new one had better come with a strong cost/benefit analysis.

Optimization is the act of making the sites you create as effective and efficient as the business you run. No matter how “low” the cost to operate a web site is, each object on a site can cost the company more money than it is worth in revenue. And if that object slows the site down, it could turn a profitable transaction into a lost customer.

Effective Web Performance – Will A CDN Really Solve The Problem?

Content Delivery Networks (CDNs) have been available for over a decade, with companies leaning on these distributed networks to accelerate the delivery of their content, absorb the load of the peak traffic periods, and help deal with the problem of geography. During my career I have helped companies assess and evaluate CDN solutions so that they understood the performance improvement benefit they would receive for the money they were spending.

But something bothered me during these evaluations. I felt that companies were rushing into the decision because it was the “right” thing to do, believing that their site could only get faster if they used a CDN. This rush often left questions unasked and unanswered.

In this post, I want to provide 4 questions that you should ask before deploying a CDN, questions that will help you ensure that a CDN is the performance improvement solution you need right now.

Are the performance issues due to slow application components?

If the performance issue can be tracked down to application elements, then a CDN is likely not the immediate starting point for you and your team. If generating dynamic content or database lookups are causing the performance issue, this is on you, in your datacenter.

CDNs do not make slow applications run faster; CDNs move bits to customers faster and more efficiently.

How are we measuring page response times?

If you are measuring response times using the load time of the entire page, then you may be inflating your response times by including all of the third-party content on the page. You may want to set up parallel measurements that allow you to compare the full page with a measurement that includes only the content you are directly responsible for managing. If you find that third-party content is slowing down your pages, then a CDN can’t help you.
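
One low-effort way to approximate that parallel measurement is to split an existing waterfall by domain. The sketch below assumes a standard HAR export and treats any host that isn’t yours as third party; because requests overlap, the summed times are a rough signal rather than a true page load time.

```python
import json
from urllib.parse import urlparse

def first_vs_third_party_time(har_path, first_party_hosts):
    """Sum per-request times (ms) in a HAR file, split into first- and third-party buckets.
    Requests run in parallel, so treat the totals as a relative signal, not a load time."""
    with open(har_path) as f:
        entries = json.load(f)["log"]["entries"]
    first = third = 0.0
    for entry in entries:
        host = urlparse(entry["request"]["url"]).hostname or ""
        if any(host == h or host.endswith("." + h) for h in first_party_hosts):
            first += entry["time"]
        else:
            third += entry["time"]
    return first, third

# Hypothetical usage, given a HAR file saved from the browser's dev tools:
# first, third = first_vs_third_party_time("homepage.har", {"example.com"})
```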

Full page load times only give you one performance perspective. Modern performance measurement teams need to understand how long it takes for a page to be ready and usable for the customer – the perceived rendering or page ready time. What you could discover is that customers can do 90% of what they want on your page well before all of the content fully loads. If this is true, then the perceived load time may be enough for your company to determine that a CDN isn’t necessary right now.

Have we done everything to optimize our own performance?

Often companies choose the easy way (CDN) to solve a hard problem (improve performance). Unfortunately, the easy way can be more expensive than taking the time to design pages and content that ensure long-term and sustainable performance. A CDN could mask a bad design until it slows down so much that even the CDN can no longer help.

Measure your page and challenge the devops team to tweak the existing design until they don’t think they can get any more performance out of it. Then, during your CDN evaluations, set up measurements that compare the performance of origin and CDN-delivered pages. You might find that making the pages as fast as you possibly can before purchasing the services of a CDN makes the difference between the origin and CDN times less dramatic than it would have been before optimization.
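
Even a crude origin-versus-CDN comparison can be scripted while you wait for the formal measurement setup. This sketch uses the third-party requests library, and the hostnames are placeholders; a real evaluation should use your measurement vendor’s agents from multiple geographies.

```python
import statistics
import requests  # third-party: pip install requests

def median_fetch_seconds(url, samples=10):
    """Median time to fetch a URL from this machine - a crude stand-in for a real measurement."""
    return statistics.median(
        requests.get(url, timeout=30).elapsed.total_seconds() for _ in range(samples)
    )

# Placeholder URLs: one pointing at the origin directly, one at the CDN-fronted hostname.
# origin = median_fetch_seconds("https://origin.example.com/product-page")
# edge = median_fetch_seconds("https://www.example.com/product-page")
# print(f"origin {origin:.2f}s vs CDN {edge:.2f}s")
```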

Can we effectively estimate if the CDN cost will be offset by an increase in revenue?

CDNs don’t come cheap, so choosing to deploy a CDN had better pay for itself over time. Challenge the CDNs you are evaluating to present you with ROI calculations that show how their service more than pays for itself, and case studies (or customers) that prove that the cost of the CDN was offset by a business metric that can be easily tracked.
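
A starting point for that challenge is a simple break-even check like the one below; the cost, revenue, and claimed conversion lift are all made-up numbers you would replace with the vendor’s figures and your own.

```python
def cdn_payoff(monthly_cdn_cost, monthly_revenue, claimed_revenue_lift):
    """Does the vendor's claimed revenue lift cover the CDN bill?
    claimed_revenue_lift is fractional, e.g. 0.01 for a 1% lift - treat it as a claim to verify."""
    lift = monthly_revenue * claimed_revenue_lift
    return lift - monthly_cdn_cost, lift / monthly_cdn_cost

# Hypothetical: $15k/month CDN, $2M/month revenue, vendor case studies claim a 1% lift.
net, coverage = cdn_payoff(15_000, 2_000_000, 0.01)
print(net, round(coverage, 2))  # 5000.0 1.33 -> the claimed lift covers the bill 1.33x over
```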

Summary

Don’t misunderstand the questions I am posing here: I am still a strong advocate for the use of CDNs. However, the days of companies simply purchasing the services of a CDN because it’s the “right thing to do” are long over.

As companies evolve their perspective on web and mobile performance, they need to ensure they have done everything they can to make their own applications faster. Once the hard work of tuning and optimization is complete, the process of choosing a CDN must include deeper, more probing questions about the performance and business benefits that come along with the service.

 

Effective Web Performance: The Culture of Performance

A quote from Avinash Kaushik (Occam’s Razor and @avinashkaushik) to start this post.

I have a 10/90 rule. If your budget is $100 then spend $10 on tools and professional services to implement them, and spend $90 on hiring people to analyze data you collect on your website.

The web is quite complex, you are going to access multiple sources of data, you are going to have to do a lot of leg work. Blood, sweat and tears. You don’t just need tools for that (remember 85% of the data you get from any tool, free or paid is essentially the same). You need people!

Hire the best people you can find, tools will never be a limitation for them.

from This I Believe [A Manifesto for Web Marketers & Analysts]

Staring at this as I sipped my coffee stopped me dead.

Beside me I have two full pages of notes on what makes up the Web performance culture of a company, and here is one of the most succinct points summed up for me in two short paragraphs.

Web performance is not just about tools and methodologies. Effective Web performance requires dedicated and trained human resources. And those people need to be able to work in a culture that values and understands the importance of Web performance to the business. Without a culture of Web performance, any tool, technology, and methodology purchased to make things better is useless.

In a previous post I touched on the question of whether an organization sees Web performance as a technology or business issue. Answering this question is key to understanding a company’s perspective on Web performance issues.

Start by asking “Who is responsible for Web performance?” at a company. Is there a cross-functional team that meets regularly to discuss current performance, long-term trends, the competitive landscape, effects on customer experience, and how performance concerns are shaping and guiding upcoming development efforts?

Or is Web performance a set of anonymous charts and tables that have no context, originating from the inscrutable measurement system, bundled up into an executive report by an unnamed staff member for a once-a-month meeting?

Most companies understand Web performance is crucial. They understand it affects the bottom line and customer experience. They understand all of the ideas and concepts of Web performance. But like the proverbial horse and water, they don’t drink from the stream in front of them. They don’t drink because they are too busy watching for cougars, wolverines, and poachers. They have too much going on to make Web performance a priority.

Part of developing a strong culture of Web performance is creating a business culture that is customer-centric. When a company turns their perspective around and makes delighting the customer a part of everything they do, the customer experience on the Web becomes a critical component of the culture.

The key to making Web performance a part of a customer-centric culture is to shift Web performance discussions from the abstract (full of numbers and charts representing the potential of Web performance to affect customers) to the real (effect of Web performance on towns and cities and people and the bottom line). Attaching a name, a place, or a value to every number on a Web performance chart makes it easier for people in an organization to absorb the effect it has on them as an employee.

Moving the discussion about Web performance from the testing lab and NOC to the breakroom and the hallway takes a greater effort. It starts by making Web performance data available to all, not just those who are tasked with monitoring it.

A culture of Web performance means that the $90 you spent on people is supplemented by a team of avid amateurs who notice changes and trends that may slip through the cracks. These amateurs are encouraged to participate in Web performance discussions, where the experts are encouraged to listen then contribute.

Why listen to avid amateurs? In many cases, they are the people who work directly with customers and use the products on a daily basis. Their feedback comes from real experience, set alongside abstract values. Once a measurement has a story, it makes it easier to understand the problem.

An example of the success of amateurs is Wikipedia. A population of amateur contributors, alongside a core of experts in certain fields, has ensured that it is a useful resource. A Web performance culture full of avid amateurs allows comments and stories to flow from the customer-centric parts of an organization into the technology and business parts. These stories and inputs make Web performance more real, and make a chart in a report more important.

A culture of Web performance is one that is adopted by an entire company. It is a way of examining the reality of a site that is customer-centric and customer-driven. A strong Web performance culture absorbs information from many sources, filters the data through a customer lens, and makes every measurement count.

Web Performance: How long can you ignore the money?

Web performance is everywhere. People intuitively understand that when a site is slow, something’s wrong. Web performance breeds anecdotal tales of lost carts, broken catalogs, and searches gone wrong. Web performance can get your name in lights, but not in the way you or your company would like.

It’s a mistake to consider Web performance a technology problem. Web performance is really a business problem that has a technological solution.

Business problems have solutions that any mid-level executive can understand. A site that can’t handle the amount of traffic coming in requires tuning and optimization, not the firing of the current VP of Operations and a new marketing strategy.

Can you imagine the fate of the junior executive who suggested that a new marketing strategy was the solution to brick-and-mortar stores that are too small and crowded to handle the number of prospective customers (or former prospective customers) coming in the door?

Every Web performance event costs a company money, in the present and in the future. So when someone presents your company with the reality of your current Web performance, what is your response?

Some simple ideas for living with the reality that poor Web performance hurts business:

  1. Be able to explain the issue to everyone in the company and to customers who ask. Gory details and technical mumbo-jumbo make people feel like there is something being hidden from them. Tell the truth, but make it clear what happened.
  2. Do not blame anyone in public. A great way to look bad to everyone is to say that someone else caused the problem. Guess what? All that the people who visited your site during the problem will remember is that your site had the problem. Save frank discussions for behind closed doors.
  3. Be able to explain to the company what the business cost was. While everyone is pointing fingers inside your company, remind them that the outage cost them $XX/minute. Of course, you can only tell them that if you know what that number is (a simple calculation is sketched after this list). Then gently remind everyone that this is what it cost the whole company.
  4. Take real action. I don’t mean things like “We will be conducting an internal review of our processes to ensure that this is not repeated”. I mean things like listening and understanding what technology or business process failed and got you into this position in the first place. Was it someone just hitting the wrong switch? Or was it a culture of denial that did not allow the reality of Web performance to filter up to levels where real change could be implemented?
  5. Demand quantitative proof that this will never happen again. Load test. Monitor. Measure. Correlate data from multiple sources. Decide how Web performance information will be communicated inside your company. Make the data available so people can ask questions. Be prepared to defend your decisions with real information.
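
For point 3, the calculation does not need to be elaborate to be useful; a sketch along these lines, with every input replaced by your own numbers, is enough to make the cost concrete:

```python
def event_cost(revenue_per_minute, duration_minutes, lost_fraction=1.0):
    """Rough cost of a performance event: revenue that didn't happen while it lasted.
    lost_fraction models a slowdown (partial loss) rather than a full outage."""
    return revenue_per_minute * duration_minutes * lost_fraction

# Hypothetical: a site earning $800/minute, degraded for 45 minutes, losing ~40% of conversions.
print(event_cost(800, 45, lost_fraction=0.4))  # 14400.0
```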

The most successful Web companies have done one thing very well. It is the core of their success and it is what makes them ruthlessly strive for Web performance excellence.

These companies understood that in order to succeed they needed to create a culture where business performance and Web performance are the same thing.

Web Performance, Part IX: Curse of the Single Metric

While this post is aimed at Web performance, the curse of the single metric affects our everyday lives in ways that we have become oblivious to.

When you listen to a business report, the stock market indices are an aggregated metric used to represent the performance of a set group of stocks.

When you read about economic indicators, these values are the aggregated representations of complex populations of data, collected from around the country, or the world.

Sport scores are the final tally of an event, but they may not always represent how well each team performed during the match.

The problem with single metrics lies in their simplicity. When a single metric is created, it usually attempts to factor in all of the possible and relevant data to produce an aggregated value that can represent a whole population of results.

These single metrics are then portrayed as a complete representation of this complex calculation. The presentation of the single metric is usually done in such a way that its compelling simplicity is accepted as the truth, rather than as a representation of a truth.

In the area of Web performance, organizations have fallen prey to this need for the compelling single metric – the need to represent a very complex process in terms that can be quickly absorbed and understood by as large a group of people as possible.

The single metrics most commonly found in the Web performance management field are performance (end-to-end response time of the tested business process) and availability (success rate of the tested business process). These numbers are then merged and transformed with data from a number of sources (external measurements, hit counts, conversions, internal server metrics, packet loss), and this information is bubbled up in an organization. By the time senior management and decision-makers receive the Web performance results, they are likely several steps removed from the raw measurement data.

An executive will tell you that information is a blessing, but only when it speeds, rather than hinders, the decision-making process. A Web performance consultant (such as myself) will tell you that basing your decisions on a single metric created out of a complex population of data is madness.

So, where does the middle ground lie between the data wonks and the senior leaders? The rest of this post is dedicated to introducing a few metrics that, as a small set, give senior leaders better information to work from when deciding what to do next.

A great place to start this process is to examine the percentile distribution of measurement results. Percentiles are known to anyone who has children. After a visit to the pediatrician, someone will likely state that “My son/daughter is in the XXth percentile of his/her age group for height/weight/tantrums/etc”. This means that XX% of the population of children that age, as recorded by pediatricians, report values at or below the same value for this same metric.

Percentiles are great for a population of results like Web performance measurement data. Using only a small set of values, anyone can quickly see how many visitors to a site could be experiencing poor performance.

If at the median (50th percentile), the measured business process is 3.0 seconds, this means that 50% of all of the measurements looked at are being completed in 3.0 seconds or less.

If the executive then looks up to the 90th percentile and sees that it’s at 16.0 seconds, it can be quickly determined that something very bad has happened to affect the response times collected for the 40% of the population between these two points. Immediately, everyone knows that for some reason, an unacceptable number of visitors are likely experiencing degraded and unpredictable performance when they visit the site.

A suggestion for enhancing averages with percentiles is to use the 90th percentile value as a trim ceiling for the average. The untrimmed and trimmed averages can then be compared side by side. For sites with a large number of response time outliers, the average will decrease dramatically when it is trimmed, while sites with more consistent measurement results will find their average response time is similar with and without the trimmed data.
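
A minimal sketch of both ideas – nearest-rank percentiles and a 90th-percentile-trimmed average – using made-up response times:

```python
import statistics

def percentile(values, pct):
    """Nearest-rank percentile: the value at or below which pct% of measurements fall."""
    ordered = sorted(values)
    index = max(0, int(round(pct / 100 * len(ordered))) - 1)
    return ordered[index]

def trimmed_mean(values, ceiling_pct=90):
    """Average of the measurements at or below the given percentile ceiling."""
    ceiling = percentile(values, ceiling_pct)
    return statistics.mean(v for v in values if v <= ceiling)

# Hypothetical response times in seconds, with a heavy tail.
times = [2.1, 2.4, 2.8, 3.0, 3.1, 3.3, 3.6, 9.8, 14.2, 16.0]
print(percentile(times, 50))             # 3.1  - half the measurements finish at or below this
print(percentile(times, 90))             # 14.2 - the tail starts to show
print(round(statistics.mean(times), 2))  # 6.03 - untrimmed average, dragged up by outliers
print(round(trimmed_mean(times), 2))     # 4.92 - trimmed average, closer to the typical visit
```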

It is also critical to examine the application’s response times and success rates throughout defined business cycles. A single response time or success rate value eliminates

  • variations by time of day
  • variations by day of week
  • variations by month
  • variations caused by advertising and marketing

An average is just an average. If, at peak business hours, response times are 5.0 seconds slower than the average, then the average is meaningless: business is being lost to poor performance that the focus on a single metric has hidden.

All of these items have also fallen prey to their own curse of the single metric. All of the items discussed above aggregate the response time of the business process into a single metric. The process of purchasing items online is broken down into discrete steps, and different parts of this process likely take longer than others. And one step beyond the discrete steps are the objects and data that appear to the customer during these steps.

It is critical to isolate the performance for each step of the process to find the bottlenecks to performance. Then the components in those steps that cause the greatest response time or success rate degradations must be identified and targeted for performance improvement initiatives. If there are one or two poorly performing steps in a business process, focusing performance improvement efforts on these is critical, otherwise precious resources are being wasted in trying to fix parts of the application that are working well.
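
A sketch of that isolation step, assuming you already have per-step measurements (the step names and times here are invented):

```python
import statistics

def slowest_steps(step_times, worst_n=2):
    """Rank the steps of a business process by median response time, slowest first.
    step_times maps a step name to a list of measured response times in seconds."""
    ranked = sorted(step_times.items(),
                    key=lambda item: statistics.median(item[1]),
                    reverse=True)
    return ranked[:worst_n]

# Hypothetical checkout flow: search and payment carry most of the response time.
checkout = {
    "home":    [1.1, 1.3, 1.2],
    "search":  [4.8, 5.5, 5.1],
    "cart":    [1.9, 2.1, 2.0],
    "payment": [3.9, 4.2, 4.4],
}
for step, samples in slowest_steps(checkout):
    print(step, statistics.median(samples))  # search 5.1, then payment 4.2
```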

In summary, a single metric provides a sense of false confidence, the sense that the application can be counted on to deliver response times and success rates that are nearly the same as those simple, single metrics.

The average provides a middle ground, a line that marks the approximate mid-point of the measurement population. There are measurements above and below this average, and you have to plan around the peaks and valleys, not the open plains. It is critical never to fall victim to the attractive charms that come with the curse of the single metric.

Web Performance, Part IV: Finding The Frequency

In the last article, I discussed the aggregated statistics used most frequently to describe a population of performance data.

[Image: stats-articles – aggregated statistics for the sample measurement population]

The pros and cons of each of these aggregated values have been examined, but now we come to the largest single flaw: these values attempt to assign a single number to describe an entire population of numbers.

The only way to describe a population of numbers is to do one of two things: Display every single datapoint in the population against the time it occurred, producing a scatter plot; or display the population as a statistical distribution.

The most common type of statistical distribution used in Web performance data is the Frequency Distribution. This type of display breaks the population down into measurements of a certain value range, then graphs the results by comparing the number of results in each value container.
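
Building that kind of display is little more than bucketing the measurements; here is a minimal sketch with invented response times:

```python
from collections import Counter

def frequency_distribution(times, bucket_width=0.25):
    """Count measurements per response-time bucket (bucket_width in seconds)."""
    buckets = Counter((t // bucket_width) * bucket_width for t in times)
    return sorted(buckets.items())

# Hypothetical measurements with a heavy tail.
times = [0.9, 0.95, 1.0, 1.02, 1.1, 1.2, 1.21, 1.25, 3.8, 12.4]
for bucket_start, count in frequency_distribution(times):
    print(f"{bucket_start:5.2f}s  {'#' * count}")
```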

So, taking the same population data used in the aggregated data above, the frequency distribution looks like this.

[Image: stats-articles-frequency – frequency distribution of the sample measurement population]

This gives a deeper insight into the whole population by displaying the whole range of measurements, including the heavy tail that occurs in many Web performance result sets. Please note that a statistical heavy tail is essentially the same as Chris Anderson’s long tail, but in statistical analysis a heavy tail indicates a non-normally distributed data set, and it skews the aggregated values you try to produce from the population.

As noted with the aggregated values, the ‘average’ performance likely falls between 0.88 and 1.04 seconds. When you compare these values to the frequency distribution, they make sense, as the largest concentration of measurement values falls into this range.

However, the 85th Percentile for this population is at 1.20 seconds, where there is a large secondary bulge in the frequency distribution. After that, there are measurements that trickle out into the 40-second range.

As can be seen, a single aggregated number cannot represent all of the characteristics of a population of measurements. Aggregated values are good representations, but that’s all they are.

So, to wrap up this flurry of a visit through the world of statistical analysis and Web performance data, always remember the old adage: Lies, Damn Lies, and Statistics.

In the next article, I will discuss the concept of performance baselining, and how this is the basis for Web performance evolution.