Category: Effective Web Performance

Effective Web Performance: The Culture of Performance

2009-08-17 / spierzchala / 1 Comment

A quote from Avinash Kaushik (Occam’s Razor and @avinashkaushik) to start this post.

I have a 10/90 rule. If your budget is $100 then spend $10 on tools and professional services to implement them, and spend $90 on hiring people to analyze data you collect on your website.

The web is quite complex, you are going to access multiple sources of data, you are going to have to do a lot of leg work. Blood, sweat and tears. You don’t just need tools for that (remember 85% of the data you get from any tool, free or paid is essentially the same). You need people!

Hire the best people you can find, tools will never be a limitation for them.

from This I Believe [A Manifesto for Web Marketers & Analysts]

Staring at this as I sipped my coffee stopped me dead.

Beside me I have two full pages of notes on what makes up the Web performance culture of a company, and here is one of the most succinct points summed up for me in two short paragraphs.

Web performance is not just about tools and methodologies. Effective Web performance requires dedicated and trained human resources. And those people need to be able to work in a culture that values and understands the importance of Web performance to the business. Without a culture of Web performance, any tool, technology, and methodology purchased to make things better is useless.

In a previous post I touched on the question of whether an organization sees Web performance as a technology or business issue. Answering this question is key to understanding a company’s perspective on Web performance issues.

Start by asking “Who is responsible for Web performance?” at a company. Is there a cross-functional team that meets regularly to discuss current performance, long-term trends, the competitive landscape, effects on customer experience, and how performance concerns are shaping and guiding upcoming development efforts?

Or is Web performance a set of anonymous charts and tables that have no context, originating from the inscrutable measurement system, bundled up into an executive report by an unnamed staff member for a once-a-month meeting?

Most companies understand Web performance is crucial. They understand it affects the bottom line and customer experience. They understand all of the ideas and concepts of Web performance. But like the proverbial horse and water, they don’t drink from the stream in front of them. They don’t drink because they are too busy watching for cougars, wolverines, and poachers. They have too much going on to make Web performance a priority.

Part of developing a strong culture of Web performance is creating a business culture that is customer-centric. When a company turns their perspective around and makes delighting the customer a part of everything they do, the customer experience on the Web becomes a critical component of the culture.

The key to making Web performance a part of a customer-centric culture is to shift Web performance discussions from the abstract (full of numbers and charts representing the potential of Web performance to affect customers) to the real (effect of Web performance on towns and cities and people and the bottom line). Attaching a name, a place, or a value to every number on a Web performance chart makes it easier for people in an organization to absorb the effect it has on them as an employee.

Moving the discussion about Web performance from the testing lab and NOC to the breakroom and the hallway takes a greater effort. It starts by making Web performance data available to all, not just those who are tasked with monitoring it.

A culture of Web performance means that the $90 you spent on people is supplemented by a team of avid amateurs who notice changes and trends that may slip through the cracks. These amateurs are encouraged to participate in Web performance discussions, where the experts are encouraged to listen, then contribute.

Why listen to avid amateurs? In many cases, they are the people who work directly with customers and use the products on a daily basis. Their feedback comes from real experience, set alongside abstract values. Once a measurement has a story, it makes it easier to understand the problem.

An example of the success of amateurs is Wikipedia. A population of amateur contributors, as well as a core of experts in certain fields, have ensured that this is a useful resource. A Web performance culture full of avid amateurs allows comments and stories to flow from the customer-centric parts of an organization into the technology and business parts of the organization. These stories and inputs make Web performance more real, and make a chart in a report more important.

A culture of Web performance is one that is adopted by an entire company. It is a way of examining the reality of a site in a way that is customer-centric and customer-driven. A strong Web performance culture absorbs information from many sources, filters the data through a customer filter, and makes every measurement count.

Effective Web Performance: Positively Managing Performance Issues

2009-08-17 / spierzchala / 1 Comment

The moment a Web site goes live, the publishers lose control of the performance.

When I say lose control of the performance, I mean that despite everything that has been done to ensure scalability and capacity, the Web is inherently an infrastructure that is out of anyone’s direct ability to manage.

This is something that needs to be accepted. And while the datacenter is the only part of an application/infrastructure/network that can be directly managed by the Web site’s owners, a company has to accept that the real datacenter is the Internet. Not a datacenter that is on the Internet; the Internet as the datacenter.

Now that your head is spinning, let’s step back and consider this idea for a minute. The whole concept of the Internet being the datacenter makes operations and IT folks very uncomfortable. Why? There is no way for one company to manage the Internet. As a result, the general perspective is that the Internet can’t be trusted, and all that can be done is manage what can be managed directly.

Ignoring the Internet allows many organizations to leave the entire Internet out of their application or performance planning. They will measure and monitor, and they may even employ third-parties to help improve performance. When the shiny exterior is peeled back, it’s pretty clear that these organizations have built their entire performance culture on the assumption that if a problem exists on the Internet, there is nothing that can be done by them to fix it.

This may be effectively true. And it is not a positive way to ensure effective Web performance.

Having a what-if, emergency response plan in place is never a bad idea. If a problem appears on the Internet, and it affects your Web site, what are you going to do about it? Whine and moan and point fingers? Or take actions that effectively and clearly communicate to customers the steps you are taking to make things right?
Wait. Managing the Internet through customer communication?

I argue that besides working feverishly behind the scenes to resolve the problem, customer communication is the next most critical component of any Web performance issue management plan.

Web performance issue management plan. You have one, don’t you?
Well, when you get around to it, here are some concepts that should be built into the plan.

Effectively monitor your site

How can measurement and monitoring be part of issue management? Well, isn’t it always good policy to detect and begin investigating problems before your customers do?

Key to the measurement plan is monitoring the parts of your application that customers use. A homepage test will not give you vital information on issues with your authentication process, and is the same as saying the car starts, while ignoring the four flat tires.

If you aren’t effectively monitoring your site, your business is at risk.
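
To make the homepage-versus-authentication point concrete, here is a minimal sketch of timing every step of a customer transaction rather than a single page. The step names and the idea of passing each step in as a callable are assumptions made for illustration; a real monitor would replace the callables with actual requests against your own site.

```python
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical step type: a label plus a callable that returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_transaction(steps: List[Step]) -> List[Dict[str, object]]:
    """Run each step in order, recording its duration and whether it succeeded.

    Stops at the first failing step, since later steps depend on earlier ones.
    """
    results: List[Dict[str, object]] = []
    for name, action in steps:
        start = time.perf_counter()
        try:
            ok = bool(action())
        except Exception:
            ok = False  # any exception counts as a failed step
        elapsed = time.perf_counter() - start
        results.append({"step": name, "ok": ok, "seconds": elapsed})
        if not ok:
            break
    return results


if __name__ == "__main__":
    # Stand-in steps: the homepage "works" while the login step fails.
    demo = [
        ("homepage", lambda: True),
        ("login", lambda: False),
        ("checkout", lambda: True),
    ]
    for row in run_transaction(demo):
        print(row["step"], "ok" if row["ok"] else "FAILED")
```

In the demo, the homepage check passes while the login step fails, and the checkout step is never reached. That is the car that starts with four flat tires.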

Measure where the customers are

If your organization is focused on what it can control, then it will want to measure from locations that are controlled, and can provide stable, consistent, repeatable data.

Hate to break this to you, Sparky, but my Internet connection isn’t an OC-48 provisioned through a large carrier with a written SLA. Real people have provider networks that are congested, under-built, and deliver bandwidth using the old best-effort approach.

Some customers may have given up on wires altogether, and access the site through wireless broadband or mobile devices.

Understand how your customers use your site. Then plan your response to managing the Internet from the outside-in.

Test with what your customers use

The greatest cop-out any Web site can make is “Our site is best viewed using…”
I’m sorry. This isn’t good enough.

Customers demand that your site work the way they want it to, not the other way around. If a customer wants to use Safari on a Mac, or Chromium on Linux, then understanding how the site performs and responds with these browsers is critical.
The one-browser/one-platform world no longer exists. If a large number of customers with one particular configuration indicate that they are having a problem with the new site, what is the proper reaction?

And why did this happen in the first place?

Monitor and respond to social media

No, this isn’t just here for buzzwords and SEO. In the last year, Twitter and Facebook have become the de facto soapboxes for people who want to announce that their favorite site isn’t working. Wouldn’t hurt to monitor these sites for issues that might not be detected by traditional performance monitoring.

This approach means that you have to be willing to accept responsibility when something affects your site performance or availability, even if it isn’t your fault. No need to tell folks exactly what the problem is, but acknowledging that there is a legitimate issue that you recognize will go a long way toward making visitors/customers more understanding of the situation.

Get your message out effectively

Communicating about a performance issue means that the Marketing and PR teams will have to be brought in.

What? Marketing and Operations/IT working together? Yes. In a situation where there is a major outage or issue, Marketing will DEMAND to be involved. Wouldn’t it be easier if these two parts of the organization knew each other and had a plan for responding to critical performance issues?

If Marketing understands the degree of the problem, what it will take to fix, and what is being done about it, they can craft a message that handles any question that might come in, while acknowledging that there is an issue.

A corollary to this: If there is an issue, don’t deny it exists. Denying a problem when it is clear to anyone using the site that there is one is worse than saying nothing at all.

Takeaway

Practicing effective Web performance means a company understands that directly managing the Internet is impossible, but having a process to respond to Internet performance issues is critical. A Web performance incident plan shows that you understand that stuff happens on the Internet and you’re working on it.

Effective Web Performance: Choosing a CDN

2009-08-13 / spierzchala / 1 Comment

Content Delivery Networks (CDNs) are a key component to any Web performance strategy. If you examine the content from any large online business or media provider, it won’t take long to find the objects that these organizations have entrusted to CDNs to ensure faster delivery and a better user experience.

When working with CDNs, it is critical to understand some terms or concepts that you will be presented with. Each CDN will present them in its own unique way and using its own unique terminology. With an understanding of the underlying concepts, you will be able to have discussions with CDNs that are more meaningful, and targeted on your needs.

The Massively Distributed Model

CDNs fall into one of two categories, the first being the massively distributed model. CDNs that use this method will demonstrate how they have hardware and caching content servers in almost every city and town of any size in the world. As well, they have their systems located on every major consumer network in order to ensure that they are as close to the end-user as possible.

The CDN everywhere model, while far-reaching and seemingly extremely effective, does have its disadvantages. First, the CDN infrastructure relies on having extremely accurate maps of the Internet in order to direct visitors to the most proximate CDN server location. However, these maps are only truly effective when visitors use DNS servers that are on the same network that they are. Services such as OpenDNS and DNS Advantage can seriously affect the proximity algorithms of the distributed CDN by removing the key piece of localization information that they need to ensure that the best cache location is selected.

Also, as with any proxy caching methodology, this model relies on use. More popular items stay in the cache longer, while less popular items may be pushed aside or stored further upstream at parent caches for retrieval, adding a few extra milliseconds for the initial request. Also, new content has to be pushed out to the edge, and may take a few hours to be completely propagated.

The Massively Concentrated Model

CDNs that use this model rely on a smaller number of locations than the massively distributed model. However, these locations tend to be massive and incredibly well connected, relying on the concept that even if they are a few more hops away, their content is always there and ready for requests.

These sites have massive amounts of storage and rely on private networks to ensure that new content is immediately pushed out to the super-nodes as soon as it is added. And while they may be those extra few hops away, the performance difference may not be enough for the average site visitor to notice.

The obvious disadvantage of the massively concentrated model is that it is great for serving those places where there is a lot of traffic. However, in regions with less traffic, or less developed infrastructures, the fewer boots on the ground may begin to have an effect on performance.

Other CDN Concepts

Application Proxy

CDNs offer many institutions the ability to use their network for all incoming requests, even if they are for dynamic content that will require processing in the client datacenter. In these instances, the CDN acts as an application proxy, using its superior knowledge of routing and traffic patterns to move requests from the edge of the Internet back to the datacenter more effectively.

Remember: Even though the CDN is providing fast routing and delivery to the visitor, your application is still the bottleneck. Poor app design or slow queries will affect the application in exactly the same way that they would if the call was coming straight to your datacenter.

Traffic Acceleration

In certain circumstances, security and regulatory concerns completely eliminate the ability of a business to use the standard CDN model. Banks, government agencies, and health-care providers cannot store data in an environment whose security they cannot vouch for, no matter how many safeguards are put in place.
These organizations still need to be able to deliver a good customer experience, so there has to be a way to help accelerate their content without taking control of it. Traffic acceleration serves this purpose by using proprietary network protocol adaptations that remove some of the overhead associated with standard network protocols.

Content is intercepted at the datacenter and routed across private networks using the streamlined network protocols to a network location that is as close to the visitor as possible. Once it has reached the appropriate location, it is converted back to standard TCP and passed to the visitor.

The method above describes how a standard Web request works, but this can also be extended to true point-to-point VPNs with endpoints separated by great network and/or physical distances.

Validating the Claims

A key component of choosing or using a CDN is quantifying the effectiveness of the solution. The standard for many years has been the bake-off method of comparison. The prospect’s origin site is measured against the same site delivered by one or more CDNs. The CDN vendor with the fastest performance and the best price usually wins.

Before walking into a bake-off, come prepared. Turn your CDN bake-off into an episode of Iron Chef. Come to the table with the ingredients, and make the CDNs prepare a solution that meets your needs.

Measure Transactions

The standard base measurement that CDNs will use in a bake-off is single object(s) or page measurement. Your visitors do not just visit a single page, so ensure that the CDN has an effective solution that produces noticeable performance improvements across all the key functions of your site, including the secure components of the site, where the money is made.
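
One way to bring transactions to the bake-off is to compare origin and CDN timings step by step, not just for a single page. The sketch below is only an illustration: the step names, the sample timings, and the choice of the median as the comparison statistic are all assumptions, not a vendor’s method.

```python
from statistics import median
from typing import Dict, List


def compare_steps(
    origin: Dict[str, List[float]], cdn: Dict[str, List[float]]
) -> Dict[str, Dict[str, float]]:
    """Compare median response time (seconds) per transaction step.

    Only steps measured on both sides are compared.
    """
    report: Dict[str, Dict[str, float]] = {}
    for step, origin_times in origin.items():
        if step not in cdn:
            continue
        origin_median = median(origin_times)
        cdn_median = median(cdn[step])
        if origin_median == 0:
            continue  # cannot express an improvement relative to zero
        report[step] = {
            "origin": origin_median,
            "cdn": cdn_median,
            "improvement_pct": (origin_median - cdn_median) / origin_median * 100,
        }
    return report


if __name__ == "__main__":
    # Made-up timings: the static homepage improves, the secure checkout does not.
    origin = {"home": [2.0, 2.0, 2.0], "checkout": [4.0, 4.0, 4.0]}
    cdn = {"home": [1.0, 1.0, 1.0], "checkout": [4.0, 4.0, 4.0]}
    for step, row in compare_steps(origin, cdn).items():
        print(step, f"{row['improvement_pct']:.0f}% faster")
```

A result like this, where the homepage improves and the checkout does not, is exactly what a single-page bake-off would hide.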

Measure from the Edge

Backbone measurements are great for baselining and detecting operational issues that require a consistent and stable dataset. Your customers, however, do not have direct connections to high-priced datacenters with fat pipes.

The two CDN models will react differently under certain circumstances, and this will appear in edge measurements. Measuring on the ground, from the ISPs that your customers use, will give you a clear sense of how much improvement a CDN will provide when compared to the performance of your origin datacenter.

The edge is messy, chaotic, and what your customers deal with every day.
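
A rough sketch of what that messiness looks like in numbers: summarize the same test from backbone and edge vantage points with a median and a high percentile. The sample values and the nearest-rank percentile method are assumptions chosen for illustration; your measurement vendor may calculate percentiles differently.

```python
import math
from statistics import median
from typing import Dict, List


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile (pct between 0 and 100) of a non-empty list."""
    if not values:
        raise ValueError("no measurements to summarize")
    if not 0 <= pct <= 100:
        raise ValueError("pct must be between 0 and 100")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Median and 95th percentile response time for each vantage point."""
    return {
        source: {"median": median(times), "p95": percentile(times, 95)}
        for source, times in samples.items()
    }


if __name__ == "__main__":
    # Made-up response times in seconds.
    samples = {
        "backbone": [0.8, 0.9, 1.0, 1.1, 1.2],
        "edge": [1.5, 2.0, 2.5, 4.0, 9.0],
    }
    for source, stats in summarize(samples).items():
        print(source, stats)
```

The backbone numbers stay tightly grouped, while the edge numbers spread out. The gap between the median and the 95th percentile is where the customer complaints live.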

Understand the SLAs/SLOs

CDNs will always provide a service level agreement (SLA) with service level objectives (SLOs) stated in it. This topic is at once recognizable and about as well understood as 11 Dimensional Theoretical Physics.

I have written briefly about SLAs and SLOs before [here and here]. Do your research before you wade into this polite version of white-collar trench warfare.
Make sure you understand what the goal of the SLA is. Make sure that the SLOs are clear, measurable, valid, and enforceable. Then ensure that the method used to measure the SLOs is one that your organization can understand and can accept as valid.

Finally, ensure that the SLOs are reviewed monthly.
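
As a sketch of what clear and measurable can mean, the check below evaluates a batch of measurements against two objectives: an availability target and a share of successful tests that must finish under a time threshold. The field names and the sample thresholds are hypothetical; real SLOs come from the contract, not from this example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Measurement:
    seconds: float  # response time of one test
    success: bool   # whether the test completed without error


@dataclass
class Objective:
    min_availability: float   # e.g. 0.999 means 99.9% of tests must succeed
    max_seconds: float        # response-time threshold
    min_fast_fraction: float  # share of successful tests that must meet max_seconds


def evaluate_slo(measurements: List[Measurement], objective: Objective) -> Dict[str, object]:
    """Report measured availability and speed against the stated objectives."""
    if not measurements:
        raise ValueError("no measurements for this period")
    successes = [m for m in measurements if m.success]
    availability = len(successes) / len(measurements)
    fast = sum(1 for m in successes if m.seconds <= objective.max_seconds)
    fast_fraction = fast / len(successes) if successes else 0.0
    return {
        "availability": availability,
        "fast_fraction": fast_fraction,
        "availability_met": availability >= objective.min_availability,
        "speed_met": fast_fraction >= objective.min_fast_fraction,
    }


if __name__ == "__main__":
    # Made-up month: eight fast tests, one slow test, one failure.
    data = [Measurement(1.0, True)] * 8 + [Measurement(3.0, True), Measurement(0.0, False)]
    print(evaluate_slo(data, Objective(0.95, 2.0, 0.85)))
```

Running a check like this every month, with a measurement method both sides accept, keeps the SLO review from turning into an argument about whose numbers are right.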

Takeaways

Understanding the foundational technology that underlies the CDNs you use or are considering using will help you make better decisions.

Effective Web Performance

2009-08-12 / spierzchala / 4 Comments

Slap up some measurements. Look at some graphs. Make a few calls. Your site is faster. You’re a hero.
Right.

Effective Web performance is something that requires planning, preparation, execution, and the willingness to try more than once to get things right. I have discussed this problem before, but wanted to expand my thoughts into some steps that I have seen work in organizations that have established effective Web performance improvement strategies.

This process, in its simplest form, consists of five steps. Each step seems simple, but skipping any one of them will likely leave your Web performance process only half-baked, unable to help your team effectively improve the site.

1. Identification – What do we want/need to measure?

We want to measure everything. From everywhere.

This is an ineffective approach to Web performance measurement. This approach leads to a mass of data flowing towards you, causing your team to turn and flee, finding any way possible to hide from the coming onslaught.

Work with your team to carefully choose your Web performance targets. Identify two or three things about your site’s performance that you want to explore. Make these items discrete and clearly understood by everyone on your team. Clearly state their importance to improving Web performance. Get everyone to sign off on this.

Now, what was just said above will not be easy. There will be disagreements among people, among different parts of the organization, about which items are the most crucial to measure. This is a good thing.

Perhaps the greatest single hindrance to Web performance improvement is the lack of communication. An active debate is better than quiet acceptance and a grudging belief that you are going the wrong way. Corporate silos and a culture of assurance will not allow your company to make the decisions you need to have an effective Web performance strategy.

2. Selection – What data will we need to collect?

In order to identify a Web performance issue (which is far more important than trying to solve it), the data that will be examined will need to be decided on. This sounds easy – response time and success rate. We’re done.
Right.

Now, if your team wants to be effective, they have to understand the complexity of what they are measuring. Then they can assess what useful data can be extracted to isolate the specific performance issue under study.
Choose your metrics carefully, as the wrong data is worse than no data.

3. Execution – How will we collect the data?

Once what is to be measured is decided on, the mechanics of collecting the data can be decided on. In today’s Web performance measurement environment, there are solutions to meet every preferred approach.

Active Synthetic Monitoring. This is the old man of the methods, having been around the longest. A URL or business process is selected, scripted, and then pushed out to an existing measurement network that is managed/controlled. These have the advantage of providing static, consistent metrics that can be used as baselines for long-term trending. However, they are locked to a single process, and do not respond or indicate where your customers are going now.
Passive User Monitoring – Browser-Side. A relative newcomer to the measurement field, this process allows companies to tag pages and follow the customer performance experience as they move through a site. This methodology can also be used to discretely measure the browser-side performance of page components that may be invisible to other measurement collection methods. It does have a weakness in that it is sometimes hard to sell within an organization because of its perceived similarity to Web analytics approaches and its need to develop an effective tagging strategy.
Passive User Monitoring – Server-Side. This method follows customers as they move through a site, but collects data from a user’s interaction with the site, rather than with the browser. Great for providing details of how customers moved through a site and how long it took to move from page to page. It is weak in providing data on how long it took for data to be delivered to the customer, and how long it took their browser to process and render the requested data.

Organizations often choose one of the methods, and stay with it. This has the effect of seeing the world through hammer goggles: If all you have is a hammer, then every problem you need to solve has to be turned into a nail.

Successful organizations have a complex, correlative approach to effective Web performance analysis. One that links performance data from multiple inputs and finds a way to link the relationships between different data sets.

If your team isn’t ready for the correlative approach, then at least keep an open mind. Not every Web performance problem is a nail.
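
For teams that want to try the correlative approach, a simple starting point is to line up two data sets by time period and see whether they move together. The sketch below computes a plain Pearson correlation. Using it on hourly medians from synthetic and browser-side measurements is my assumption about how the inputs might be paired, and it is only one of many ways to relate data sets.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical hourly medians (seconds) from two measurement methods.
    synthetic = [1.0, 1.1, 1.0, 1.9, 2.4, 1.2]
    browser_side = [2.1, 2.3, 2.0, 3.8, 4.9, 2.5]
    print(round(pearson(synthetic, browser_side), 3))
```

A value near 1 suggests that the synthetic tests and the real visitors are seeing the same slowdowns. A value near 0 suggests that at least one method is missing something the customers feel. Correlation alone does not explain the cause, so treat it as a prompt for questions rather than an answer.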

4. Information – How do we make the data useful?

Your team now has a great lump of data, collected in a way that is understood, and providing details about things they care about.
Now what?

Web performance data is simply the raw facts that come out of the measurement systems. It is critical that, during the process of determining why, what, and how to measure, you also decide how you are going to process the data to produce metrics that make sense to your team.
Strategies include:

Feeding the data into a business analytics tool
Producing daily/weekly/monthly reports on the Key Performance Indicators (KPIs) that your team uses to measure Web performance
Annotating change, for better or worse
Correlate. Correlate. Correlate.

Nature abhors a vacuum. Providing a lot of raw data is the same as a vacuum – a whole bunch of nothing.
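
As an illustration of turning raw facts into something a team can read, here is a minimal sketch of a daily KPI roll-up with room for annotations. The record layout of (day, seconds, success) and the particular KPIs are assumptions; the point is only that the processing is decided on and repeatable.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical raw record: (day as "YYYY-MM-DD", response time in seconds, success flag).
Record = Tuple[str, float, bool]


def daily_kpis(
    records: Iterable[Record], annotations: Optional[Dict[str, str]] = None
) -> Dict[str, Dict[str, object]]:
    """Roll raw measurements up into per-day KPIs, with optional change notes."""
    notes = annotations or {}
    by_day = defaultdict(list)
    for day, seconds, success in records:
        by_day[day].append((seconds, success))

    report: Dict[str, Dict[str, object]] = {}
    for day in sorted(by_day):
        rows = by_day[day]
        ok_times = [seconds for seconds, ok in rows if ok]
        report[day] = {
            "tests": len(rows),
            "success_rate": len(ok_times) / len(rows),
            # Median of successful tests only; None if every test failed.
            "median_seconds": median(ok_times) if ok_times else None,
            "note": notes.get(day, ""),
        }
    return report


if __name__ == "__main__":
    raw = [
        ("2009-08-10", 1.0, True),
        ("2009-08-10", 3.0, True),
        ("2009-08-10", 9.0, False),
        ("2009-08-11", 2.0, True),
    ]
    for day, kpis in daily_kpis(raw, {"2009-08-11": "new release deployed"}).items():
        print(day, kpis)
```

The note field is where a release, a configuration change, or a marketing campaign gets attached to the day it happened, so the next person reading the report does not have to guess why the line moved.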

5. Action – How do we make meaningful Web performance changes?

Data has been collected and processed into meaningful information. People throughout the organization are having a-ha moments, coming up with ideas or realizations about the overall performance of the site. There are cries to just do something.
Stick to the plan. And assume that the plan will evolve in the presence of new information.

Prioritizing Web performance improvements falls into the age-old battle between the behemoths of the online business: business and IT.
Business will want to focus on issues that have the greatest effect on the bottom-line. IT will want to focus on the issues that have the greatest effect on technology.
They’re both wrong. And they’re both right.

Your online business is just that: a business that, regardless of its mission, is based on technology. Effective Web performance relies on these two forces being in balance. The business cannot be successful without a sound and tuned online platform, and the technology needed to deliver the online platform cannot exist without the revenue that comes from the business done on that platform.

Effective Web performance relies on prioritizing issues so that they can be done within the business and technology plans. And an effective organization is one that has communicated (there’s that word again) what those plans are. Everyone needs to understand that the business makes decisions that affect technology and vice-versa. And that if these decisions are made in isolation, the whole organization will either implode or explode.

Takeaway

Effective Web performance is hard work. It takes a committed organization that understands that running an online business requires that everyone have access to the information they need, collected in a meaningful way, to meet the goals that everyone has agreed to.

Web Performance: On the edge of performance

2009-08-10 / spierzchala / 0 Comments

A decade of working in the Web performance industry can leave one with the idea that no matter how good a site is, there is always the opportunity to be better, be faster. However, I am beginning to believe, just from my personal experience on the Internet, that speed has reached its peak with the current technologies we have.

This does not bode well for an Internet that is shifting more directly to true read/write, data/interaction heavy Web sites. This shift needs home broadband that is not only fast, but also has equal inbound and outbound connection speeds.

But will faster home broadband really make that much of a difference? Or will faster networks just show that even with the best connectivity to the Internet money can buy, Web sites are actually hurting themselves with poor design and inefficient data interaction designs?

For companies on the edge of Web performance, who are trying to push their ability to improve the customer experience as hard as possible, who are moving hard and fast to the read/write web, here are some ways you can ensure that you can still deliver the customer experience your visitors expect.

Confirm your customers’ bandwidth

This is pretty easy. Most reasonably powerful Web analytics tools can confirm this for you, breaking it down by dialup and broadband type. It’s a great way to ensure that your preconceptions about how your customers interact with your Web site meet the reality of their world.

It is also a way to see just how unbalanced your customers’ inbound and outbound connection speeds are. If it is clear that traffic is coming from connection types or broadband providers that are heavily weighted towards download, then optimization exercises cannot ignore the effect of data uploads on the customer experience.

Design for customers’ bandwidth

Now that you’ve confirmed the structure of your customers’ bandwidth, ensure that your site and data interaction design are built with this in mind. A page that makes a number of inefficient data calls behind the scenes in order to be more AJAXy may hurt itself when it tries to make those calls over a network that’s optimized for download and not upload.

Measure from the customer perspective

Web performance measurement has been around a long time. But understanding how the site performs from the perspective of true (not simulated) customer connectivity, right where they live and work, will highlight how your optimizations may or may not be working as expected.

Measurements from high-throughput, high-quality datacenter connections give you some insight into performance under the best possible circumstances. Measure from the customer’s desktop, and even the most thoughtfully planned optimization efforts may have been like attacking a mammoth with a closed safety pin: ineffective and it annoys the mammoth [to paraphrase Hugh Macleod].

As well as synthetic measurements, measure performance right from within the browser. Understanding how long it takes pages to render and how long it takes to show content above the fold, and gathering discrete times on complex Flash and AJAX events within the page, will give you even more control over finding those things you can fix.

Takeaway

In the end, even assuming your customers have the best connectivity, and you have taken all the necessary precautions to get Web performance right, don’t assume that the technology can save you from bad design and slow applications.
Be constantly vigilant. And measure everything.

Web Performance: How long can you ignore the money?

2009-08-03 / spierzchala / 1 Comment

Web performance is everywhere. People intuitively understand that when a site is slow, something’s wrong. Web performance breeds anecdotal tales of lost carts, broken catalogs, and searches gone wrong. Web performance can get your name in lights, but not in the way you or your company would like.

It’s a mistake to consider Web performance a technology problem. Web performance is really a business problem that has a technological solution.
Business problems have solutions that any mid-level executive can understand. A site that can’t handle the amount of traffic coming in requires tuning and optimization, not the firing of the current VP of Operations and a new marketing strategy.

Can you imagine the fate of the junior executive who suggested that a new marketing strategy was the solution to brick-and-mortar stores that are too small and crowded to handle the number of prospective customers (or former prospective customers) coming in the door?

Every Web performance event costs a company money, in the present and in the future. So when someone presents your company with the reality of your current Web performance, what is your response?

Some simple ideas for living with the reality that Web performance hurts business.

Be able to explain the issue to everyone in the company and to customers who ask. Gory details and technical mumbo-jumbo make people feel like there is something being hidden from them. Tell the truth, but make it clear what happened.
Do not blame anyone in public. A great way to look bad to everyone is to say that someone else caused the problem. Guess what? All that the people who visited your site during the problem will remember is that your site had the problem. Save frank discussions for behind closed doors.
Be able to explain to the company what the business cost was. While everyone is pointing fingers inside your company, remind them that the outage cost them $XX/minute. Of course, you can only tell them that if you know what that number is. Then gently remind everyone that this is what it cost the whole company.
Take real action. I don’t mean things like “We will be conducting an internal review of our processes to ensure that this is not repeated”. I mean things like listening and understanding what technology or business process failed and got you into this position in the first place. Was it someone just hitting the wrong switch? Or was it a culture of denial that did not allow the reality of Web performance to filter up to levels where real change could be implemented?
Demand quantitative proof that this will never happen again. Load test. Monitor. Measure. Correlate data from multiple sources. Decide how Web performance information will be communicated inside your company. Make the data available so people can ask questions. Be prepared to defend your decisions with real information.
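The per-minute cost mentioned in the third point can be estimated with simple arithmetic. What follows is a minimal sketch, not a real costing model: it assumes revenue is spread evenly across the day, and the function name `outage_cost`, the revenue figure, and the affected-traffic fraction are all hypothetical values chosen for illustration.

```python
def outage_cost(daily_revenue, outage_minutes, fraction_affected=1.0):
    """Estimate revenue lost during an outage.

    Assumption (hypothetical model): revenue arrives at a flat rate all day,
    and only `fraction_affected` of visitors were hit by the problem.
    """
    revenue_per_minute = daily_revenue / (24 * 60)
    return revenue_per_minute * outage_minutes * fraction_affected


# Illustrative numbers only: $144,000/day, 45-minute event, half of traffic affected.
cost = outage_cost(daily_revenue=144_000, outage_minutes=45, fraction_affected=0.5)
print(f"Estimated cost: ${cost:,.2f}")  # Estimated cost: $2,250.00
```

A flat rate understates the cost of an outage during peak business hours, so a real estimate would more likely use revenue for the same time window on a comparable day.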

The most successful Web companies have done these things very well. It is the core of their success and it is what makes them ruthlessly strive for Web performance excellence.

These companies understood that in order to succeed they needed to create a culture where business performance and Web performance are the same thing.

Methodology Before Measurement

2009-01-05 / spierzchala / 0 Comments

Measure what is measurable, and make measurable what is not so.

Galileo Galilei

The greatest challenge facing companies today is not finding ways to measure performance. The key issue is one of understanding what should be measured and validating that there is agreement on what the purpose of the measurement is.

Organizations are complex. And with complexity arises the need to gather data for different purposes. In my series discussing Why Web Measurements?, I broke organizations down into four groups, each one having distinctly different needs for measurements and data.

While this series focuses on Web performance, the four categories (Customer Generation, Customer Retention, Business Operations, and Technical Operations) can be broadly applied to all aspects of your business.
In each of the four categories, whether it is for Web performance or financial analysis, determining what and why to measure is a critical predecessor to the establishment of measurements and the examination of data.

2009 will be a year of reflection and retrenchment. Companies will be examining all aspects of their business, all of their relationships with vendors, all of the ways they measure themselves. The question that must be asked before succumbing to the rushing panic of cost-cutting and layoffs is: Do you fundamentally understand why and what you measure and what it is really telling you?

SLA: The myth of simplicity

2008-12-17 / spierzchala / 1 Comment

Service Level Agreements. SLAs.

Three of the most contentious words, and most contentious acronym, in the technology sector. Arguments are had, suits are filed, and relationships broken and strained as a result of this single concept.

How can something as seemingly simple as setting an agreed-upon level of service delivery be so problematic and misunderstood?

The word agreement is the key to the problem. SLAs assume that all parties understand and agree on the level of service. And how that information is to be reported. And who is responsible for reporting the data. And how long you have to file grievances. And who handles problems. And…well, lawyers are involved.

As Guy Kawasaki states regarding the lies of venture capitalists: there is no such thing as a vanilla term sheet.

There is also no such thing as a vanilla SLA. A company that tries to present you with a standardized SLA is trying to pull something over on you.

Some rules about SLAs.

The vendor does not define the SLA. If the vendor selling the product tells you, the customer, what your expected level of service is, then they don’t care about you. Find another vendor.
The customer does not define the SLA. If the customer tells you that they cannot sign an SLA unless you, the vendor, agree to their conditions, walk away from the deal.
An SLA is not an SLO. Service Level Objectives are the targets of success defined by both parties within the SLA. These numbers, however, are not the alpha and the omega of an SLA.
A customer-initiated penalty condition is always in the vendor’s favor. If the vendor states that the client must initiate the SLA grievance conversation when SLOs are violated, then the vendor is assuming that you are not looking at the data.
SLOs should never be based on single, aggregated metrics from the data. If some bozo tries to say that they provide 99% availability and 3 second average performance, walk away. That is not an SLO.
SLAs are not set in stone. If something is not working, or if targets change, or anything changes, then the parties have to be willing to sit down on a schedule (defined in the SLA) and renegotiate their SLA.
The vendor and the customer have transparent access to the data used for the SLO. If the customer cannot see the data that the vendor is using in the SLO anytime it wants, there will always be a level of mistrust. If you like having all your customers mistrust you, this is a great strategy.
The Problem and Issue Management processes are clearly defined. When something bad happens, or a change needs to be made, the customer and the vendor have to have very clearly defined roles in the process. Responsibility and trust. Do you have that in your current SLA?
The customer and the vendor decide when a problem or issue is resolved. It is not up to one side in an SLA to decide when an issue or problem is resolved. As there are likely penalties involved the longer the abnormal state exists, the vendor has a vested interest in quick resolution. As there is likely lost revenue on the table, the customer has the same interest. But the customer also has the seemingly unreasonable idea that this will never happen again, that it will be clearly documented, and that getting the right solution is better than getting a solution.
Communication is the key to a good SLA. In the 9 previous points, the emphasis is on communication, the sharing of information. Current SLAs seem to be designed to hide information from each side, and only release it under the most dire situation. People talk. The information will get out. You want your well-crafted brand to implode because you have a reputation as sneaky and untrustworthy?
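The rule about single, aggregated metrics is easier to see with numbers. Below is a small sketch using made-up response times: 95 measurements at 1 second and 5 at 41 seconds. The sample data and the nearest-rank `percentile` helper are assumptions for illustration and do not come from any real SLA.

```python
import math
import statistics


def percentile(values, p):
    """Nearest-rank percentile: the smallest value with at least p% of samples at or below it."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical response times in seconds: most are fast, a few are terrible.
samples = [1.0] * 95 + [41.0] * 5

print("mean:  ", statistics.mean(samples))    # 3.0
print("median:", statistics.median(samples))  # 1.0
print("p95:   ", percentile(samples, 95))     # 1.0
print("p99:   ", percentile(samples, 99))     # 41.0
```

The "3 second average" here describes no visitor's actual experience: most saw 1 second and one in twenty waited 41. An SLO built on the distribution (median, high percentiles, share of measurements over a threshold) exposes what the single average hides.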

I’ve likely missed many of the key points, but these are the ones that I see, from both sides of the field, on a pretty regular basis.

In the end, an SLA is not simple. It is not standardized. It is not defined by one side or the other. It is a negotiated treaty of behavior that, in the end, defines the daily operational relationship between two organizations. If you enter an SLA process with both sides trying to find the best way to work together in the long term, there is a good chance that the SLA will be easier than if you go in as stone-cold adversaries.

Why Web Measurements? Part IV: Technical Operations

2008-12-08 / spierzchala / 2 Comments

In the first three parts of this series, the focus has been on the business side of the business: Customer Generation, Customer Retention, and Business Operations. The final component of any discussion of why companies measure their Web performance comes down to Technical Operations.

Why is Technical Operations last?

This part of the conversation is the last, mainly because it is the most mature. A technical audience will understand the basics of a distributed Web performance measurement system, or a Web analytics system, or a QA testing tool without too much explanation. The problems that these tools solve are well-defined and have been around for many years.

Quickly thinking about these types of problems makes it clear, however, that the kind of data needed in a technical operations environment is substantially different than that which is needed at the Business Operations level. Here, the devil is in the details; at Business Operations, the devil is in the patterns and trends.

What are you trying to measure?

The short answer is that a Technical Operations team is trying to measure everything. More data is better data at this level. The key is the ability to correlate multiple sources of system inputs (Web performance data, systems data, network data, traffic data, database queries, etc.) to detect the patterns of behavior which could indicate impending crises or complete system outage, or simply a slower than expected response time during peak business hours.
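As one illustration of correlating two data sources, the sketch below computes a Pearson correlation coefficient between per-minute response times and per-minute database query counts. Both series are invented for the example, and Pearson correlation is only one possible technique; it is not necessarily what any particular operations team uses.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-minute samples taken from two different monitoring sources.
response_time_s = [1.1, 1.2, 1.0, 2.4, 3.9, 4.2, 1.3]
db_queries = [310, 330, 300, 620, 980, 1040, 350]

r = pearson(response_time_s, db_queries)
print(f"correlation between response time and query volume: {r:.2f}")
```

A coefficient near 1 suggests the two series rise and fall together, which is a prompt to investigate, not proof that one causes the other.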

And while Technical Operations teams thrive on data, they are not always good at explaining this data to others. So the metrics which are important in one organization may not be the key ones in another. Or they may be called by a completely different name. Which is why Technical Operations teams sigh and throw up their hands in despair when talking to management who are working from Business Operations data.

How do you measure it?

Measure early. Measure often.

This sums up the philosophy of most Technical Operations teams. They want to gather as much data as possible. So much data that the gathering of this data is often one step away from affecting the performance of their own systems. This is how the scientific mind works. So, be prepared to temper this urge to measure and instrument everything with the need to ensure that the system remains operationally sound.

Summary

Even in the well-developed area of Technical Operations, there is still opportunity to ensure that you are measuring the right things the right way. Do an audit of your measurements. Ask the question “why do we measure this this way?”.
Measure meaningful things in a meaningful way.

Why Web Measurements? Part III: Business Operations

2008-12-05 / spierzchala / 1 Comment

In the Customer Generation and Customer Retention articles of this series, the focus was on Web performance measurements designed to serve an audience outside of your organization. Starting with Business Operations, the focus shifts toward the use of Web performance measurements inside your organization.

Why Business Operations?

When I was initially developing these ideas with my colleague Jean Campbell, the idea was to call this section Reporting and Quality of Service. What we found was that this didn’t completely encompass all of the ideas that fall under these measurements. The question became: which part of the organization do reporting and QoS measurements serve?

What was clear was these were the metrics that reported on the health of the Web service to management and the company as a whole. This was the measurement data that the line of business tied to revenue and analytics data to get a true picture of the health of the online business.

What are you measuring?

Measurements for business operations need to capture the key metrics that are critical for making informed business decisions.

How do we compare to our competitors?
Are we close to breaching our SLAs?
Are the third-parties we use close to breaching their SLAs?
What parts of the site affect performance / user experience the most so we can set priorities?
How does Web performance correlate with all the other data we use in our online business?

Every company will use different measures to capture this information, and correlate the data in different ways. The key is that you do use it to understand how Web performance ties into the line of business.

How often do I look at it?

Well, honestly, most people who work in business operations only need to examine Web performance once a day in a summary business KPI report (your company has a useful daily KPI report that everyone understands and uses, right?), and in greater detail at weekly and monthly management meetings.

The goal of the people examining business operations data is not to solve the technical problems that are being encountered, but to understand how the performance of their site affects the general business health of the company, and how it plays in the competitive marketplace.

What metrics do I need?

Business operations teams need to understand

End-to-end response time for measured business processes
Page-level response times for measured business processes
Success rate of the transaction during the measurement period
How third-parties are affecting performance
How Web analytics and Web performance relate
How different regions are affected by performance
How performance looks from customer ISPs and desktops

Detailed technical data is lost on these people. It is their role to take all of the data they have, and present a picture of the application as it affects the business, and discuss challenges that they face at a technical level in terms of how they affect the business.
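The first three metrics in the list above (end-to-end response time, page-level response times, and success rate) can all be derived from the same raw measurements. The sketch below assumes a hypothetical `Measurement` record holding one timing per page of a measured business process; the record layout and the sample values are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    page_times: list   # seconds for each page reached in the business process
    succeeded: bool    # True if the whole transaction completed


def summarize(measurements):
    """Return (success_rate, avg_end_to_end, avg_per_page) for one period.

    Timing averages are computed over successful runs only, since a failed
    run does not reach every page.
    """
    ok = [m for m in measurements if m.succeeded]
    if not measurements or not ok:
        raise ValueError("need at least one successful measurement")
    success_rate = len(ok) / len(measurements)
    avg_end_to_end = sum(sum(m.page_times) for m in ok) / len(ok)
    pages = len(ok[0].page_times)
    avg_per_page = [sum(m.page_times[i] for m in ok) / len(ok) for i in range(pages)]
    return success_rate, avg_end_to_end, avg_per_page


# Hypothetical three-page process measured four times; one run failed on page 2.
period = [
    Measurement([0.5, 1.5, 2.0], True),
    Measurement([1.0, 2.0, 3.0], True),
    Measurement([1.5, 2.5, 4.0], True),
    Measurement([0.5, 30.0], False),
]

rate, end_to_end, per_page = summarize(period)
print(f"success rate: {rate:.0%}")              # success rate: 75%
print(f"end-to-end average: {end_to_end:.1f}s") # end-to-end average: 6.0s
print("page averages:", per_page)               # page averages: [1.0, 2.0, 3.0]
```

Averages are used here only to keep the sketch short; a daily KPI report would likely show the distribution as well.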

Summary

For people who work at an extremely detailed level with Web measurement data (the topic for the next part of this series), Business Operations metrics seem light, fluffy, and often meaningless. But these metrics serve a distinct audience: the people who run the company. Frankly, if the senior business leaders at an organization are worried on a daily basis about the minute technical details that go into troubleshooting and diagnosing performance issues, I would be concerned.
The objective of Business Operations measurements is to convey the health of the Web systems that support the business, and correlate that health with other KPIs used by the management team.