Tag: SLO

Web Performance Trends 2013 – Third Party Services

2012-12-04 / spierzchala / 0 Comments

Every site has them. Whether they’re for analytics, advertising, customer support, or CDN services, third-party services are here to stay. However, for 2013, I believe that these services will face a level of scrutiny that many have avoided up until now.

Recent performance trends indicate that while web site content has been tested and scaled to meet even the highest levels of traffic, the third-party services that these sites have some to rely on (with a few exceptions) are not yet prepared to handle the largest volumes of traffic that occur when many of their customers experience a peak on the same day.

In 2013, I see web site owners asking their third-party service providers to provide verification that their systems be able to handle the highest volumes of traffic on their busiest days, with an additional amount of overhead – I suggest 20% – available for growth and to absorb “super-spikes”. Customer experience is built on the performance of the entire site, so leaving a one component of site delivery untested (and definitely unmonitored!) leaves companies exposed to brand and reputation degradation as well as performance degradation.

In your own organizations, make 2013 the year you:

Implement tight controls over how outside content is deployed and managed
Implement tight change control policies that clearly describe the process for adding third-party content to your site, including the measurement of performance impacts
Define clear SLAs and SLOs for your third-party content providers, including the performance levels at which their content will be disabled or removed from the site.

When speak to your third-party content and service providers about their plans for 2013, ask them to:

Explicitly detail how they handled traffic on their busiest days in 2012, and what they plan to do to effectively handle growth in 2013
Clearly demonstrate how they are invested in helping their customers deliver successful mobile sites and apps in 2013
Lay out how they will provide more transparent access to system performance metrics and what the goals of their performance strategy for 2013 are.

Take control of your third-party content. Don’t let it control you.

Effective Web Performance: Choosing a CDN

2009-08-13 / spierzchala / 1 Comment

Content Delivery Networks (CDNs) are a key component to any Web performance strategy. If you examine the content from any large online business or media provider, it won’t take long to find the objects that these organizations have entrusted to CDNs to ensure faster delivery and a better user experience.

When working with CDNs, it is critical to understand some terms or concepts that you will be presented with. Each CDN will present them in it’s own unique way and using its own unique terminology. Having an understanding of the underlying concepts, you will be able to have discussions with CDNs that are more meaningful, and targeted on your needs.

The Massively Distributed Model

CDNs fall into one of two categories, the first being the massively distributed model. CDNs that use this method will demonstrate how they have hardware and caching content servers in almost every city and town of any size in the world. As well, they have their systems located on every major consumer network in order to ensure that they are as close to the end-user as possible.

The CDN everywhere model, while far-reaching and seemingly extremely effective does have its disadvantages. First, the CDN infrastructure relies on having extremely accurate maps of the Internet in order to direct visitors to the most proximate CDN server location. However, these maps are only truly effective when visitors use DNS servers that are on the same network that they are. Services such as OpenDNS and DNS Advantage can seriously effect the proximity algorithms of the distributed CDN by removing the key piece of localization information that they need to ensure that the best cache location is selected.

Also, as with any proxy caching methodology, this model relies on use. More popular items stay in the cache longer, while less popular items may be pushed aside or stored further upstream at parent caches for retrieval, adding a few extra milliseconds for the initial request. Also, new content has to be pushed out to the edge, and may take a few hours to be completely propagated.

The Massively Concentrated Model

CDNs that use this model rely on a smaller number of locations than the massively distributed model. However, these locations tend to be massive and incredibly well connected, relying on the concept that even if they are a few more hops away, their content is always there and ready for requests.

These sites have massive amounts of storage and rely on private networks to ensure that new content is immediately pushed out to the super-nodes as soon as it is added. And while they may be those extra few hops away, the performance difference may not be enough for the average site visitor to notice.

The obvious disadvantage of the massively concentrated model is that it is great for serving those places where there is a lot of traffic. However, in regions with less traffic, or less developed infrastructures, the fewer boots on the ground may begin to have an effect on performance.

Other CDN Concepts

Application Proxy

CDNs offer many institutions the ability to use their network for all incoming requests, even if they are for dynamic content that will require processing in the client datacenter. In these instances, the CDN acts as an application proxy, using its superior knowledge of routing and traffic patterns to move requests from the edge of the Internet back to the datacenter more effectively.

Remember: Just because the CDN is providing fast routing and delivery to the visitor, your application is still the bottleneck. Poor app design or slow queries will affect the application in exactly the same way that it would if the call was coming straight to your datacenter.

Traffic Acceleration

In certain circumstances, security and regulatory concerns completely eliminate the ability of a business to use the standard CDN model. Banks, government agencies, and health-care providers cannot store data in an environment whose security they cannot vouch for, no matter how many safeguards are put in place.
These organizations still need to be able to deliver a good customer experience, so there has to be a way to help accelerate their content without taking control of it. Traffic acceleration serves this purpose by using proprietary network protocol adaptations that remove some of the overhead associated with standard network protocols.

Content is intercepted at the datacenter and routed across private networks using the streamlined network protocols to an network location that is as close to the visitor as possible. Once it has reached the appropriate location, it is converted back to standard TCP and passed to the visitor.

The method above describes how a standard Web request works, but this can also be extended to true point-to-point VPNs with endpoints separated by great network and/or physical distances.

Validating the Claims

Any component of choosing or using a CDN is quantifying the effectiveness of the solution. The standard for many years has been the bake-off method of comparison. The prospect’s origin site is measured against the same site delivered by one or more CDNs. The CDN vendor with the fastest performance and the best price usually wins.

Before walking into a bake-off, come prepared. Turn your CDN bake-off into an episode of Iron Chef. Come to the table with the ingredients, and make the CDNs prepare a solution that meets your needs.

Measure Transactions

The standard base measurement that CDNs will use in a bake-off is single object(s) or page measurement. Your visitors do not just visit a single page, so ensure that the CDN has an effective solution that produces noticeable performance improvements across all the key functions of your site, including the secure components of the site, where the money is made.

Measure from the Edge

Backbone measurements are great for baselining and detecting operational issues that require a consistent and stable dataset. Your customers, however, do not have direct connections to high-priced datacenters with fat pipes.

The two CDN models will react differently to under certain circumstances, and this will appear in edge measurements. Measuring on the ground, from the ISPs that your customers use, will give you a clear sense of how much improvement a CDN will provide when compared to the performance of your origin datacenter.

The edge is messy, chaotic, and what your customers deal with everyday.

Understand the SLAs/SLOs

CDNs will always provide either service level agreement (SLA) with service level objectives (SLOs) stated in it. This topic is at once recognizable and about as well understood as 11 Dimensional Theoretical Physics.

I have written briefly about SLAs and SLOs before [here and here]. Do your research before you wade into this polite version of white-collar trench warfare.
Make sure you understand what the goal of the SLA is. Make sure that the SLOs are clear, measurable, valid, and enforceable. Then ensure that the method used to measure the SLOs is one that your organization can understand and can accept as valid.

Finally, ensure that the SLOs are reviewed monthly.

Takeaways

Understanding the foundational technology that underlies the CDNs you use or are considering using will help you make better decisions.

SLA: The myth of simplicity

2008-12-17 / spierzchala / 1 Comment

Service Level Agreements. SLAs.

Three of the most contentious words, and most contentious acronym, in the technology sector. Arguments are had, suits are filed, and relationships broken and strained as a result of this single concept.

How can something seemingly simple as setting an agreed upon level of service delivery be so problematic and misunderstood?

The word agreement is the key to the problem. SLAs assume that all parties understand and agree of the level of service. And how that information is to be reported. And who is responsible for reporting the data. And how long you have to file grievances. And who handles problems. And…well, lawyers are involved.

As Guy Kawasaki states regarding the lies of venture capitalists: there is no such thing as a vanilla term sheet.

There is also no such thing as a vanilla SLA. A company that tries to present you with a standardized SLA is trying to pull something over on you.

Some rules about SLAs.

The vendor does not define the SLA. If the vendor selling the product tells you, the customer, what your expected level of service is, then they don’t care about you. Find another vendor.
The customer does not define the SLA. If the customer tells you that they cannot sign an SLA unless you, the vendor, agree to their conditions, walk away from the deal.
An SLA is not an SLO. Service Level Objectives are the targets of success defined by both parties within the SLA. These numbers, however, are not the alpha and the omega of an SLA.
A customer-initiated penalty condition is always in the vendors favor. If the vendor states that the client must initiate the SLA grievance conversation when SLOs are violated, then the vendor is assuming that you are not looking at the data.
SLOs should never be based on single, aggregated metrics from the data. If some bozo tries to say that they provide 99% availability and 3 second average performance, walk away. That is not an SLO.
SLAs are not set in stone. If something is not working, or if targets change, or anything changes, then the parties have to be willing to sit down on a schedule (defined in the SLA) and renegotiate their SLA.
The vendor and the customer have transparent access to the data used for the SLO. If the ccustomer cannot see the data that the vendor is using in the SLO anytime it wants, there will always be a level of mistrust. If you like having all your customers mistrust you, this is a great strategy.
The Problem and Issue Management processes are clearly defined. When something bad happens, or a change needs to be made, the customer and the vendor have to have very clearly defined roles in the process. Responsibility and trust. Do you have that in your current SLA?
The customer and the vendor decide when a problem or issue is resolved. It is not up to one side in an SLA to decide when an issue or problem is resolved. As there are likely penalties involved the longer the abnormal state exists, the customer has a vested interest in quick resolution. As there is likely lost revenue on the table, the customer has the same interest. But the customer also has the seemingly unreasonable idea that this will never happen again, it will be clearly documented, and that getting the right solution is better than getting a solution.
Communication is the key to a good SLA. In the 9 previous points, the emphasis is on communication, the sharing of information. Current SLAs seem to be designed to hide information from each side, and only release it under the most dire situation. People talk. The information will get out. You want your well-crafted brand to implode because you have a reputation as sneaky and untrustworthy?

I’ve likely missed many of the key points, but these are the ones that I see, from both sides of the field, on a pretty regular basis.

In the end, an SLA is not simple. It is not standardized. It is not defined by one side or the other. It is a negotiated treaty of behavior that, in the end, defines the daily operational relationship between two organizations. If you enter an SLA process with both sides trying to find the best way to work together in the long term, there is a good chance that the SLA will be easier than if you go in as stone-cold adversaries.

Service Level Agreements in Web Performance

2004-12-14 / spierzchala / 1 Comment

Service Level Agreements (SLAs) appear to finally be maturing in the realm of Web performance. Both of the Web performance companies that I have worked for have understood their importance, but convincing the market of the importance of these metrics has been a challenge up until recently.

In the bandwidth and networking industries, SLAs have been a key component of contracts for many years. However, as Doug Kaye outlined in his book Strategies for Web Hosting and Managed Services, SLAs can also be useless.

The key to determining a useful Web performance SLA rests on some clear business concepts: relevance and enforceability. Many papers have been written on how to calculate SLAs, but that leaves companies still staggering with the understanding that they need SLAs, but don’t understand them.

Relevance

Relevance is a key SLA metric because an SLA defined by someone else may have no meaning to the types of metrics your business measures itself on. Whether the SLA is based on performance, availability or a weighted virtual metric designed specifically by the parties bound by the agreement, it has to mean something, and be meaningful.

The classic SLA is average performance of X seconds and availability of Y% over period Z. This is not particularly useful to businesses, as they have defines business metrics that they already use.

Take for example a stock trading company. in most cases, they are curious, but not concerned with their Web performance and availability between 17:00 and 08:00 Eastern Time. But when the markets are open, these metrics are critical to the business.

Now, try and take your stock-trading metric and overlay it at Amazon or eBay. Doesn’t fit. So, in a classic consultative fashion, SLAs have to be developed by asking what is useful to the client.

Who is to be involved in the SLA process?
How do SLAs for Internal Groups differ from those for External vendors?
Will this be pure technical measurement? Will business data be factored in?

Asking and answering these questions makes the SLA definition process relevant to the Web performance objectives set by the organization.

Enforceability

The idea that an SLA with no teeth could exist is almost funny. But if you examine the majority of SLAs that are contracted between businesses in the Web performance space today, you will find that they are so vaguely defined and meaningless to the business objectives that actually enforcing the penalty clauses is next to impossible.

As real world experience shows, it is extremely difficult for most companies enforce SLAs. If the relevance objectives discussed above are hammered out so that the targets are clear and precise, then enforcement becomes a snap. The relevance objective often fails, because the SLA is imposed by one party on another; or an SLA is included in a contract as a feature, but when something goes wrong, escape path is clear for the “violating” party.

If an organization would like to try and develop a process to define enforceable SLAs, start with the internal business units. These are easier to develop, as everyone has a shared business objective, and all disputes can be arbitrated by internal executives or team leaders.

Once the internal teams understand and are able to live with the metrics used to measure the SLAs, then this can be extended to vendors. The important part of this extension is that third-party SLA measurement organizations will need to become involved in this process.

Some would say that I am tooting my own horn by advocating the use of these third-party measurement organizations, as I have worked for two of the leaders in this area. The need for a neutral third-party is crucial in this scenario; it would be like watching a soccer match (football for the enlightened among you) without the mediating influence of the referee.

If your organization is now considering implementing SLAs, then it is crucial that these agreements are relevant and enforceable. That way, both parties understand and will strive to meet easily measured and agreed upon goals, and understand that there are penalties for not delivering performance excellence.