OCSP and the GoDaddy Event

Image by vissago - http://www.flickr.com/photos/vissago/

The GoDaddy DNS event (which I wrote about here) has been the subject of many a post-mortem and water-cooler conversation in the web performance world for the last week. In addition to the many well-publicized issues that have been discussed, there was one more, hidden effect that most folks may not have noticed – unless you use Firefox.
Firefox uses OCSP lookups to validate SSL certificates. If you go to a new site and connect using SSL, Firefox has a process to check the validity of the SSL cert. The results of the lookup are cached and stored for some time (I have heard 3 days, this could be incorrect) before Firefox checks again.
Before the security wonks in the audience get upset, realize I’m not an OCSP or SSL expert, and would love some comments and feedback that help the rest of us understand exactly how this works. What I do know is that anyone who came to a site that relied on an SSL cert provided and/or signed by GoDaddy at some point in its cert validation path discovered a nasty side-effect of this really great idea when the GoDaddy DNS outage occurred: If you can’t reach the cert signer, the loading of your site will be significantly delayed.
Remember this: It was GoDaddy this time; next time, it could be your cert signing authority.
How did this happen? Performing an OCSP lookup requires opening a new TCP connection so that an HTTP request can be made to the OCSP provider. A new TCP connection requires a DNS lookup. If you can’t perform a successful DNS lookup to find the IP address of the OCSP host…well, I think you can guess the rest.
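To make that dependency chain concrete, here is a minimal Python sketch that walks the first two steps (DNS lookup, then TCP connection) for an OCSP host and reports where it stalls. It is not how Firefox implements its check. The function name probe_ocsp_host and the hostname in the usage note are hypothetical, and the resolve and connect parameters exist only so the steps can be swapped out for testing.

```python
import socket
import time


def probe_ocsp_host(host, port=80, timeout=3.0,
                    resolve=socket.getaddrinfo,
                    connect=socket.create_connection):
    """Walk the DNS -> TCP chain that an OCSP lookup depends on.

    Returns (stage, seconds), where stage is "ok", "dns_failed"
    or "tcp_failed". Hypothetical helper for illustration only.
    """
    start = time.monotonic()
    try:
        # Step 1: DNS lookup. getaddrinfo has no timeout argument, so a
        # broken DNS provider can block here for as long as the OS
        # resolver is configured to wait.
        addrinfo = resolve(host, port, type=socket.SOCK_STREAM)
    except OSError:
        return "dns_failed", time.monotonic() - start
    try:
        # Step 2: TCP connection to the first resolved address.
        sockaddr = addrinfo[0][4]
        conn = connect(sockaddr[:2], timeout=timeout)
        conn.close()
    except OSError:
        return "tcp_failed", time.monotonic() - start
    # Step 3 (the HTTP request carrying the OCSP query) is omitted here.
    return "ok", time.monotonic() - start
```

Calling probe_ocsp_host("ocsp.example.com") from a monitoring script (the hostname is a placeholder, not a real responder) would show whether a slow page is waiting on DNS or on the connection itself.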
Unlike other third-party outages, these are not ones that can be shrugged off. These are ones that will affect page rendering by blocking the download of the mobile or web application content you present to customers.
I am not someone who can comment on the effectiveness of OCSP lookups in increasing web and mobile security. OCSP lookups in Firefox are simply one more indication of how complex the design and management of modern online applications is.
Learning from the near-disaster state and preventing it from happening again is more important than a disaster post-mortem. The signs of potential complexity collapse exist throughout your applications, if you take the time to look. And while something like OCSP may look like a minor inconvenience, when it affects a discernible portion of your Firefox users, it becomes a very large mouse scaring a very jumpy elephant.

Effective Web Performance: Third-Party Providers (Or Why Herding Cats is Fun)

It’s a rare Web site these days that hosts all of its own content. From the smallest blog to the largest retailer, Web sites farm out their images, streams, and pages to CDNs, and absorb feeds, ads, and data streams from any number of outside providers.
Effective Web performance demands that a site take responsibility for the entire site, not just the parts under direct local management. Why? Because customers see a problem with your site, not with a provider.
How can the performance of all of the third-party content on a site be managed? Using exactly the same strategies already in place to manage the performance of local content.

Measure from the outside-in

Customers come from the Internet, so it should be no surprise that measuring the performance of a site from the perspective of visitors is mentioned here. Critical to this part of managing third parties is the ability to see into the page and determine whether there are performance issues requesting and transmitting data from third parties.
In the first article of this series, I detailed a number of approaches to actively gathering performance data. This method, whether from the datacenter or the last mile, will provide the early warning signs that there is an issue with a third party, and feed this data into the performance issue management plan.
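As a rough illustration of seeing into the page, the Python sketch below takes per-object timings from an outside-in measurement and totals the time spent on each third-party host. The input format (a list of URL and millisecond pairs), the function name third_party_time, and the sample hostnames are all assumptions made for this example, not the output of any particular measurement tool.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def third_party_time(samples, first_party_host):
    """Total the time spent per third-party host, slowest first.

    samples: iterable of (url, milliseconds) pairs (assumed format).
    first_party_host: the site's own domain, e.g. "example.com".
    """
    totals = defaultdict(float)
    for url, ms in samples:
        host = urlsplit(url).hostname
        if not host:
            continue  # skip entries without a usable hostname
        # Treat the site's own domain and its subdomains as first party.
        if host == first_party_host or host.endswith("." + first_party_host):
            continue
        totals[host] += ms
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical measurement of one page load.
page_load = [
    ("https://www.example.com/index.html", 180.0),
    ("https://cdn.example.net/app.js", 95.0),
    ("https://ads.example.org/banner.js", 1240.0),
    ("https://cdn.example.net/style.css", 60.0),
]
slowest = third_party_time(page_load, "example.com")
```

Sorting slowest first puts the provider most worth a conversation at the top of the list.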

Measure from inside the browser

The network and application performance of a third-party page component is just the start of the process, as this is what it takes to get the object to the browser. But what if this object then launches a number of actions, or starts to render on the screen? This may lead to a whole different range of issues that are a blind spot when analyzing Web performance.
Measuring the performance of discrete page elements from within the visitor's browser will provide deeper insight into what effects the customer sees and which third parties will need to be approached in order to improve the overall Web performance of the site.

Have clear and useful SLOs and SLAs

Service level objectives and service level agreements are often thrown about whenever there is the suspicion that there is a Web performance issue. Using these documents and frameworks as a club to beat up partners with is counter-productive.
SLOs and SLAs should clearly detail:

  • the performance expectations of the Web site owner
  • the performance and delivery capabilities of the third-party provider

Guess what? Arriving at this in a way that doesn’t lead to resentment and mistrust on both sides requires open and honest discussion.
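One way to keep that discussion grounded is to state the objective in a form both sides can test against the same data. The Python sketch below checks a set of response-time measurements against a percentile objective. The function name meets_slo, the 95th-percentile default, the nearest-rank method, and the sample numbers are illustrative assumptions, not terms from any real agreement.

```python
def meets_slo(response_ms, threshold_ms, percentile=95):
    """Check measurements against a percentile objective.

    Uses the nearest-rank percentile; percentile must be a whole
    number from 1 to 100. Returns (met, observed_ms).
    """
    if not response_ms:
        raise ValueError("no measurements to evaluate")
    if not 1 <= percentile <= 100:
        raise ValueError("percentile must be between 1 and 100")
    ordered = sorted(response_ms)
    # Integer ceiling of percentile/100 * n avoids float rounding.
    rank = (percentile * len(ordered) + 99) // 100
    observed = ordered[rank - 1]
    return observed <= threshold_ms, observed


# Hypothetical response times (ms) for one third-party object.
measurements = [120, 150, 180, 200, 950]
strict = meets_slo(measurements, threshold_ms=500, percentile=95)
relaxed = meets_slo(measurements, threshold_ms=500, percentile=80)
```

The same measurements can pass or fail depending on which percentile is written into the objective, which is why the choice belongs in the conversation rather than in the fine print.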

Share data

If Web site owners and third-parties are going to work together to ensure the most effective Web performance strategy possible, then data must flow freely. Vendors will need access to the same data that Web site owners have (and vice versa) in order to ensure that if an issue is detected, everyone can examine all of the available data, and solve the problem quickly.

Communication

A recurring and critical theme when establishing a culture of effective Web performance is communication. When working with third parties, this is even more critical, as the performance culture of one organization may be completely different from another. The Web site owner may have one set of criteria that determines a Web performance issue, while the vendor has another, and unless these are understood, problems will occur.
Clear communication paths must be baked into the SLA. Named contacts or contact paths will be there, as will expected response times for inbound requests, and escalation procedures.
When there is a performance issue, both sides will need to be very clear about how the other will respond.

Takeaways

Third-party content on Web sites is a fact. It shouldn’t be a headache. Effective Web performance measurement strategies, shared sources of Web performance data, and clearly understood paths and methods of communication will make using third-party content less stress-inducing to everyone.