
The Dichotomy of the Web: Andy King's Website Optimization

Andy King's Website Optimization, O'Reilly, 2008

The Web is a many-splendored thing, with a very split personality. One side is driven to find ways to make the most money possible, while the other is driven to implement cool technology in an effective and efficient manner (most of the time).

Andy King, in Website Optimization (O’Reilly), tries to address these two competing forces in a way that both can understand. This is important because, as we all know from our own lives, most of the time these two competing parts of the same whole are right; they just don’t understand the other side.

I have seen this trend repeated throughout my nine years in the Web performance industry, five of them as a consultant. Companies are torn asunder, viewing the Business v. Technology interaction as a Cold War, one that occasionally flares up in odd places that serve as proxies between the two.

Website Optimization appears at first glance to be torn asunder by this conflict. With half devoted to optimizing the site for business and the other to performance and design optimization, there will be a cry from the competing factions that half of this book is a useless waste of time.

These are the organizations and individuals who will always be fighting to succeed in this industry. These are the people and companies who don't understand that success in both areas is critical to succeeding in a highly competitive Web world.

The first half of the book is dedicated to the optimization of a Web site, any Web site, to serve a well-defined business purpose. Discussing terms such as SEO, PPC, and CRO can curdle the blood of any hardcore techie, but they are what drive the design and business purpose of a Web site. Without a way to get people to a site, and use the information on the site to do business or complete the tasks that they need to, there is no need to have a technological infrastructure to support it.

Conversely, a business with lofty goals and a strategy that will change the marketplace will not get a chance to succeed if the site is slow, the pages are large, and the design makes cat barf look good. Concepts such as HTTP compression, file concatenation, caching, and JS/CSS placement drive this side of the personality, as well as a number of application and networking considerations that are just too far down the rat hole to even consider in a book with as broad a scope as this one.

Although on the surface the concepts discussed in this book may lead many people to put it down as not being business-focused or techie enough, those who do buy the book will show that they have a grasp of the wider perspective, the one that drives all successful sites to stand tall in a sea of similarity.

See the Website Optimization book companion site for more information, chapter summaries and two sample chapters.

Performance Improvement From Caching and Compression

This paper is an extension of the work done for another article that highlighted the performance benefits of retrieving uncompressed and compressed objects directly from the origin server. I wanted to add a proxy server into the stream and determine if proxy servers helped improve the performance of object downloads, and by how much.

Using the same series of objects as in the original compression article[1], the curl tests were re-run three times:

  1. Directly from the origin server
  2. Through the proxy server, to load the files into cache
  3. Through the proxy server, to avoid retrieving files from the origin.[2]

This series of three tests was repeated twice: once for the uncompressed files, and then for the compressed objects.[3]
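
For readers who want to reproduce this kind of test, here is a rough sketch in Python rather than curl; the proxy address and URL list are placeholders, not the configuration used in the original tests.

import time
import urllib.request

# Placeholder values; the original tests used curl against the Linux
# Documentation Project files, with a caching proxy on a corporate LAN.
PROXY = "http://proxy.example.com:3128"
URLS = ["http://www.example.com/docs/file%04d.html" % i for i in range(1, 6)]

def timed_fetch(url, opener):
    """Return the number of seconds needed to download one object."""
    start = time.perf_counter()
    with opener.open(url) as response:
        response.read()
    return time.perf_counter() - start

direct = urllib.request.build_opener()                         # straight to the origin
proxied = urllib.request.build_opener(
    urllib.request.ProxyHandler({"http": PROXY}))               # through the cache

for label, opener in (("no proxy", direct),
                      ("proxy load", proxied),    # first pass fills the cache
                      ("proxy draw", proxied)):   # second pass should hit the cache
    times = [timed_fetch(url, opener) for url in URLS]
    print(label, "mean download time:", sum(times) / len(times))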

As can be seen clearly in the plots below, compression greatly improved web page download times when the objects were retrieved directly from the source. However, the performance difference between compressed and uncompressed data all but disappears when objects are retrieved from a proxy server on a corporate LAN.

[Figure: download times for uncompressed pages, origin vs. proxy]
[Figure: download times for compressed pages, origin vs. proxy]

Instead of the linear growth between object size and download time seen in both of the retrieval tests that used the origin server (Source and Proxy Load data), the Proxy Draw data clearly shows the benefits that accrue when a proxy server is added to a network to assist with serving HTTP traffic.

Mean Download Time

Uncompressed Pages
  Total Time Uncompressed — No Proxy: 0.256
  Total Time Uncompressed — Proxy Load: 0.254
  Total Time Uncompressed — Proxy Draw: 0.110

Compressed Pages
  Total Time Compressed — No Proxy: 0.181
  Total Time Compressed — Proxy Load: 0.140
  Total Time Compressed — Proxy Draw: 0.104

The data above shows just how much of an improvement a local proxy server, explicit caching directives, and compression can add to a Web site. For sites that do force a great number of requests to be returned directly to the origin server, compression will be of great help in reducing bandwidth costs and improving performance. However, by allowing pages to be cached in local proxy servers, the difference between compressed and uncompressed pages vanishes.

Conclusion

Compression is a very good start when attempting to optimize performance. The addition of explicit caching directives in server responses, which allow proxy servers to serve cached data to clients on local LANs, can improve performance to an even greater extent than compression can. The two should be used together to improve the overall performance of Web sites.


[1]The test set was made up of the 1952 HTML files located in the top directory of the Linux Documentation Project HTML archive.

[2]All of the pages in these tests announced the following server response header indicating their cacheability:

Cache-Control: max-age=3600

[3]A note on the compressed files: all compression was performed dynamically by mod_gzip for Apache/1.3.27.

Using Client-Side Cache Solutions And Server-Side Caching Configurations To Improve Internet Performance

In today's highly competitive e-commerce marketplace, the performance of a web-site plays a key role in attracting new and retaining current clients. New technologies are being developed to help speed up the delivery of content to customers while still allowing companies to get their message across using rich, graphical content. However, in the rush to find new technologies to improve internet performance, one low-cost alternative to these new technologies is often overlooked: client-side content caching.

This process is often overlooked or dismissed by web administrators and content providers seeking to improve performance. The major concern that is expressed by these groups is that they need to ensure that clients always get the freshest content possible. In their eyes, allowing their content to be cached is perceived as losing control of their message.

This bias against caching is, in most cases, unjustified. By understanding how server software can be used to distinguish unique caching policies for each type of content being delivered, client-side performance gains can be achieved with no new hardware or software being added to an existing web-site system.

Caching

When a client requests web content, this information is either retrieved directly from the origin server, from a browser cache on a local hard drive or from a nearby cache server[1]. Where and for how long the data is stored depends on how the data is tagged when it leaves the web server. However, when discussing cache content, there are three states that content can be in: non-cacheable, fresh or stale.

The non-cacheable state indicates a file that should never be cached by any device that receives it and that every request for that file must be retrieved from the origin server. This places an additional load on both client and server bandwidth, as well as on the server which responds to these additional requests. In many cases, such as database queries, news content, and personalized content marked by unique cookies, the content provider may explicitly not want data to be cached to prevent stale data from being received by the client.

A fresh file is one that has a clearly defined future expiration date and/or does not indicate that it is non-cacheable. A file with a defined lifespan is only valid for a set number of seconds after it is downloaded, or until the explicitly stated expiry date and time is reached. At that point, the file is considered stale and must be re-verified (preferred as it requires less bandwidth) or re-loaded from the origin server.[2]

If a file does not explicitly indicate it is non-cacheable, but does not indicate an explicit expiry period or time, the cache server assigns the file an expiry time defined in the cache server's configuration. When that deadline is reached and the cache server receives a request for that file, the server checks with the origin server to see whether the content has changed. If the file is unchanged, the counter is reset and the existing content is served to the client; if the file is changed, the new content is downloaded, cached according to its settings and then served to the client.

A stale file is a file in cache that is no longer valid. A client has requested information that had previously been stored in the cache, and the control data for the object indicates that it has expired or is too stale to be considered for serving. The browser or cache server must now either re-validate the file with, or retrieve the file from, the origin server before the data can be served to the client.

The state of an item being considered for caching is determined using one or more of five HTTP header messages[3]: two server messages, one client message, and two that can be sent by either the client or the server.[4] These headers include Pragma: no-cache, Cache-Control, Expires, Last-Modified, and If-Modified-Since. Each of these identifies a particular condition that the proxy server must adhere to when deciding whether the content is fresh enough to be served to the requesting client.

Pragma: no-cache is an HTTP/1.0 client and server header that informs caching servers not to serve the requested content to the client from their cache (client-side) and not cache the marked information if they receive it (server-side). This response has been deprecated in favor of the new HTTP/1.1 Cache-Control header, but is still used in many browsers and servers. The continued use of this header is necessary to ensure backwards-compatibility, as it cannot be guaranteed that all devices and servers will understand the HTTP/1.1 server headers.

Cache-Control is a family of HTTP/1.1 client and server messages that can be used to clearly define not only if an item can be cached, but also for how long and how it should be validated upon expiry. This more precise family of messages replaces the older Pragma: no-cache message. There are a large number of options for this header field, but four are especially relevant to this discussion.[5]

Cache-Control: private/public

This setting indicates what type of devices can cache the data. The private setting allows the marked items to be cached by the requesting client, but not by any cache servers encountered en-route. The public setting indicates that any device can cache this content. By default, public is assumed unless private is explicitly stated.

Cache-Control: no-cache

This is the HTTP/1.1 equivalent of Pragma: no-cache and can be used by clients to force an end-to-end retrieval of the requested files and by servers to prevent items from being cached.

Cache-Control: max-age=x

This setting allows the indicated files to be cached either by the client or the cache server for x seconds.

Cache-Control: must-revalidate

This setting informs the cache server that if the item in cache is stale, it must be re-validated before it can be served to the client.

A number of these settings can be combined to form a larger Cache-Control header message. For example, an administrator may want to define how long the content is valid for, and then indicate that, at the end of that period, all new requests must be revalidated with the origin server. This can be accomplished by creating a multi-field Cache-Control header message like the one below.

Cache-Control: max-age=3600, must-revalidate

Expires sets an explicit expiry date and time for the requested file. This is usually in the future, but a server administrator can ensure that an object is always re-validated by setting an expiry date that is in the past; an example of this is shown below.

Last-Modified can indicate one of several conditions, but the most common is the last time the state of the requested object was updated. The cache server can use this to confirm an object has not changed since it was inserted into the cache, allowing for re-validation, versus completely re-loading, of objects in cache.

If-Modified-Since is a client-side header message that is sent either by a browser or a cache server and is set by the Last-Modified value of the object in cache. When the origin server has not set an explicit cache expiry value and the cache server has had to set an expiry time on the object using its own internal configuration, the Last-Modified value is used to confirm whether content has changed on the origin server.

If the Last-Modified value on an object held by the origin server is newer than that held by the client, the entire file is re-loaded. If these values are the same, the origin server returns a 304 Not Modified HTTP message and the cache object is then served to the client and has its cache-defined counter reset.
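
Taken together, the headers above are what determine the cache states described earlier: non-cacheable, fresh, or stale. Below is a minimal sketch of that decision in Python, assuming the response headers have already been parsed into a dictionary and ignoring many of the finer points of RFC 2616.

from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def cache_state(headers, stored_at):
    """Classify a cached response as 'non-cacheable', 'fresh', or 'stale'.

    headers   -- dict of response headers captured when the object was stored
    stored_at -- timezone-aware datetime at which the object entered the cache
    """
    cache_control = headers.get("Cache-Control", "").lower()
    if "no-cache" in cache_control or headers.get("Pragma", "").lower() == "no-cache":
        return "non-cacheable"

    now = datetime.now(timezone.utc)

    # Cache-Control: max-age takes precedence over Expires in HTTP/1.1.
    for directive in cache_control.split(","):
        directive = directive.strip()
        if directive.startswith("max-age="):
            max_age = int(directive.split("=", 1)[1])
            age = (now - stored_at).total_seconds()
            return "fresh" if age < max_age else "stale"

    if "Expires" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        return "fresh" if now < expires else "stale"

    # No explicit lifetime: the cache must fall back on its own configured default.
    return "stale"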

Using an application trace program, clients are able to capture the data that flows out of and into the browser application. The following two examples show how a server can use header messages to mark content as non-cacheable, or set very specific caching values.

Server Messages for a Non-Cacheable Object

HTTP/1.0 200 OK
Content-Type: text/html
Content-Length: 19662
Pragma: no-cache
Cache-Control: no-cache
Server: Roxen/2.1.185
Accept-Ranges: bytes
Expires: Wed, 03 Jan 2001 00:18:55 GMT

In this example, the server returns three indications that the content is non-cacheable. The first two are the Pragma: no-cache and Cache-Control: no-cache statements. With most client and cache server configurations, one of these headers on its own should be enough to prevent the requested object from being stored in cache. The web administrator in this example has chosen to ensure that any device, regardless of the version of HTTP used, will clearly understand that this object is non-cacheable.

However, in order to guarantee that this item is never stored in or served from cache, the Expires statement is set to a date and time that is in the past.[6] These three statements should be enough to guarantee that no cache serves this file without performing an end-to-end transfer of this object from the origin server with each request.

Specific Caching Information in Server Messages

HTTP/1.1 200 OK
Date: Tue, 13 Feb 2001 14:50:31 GMT
Server: Apache/1.3.12
Cache-Control: max-age=43200
Expires: Wed, 14 Feb 2001 02:50:31 GMT
Last-Modified: Sun, 03 Dec 2000 23:52:56 GMT
ETag: "1cbf3-dfd-3a2adcd8"
Accept-Ranges: bytes
Content-Length: 3581
Connection: close
Content-Type: text/html

In the example above, the server returns a header message Cache-Control: max-age=43200. This immediately informs the cache that the object can be stored in cache for up to 12 hours. This 12-hour time limit is further guaranteed by the Expires header, which is set to a date value that is exactly 12 hours ahead of the value set in the Date header message.[7]
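
The 12-hour relationship between the Date, Cache-Control: max-age and Expires values in that response can be checked with a couple of lines of Python:

from datetime import timedelta
from email.utils import parsedate_to_datetime

date = parsedate_to_datetime("Tue, 13 Feb 2001 14:50:31 GMT")
expires = parsedate_to_datetime("Wed, 14 Feb 2001 02:50:31 GMT")

print(expires - date)                               # 12:00:00
print(expires - date == timedelta(seconds=43200))   # True: matches max-age=43200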

These two examples present two variations of web server responses containing information that makes the requested content either completely non-cacheable or cacheable only for a very specific period of time.

How does caching work?

Content is cached by devices on the internet, and these devices then serve this stored content when the same file is requested by the original client or another client that uses that same cache. This rather simplistic description covers a number of different cache scenarios, but two will be the focus of this paper: browser caching and caching servers.[8]

For the remainder of this paper, the caching environment that will be discussed is one involving a network with a number of clients using a single cache server, the general internet, and a server network with a series of web servers on it.

Browser Caching

Browser caching is what most people are familiar with, as all web browsers perform this behavior by default. With this type of caching, the web browser stores a copy of the requested files in a cache directory on the client machine in order to help speed up page downloads. This performance increase is achieved by serving stored files from this directory on the local hard drive, rather than retrieving the same files from the web server across a much slower connection, whenever an item that is stored in cache is requested.

To ensure that old content is not being served to the client, the browser checks its cache first to see if an item is in cache. If the item is in cache, the browser then confirms the state of the object in cache with the origin server to see if the item has been modified at the source since the browser last downloaded it. If the object has not been modified, the origin server sends a 304 Not Modified message, and the item is served from the local hard drive and not across the much slower internet.

First Request for a file

REQUEST

GET /file.html HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-powerpoint, application/vnd.ms-excel, application/msword, application/x-comet, */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)
Host: 24.5.203.101
Connection: Keep-Alive

RESPONSE

HTTP/1.1 200 OK
Date: Tue, 13 Feb 2001 20:00:22 GMT
Server: Apache
Cache-Control: max-age=604800
Last-Modified: Wed, 29 Nov 2000 15:28:38 GMT
ETag: "1df-28f1-3a2520a6"
Accept-Ranges: bytes
Content-Length: 10481
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: text/html

In the above example[9], the file is retrieved from the server for the first time, and the server sends a 200 OK response and then returns the requested file. The Cache-Control, Last-Modified, and ETag headers carry the cache control data sent to the client by the server.

Second Request for a file

REQUEST

GET /file.html HTTP/1.1
Accept: */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
If-Modified-Since: Wed, 29 Nov 2000 15:28:38 GMT
If-None-Match: "1df-28f1-3a2520a6"
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)
Host: 24.5.203.101
Connection: Keep-Alive

RESPONSE

HTTP/1.1 304 Not Modified
Date: Tue, 13 Feb 2001 20:01:07 GMT
Server: Apache
Connection: Keep-Alive
Keep-Alive: timeout=5, max=100
ETag: "1df-28f1-3a2520a6"
Cache-Control: max-age=604800

The second request for a file sees the client send a request for the same object 45 seconds later, but with two additions. The client asks if the file has been modified since the last time it was requested (If-Modified-Since). In case the date in that field cannot be used by the origin server to confirm the state of the requested object, the client also asks if the object's ETag tracking code has changed, using the If-None-Match header message.[10] The origin server responds by verifying that the object has not been modified and confirms this by returning the same ETag value that was sent by the client. This rapid client-server exchange allows the browser to quickly determine that it can serve the file directly from its local cache directory.
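
This conditional exchange can be reproduced with Python's standard http.client module. The sketch below is illustrative only; the host, path and validator values are taken from the trace above rather than from a real local cache.

import http.client

# Values lifted from the trace above; a real client reads them from its cache.
HOST = "24.5.203.101"
PATH = "/file.html"
CACHED_LAST_MODIFIED = "Wed, 29 Nov 2000 15:28:38 GMT"
CACHED_ETAG = '"1df-28f1-3a2520a6"'

connection = http.client.HTTPConnection(HOST)
connection.request("GET", PATH, headers={
    "If-Modified-Since": CACHED_LAST_MODIFIED,  # validator from the cached copy
    "If-None-Match": CACHED_ETAG,
})
response = connection.getresponse()

if response.status == 304:
    # Nothing has changed: serve the copy already stored on the local disk.
    print("304 Not Modified - use the cached file")
else:
    # The object changed (or the validators could not be used): refresh the cache.
    body = response.read()
    print(response.status, "fetched", len(body), "bytes; replace the cached copy")
connection.close()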

Caching Server

A caching server performs functions similar to those of a browser cache, only on a much larger scale. Where a browser cache is responsible for storing web objects for a single browser application on a single machine, a cache server stores web objects for a large number of clients, or perhaps even an entire network. With a cache server, all web requests from a network are passed through the caching server, which then serves the requested files to the client. The cache server can deliver content either directly from its own cache of objects, or by retrieving objects from the internet and then serving them to clients.[11]

Cache servers are more efficient than browser caches, as this network-level caching process makes an object available to all users of the network once it has been retrieved. With a browser cache, each user and, in fact, each browser application on a specific client must maintain a unique cache of files that is not shared with other clients or applications.

Also, cache servers use additional information provided by the web server in the headers sent along with each response. Browser caches simply re-validate content with each request, confirming that the content has not been modified since it was last requested. Cache servers use the values sent in the Expires and Cache-Control header messages to set explicit expiry times for the objects they store.
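
The behaviour described here, one stored copy shared by every client behind the cache, can be sketched as a toy in-memory cache keyed by URL. This is an illustration only, honouring nothing but Cache-Control: max-age, and is not how any particular cache server is implemented.

import time
import urllib.request

class TinySharedCache:
    """Toy shared cache: one stored copy of each URL serves every client."""

    def __init__(self):
        self._store = {}  # url -> (body, expires_at)

    def get(self, url):
        entry = self._store.get(url)
        if entry and time.time() < entry[1]:
            return entry[0]                           # cache hit: no origin traffic
        with urllib.request.urlopen(url) as response:  # miss or stale: go to the origin
            body = response.read()
            max_age = 0
            for part in response.headers.get("Cache-Control", "").split(","):
                part = part.strip()
                if part.startswith("max-age="):
                    max_age = int(part.split("=", 1)[1])
        if max_age > 0:
            self._store[url] = (body, time.time() + max_age)
        return body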

First Request for a file through a cache server

REQUEST

GET http://24.5.203.101/file.html HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-powerpoint, application/vnd.ms-excel, application/msword, application/x-comet, */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)
Host: 24.5.203.101
Proxy-Connection: Keep-Alive

RESPONSE

HTTP/1.0 200 OK
Date: Tue, 16 Jan 2001 15:46:42 GMT
Server: Apache
Cache-Control: max-age=604800
Last-Modified: Wed, 29 Nov 2000 15:28:38 GMT
ETag: "1df-28f1-3a2520a6"
Content-Length: 10481
Content-Type: text/html
Connection: Close

The first request from the client through a cache server shows two very interesting things.[12] The first is that although the client request was sent out as HTTP/1.1, the server responded using HTTP/1.0. The browser caching example above demonstrated that the responding server uses HTTP/1.1. The change in protocol is the first clue that this data was served by a cache server.

The second item of interest is that the file that is initially served by the proxy server has a Date field set to January 16, 2001. This server is not serving stale data; this is the default time set by the cache server to indicate a new object that has been inserted in the cache.[13]

Second Request for a file through a cache server (Second Browser)

REQUEST

GET http://24.5.203.101/file.html HTTP/1.1
Host: 24.5.203.101
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; 0.7) Gecko/20010109
Accept: */*
Accept-Language: en
Accept-Encoding: gzip,deflate,compress,identity
Keep-Alive: 300
Connection: keep-alive

RESPONSE

HTTP/1.0 200 OK
Date: Tue, 16 Jan 2001 15:46:42 GMT
Server: Apache
Cache-Control: max-age=604800
Last-Modified: Wed, 29 Nov 2000 15:28:38 GMT
ETag: "1df-28f1-3a2520a6"
Content-Length: 10481
Content-Type: text/html
Connection: Close

Third Request for a file through a cache server (Second Client Machine)

REQUEST

GET http://24.5.203.101/file.html HTTP/1.0
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-powerpoint,
application/vnd.ms-excel, application/msword, */*
Accept-Language: en-us
User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)
Host: 24.5.203.101
Proxy-Connection: Keep-Alive

RESPONSE

HTTP/1.0 200 OK
Date: Tue, 16 Jan 2001 15:46:42 GMT
Server: Apache
Cache-Control: max-age=604800
Last-Modified: Wed, 29 Nov 2000 15:28:38 GMT
ETag: "1df-28f1-3a2520a6"
Content-Length: 10481
Content-Type: text/html
Connection: Close

A second request through the cache server, using another browser on the same client configured to use the cache server, indicates that this client retrieved the file from the cache server, not from the origin server. The Date field is the same as the initial request and the protocol has once again been swapped from HTTP/1.1 to HTTP/1.0.

The third example shows that the object is not only available to different browsers on the same machine, but also to different machines on the same network using the same cache server. By requesting the same content from another client machine on the same network, it is clear that the object is served to the client by the cache server, as the Date field is set to the same value observed in the previous two examples.

Why should data be cached?

Many web pages that are downloaded by web browsers today are marked as being non-cacheable. The theory behind this is that there is so much dynamic and personalized content on the internet today that if any of it is cached, people using the web may not have the freshest possible content or they may end up receiving content that was personalized for another client making use of the same cache server.

The dynamic and personalized nature of the web today does make this a challenge, but if the design of a web-site is examined closely, it can be seen that these new features of the web can work hand-in-hand with content caching.

How does caching improve the perceived user experience? In both the browser caching and caching server discussions above, it has been demonstrated that caching helps attack the problem of internet performance on three fronts. First, caching moves content closer to the client, by placing it on local hard-drives or in local network caches. With data stored on or near the client, the network delay encountered when trying to retrieve the data is reduced or eliminated.

Secondly, caching reduces network traffic by serving content that is fresh, as described above. Cache servers will attempt to confirm with the origin server that the objects stored in cache, if not explicitly marked for expiry, are still valid and do not need to be fully re-loaded across the internet. In order to gain the maximum performance benefit from object caching, it is vital to specify explicit cache expiry dates or periods.

The final performance benefit to properly defining caching configurations of content on an origin server is that server load is reduced. If the server uses carefully planned explicit caching policies, server load can be greatly reduced, improving the user experience.

When examining how the configuration of a web server can be modified to improve content cacheability, it is important to keep in mind two very important considerations. First, the content and site administrators must have a very granular level of control over how the content being served will or won't be cached once it leaves their server. Secondly, within this need to control how content is cached, ways should be found to minimize the impact that client requests have on bandwidth and server load by allowing some content to be cached.

Take the example of a large, popular site that is noted for its dynamic content and rich graphics. Despite having a great deal of dynamic content, caching can serve a beneficial purpose without compromising the nature of the content being served. The primary focus of the caching evaluation should be on the rich graphical content of the site.

If the images of this site all have unique names that are not shared by any other object on the site, or the images all reside in the same directory tree, then this content can be marked differently within the server configuration, allowing it to be cached.[14] A policy that allows these objects to be cached for 60, 120 or 180 seconds could have a large effect on reducing the bandwidth and server strain at the modified site. During this seemingly short period of time, several dozen or even several hundred different requests for the same object could originate from a large corporate network or ISP. If local cache servers can handle these requests, both the server and client sides of the transaction could see immediate performance improvements.

Taking a server header from an example used earlier in the paper, it can be demonstrated how even a slight change to the server header itself can help control the caching properties of dynamic content.

Dynamic Content

HTTP/1.1 200 OK
Date: Tue, 13 Feb 2001 14:50:31 GMT
Server: Apache/1.3.12
Cache-Control: no-cache, must-revalidate
Expires: Sat, 13 Jan 2001 14:50:31 GMT
Last-Modified: Sun, 03 Dec 2000 23:52:56 GMT
ETag: "1cbf3-dfd-3a2adcd8"
Accept-Ranges: bytes
Content-Length: 3581
Connection: close
Content-Type: text/html

Static Content

HTTP/1.1 200 OK
Date: Tue, 13 Feb 2001 14:50:31 GMT
Server: Apache/1.3.12
Cache-Control: max-age=43200, must-revalidate
Expires: Wed, 14 Feb 2001 02:50:31 GMT
Last-Modified: Sun, 03 Dec 2000 23:52:56 GMT
ETag: "1cbf3-dfd-3a2adcd8"
Accept-Ranges: bytes
Content-Length: 3581
Connection: close
Content-Type: text/html

As can been seen above, the only difference in the headers sent with the Dynamic Content and the Static Content are the Cache-Control and Expires values. The Dynamic Content example sets Cache-Control to no-cache, must-revalidate and Expires to one month in the past. This should prevent any cache from storing this data or serving it when a request is received to retrieve the same content.

The Static Content example modifies these two settings, making the requested object cacheable for up to 12 hours (a Cache-Control max-age value of 43,200 seconds), with an Expires value that is exactly 12 hours in the future. After the period specified, the browser cache or caching server must re-validate the content before it can be served in response to local requests.

The must-revalidate item is not necessary, but it does add additional control over content. Some cache servers will attempt to serve content that is stale under certain circumstances, such as if the origin server for the content cannot be reached. The must-revalidate setting forces the cache server to re-validate the stale content, and return an error if it cannot be retrieved.

Differentiating caching policies based on the type of content served allows a very granular level of control over what is not cached, what is cached, and for how long content can be cached and still be considered fresh. In this way, server and web administrators can improve site performance at little or no additional development or capital cost.
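
As a rough sketch of this kind of split (footnote [14] notes that the original example relies on Apache configuration), the policy can be expressed as a simple lookup keyed on file type. The extensions, lifetimes and default below are illustrative assumptions, not recommendations from the paper.

# Illustrative policy table: relatively static objects receive an explicit
# lifetime, while dynamic or personalized paths are marked non-cacheable.
CACHE_POLICIES = {
    ".gif": "max-age=180, must-revalidate",
    ".jpg": "max-age=180, must-revalidate",
    ".css": "max-age=43200, must-revalidate",
    ".html": "no-cache, must-revalidate",
}

def cache_control_for(path):
    """Return the Cache-Control value to attach to a response for this path."""
    for extension, policy in CACHE_POLICIES.items():
        if path.endswith(extension):
            return policy
    return "no-cache, must-revalidate"   # safest default for unclassified content

print(cache_control_for("/images/logo.gif"))    # max-age=180, must-revalidate
print(cache_control_for("/account/home.html"))  # no-cache, must-revalidate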

It is very important to note that defining specific server-side caching policies will only have a beneficial effect on server performance if explicit object caching configurations are used. The two main types of explicit caching configurations are those set by the Expires header and the Cache-Control family of headers, as seen in the example above. If no explicit value is set for object expiry, performance gains that might have been achieved are eliminated by a flood of unnecessary client and cache server requests to re-validate unchanged objects with the origin server.

Conclusion

Despite the growth of dynamic and personalized content on the web, there is still a great deal of highly cacheable material that is served to clients. However, many sites do not take advantage of the performance gains that can be achieved by isolating the dynamic and personalized content of their site from the relatively static content that is served alongside it.

Using the inherent ability to set explicit caching policies within most modern web-server applications, objects handled by a web-server can be separated into unique content groups. With distinct caching policies for each defined group of web objects, the web-site administrator, not the cache administrator, has control over how long content is served without re-validation or re-loading. This granular control of explicit content caching policies can allow web-sites to achieve noticeable performance gains with no additional outlay for hardware and software.



Endnotes

[1] The proximity that is referred to here is network proximity, not physical proximity. For example, AOL's network has some of the world's largest cache servers and they are concentrated in Virginia; however, because of the structure of AOL's network, these cache servers are not far from the client.

[2] A re-verify is preferred as it consumes less bandwidth than a full re-load of the object from the origin server. With a re-verification, the origin server just confirms that the file is still valid and the cache server can simply reset the timer on the object.

[3] An HTTP header message is a data-control message sent by a web client or a web server to indicate a variety of data transmission parameters concerning the requests being made. Caching information is included in the information sent to and from the server.

[4] There are actually a substantially larger number of header messages that can be applied to a client or a server data transmission to communicate caching information. The most up-to-date list of the messages can be found in section 13 of RFC 2616, Hypertext Transfer Protocol — HTTP/1.1.

[5] A complete listing of the Cache-Control settings can be found in RFC 2616, Hypertext Transfer Protocol — HTTP/1.1, section 14.9.

[6] The initial file request that generated this header was sent on February 12, 2001.

[7] The Date header message indicates the date and time on the origin server when it responded to the request.

[8] A third type of caching, Reverse Caching or HTTPD Accelerators, is used at the server side to place highly cacheable content onto high-speed machines that use solid-state storage to make retrieval of these objects very fast. This reduces the load on the web servers and allows them to concentrate on the generation of dynamic and personalized content.

[9] The data shown here is just application trace data.

[10] The ETag, or entity tag, is used to identify specific objects on a web server. Each item has a unique ETag value, and this value is changed each time the file is modified. As an example, the ETag for a local web file was captured. This data was re-captured after the file was modified (two carriage returns were inserted).

Test 1: Original File

ETag: "21ccd-10cb-399a1b33"

Test 2: Modified File

ETag: "21ccd-10cd-3a8c0597"

[11] This is where the other name for a cache server comes from, as the cache server acts as a proxy for the client making the request. The term proxy server is outdated, as the term proxy assumes that the device will do exactly as the client requests; this is not always the case, due to the security and content control mechanisms which are a part of all cache servers today. The client isn't always guaranteed to receive the complete content they requested. In fact, many networks do not allow any content into the network that does not first go through the cache devices on that network.

[12] The data shown here is just application trace data. For a more complete example of what the application and network properties of a web object retrieval are, please see Appendix A and B.

[13] All the data captures used in this example were taken on February 11-14, 2001.

[14] The description used here is based on the configuration options available with the Apache/1.3.x server family, which allows caching options to be set down to the file level. Other server applications may vary in their methods of applying individual caching policies to different sets of content on the same server.

Web Performance, Part VII: Reliability and Consistency

In this series, the focus has been on the basic Web performance concepts, the ones that have dominated the performance management field for the last decade. It’s now time to step beyond these measures, and examine two equally important concepts, ones that allow a company to analyze their Web performance outside the constraints of performance and availability.

Reliability is often confused with availability when it is used in a Web performance context. Reliability, as a measurement and analysis concept, goes far beyond the binary 0 or 1 that the term availability limits us to, and places availability in the context of how it affects the whole business.
Typical measures used in reliability include:

  • Minutes of outage
  • Number of failed measurements
  • Core business hours

Reliability is, by its very nature, a more complex way to examine the successful delivery of content to customers. It forces the business side of a company to define what times of day and days of the week affect the bottom-line more, and forces the technology side of the business to be able to account not simply for server uptime, but also for exact measures of when and why customers could not reach the site.
This approach almost always leads to the creation of a whole new metric, one that is uniquely tied to the expectations and demands of the business it was developed in. It may also force organizations to focus on key components of their online business, if a trend of repeated outages appears with only a few components of the Web site.

Consistency is uniquely paired with Reliability, in that it extends the concept of performance to beyond simple aggregates, and considers what the performance experience is like for the customer on each visit. Can a customer say that the site always responds the same way, or do you hear that sometimes your site is slow and unusable? Why is the performance of your site inconsistent?

A simple way to think of consistency is the old standby of the Standard Deviation. This gives the range in which the population of the measurements is clustered around the Arithmetic Mean. This value can depend on the number of measures in the population, as well as the properties of these unique measures.

Standard Deviation has a number of flaws, but provides a simple way to define consistency: a large standard deviation value indicates a high degree of inconsistency within the measurement population, whereas a small standard deviation value indicates a higher degree of consistency.
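
A small illustration using Python's statistics module and two invented measurement samples with similar means:

import statistics

# Two invented sets of page-load measurements (seconds) with similar means.
consistent = [0.91, 0.95, 0.88, 0.93, 0.90, 0.94, 0.92]
inconsistent = [0.55, 1.80, 0.60, 1.40, 0.70, 0.52, 0.85]

for label, sample in (("consistent", consistent), ("inconsistent", inconsistent)):
    print(label,
          "mean=%.2f" % statistics.mean(sample),
          "stdev=%.2f" % statistics.stdev(sample))

# The much larger standard deviation flags the second site as inconsistent,
# even though its arithmetic mean is in the same neighbourhood as the first.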

The metric that is produced for consistency differs from the reliability metric in that it will always be measured in seconds or milliseconds. But the same insight may arise from consistency, that certain components of the Web site contribute more to the inconsistency of a Web transaction. Isolating these elements outside the context of the entire business process gives organizations the information they need to eliminate these issues more quickly.

Companies that have found that simple performance and availability metrics constrain their ability to accurately describe the performance of their Web site need to examine ways to integrate a formula for calculating Reliability, and a measure of Consistency into their performance management regime.

Web Performance, Part VI: Benchmarking Your Site

In the last article in this series, the concept of baselining your measurements was discussed. This is vital, in order for you and your organization to be able to identify the particular performance patterns associated with your site.

Now that’s under control, you’re done, right?

Not a chance. Remember that your site is not the only Web site your customers visit. So, how are you doing against all of those other sites?

Let’s take a simple example of the performance for one week for one of the search firms. This is simply an example; I am just too lazy to change the names to protect the innocent.

[Figure: seven-day performance measurement for one search firm]

Doesn't look too bad. An easily understood pattern of slower performance during peak business hours appears in the data, a predictable pattern which would serve as a great baseline for any firm. However, this baseline lacks context. If anyone tries to use a graph like this, the next question you should ask is "So what?".

What makes a graph like this interesting but useless? That’s easy: A baseline graph is only the first step in the information process. A graph of your performance tells you how your site is doing. There is, however, no indication of whether this performance trend is good or bad.

[Figure: seven-day performance comparison of four search firms]

Examining the performance of the same firm within a competitive and comparative context, the predictable baseline performance still appears predictable, but not as good as it could be. The graph shows that most of the other firms in the same vertical, performing the identical search, over the same period of time, and from the same measurement locations, do not show the same daytime pattern of performance degradation.

The context provided by benchmarking now becomes a critical factor. By putting the site side-by-side with other sites delivering the same service, an organization can now question the traditional belief that the site is doing well because we can predict how it will behave.

A simple benchmark such as the one above forces a company to ask hard questions, and should lead to reflection and re-examination of what the predictable baseline really means. A benchmark result should always lead a company to ask if their performance is good enough, and if they want to get better, what will it take.

Benchmarking relies on the idea of a business process. The old approach to benchmarks only considered firms within the narrowly defined scope of their industry verticals; another approach considers company homepages without any context or reliable comparative structure in place to compensate for the differences between pages and sites.

It is not difficult to define a benchmark that allows for the comparison of a major bank to a major retailer, and a major social networking site, and a major online mail provider. By clearly defining a business process that these sites share (in this case let’s take the user-authentication process) you can compare companies across industry verticals.

This cross-discipline comparison is crucial. Your customers do this with your site every single day. They visit your site, and tens, maybe hundreds, of other sites every week. They don’t limit their comparison to sites in the same industry vertical; they perform cross-vertical business process critiques intuitively, and then share these results with others anecdotally.

In many cases, a cross-vertical performance comparison cannot be performed, as there are too many variables and differences to perform a head-to-head speed analysis. Luckily for the Web performance field, speed is only one metric that can be used for comparison. By stretching Web site performance analysis beyond speed, comparing sites with vastly different business processes and industries can be done in a way that treats all sites equally. The decade-long focus on speed and performance has allowed other metrics to be pushed aside.

Having a fast site is good. But that’s not all there is to Web performance. If you were to compare the state of Web performance benchmarking to the car-buying public, the industry has been stuck in the role of a power-hungry, horsepower-obsessed teenage boy for too long. Just as your automobile needs and requirements evolve (ok, maybe this doesn’t apply to everyone), so do your Web performance requirements.

Web Performance, Part V: Baseline Your Data

Up to this point, the series has focused on the mundane world of calculating statistical values in order to represent your Web performance data in some meaningful way. Now we step into the more exciting (I lead a sheltered life) world of analyzing the data to make some sense from it.

When companies sign up with a Web performance company, it has been my experience that the first thing that they want to do is get in there and push all the buttons and bounce on the seats. This usually involves setting up a million different measurements, and then establishing alerting thresholds for every single one of them that is of critical importance, emailed to the pagers of the entire IT team all the time.

While interesting, it is also a great way for people to begin to actually ignore the data, because:

  1. It’s not telling them what they need to know
  2. It’s telling them stuff when they don’t need to know it.

When I speak to a company for the first time, I often ask what their key online business processes are. I usually get either stunned silence or “I don’t know” as a response. Seriously, what has been purchased is a tool, some new gadget that will supposedly make life better; but no thought has been put into how to deploy and make use of the data coming in.

I have the luxury of being able to concentrate on one set of data all the time. In most environments, the flow of data from systems, network devices, e-mail updates, patches, and business data simply becomes noise to be ignored until someone starts complaining that something is wrong. Web performance data becomes another data flow to react to, not act on.

So how do you begin to corral the beast of Web performance data? Start with the simplest question: what do we NEED to measure?

If you talk to IT, Marketing, and Business Management, they will likely come up with three key areas that need to be measured:

  1. Search
  2. Authentication
  3. Shopping Cart

Technology folks will say, "But that doesn't cover the true complexity of our relational, P2P, AJAX-powered, social media, tagging Web 2.0 site."

Who cares! The three items listed above pay the bills and keep the lights on. If one of these isn’t working, you fix it now, or you go home.

Now, we have three primary targets. We’re set to start setting up alerts, and stuff, right?

Nope. You don’t have enough information yet.

[Figure: measurement data after the first day]

This is your measurement after the first day. This gives you enough information to do all those bright and shiny things that you’ve heard your new Web performance tool can do, doesn’t it?

[Figure: the same measurement after four days]

Here's the same measurement after four days. Subtle but important changes have occurred. The most important of these is that the first day of data gathering happened to fall on a Friday night. Most site owners would agree that performance on a Friday night is far different from what you would find on a Monday morning, and Monday morning shows a noticeable upward shift in this site's performance.

And what do you do when your performance looks like this?

[Figure: long-term measurement data]

Baselining is the ability to predict the performance of your site under normal circumstances on an ongoing basis. It is based on the knowledge that comes from understanding how the site has performed in the past, as well as how it has behaved in situations of abnormal behavior. Until you can predict how your site should behave, you cannot begin to understand why it behaves the way it does.

Focusing on the three key transaction paths or business processes listed above helps you and your team wrap your head around what the site is doing right now. Once a baseline for the site’s performance exists, then you can begin to benchmark the performance of your site by comparing it to others doing the same business process.

Web Performance, Part IV: Finding The Frequency

In the last article, I discussed the aggregated statistics used most frequently to describe a population of performance data.
[Figure: aggregated statistics for a sample measurement population]
The pros and cons of each of these aggregated values have been examined, but now we come to the largest single flaw: these values attempt to assign a single value to describe an entire population of numbers.

The only way to describe a population of numbers is to do one of two things: Display every single datapoint in the population against the time it occurred, producing a scatter plot; or display the population as a statistical distribution.

The most common type of statistical distribution used in Web performance data is the Frequency Distribution. This type of display breaks the population down into measurements of a certain value range, then graphs the results by comparing the number of results in each value container.
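
Building such a distribution is simply a matter of counting measurements per fixed-width value container. Here is a sketch with invented data and 0.2-second containers:

from collections import Counter

# Invented response-time measurements (seconds), including a heavy tail.
measurements = [0.9, 1.0, 0.95, 1.1, 0.88, 1.2, 1.05, 0.92, 1.3, 2.4, 0.97, 6.8]

BIN_WIDTH = 0.2
bins = Counter(int(value // BIN_WIDTH) for value in measurements)

for index in sorted(bins):
    low = index * BIN_WIDTH
    print("%4.1f - %4.1f s | %s" % (low, low + BIN_WIDTH, "#" * bins[index]))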

So, taking the same population data used in the aggregated data above, the frequency distribution looks like this.

[Figure: frequency distribution of the same measurement population]

This gives a deeper insight into the whole population, by displaying the whole range of measurements, including the heavy tail that occurs in many Web performance result sets. Please note that a statistical heavy tail is essentially the same as Chris Anderson's long tail, but in statistical analysis, a heavy tail represents a non-normally distributed data set, and it skews the aggregated values you try to produce from the population.

As was noted in the aggregated values, the 'average' performance likely falls between 0.88 and 1.04 seconds. When you compare these values to the frequency distribution, they make sense, as the largest concentration of measurement values falls into this range.

However, the 85th Percentile for this population is at 1.20 seconds, where there is a large secondary bulge in the frequency distribution. After that, there are measurements that trickle out into the 40-second range.

As can be seen, a single aggregated number cannot represent all of the characteristics of a population of measurements. Such numbers are good approximations, but that's all they are.

So, to wrap up this flurry of a visit through the world of statistical analysis and Web performance data, always remember the old adage: Lies, Damn Lies, and Statistics.
In the next article, I will discuss the concept of performance baselining, and how this is the basis for Web performance evolution.

Web Performance, Part III: Moving Beyond Average

In the previous article in this series, I talked about the fallacy of 'average' performance. Now that this has been dismissed, what do I propose to replace it with? There are three aggregated values that can be used to better represent Web performance data:

  • Median
  • Geometric Mean
  • 85th Percentile

The links take you to articles that better explain the math behind each of these statistics. The focus here is why you would choose to use them rather than Arithmetic Mean.

The Median is the central point in any population of data. It is equal to the calculated value of the 50th Percentile, and is the point where half of the population lies above and half below. So, in a large population of data, it can provide a good estimation of where the center or average performance value is, regardless of the outliers at either end of the scale.

Geometric Mean is, well, a nasty calculation that I prefer to allow programmatic functions to handle for me. The advantage that it has over the Arithmetic Mean is that it is influenced less by the outliers, producing a value that is always lower than or equal to the Arithmetic Mean. In the case of Web performance data, with populations of any size, the Geometric Mean is always lower than the Arithmetic Mean.

The 85th Percentile is the level below which 85% of the population of data lies. Now, some people use the 90th or the 95th, but I tend to cut Web sites more slack by granting them a pass on 15% of the measurement population.
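
All three aggregates are straightforward to compute outside a database. Here is a sketch on an invented sample using Python's statistics module (statistics.geometric_mean requires Python 3.8 or later), with a simple nearest-rank 85th Percentile:

import math
import statistics

# Invented response-time sample (seconds) with a heavy tail.
sample = sorted([0.9, 1.0, 0.95, 1.1, 0.88, 1.2, 1.05, 0.92, 1.3, 2.4, 0.97, 6.8])

mean = statistics.mean(sample)
median = statistics.median(sample)
gmean = statistics.geometric_mean(sample)           # always <= the Arithmetic Mean
p85 = sample[math.ceil(0.85 * len(sample)) - 1]     # nearest-rank 85th Percentile

print("mean=%.2f median=%.2f geometric=%.2f 85th=%.2f" % (mean, median, gmean, p85))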

So, what do these values look like?

[Figure: aggregated statistics for a sample measurement population]

These aggregated performance values are extracted from the same data population. Immediately, some things become clear. The Arithmetic Mean is higher than the Median and the Geometric Mean, by more than 0.1 seconds. The 85th Percentile is 1.19 seconds and indicates that 85% of all measurements in this data set are below this value.

Things that are bad to see:

  • An Arithmetic Mean that is substantially higher than the Geometric Mean and the Median
  • An 85th Percentile that is more than double the Geometric Mean

In these two cases, it indicates that there is a high number of large values in the measurement population, and that the site is exhibiting consistency issues, a topic for a later article in this series.

In all, these three metrics provide a good quick hit, a representative single number that you can present in a meeting to say how the site is performing. But they all suffer from the same flaw: you cannot represent an entire population with a single number.

The next article will discuss Frequency Distributions, and their value in the Web performance analysis field.

Web Performance, Part II: What are you calling average?

For a decade, the holy grail of Web performance has been a low average performance time. Every company wants to have the lowest time, in some kind of chest-thumping, testosterone-pumped battle for supremacy.

Well, I am here to tell you that the numbers you have been using for the last decade have been lying. Well, lying is perhaps too strong a term; deeply misleading is a more accurate way to describe the way that an average describes a population of results.

Now, before you call your Web performance monitoring and measurement firms and tear a strip off them, let's look at the facts. The numbers that everyone has been holding up as the gospel truth have been averages, or, more correctly, Arithmetic Means. We all learned these in elementary school: the sum of X values divided by X produces a value that approximates the average value for the entire population of X values.

Where could this go wrong in Web performance?

We wandered off course in a couple of fundamental ways. The first is based on the basic assumption of Arithmetic Mean calculations, that the population of data used is more or less Normally Distributed.

Well folks, Web performance data is not normally distributed. Some people are more stringent than I am, but my running assumption is that in a population of measurements, up to 15% are noise resulting from “stuff happens on the Internet”. This outer edge of noise, or outliers, can have a profound skewing effect on the Arithmetic Mean for that population.

“So what?”, most of you are saying. Here’s the kicker: As a result of this skew, the Arithmetic Mean usually produces a Web performance number that is higher than the real average of performance.
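
The effect is easy to demonstrate with invented numbers: a handful of slow outliers drags the Arithmetic Mean well above what most users actually experienced.

import statistics

# 95 'typical' one-second measurements, plus 5 noisy outliers.
sample = [1.0] * 95 + [15.0, 20.0, 25.0, 30.0, 40.0]

print("arithmetic mean:", statistics.mean(sample))    # 2.25 - looks like a slow site
print("median:", statistics.median(sample))           # 1.0  - what most users saw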

So why do we use it? Simple: Relational databases are really good at producing Arithmetic Means, and lousy at producing other statistical values. Short of writing your own complex function, which on most database systems equates to higher compute times, the only way to produce more accurate statistical measures is to extract the entire population of results and produce the result in external software.

If you are building an enterprise-class Web performance measurement reporting interface, and you want to calculate other statistical measures, you had better have deep pockets and a lot of spare computing cycles, because these multi-million row calculations will drain resources very quickly.

So, for most people, the Arithmetic Mean is the be all and end all of Web performance metrics. In the next part of this series, I will discuss how you can break free of this madness and produce values that are truer representations of average performance.

Web Performance, Part I: Fundamentals

If you ask 15 different people what the phrase Web performance means to them, you will get 30 different answers. Like all things in this technological age, the definition is in the eye of the beholder. To the Marketing person, it is delivering content to the correct audience in a manner that converts visitors into customers. To the business leader, it is the ability of a Web site to deliver on a certain revenue goal, while managing costs and creating shareholder/investor value.

For IT audiences, the mere mention of the phrase will spark a debate that would frighten the UN Security Council. Is it the Network? The Web server? The designers? The application? What is making the Web site slow?

So, what is Web performance? It is everything mentioned above, and more. Working in this industry for nine years, I have heard all facets of the debate. And all of the above positions will appear in every organization with a Web site to varying degrees.

In this ongoing series, I will examine various facets of Web performance, from the statistical measures used to truly analyze Web performance data, to the concepts that drive the evolution of a company from “Hey, we really need to know how fast our Web page loads” to “We need to accurately correlate the performance of our site to traffic volumes and revenue generation”.

Defining Web performance is much harder than it seems. Its simplest metrics are tied to the basic concepts of speed and success rate (availability). These concepts have been around a very long time, and are understood all the way up to the highest levels of any organization.

However, this very simple state is one that very few companies manage to evolve away from. It is the lowest common denominator in Web performance, and only provides a mere scraping of the data that is available within every company.
As a company evolves and matures in its view toward Web performance, the focus shifts away from the basic data, and begins to focus on the more abstract concepts of reliability and consistency. These force organizations to step away from the aggregated and simplistic approach of speed and availability, to a place where the user experience component of performance is factored into the equation.

After tackling consistency and reliability, the final step is toward performance optimization. This is a holistic approach to Web performance, a place where speed and availability data are only one component of an integrated whole. Companies at this level are usually generating their own performance dashboards that correlate disparate data sources in a way that provides a clear and concise view not only of the performance of their Web site, but also of the health of their entire online business.

During this series, I will refer to data and information very frequently. In today’s world, even after nearly a decade of using Web performance tools and services, most firms only rely on data. All that matters is that the measurements arrive.
The smartest companies move to the next level and take that data and turn it into information, ideas that can shape the way that they design their Web site, service their customers, and view themselves against the entire population of Internet businesses.

This series will not be a technical HOWTO on making your site faster. I cover a lot of that ground in another of my Web sites.

What this series will do is lead you through the minefield of Web performance ideas, so that when you are asked what you think Web performance is, you can present the person asking the question with a clear, concise answer.
The next article in this series will focus on Web performance measures: why and when you use them, and how to present them to a non-technical audience.
