Tag: web compression

HTTP Compression – Have you checked ALL your browsers?

Apache has been my web server of choice for more than a decade. It was one of the first things I learned to compile and manage properly on Linux, so I have a great affinity for it. However, a few gotchas are still out there that make me grateful I know my way around the httpd.conf file.

HTTP compression is something I have advocated for a long time (just Googled my name and compression – I wrote some of that stuff?) as just basic common sense.
Make Stuff Smaller. Go Faster. Use Less Bandwidth. Lower CDN Charges. [Ok, I can’t be sure of the last one.]

But browsers haven’t always played nice – at least up until about 2008. Since then, it’s pretty safe to say that even the most brain-damaged web and mobile browsers could handle pretty much any compressed content we threw at them.

Oh, Apache! But where were you? There is an old rule, still buried deep in many an httpd.conf file, that can shoot you in the foot. I actually caught it yesterday while looking at a site using IE8 and Firefox 8 measurement agents at work. Firefox downloaded about 570K while IE downloaded nearly 980K. It turned out that the server was not compressing CSS and JS files sent to IE due to this little gem:

 BrowserMatch \bMSIE !no-gzip gzip-only-text/html

This was in response to some issues with HTTP compression in IE 5 and early versions of IE 6 – remember them? – and was appropriate then. Guess what? If you still have this buried in your Apache configuration (or any web server or hardware device that does compression for you), break out the chisels: it’s likely your httpd.conf file hasn’t been touched since the stone age.

Take. It. Out. NOW!

Your site shouldn’t see traffic from any browsers that don’t support compression (unless they’re robots, and then, oh well!), so rules that might accidentally deny compression can only cause trouble. Turn the old security ACL rule around for HTTP compression:

Allow everything, then explicitly disable compression.

That should help prevent any accidents. Or higher bandwidth bills due to IE traffic.
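If you want to see what the turned-around rule looks like in practice, here is a minimal sketch using mod_deflate on Apache 2.x; the extension list is illustrative, not exhaustive:

 # Compress everything by default...
 SetOutputFilter DEFLATE
 # ...then explicitly opt out formats that are already compressed.
 SetEnvIfNoCase Request_URI \.(?:gif|jpe?g|png|zip|t?gz|bz2)$ no-gzip

No BrowserMatch exceptions at all: every client gets compressed text content, and only already-compressed formats are excluded.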

mod_gzip Compile Instructions

The last time I attempted to compile mod_gzip into Apache, I found that the instructions for doing so were not documented clearly on the project page. After a couple of failed attempts, I finally found the instructions buried at the end of the ChangeLog document.

I present the instructions here to preserve your sanity.

Before you can actually get mod_gzip to work, you have to uncomment it in the httpd.conf module list (Apache 1.3.x) or add it to the module list (Apache 2.0.x).
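For the DSO case, the relevant httpd.conf lines look something like the sketch below; treat the module paths as assumptions, since they vary by build and platform:

 # Apache 1.3.x: both lines must be present and uncommented
 LoadModule gzip_module libexec/mod_gzip.so
 AddModule mod_gzip.c

 # Apache 2.0.x: add the module to the LoadModule list
 LoadModule gzip_module modules/mod_gzip.so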


Now there are two ways to build mod_gzip: statically compiled into Apache, or as a DSO file for mod_so. To compile it statically into Apache, copy the source into a subdirectory named ‘gzip’ under Apache’s src/modules directory, then activate it via a parameter to the configure script.

 ./configure --activate-module=src/modules/gzip/mod_gzip.a
 make
 make install

This will build a new Apache with mod_gzip statically built in.

The DSO version is much easier to build.

 make APXS=/path/to/apxs
 make install APXS=/path/to/apxs
 /path/to/apachectl graceful

The apxs script is normally located inside the bin directory of Apache.
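Loading the module is not quite enough: mod_gzip also has to be switched on and told what to compress. A minimal configuration sketch using standard mod_gzip directives follows; the include/exclude patterns here are illustrative:

 mod_gzip_on Yes
 # Compress text MIME types and common static text files...
 mod_gzip_item_include mime ^text/.*
 mod_gzip_item_include file \.(html?|css|js)$
 # ...but never images, which are already compressed.
 mod_gzip_item_exclude mime ^image/.*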

Compressing Web Output Using mod_deflate and Apache 2.0.x


In a previous paper, I described the use of mod_gzip to dynamically compress the output from an Apache server. With the growing use of the Apache 2.0.x family of Web servers, the question arises of how to perform a similar GZIP-encoding function within this server. The developers of Apache 2.0.x have included a module in the server codebase to perform just this task.

mod_deflate is included in the Apache 2.0.x source package, and compiling it in is a simple matter of adding it to the configure command.

	./configure --enable-modules=all --enable-mods-shared=all --enable-deflate

When the server is made and installed, GZIP-encoding of documents can be enabled in one of two ways: explicit exclusion of files by extension, or explicit inclusion of files by MIME type. Both methods are specified in the httpd.conf file.


Explicit Exclusion

SetOutputFilter DEFLATE
DeflateFilterNote ratio
SetEnvIfNoCase Request_URI \.(?:gif|jpe?g|png)$ no-gzip dont-vary
SetEnvIfNoCase Request_URI \.(?:exe|t?gz|zip|bz2|sit|rar)$ no-gzip dont-vary
SetEnvIfNoCase Request_URI \.pdf$ no-gzip dont-vary

Explicit Inclusion

DeflateFilterNote ratio
AddOutputFilterByType DEFLATE text/*
AddOutputFilterByType DEFLATE application/ms* application/vnd* application/postscript

Both methods enable the automatic GZIP-encoding of documents as they leave the server while excluding image and PDF files. Images and PDFs are excluded because they are already in highly compressed formats; in fact, PDFs become unreadable by Adobe’s Acrobat Reader if they are further compressed by mod_deflate or mod_gzip.

On the server used for testing mod_deflate for this article, no Windows executables or compressed files are served to visitors. However, for safety’s sake, please ensure that compressed files and binaries are not GZIP-encoded by your Web server application.

For the file types indicated in the exclude statements, the server is told explicitly not to send the Vary header. The Vary header indicates to any proxy or cache server which particular condition(s) will cause this response to vary from other responses to the same request.

If a client sends a request without the Accept-Encoding: gzip header, the item stored in the cache cannot be returned to that client because the Accept-Encoding headers do not match. The request must then be passed directly to the origin server to obtain a non-encoded version. In effect, proxy servers may store two or more copies of the same file, depending on the client request conditions that cause the server response to vary.

Removing the Vary response requirement for objects not handled means that if the objects do not vary due to any other directives on the server (browser type, for example), then the cached object can be served up without any additional requests until the Time-To-Live (TTL) of the cached object has expired.
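This is easy to observe from the command line. A quick curl check shows both headers on a compressed response (example.com is a placeholder host):

 curl -s -D - -o /dev/null -H 'Accept-Encoding: gzip' http://example.com/page.html

A compressing server should answer with something like:

 Content-Encoding: gzip
 Vary: Accept-Encoding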

In examining the performance of mod_deflate against mod_gzip, the one item that distinguished the two modules in versions of Apache prior to 2.0.45 was the amount of compression that occurred. The examples below demonstrate that the compression algorithm for mod_gzip produces between 4% and 6% more compression than mod_deflate for the same file.[1]

Table 1 – /compress/homepage2.html

Compression                Size           % of original
No compression             56380 bytes    n/a
Apache 1.3.x/mod_gzip      16333 bytes    29%
Apache 2.0.x/mod_deflate   19898 bytes    35%

Table 2 – /documents/spierzchala-resume.ps

Compression                Size           % of original
No compression             63451 bytes    n/a
Apache 1.3.x/mod_gzip      19758 bytes    31%
Apache 2.0.x/mod_deflate   23407 bytes    37%

Attempts to increase the compression ratio of mod_deflate in Apache 2.0.44 and lower using the directives provided for this module produced no further decrease in transferred file size. A comment from one of the authors of the mod_deflate module stated that the module was written specifically to ensure that server performance was not degraded by using this compression method: by default, it performed the fastest compression possible, rather than a mid-range compromise between speed and final file size.

Starting with Apache 2.0.45, the compression level of mod_deflate is configurable using the DeflateCompressionLevel directive. This directive accepts values between 1 (fastest compression speed; lowest compression ratio) and 9 (slowest compression speed; highest compression ratio), with the default value being 6. This simple change makes the compression in mod_deflate comparable to mod_gzip out of the box.
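For example, to trade some CPU time for output sizes closer to what mod_gzip produces:

 # httpd.conf (Apache 2.0.45 or later)
 DeflateCompressionLevel 9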

Using mod_deflate for Apache 2.0.x is a quick and effective way to decrease the size of the files that are sent to clients. Anything that can produce between 50% and 80% in bandwidth savings with so little effort should definitely be considered for any and all Apache 2.0.x deployments wishing to use the default Apache codebase.


[1] A note on the compression in mod_deflate for Apache 2.0.44 and lower: the level of compression can be modified by changing the ZLIB compression setting in mod_deflate.c from Z_BEST_SPEED (equivalent to “gzip -1”) to Z_BEST_COMPRESSION (equivalent to “gzip -9”). These defaults can also be replaced with a numeric value between 1 and 9.

More info on hacking mod_deflate for Apache 2.0.44 and lower can be found here.

HTTP Compression and Web 2.0

HTTP Compression is a well-acknowledged way to improve Web performance and decrease bandwidth usage by compressing text content before transmitting it to the client. This has become an increasingly interesting topic for Web 2.0 sites starting to experience their first growth pains.

Company          Compression
Technorati       NO
Flickr           NO
Wikipedia        YES
Blogger          YES
Feedster         NO
Bloglines        YES
Gizmodo          YES
TypePad          YES
Weblogs Inc.     NO
Scripting News   NO
Memeorandum      NO

The sample above is far from representative. However, I would have thought that the companies leading the Web 2.0 revolution would have learned the Web performance lessons of the previous generation.

All data was gathered using the GrabPERF Monitoring System
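Checking any site yourself is a one-liner; if the response carries a Content-Encoding: gzip header, compression is enabled (example.com is a placeholder):

 curl -s -D - -o /dev/null -H 'Accept-Encoding: gzip' http://example.com/ | grep -i content-encoding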


Performance Improvement From Caching and Compression

This paper is an extension of the work done for another article that highlighted the performance benefits of retrieving uncompressed and compressed objects directly from the origin server. I wanted to add a proxy server into the stream and determine if proxy servers helped improve the performance of object downloads, and by how much.

Using the same series of objects as the original compression article[1], the curl tests were re-run three times:

  1. Directly from the origin server
  2. Through the proxy server, to load the files into cache
  3. Through the proxy server, to avoid retrieving files from the origin.[2]

This series of three tests was repeated twice: once for the uncompressed files, and once for the compressed objects.[3]
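As a rough sketch, one pass of these tests with curl looked something like the following; the proxy host, port, and file name are assumptions, and the compressed pass added an Accept-Encoding: gzip request header:

 # 1. Directly from the origin server
 curl -s -o /dev/null -w '%{time_total}\n' http://origin.example.com/doc.html
 # 2. Through the proxy server, loading the file into cache
 curl -s -o /dev/null -w '%{time_total}\n' -x proxy.example.com:3128 http://origin.example.com/doc.html
 # 3. Through the proxy server again, drawing the file from cache
 curl -s -o /dev/null -w '%{time_total}\n' -x proxy.example.com:3128 http://origin.example.com/doc.html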
As can be seen clearly in the data below, compression caused web page download times to improve greatly when the objects were retrieved from the source. However, the performance difference between compressed and uncompressed data all but disappears when objects are retrieved from a proxy server on a corporate LAN.

Instead of the linear growth between object size and download time seen in both of the retrieval tests that used the origin server (Source and Proxy Load data), the Proxy Draw data clearly shows the benefits that accrue when a proxy server is added to a network to assist with serving HTTP traffic.

Uncompressed Pages

Scenario                                 Mean download time (s)
Total Time Uncompressed — No Proxy       0.256
Total Time Uncompressed — Proxy Load     0.254
Total Time Uncompressed — Proxy Draw     0.110

Compressed Pages

Scenario                                 Mean download time (s)
Total Time Compressed — No Proxy         0.181
Total Time Compressed — Proxy Load       0.140
Total Time Compressed — Proxy Draw       0.104

The data above shows just how much improvement a local proxy server, explicit caching directives, and compression can add to a Web site. For sites that force a great number of requests to be returned directly to the origin server, compression will be of great help in reducing bandwidth costs and improving performance. However, by allowing pages to be cached in local proxy servers, the difference between compressed and uncompressed pages vanishes.

Conclusion

Compression is a very good start when attempting to optimize performance. Adding explicit caching headers to server responses, which allow proxy servers to serve cached data to clients on their local LANs, can improve performance to an even greater extent than compression. The two should be used together to improve the overall performance of Web sites.
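As a concrete illustration, the Cache-Control: max-age=3600 response header used in these tests (see note [2]) can be produced in Apache with mod_expires; a minimal sketch:

 # httpd.conf; requires mod_expires
 ExpiresActive On
 # "access plus 1 hour" yields Cache-Control: max-age=3600
 ExpiresDefault "access plus 1 hour"

Combined with the compression configuration above, this lets proxy servers answer repeat requests locally while first-time requests still travel compressed.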


[1] The test set was made up of the 1952 HTML files located in the top directory of the Linux Documentation Project HTML archive.
[2] All of the pages in these tests announced the following server response header indicating their cacheability: Cache-Control: max-age=3600
[3] A note on the compressed files: all compression was performed dynamically by mod_gzip for Apache/1.3.27.
