How much improvement can compression deliver? Measured download times on a very lightly loaded server show that the time to download the Base Page (the initial HTML file) improved by 1.3 to 1.6 seconds over a very slow connection when compression was used.

Base Page Performance

The server does take slightly longer to respond to a client requesting a compressed page. Measurements show that the server's median response time was 0.23 seconds for the uncompressed page and 0.27 seconds for the compressed page. However, most Web server administrators should be willing to accept a 0.04-second increase in response time to achieve a 1.5-second improvement in file transfer time.

First Byte Performance

Web pages are not made up of HTML alone, so how do improved HTML (and CSS) download times affect overall performance? The graph below shows that overall download times for the test page were 1 to 1.5 seconds faster when the HTML files were compressed.

Total Page Performance

To further emphasize the value of compression, I ran a test on a Web server to see what the average compression ratio would be when requesting a very large number of files. I also wanted to determine what effect requesting large numbers of compressed files simultaneously would have on server response time.

There were 1952 HTML files in the test directory, and I checked the results using cURL across my local LAN.[1]
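
Footnote [1] describes the setup; as a rough illustration of how such a test can be scripted, here is a minimal Python sketch that requests each file twice, once without and once with an Accept-Encoding: gzip header, and records the time to first byte, the total time, and the bytes actually delivered. The base URL and file names are placeholders, urllib does not decompress the gzip response (so the byte count reflects what crossed the wire), and the timings only approximate what cURL reports.

    import statistics
    import time
    import urllib.request

    # Hypothetical test setup: a local Apache instance serving the HOWTO files
    # under /ldp/ and a short list of file names -- both are placeholders.
    BASE_URL = "http://localhost/ldp/"
    FILES = ["Installation-HOWTO.html", "SCSI-Programming-HOWTO.html"]

    def fetch(url, compressed):
        """Request one file; return (first-byte time, total time, bytes received)."""
        headers = {"Accept-Encoding": "gzip"} if compressed else {}
        req = urllib.request.Request(url, headers=headers)
        start = time.perf_counter()
        with urllib.request.urlopen(req) as resp:
            first = resp.read(1)                    # rough time-to-first-byte marker
            t_first = time.perf_counter() - start
            body = first + resp.read()              # drain the rest of the response
            t_total = time.perf_counter() - start
        return t_first, t_total, len(body)

    ratios = []
    for name in FILES:
        _, _, plain_bytes = fetch(BASE_URL + name, compressed=False)
        _, _, gzip_bytes = fetch(BASE_URL + name, compressed=True)
        ratios.append(gzip_bytes / plain_bytes)

    print("mean compression:  ", statistics.mean(ratios))
    print("median compression:", statistics.median(ratios))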

Large sample of File Requests (1952 HTML Files)

mod_gzip

                        Uncompressed    Compressed
First Byte (seconds)
    Mean                       0.091         0.084
    Median                     0.030         0.036
Total Time (seconds)
    Mean                       0.280         0.128
    Median                     0.173         0.079
Bytes per Page
    Mean                        6349          2416
    Median                      3750          1543
Total Bytes                 12392318       4716160

mod_deflate[2]

                        Uncompressed    Compressed
First Byte (seconds)
    Mean                       0.044         0.046
    Median                     0.028         0.031
Total Time (seconds)
    Mean                       0.241         0.107
    Median                     0.169         0.050
Bytes per Page
    Mean                        6349          2418
    Median                      3750          1544
Total Bytes                 12392318       4720735

                        mod_gzip    mod_deflate
Average Compression        0.433          0.438
Median Compression         0.427          0.427

As expected, the First Byte download time was slightly higher with the compressed files than with the uncompressed files. But the difference was measured in milliseconds and is hardly worth mentioning as a cost of on-the-fly compression. It is unlikely that any user, especially a dial-up user, would notice this difference in performance.

That the delivered data shrank to 43% of the original file size should make any Web administrator sit up and take notice. The compression ratio for the test files ranged from no compression for files smaller than 300 bytes to 15% of the original file size for two of the Linux SCSI Programming HOWTOs.

Compression ratios do not scale linearly with file size; rather, compression depends heavily on the repetition of content within a file to achieve its greatest gains. The SCSI Programming HOWTOs contain a great deal of repeated characters, making them ideal candidates for extreme compression.
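
To see how strongly repetition drives this, a small experiment with zlib (the same DEFLATE algorithm used by both mod_gzip and mod_deflate) is enough. The sample content below is made up; it is only meant to contrast repetitive markup with incompressible random bytes of the same length.

    import os
    import zlib

    def ratio(data: bytes, level: int = 6) -> float:
        """Compressed size as a fraction of the original, matching the tables above."""
        return len(zlib.compress(data, level)) / len(data)

    # Highly repetitive content, loosely mimicking a table-heavy HOWTO page.
    repetitive = b"<tr><td>0x00</td><td>status OK</td></tr>\n" * 2000

    # Content of the same length with no repetition to exploit.
    random_bytes = os.urandom(len(repetitive))

    print("repetitive:", round(ratio(repetitive), 3))    # typically well under 0.1
    print("random:    ", round(ratio(random_bytes), 3))  # about 1.0, i.e. no gain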

For the same reason, smaller files did not compress as well as larger files: fewer bytes means a lower probability of repeated sequences, and therefore less compression.

Average Compression by File Size

File Size (bytes)    mod_gzip    mod_deflate
0-999                   0.713       0.777[3]
1000-4999               0.440       0.440
5000-9999               0.389       0.389
10000-19999             0.369       0.369
20000-49999             0.350       0.350
50000 and up            0.329       0.331

The data shows that compression pays off most clearly once files pass 5000 bytes; above that size, the average compression improves only gradually, unless a file has a large number of repeated characters. Some people argue that compressing files below a certain size is a waste of CPU cycles. If you agree with them, using 5000 bytes as the floor for compressing files should be a good starting point. I am of the opposite mindset: I compress everything that comes off my servers, because I consider myself an HTTP overclocker, trying to squeeze every last bit of download performance out of the network.
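
For reference, here is a rough sketch of how averages like those in the table can be reproduced against a local copy of the documents. The directory path is a placeholder, and compressing the files directly with zlib at level 6 only approximates what the modules do over HTTP, so the exact figures will differ. Skipping files under a threshold such as 5000 bytes is a one-line change if you side with the CPU-cycle savers.

    import statistics
    import zlib
    from pathlib import Path

    # Hypothetical location of a local copy of the test documents.
    DOC_ROOT = Path("/var/www/ldp")

    # Size buckets matching the table above (sizes in bytes).
    BUCKETS = [(0, 999), (1000, 4999), (5000, 9999),
               (10000, 19999), (20000, 49999), (50000, None)]

    per_bucket = {b: [] for b in BUCKETS}
    for path in DOC_ROOT.glob("*.html"):
        data = path.read_bytes()
        if not data:
            continue  # avoid dividing by zero on empty files
        r = len(zlib.compress(data, 6)) / len(data)
        for lo, hi in BUCKETS:
            if len(data) >= lo and (hi is None or len(data) <= hi):
                per_bucket[(lo, hi)].append(r)
                break

    for (lo, hi), ratios in per_bucket.items():
        label = f"{lo}-{hi}" if hi is not None else f"{lo} and up"
        if ratios:
            print(f"{label:>12}  {statistics.mean(ratios):.3f}  ({len(ratios)} files)")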

Conclusion

With a few simple commands and a little configuration, an Apache Web server can be set up to deliver a large amount of content in a compressed format. These benefits are not limited to static pages; dynamic pages generated by PHP and other dynamic content generators can also be compressed by using the Apache compression modules. When combined with other performance-tuning mechanisms and appropriate server-side caching rules, these modules can substantially reduce bandwidth usage at a very low cost.


[1] The files were the top level HTML files from the Linux Documentation Project. They were installed on an Apache 1.3.27 server running mod_gzip and an Apache 2.0.44 server using mod_deflate. Minimum file size was 80 bytes and maximum file size was 99419 bytes.

[2] mod_deflate for Apache/2.0.44 and earlier ships with the compression level set for Best Speed, not Best Compression. This configuration can be modified using the tips found here; and starting with Apache/2.0.45, a configuration directive will allow admins to set the compression level they want.

In this example, the compression level was set to 6.

[3] mod_deflate does not have a lower bound for file size, so it attempts to compress files that are too small to benefit from compression. This results in files smaller than approximately 120 bytes becoming larger when processed by mod_deflate.