
Aren't tracer rounds illegal?

So, after 6 years of controlling and managing my own Web server, I have handed responsibility over to 1 & 1. I wish I could say that there was a really good reason why I’ve done this, but frankly, it’s because I don’t need a lot of oooommmmph for my personal domains (they run happily on a low-end Pentium II Celeron), and the price was right.
GrabPERF is still happily hosted by the folks at Technorati, while WordPress.com controls my blog.
In some ways, I am glad that someone else has these headaches now.

Performance Improvement From Compression

How much improvement can you see with compression? Measurements against a very lightly loaded server show that the time to download the Base Page (the initial HTML file) improved by 1.3 to 1.6 seconds across a very slow connection when compression was used.

Base Page Performance

The server takes slightly longer to respond when a client requests a compressed page. Measurements show that the median server response time averaged 0.23 seconds for the uncompressed page and 0.27 seconds for the compressed page. However, most Web server administrators should be willing to accept a 0.04-second increase in response time in exchange for a 1.5-second improvement in file transfer time.

First Byte Performance

Web pages are not made up of HTML alone. So how do improved HTML (and CSS) download times affect overall performance? The graph below shows that overall download times for the test page were 1 to 1.5 seconds faster when the HTML files were compressed.

Total Page Performance

To further emphasize the value of compression, I ran a test on a Web server to see what the average compression ratio would be when requesting a very large number of files. I also wanted to determine what the effect on server response time would be when requesting large numbers of compressed files simultaneously.

There were 1952 HTML files in the test directory, and I checked the results using cURL across my LAN.[1]
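
The exact harness I used is not reproduced here, but a minimal sketch of how such a run can be driven with cURL follows. The host name, the urls.txt file (one path per line), and the output file are placeholders for illustration, not my original setup:

# request each file twice -- once plain, once asking for gzip -- and log the timings
while read path; do
  curl -s -o /dev/null \
       -w "plain\t${path}\t%{time_starttransfer}\t%{time_total}\t%{size_download}\n" \
       "http://testserver.example.com${path}"
  curl -s -o /dev/null --compressed \
       -w "gzip\t${path}\t%{time_starttransfer}\t%{time_total}\t%{size_download}\n" \
       "http://testserver.example.com${path}"
done < urls.txt >> results.tsv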


 

Large sample of File Requests (1952 HTML Files)

mod_gzip

                      Uncompressed    Compressed
First Byte (sec)
  Mean                       0.091         0.084
  Median                     0.030         0.036
Total Time (sec)
  Mean                       0.280         0.128
  Median                     0.173         0.079
Bytes per Page
  Mean                        6349          2416
  Median                      3750          1543
Total Bytes             12392318       4716160

mod_deflate[2]

                      Uncompressed    Compressed
First Byte (sec)
  Mean                       0.044         0.046
  Median                     0.028         0.031
Total Time (sec)
  Mean                       0.241         0.107
  Median                     0.169         0.050
Bytes per Page
  Mean                        6349          2418
  Median                      3750          1544
Total Bytes             12392318       4720735

                      mod_gzip    mod_deflate
Average Compression      0.433          0.438
Median Compression       0.427          0.427

As expected, the First Byte download time was slightly higher for the compressed files than for the uncompressed files. But the difference was measured in milliseconds, and is hardly worth mentioning given the benefits of on-the-fly compression. It is unlikely that any user, especially dial-up users, would notice this difference in performance.

The fact that the delivered data was compressed to 43% of the original file size should make any Web administrator sit up and take notice. The compression ratio for the test files ranged from no compression for files that were less than 300 bytes, to 15% of the original file size for two of the Linux SCSI Programming HOWTOs.

Compression ratios do not increase in a linear fashion when compared to file size; rather, compression depends heavily on the repetition of content within a file to gain its greatest successes. The SCSI Programming HOWTOs have a great deal of repeated characters, making them ideal candidates for extreme compression.

Smaller files also did not compress as well as larger files, for exactly this reason: fewer bytes means a lower probability of repeated byte sequences, and therefore a lower compression ratio.
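
A quick way to see how strongly repetition drives the ratio is to compress a highly repetitive input and an incompressible one and compare the results. This is just a command-line illustration with gzip, not part of the test above:

# 100 KB of identical bytes shrinks to almost nothing...
head -c 100000 /dev/zero | gzip -9 | wc -c
# ...while 100 KB of random data barely shrinks at all
head -c 100000 /dev/urandom | gzip -9 | wc -c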


 

Average Compression by File Size

File Size (bytes)    mod_gzip    mod_deflate
0-999                   0.713       0.777[3]
1000-4999               0.440       0.440
5000-9999               0.389       0.389
10000-19999             0.369       0.369
20000-49999             0.350       0.350
50000 and up            0.329       0.331

The data shows that compression works best on files larger than 5000 bytes; above that size, additional average compression gains are smaller, unless a file has a large number of repeated characters. Some people argue that compressing files below a certain size is a waste of CPU cycles. If you agree with them, using 5000 bytes as the floor for compressing files should be a good starting point. I am of the opposite mindset: I compress everything that comes off my servers, because I consider myself an HTTP overclocker, trying to squeeze every last bit of download performance out of the network.
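
If you want to see where your own content falls before choosing a floor, you can estimate the per-file gzip ratio offline. This is a rough sketch that assumes GNU coreutils and gzip, and that static gzip output approximates what mod_gzip or mod_deflate would produce:

# print file name, original size, gzipped size, and compression ratio
for f in *.html; do
  orig=$(wc -c < "$f")
  comp=$(gzip -9 -c "$f" | wc -c)
  awk -v f="$f" -v o="$orig" -v c="$comp" 'BEGIN { printf "%s\t%d\t%d\t%.3f\n", f, o, c, c/o }'
done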

Conclusion

With a few simple commands and a little bit of configuration, an Apache Web server can be configured to deliver a large amount of content in a compressed format. These benefits are not limited to static pages; dynamic pages generated by PHP and other dynamic content generators can be compressed using the Apache compression modules. When combined with other performance-tuning mechanisms and appropriate server-side caching rules, these modules can substantially reduce bandwidth usage at a very low cost.


[1] The files were the top level HTML files from the Linux Documentation Project. They were installed on an Apache 1.3.27 server running mod_gzip and an Apache 2.0.44 server using mod_deflate. Minimum file size was 80 bytes and maximum file size was 99419 bytes.

[2] mod_deflate for Apache/2.0.44 and earlier comes with the compression ratio set for Best Speed, not Best Compression. This configuration can be modified using the tips found here; and starting with Apache/2.0.45, there will be a configuration directive that will allow admins to configure the compression ratio that they want.

In this example, the compression ratio was set to Level 6.

[3] mod_deflate does not have a lower bound for file size, so it attempts to compress files that are too small to benefit from compression. This results in files smaller than approximately 120 bytes becoming larger when processed by mod_deflate.

Baseline Testing With cURL

cURL is an application that can be used to retrieve any Internet file that uses the standard URL format — http://, ftp://, gopher://, etc. Its power and flexibility can be added to applications by using the libcurl library, whose API can be accessed easily using most of the commonly used scripting and programming languages.

So, how does cURL differ from other command-line URL retrieval tools such as WGET? Both do very similar things, and both can be coaxed into retrieving large lists of files or even mirroring entire Web sites. In fact, for the automated retrieval of single files from the Internet for storage on a local filesystem — such as downloading source files onto servers for building applications — WGET’s syntax is the simplest to use.

However, for simple baseline testing, WGET lacks cURL’s ability to produce timing results that can be written to an output file in a user-configurable format. cURL gathers a large amount of data about a transfer that can then be used for analysis or logging purposes. This makes it a step ahead of WGET for baseline testing.

cURL Installation

For the purposes of our testing, we have used cURL 7.10.5-pre2 as it adds support for downloading and interpreting GZIP-encoded content from Web servers. Because it is a pre-release version, it is currently only available as source for compiling. The compilation was smooth, and straight-forward.

$ ./configure --with-ssl --with-zlib
$ make
$ make test
[...runs about 120 checks to ensure the application and library will work as expected...]
# make install

The application installed in /usr/local/bin on my RedHat 9.0 laptop.
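
Before running any tests, it is worth confirming that zlib and SSL support actually made it into the build; the Features line of the version output should mention libz:

$ curl --version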
Testing cURL is straight-forward as well.

$ curl http://slashdot.org/
[...many lines of streaming HTML omitted...]

Variations on this standard theme include:

  • Send output to a file instead of STDOUT
	$ curl -o ~/slashdot.txt http://slashdot.org/
  • Request compressed content if the Web server supports it
	$ curl --compressed http://slashdot.org/
  • Provide total byte count for downloaded HTML
	$ curl -w %{size_download} http://slashdot.org/
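
These options can be combined. For example, the following one-liner prints the response code, the compressed byte count, and the total transfer time for the same test URL:

$ curl -s -o /dev/null --compressed -w "%{http_code}\t%{size_download}\t%{time_total}\n" http://slashdot.org/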

Baseline Testing with cURL

With the application installed, you can now begin to design a baseline test. This methodology is NOT a replacement for true load testing; rather, it gives small and medium-sized businesses a sense of how well their server will perform before it is deployed into production, and it provides a baseline for future tests. That baseline can then be used to compare performance after configuration changes in the server environment, such as caching rule changes or the addition of solutions designed to accelerate Web performance.

To begin, a list of URLs needs to be drawn up and agreed to as a baseline for the testing. For my purposes, I use the files from the Linux Documentation Project, intermingled with a number of images. This provides the test with a variety of file sizes and file types. You could construct your own file-set out of any combination of documents, files, and images you wish. However, the file-set should be large — mine runs to 2134 files.

Once the file-set has been determined, it should be archived so that this same group can be used for future performance tests; burning it to a CD is always a safe bet.

Next, extract the filenames to a text file so that the configuration file for the tests can be constructed. I have done this for my tests, and have it set up in a generic format so that when I construct the configuration for the next test, I simply have to change/update the URL to reflect the new target.
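
Turning the archived file-set into those url="..." lines is easy to script. A sketch along these lines does the job; the document root is a placeholder and the -printf option assumes GNU find:

# build a cURL config fragment from the local copy of the file-set
find /var/www/testfiles -maxdepth 1 -type f -printf '%f\n' | sort |
  sed 's|^|url="http://www.foobar.com/|; s|$|"|' > url_list.txt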

The configuration of the rest of the parameters should be added to the configuration file at this point. These are all the same as the command line versions, except for the URL listing format.

  • Listing of test_config.txt
-A "Mozilla/4.0 (compatible; cURL 7.10.5-pre2; Linux 2.4.20)"
-L
-w @logformat.txt
-D headers.txt
-H "Pragma: no-cache"
-H "Cache-control: no-cache"
-H "Connection: close"
url="http://www.foobar.com/1.html"
url="http://www.foobar.com/2.png"
[...file listing...]

In the above example, I have set cURL to:

  • Use a custom User-Agent string
  • Follow any re-direction responses that contain a “Location:” response header
  • Dump the server response headers to headers.txt
  • Circumvent cached responses by sending the two main “no-cache” request headers
  • Close the TCP connection after each object is downloaded, overriding cURL’s default use of persistent connections
  • Format the timing and log output using the format that is described in logformat.txt

Another command-line option that I use a lot is --compressed, which, as of cURL 7.10.5, handles both the deflate and gzip encodings of Web content, including decompression on the fly. This is great for comparing the performance improvements and bandwidth savings of compression solutions against a baseline test without compression. Network administrators may also be interested in testing the improvement they get from proxy servers and client-side caches by inserting --proxy <proxy[:port]> into the configuration, removing the “no-cache” headers, and testing a list of popular URLs through their proxy servers.
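
In practice this means the same configuration file can drive both runs; only the extra switch changes (the output file names here are arbitrary):

$ curl -K test_config.txt >> output_uncompressed.txt
$ curl --compressed -K test_config.txt >> output_compressed.txt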

The logformat.txt file describes the variables that I find of interest and that I want to use for my analysis.

  • Listing of logformat.txt
\n
%{url_effective}\t%{http_code}\t%{content_type}\t%{time_total}\t%{time_lookup}\t%{time_connect}\t%{time_starttransfer}\t%{size_download}\n
\n

These variables are defined as:

  • url_effective: URL used to make the final request, especially when following re-directions
  • http_code: HTTP code returned by the server when delivering the final HTML page requested
  • content_type: MIME type returned in the final HTML request
  • time_total: Total time for the transfer to complete
  • time_lookup: Time from start of transfer until DNS Lookup complete
  • time_connect: Time from start of transfer until TCP connection complete
  • time_starttransfer: Time from start of transfer until data begins to be returned from the server
  • size_download: Total number of bytes transferred, excluding headers

As time_connect and time_starttransfer are cumulative from the beginning of the transfer, you have to do some math to come up with the actual values.

TCP Connection Time = time_connect - time_lookup
Time First Byte = time_starttransfer - time_connect
Redirection Time = time_total - time_starttransfer
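
Given the tab-delimited fields defined in logformat.txt (URL, response code, content type, total, lookup, connect, start-transfer, bytes), the subtraction is easy to handle during post-processing. A small awk example, working purely from the field positions above:

$ awk -F'\t' '{ printf "%s\tdns=%.3f\tconnect=%.3f\tfirstbyte=%.3f\n", $1, $5, $6-$5, $7-$6 }' output_processed_1.txt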

If you are familiar with cURL, you may wonder why I have chosen not to write the output to a file using the -o <file> option. It appears that this option only records the output for the first file requested, even in a large list of files. I prefer to use the following command to start the test and then post-process the results using grep.

$ curl -K test_config.txt >> output_raw_1.txt
[...lines and lines of output...]
$ grep -i -r "^http://www.foobar.com/.*$" output_raw_1.txt >> output_processed_1.txt

And voila! You now have a tab delimited file you can drop into your favorite spreadsheet program to generate the necessary statistics.
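
For quick checks you can skip the spreadsheet entirely and summarize the same file on the command line; for example, the mean of the total-time column:

$ awk -F'\t' '{ sum += $4; n++ } END { if (n) printf "requests: %d\tmean total time: %.3f s\n", n, sum/n }' output_processed_1.txt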

Hacking mod_deflate for Apache 2.0.44 and lower

NOTE: This hack is only relevant to Apache 2.0.44 or lower. Starting with Apache 2.0.45, the server contains the DeflateCompressionLevel directive, which allows for user-configured compression levels in the httpd.conf file.

One of the complaints leveled against mod_deflate for Apache 2.0.44 and below has been the lower compression ratio that it produces when compared to mod_gzip for Apache 1.3.x and 2.0.x. This issue has been traced to a decision made by the author of mod_deflate to focus on fast compression versus maximum compression.

In discussions with the author of mod_deflate and the maintainer of mod_gzip, the location of the issue was quickly found. The level of compression can be easily modified by changing the ZLIB compression setting in mod_deflate.c from Z_BEST_SPEED (equivalent to “zip -1”) to Z_BEST_COMPRESSION (equivalent to “zip -9”). These defaults can also be replaced with a numeric value between 1 and 9. A “hacked” version of the mod_deflate.c code is available here.

In this file, the compression level has been set to 6, which is regarded as a good balance between speed and compression (and also happens to be ZLIB’s default ratio). Some other variations are highlighted below.


Original Code

zRC = deflateInit2(&ctx->stream, Z_BEST_SPEED, Z_DEFLATED, c->windowSize, c->memlevel, Z_DEFAULT_STRATEGY);

Hacked Code

1. zRC = deflateInit2(&ctx->stream, Z_BEST_COMPRESSION, Z_DEFLATED, c->windowSize, c->memlevel, Z_DEFAULT_STRATEGY);
2. zRC = deflateInit2(&ctx->stream, 6, Z_DEFLATED, c->windowSize, c->memlevel, Z_DEFAULT_STRATEGY);
3. zRC = deflateInit2(&ctx->stream, 9, Z_DEFLATED, c->windowSize, c->memlevel, Z_DEFAULT_STRATEGY);
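
After editing mod_deflate.c, the module needs to be rebuilt and the server restarted. With the DSO layout used elsewhere in this article, something along these lines should do it; depending on your build you may also need to pass extra linker flags for zlib to apxs:

$ /usr/local/apache2/bin/apxs -c -i mod_deflate.c
$ /usr/local/apache2/bin/apachectl restart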

A change has been made to mod_deflate in Apache 2.0.45 that adds a directive named DeflateCompressionLevel to the mod_deflate options. This will accept a numeric value between 1 (Best Speed) and 9 (Best Compression), with the default set at 6.

Compressing Web Output Using mod_gzip for Apache 1.3.x and 2.0.x

Web page compression is not a new technology, but it has just recently gained higher recognition in the minds of IT administrators and managers because of the rapid ROI it generates. Compression extensions exist for most of the major Web server platforms, but in this article I will focus on the Apache and mod_gzip solution.
The idea behind GZIP-encoding documents is very straightforward: take a file that is to be transmitted to a Web client, and send a compressed version of the data rather than the raw file as it exists on the filesystem. Depending on the file, the compressed version can run anywhere from 20% to 50% of the original file size.

In Apache, this can be achieved using a couple of different methods. Content Negotiation, which requires that two separate sets of HTML files be generated — one for clients that can handle GZIP-encoding, and one for those who can’t — is one method. The problem with this solution should be readily apparent: there is no provision in this methodology for GZIP-encoding dynamically-generated pages.
The more graceful solution for administrators who want to add GZIP-encoding to Apache is the use of mod_gzip. I consider it one of the overlooked gems for designing a high-performance Web server. Using this module, configured file types — based on file extension or MIME type — will be compressed using GZIP-encoding after they have been processed by all of Apache’s other modules, and before they are sent to the client. The compressed data that is generated reduces the number of bytes transferred to the client, without any loss in the structure or content of the original, uncompressed document.

mod_gzip can be compiled into Apache as either a static or dynamic module; I have chosen to compile it as a dynamic module in my own server. The advantage of using mod_gzip is that this method requires that nothing be done on the client side to make it work. All current browsers — Mozilla, Opera, and even Internet Explorer — understand and can process GZIP-encoded text content.

On the server side, all the server or site administrator has to do is compile the module, edit the appropriate configuration directives that were added to the httpd.conf file, enable the module in the httpd.conf file, and restart the server. In less than 10 minutes, you can be serving static and dynamic content using GZIP-encoding without the need to maintain multiple codebases for clients that can or cannot accept GZIP-encoded documents.

When a request is received from a client, Apache determines if mod_gzip should be invoked by noting if the “Accept-Encoding: gzip” HTTP request header has been sent by the client. If the client sends the header, mod_gzip will automatically compress the output of all configured file types when sending them to the client.
This client header announces to Apache that the client will understand files that have been GZIP-encoded. mod_gzip then processes the outgoing content and includes the following server response headers.

	Content-Type: text/html
	Content-Encoding: gzip

These server response headers announce that the content returned from the server is GZIP-encoded, but that when the content is expanded by the client application, it should be treated as a standard HTML file. Not only is this successful for static HTML files, but this can be applied to pages that contain dynamic elements, such as those produced by Server-Side Includes (SSI), PHP, and other dynamic page generation methods. You can also use it to compress your Cascading Stylesheets (CSS) and plain text files. As well, a whole range of application file types can be compressed and sent to clients. My httpd.conf file sets the following configuration for the file types handled by mod_gzip:

	mod_gzip_item_include mime ^text/.*
	mod_gzip_item_include mime ^application/postscript$
	mod_gzip_item_include mime ^application/ms.*$
	mod_gzip_item_include mime ^application/vnd.*$
	mod_gzip_item_exclude mime ^application/x-javascript$
	mod_gzip_item_exclude mime ^image/.*$

This allows Microsoft Office and Postscript files to be GZIP-encoded, while not affecting PDF files. PDF files should not be GZIP-encoded, as they are already compressed in their native format, and compressing them leads to issues when attempting to display the files in Adobe Acrobat Reader.[1] For the paranoid system administrator, you may want to explicitly exclude PDF files.

	mod_gzip_item_exclude mime ^application/pdf$

Another side-note is that nothing needs to be done to enable GZIP-encoding of OpenOffice (and presumably StarOffice) documents. Their MIME type is already set to text/plain, so they are covered by one of the default rules.
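
A quick way to confirm which responses are actually being compressed is to request them with cURL while advertising gzip support and watch the response headers; compressed responses will carry Content-Encoding: gzip, while excluded types will not. The host and paths below are placeholders:

$ curl -s -o /dev/null -D - --compressed http://www.foobar.com/index.html
$ curl -s -o /dev/null -D - --compressed http://www.foobar.com/document.pdf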

How beneficial is sending GZIP-encoded content? In some simple tests I ran against my Web server using WGET, even a small site showed the potential for substantial savings in bandwidth usage.

http://www.pierzchala.com/bio.html
	Uncompressed File Size: 3122 bytes
	Compressed File Size: 1578 bytes

http://www.pierzchala.com/compress/homepage2.html
	Uncompressed File Size: 56279 bytes
	Compressed File Size: 16286 bytes

Server administrators may be concerned that mod_gzip will place a heavy burden on their systems as files are compressed on the fly. I argue against that, pointing out that this does not seem to concern the administrators of Slashdot, one of the busiest Web servers on the Internet, who use mod_gzip in their very high-traffic environment.

The mod_gzip project page for Apache 1.3.x is located at SourceForge. The Apache 2.0.x version is available from here.


[1] From http://www.15seconds.com/issue/020314.htm:
“Both Internet Explorer 5.5 and Internet Explorer 6.0 have a bug with decompression that affects some users. This bug is documented in: the Microsoft knowledge Base articles, Q312496 is for IE 6.0 … , the Q313712 is for IE 5.5. Basically Internet Explorer doesn’t decompress the response before it sends it to plug-ins like Adobe Photoshop.”

Compressing PHP Output

A little-used or discussed feature of PHP is the ability to compress output from the scripts using GZIP for more efficient transfer to requesting clients. By automatically detecting the ability of the requesting clients to accept and interpret GZIP encoded HTML, PHP4 can decrease the size of files transferred to the client by 60% to 80%.
The information given here is known to work on systems running Red Hat 8.0, Apache/1.3.27, Apache/2.0.44 and PHP/4.3.1.

[Note: Although not re-tested since this article was originally written, compression is still present in the PHP 5.x releases and can be used to effectively compress content on shared or hosted servers where compression is not enabled within the Web server.]

Configuring PHP

The configuration needed to make this work is simple. Check your installed Red Hat RPMS for the following two packages:

  1. zlib
  2. zlib-devel

For those not familiar with zlib, it is a highly efficient, open-source compression library. PHP uses this library to compress the output sent to the client.
Compile PHP4 with your favourite ./configure statement. I use the following:

Apache/1.3.27
./configure --without-mysql --with-apxs=/usr/local/apache/bin/apxs --with-zlib

Apache/2.0.44
./configure --without-mysql --with-apxs2=/usr/local/apache2/bin/apxs --with-zlib

After doing make && make install, PHP4 should be ready to go as a dynamic Apache module. Now, you have to make some modifications to the php.ini file. This is usually found in /usr/local/lib, but if it’s not there, don’t panic; you will find some php.ini* files in the directory where you unpacked PHP4. Simply copy one of those to /usr/local/lib and rename it php.ini.
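
If you are not sure which php.ini PHP is actually reading, PHP itself will tell you; the grep simply narrows the phpinfo output to the relevant line:

$ php -i | grep -i "configuration file"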

Within php.ini, some modifications need to be made to switch on the GZIP compression detection and encoding. There are two methods to do this.


Method 1:

output_buffering = On
output_handler = ob_gzhandler
zlib.output_compression = Off


Method 2:

output_buffering = Off
output_handler =
zlib.output_compression = On


Once this is done, PHP4 will automatically detect if the requesting client accepts GZIP encoding, and will then buffer the output through the gzhandler function to dynamically compress the data sent to the client.
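
You can confirm that the negotiation is happening by requesting the same script with and without compression and comparing the byte counts. The URL here is the same one used in the wget tests below:

$ curl -s -o /dev/null -w "%{size_download} bytes\n" http://www.pierzchala.com/resume.php
$ curl -s -o /dev/null -w "%{size_download} bytes\n" --compressed http://www.pierzchala.com/resume.php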

The ob_gzhandler

The most important component of this entire process is placing the ob_gzhandler call on the page itself. It needs to be placed at the top of the page, above the opening HTML tag, in order to work. Adding the following line completes the process:

<?php ob_start("ob_gzhandler"); ?>

In WordPress installs, this becomes the first line in the HEADER.PHP file. But be careful to check that it’s working properly. If the Web application already has compression built in and you add the ob_gzhandler function, a funky error message will appear at the top of the page telling you that you can’t invoke compression twice.

Web servers with native compression are smarter than that – they realize that the file is already compressed and don’t run it through the compression algorithm again.

Once this is in place, you will be able to verify the decrease in size using any HTTP capture tool (Firebug, Safari Web Inspector, Fiddler2, etc.).

So?

The winning situation here is that for an expenditure of $0 (except your time) and a tiny bit more server overhead (you’re probably still using fewer resources than if you were running ASP on IIS!), you will now be sending much smaller, dynamically generated html documents to your clients, reducing your bandwidth usage and the amount of time it takes to download the files.

How much of a size reduction is achieved? Well, I ran a test on my Web server, using WGET to retrieve the file. The configuration and results of the test are listed below.


Method 0: No Compression
wget www.pierzchala.com/resume.php
File Size: 9415 bytes
Method 1: ob_gzhandler
wget --header="Accept-Encoding: gzip,*" www.pierzchala.com/resume.php
File Size: 3529 bytes
Method 2: zlib.output_compression
wget --header="Accept-Encoding: gzip,*" www.pierzchala.com/resume.php
File Size: 3584 bytes

You will have to experiment to find the method that gives the most efficient balance of file size, server overhead, and processing time on your server.

A 62% reduction in transferred file size without affecting the quality of the data sent to the client is a pretty good return for 10 minutes of work. I recommend including this procedure in all of your future PHP4 builds.

Home Office for a God…errrr, Goddess


Kathy Sierra and her home office in a Silver Streak trailer [here].

We are not worthy.

But the whole idea of a playful office is one that is very powerful to me. The “office” I commute to is a broad open space, with no walls. And as we are growing, the noise is becoming difficult to work around.

With both boys in school in the mornings, it is now much more peaceful for me to work from home. At MY desk, the one I bought. An old oak teacher’s desk, of which there appear to be millions in circulation.

In my chair, the one I feel comfortable in. I have an Aeron at work, and I think it’s overrated.

It is vital to work where you will be most creative, most comfortable.
