Author: spierzchala

  • Compressing Web Output Using mod_gzip for Apache 1.3.x and 2.0.x

    Web page compression is not a new technology, but it has just recently gained higher recognition in the minds of IT administrators and managers because of the rapid ROI it generates. Compression extensions exist for most of the major Web server platforms, but in this article I will focus on the Apache and mod_gzip solution.
    The idea behind GZIP-encoding documents is very straightforward. Take a file that is to be transmitted to a Web client, and send a compressed version of the data, rather than the raw file as it exists on the filesystem. Depending on the size of the file, the compressed version can run anywhere from 50% to 20% of the original file size.
    In Apache, this can be achieved using a couple of different methods. One is Content Negotiation, which requires that two separate sets of HTML files be generated: one for clients that can handle GZIP-encoding, and one for those that can’t. The problem with this solution should be readily apparent: there is no provision in this methodology for GZIP-encoding dynamically-generated pages.
    The more graceful solution for administrators who want to add GZIP-encoding to Apache is the use of mod_gzip. I consider it one of the overlooked gems for designing a high-performance Web server. Using this module, configured file types — based on file extension or MIME type — will be compressed using GZIP-encoding after they have been processed by all of Apache’s other modules, and before they are sent to the client. The compressed data that is generated reduces the number of bytes transferred to the client, without any loss in the structure or content of the original, uncompressed document.
    mod_gzip can be compiled into Apache as either a static or dynamic module; I have chosen to compile it as a dynamic module in my own server (more compile instructions here). The advantage of using mod_gzip is that this method requires that nothing be done on the client side to make it work. All current browsers — Mozilla, Opera, and even Internet Explorer — understand and can process GZIP-encoded text content.
    On the server side, all the server or site administrator has to do is compile the module, edit the appropriate configuration directives that were added to the httpd.conf file, enable the module in the httpd.conf file, and restart the server. In less than 10 minutes, you can be serving static and dynamic content using GZIP-encoding without the need to maintain multiple codebases for clients that can or cannot accept GZIP-encoded documents.
    When a request is received from a client, Apache determines if mod_gzip should be invoked by noting if the “Accept-Encoding: gzip” HTTP request header has been sent by the client. If the client sends the header, mod_gzip will automatically compress the output of all configured file types when sending them to the client.
    This client header announces to Apache that the client will understand files that have been GZIP-encoded. mod_gzip then processes the outgoing content and includes the following server response headers.

    		Content-Type: text/html
    		Content-Encoding: gzip
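
    The negotiation described above can be sketched in a few lines of Python. This is only an illustration of the header logic, not mod_gzip's actual code: the function names accepts_gzip and build_response are hypothetical, and the Accept-Encoding parsing is deliberately simplified (it ignores q-values such as "gzip;q=0").

```python
import gzip


def accepts_gzip(accept_encoding):
    """Return True if the Accept-Encoding header value lists gzip.

    Simplification (assumption): q-values are ignored, so "gzip;q=0"
    would still count as accepted here.
    """
    codings = [part.split(";")[0].strip().lower()
               for part in accept_encoding.split(",")]
    return "gzip" in codings


def build_response(body, content_type, accept_encoding=""):
    """Return (headers, body), gzip-compressing the body when the client allows it."""
    headers = {"Content-Type": content_type}
    if accepts_gzip(accept_encoding):
        body = gzip.compress(body)
        # Tell the client the payload must be expanded before use.
        headers["Content-Encoding"] = "gzip"
    headers["Content-Length"] = str(len(body))
    return headers, body


if __name__ == "__main__":
    page = b"<html><body>" + b"hello world " * 200 + b"</body></html>"
    headers, payload = build_response(page, "text/html", "gzip, deflate")
    print(headers)
    print(len(page), "bytes before,", len(payload), "bytes after")
```

    A client that does not send the header gets the original bytes back with no Content-Encoding header, which is why nothing has to change on the client side.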
    		

    These server response headers announce that the content returned from the server is GZIP-encoded, but that when the content is expanded by the client application, it should be treated as a standard HTML file. Not only is this successful for static HTML files, but this can be applied to pages that contain dynamic elements, such as those produced by Server-Side Includes (SSI), PHP, and other dynamic page generation methods. You can also use it to compress your Cascading Stylesheets (CSS) and plain text files. As well, a whole range of application file types can be compressed and sent to clients. My httpd.conf file sets the following configuration for the file types handled by mod_gzip:

    		mod_gzip_item_include mime ^text/.*
    		mod_gzip_item_include mime ^application/postscript$
    		mod_gzip_item_include mime ^application/ms.*$
    		mod_gzip_item_include mime ^application/vnd.*$
    		mod_gzip_item_exclude mime ^application/x-javascript$
    		mod_gzip_item_exclude mime ^image/.*$
    		

    This allows Microsoft Office and Postscript files to be GZIP-encoded, while not affecting PDF files. PDF files should not be GZIP-encoded, as they are already compressed in their native format, and compressing them leads to issues when attempting to display the files in Adobe Acrobat Reader.[1] Paranoid system administrators may want to explicitly exclude PDF files.

    		mod_gzip_item_exclude mime ^application/pdf$
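
    To see which MIME types these rules catch, the patterns can be tried out in Python. This is a rough sketch, assuming a response is compressed only when at least one include pattern matches and no exclude pattern does; the would_compress helper is hypothetical and does not reproduce mod_gzip's real matching code.

```python
import re

# Patterns copied from the httpd.conf excerpts above.
INCLUDE = [
    r"^text/.*",
    r"^application/postscript$",
    r"^application/ms.*$",
    r"^application/vnd.*$",
]
EXCLUDE = [
    r"^application/x-javascript$",
    r"^image/.*$",
    r"^application/pdf$",
]


def would_compress(mime_type):
    """Assumed rule: some include matches and no exclude matches."""
    if any(re.search(pattern, mime_type) for pattern in EXCLUDE):
        return False
    return any(re.search(pattern, mime_type) for pattern in INCLUDE)


if __name__ == "__main__":
    for mime in ("text/html", "text/css", "application/msword",
                 "application/vnd.ms-excel", "application/pdf", "image/png"):
        print(mime, "->", would_compress(mime))
```

    Under that assumption, application/pdf is left alone even without the explicit exclude, because none of the include patterns match it.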
    		

    Another side-note is that nothing needs to be done to allow the GZIP-encoding of OpenOffice (and presumably, StarOffice) documents. Their MIME type is already set to text/plain, allowing them to be covered by the ^text/.* include rule.
    How beneficial is sending GZIP-encoded content? Some simple tests I ran against my Web server using wget showed that even a small Web server has the potential for substantial savings in bandwidth usage.

    http://www.pierzchala.com/bio.html Uncompressed File Size: 3122 bytes
    http://www.pierzchala.com/bio.html Compressed File Size: 1578 bytes
    http://www.pierzchala.com/compress/homepage2.html Uncompressed File Size: 56279 bytes
    http://www.pierzchala.com/compress/homepage2.html Compressed File Size: 16286 bytes
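
    Put as percentages, those numbers work out to roughly 49.5% fewer bytes for the small page and 71.1% fewer for the large one. The arithmetic is shown below; the percent_saved helper is just an illustrative name.

```python
# (uncompressed, compressed) sizes in bytes, taken from the wget tests above.
results = {
    "http://www.pierzchala.com/bio.html": (3122, 1578),
    "http://www.pierzchala.com/compress/homepage2.html": (56279, 16286),
}


def percent_saved(original, compressed):
    """Percentage of bytes not sent thanks to compression."""
    return 100.0 * (original - compressed) / original


if __name__ == "__main__":
    for url, (original, compressed) in results.items():
        print(f"{url}: {percent_saved(original, compressed):.1f}% fewer bytes")
```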

    Server administrators may be concerned that mod_gzip will place a heavy burden on their systems as files are compressed on the fly. I argue against that, pointing out that this does not seem to concern the administrators of Slashdot, one of the busiest Web servers on the Internet, who use mod_gzip in their very high-traffic environment.
    The mod_gzip project page for Apache 1.3.x is located at SourceForge. The Apache 2.0.x version is available from here.


    [1] From http://www.15seconds.com/issue/020314.htm

  • Long Week…off to read for a while…

    I have a whole bunch of new books I will be ploughing through this weekend, between being daddy and helping out around the yard.

    With any luck, I will get through them this weekend….

  • Siebel: To buy, or not to buy…ask Oracle!

    Oracle to buy Siebel?
    1) Siebel sucks.
    2) Oracle blows.
    Won’t this produce a null company?
    ---
    Scoble notes.

  • Music: Guilty Pleasures

    Ok, thanks to the team from Apple Matters, I am forced to admit that I have a copy of the BareNaked Ladies doing the theme from the RoadRunner Cartoons.
    As for the New Kids on the Block: there is a 12-step program to help folks with that, as well as BNL doing New Kid on the Block on Gordon.

  • USA: Why this country is becoming a Third-World Nation

    Go team! “I love it when a plan comes together…”

    GET! ME! OUTTA! HERE! NOW!

    Via Dowbrigade.

  • Apache: Your host is an idiot

    Ok, I broke my Web server and I didn’t even notice. I broke it so badly that when it re-started, it didn’t even create an access_log. I noticed it a few minutes ago, switched over to my backup Web machine, fixed the problem, and re-launched the primary Web server.
    What did I do? I removed the default cURL RPM that came with FC3 and replaced it with cURL compiled from source.
    Unfortunately, PHP couldn’t find libcurl.
    I am amazed the server was still running in any form.
    Off to run some more tests…UGH.

  • Fight the Bull: More BS

    The Bullfighters at work.

    Press Release
    Source: SAP AG
    SAP Launches More Than 100 Industry-specific Analytic Applications
    Tuesday April 26, 4:00 am ET
    SAP(R) Analytics Deliver on Enterprise Services Architecture Commitment COPENHAGEN, Denmark, April 26 /PRNewswire-FirstCall/ — SAP AG (NYSE: SAP – News) today unveiled more than 100 industry-specific analytic applications that empower users with innovative new ways to drive core processes and business decisions based on actionable business insight. SAP® Analytics are a new breed of model-driven composite applications that change the analytic application playing field across more than 25 industries. By merging data from SAP and non-SAP applications with business intelligence queries, SAP Analytics eliminate disparate islands of data and seamlessly combine transactional, analytic and collaborative steps across multiple business functions, departments and even organizational boundaries. The announcement was made at SAPPHIRE® ’05, SAP’s international customer conference, being held in Copenhagen, Denmark, April 26-28.

    I need to drive those core processes using model-driven composite applications!
    Yeaaaaarrrgh! Brain! Hurts! Must! Flee!

  • Apple: Safari-Led DDoS and Web Performance Threat to RSS?

    Om Malik points out a potential threat to blogs: OSX 10.4 “Tiger”. The new Safari that ships with this OS comes with the RSS reader turned on by default!

    That upgrade while great for the consumers, could come as a big shocker for those blogs whose feeds are included as part of Safari’s default starter package. Infact it could be the biggest stress test for RSS thus far!
    Most RSS readers are set to poll for updates every hour, and imagine when half-a-million Tiger Safari users who start hitting a server at the same time, pulling down RSS updates, because they have not changed the default settings. Server meltdown? Or an unintended denial of service? Apple says that most of the default feeds are going to be major news sites like CNN. New York Times, and LA Times. At this time they are not including any personal blogs as part of the default list. Even for them it is not going to be easy.

    As a Web performance geek, I ask you: do you measure and monitor the performance and availability of your blog infrastructure?
    Didn’t think so….
    Enjoy the Weekend!

  • The Americans Came…And We Kicked Their Ass!

    192 years ago, the newly United States invaded Canada.
    We turned around and burned the White House down.
    Thanks Andrew!