Almost every experienced web developer who works with Apache is at least vaguely familiar with mod_rewrite. It’s most frequently used to turn ugly URLs stuffed with query values into nice, search-engine-friendly ones. Implementing mod_rewrite is generally one of the first steps in an SEO overhaul of a website with a lot of dynamic content; hence its popularity. However, Apache is the Swiss Army knife of HTTP servers and was one of the first to offer a whole gamut of loadable modules (the “mod_…” family). These little tidbits of compiled-C goodness act as plugins that extend the functionality of Apache. In Part One of a finite-but-unknown-length series about these useful tools, I take a look at mod_ext_filter and some of its practical uses.
At the most basic level, mod_ext_filter is simply a way to run content through an external program before it is sent to the client. Think of it as a filter sitting between Apache and the user’s web browser. It intercepts the output that Apache would normally send back to the user, does something with it (generally modifying or logging the content), and then passes the result along to the browser. In fact, you could, in theory, sniff web browsers’ language settings and dynamically translate all of your webpages into foreign languages using mod_ext_filter. Practically speaking, however, that would be horribly inefficient and probably too slow to be usable. Where the module really shines is in making small changes to content. And since it’s an Apache module, it will work with anything Apache serves: HTML documents, PHP/Perl/Ruby/Python scripts, images, PDFs, etc.
Assume that you’ve been told by the VP of Marketing that they want to “increase brand impact” by adding a trademark notice (a “TM”) after each mention of your company’s flagship product, SuperWidget. You could simply do a search-and-replace on all the files on the web server. However, this wouldn’t prevent an intern who didn’t get the memo from uploading a new document sans-TM. What we could do is use mod_ext_filter and sed to perform a search-and-replace every time a document is requested from the web server.
First, inside Apache’s config file (“httpd.conf”), we define the filter:
ExtFilterDefine add-tm mode=output intype=text/html cmd="/bin/sed 's|SuperWidget|SuperWidget<sup>TM</sup>|g'"
“add-tm” is the name of our filter, which will be referenced a bit later.
“mode=output” tells Apache that this filter should be applied to content that’s going out to the web browser. It is also the default; Apache 2.1 and later additionally accept “mode=input” for filtering the incoming request.
The “intype” parameter specifies which MIME type the filter applies to. In this example, the filter only runs on “text/html” documents. This matters because it keeps the filter away from other content: inserting a “TM” into a PDF file, for instance, would most likely corrupt the document when a user attempted to download or view it.
“cmd” is simply the command that is run when the filter is activated. The intercepted content is piped to this command via STDIN, and whatever the command writes to STDOUT is sent on to the client. Since almost all Linux commands support the standard streams, this makes things quite handy.
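Because the filter is just a program that reads STDIN and writes STDOUT, it can be tried from a shell before Apache is involved at all. Here is a minimal sketch, assuming sed is on the PATH; the sample string and the add_tm function name are made up for illustration. It uses “|” as the sed delimiter so that the slash in the closing sup tag does not end the substitution early.

```shell
# Stand-in for the HTML that Apache would hand to the filter on STDIN.
sample='<p>Buy a SuperWidget today. SuperWidget ships free.</p>'

# The filter command on its own: read STDIN, write the modified copy to STDOUT.
# "|" is the s-command delimiter, so the "/" in </sup> needs no escaping.
add_tm() {
  sed 's|SuperWidget|SuperWidget<sup>TM</sup>|g'
}

filtered=$(printf '%s\n' "$sample" | add_tm)
printf '%s\n' "$filtered"
```

If the command misbehaves here, it will misbehave inside Apache too, so this is a cheap first check.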
Next we add:
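A minimal sketch of what this step could look like, assuming the filter is attached with Apache’s SetOutputFilter directive and reusing the add-tm name from the definition above:

```apache
# Apply the add-tm filter to every response served from the site root down.
<Location />
    SetOutputFilter add-tm
</Location>
```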
While the opening “Location” tag looks like a self-closing tag (it ends with a “/>”), the slash is actually the path, specifying that this filter be applied from the root of the website down. You could also use <Location /products/superwidget> to filter only URLs under that path. Alternatively, you can use “Directory” (for filesystem paths) or “Files” containers to specify where the filter is run. See the Apache docs for more info.
Using mod_ext_filter and sed to automatically add a trademark notice.
That’s just one simple example of the power of mod_ext_filter. Odds are there are more robust ways of solving this particular problem for the marketing department. Other, more practical, uses include:
- Dynamically adding a copyright footer, even on plain ol’ static HTML pages. (Though mod_include would be better for this unless you’re doing something tricky that requires the extra logic of an external program. I’ll be covering this in an upcoming blog entry.)
- A profanity filter that can easily be used with any blog or messageboard software.
- Dynamically adding a watermark to images using “composite” (Step-by-step HOWTO).
- And my favorite… Automatically adding a wrapper around external links for tracking purposes. This could be done by piping the output through a search-and-replace regex using Perl, sed, or awk. This is great with a CMS where end users self-publish content and may add their own links.
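As a rough sketch of the link-wrapping idea, the following sed one-liner rewrites absolute http(s) links so they pass through a redirect script. The “/out?url=” endpoint and the wrap_links name are hypothetical; the sketch does not URL-encode the target and would also wrap absolute links to your own site, so treat it as a starting point rather than a finished filter.

```shell
# Hypothetical redirect endpoint that records the click, then forwards the visitor.
tracker='/out?url='

# Rewrite every absolute http(s) href so it goes through the tracker.
# -E enables extended regular expressions; \1 is the captured original URL.
# Relative links such as href="/local" do not match and are left alone.
wrap_links() {
  sed -E "s|href=\"(https?://[^\"]+)\"|href=\"${tracker}\\1\"|g"
}

printf '%s\n' '<a href="https://example.org/page">x</a> <a href="/local">y</a>' | wrap_links
```

The same command could then be dropped into the cmd parameter of an ExtFilterDefine line, just like the sed command in the trademark example.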
One thing to keep in mind is that mod_ext_filter’s flexibility comes at a price: it’s not a speed demon, because it launches an external process every time the filter is invoked. For most websites, though, the performance hit is negligible if you keep the filter command simple. Even if the module is too pokey for your site, it’s a great way to prototype an idea before you sit down and crank out a custom module written in C using the Apache API.
For more information on this great little tool, check out the official docs.
One additional caveat: while mod_rewrite can be used in either a .htaccess file (per directory) or within httpd.conf (per server), mod_ext_filter’s filters have to be defined in httpd.conf, since ExtFilterDefine is only allowed in the server config. This means that those of you out there who use a shared webhost are most likely out of luck. If this really burns you, keep in mind that you can lease a dedicated server for less than $70/mo nowadays. It’s something to consider if you have the systems admin skills (or are willing to learn) and a few paying web design/development clients that you could move over.