A brief introduction to mod_perl - Part 2
Posted by Steve on Fri 1 May 2009 at 08:30
In our previous brief introduction to mod_perl we showed how to install it, and how to use it to improve the performance of simple Perl-based CGI scripts. In this conclusion we'll show how you can do more useful things with a little bit of effort.
In this article we're going to write Perl code and have the Apache process use it. To do this we need to place the code somewhere on the local system that the embedded Perl interpreter can load it from.
By default the mod_perl interpreter's search path (@INC) includes most of the common directories you'd expect when running Perl, but I find it makes sense to keep the code we're working with in one central location: /etc/apache2/perl.
To configure the search path we'll have Apache run a startup script, by adding the following directive to /etc/apache2/conf/00-mod-perl:

    #
    # Load the startup script
    #
    PerlRequire /etc/apache2/perl/startup.pl
This will cause the specified startup script to be executed when the server starts. Create that file with the following contents:

    #
    # Add /etc/apache2/perl to the mod_perl search path.
    #
    use lib qw( /etc/apache2/perl );

    #
    # This script had no errors (?!)
    #
    1;
Once you've done that and restarted Apache, the running server will have the updated search path and you can continue.
Doing Something Useful
One of the problems I frequently see with this site is clients attempting to spider the site, downloading every single article, blog entry, and poll. In general I don't mind if people mirror the content here, so long as they do so slowly, and don't hog the limited resources.
But there are many bad spiders out there that will make requests such as these:

    188.8.131.52 - [30/Apr/2009:19:36:29 +0100] "GET /articles/535#comment_2 HTTP/1.0"
    184.108.40.206 - [30/Apr/2009:19:36:30 +0100] "GET /articles/535#comment_3 HTTP/1.0"
    220.127.116.11 - [30/Apr/2009:19:36:30 +0100] "GET /articles/535#comment_5 HTTP/1.0"
    18.104.22.168 - [30/Apr/2009:19:36:31 +0100] "GET /articles/535#comment_4 HTTP/1.0"
Clearly this is a grossly broken client: a fragment such as "#comment_2" is only meaningful to the browser and should never appear in a real request. Spotting such clients by eye in a busy log is hard, though.
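If you want to find such clients in an existing access log, a short script can do the counting for you. The following is only a sketch: it assumes each log line starts with the client IP and contains the quoted request line, as in the excerpt above, and the sub name fragment_requesters is hypothetical.

```perl
use strict;
use warnings;

# Count, per client IP, the requests whose URL contains a "#" fragment.
# Assumes each line starts with the client IP and contains a quoted
# request line such as "GET /articles/535#comment_2 HTTP/1.0".
sub fragment_requesters {
    my (@lines) = @_;
    my %count;
    for my $line (@lines) {
        my ( $ip, $request ) = $line =~ /^(\S+).*?"([^"]*)"/ or next;

        # The URL is the second field of the request line.
        my ( undef, $url ) = split ' ', $request;
        $count{$ip}++ if defined $url && index( $url, '#' ) >= 0;
    }
    return \%count;
}

# Usage: perl find-fragments.pl /var/log/apache2/access.log
if ( !caller && @ARGV ) {
    my $counts = fragment_requesters(<>);
    for my $ip ( sort { $counts->{$b} <=> $counts->{$a} } keys %$counts ) {
        printf "%6d %s\n", $counts->{$ip}, $ip;
    }
}
```

The worst offenders are listed first, which makes it easy to see whether a single client is responsible for most of the broken requests.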
So, as a simple means of fighting the requests from these bad clients, I put together a Perl module to drop them.
It is a mod_perl handler, which we can have Apache load and run for every incoming request. The actual code is pretty simple and only does a few things:
- Extract the IP address and User-Agent of each incoming request.
- If that IP & User-Agent pair has been "bad" in the past, reject the request.
- Otherwise, let it proceed, unless the request contains "#".
- In that case, record the bad behaviour and reject the request.
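A minimal sketch of a handler following these steps might look like the one below. It is not the author's original DropClients.pm. It assumes mod_perl 2 on Apache 2.2, where the client address comes from $r->connection->remote_ip (builds against Apache 2.4 use client_ip instead), and it assumes offenders are remembered as empty marker files under /tmp/blah, each named after an MD5 digest of the IP/User-Agent pair. The helper subs client_key, is_blocked, record_bad and allow are hypothetical names chosen for illustration.

```perl
package DropClients;

# A minimal sketch, NOT the original DropClients.pm. Assumptions:
#   * offenders are remembered as empty marker files in $STATE_DIR
#     (hypothetical layout; "rm -rf /tmp/blah" clears every block);
#   * mod_perl 2 on Apache 2.2 ($r->connection->remote_ip).
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use File::Spec;

our $STATE_DIR = '/tmp/blah';

# Filename-safe key identifying one IP/User-Agent pair.
sub client_key {
    my ( $ip, $agent ) = @_;
    return md5_hex( ( $ip // '' ) . '|' . ( $agent // '' ) );
}

sub _marker_file {
    my ( $ip, $agent ) = @_;
    return File::Spec->catfile( $STATE_DIR, client_key( $ip, $agent ) );
}

# True if this pair has misbehaved before.
sub is_blocked {
    my ( $ip, $agent ) = @_;
    return -e _marker_file( $ip, $agent ) ? 1 : 0;
}

# Remember that this pair misbehaved by creating an empty marker file.
sub record_bad {
    my ( $ip, $agent ) = @_;
    mkdir $STATE_DIR unless -d $STATE_DIR;
    open( my $fh, '>', _marker_file( $ip, $agent ) ) or return;
    close($fh);
    return 1;
}

# Decision logic, independent of Apache: 1 = allow, 0 = reject.
sub allow {
    my ( $ip, $agent, $request_line ) = @_;
    return 0 if is_blocked( $ip, $agent );
    if ( defined $request_line && $request_line =~ /#/ ) {
        record_bad( $ip, $agent );
        return 0;
    }
    return 1;
}

# Load the mod_perl API on first use, so the logic above can also be
# exercised from a plain perl script outside Apache.
my $apache_loaded = 0;
sub _load_apache {
    return if $apache_loaded;
    require Apache2::RequestRec;    # $r->the_request, $r->headers_in
    require Apache2::Connection;    # $r->connection->remote_ip
    require APR::Table;             # ->get() on the header table
    require Apache2::Const;
    Apache2::Const->import( -compile => qw(OK FORBIDDEN) );
    $apache_loaded = 1;
    return;
}

# "PerlAccessHandler DropClients" names only the package, so mod_perl 2
# calls this sub.
sub handler {
    my ($r) = @_;
    _load_apache();

    my $ip    = $r->connection->remote_ip;
    my $agent = $r->headers_in->get('User-Agent');

    # the_request is the raw request line, e.g. "GET /#foo HTTP/1.0".
    return allow( $ip, $agent, $r->the_request )
      ? Apache2::Const::OK()
      : Apache2::Const::FORBIDDEN();
}

1;
```

Keeping the decision in allow(), separate from handler(), means the blocking rules can be tried out from a plain perl script; the mod_perl modules are only loaded the first time Apache calls handler().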
To enable this module we need to do two things:
- Load it.
- Ensure it is invoked.
This can be achieved by saving the code to the file /etc/apache2/perl/DropClients.pm, then creating the file /etc/apache2/conf/dropclients with this content:

    #
    # Load the module
    #
    PerlModule DropClients

    #
    # Ensure it is invoked as an access handler
    #
    <Location />
        PerlAccessHandler DropClients
    </Location>
Once you've done this and restarted Apache, you'll find that the DropClients handler is invoked once for each incoming request, and clients that misbehave can be dropped.
As an example this is a normal request:

    skx@gold:~$ echo -e "GET / HTTP/1.0\n" | nc lenny 80 |grep ^HTTP
    HTTP/1.1 200 OK

Now a broken one:

    skx@gold:~$ echo -e "GET /#foo HTTP/1.0\n" | nc lenny 80 |grep ^HTTP
    HTTP/1.1 403 Forbidden

Since we've now made a bad request we'll be locked out of the server:

    skx@gold:~$ echo -e "GET / HTTP/1.0\n" | nc lenny 80 |grep ^HTTP
    HTTP/1.1 403 Forbidden
(If you look at the code itself you'll see it is very naive; you can remove the block by running "rm -rf /tmp/blah".)
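The nc pipelines above send the request line by hand because many HTTP clients, curl among them, strip the fragment from a URL before sending it, so they could never trigger the block. If you'd rather script these checks in Perl, here is a minimal sketch using the core IO::Socket::INET module; the sub name status_line and the script name raw-get.pl are hypothetical, and the host lenny is simply the test server used above.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Send one raw HTTP/1.0 request for $path and return the status line,
# e.g. "HTTP/1.1 403 Forbidden". The path is sent exactly as given, so a
# "#fragment" reaches the server just as it does with the nc examples.
sub status_line {
    my ( $host, $port, $path ) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
        Timeout  => 10,
    ) or die "cannot connect to $host:$port: $@\n";

    print {$sock} "GET $path HTTP/1.0\r\n\r\n";
    my $status = <$sock>;    # first line of the response
    close($sock);

    return unless defined $status;
    $status =~ s/\r?\n\z//;
    return $status;
}

# Usage: perl raw-get.pl lenny '/#foo'
if ( !caller && @ARGV ) {
    my ( $host, $path ) = @ARGV;
    my $status = status_line( $host, 80, defined $path ? $path : '/' );
    print defined $status ? $status : 'no response', "\n";
}
```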
Different Handler Types
The module we installed above is a PerlAccessHandler, which runs during the access-control phase of each incoming request to decide whether that request may proceed. That is only one of a number of handler types mod_perl supports. Some common handlers are:
Explicitly load some code in response to a request.
Called to filter, modify, or update the response Apache is sending to a client. (e.g. replacing words, adding a footer, or compression)
A full list of the HTTP handlers available is included as part of the mod_perl documentation, along with some sample code demonstrating how they are used.
This concludes our brief introduction to mod_perl. If you'd like to learn more, there is a lot of excellent information available on the mod_perl homepage, including detailed API documentation and links to interesting code.
For example, the "mod_perl: Cute tricks with Perl & Apache" page shows code for automatically creating compressed content, blocking bad user-agents, and adding footers to pages.
A short article isn't really sufficient to introduce a whole new way of manipulating a server, and its content, but I hope this was useful regardless.