Making prettier URLs with mod_rewrite

Posted by Steve on Mon 9 May 2005 at 18:49

mod_rewrite is a module for Apache which allows you to rewrite and manipulate URLs which are sent to your webserver. It has many uses from the simple to the complex. Here we'll introduce the basics of enabling and using the module.

The mod_rewrite module allows you to do almost anything to incoming requests to your Apache or Apache2 server.

The module, once loaded, is configured by a list of rules which allow you to manipulate URLs and incoming requests.

Enabling mod_rewrite for Apache

Enabling the module for Apache should be a simple matter of adding the following line to your Apache's configuration file /etc/apache/httpd.conf:

LoadModule rewrite_module /usr/lib/apache/1.3/mod_rewrite.so

Once that has been done your server can be restarted to allow these new configuration directives to take effect:

root@lappy:~# /etc/init.d/apache restart

Enabling mod_rewrite for Apache2

To enable the module to be loaded you need to run the following command:

root@lappy:~# a2enmod rewrite
Module rewrite installed; run /etc/init.d/apache2 force-reload to enable.

As the output of the command suggests you need to reload your server to cause the loading to take effect:

root@lappy:~# /etc/init.d/apache2 force-reload
Forcing reload of web server: Apache2.

If all goes well your module is now loaded. Now you just need to add the rewriting rules to your server.

Simple Examples

Once the module has been loaded by your server you can start to use it to alter the way your server handles requests.

To fully understand the way the rules work we'll need a simple example, so we'll use this very website as one!

Right now you'll be reading this article at a URL which ends in "/?article=" followed by the article's number.

That's not a terribly friendly URL though, and some web spiders might not actually bother to index it - because they don't like URLs which have "?" characters in them.

To fix this we can add the following to our server configuration file:

RewriteEngine on
RewriteRule ^/articles/([0-9]+)$                /?article=$1    [PT]

This snippet (placed inside a virtual host directive so that it's not global) does two things:

  • The first line turns on the rewriting engine, allowing the following line to take effect.
  • The second line has a rewriting rule which we'll cover shortly, as well as the flag 'PT' which we'll also explain.
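
To show where the snippet sits, here is a minimal sketch of a virtual host containing it. The ServerName and DocumentRoot values are placeholders assumed for illustration, not this site's real settings:

```apache
# Hypothetical virtual host: the name and paths are examples only.
<VirtualHost *:80>
    ServerName www.example.com
    DocumentRoot /var/www

    # Rewriting is enabled for this virtual host only, not globally.
    RewriteEngine on
    RewriteRule ^/articles/([0-9]+)$ /?article=$1 [PT]
</VirtualHost>
```

Keeping the rules inside the virtual host means other sites served by the same Apache instance are unaffected.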

When it comes to rules there are two things that you should be familiar with - regular expressions, and flags.

Regular expressions are patterns which can be used to match against text. They allow you to specify ranges of characters or specific ones.

In our case the rule we've shown above is:

^/articles/([0-9]+)$

This rule will match any incoming request which starts with "/articles/", and then ends with one or more numerical digits. Because the rule makes use of the "(" and ")" characters to set up a capture group, the number which was at the end of the request will be captured as the first match.

The second half of the rule is what the incoming request will be transformed into, in our case:

/?article=$1

(Where the $1 is the numerical digits which were captured from the incoming request).
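
To make the capture concrete, here is a small sketch of how the rule should treat a few request paths. The article numbers are made up for illustration, and the comments describe the expected behaviour in a virtual host context rather than anything copied from a log:

```apache
RewriteEngine on

# /articles/136   -> matches; $1 is "136"; rewritten to /?article=136
# /articles/      -> no match: "+" requires at least one digit
# /articles/12a   -> no match: "$" requires the digits to run to the end
# /news/136       -> no match: the path must start with /articles/
RewriteRule ^/articles/([0-9]+)$ /?article=$1 [PT]
```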

Finally the flag used at the end of the rule, "PT", means to "pass through" the request to the next Apache handler. It's required in our case to make sure that the CGI scripts this site runs receive the modified URI.

With this rule in place a link of the form /articles/N, where N is this article's number, should also take you to this article.

Similar rewriting can be applied to fix up other "ugly" URLs, and make them more memorable and bookmarkable.

URLs of this kind are all made possible by the following snippet:

RewriteEngine on
RewriteRule ^/users/([A-Za-z0-9]+)/scratchpad$  /?scratchpad=$1 [PT]
RewriteRule ^/users/([A-Za-z0-9]+)$             /?user=$1       [PT]
RewriteRule ^/articles/([0-9]+)$                /?article=$1    [PT]
RewriteRule ^/polls/([0-9]+)$                   /?poll=$1       [PT]
RewriteRule ^/search/(.*)$                      /?search=$1     [PT]
RewriteRule ^/submit/*$                         /?submit=new    [PT]

With the "PT" flag, which in current Apache releases implies the "L" ("last") flag, mod_rewrite will stop processing at the first rule which matches. Rules without such a flag carry on, and later rules are applied to the already-rewritten URL - this is something that you should bear in mind if you wish to make multiple transformations in series.

The "[OR]" flag is different: it belongs to RewriteCond rather than RewriteRule, and joins a condition to the next one so that the rule applies if either condition matches (by default every condition must match).

Multiple flags can be combined, separated by "," characters, as in this example:

RewriteCond %{REQUEST_URI} ^/foo [NC,OR]
RewriteCond %{REQUEST_URI} ^/bar [NC]
RewriteRule .*          /new-location

These lines rewrite the URL paths /foo or /bar to a new location - ignoring case.
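
As a possible alternative, the same pair of paths can be handled by a single rule using regular-expression alternation, which avoids needing to combine two lines at all. This is only a sketch, assuming a virtual host context and the /foo, /bar and /new-location paths used above:

```apache
RewriteEngine on

# "(foo|bar)" matches either name; [NC] ignores case, so /FOO also matches.
# [PT] hands the rewritten path on to the next handler, as in the earlier rules.
RewriteRule ^/(foo|bar)$ /new-location [NC,PT]
```

The anchors "^" and "$" keep the rule from matching longer paths such as /foobar.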

Common Tasks with mod_rewrite

One very common example of using mod_rewrite is to prevent other websites "hotlinking" to images upon your server.

Assuming you run the website "example.com", and you wish to prevent other websites from using your images, and your bandwidth you can stop this with the following snippet:

RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://example\.com/.*$ [NC]
RewriteRule .*\.(gif|GIF|jpg|JPG)$ - [F]

What this does is to match against the Referer header which client browsers send with their requests. The first condition is true when the Referer field is not empty, and the second is true when it does not start with the address of your own website - which is assumed to be legitimate. (The "NC" flag means "no case distinction").

Only if both of these conditions are true will the last line, which looks for images, be applied - the "F" flag means "send back a forbidden response".

The net effect should be that clients which send no referer header, or those which have a referer of your own website will be allowed to see the images - but hotlinked images from other sites will receive a forbidden error.

(Note that this isn't 100% effective as many people block the HTTP Referer header when surfing, for privacy reasons).
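
If your site is also reachable with a "www." prefix or over https, the condition needs to allow for that. The following is a sketch of one way to do it, assuming the same example.com domain; the extra "png" extension is my own addition:

```apache
RewriteEngine on

# Allow requests which send no Referer header at all.
RewriteCond %{HTTP_REFERER} !^$
# Allow our own site over http or https, with or without "www.".
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com/ [NC]
# Everything else asking for an image gets a 403 Forbidden response.
RewriteRule \.(gif|jpe?g|png)$ - [NC,F]
```

Using [NC] on the rule itself saves listing each extension in both upper and lower case.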

Another common task might be changing extensions on files, moving from .htm files to .html files.

This can be achieved with the following rule:

RewriteRule ^(.*)\.htm$  $1.html  [PT]
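
If you would rather have visitors and search engines see the new name, a variant is to send an external redirect instead of rewriting internally. This is a sketch assuming the rule lives in the server or virtual host configuration, where the matched path starts with "/":

```apache
RewriteEngine on

# Escape the dot so that only a literal ".htm" ending matches.
# R=301 sends a permanent redirect; L stops further rewriting.
RewriteRule ^(.*)\.htm$ $1.html [R=301,L]
```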

Whilst this is just a simple introduction, mod_rewrite is incredibly sophisticated and powerful. To get the best out of it you will almost certainly need to read the mod_rewrite documentation.

 

 


Posted by Anonymous (195.85.xx.xx) on Tue 10 May 2005 at 08:07
Another tip:

use it for security: (From the Apress book: Hardening Apache)

Disable TRACE (used for cross-site scripting)

RewriteEngine on
RewriteCond %{REQUEST_METHOD} ^TRACE
RewriteRule .* - [F]

Remember to place these directives in a container or outside all containers.
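
For what it's worth, Apache releases which have the core TraceEnable directive (1.3.34, 2.0.55 and later, as far as I know) can do the same without mod_rewrite. A sketch, assuming it goes in the main server configuration:

```apache
# Refuse TRACE requests in the core server; no rewrite rules are needed.
TraceEnable off
```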

Posted by Steve (82.41.xx.xx) on Tue 10 May 2005 at 18:47
I've always preferred mod_security for Apache when it comes to making security improvements.

Still your suggestion is a nice simple one which can help - thanks for sharing!

Steve
-- Steve.org.uk

Posted by Anonymous (58.65.xx.xx) on Tue 29 Jan 2008 at 12:54
Options +FollowSymlinks
RewriteEngine On
RewriteBase /blog/
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule .* /blog/index.php [L]

I have the above lines in my .htaccess file but they don't affect my URLs; I'm still getting http://www.youpark.com/blog/index.php?cat=16
where cat is variable, e.g. 14, 15, 16.

Does anybody know why it is not working? Is anything wrong with my rules?

Imran
http://www.mobilephoneupdates.com

Posted by Anonymous (193.231.xx.xx) on Tue 10 May 2005 at 09:43
Excellent article!

Posted by Anonymous (63.202.xx.xx) on Wed 25 May 2005 at 15:21
Could I request somewhat of a detailed example of how the other flags *should* be properly used via *simple* examples? That would be very helpful in understanding some concepts. This article has mentioned the flags.

[NC] - no case - got from apache docs
[PT] - pass through
[OR] - boolean OR
[L] - ??

Are there others?

Posted by Steve (82.41.xx.xx) on Wed 25 May 2005 at 15:28
There are more flags detailed in the Apache documentation.

To specifically answer your question though, "L" stands for "last":

Stop the rewriting process here and don't apply any more rewriting rules. This corresponds to the Perl last command or the break command from the C language. Use this flag to prevent the currently rewritten URL from being rewritten further by following rules.

There are a lot of other flags, such as "G" meaning "gone", which sends back a response saying that this page has gone away permanently.

I could give more examples, but a lot of them are hard to explain succinctly - instead I think it's probably as well to focus on the common ones, and leave the more advanced ones for people willing to read the documentation and experiment.
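
As a rough illustration of those two flags, here is a sketch assuming the rules sit in a virtual host; the paths are invented for the example:

```apache
RewriteEngine on

# [L]: a request for /old/page is rewritten to /new/page and rule
# processing stops, so the rule below is not applied to it.
RewriteRule ^/old/(.*)$ /new/$1 [L]
RewriteRule ^/new/(.*)$ /archive/$1

# [G]: answer with "410 Gone"; "-" means no substitution is made.
RewriteRule ^/retired\.html$ - [G]
```

Note that in a .htaccess file the rewritten URL is processed again from the top, so "L" there only ends the current pass.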

Steve
-- Steve.org.uk

Posted by Anonymous (202.7.xx.xx) on Tue 28 Jun 2005 at 03:45
After spending a few hours on trying to get this to work I finally found what the 'problem' was. (I am writing this here for reference.)

I have a debian 3.1 (Sarge), with Apache 1.3.x, php4.

The article mentions that the mod_rewrite module needs to be loaded in /etc/apache/httpd.conf. In my install all modules are loaded in /etc/apache/modules.conf, so check that mod_rewrite is not already loaded there.

Also make sure there are no backup copies of httpd.conf in the /apache/ directory. (I read somewhere that apache will parse ALL files in the /apache/ directory, so create a new folder /apache/backup and move all backup files to it.)

>>>> THE MAJOR PROBLEM <<<<
Here is the solution to the major problem. It has to do with 'AllowOverride' option.

You will find in the httpd.conf file the tags:
{Directory /some/directory} ... {/Directory} (I am using curly brackets because the pointy ones get parsed as HTML)

This allows us to set options for each specific directory. (So your httpd.conf file will have a lot of those tags for different directories.)
Look out for the directory that affects /var/www (the web directory)(i.e. {Directory /var/www})

Look for the option 'AllowOverride' and change it from 'None' to 'All'. This means that Apache will now parse directives that are included in the .htaccess file which is located in the /var/www directory and subdirectories.

This is important since this tells Apache that you can override options by using a .htaccess file. (When the option is set to 'None', apache will ignore the .htaccess file.)
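
For reference, the change described above would look something like this sketch, written with the real pointy brackets and assuming /var/www is the web directory as in my setup:

```apache
<Directory /var/www>
    # "All" lets .htaccess files use every directive they are permitted to;
    # "FileInfo" alone should be enough for the mod_rewrite directives.
    AllowOverride All
</Directory>
```

Apache has to be reloaded after editing this for the change to take effect.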

------------------------------------

This was quite frustrating to figure out because mod_rewrite was loaded, the phpinfo page also showed that it was loaded, but whenever I used a .htaccess file nothing would work. (i.e. it was completely ignored!)

It seems that the default install has AllowOverride turned OFF (i.e. set to 'None'), so you have to manually set it to 'All' in the directories you want apache to parse the .htaccess file.

It is all working nicely now! (I looked everywhere on Google but no one stated the obvious, I hope this can help someone and save them a lot of time!)

Posted by Anonymous (83.57.xx.xx) on Mon 24 Oct 2005 at 22:02
Thanks for this comment. It has been very useful for me.

Posted by Anonymous (71.65.xx.xx) on Wed 26 Oct 2005 at 02:58
Wow, man, thanks for this. I have been pulling my freaking hair out over this very problem and your tip solved it exactly. I appreciate it!!!!

Posted by Anonymous (68.14.xx.xx) on Wed 22 Feb 2006 at 23:59
Thanks for the tip!

Posted by Anonymous (62.253.xx.xx) on Sat 29 Apr 2006 at 17:39
I tried to get mod_rewrite working in Debian, but even reading this page didn't give me the full solution.

I eventually found this page which talked about setting AllowOverride to “all” in sites-available/default.

That should be the missing piece of information for anyone who has got this far and failed.

Posted by Anonymous (82.198.xx.xx) on Wed 7 Mar 2007 at 13:13
Your tip was very helpful to me. I appreciate that :)

Posted by Anonymous (87.63.xx.xx) on Tue 6 Mar 2012 at 15:16
Thanks so much dude! That was exactly the piece of information I was missing, and I was going nuts!
Kudos to you.

Posted by Anonymous (81.189.xx.xx) on Sat 25 Nov 2006 at 15:10
Yea thats what i needed :) you saved me a lot of time man !!

Posted by Anonymous (212.129.xx.xx) on Mon 11 Dec 2006 at 12:06
Well, I am happy for those for whom it worked out, but what about my situation?

I have mod_rewrite turned on in Apache, but still... I had a contract for www.example.eu with one hosting provider, and later I got www.example2.org attached to the first domain. The forwarding is working OK, I guess: visiting www.example2.org shows the right page, but my mod_rewrite rules didn't work. I still see a URL like www.example.eu/example2/index.php in place of what I would really like to see:

www.example2.org ....

Someone help me plzzz!

Posted by Anonymous (86.121.xx.xx) on Sun 1 Apr 2007 at 16:22
hello,

first you need the vhost directive something like that

<VirtualHost 123.456.789.012:80>
ServerName domain1.com
ServerAlias www.domain1.com domain_2.com www.domain_2.com
DocumentRoot "/var/www/data/"
php_admin_flag safe_mode Off
Alias /webalizer "/var/www/data/webalizer"
ErrorLog "/var/www/data/logs/domain1-error.log"
CustomLog "/var/www/data/logs/domain1.access.log" combined
<Directory "/var/kunden/webs/ded5myps7/wedding">
order allow,deny
allow from all
</Directory>
<Directory "/var/www/data">
AllowOverride all
</Directory>
</VirtualHost>

so, this host has 2 different domains. you can also use

ServerAlias *.domain1.com

for a wildcard domain. that means if you try yourname.domain1.com you will see the content of domain1.com

then in the .htaccess you need something like that

RewriteEngine On

RewriteCond %{HTTP_HOST} !^www\.domain1\.com$ [NC]
RewriteRule (.*) http://www.domain1.com/$1 [R=permanent,L]

hope this will help you and excuse my english.

Posted by Anonymous (71.145.xx.xx) on Fri 13 Apr 2007 at 20:03
Thanks. I just wasted an hour on this, but I'm sure I would have wasted more without your tip. It's incredibly frustrating that this isn't documented more clearly.

Posted by Anonymous (77.185.xx.xx) on Fri 5 Mar 2010 at 16:30
"I hope this can help someone and save them a lot of time!"

You just did that :)

Posted by Anonymous (203.101.xx.xx) on Thu 7 Feb 2008 at 14:55

My URLs look like:

http://int14/calonex/index.php?pages=7

http://int14/calonex/index.php?pages=8

http://int14/calonex/index.php?pages=6

I need to write mod_rewrite rules so that they look like

http://int14/calonex/6 or like

http://int14/calonex/

Thanks in advance

my Email php_jerry@yahoo.co.in

Posted by Anonymous (189.182.xx.xx) on Wed 14 Jan 2009 at 03:10
On my server, the command doesn't say the module was installed; it says "This module is already enabled", and I can't use it.
