Speeding up dynamic websites with caching

Posted by Steve on Tue 1 Feb 2005 at 11:43

When you run a largely dynamic website there's a fair amount of effort which needs to be expended for each visitor, for example pulling headlines and article bodies out of a database. Given that most users will not log in and customize their screens, much of the time you're serving new visitors you're actually serving identical content.

With this in mind a website such as this one can gain a lot of speed by using caching, where either the results of the database queries or of the page generation are saved and served to visitors in preference to rebuilding them dynamically.

The common way to achieve caching like this is to set up a caching proxy server "in front" of the webserver, so that it will cache pages passing through it.

Squid, the popular caching proxy server, can be set up to do this, and the process of implementing a reverse proxy in Squid is simple enough to manage.

As this only caches the complete pages served to clients, the speedup, whilst useful, cannot compare to caching database queries.

If you wish to cache large segments of your page then you'll need to modify your code to do so. Thankfully, rather than designing a system yourself, you can use existing code written for this job.

Initially used by LiveJournal, and later by other large database-intensive sites such as Slashdot (/.), memcached is a high-performance memory caching system.

Described simply, there are two parts to the memcached software:

  • A server which will store objects directly in memory.
  • A client which can either pull objects from the server's memory, or update them.

The key to the system is having "keys". A key is used to store objects, and retrieve them from the memory of the server (which doesn't need to be on a different machine but can be if you wish).
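As a rough illustration of how keys might be put together, here is a minimal sketch. The `make_key` helper is a hypothetical name, not part of memcached or any client library; it simply joins a prefix and an id, and rejects keys that memcached would refuse (keys are limited to 250 characters and may not contain whitespace or control characters).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: join a prefix and an id into one cache key,
# refusing anything memcached would not accept as a key.
sub make_key {
    my ( $prefix, $id ) = @_;

    my $key = "${prefix}_${id}";

    # memcached limits keys to 250 characters ...
    die "key too long\n" if length($key) > 250;

    # ... and forbids whitespace and control characters in them.
    die "key contains whitespace or control characters\n"
      if $key =~ /[\s[:cntrl:]]/;

    return $key;
}

print make_key( "about", 42 ), "\n";
```

Using a fixed prefix per kind of object keeps different kinds of data from colliding when they share the same numeric id.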

Currently there are clients available for several languages and environments including Perl, PHP, Java, and C.

On a Debian Unstable server you can install the server and the Perl API by running:

apt-get install libcache-memcached-perl memcached

The server can then be started with the command:

/etc/init.d/memcached start

Now that the server is started we can look at how to use it. As already described, we will be adding values to the cache, and retrieving them from it, by the use of keys. The idea is that when retrieving something for use in your page you will:

  • Try to retrieve it from the memory cache (fast).
    • If that fails then retrieve it from the database (slower).
    • Store it in the cache so that next time you don't touch the database.
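
The steps above can be sketched as one small helper. This is only an illustration: `cached_fetch` and the `TinyCache` class are hypothetical names, not part of Cache::Memcached. The stand-in offers just `get` and `set` so that the sketch runs without a server; a real Cache::Memcached object, which provides methods of the same names, could be passed in its place.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical in-memory stand-in offering get/set, so that the
# sketch runs without a memcached server.
package TinyCache;

sub new {
    my ($class) = @_;
    return bless { store => {} }, $class;
}

sub get {
    my ( $self, $key ) = @_;
    return $self->{store}{$key};
}

sub set {
    my ( $self, $key, $value ) = @_;
    $self->{store}{$key} = $value;
    return 1;
}

package main;

# Try the cache first; on a miss call $fetch (the slow path, for
# example a database query) and store what it returns for next time.
sub cached_fetch {
    my ( $cache, $key, $fetch ) = @_;

    my $value = $cache->get($key);
    return $value if defined $value;

    $value = $fetch->();
    $cache->set( $key, $value ) if defined $value;
    return $value;
}

my $cache      = TinyCache->new;
my $slow_calls = 0;
my $fetch      = sub { $slow_calls++; return "headline text"; };

my $first  = cached_fetch( $cache, "headline_1", $fetch );    # miss
my $second = cached_fetch( $cache, "headline_1", $fetch );    # hit

print "slow path ran $slow_calls time(s)\n";
```

The slow path is passed in as a code reference so the same helper can wrap any lookup, whatever the source of the data.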

Using the Perl API you can store and retrieve something very simply, as the following code should show:

#!/usr/bin/perl
use strict;
use warnings;
use Cache::Memcached;

# Connect to the memcached server running on this machine.
my $memd = Cache::Memcached->new( {
    'servers' => [ "localhost:11211" ],
  } );

# Get a value - will fail the first time as it has not been set.
my $val = $memd->get( "my_key" );

# Test with defined so that false values such as "0" still count.
if ( defined $val )
{
   print "Value is '$val'\n";
}

# Set a value
$memd->set("my_key", "Some value");

$memd->disconnect_all();

Running this sample the first time will set a value in the memory cache for "my_key", and the second time will show this to you:

skx@mystery:~$ perl test-memcache.pl
skx@mystery:~$ perl test-memcache.pl
Value is 'Some value'

(If you wish to do this on a Stable machine, the source code for the memory cache daemon is available for download here. Build it on an Unstable machine with libevent1 and libevent1-dev and link it statically.)

Now that we've tested that the installation works, we need to look at how it can be used in a typical application.

If we are using a Perl CGI script which interfaces with a database using DBI we might have something like this:

# Get an 'about' page from the database
sub get_about {
    my ( $db, $id ) = @_;

    # fetch the required data
    my $sql = $db->prepare ( "select bodytext from about_pages where id = ?;" );
    $sql->execute ( $id );
    my $article = $sql->fetchrow_array();
    $sql->finish();

    # return as a hash reference
    return ( {about_body  => $article} );
}

This fetches a single page of HTML data from a database using a single select. If this is called often it might be worth caching in memory. We can do this by storing the value under a key made up of "about_" followed by the page's id, and reworking the code to be:

# Get an 'about' page, trying the memory cache before the database
sub get_about {
    my ( $memd, $db, $id ) = @_;

    # Get it from the memory cache first and return it if present,
    # wrapped in the same hash reference the database path returns.
    my $cached = $memd->get( "about_$id" );
    if ( defined $cached )
    {
       return ( {about_body  => $cached} );
    }

    # fetch the required data from the database
    my $sql = $db->prepare ( "select bodytext from about_pages where id = ?;" );
    $sql->execute ( $id );
    my $article = $sql->fetchrow_array();
    $sql->finish();

    # Add it to the memory cache, unless no such page was found.
    $memd->set( "about_$id", $article ) if defined $article;

    # return as a hash reference
    return ( {about_body  => $article} );
}
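
The reworked function never refreshes an entry once it is cached, so a page edited in the database would keep being served in its old form. Below is a minimal sketch of one way to handle that, assuming a hypothetical `update_about` routine that removes the key after saving. The `TinyCache` class is an in-memory stand-in and the `%about_pages` hash stands in for the about_pages table, so the example runs without a server or database. Cache::Memcached provides a `delete` method for the same purpose, and its `set` accepts an optional expiry time in seconds as a third argument if you would rather let entries age out.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for the cache client.
package TinyCache;

sub new {
    my ($class) = @_;
    return bless { store => {} }, $class;
}

sub get {
    my ( $self, $key ) = @_;
    return $self->{store}{$key};
}

sub set {
    my ( $self, $key, $value ) = @_;
    $self->{store}{$key} = $value;
    return 1;
}

sub delete {
    my ( $self, $key ) = @_;
    delete $self->{store}{$key};
    return 1;
}

package main;

# Stand-in for the about_pages table.
my %about_pages = ( 1 => "old text" );

# Same shape as the article's function, minus the DBI calls.
sub get_about {
    my ( $cache, $id ) = @_;

    my $cached = $cache->get("about_$id");
    return { about_body => $cached } if defined $cached;

    my $article = $about_pages{$id};
    $cache->set( "about_$id", $article ) if defined $article;
    return { about_body => $article };
}

# Save the new text, then drop the cached copy so that the next
# read goes back to the "database" and re-caches the fresh value.
sub update_about {
    my ( $cache, $id, $text ) = @_;

    $about_pages{$id} = $text;
    $cache->delete("about_$id");
}

my $cache = TinyCache->new;

my $before = get_about( $cache, 1 )->{about_body};    # caches the page
update_about( $cache, 1, "new text" );
my $after = get_about( $cache, 1 )->{about_body};     # refetched

print "$before -> $after\n";
```

Deleting on update keeps the read path unchanged; an expiry time is simpler still, at the cost of serving an old copy until the entry times out.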

In summary, updating your code should be simple enough if the fetching of data from your database is already well organised. If not, you might find the job harder.

Having the extra lookups to see if something is cached should have a minimal impact upon the speed of your site, but the value of having the results cached should more than make up for it.

 

 


Posted by ptecza (83.31.xx.xx) on Sat 5 Feb 2005 at 16:57
Hi Steve!

The link to /. site is improper, because it points at LiveJournal.

My best regards!

Pawel


Posted by ptecza (83.31.xx.xx) on Sat 5 Feb 2005 at 17:02
Hello again! :)

You also should correct a small bug on Add Comment page:

s/thankyou/thank you/

Cheers!

P.


Posted by Steve (82.41.xx.xx) on Sun 6 Feb 2005 at 12:47

I've also updated that text now - many thanks!

Steve
-- Steve.org.uk


Posted by Steve (82.41.xx.xx) on Sun 6 Feb 2005 at 12:47

Thanks for that - I've corrected the link now.

Steve
-- Steve.org.uk
