Producing and using website statistics

Posted by Steve on Mon 31 Jan 2005 at 00:28

It's very useful to be able to view the statistics of websites, to see how visitors are finding your sites, which pages are the most popular, etc. Debian contains several packages for presenting this information to you, and here we'll look at two of them.

When it comes to viewing statistics of your website there are a few things that you have to bear in mind:

  • Looking at the Apache logfiles doesn't tell you the complete number of visitors to your website - because users may be sharing a proxy, or cache.
  • You can't tell how many visitors you've actually had because of caches, proxies, and badly behaving browsers.
  • You can't tell how people move around your website due to caching at the client side.
  • You may never know how your users arrived at your site because Referrer information may be missing, or incorrect.
  • You can't tell how long users read your pages for, nor can you tell how they left your site or where they went next.

With those caveats out of the way the package you'll choose to display your statistics will probably depend on two things:

  • How easy the setup and maintenance are.
  • Whether the information presented is that which you care about.

A crude approximation of the total number of visits to your website can be obtained by merely counting all the lines inside your Apache access log with the following command:

wc -l /var/log/apache/access.log

However this is a very blunt measure: a single visit to your front page might result in multiple requests, one for the page itself plus more to load a CSS file and a group of graphics.
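As a rough improvement you could filter out requests for stylesheets and images before counting. A minimal sketch, run here against a tiny stand-in sample.log rather than the real /var/log/apache/access.log (the filenames in it are invented for illustration):

```shell
# One "visit" to the front page pulls in a stylesheet and an image,
# so the raw line count overstates the number of visits.
cat > sample.log <<'EOF'
1.2.3.4 - - [31/Jan/2005:00:28:00 +0000] "GET / HTTP/1.1" 200 512
1.2.3.4 - - [31/Jan/2005:00:28:01 +0000] "GET /style.css HTTP/1.1" 200 128
1.2.3.4 - - [31/Jan/2005:00:28:01 +0000] "GET /logo.png HTTP/1.1" 200 2048
5.6.7.8 - - [31/Jan/2005:09:10:00 +0000] "GET /about.html HTTP/1.1" 200 1024
EOF

wc -l < sample.log                                            # 4 raw requests
grep -Ev '\.(css|js|png|gif|jpe?g|ico) ' sample.log | wc -l   # 2 page requests
```

The same grep can sit in front of any of the counting pipelines below; extend the extension list to match whatever assets your site serves.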

Similar simple statistics can be achieved from the command line, such as showing the number of unique visitors to your site:

awk '{print $1}' /var/log/apache/access.log | sort -u | wc -l

(This extracts the first field of each line in the logfile, which is the hostname or IP address of the visitor, sorts those entries removing duplicates, and then counts them.)

However this doesn't take "visits per day" or "visits per month" into account. In short, if you wish to view interesting statistics like these you'll need to create a lot of different scripts.
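For instance, a per-day request count can be pulled straight out of the timestamp field. A quick sketch, again against a stand-in sample.log in place of the real /var/log/apache/access.log:

```shell
# Field 4 of a combined-format log line is "[31/Jan/2005:00:28:00";
# characters 2-12 of it are the date.
cat > sample.log <<'EOF'
1.2.3.4 - - [31/Jan/2005:00:28:00 +0000] "GET / HTTP/1.1" 200 512
5.6.7.8 - - [31/Jan/2005:09:10:00 +0000] "GET /about.html HTTP/1.1" 200 1024
5.6.7.8 - - [01/Feb/2005:12:00:00 +0000] "GET / HTTP/1.1" 200 512
EOF

# Print one line per day with its request count.
awk '{print substr($4, 2, 11)}' sample.log | sort | uniq -c
```

Swap `uniq -c` for `sort | uniq -c | sort -rn` to rank the busiest days first.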

Alternatively you could install a real statistics viewer which has already been created, such as awstats or analog.

Both of these tools work in essentially the same way: they read in the logfile which Apache has produced, process the entries internally, and then produce a collection of HTML pages with the statistics inside them.

The Debian packages will be installed to work with the default Apache configuration which Debian uses, which has the logfile located in /var/log/apache/access.log. If you've moved this for your sites then you'll need to make changes.

Awstats

Awstats is a versatile logfile processor which is written in Perl.

You can see a sample of the output which it produces by looking at the online Awstats sample page - this shows you the unique visitors per month, top search requests which users used to find your site, and other information.

The awstats package is configured via the files in the /etc/awstats/ directory. There is a global configuration file, and a local one which may be modified to make changes.

The most obvious changes to make are the following settings:

LogFile="/var/log/apache/access.log"

# Enter the log file type you want to analyze.
# Possible values:
#  W - For a web log file
#  S - For a streaming log file
#  M - For a mail log file
#  F - For a ftp log file
# Example: W
# Default: W
#
LogType=W

# Examples for Apache combined logs (following two examples are equivalent):
# LogFormat = 1
# LogFormat = "%host %other %logname %time1 %methodurl %code %bytesd %refererquot %uaquot"
#
LogFormat=4


SiteDomain=""

By default when awstats runs it merely produces a datafile in /var/lib/awstats, stamped with the date of the run. It doesn't produce static output files unless you update the configuration.

To view the statistics you must invoke a CGI script, which takes the condensed data and produces output you can inspect in your browser.

To do that you must visit the following URL in your browser:

http://www.example.com/cgi-bin/awstats.pl

If you wish to have static HTML pages created instead, run the following command:

/usr/share/doc/awstats/examples/awstats_buildstaticpages.pl -update \
 -config=/etc/awstats/awstats.conf \
 -dir=/var/www/stats/ \
 -awstatsprog=/usr/lib/cgi-bin/awstats.pl

This will use the configuration file /etc/awstats/awstats.conf to build some static pages, which it will place in /var/www/stats/.

As you can see this is quite a mouthful! However it's a simple thing to add to a script to run once a day.

If you do this then you should disable the default updating of the statistics, which happens every ten minutes, by removing the file /etc/cron.d/awstats - if you are only building static pages once a day it is a waste of time updating the statistics for online viewing more often.

Handling multiple sites involves making a copy of the configuration file /etc/awstats/awstats.conf under a new name, /etc/awstats/awstats.name.conf.

Once this is done you can then update the statistics for a single host by specifying on the command line:

-config=name

This will update the statistics for the named configuration file.

You can also examine the simple script /usr/share/doc/awstats/awstats-update, which attempts to update all configuration files; modifying this to build static pages for each host is a simple enough matter.
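One possible shape for such a modification - a sketch only, assuming per-site configuration files named /etc/awstats/awstats.name.conf as above, and an illustrative output directory of /var/www/stats/name/ per host:

```shell
#!/bin/sh
# For every per-site awstats config, update its data and rebuild its
# static pages.  The /var/www/stats/<name>/ layout is an assumption,
# not a Debian default - adjust to taste.
for conf in /etc/awstats/awstats.*.conf; do
    name=$(basename "$conf" .conf)   # e.g. "awstats.site1"
    name=${name#awstats.}            # strip the prefix -> "site1"
    mkdir -p "/var/www/stats/$name"
    /usr/share/doc/awstats/examples/awstats_buildstaticpages.pl -update \
        -config="$name" \
        -dir="/var/www/stats/$name/" \
        -awstatsprog=/usr/lib/cgi-bin/awstats.pl
done
```

Dropped into /etc/cron.daily/ this replaces the per-host invocations with a single loop; it is a cron fragment which depends on the installed awstats package, so test it by hand first.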

Webalizer

Webalizer is a flexible statistics generator written in C, which helps make it nice and fast.

Installing the Debian package is as simple as running:

apt-get install webalizer

This will lead you through some basic questions by using debconf to prompt for answers.

By default the package will install a daily cron job which processes the logfiles once a day. It always runs after the default Apache logfile rotation, which means that instead of examining the logfile /var/log/apache/access.log it uses the previous one, /var/log/apache/access.log.1.

To configure the software you must look at the global file /etc/webalizer.conf.

There are at least two options you will need to adjust:

# LogFile defines the web server log file to use.  If not specified
# here or on the command line, input will default to STDIN.
LogFile         /var/log/apache/access.log.1

# OutputDir is where you want to put the output files.  This should
# should be a full path name, however relative ones might work as well.
# If no output directory is specified, the current directory will be used.
OutputDir       /var/www/webalizer

The rest of the options you can adjust as you wish.

This works well for single sites, but if you have a group of websites all on the same machine you might need to make some changes.

The way that I handle multiple websites on one host is to place all the files beneath a common directory /home/www, such as:

/home/www/
|-- www.site1.com
|   |-- htdocs
|   |   `-- stats
|   `-- logs
`-- www.site2.com
    |-- htdocs
    |   `-- stats
    `-- logs

Here we have two sites www.site1.com, and www.site2.com, each has its own logs/ subdirectory where Apache places the logfiles.

To handle this, simply copy the default webalizer.conf file from /etc into each of the log directories:

cp /etc/webalizer.conf /home/www/www.site1.com/logs
cp /etc/webalizer.conf /home/www/www.site2.com/logs

Now edit each copy of the configuration file so that it contains:

LogFile   access.log
OutputDir ../stats/

You can update the stats by running:

cd /home/www/www.site1.com/logs
webalizer -q
cd /home/www/www.site2.com/logs
webalizer -q

(The -q flag merely makes the program run quietly).

These two commands can be placed inside a shell script and invoked automatically by a cron job belonging to a user who can write to the stats directories - and you can remove the default job by running:

rm /etc/cron.daily/webalizer
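Such a replacement script might look like the following sketch, assuming the /home/www/<site>/logs layout shown above (the script name and install location are illustrative):

```shell
#!/bin/sh
# Update the webalizer statistics for every site laid out as
# /home/www/<site>/logs, each directory holding its own webalizer.conf.
for dir in /home/www/*/logs; do
    [ -f "$dir/webalizer.conf" ] || continue
    ( cd "$dir" && webalizer -q )   # subshell so the cwd change doesn't leak
done
```

Installed as, say, /etc/cron.daily/local-webstats it scales to any number of sites without editing; since it depends on the installed webalizer package and your directory layout, run it by hand once before trusting it to cron.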

The default output of the webalizer script can be seen in the sample reports available on the webalizer site, which contain information about the number of unique visitors per month, the most popular directories, and the most popular files.

Each aspect of the report can be customized by following instructions in the configuration file.

Posted by Anonymous (194.2.xx.xx) on Tue 8 Feb 2005 at 16:27
Hi, i'm using Debian stable on servers and testing on workstations.
On this servers, I run awstats successfully, except for automatic update.
I would like to use the logrotate script to automatically update my sites as described in the FAQ (http://awstats.sourceforge.net/docs/awstats_faq.html#ROTATE), so as not to lose data during the apache logrotate execution.
But it doesn't work on Debian, on either stable or testing.

Do you know this problem or a way to solve it ?
Or, maybe you can tell me whether your method loses data.

thanks.



Posted by Steve (82.41.xx.xx) on Tue 8 Feb 2005 at 16:33

"Doesn't work" is a little vague, but perhaps the error can be solved by following the instructions in /usr/share/doc/awstats/README.Debian - especially the notes on file permissions.

Steve
-- Steve.org.uk


Posted by Anonymous (68.38.xx.xx) on Tue 15 Feb 2005 at 08:28
Actually it works just fine. I'm not using the awstats deb package but download the source.. this shouldn't matter though.

In /etc/logrotate.d/ I create a file named after the domain so for example /etc/logrotate.d/domainname which handles the logrotation for that domain. Simply make use of the prerotate and endscript features. Here is a working example:
/home/user/www/logs/access.log {
  daily
  missingok
  rotate 120
  compress
  delaycompress
  notifempty
  create 644 root root
  sharedscripts
  prerotate
    /home/user/www/AWStats/cgi-bin/awstats.pl -update -config=/home/user/www/AWStats/cgi-bin/awstats.conf >/dev/null 2>&1
  endscript
  postrotate
    /etc/init.d/apache reload #>/dev/null 2>&1
  endscript
}
Comment or uncomment depending on what you wish to be notified about. Initial tests comment out >/dev/null 2>&1 so you receive any errors. After that though uncomment it so you don't get an unneeded email. However I do like to see that the apache process was successfully reloaded without errors so I leave it commented.

Hope this helps! - Rob


Posted by Anonymous (216.220.xx.xx) on Wed 9 Mar 2005 at 18:35
I just wanted to add a note for noobies like me who are looking, as I was when I originally discovered this post, for information on choosing between the awstats and webalizer packages. Obviously it is best to try both and decide for yourself, but here are my impressions of each.

Awstats appears to be more feature rich than webalizer, as it may be used to produce reports on a larger array of logs (httpd, mail, ftp...), and there are a number of useful plugins developed for it. Also, awstats' html reports are much easier on the eyes than those generated by webalizer. On the other hand, awstats takes a bit more effort to configure and may require you to change (weaken?) the permissions of your apache logs. Of the two, I felt awstats had more inherent security risks than webalizer; and, in fact, a rather infamous flaw was discovered recently in awstats that led to the compromise of some well known sites.

Webalizer is quite easy to set up, as the debian developers have put a lot of effort into the package and debconf walks you through the installation nicely. Webalizer's reports are perfectly useful and adequate.


Posted by Anonymous (217.113.xx.xx) on Sat 14 May 2005 at 12:45
Yep, awstats has had quite a nasty bug, but on the other hand you don't need to let everyone watch your statistics! Just put something like the snippet below in your apache2.conf:
        <Directory "/usr/lib/cgi-bin">
                AllowOverride None
                Options ExecCGI -MultiViews +SymLinksIfOwnerMatch
                Order allow,deny
                Allow from all

                AuthType basic
                AuthName "cgi-bin restricted"
                AuthUserFile /etc/awstats/awstats.htpasswd

                <Files "awstats.pl">
                       Require valid-user
                </Files>
        </Directory>
then create password file with:
        htpasswd -c /etc/awstats/awstats.htpasswd yourlogin
(just don't forget not to use -c when you will be adding next user...)
And from now on only authorized persons can watch your statistics.

Except that - if you are security paranoid, you can additionally secure the password by allowing access to the stats over https only. Even if you have multiple vhosts and only one IP - by default awstats shows statistics for the domain in which it was run, but you can also use:
https://www.httpsdomain.org/cgi-bin/awstats.pl?config=www.httponly.org to access other domains.
So if you make a redirect in each vhost section of your apache2 http config similar to:
        Redirect /cgi-bin/awstats.pl https://www.httpsdomain.org/cgi-bin/awstats.pl?config=www.httponly.org
you have a simpler path to write, and access to the unencrypted awstats page is blocked.

(Sorry for posting the same two times, but the forum has a small bug that removed some lines previously. This time everything should be ok, I hope.)


Posted by Anonymous (217.64.xx.xx) on Thu 29 Sep 2005 at 10:30
awk '{print $1}'
(This extracts the first part each line in the logfile, which is the hostname or IP address of the visitor, sorts these entries removing duplicates and then counts them)

Does it?! Which version of awk are you using? ;-) ITYM something along the lines of
awk '{print $1}' | sort -u | wc -l
or awk '{print $1}' | sort | uniq -c | sort -rn -k1
for something more report-like to look at.


Posted by Steve (82.41.xx.xx) on Thu 29 Sep 2005 at 11:27

Of course you are correct. I've updated the text.

I must have made a bad edit at the time ...

Steve
--


Posted by Anonymous (81.139.xx.xx) on Sat 1 Oct 2005 at 15:33
Hi Steve,
Thanks, yet another easy to understand article. As a newbie I'm not that sure how to invoke the command 'webalizer -q' using cron and a shell script. Could you amplify by any chance?

I have attempted to invoke the command with a crontab line to no avail.

In theory there would be many logs files to update depending on the number of domains on the server.

So is it possible to setup one script and cron job that updates all stat files??
Rgs Pete


Posted by Steve (82.41.xx.xx) on Sat 1 Oct 2005 at 15:37

Yes using one script is the best way. Say /usr/local/bin/update-stats has the following contents:

cd /home/www/www.steve.org.uk/logs
/usr/bin/webalizer -q 

cd /home/www/www.debian-administration.org/logs
/usr/bin/webalizer -q 

Then to run this script once a day use this in your crontab file:

0  0  *  *  *  /usr/local/bin/update-stats

(You can see a simple introduction to crontab here.)

Alternatively you could use logrotate to run the script, as described in this article. To do that modify /etc/logrotate.d/apache (or apache2) to have:

/home/www/*/logs/*.log {
        daily
        missingok
        rotate 5
        compress
        delaycompress
        notifempty
        create 644 root root
        sharedscripts
        prerotate
                /usr/local/bin/update-stats
        endscript
        postrotate
                /etc/init.d/apache2 restart
        endscript
}

(Of course I'm assuming you store your websites beneath /home/www/foo.com {htdocs logs cgi-bin} - that might not be how you do things ...)

Steve
--


Posted by Anonymous (81.139.xx.xx) on Sat 1 Oct 2005 at 19:11
Steve, thanks for the guidance,

So if I setup the file ‘update-stats’ in the following dir

/usr/local/bin/update-stats

And ‘update-stats’ contains the following lines

cd /home/compass/posh-promdresses.co.uk/logs
/usr/bin/webalizer -q

and append this to the crontab file

0 0 * * * /usr/local/bin/update-stats

The stats update magic should work right??

Need I adjust the webalizer.conf file which currently reads

# LogFile /var/log/apache/access.log.0
LogFile /var/log/apache/access.log.1

# OutputDir is where you want to put the output files. This should
# should be a full path name, however relative ones might work as well.
# If no output directory is specified, the current directory will be used.

OutputDir /var/www/webalizer

If I enter /usr/local/bin/update-stats and call update-stats I get a permission denied error :-(

Rgs Pete


Posted by Steve (82.41.xx.xx) on Sat 1 Oct 2005 at 19:14

Check the permissions of the output directory, and of the logfile.

Perhaps your user doesn't have read/write permission.

(Anything beneath /var/www is going to be unwritable to non-root users. Unless you make changes...)

Steve
--


Posted by suspended user gg234 (195.14.xx.xx) on Tue 11 Oct 2005 at 13:14
I have installed webalizer and the stats are working fine. Now I need to configure stats for all 10 websites, so I am following your procedure. I have copied the webalizer.conf file and changed only the output directory to
../stats; the logfile location is LogFile /var/log/apache2/access.log.1.

i am getting the following error

./logs: line 36: LogFile: command not found
./logs: line 42: OutputDir: command not found
./logs: line 65: Incremental: command not found
./logs: line 81: ReportTitle: command not found
./logs: line 92: HostName: command not found
./logs: line 244: HideSite: command not found
./logs: line 247: HideReferrer: command not found
./logs: line 250: HideReferrer: command not found
./logs: line 253: HideURL: command not found
./logs: line 254: HideURL: command not found
./logs: line 255: HideURL: command not found
./logs: line 256: HideURL: command not found
./logs: line 257: HideURL: command not found
./logs: line 263: GroupURL: command not found
./logs: line 303: IgnoreSite: command not found
./logs: line 304: IgnoreReferrer: command not found
./logs: line 325: MangleAgents: command not found


Thanks for your help


Posted by Steve (82.41.xx.xx) on Tue 11 Oct 2005 at 13:16

Looks like you're trying to execute the configuration file - that looks like a bash error.

Show the commands you're running as well as the result and it might be more clear what is going on...

Steve
--


Posted by suspended user gg234 (195.14.xx.xx) on Tue 11 Oct 2005 at 13:45
Sorry, you're exactly correct - and now I have tried to run webalizer -q, but the results are going to the default output directory /var/www/webalizer/.


But i have copied(as you suggested file name logs) a separate conf file under apache2-default folder under this i have webalizer conf file(output dir change to ../stats) and i have created stats folder under apache2-default .

When i run webalizer -q output for this apache2-default folder is not copying to apache2-default/stats folder and it is empty.

hope this clears the doubt

thanks for your help


Posted by suspended user gg234 (195.14.xx.xx) on Tue 11 Oct 2005 at 16:18
hi Steve,

Can you help me understand why I am not getting output in the particular folder?

Thanks


Posted by Anonymous (71.255.xx.xx) on Sat 31 Dec 2005 at 05:50
You do NOT need to create your own custom script for multiple webalizer sites. Debian (sarge) comes with a script pre-written for this. It's in fact a lot easier to setup.

Go to /usr/share/doc/webalizer/; there you'll find "cron-multiple-config". Edit the file to remove the header comments. Now `cp /usr/share/doc/webalizer/cron-multiple-config /etc/cron.daily/webalizer` (back up /etc/cron.daily/webalizer if you ever decide to go back to just doing a single webalizer site). Now create /etc/webalizer/ and move /etc/webalizer.conf into it. Then create multiple webalizer.conf files in that directory, one per site, and edit the fields respectively.


Posted by Antras (85.206.xx.xx) on Sun 30 Apr 2006 at 10:14
Can you advise any analysis tool which saves its results not to HTML files but to a database (e.g. MySQL)? Webalizer & AWStats can't work with a database :(
I've found only sawmill, but it's shareware.


Posted by Steve (82.41.xx.xx) on Sun 30 Apr 2006 at 13:05

I'm unaware of any such thing.

Perhaps searching freshmeat will turn something up?

Steve


Posted by mpapet (208.179.xx.xx) on Fri 29 Dec 2006 at 17:48
I checked the cron.daily script in Etch and found that the script runs all *.conf files found in /etc/webalizer.

I copied webalizer.conf over under a new name and changed paths for each of my web sites. Easy as pie.

Very informative article. Thanks!


Posted by Anonymous (121.246.xx.xx) on Fri 21 Aug 2009 at 14:12
Can anyone help me? By mistake I have removed cgi-bin and webalizer from the website, and my website is not working. Any suggestion or solution for it?

