Posted by Steve on Mon 31 Jan 2005 at 00:28
It's very useful to be able to view the statistics of websites, to see how visitors are finding your sites, which pages are the most popular, etc. Debian contains several packages for presenting this information to you, and here we'll look at two of them.
When it comes to viewing statistics of your website there are a few things that you have to bear in mind:
With those caveats out of the way the package you'll choose to display your statistics will probably depend on two things:
In a very simple way, the total number of visits to your website can be estimated by counting the lines in your Apache access log with the following command:
wc -l /var/log/apache/access.log
However, this doesn't take very much into account. For example, a single visit to your website to view the front page might result in multiple requests, such as those needed to load a CSS file and a group of graphics.
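As a rough illustration of discounting those extra requests, the sketch below counts only requests whose path does not end in a common asset extension. It assumes the usual Apache common or combined log format, in which the requested path is the seventh whitespace-separated field. The function name count_page_requests and the list of extensions are illustrative choices, not something Apache or Debian provides.

```shell
# Count requests that look like page views rather than supporting assets.
# Assumes common/combined log format: field 7 is the requested path.
# The extension list is an illustrative guess; adjust it for your site.
count_page_requests() {
    awk 'tolower($7) !~ /\.(css|js|gif|png|jpe?g|ico)(\?.*)?$/ { n++ }
         END { print n + 0 }' "$1"
}

# Example: count_page_requests /var/log/apache/access.log
```

This is still only an approximation, since it counts failed requests and robots along with real page views.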
Similar simple statistics can be achieved from the command line, such as showing the number of unique visitors to your site:
awk '{print $1}' /var/log/apache/access.log | sort -u | wc -l
(This extracts the first part of each line in the logfile, which is the hostname or IP address of the visitor, sorts these entries while removing duplicates, and then counts them.)
However, this doesn't take into account "visits per day" or "visits per month". In short, if you wish to view interesting statistics like these you'll need to create a lot of different scripts.
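To give an idea of what one such script might look like, here is a minimal sketch that reports requests and unique visitors for each day. It assumes the common or combined log format, where the first field is the client address and the fourth field carries the timestamp; the function name visits_per_day is made up for this example.

```shell
# Print "day requests unique-visitors" for each day found in an access log.
# Assumes common/combined log format: field 1 is the client address and
# field 4 looks like "[31/Jan/2005:00:28:00".
visits_per_day() {
    awk '{
        split($4, t, ":")                      # t[1] is "[31/Jan/2005"
        day = substr(t[1], 2)                  # drop the leading "["
        if (!(day in hits)) order[++n] = day   # remember first-seen order
        hits[day]++
        if (!((day, $1) in seen)) { seen[day, $1] = 1; uniq[day]++ }
    }
    END {
        for (i = 1; i <= n; i++)
            printf "%s %d %d\n", order[i], hits[order[i]], uniq[order[i]]
    }' "$1"
}

# Example: visits_per_day /var/log/apache/access.log
```

Days are printed in the order they first appear in the log, which is chronological for a normal access log.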
Alternatively you could install a real statistics viewer which has already been created, such as awstats or analog.
Both of the tools covered here work in essentially the same way. They read in the logfile which Apache has produced, process the entries internally, and then produce a collection of HTML pages containing the statistics.
The Debian packages are set up to work with the default Apache configuration which Debian uses, which has the logfile located in /var/log/apache/access.log. If you've moved this for your sites then you'll need to make changes.
Awstats is a versatile logfile processor which is written in Perl.
You can see a sample of the output which it produces by looking at the online Awstats sample page - this shows you the unique visitors per month, top search requests which users used to find your site, and other information.
The awstats package is configured via the files in the /etc/awstats/ directory. There is a global configuration file, and a local one which may be modified to make changes.
The most obvious changes to make are the following settings:
LogFile="/var/log/apache/access.log"

# Enter the log file type you want to analyze.
# Possible values:
# W - For a web log file
# S - For a streaming log file
# M - For a mail log file
# F - For a ftp log file
# Example: W
# Default: W
#
LogType=W

# Examples for Apache combined logs (following two examples are equivalent):
# LogFormat = 1
# LogFormat = "%host %other %logname %time1 %methodurl %code %bytesd %refererquot %uaquot"
#
LogFormat=4

SiteDomain=""

By default, when awstats runs it merely produces a datafile in /var/lib/awstats, which is dated according to when it was run. It doesn't produce static output files unless you update the configuration.
To view the statistics you must invoke a CGI script, which takes the condensed statistics that awstats has stored and produces output you can inspect in your browser.
To do that you must visit the following URL in your browser:
http://www.example.com/cgi-bin/awstats.pl
If you wish to have static HTML pages created instead you must run the following command line:
/usr/share/doc/awstats/examples/awstats_buildstaticpages.pl -update \
    -config=/etc/awstats/awstats.conf \
    -dir=/var/www/stats/ \
    -awstatsprog=/usr/lib/cgi-bin/awstats.pl

This will use the configuration file "/etc/awstats/awstats.conf" to build some static pages, which it will place in "/var/www/stats".
As you can see this is quite a mouthful! However it's a simple thing to add to a script to run once a day.
If you do this then you should disable the default updating of the statistics, which happens every ten minutes, by removing the file /etc/cron.d/awstats. If you are building static pages only once a day it is a waste of time updating the statistics for online viewing more often.
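A daily wrapper for the static-page build might look something like the sketch below. The paths and options are the ones used in the command above; the function name build_static_pages and the DRY_RUN variable are hypothetical additions, included only so the command can be printed for inspection before it is run for real.

```shell
#!/bin/sh
# Sketch of a daily wrapper around awstats_buildstaticpages.pl.
# Paths are the ones used earlier in the article; adjust as needed.
BUILD=/usr/share/doc/awstats/examples/awstats_buildstaticpages.pl
AWSTATS=/usr/lib/cgi-bin/awstats.pl

# $1 = configuration file, $2 = output directory.
# If the (hypothetical) DRY_RUN variable is set and non-empty, the
# command is printed instead of being executed.
build_static_pages() {
    ${DRY_RUN:+echo} "$BUILD" -update \
        -config="$1" \
        -dir="$2" \
        -awstatsprog="$AWSTATS"
}

# Example: build_static_pages /etc/awstats/awstats.conf /var/www/stats/
```

Saved as an executable script with the example call uncommented, this could be placed in /etc/cron.daily, assuming that directory is processed by cron on your system.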
Handling multiple sites involves copying the configuration file /etc/awstats/awstats.conf to a new name of the form /etc/awstats/awstats.name.conf.
Once this is done you can then update the statistics for a single host by specifying on the command line:
-config=name
This will update the statistics for the named configuration file.
You can also examine the simple script /usr/share/doc/awstats/awstats-update, which attempts to update all configuration files. Modifying this to build static pages for each host is a simple enough matter.
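For illustration only, a loop over every per-site configuration could be sketched as follows. This is not the contents of the awstats-update script mentioned above, just one possible approach; the function names are invented for this example, and it assumes the configuration files follow the awstats.name.conf naming scheme described earlier.

```shell
# Path taken from the static-pages command earlier in the article.
AWSTATS="${AWSTATS:-/usr/lib/cgi-bin/awstats.pl}"

# Print the "name" part of every awstats.name.conf file in a directory.
list_awstats_configs() {
    dir="${1:-/etc/awstats}"
    for f in "$dir"/awstats.*.conf; do
        [ -e "$f" ] || continue          # the glob matched nothing
        name="${f##*/awstats.}"          # strip directory and "awstats."
        printf '%s\n' "${name%.conf}"    # strip the trailing ".conf"
    done
}

# Update the statistics for each named configuration in turn.
# Assumes configuration names contain no whitespace.
update_all_awstats() {
    for name in $(list_awstats_configs "$1"); do
        "$AWSTATS" -config="$name" -update
    done
}

# Example: update_all_awstats /etc/awstats
```

The plain awstats.conf file is deliberately skipped, since it has no name component between "awstats." and ".conf".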
Webalizer is a flexible webstats producer which is written in C, which helps make it nice and fast.
Installing the Debian package is as simple as running:
apt-get install webalizer
This will lead you through some basic questions by using debconf to prompt for answers.
By default the package installs a daily cron job which causes the system to process the logfiles once a day. It always runs after the default Apache logfile rotation, which means that instead of examining the logfile /var/log/apache/access.log it uses the previous one, /var/log/apache/access.log.1.
To configure the software you must look at the global file /etc/webalizer.conf.
There are at least two options you will need to adjust:
# LogFile defines the web server log file to use. If not specified
# here or on on the command line, input will default to STDIN.
LogFile /var/log/apache/access.log.1

# OutputDir is where you want to put the output files. This should
# should be a full path name, however relative ones might work as well.
# If no output directory is specified, the current directory will be used.
OutputDir /var/www/webalizer

The rest of the options you can adjust as you wish.
This works well for single sites, but if you have a group of websites all on the same machine you might need to make some changes.
The way that I handle multiple websites on one host is to place all the files beneath a common directory /home/www, such as:
/home/www/
|-- www.site1.com
|   |-- htdocs
|   |   `-- stats
|   `-- logs
`-- www.site2.com
    |-- htdocs
    |   `-- stats
    `-- logs

Here we have two sites, www.site1.com and www.site2.com. Each has its own logs/ subdirectory where Apache places the logfiles.
To handle this, simply copy the default webalizer.conf file from /etc into each of the log directories:
cp /etc/webalizer.conf /home/www/www.site1.com/logs
cp /etc/webalizer.conf /home/www/www.site2.com/logs

Now edit each copy of the configuration file so that it contains:
LogFile access.log
OutputDir ../htdocs/stats/

You can then update the stats by running:
cd /home/www/www.site1.com/logs
webalizer -q
cd /home/www/www.site2.com/logs
webalizer -q

(The -q flag merely makes the program run quietly.)
These two commands can be placed inside a shell script and invoked automatically by a cron job belonging to a user who can write to the stats directory. You can then remove the default job by running:
rm /etc/cron.daily/webalizer

The default output of webalizer can be seen in the sample reports which are available on the webalizer site. These contain information about the number of unique visitors per month, the most popular directories, and the most popular files.
Each aspect of the report can be customized by following instructions in the configuration file.
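Returning to the multi-site layout described above, the per-site update commands could be generalised along the following lines. This is only a sketch: it assumes every site lives under /home/www with its own logs/ directory containing a copy of webalizer.conf, and the function name update_all_sites and the WEBALIZER variable are hypothetical names chosen for this example.

```shell
# Command to run; overridable for testing. "webalizer" is the real binary.
WEBALIZER="${WEBALIZER:-webalizer}"

# Run webalizer quietly in every site's logs/ directory that has its
# own webalizer.conf. $1 is the common parent directory.
update_all_sites() {
    root="${1:-/home/www}"
    for logs in "$root"/*/logs; do
        [ -f "$logs/webalizer.conf" ] || continue   # skip unconfigured sites
        # Subshell so the change of directory does not leak out of the loop.
        ( cd "$logs" && "$WEBALIZER" -q )
    done
}

# Example: update_all_sites /home/www
```

Running each update in its own directory relies on webalizer picking up the webalizer.conf found there, as in the manual commands shown earlier.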
This article can be found online at the Debian Administration website at the following bookmarkable URL:
This article is copyright 2005 Steve - please ask for permission to republish or translate.