
Original Article found at: http://www.keithjbrown.co.uk/vworks/unix/logs.php
Workshop Requirements
In order to make the most of this workshop you should have:
- Access to an Apache Server that you have permission to administer.
- A good understanding of Apache Configuration.
- The ability to install software on your system.
Introduction
Unlike many of the other workshops that are aimed at learning within a development environment, this workshop is really
only of use within a production environment where you are interested in logs, hits and web traffic in general.
That is not to say that you should not first use a development environment on which to practice, but you will only see
the benefit when properly deployed.
This Virtual Workshop will cover how to filter information that you don’t want out of your web server logs, how to
implement a log rotation strategy and how to automate the creation of log analysis reports using a variety of
different software.
As I run my production servers in a Linux environment, there may be a slight bias towards a Linux way of doing things
(and because everything seems much easier to do with Linux). Efforts have been made throughout to ensure that the
majority of methods work with Windows as well.
What Don’t We Want to Keep?
This may seem like a strange question, but it is necessary to think about this as Apache can and will record a hit for
every file that is requested unless you tell it differently. This means that for one view of a web page every file that
is used on that page (images, CSS, external JavaScript, etc.) will be recorded as a hit. Most log analysis software will
sort out the actual page views when producing the stats, but unless you have a specific desire to look at data to do
with images or stylesheets it is better not to record it at all. There are also other things you may not wish to record,
such as search engine robots trawling your site or worms checking for files that IIS comes with in an attempt to gain
access to the server (we can adopt an air of superiority due to using Apache ;-). Not recording all this
extra data keeps the log files significantly smaller, as can be seen by looking at two test logs which
record one week’s data.
Unfiltered:
-rw-r--r-- 1 keith keith 8967670 Mar 1 00:00 hits_log
Filtered:
-rw-r--r-- 1 keith keith 2562468 May 1 00:00 access_log
The unfiltered log is over three times as big as the filtered one. So, having decided that we don’t want
to record all hits on the server, we next need to set up the filters in the httpd.conf file. This is
done by creating a custom environment variable whose presence (or absence) acts as a filter when it is given as
the env= argument to the CustomLog directive. The variable name can be preceded by the not operator ‘!’ to
invert the test. For example, if we have created a filter called ‘mylogs’:
CustomLog logs/access.log combined env=mylogs
This would log only the hits that matched the filter. Or, to ensure that everything EXCEPT what matched the
filter was logged:
CustomLog logs/access.log combined env=!mylogs
Obviously before a filter can be applied it must first be defined.
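As a rough sketch of what such a definition might look like, the fragment below uses the SetEnvIf directive from
mod_setenvif to set the ‘mylogs’ variable for requests that match a pattern. The list of file extensions is only an
illustrative assumption, not the author’s actual filter, and the example assumes that the ‘combined’ log format is
already defined in httpd.conf, as it is in the stock configuration.

```apache
# Assumed example pattern: mark requests for images, stylesheets and scripts.
# SetEnvIf sets the variable 'mylogs' whenever the request URI matches the regex.
SetEnvIf Request_URI "\.(gif|jpe?g|png|css|js)$" mylogs

# Log every request EXCEPT those that were marked with 'mylogs'.
CustomLog logs/access.log combined env=!mylogs
```

With a definition along these lines, a request for /index.html would be written to logs/access.log, while a request
for /images/logo.png would be skipped.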