Analyzing the extensive activity log produced by DansGuardian is often very useful. Choosing the “best” log analyzer depends on your purpose. You may want to
- produce summaries of your network's activity, or
- summarize the activity of DansGuardian/Squid, or
- zero in on the activities of a particular individual at a particular time.
Tools that display individual log entries might be more properly called “log viewers”. But they're often lumped in with true log analyzers, and sometimes the same tool does both. Sometimes a tool that produces good usage reports from the logs serves the other purposes too by acting as an interactive analysis tool.
(Events that require taking some action right away rather than waiting until the end of the day –such as denying future web access to a user who's tried to access too many blocked webpages– may be more appropriately handled when the blocked webpage message is delivered rather than by analyzing log files.)
Frequently used are Webmin (which may require a version marked “alpha” or “beta” for use with recent DansGuardian versions) and SARG (which actually analyzes the Squid stub logs and doesn't generate any DansGuardian-specific statistics). (Webmin originally encapsulated the functionality of the standalone script dglog.pl, of which an earlier version is also available in self-contained form from the “Extras and Add-Ons” section of the DansGuardian website. The Webmin DansGuardian Module Log Analysis Tool has since been enhanced quite a bit, and has now superseded the original dglog …although a dglog2 may now exist.)
Using a generic web log analysis tool with DansGuardian will usually involve some sort of compromise. Fully analyzing DansGuardian information without compromise –for example showing filtered categories and failed circumvention attempts– requires using a DansGuardian-specific tool. If you wish to use such a tool, your only options seem to be i) the log analysis portion of the Webmin DansGuardian Module [which can be used without committing to the rest of the module], ii) a program called dglog2 [if you can obtain it], and iii) some commercial software.
More DansGuardian log analyzers and pieces –complete tools, programming frameworks, utilities, and so forth– are listed under the “Log file analysis:” heading at http://dansguardian.org/?page=extras.
The most significant enhancements to the Log Analysis Tool in the Webmin DansGuardian Module 0.7.0beta1b and later are:
- the ability to run without the Webmin interface being visible at all
- the ability to run on a different computer than DansGuardian (thus eliminating any possibility of interfering with production performance)
- the ability to run unattended to produce (regular periodic?) batch reports
- the ability to analyze logs written in any language supported by DansGuardian (not just 'ukenglish')
- optional display of detailed regular expression matching information
- optional display of and filtering on browsers' (and other tools') user agent string
To some extent it's possible to widen the choice of tools further by either a) analyzing the Squid stub logs instead or b) writing the DansGuardian log in a different format. DansGuardian has the option of writing its log in “Squid Log File Format”. (Note this is only the native Squid format, the one Squid produces by default. The DansGuardian option will not magically produce any other formats that might be produced by Squid; in particular, DansGuardian will not simulate the format produced by Squid if emulate_httpd_log on is set in squid.conf.) Willingness to pay a little money to move beyond Open Source Software opens further possibilities such as Sawmill (http://www.sawmill.net).
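As a rough illustration of what a tool reading “Squid Log File Format” has to handle, here is a minimal Python sketch that splits one line of Squid's native access log into named fields. The field order follows Squid's native format (time, elapsed, client, action/status, size, method, URL, ident, hierarchy, content type). The sample line, the SquidEntry type, and the parse_squid_line function are invented for this example and are not part of Squid or DansGuardian.

```python
from typing import NamedTuple, Optional


class SquidEntry(NamedTuple):
    """One line of a Squid native-format access log (illustrative type)."""
    timestamp: float    # seconds since the Unix epoch, with milliseconds
    elapsed_ms: int     # time spent handling the request
    client: str         # address the request came from
    result_code: str    # Squid result, e.g. TCP_MISS
    status: int         # HTTP status code
    size: int           # bytes delivered to the client
    method: str
    url: str
    ident: str          # user name, or "-" if unknown
    hierarchy: str      # e.g. DIRECT/192.0.2.10
    content_type: str


def parse_squid_line(line: str) -> Optional[SquidEntry]:
    """Return the parsed entry, or None if the line does not fit the format."""
    fields = line.split()
    if len(fields) < 10:
        return None
    result_code, _, status = fields[3].partition("/")
    try:
        return SquidEntry(
            float(fields[0]), int(fields[1]), fields[2],
            result_code, int(status), int(fields[4]),
            fields[5], fields[6], fields[7], fields[8], fields[9],
        )
    except ValueError:
        # A numeric field was not numeric: treat the line as unparseable.
        return None


# Hypothetical sample line, made up for this example.
sample = ("1286536309.450    921 127.0.0.1 TCP_MISS/200 507 GET "
          "http://www.example.com/ - DIRECT/192.0.2.10 text/html")
entry = parse_squid_line(sample)
print(entry.client, entry.status, entry.url)
```

Note that nothing in this format has room for DansGuardian-specific details such as the filtering category or the reason a page was denied.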
It's usually better to have tools analyze the Squid stub logs directly rather than the DansGuardian logs in Squid format, as forcing DansGuardian to write its logs in Squid format loses quite a bit of information which might have been useful to some other analysis. (See below for a solution to the common problem with analyzing Squid logs of having the source IP always be 127.0.0.1 rather than the real originator.) Leave the DansGuardian logs in the DansGuardian native format so you have the option of consulting them for more detailed information after your analysis of the Squid stub logs identifies a potential issue.
A different alternative is the log file format produced by web servers, sometimes called either “httpd log format” or “common log file format”. Squid can produce this format by setting emulate_httpd_log; DansGuardian however cannot produce this format. As a result tools that analyze web server logs, such as “awstats” and “webalizer” (which would likely not produce very meaningful results even if they could be made to work), will not read any of the log formats that DansGuardian can produce (they might be able to read the limited information from the Squid stub logs).
It's usually possible in theory to have a script “convert” logs from the format they were written in to a format that can be read by your analysis tool. But doing so may not be very helpful; these log analysis tools will usually be so far removed from what DansGuardian actually does that their output reports are not very useful in a DansGuardian/Squid environment.
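To make the idea of such a conversion script concrete, here is a minimal Python sketch that rewrites one Squid native-format line as a “common log file format” line. It is only a sketch under stated assumptions: the squid_to_clf function name and the sample line are invented for this example, times are rendered in UTC, the Squid ident field is mapped to the authenticated-user column, and the protocol version is hard-coded to HTTP/1.0 because the native Squid format does not record it.

```python
from datetime import datetime, timezone


def squid_to_clf(line: str) -> str:
    """Convert one Squid native-format log line to common log format."""
    fields = line.split()
    if len(fields) < 10:
        raise ValueError("not a Squid native-format log line")

    # Squid writes the time as seconds since the Unix epoch.
    when = datetime.fromtimestamp(float(fields[0]), tz=timezone.utc)
    stamp = when.strftime("%d/%b/%Y:%H:%M:%S +0000")

    client, size = fields[2], fields[4]
    method, url, user = fields[5], fields[6], fields[7]
    # The fourth field looks like TCP_MISS/200; keep only the HTTP status.
    status = fields[3].partition("/")[2]

    # Assumption: "HTTP/1.0" is a placeholder, not taken from the log.
    return f'{client} - {user} [{stamp}] "{method} {url} HTTP/1.0" {status} {size}'


# Hypothetical sample line, made up for this example.
sample = ("86400.000      5 10.0.0.7 TCP_MISS/200 1234 GET "
          "http://www.example.com/ - DIRECT/192.0.2.10 text/html")
print(squid_to_clf(sample))
```

The conversion itself is simple; what it cannot do is add information the source log never held, which is why the resulting reports tend to say little about what DansGuardian did.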
In a DansGuardian/Squid system, it makes most sense to analyze the DansGuardian logs. In a combined system the Squid stub logs won't give you much insight into either the activity on your network or the behavior of the combined DansGuardian/Squid system. In particular, the Squid stub logs will not contain any information about web pages that were blocked. Analyzing this incomplete picture of network activity may produce misleading statistics.
If you nevertheless find it necessary to analyze the Squid stub logs, the first issue that will occupy your attention will probably be that everything in the Squid log appears to originate from the same address, 127.0.0.1 (“localhost” or “loopback”). This makes sense as in this environment all requests to Squid come from DansGuardian. You may desire to instead have the Squid logs point at the “real” originating IP rather than at DansGuardian.
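A quick way to check whether your Squid log shows this symptom is to count the distinct client addresses it contains. The Python sketch below assumes Squid's native log format, where the client address is the third whitespace-separated field. The client_counts function and the sample lines are invented for this example.

```python
from collections import Counter
from typing import Iterable


def client_counts(lines: Iterable[str]) -> Counter:
    """Count requests per client address in Squid native-format log lines."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) >= 3:        # skip blank or truncated lines
            counts[fields[2]] += 1  # third field is the client address
    return counts


# Hypothetical sample lines, made up for this example.
sample_log = [
    "1286536309.450 921 127.0.0.1 TCP_MISS/200 507 GET http://www.example.com/ - DIRECT/192.0.2.10 text/html",
    "1286536310.120 310 127.0.0.1 TCP_HIT/200 2048 GET http://www.example.org/a.png - NONE/- image/png",
    "",
]
counts = client_counts(sample_log)
only_loopback = set(counts) == {"127.0.0.1"}
print(counts, only_loopback)
```

To run it against a real log, pass an open file object instead of the sample list; the path to your Squid access log depends on your installation.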
To do this, you'll need to both 1) have DansGuardian forward the information to Squid (which would otherwise not even have the information and so of course not be able to display it), and 2) have Squid include the information in its logs.
To make 1) happen, set forwardedfor = on in dansguardian.conf. This will cause DansGuardian to add an X-Forwarded-For: header containing the IP address of the real originator to every web request it passes to Squid.
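For readers who want to see what a consumer of this header does with it, here is a minimal Python sketch that pulls the originating address out of an X-Forwarded-For value. In its general form the header is a comma-separated list in which the leftmost entry is the original client; how DansGuardian treats a header that is already present on an incoming request is not covered here. The originating_client function and the sample values are invented for this example.

```python
import ipaddress
from typing import Optional


def originating_client(header_value: str) -> Optional[str]:
    """Return the leftmost address in an X-Forwarded-For value, or None."""
    first = header_value.split(",")[0].strip()
    try:
        # Validate that the entry really is an IPv4 or IPv6 address.
        return str(ipaddress.ip_address(first))
    except ValueError:
        return None


# Hypothetical header values, made up for this example.
print(originating_client("192.168.1.23"))
print(originating_client("192.168.1.23, 127.0.0.1"))
print(originating_client("unknown"))
```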
How to make 2) happen differs between Squid releases, and it will usually (but not always) happen by default.
- For Squid 2.5 and before, you must apply a source code patch and rebuild Squid. The source code patch is available on the DansGuardian website: click on “Extras and Add-Ons”, then under the “3rd Party plugins and patches for squid” heading fetch “Patch for squid that makes it log the X-Forwarded-For IP”.
- For Squid 2.6 and 2.7, set log_uses_indirect_client on (which in turn requires something like follow_x_forwarded_for allow localhost) in squid.conf. (This is the default Squid configuration, so it may work without explicit settings.)
- For Squid 3.0, set forwarded_for on in squid.conf. (This is the default Squid configuration, so it may work without explicit settings.)
Note that as a side effect of these settings, in many cases Squid will send the X-Forwarded-For: header on to the actual website, thus exposing some of your internal IP addresses and possibly allowing websites to disentangle individual users. (You can prevent this in Squid 3.1 and later by specifying forwarded_for delete.) You may or may not decide that having the “real” origin IP address in the Squid logs is so important that it overrides any possible security and privacy concerns.