Note this document is for DansGuardian version 2.4.2 and is dated 2002. In 2008 the currently stable version is the 2.9/2.10 series, and current development is occurring in the 2.11 series.
DansGuardian 2.4.2 Detailed Installation Guide
Fri 29th March 2002 - Draft 3
Originally by Daniel Barron, modified by GB
This is not a HOWTO. This document is an attempt at detailed installation instructions for DansGuardian 2.4.2. If you just want to get DansGuardian up and running as quickly as possible, you might want to read the brief installation guide.
The advantages of DansGuardian 2.4.2 over 2.2.x are as follows:
- Improved gcc 3 compatibility.
- Better use of standard C++ libraries rather than C libraries.
- It uses an advanced Deterministic Finite Automata Graph algorithm. This means the searching on large phrase lists is significantly faster. For example, a 1500 phrase list will be at least 4 times faster.
- Almost all console errors are logged in SysLog for easier problem solving.
- Weighted phrases are supported. This is a method for giving phrases positive or negative values and calculating a total. If the page contains too much bad material it is blocked. The found weighted phrases can be optionally logged and reported.
- The exception matching optionally logs exception hits, making it easier to troubleshoot non-blocking of sites.
- The log format includes the size of the requested page or file and is optionally in CSV format.
- Blanket IP blocking logs the IP the user was trying to get to.
- Banned, weighted and exception phrases can all use combinations.
- There is a banned user list and banned IP list.
- Better Debian and RedHat 7.2 support has been added to the configure and Makefiles.
- Basic URL filtering for https.
- PICS filtering can be switched off globally.
- Exception URL support.
- Exception phrases are now in a different file for ease of maintenance.
- No cgi perl script or web server is required to display the Access Denied page as DansGuardian 2.4.2 has the ability to display a template HTML file itself.
The following is a list of the configuration differences between the two versions:
- Exception phrases are now in the exceptionphraselist file for ease of use (were in bannedphraselist).
- Additional configuration files are: bannediplist, banneduserlist, exceptionphraselist, exceptionurllist, weightedphraselist.
- Extra options in dansguardian.conf file: reportinglevel (included option for HTML template), htmltemplate (path), naughtyness limit (for weighted phrases), show weighted phrases found, weightedphraselist, log exception hits, log file format. (and possibly some more)
You must already be running a fairly recent distribution of Linux, FreeBSD, OpenBSD or Solaris. You also need to have squid running and configured on port 3128.
DansGuardian is a filtering pass-through that sits between the client browser and the Squid proxy. It listens on port 8080 and connects to squid on port 3128. So you must have no other daemon running already using port 8080.
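Before going further, you may want to confirm the port layout described above. The following Python sketch is not part of DansGuardian. The helper name port_in_use is made up for illustration, and it assumes squid and DansGuardian live on the local machine with the default ports 3128 and 8080.

```python
import socket


def port_in_use(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value on failure.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # Assumed defaults from this guide: squid on 3128, DansGuardian on 8080.
    print("squid listening on 3128:", port_in_use("127.0.0.1", 3128))
    print("port 8080 already taken:", port_in_use("127.0.0.1", 8080))
```

Before installation you would hope to see True for port 3128 and False for port 8080.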
You will need a web server. Apache (httpd) that comes with RedHat 6.2 is absolutely perfect and you would be hard pushed to find a better web server. The server is used to display a cgi perl script that gives the user notification of an 'Access Denied'.
You will need the standard development tools installed such as glibc, autoconf, gcc and make. Debian users will also require zlib1g-dev. The default Redhat 6.2 installation and most others come with these installed so you don't need to worry about them.
Most of the time during this work you will need to be logged on as root. A more experienced user will be able to tell when root is actually needed; everyone else should simply stay logged on as root throughout.
Make sure you have all of the above installed and working before you continue.
Obviously, you need DansGuardian. You can download it from here: DansGuardian, or from the US mirror site.
It is recommended that you download and install from the source as this is always the most up to date. However, packages are available for some distributions. For the purpose of these instructions, we shall be installing from the source.
If you just want to quickly get on with the installation and avoid all this detail, don't forget you can simply follow the brief installation guide.
1. Download DansGuardian-4.2.*.source.tar.gz into a temporary area, and untar with a tar -zxpf. This will create the subdirectory DansGuardian-4.2.*.
DansGuardian uses gnu autoconf, auto-generating the Makefile with the “configure” script.
2. cd into this new directory. Run the configure script with the help option (./configure --help) to see the user-selectable settings (see table below).
option       description                                   default value
bindir       where the binary gets placed                  /usr/sbin/
sysconfdir   where the config and data files get placed    /etc/dansguardian/
sysvdir      where the startup script gets placed          /etc/rc.d/init.d/
cgidir       where the cgi-bin dir is located              /home/httpd/cgi-bin/
mandir       where the man docs get placed                 /usr/man/
logdir       where the logs get placed                     /var/log/dansguardian/
runas_usr    the system user the daemon runs as            nobody
runas_grp    the system group the daemon runs as           nobody
piddir       where the pid file gets placed                /var/run/
3. Run configure with the appropriate options. For examples, see section Example Configure Scripts below.
RedHat 6.2 and 7.0 users can run configure with the default settings. RedHat 7.1 and 7.2 will have to change their cgidir option. Solaris, OpenBSD and FreeBSD will have to set just about all settings.
4. Edit the Makefile and verify that all the settings are correct. If not, re-run the configure script. If you don't understand Makefiles, skip this step.
5. Run make to build DansGuardian (gmake for Solaris).
6. Run make install to create the directory structure, install all the files in the chosen paths and set permissions as appropriate.
7. Doing a make clean will remove the now un-needed binaries etc.
8. When a page is denied, DansGuardian redirects to a cgi perl script on your web server to report to the user. This makes it easy to customise the message. This server does not need to be the same machine as the DansGuardian filter server, however if it is not local you will need to amend or comment out the perl script copying line in the Makefile.
To configure the address of your web server that will display the access denied perl script, edit the accessdeniedaddress within dansguardian.conf (see sysconfdir above). For further configuration options, see the Options section.
9. The last thing that we need to do is configure the log rotation. Log rotation ensures that the filesystem does not fill up with a huge log file. Most daemons that log such as httpd and squid rotate their logs once a week. With DansGuardian, there are five log files: access.log, access.log.1, access.log.2 … and so on up to 4. Once a week, the oldest of the five log files is deleted, and the remaining logs' names are incremented by one, and a new access.log is created. The Makefile copies a log rotation script to the configuration directory. We need to schedule this script to run once a week, so (as root) do a crontab -e and enter the following:
59 23 * * sat /etc/dansguardian/logrotation
Then save. This will schedule it for 23:59 every Saturday. Of course you can schedule the log rotation differently or edit the script to your own liking. Alternatively, you can check out the log rotation scripts submitted by Freddie Cash on the extras page (near the bottom).
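The rotation scheme described in step 9 can be sketched in Python as follows. This is not the logrotation script that ships with DansGuardian, only an illustration of the same idea. The function name rotate_logs and its parameters are made up, and the sketch ignores file ownership and does not tell the running daemon to reopen its log.

```python
from pathlib import Path


def rotate_logs(log_dir: Path, base: str = "access.log", keep: int = 5) -> None:
    """Rotate base, base.1 ... base.(keep-1) inside log_dir."""
    oldest = log_dir / f"{base}.{keep - 1}"
    if oldest.exists():
        oldest.unlink()  # drop the oldest of the five files
    # Shift access.log.3 -> .4, .2 -> .3, .1 -> .2, starting with the oldest.
    for n in range(keep - 2, 0, -1):
        src = log_dir / f"{base}.{n}"
        if src.exists():
            src.rename(log_dir / f"{base}.{n + 1}")
    current = log_dir / base
    if current.exists():
        current.rename(log_dir / f"{base}.1")
    current.touch()  # start a fresh, empty access.log
```

With the default log directory from the configure table, a call would look like rotate_logs(Path("/var/log/dansguardian")).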
DansGuardian is now ready to go. You can start it by just running the binary (ie just type dansguardian and hit return). You can stop it with a dansguardian -q. Or you can use the SysV(-like) script provided. Run a dansguardian -h to see other command line options available.
Example Configure Scripts
Here are some examples of options for the configure script for different distributions:
FreeBSD
A standard configure script that should work, provided you have installed FreeBSD and the associated programs in their default locations.
./configure --cgidir=/usr/local/www/cgi-bin/ \
  --sysc \
  --sysvdir=/usr/local/etc/rc.d/ \
  --bindir=/usr/local/sbin/ \
  --mandir=/usr/local/man/
OpenBSD
A standard configure script that should work, provided you have installed OpenBSD and the associated programs in their default locations.
./configure --cgidir=/usr/local/www/cgi-bin/ \
  --sysc \
  --sysvdir=/usr/local/etc/rc.d/ \
  --bindir=/usr/local/sbin/ \
  --mandir=/usr/local/man/
RedHat 6.2/7.0
It is safe to run the configure script (./configure) with the defaults.
RedHat 7.1/7.2
./configure --sysc \
  --sysvdir=/etc/rc.d/init.d/ \
  --cgidir=/var/www/cgi-bin/
Mandrake 7.2
./configure --mandir=/usr/share/man/
Mandrake 8/8.1
./configure --mandir=/usr/share/man/ \
  --cgidir=/var/www/cgi-bin/
The system user 'squid' might also be appropriate rather than 'nobody' when configuring your system's “runas” option.
SuSE 7.2
./configure --runas_grp=nogroup \
  --cgidir=/usr/local/httpd/cgi-bin/
SuSE 7.3
./configure --runas_grp=nogroup \
  --cgidir=/usr/local/httpd/cgi-bin/ \
  --sysvdir=/etc/init.d/
Solaris
Only Solaris 8 (7/01) has been tested. DansGuardian requires the GNU version of make (gmake), and GCC version 2.95.3 (3.01 may work but is as yet untested and not recommended).
Both gmake and GCC 2.95.3 are included with the OS on the “Companion CD”, and are usually installed in /opt/sfw/bin. To ensure these are in your path, simply do (in csh; bash syntax is different):
setenv PATH /opt/sfw/bin\:$PATH
To configure and compile, do:
./configure --bindir=/opt/dansguardian/sbin/ \
  --sysc \
  --sysvdir=/etc/init.d/ \
  --cgidir=/var/apache/cgi-bin/ \
  --mandir=/opt/dansguardian/man/ \
  --logdir=/opt/dansguardian/log/
DansGuardian is highly configurable. The source code is provided so you have the ultimate in configurability, although most people will be content with modifying the configuration files. If you do modify the source code please send what you've done to Daniel.
After you have modified any configuration file, to apply the changes you will need to restart DansGuardian. To do this, simply do a dansguardian -r.
There is one main configuration file, several banned lists and an exception list. These are all explained below:
exceptionsitelist
This contains a list of domain endings. If any of them is found in the requested URL, DansGuardian will not filter the page. Note that you should not put the http:// or the www. at the beginning of the entries.
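To illustrate what matching on a domain ending means, here is a minimal Python sketch. It is not DansGuardian's own matching code. The function name is_exception_site is made up, and the rule that an ending must line up with a dot in the host name is an assumption made for this example.

```python
from typing import Iterable
from urllib.parse import urlsplit


def is_exception_site(url: str, domain_endings: Iterable[str]) -> bool:
    """Return True if the URL's host equals, or ends with, a listed domain."""
    host = (urlsplit(url).hostname or "").lower()
    for ending in domain_endings:
        ending = ending.lower().lstrip(".")
        # Assumed rule: match the whole host or a dot-aligned suffix of it.
        if host == ending or host.endswith("." + ending):
            return True
    return False
```

With an entry of example.org, both http://example.org/ and http://www.example.org/index.html would bypass filtering, which is why the entries are written without http:// or www.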
exceptioniplist
This contains a list of client IP addresses that you want to bypass the filtering, for example the IP of the network administrator's computer.
exceptionuserlist
User names listed here will not be filtered (basic authentication or ident must be enabled).
exceptionphraselist
If any of the phrases listed here appear in a web page then the filtering is bypassed. Care should be taken when adding phrases to this file, as they can easily stop many pages from being blocked. It would be better to put a negative value in the weightedphraselist.
exceptionurllist
URLs listed here are for parts of sites for which filtering should be switched off.
bannediplist
IP addresses of client machines to which web access is denied. Only put IP addresses here, not host names.
bannedphraselist
This contains a list of banned phrases. The phrases must be enclosed between < and >. DansGuardian is supplied with an example list. You should not use phrases such as <sex>, as this will block sites such as Middlesex University. The phrases can contain spaces. Use them to your advantage. This is the most useful part of DansGuardian and will catch more pages than PICS and URL filtering put together.
Combinations of phrases can also be used; if all the phrases in a combination are found in a page, it is blocked. Exception phrases are no longer listed in this file - see exceptionphraselist.
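The Middlesex problem and the idea of combinations can be shown with a small Python sketch. It is only an illustration of substring matching, not DansGuardian's search algorithm. The function name page_is_banned and the representation of rules as lists of phrases are made up for the example and are not the file syntax.

```python
from typing import List


def page_is_banned(page_text: str, rules: List[List[str]]) -> bool:
    """Each rule is a list of phrases; a rule fires when ALL of them occur."""
    text = page_text.lower()
    for rule in rules:
        # An empty rule is ignored so that it cannot ban every page.
        if rule and all(phrase.lower() in text for phrase in rule):
            return True
    return False


page = "Welcome to Middlesex University"
print(page_is_banned(page, [["sex"]]))    # plain substring match hits "Middlesex"
print(page_is_banned(page, [[" sex "]]))  # surrounding spaces avoid that hit
```

A single-phrase rule behaves like an ordinary banned phrase, while a rule with several phrases only fires when every one of them is present.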
banneduserlist
User names listed here will automatically be denied web access, provided basic proxy authentication is enabled.
bannedmimetypelist
This contains a list of banned MIME-types. If a URL request returns a MIME-type that is in this list, DansGuardian will block it. DansGuardian comes with some example MIME-types to deny. This is a good way of blocking inappropriate movies for example. It is obviously unwise to ban the MIME-types text/html or image/*.
bannedextensionlist
This contains a list of banned file extensions. If a URL ends in an extension that is in this list, DansGuardian will block it. DansGuardian comes with some example file extensions to deny. This is a good way of blocking kiddies from downloading those lovely screen savers and hacking tools. You are a fool if you ban the file extension .html, or .jpg etc.
bannedregexpurllist
This contains a list of banned regular expression URLs. For more information on regular expressions, click here. Regular expressions are a very powerful pattern matching system. This file allows you to match URLs using this method.
bannedsitelist
This file contains a list of banned sites. Entering a domain name here bans the entire site. For banning specific parts of a site, see bannedurllist. You can also impose a blanket ban on all sites except those specifically listed in exceptionsitelist. You can also block sites specified only as an IP address, and include a stock squidGuard blacklists collection. To enable these blacklists, download them from the extras section here. Simply put them somewhere appropriate, un-comment the squidGuard blacklists collection lines at the bottom of the bannedsitelist file, and check the paths are correct. For URL blacklists, edit the bannedurllist in a similar way.
bannedurllist
This allows you to block specific parts of a site rather than the whole site. To block an entire site, see bannedsitelist. To enable squidGuard blacklists for URLs, you will need to download the blacklists and edit the squidGuard blacklists collection section at the bottom (as for bannedsitelist above).
weightedphraselist
Each phrase is given a value either positive or negative and the values are added up. Phrases to do with good subjects will have negative values, and bad subjects will have positive values. Once the naughtyness limit is reached (within dansguardian.conf), the page is blocked. See the Naughtyness Limit description within the dansguardian.conf section below.
pics
This file allows you to finely tune the PICS filtering. Each PICS section comes with a description of the allowed settings and what they represent. The default settings with DansGuardian are set for youngish children, for example mild profanities and artistic nudity are allowed. PICS filtering can also be totally disabled / enabled using the enablePICS = on | off option.
For more detailed information on PICS ratings, click here.
ICRA
The ICRA section is fairly self-explanatory. A value of 0 means nothing of that category is allowed, whereas a value of 1 allows it. For example,
ICRAnudityartistic = 1
allows nude art. For more in-depth information see here.
RSAC
RSAC is an older version of ICRA. The values here range from 0 meaning none allowed, through 2 (the default value), to 4, which allows wanton and gratuitous amounts of the given category. For more in-depth information see here.
evaluWEB
evaluWEB rating uses a system similar to the British Film classification system:
0 = U (Universal, ie. suitable for even the youngest viewer)
1 = PG (Parental Guidance recommended)
2 = 18 (Only suitable for viewers aged 18 and over)
SafeSurf
Similar to RSAC, but containing a larger range of categories with the range from 0 = full filtering to 9 = wanton and gratuitous. For more in-depth information, see here.
See evaluWEB. For more in-depth information, see here.
This is yet another ratings scheme. See here for more information.
dansguardian.conf
The only setting that is vital for you to configure in the dansguardian.conf file is the accessdeniedaddress setting. You should set this to the address (not the file path) of your Apache server with the perl access denied reporting script. For most people this will be the same server as squid and DansGuardian. If you really want you can change this address to a normal html static page on any server.
Reporting Level
You can change the reporting level for when a page gets denied. It can say just 'Access Denied', or report why, or report why and what the denied phrase is. The last may be more useful for testing, but the middle option would be more useful in a school environment. Stealth mode logs what would be denied but doesn't do any blocking.
Logging Settings
This setting lets you configure the logging level. You can log nothing, just denied pages, all text-based requests, or all requests. HTTPS requests only get logged when the logging level is set to 3 (all requests).
Log Exception Hits
Log if an exception (user, IP, URL, or phrase) is matched and so the page gets let through. This can be useful for diagnosing why a site gets through the filter.
Log File Format
This setting alters the format of the DansGuardian log file. Please note option 3 (standard log format) is not yet implemented.
Network Settings
These allow you to modify the IP address that DansGuardian is listening on, the port DansGuardian listens on, the IP address of the server running squid as well as the squid port. It is possible to configure the Access Denied reporting page here also.
Content Filtering Settings
Here you can modify the location of the list files. Adjusting these locations is not recommended.
Naughtyness limit
This setting refers to the weighted phrase limit over which the page will be blocked. Each weighted phrase is given a value either positive or negative and the values are added up. Phrases to do with good subjects will have negative values, and bad subjects will have positive values. See the weightedphraselist file for examples. As a rough guide, a value of 50 is for young children, 100 for older children, 160 for young adults.
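The arithmetic behind the naughtyness limit can be sketched in Python. This is a simplified illustration and not DansGuardian's implementation. The phrases and weights are invented, each phrase is counted once per page, and the page is treated as blocked only when the total is strictly over the limit; all three points are assumptions made for this example.

```python
from typing import Dict, Tuple


def naughtyness_score(page_text: str, weights: Dict[str, int]) -> Tuple[int, Dict[str, int]]:
    """Return the summed weight of all phrases found, plus the phrases found."""
    text = page_text.lower()
    found = {p: w for p, w in weights.items() if p.lower() in text}
    return sum(found.values()), found


def is_blocked(score: int, limit: int) -> bool:
    """Assumed rule: block when the total goes over the limit."""
    return score > limit


# Invented example weights: bad subjects positive, good subjects negative.
weights = {"badword": 40, "worseword": 60, "health education": -30}
score, found = naughtyness_score("Health education: badword, worseword", weights)
print(score, sorted(found))
print(is_blocked(score, 50), is_blocked(score, 100))
```

Here the two bad phrases add up to 100 and the good phrase takes 30 off, so the same page would be blocked at a limit of 50 but allowed at 100.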
Show weighted phrases found
If enabled, the phrases found that made up the total which exceeds the naughtyness limit will be logged and, if the reporting level is high enough, reported.
Reverse Lookups for Banned Sites and URLs
If set to on, DansGuardian will look up the DNS name for a URL given as an IP address and search for both in the banned site and URL lists. This would prevent a user from simply entering the IP for a banned address. It will reduce searching speed somewhat, so unless you have a local caching DNS server, leave it off and use the Blanket IP Block option in the bannedsitelist file instead.
Build bannedsitelist and bannedurllist Cache Files
This will compare the date stamp of the list file with the date stamp of the cache file and will recreate the cache as needed. If a bsl or bul .processed file exists, then that will be used instead. It will increase process start speed by 300%. On slow computers this will be significant. Fast computers do not need this option.
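The date stamp comparison described above amounts to a simple staleness check, sketched here in Python. This is an illustration only. The function name cache_is_stale is made up, and treating a missing cache file as stale is an assumption.

```python
import os


def cache_is_stale(list_path: str, cache_path: str) -> bool:
    """Return True if the cache is missing or older than the list file."""
    if not os.path.exists(cache_path):
        return True  # assumed: no cache yet means it must be built
    # Rebuild when the list was modified after the cache was written.
    return os.path.getmtime(list_path) > os.path.getmtime(cache_path)
```

Editing the list file updates its modification time, so the next start would see a stale cache and rebuild it.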
POST protection (web upload and forms)
This is for blocking or limiting uploads, not for blocking forms without any file upload. The value is given in kilobytes after MIME encoding and header information.
Username identification methods (used in logging)
The proxyauth option is for when basic proxy authentication is used (obviously no good for transparent proxying). The ntlm option is for when the proxy supports the MS NTLM authentication. This only works with IE5.5 sp1 and later, and has not been implemented yet. The ident option causes DansGuardian to try to connect to an identd server on the computer originating the request.
Forwarded For
This option adds an X-Forwarded-For: <clientIP> to the HTTP request header. This may help solve some problem sites that need to know the source IP.
Max Children
This sets the maximum number of processes to spawn to handle the incoming connections. This will prevent DoS attacks killing the server with too many spawned processes. On large sites you might want to double or triple this number.
Log Connection Handling Errors
This option logs some debug info regarding fork()ing and accept()ing which can usually be ignored. These are logged by syslog. It is safe to leave this setting on or off.
No changes to the squid configuration are required, as DansGuardian appears to squid just as a normal web browser. However…
We need to make sure that squid will not allow client browsers to bypass DansGuardian. This is a non-trivial problem. I will assume that you have already blocked open web access (via firewall rules such as iptables or ipchains) and that the only way to access the web is through squid and hence DansGuardian. This goal is achievable in a number of ways.
You can modify the acl rules so that only localhost has access. In my squid.conf I had the following lines:
# INSERT YOUR OWN RULE(S) HERE TO ALLOW ACCESS FROM YOUR CLIENTS
#
acl localnet src 192.168.42.0/255.255.255.0
http_access allow localnet
http_access allow localhost
http_access deny all
So I commented out both of the localnet lines with a #. I believe that the default configuration of squid only allows localhost, so you probably don't even have to do this.
With proxy authentication things get a little bit more complex. Without DansGuardian, my squid.conf has:
authenticate_program /usr/bin/smb_auth -W DOMAIN -U 192.168.0.2
acl domainusers proxy_auth REQUIRED
http_access allow domainusers
http_access deny all
This allows authenticated users to access the proxy from anywhere.
However, it makes sense to let SSL through without going through DansGuardian, yet at the same time still prevent bypassing DansGuardian on other ports. So the same section became:
authenticate_program /usr/bin/smb_auth -W DOMAIN -U 192.168.0.2
acl domainusers proxy_auth REQUIRED
acl linuxserver src 192.168.0.1/255.255.255.255
acl ntserver src 192.168.0.2/255.255.255.255
http_access allow linuxserver
http_access allow ntserver
http_access allow domainusers localhost
http_access allow CONNECT SSL_ports
http_access deny all
So for all web access (SSL or not) the user is required to be authenticated, but SSL is allowed to bypass DansGuardian. I also allow the NT and Linux servers to bypass the filtering and access squid directly.
ipchains method 1
You could redirect incoming requests on port 3128 from the local net to port 8080 while still allowing incoming from the localhost to access 3128. More specific detail is currently beyond the scope of these instructions.
ipchains method 2 - sneaky method
A really sneaky method would be to configure squid to work as a transparent proxy and redirect all port 80 traffic to port 8080. You would want to include method 1 as well. More specific detail is currently beyond the scope of these instructions. There are HOWTOs that cover this.
Let's assume you have a Linux server at IP 192.168.0.1 which is a caching web proxy and intranet web server. On this server you have DansGuardian installed listening on port 8080.
You need to configure the client browser for http proxy 192.168.0.1 with port 8080. You can configure ftp the same as http - it is reported to work. That's it. But for efficiency you might want to set the 'no proxy on' to your local Apache server address - 192.168.0.1 in this case. If you've got DNS working, you can use the DNS address of your local server. I tend not to.
If you've used the sneaky method to configure squid then there should be no configuration required. Configure your browser for no proxy and maybe set the 'no proxy on' to your local Apache server address - 192.168.0.1 in this case.
This could be known as super sneaky I suppose. The problem with the simple config method is that you have to configure by hand every browser. This is a pain and is why the sneaky method is really quite good. The way Beebug solve this is by using a custom in-house Visual Basic application that loads upon login and modifies the needed Windows registry and various files so that IE and Netscape (including email) are configured automatically. It's so good, you can just stick a copy of Netscape on a server in a share and run this program at login and it automatically just works on all the PCs. No installation and a central copy so it's easy to update. It also has other features such as removing unwanted bits such as web folders, GMT/BST auto switching (already done by login script), app data, and other fixes. Unless I persuade the directors to release it open source you can't use this method without paying or writing your own. And guess who wrote this - yes - the DansGuardian author!
For all support issues, join the mailing list and post your question or comment there.
If you feel your message is not suitable for public viewing and is private (for example asking for pricing or other commercial issues) then email the author direct. The address is author at dansguardian dot org.
You can also get further help from the DansGuardian web site dansguardian.org.
Send any comments about this document to gb at dansguardian dot org.