A common problem is blocked communications: end user workstations can't communicate with DansGuardian, or DansGuardian can't communicate with its backend proxy (probably Squid), or the DansGuardian filter system can't communicate with the Internet.
Usually blocked communications are due to an old/forgotten/erroneous filtering rule in Shorewall or IPtables or your router/NAT box. Double-check your packet blocks, especially the old ones. And watch out for separate rules for ICMP that make `ping` packets behave differently than web surfing packets.
End user workstations should have a path to communicate with DansGuardian with destination TCP port 8080 and any source TCP port (assuming something like filterport=8080 and filterip=yourinternalIP in “dansguardian.conf”). DansGuardian should have a path to communicate with its backend proxy with destination TCP port 3128 (assuming something like proxyport=3128 and proxyip=127.0.0.1 in “dansguardian.conf”, http_port 127.0.0.1:3128 in “squid.conf”, and no interfering IPtables or Shorewall rules or policies yet). And the DansGuardian filter system (actually the backend proxy portion) should have a path to communicate with the Internet with destination TCP port 80 and any source TCP port.
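Each of the three paths above can be probed with a plain TCP connection attempt. The following is only a minimal sketch: the function names are made up for illustration, and the addresses and ports you pass in are whatever your own dansguardian.conf and squid.conf actually say.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, filtered (timed out), unreachable, or name lookup failed.
        return False


def report(checks, timeout=3.0):
    """checks is a list of (label, host, port); returns one status line per check."""
    lines = []
    for label, host, port in checks:
        status = "ok" if can_connect(host, port, timeout) else "BLOCKED or not listening"
        lines.append(f"{label}: {host}:{port} {status}")
    return lines
```

Run each probe from the machine at the start of the path: the port 8080 check from an end user workstation, the port 3128 check and the port 80 check from the filter system itself. A failure only tells you that the path is closed, not which rule closed it.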
Several Squid configuration options and option names are slightly different for 2.x series and 3.x series versions of Squid. For example, where 2.x uses transparent, 3.x (3.1 and later) prefers intercept. For another example, log_uses_indirect_client in 2.x has no exact analogue in 3.x (although forwarded_for is closely related). Example Squid configurations may need to be slightly revised for the other series of Squid versions.
OS permissions problems might occur in a new or modified DansGuardian installation. (Sometimes problems will start after an upgrade for no apparent reason; ultimately the problem will be traced to the upgraded DansGuardian daemon running as a different user.) Most frequently this happens with either the DansGuardian log files or with anti-virus scanning. When there's a problem with log file permissions, DansGuardian will usually issue a message something like:

    Error opening/creating log file. (check ownership and access rights).
    I am running as nobody and I am trying to open /usr/local/var/log/dansguardian/access.log.
Like almost all Linux daemons (services:-), DansGuardian will always run as the same specific user, no matter who starts it or how. Typically you launch DansGuardian as the superuser, then it demotes itself to a more specific user (after a bit of startup which probably includes reading its configuration file). [Of course you could run DansGuardian in the foreground rather than as a daemon (service), either with the nodaemon parameter in its configuration file or with the -N command line parameter; this is something you might do during deep debugging (but not in normal production).]
The DansGuardian daemon user gets set one of two ways:
- hopefully the username:groupname is set in the configuration file
- if not, the built in fallback is used without further change
When building DansGuardian, the user can be specified as anything you like with the ./configure parameters --with-proxyuser=… and --with-proxygroup=…. If your DansGuardian was pre-built, you can find out what username and groupname were compiled in by executing dansguardian -v. Often the builder didn't explicitly specify anything at all (rather than risking choosing wrongly:-), so the default value of nobody:nobody is often used. Most likely you will want to specify some specific user (perhaps different from the built in, perhaps just duplicating it) by setting daemonuser=… and daemongroup=… in dansguardian.conf. Explicitly setting the daemon user and group in the configuration file protects you against upgrade problems occurring because the new DansGuardian has a different username:groupname as its built in fallback. (Of course you could also change file permissions on your system to make the compiled-in value such as nobody:nobody work as is.)
(Note specifying the user in the configuration file can sometimes be confusing, as the configuration file itself is accessed before the user is reset. In the vast majority of cases this is not a problem, as usually DansGuardian starts out executing as a superuser so it has no problem accessing the configuration file. But if something goes wrong, keep in mind while figuring out the problem that DansGuardian reads its configuration file while it's still running as the original user.)
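To see at a glance whether daemonuser= and daemongroup= are explicitly set, you can scan the text of dansguardian.conf. This is a rough sketch, assuming the usual one-setting-per-line layout with optional single quotes around the value and `#` marking comments; the function name is hypothetical.

```python
import re

# Matches e.g.  daemonuser = 'dansguardian'   (lines starting with # do not match)
SETTING = re.compile(r"^\s*(daemonuser|daemongroup)\s*=\s*'?([^'#\s]+)'?")


def daemon_identity(conf_text):
    """Return the explicitly configured daemonuser/daemongroup values, if any."""
    found = {}
    for line in conf_text.splitlines():
        match = SETTING.match(line)
        if match:
            found[match.group(1)] = match.group(2)
    return found
```

If either key is missing from the result, the compiled-in fallback (the one reported by dansguardian -v) is what the daemon will use.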
In many cases your distribution has already selected an appropriate username and permissions scheme, and there's little or nothing more you need do (so long as you adhere to their system). If not, you may wish to do any one of:
- specify the same user as Squid (probably proxy:proxy)
- specify some user that already exists on your system (maybe daemon:daemon)
- specify a new user (such as dansguardian:dansguardian, which you may have to create with something like useradd)
On a customized system, to make things work you may need to tweak them yourself. You may need to do one or more of:
- Extend group memberships in /etc/group (for example adding username dansguardian as another member of group proxy)
- Provide the selected user with read access to everything under the DansGuardian configuration directory (ex: /etc/dansguardian) and in addition “search (-x)” permissions for all the directories (including the DansGuardian configuration directory itself)
- Provide the selected user with write access to everything under the DansGuardian log directory (ex: /var/log/dansguardian) and in addition “search (-x)” permissions for all directories (including the DansGuardian log directory itself)
- Specify that log files created during log rotation have the appropriate permissions and owner
Use your favorite administration tool or text editor or a GUI or commands like chmod -R …+… … and chown -R …:… … to do these things.
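Before changing anything, it can help to list exactly which files and directories the daemon user cannot reach. The sketch below applies only the classic owner/group/other mode bits (it ignores ACLs, SELinux, and the special powers of root), and every name in it is made up for illustration.

```python
import os


def allowed(st, uid, gids, want):
    """Check mode bits for a user; want is a string such as 'r', 'rw' or 'rwx'."""
    if st.st_uid == uid:
        shift = 6            # owner bits apply
    elif st.st_gid in gids:
        shift = 3            # group bits apply
    else:
        shift = 0            # "other" bits apply
    bits = (st.st_mode >> shift) & 0o7
    need = sum({"r": 4, "w": 2, "x": 1}[c] for c in want)
    return bits & need == need


def permission_problems(top, uid, gids, writable=False):
    """Return paths under top (including top) the user cannot read (or write)."""
    file_need = "rw" if writable else "r"
    dir_need = file_need + "x"          # directories also need search permission
    problems = []
    if not allowed(os.stat(top), uid, gids, dir_need):
        problems.append(top)
    for dirpath, dirnames, filenames in os.walk(top):
        for name in dirnames:
            path = os.path.join(dirpath, name)
            if not allowed(os.stat(path), uid, gids, dir_need):
                problems.append(path)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not allowed(os.stat(path), uid, gids, file_need):
                problems.append(path)
    return problems
```

For the configuration directory call it with writable=False, for the log directory with writable=True. The uid and the set of group ids for a given username can be looked up with pwd.getpwnam and os.getgrouplist. Run it as root so the walk itself isn't stopped by the very permissions you're checking.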
(There is disagreement in the DansGuardian community about whether nobody:nobody (or nouser:nogroup) is just a safety fallback value indicating that daemonuser= and daemongroup= have not yet been set when they should have been, or is a real value that should be made to work. [This is related to the larger question of whether many of the build (./configure) defaults are just minimal values, or are actually intended to be used by typical DansGuardians.] Be sure to follow the lead of your distribution in this regard: for example if your distribution supplies a pre-built DansGuardian and says to not use daemonuser= and daemongroup=, follow their instructions.)
If you're using anti-virus scanning, optimally DansGuardian and clamd should run as the same user:group. The minimal requirement is somewhat less, as DansGuardian gives “group read” access to its temp files, so just clamd being a member of the DansGuardian daemon group is sufficient.
Incomplete DansGuardian daemon permissions setting can be another cause of the rather mysterious message Unable to getgrnam(): Success. (The most frequent cause is the value of daemongroup= is not defined in /etc/group, usually because only `useradd` was executed and `groupadd` was forgotten.) The user:group that DansGuardian runs as (possibly nobody:nogroup) must not be forbidden from accessing /etc/group. For example you may need to make the userid that DansGuardian runs as a member of group 'daemon'.
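A quick way to confirm that both halves of the username:groupname pair really exist is to look them up the same way the daemon would, through the system account databases. A minimal sketch (the function name is hypothetical):

```python
import grp
import pwd


def missing_accounts(user, group):
    """Return a list naming whichever of the user and group is not defined."""
    missing = []
    try:
        pwd.getpwnam(user)
    except KeyError:
        missing.append("user " + user)
    try:
        grp.getgrnam(group)
    except KeyError:
        missing.append("group " + group)
    return missing
```

An empty list means both lookups succeeded for whoever ran the check. If the check passes as root but DansGuardian still complains, suspect access to /etc/group for the daemon user rather than a missing entry.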
Specifying the loopback address rather than an interface address in squid.conf (http_port 127.0.0.1:3128) is part of preventing users from skipping around DansGuardian. It allows Squid to communicate only with DansGuardian itself, not with any end user computer.
Although this may be exactly what you eventually want, it can be very limiting during debugging. In particular it prevents end user computers from communicating directly with the Squid half of a DansGuardian/Squid system, something that can be very useful when isolating a problem.
During debugging you may need to modify your squid.conf; for example it may be prudent to change this line to simply http_port 3128.
Normally log files are rotated on a regular schedule. Old log files are rotated out, then compressed, and eventually deleted. And new log files are started.
If you have a problem with this process sometimes working but other times not because the new log files are not always owned by the same user, most likely you have more than one mechanism trying to rotate the log files at the same time. Look in places like /etc/logrotate.d/dansguardian, /etc/syslog.conf, and DGBIN=`which dansguardian`;`dirname $DGBIN`/logrotate, and remove all but one logrotation method.
Although the default weighted phrase list activations work okay initially, they seldom meet your specific needs all that well. The weighted phrase lists are categorized; you can easily turn whole sets of weighted phrases on or off to suit your particular environment. Do this by simply commenting or uncommenting individual lines in lists/weightedphraselist. (Insert a sharp [#] at the front of lines you want to deactivate, remove any sharps from lines that you want to be active.)
Activate the lines for all the categories you care about. But don't activate too many more lines than necessary (especially not those for languages none of your users can read anyway, most especially not for Chinese, Japanese, or Malay). Because of the inevitable false positives even with sophisticated weighted phrase list scoring, every category you activate will block a few more legitimate web pages. If you uncomment all the weighted phrase categories, browsing the web is likely to become overly difficult. The solution? “Don't do that.”
The whole area of which daemons (services) are started at which time varies from system to system. If you want DansGuardian to start automatically at boot time, you may need to explicitly issue the appropriate commands yourself. For some guidance, see Starting DansGuardian Automatically At Boot.
Often the first hurdle encountered when going down a wrong path is that the Squid stub logs give the same source for all requests: IP 127.0.0.1 (“localhost” or “loopback”). When this comes up, it often makes more sense to back up a bit then go down a different path. (See Usage#1 in the Configuration/Usage portion of the Wiki FAQ.)
This issue often comes up in the context of Log File Analysis. Another common situation is when a Squid system is being replaced by a DansGuardian system, as although the local proxy is being used a different way for a different purpose, it's still named Squid. At least in the case of Log File Analysis, the real underlying problem is usually looking at the wrong logfile. In a combined DansGuardian/Squid system there are two logfiles. Complete information is in /var/log/dansguardian/access.log, while /var/log/squid/access.log is just a stub log that only contains information about web requests that were not preemptively blocked by DansGuardian.
Neither correct DansGuardian operation nor log file analysis normally requires recording the original source address in the Squid stub logs. If you nevertheless wish to do so, you can do so by setting some configuration options in DansGuardian (and maybe in Squid too).
Many “blacklists” actually categorize websites; they list all websites, not just bad websites.
With these blacklists you should ban only the categories you consider “bad”, rather than all the website categories. For example you probably don't want to ban the “homerepair” category (or maybe you do), and depending on your environment you may or may not want to ban the “mail” category.
To enable or disable blacklist categories, edit lists/bannedsitelist. Activate categories by deleting any sharp [#] at the front of the line, and deactivate categories by inserting a sharp at the front of the line. (You should make similar changes in lists/bannedurllist and lists/weightedphraselist too if the category is available there [often it's not].) Most likely you shouldn't simply uncomment all the lines.
(Note some blacklists contain additional categories for which there's no predefined .Include line, such as the category “searchengines”. There's no predefined .Include line for these because it's unlikely anyone will ever want to ban them. But if you really do want to ban them, you're free to add your own .Include line.)
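If you'd rather script the commenting and uncommenting than do it by hand, something like the following would work on the lines of a list file. It is a sketch only: it assumes the .Include lines have the form `.Include<…/category/…>` with the category name appearing as one directory in the path, and the function name is made up.

```python
def set_category(lines, category, enabled):
    """Comment or uncomment every .Include line whose path contains /category/."""
    out = []
    for line in lines:
        # Strip surrounding blanks and any leading sharps to see the bare entry.
        body = line.strip().lstrip("#").strip()
        if body.startswith(".Include<") and "/" + category + "/" in body:
            out.append(body if enabled else "#" + body)
        else:
            out.append(line)    # everything else is passed through untouched
    return out
```

Read the file with splitlines(), pass the result through the function, and write the lines back joined with newlines. Keep a backup copy of the original list, and remember to tell DansGuardian to reload its lists afterwards.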
(Although typical DansGuardian installations only use one blacklist, you can set up Multiple Black Lists if you wish. DansGuardian's default configuration emphasizes convenience and so tries to match the single most frequently used blacklist, but DansGuardian configuration can be expanded far beyond the defaults if you wish.)
When an end user computer accesses Squid directly, Squid sees the request source IP address as being that of the end user computer. But when the same end user computer accesses DansGuardian which then accesses Squid, Squid sees the request source IP address as being that of DansGuardian (probably 127.0.0.1). A Squid ACL that checks the source IP address against the local network will work fine for direct Squid access, yet fail when DansGuardian is inserted into the path.
If the relevant parts of your squid.conf look something like
    acl localhost src 127.0.0.1/32       # define for later use
    acl localclients src 192.168.0.1/24  # define for later use
    ...
    http_access allow localclients       # allow LAN to web
    http_access deny all                 # default ACL end
try changing your squid.conf to something like this
    acl localhost src 127.0.0.1/32       # define for later use (no change)
    acl localclients src 192.168.0.1/24  # define for later use (no change)
    ...
    http_access allow localhost          #<== add
    http_access allow localclients       # allow LAN to web (no change)
    http_access deny all                 # default ACL end (no change)
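Why the extra line matters can be seen by mimicking Squid's first-match-wins evaluation of http_access rules. This is a toy model for illustration only, not Squid's actual code; the two rule lists simply mirror the "before" and "after" configurations above.

```python
import ipaddress

LOCALHOST = ipaddress.ip_network("127.0.0.1/32")
# strict=False accepts the host bits in 192.168.0.1/24 and yields 192.168.0.0/24
LOCALCLIENTS = ipaddress.ip_network("192.168.0.1/24", strict=False)

# Each rule is (action, network); a network of None stands for "all".
BEFORE = [("allow", LOCALCLIENTS), ("deny", None)]
AFTER = [("allow", LOCALHOST), ("allow", LOCALCLIENTS), ("deny", None)]


def first_match(src, rules):
    """Return the action of the first rule whose network contains src."""
    addr = ipaddress.ip_address(src)
    for action, net in rules:
        if net is None or addr in net:
            return action
    return "deny"
```

With the BEFORE rules a workstation at, say, 192.168.0.57 is allowed when it talks to Squid directly, but the same request relayed by DansGuardian arrives from 127.0.0.1 and falls through to the final deny. With the AFTER rules the relayed request is allowed.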
Bumbling the syntax of sites –such as including "http://" or “www” or a leading dot (per Squid)– will make DansGuardian act as though the entry weren't there. Here's an explication of the Domain Name System (DNS), followed by a few simple rules of thumb. Following the rules of thumb at the end of this item will probably solve your problem.
In DansGuardian terminology a “site” can be either
- the name of one specific host
- the name of a domain or subdomain (which contains many hosts)
There's no general way to know whether a particular name is a “host” or a “(sub)domain”; they look exactly the same. Fortunately for the purposes of DansGuardian it hardly matters.
This applies especially to the entries in the …sitelist files and also to the first part of each entry in the …urllist files.
The pieces of a domain name are separated by dots and should be read right to left. Domain names form a simple strict hierarchy. The rightmost portion (sometimes referred to as the top level domain or TLD) is the most general: for example org for organizations. The second portion identifies the specific organization and is the part that requires some kind of registration: for example foobar.org is organization Foobar. All the other portions further to the left are the responsibility of Foobar itself; they are not the responsibility of the network cloud, IANA, NIC, or ICANN.
Here's an example. Suppose:
- www.foobar.org is the specific host that runs a web server
- foobar.org is both the main domain name and an allowable shortcut to www.foobar.org
- yuck.foobar.org is a subdomain (controlled by the owner of foobar.org)
- bake.foobar.org is another subdomain (also controlled by the owner of foobar.org)
- ick.yuck.foobar.org and bletch.yuck.foobar.org are two specific hosts in a subdomain
Then in lists/bannedsitelist:
    foobar.org              # disallow all hosts (at least 3) named *.foobar.org,
                            # regardless of whether or not they're in a subdomain
    www.foobar.org          # DON'T DO THIS
                            # disallow only host www.foobar.org if accessed by that name,
                            # yet allow access by the shortcut name foobar.org
    bake.foobar.org         # disallow all hosts named *.bake.foobar.org
    yuck.foobar.org         # disallow all hosts (at least 2) named *.yuck.foobar.org
    ick.yuck.foobar.org     # disallow this specific host
    bletch.yuck.foobar.org  # disallow this specific host
The above description can be collapsed to just these rules of thumb:
- Omit "http://"
- Omit “www”
- Omit any leading period (this may be different from some other software that won't work right without the leading period)
- Use the longest possible (i.e. most specific) entry that will work yet remain flexible
- If shorter entries already exist and they conflict with your new entry, try using both 'banned…' and 'exception…' lists. The 'exception…' lists take precedence, but only for exactly what's specified in them; for example banning “foobar.org” then excepting “bake.foobar.org” allows any webservers named *.bake.foobar.org but disallows all the rest of the foobar.org webservers. Another alternative is to try lengthening the existing 'banned…' entries (without making them inoperative).
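The rules of thumb can be sketched in a few lines of code. This is a simplified model of the behavior described in this item, not DansGuardian's real matching code, and the function names (and the host pie.bake.foobar.org in the usage note) are made up for illustration.

```python
def normalize_site(entry):
    """Apply the rules of thumb: no scheme, no leading period, no 'www'."""
    entry = entry.strip().lower()
    for scheme in ("http://", "https://"):
        if entry.startswith(scheme):
            entry = entry[len(scheme):]
    entry = entry.lstrip(".")
    if entry.startswith("www."):
        entry = entry[len("www."):]
    return entry


def covers(entry, host):
    """An entry covers the host of that exact name and every name beneath it."""
    return host == entry or host.endswith("." + entry)


def is_blocked(host, banned, exceptions):
    """Exception entries take precedence over banned entries."""
    if any(covers(e, host) for e in exceptions):
        return False
    return any(covers(b, host) for b in banned)
```

With foobar.org banned and bake.foobar.org excepted, is_blocked says ick.yuck.foobar.org is blocked while pie.bake.foobar.org is not. Note that covers compares whole dot-separated pieces, so banning foobar.org does not touch an unrelated name like notfoobar.org.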
For years there has been a low level but naggingly persistent series of reports that DansGuardian doesn't run as reliably under OpenBSD as it does under Linux. Most users never see any problem at all …but a few unlucky ones do. Occasional failure of a DansGuardian child process may be tolerable, as recovery is automatic and the jerky operation is visible to only a single user. However frequent failure of all (or at least most) DansGuardian child processes, or failure of the DansGuardian parent process, will not be tolerable.
To improve the web search rankings of this important question, its detailed answer has been moved out to its own separate document. (Also see questions Installation#26 and Installation#26b in the Wiki FAQ.)
In some circumstances some DansGuardian executables will refuse to start up after issuing a message something like this:
dansguardian: error while loading shared libraries: libclamav.so.5: cannot open shared object file: No such file or directory
This strange dependency on ClamAV can manifest even if you don't use any anti-virus at all and have configured your dansguardian.conf accordingly.
Eliminating this weird ClamAV library dependency is always possible (in fact straightforward); but both build-time (./configure) and run-time (dansguardian.conf) options may need to be adjusted the first time. The easiest way to correct the build-time options may be to obtain a corrected DansGuardian package. (Another alternative is to re-build the dansguardian executable yourself.)
When building DansGuardian, use the --enable-clamd ./configure option, but not the --enable-clamav option too. In an ideal world, all DansGuardian packages obtained from distribution repositories should already be built this way. However in the real (not ideal) world, repository errors are possible. Once DansGuardian is built correctly, you can then control whether or not to use ClamAV purely through the configuration options in dansguardian.conf; in other words once the build/configure options are correct, you will never need to revisit them no matter what you do with anti-virus.
In dansguardian.conf, use the 'clamdscan' option rather than the 'clamav' option. The 'clamdscan' option interfaces to ClamAV through the interprocess named pipe socket provided by the clam daemon. (The old 'clamav' option tries to interface to ClamAV through a version dependent library [a *nix “shared object” (.so) is analogous to a Windows “dynamic link library” (.dll)] which is probably no longer supported nor even available.)
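To check which scanner plugins a given dansguardian.conf actually activates, you could scan it for uncommented contentscanner lines. The sketch below assumes those lines look like `contentscanner = '/…/contentscanners/clamdscan.conf'` and that the old library-based plugin is selected by a file named clamav.conf; both are assumptions about your particular installation, so check them against your own file. The function names are hypothetical.

```python
import re

# Matches uncommented lines such as:
#   contentscanner = '/etc/dansguardian/contentscanners/clamdscan.conf'
SCANNER = re.compile(r"^\s*contentscanner\s*=\s*'?([^'#\s]+)'?")


def active_scanners(conf_text):
    """Return the file names of all uncommented contentscanner entries."""
    names = []
    for line in conf_text.splitlines():
        match = SCANNER.match(line)
        if match:
            names.append(match.group(1).rsplit("/", 1)[-1])
    return names


def uses_library_scanner(conf_text):
    """True if the (assumed) old library-based clamav plugin is active."""
    return "clamav.conf" in active_scanners(conf_text)
```

A True result from uses_library_scanner suggests switching that line to the clamdscan plugin. It says nothing about how the executable was built; for that, the shared library error message above is the telltale.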