DansGuardian Documentation Wiki


This is an old revision of the document!


Common Problems and their Solutions

Check the entire documentation portion of this wiki, especially the Filter Usage and Tips section. Also check the Wiki FAQ

Allow Traffic Through Firewall/Router

A common problem is blocked communications: end user workstations can't communicate with DansGuardian, or DansGuardian can't communicate with its backend proxy (probably Squid), or the DansGuardian filter system can't communicate with the Internet.

Usually blocked communications are due to an old/forgotten/erroneous filtering rule in Shorewall or IPtables or your router/NAT box. Double-check your packet blocks, especially the old ones. And watch out for separate rules for ICMP that make `ping` packets behave differently than web surfing packets.

End user workstations should have a path to communicate with DansGuardian with destination TCP port 8080 and any source TCP port (assuming something like filterport=8080 and filterip=yourinternalIP in “dansguardian.conf”). DansGuardian should have a path to communicate with its backend proxy with destination TCP port 3128 (assuming something like proxyport=3128 and proxyip=127.0.0.1 in “dansguardian.conf”, http_port 127.0.0.1:3128 in “squid.conf”, and no interfering IPtables or Shorewall rules or policies yet). And the DansGuardian filter system (actually the backend proxy portion) should have a path to communicate with the Internet with destination TCP port 80 and any source TCP port.
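
The three paths described above can each be probed with a plain TCP connection attempt. The following is a minimal sketch, not part of DansGuardian; the function name can_connect is made up for illustration, and you would substitute your own filterip/filterport and proxyip/proxyport values when calling it.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    A failure (refused, timed out, unreachable) usually means either that
    nothing is listening there or that a packet filter is in the way.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run from a workstation, can_connect("yourinternalIP", 8080) tests the path to DansGuardian; run on the filter machine, can_connect("127.0.0.1", 3128) tests the path to Squid. A successful connect only shows that the port is reachable; it does not show that filtering works.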

Squid Configuration Options & Option Names

Several Squid configuration options and option names differ slightly between the 2.x and 3.x series of Squid. For example, where 2.x uses transparent, 3.x prefers intercept. As another example, log_uses_indirect_client in 2.x has no exact analogue in 3.x (although forwarded_for is closely related). Example Squid configurations written for one series may need slight revision for the other.

Permissions And Daemon User

OS permissions problems might occur in a new or modified DansGuardian installation. (Sometimes problems will start after an upgrade for no apparent reason; ultimately the problem will be traced to the upgraded DansGuardian daemon running as a different user.) Most frequently this happens with either the DansGuardian log files or with anti-virus scanning. When there's a problem with log file permissions, DansGuardian will usually issue a message something like: Error opening/creating log file. (check ownership and access rights). I am running as nobody and I am trying to open /usr/local/var/log/dansguardian/access.log.

Like almost all Linux daemons (services:-), DansGuardian will always run as the same specific user, no matter who starts it or how. Typically you launch DansGuardian as the superuser, then it demotes itself to a more specific user (after a bit of startup which probably includes reading its configuration file). [Of course you could run DansGuardian in the foreground rather than as a daemon (service), either with the nodaemon parameter in its configuration file or with the -N command line parameter; this is something you might do during deep debugging (but not in normal production).]

The DansGuardian daemon user gets set one of two ways:

  • hopefully the username:groupname is set in the configuration file
  • if not, the built in fallback is used without further change

When building DansGuardian, the user can be specified as anything you like with the ./configure parameters --with-proxyuser=… and --with-proxygroup=…. If your DansGuardian was pre-built, you can find out what username and groupname were compiled in by executing dansguardian -v. Often the builder didn't explicitly specify anything at all (rather than risking choosing wrongly:-), so the default value of nobody:nobody is often used. Most likely you will want to specify some specific user (perhaps different than the built in, perhaps just duplicating it) by setting daemonuser=… and daemongroup=… in dansguardian.conf. Explicitly setting the daemon user and group in the configuration file protects you against upgrade problems occurring because the new DansGuardian has a different username:groupname as its built in fallback. (Of course you could also change file permissions on your system to make the compiled-in value such as nobody:nobody work as is.)

(Note specifying the user in the configuration file can sometimes be confusing, as the configuration file itself is accessed before the user is reset. In the vast majority of cases this is not a problem, as usually DansGuardian starts out executing as a superuser so it has no problem accessing the configuration file. But if something goes wrong, keep in mind while figuring out the problem that DansGuardian reads its configuration file while it's still running as the original user.)

In many cases your distribution has already selected an appropriate username and permissions scheme, and there's little or nothing more you need do (so long as you adhere to their system). If not, you may wish to do any one of:

  • specify the same user as Squid (probably proxy:proxy)
  • specify some user that already exists on your system (maybe daemon:daemon)
  • specify a new user (such as dansguardian:dansguardian, which you may have to create with something like useradd)

On a customized system, to make things work you may need to tweak them yourself. You may need to do one or more of:

  • Extend group memberships in /etc/group (for example adding username dansguardian as another member of group proxy)
  • Provide the selected user with read access to everything under the DansGuardian configuration directory (ex: /etc/dansguardian) and in addition “search (-x)” permissions for all the directories (including the DansGuardian configuration directory itself)
  • Provide the selected user with write access to everything under the DansGuardian log directory (ex: /var/log/dansguardian) and in addition “search (-x)” permissions for all directories (including the DansGuardian log directory itself)
  • Specify that log files created during log rotation have the appropriate permissions and owner

Use your favorite administration tool or text editor or a GUI or commands like chmod -R …+… … and chown -R …:… … to do these things.
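
Before changing anything, it can help to list exactly which paths the daemon user cannot use. The sketch below is an illustration only: the function name missing_access is made up, it looks only at the classic owner/group/other mode bits (no ACLs, no superuser override), and you must supply the numeric uid and group ids of whatever user you chose.

```python
import os


def missing_access(root, uid, gids, need_write=False):
    """Return the paths under root that uid (with group ids gids) cannot use.

    Every directory needs read and search (x); every file needs read.
    With need_write=True, write permission is required as well, which is
    what a log directory needs.
    """
    problems = []
    for dirpath, _dirnames, filenames in os.walk(root):
        entries = [(dirpath, True)]
        entries += [(os.path.join(dirpath, name), False) for name in sorted(filenames)]
        for path, is_dir in entries:
            st = os.stat(path)
            if st.st_uid == uid:
                bits = (st.st_mode >> 6) & 7   # owner rwx bits
            elif st.st_gid in gids:
                bits = (st.st_mode >> 3) & 7   # group rwx bits
            else:
                bits = st.st_mode & 7          # other rwx bits
            wanted = 4 | (1 if is_dir else 0) | (2 if need_write else 0)
            if bits & wanted != wanted:
                problems.append(path)
    return problems
```

For example, missing_access("/etc/dansguardian", uid, gids) checks the configuration tree, and missing_access("/var/log/dansguardian", uid, gids, need_write=True) checks the log tree. An empty list means the mode bits look sufficient.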

If you're using anti-virus scanning, optimally DansGuardian and clamd should run as the same user:group. The minimal requirement is somewhat less, as DansGuardian gives “group read” access to its temp files, so just clamd being a member of the DansGuardian daemon group is sufficient.

Don't Prematurely Lock Down

Specifying the loopback address rather than an interface address in squid.conf (http_port 127.0.0.1:3128) is part of preventing users from skipping around DansGuardian. It allows Squid to communicate only with DansGuardian itself, not with any end user computer.

Although this may be exactly what you eventually want, it can be very limiting during debugging. In particular it prevents end user computers from communicating directly with the Squid half of a DansGuardian/Squid system, something that can be very useful when isolating a problem.

During debugging you may need to modify your squid.conf. For example during debugging it may be prudent to change this line in squid.conf to simply http_port 3128.

Dueling Log Rotations

Normally log files are rotated on a regular schedule. Old log files are rotated out, then compressed, and eventually deleted. And new log files are started.

If you have a problem with this process sometimes working but other times not because the new log files are not always owned by the same user, most likely you have more than one mechanism trying to rotate the log files at the same time. Look in places like /etc/logrotate.d/dansguardian, /etc/syslog.conf, and DGBIN=`which dansguardian`;`dirname $DGBIN`/logrotate, and remove all but one logrotation method.
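
One quick way to confirm the symptom is to group the current and rotated log files by owner. This is a small sketch with a made-up function name; the glob pattern you pass in depends on where your logs actually live.

```python
import glob
import os
from collections import defaultdict


def owners_of_logs(pattern):
    """Map each (uid, gid) pair to the sorted list of matching files it owns."""
    by_owner = defaultdict(list)
    for path in sorted(glob.glob(pattern)):
        st = os.stat(path)
        by_owner[(st.st_uid, st.st_gid)].append(path)
    return dict(by_owner)
```

Calling owners_of_logs("/var/log/dansguardian/access.log*") and getting more than one key back suggests that different mechanisms, running as different users, have each created some of the files.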

Phraselists Should Be Selective

Although the default weighted phrase list activations work okay initially, they seldom meet your specific needs all that well. The weighted phrase lists are categorized; you can easily turn whole sets of weighted phrases on or off to suit your particular environment. Do this by simply commenting or uncommenting individual lines in lists/weightedphraselist. (Insert a sharp [#] at the front of lines you want to deactivate, remove any sharps from lines that you want to be active.)

Activate the lines for all the categories you care about. But don't activate too many more lines than necessary (especially not those for languages none of your users ever access anyway). Because of the inevitable false positives even with sophisticated weighted phrase list scoring, every category you activate will block a few more legitimate web pages. If you uncomment all the weighted phrase categories, browsing the web is likely to become overly difficult. The solution? “Don't do that.”
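
If you prefer to script the commenting and uncommenting rather than edit by hand, something like the following sketch may do. It assumes each activation line has the form .Include<…/category/…> and that the category name appears as a directory component in the path; the function name set_category and the paths in the usage example are illustrative, not taken from any particular installation.

```python
def set_category(lines, category, active):
    """Comment or uncomment the .Include lines that belong to one category.

    lines is a list of text lines from the list file; a new list is
    returned and all other lines are passed through unchanged.
    """
    out = []
    for line in lines:
        bare = line.lstrip("# \t")  # the line without any leading sharps
        if bare.startswith(".Include<") and f"/{category}/" in bare:
            out.append(bare if active else "#" + bare)
        else:
            out.append(line)
    return out
```

Read the file with readlines(), pass the result through set_category(lines, "gambling", True) or set_category(lines, "gambling", False), write it back, and then have DansGuardian reload its lists.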

DansGuardian Doesn't Automatically Start At Boot

The whole area of which daemons (services) are started at which time varies from system to system. If you want DansGuardian to start automatically at boot time, you may need to explicitly issue the appropriate commands yourself. For some guidance, see Starting DansGuardian Automatically At Boot.

Squid Logs Only Point At Localhost

Often the first hurdle encountered when going down a wrong path is that the Squid stub logs give the same source for all requests: IP 127.0.0.1 (“localhost” or “loopback”). When this comes up, it often makes more sense to back up a bit then go down a different path. (See Usage#1 in the Configuration/Usage portion of the Wiki FAQ.)

This issue often comes up in the context of Log File Analysis. Another common situation is when a Squid system is being replaced by a DansGuardian system, as although the local proxy is being used a different way for a different purpose, it's still named Squid. At least in the case of Log File Analysis, the real underlying problem is usually looking at the wrong logfile. In a combined DansGuardian/Squid system there are two logfiles. Complete information is in /var/log/dansguardian/access.log, while /var/log/squid/access.log is just a stub log that only contains information about web requests that were not preemptively blocked by DansGuardian.

Neither correct DansGuardian operation nor log file analysis normally requires recording the original source address in the Squid stub logs. If you nevertheless wish to do so, you can by setting some configuration options in DansGuardian (and maybe in Squid too).

Many Blacklists Actually Categorize Rather Than Ban

Many “blacklists” actually categorize websites; they list all websites, not just bad websites.

With these blacklists you should ban only the categories you consider “bad”, rather than all the website categories. For example you probably don't want to ban the “homerepair” category, and depending on your environment you may or may not want to ban the “mail” category.

To enable or disable blacklist categories, edit lists/bannedsitelist. Activate a category by deleting any sharp [#] at the front of its line, and deactivate a category by inserting a sharp at the front of its line. (You should make similar changes in lists/bannedurllist and lists/weightedphraselist too if the category is available there [often it's not].) Most likely you shouldn't simply uncomment all the lines.

(Note some blacklists contain additional categories for which there's no predefined .Include line, such as the category “searchengines”. There's no predefined .Include line for these because it's unlikely anyone will ever want to ban them. But if you really do want to ban them, you're free to add your own .Include line.)

(Although typical DansGuardian installations only use one blacklist, you can set up Multiple Black Lists if you wish. DansGuardian's default configuration emphasizes convenience and so tries to match the single most frequently used blacklist, but DansGuardian configuration can be expanded far beyond the defaults if you wish.)

Squid Works By Itself, But Not With DansGuardian

When an end user computer accesses Squid directly, Squid sees the request source IP address as being that of the end user computer. But when the same end user computer accesses DansGuardian which then accesses Squid, Squid sees the request source IP address as being that of DansGuardian (probably 127.0.0.1). A Squid ACL that checks the source IP address against the local network will work fine for direct Squid access, yet fail when DansGuardian is inserted into the path.

If the relevant parts of your squid.conf look something like

acl localhost src 127.0.0.1/32      # define for later use
acl localclients src 192.168.0.1/24 # define for later use
...
http_access allow localclients      # allow LAN to web
http_access deny all                # default ACL end

try changing your squid.conf to something like this

acl localhost src 127.0.0.1/32      # define for later use (no change)
acl localclients src 192.168.0.1/24 # define for later use (no change)
...
http_access allow localhost         #<== add
http_access allow localclients      # allow LAN to web (no change)
http_access deny all                # default ACL end (no change)
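
To see why the extra line matters, here is a toy model of first-match rule evaluation written with Python's ipaddress module. It is a deliberate simplification for illustration, not Squid's actual ACL engine; the function name first_match and the rule representation are made up.

```python
import ipaddress


def first_match(rules, acls, src):
    """Return the action of the first rule whose ACL matches address src."""
    addr = ipaddress.ip_address(src)
    for action, acl_name in rules:
        if acl_name == "all" or any(addr in net for net in acls[acl_name]):
            return action
    return "deny"  # simplification: deny when no rule matches


acls = {
    "localhost": [ipaddress.ip_network("127.0.0.1/32")],
    # strict=False accepts a host address with a prefix length, as written above
    "localclients": [ipaddress.ip_network("192.168.0.1/24", strict=False)],
}
before = [("allow", "localclients"), ("deny", "all")]
after = [("allow", "localhost"), ("allow", "localclients"), ("deny", "all")]
```

With the original rules, first_match(before, acls, "192.168.0.42") gives "allow" (a workstation talking to Squid directly), while first_match(before, acls, "127.0.0.1") gives "deny" (the same request arriving via DansGuardian). With the added rule, first_match(after, acls, "127.0.0.1") gives "allow".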

It's As Though My "Site" Entries Weren't There

Getting the syntax of a site entry wrong –such as including "http://" or “www” or a leading dot (per Squid)– will make DansGuardian act as though the entry weren't there. Here's an explanation of how the Domain Name System (DNS) names things, followed by a few simple rules of thumb. Following the rules of thumb at the end of this item will probably solve your problem.

In DansGuardian terminology a “site” can be either

  • the name of one specific host
  • the name of a domain or subdomain (which contains many hosts)

There's no general way to know whether a particular name is a “host” or a ”(sub)domain”; they look exactly the same. Fortunately for the purposes of DansGuardian it hardly matters.

This applies especially to the entries in the …sitelist files and also to the first part of each entry in the …urllist files.

The pieces of a domain name are separated by dots and should be read right to left. Domain names form a simple strict hierarchy. The rightmost portion –sometimes referred to as the top level domain or tld– is the most general: for example org, originally intended for non-commercial organizations. The second portion identifies the specific organization and is the part that requires some kind of registration: for example foobar.org is organization Foobar. All the other portions further to the left are the responsibility of Foobar itself; they are not the responsibility of the network cloud, IANA, NIC, or ICANN.

Here's an example. Suppose:

  • www.foobar.org is the specific host that runs a web server
  • foobar.org is both the main domain name and an allowable shortcut to www.foobar.org
  • yuck.foobar.org is a subdomain (controlled by the owner of foobar.org)
  • bake.foobar.org is another subdomain (also controlled by the owner of foobar.org)
  • ick.yuck.foobar.org and bletch.yuck.foobar.org are two specific hosts in a subdomain

Then in lists/bannedsitelist:

foobar.org		# disallow all hosts (at least 3) named *.foobar.org,
			#  regardless of whether or not they're in a subdomain
www.foobar.org		# DON'T DO THIS
			# disallow only host www.foobar.org if accessed by that name,
			#  yet allow access by the shortcut name foobar.org
bake.foobar.org		# disallow all hosts named *.bake.foobar.org
yuck.foobar.org		# disallow all hosts (at least 2) named *.yuck.foobar.org
ick.yuck.foobar.org	# disallow this specific host
bletch.yuck.foobar.org	# disallow this specific host

The above description can be collapsed to just these rules of thumb:

  1. Omit "http://"
  2. Omit “www”
  3. Omit any leading period
    (this may be different from some other software that won't work right without the leading period)
  4. Use the longest possible (i.e. most specific) entry that will work yet remain flexible
  5. If shorter entries already exist and they conflict with your new entry, first lengthen the existing entries (without making them inoperative)
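
The rules of thumb and the matching behavior in the example can be modeled in a few lines. This sketch mirrors the description above rather than DansGuardian's own code; the function names are made up, and stripping "https://" and any trailing path is an extra convenience not mentioned in the rules.

```python
def normalize_site(entry):
    """Clean up a site entry: drop the scheme, a leading 'www.', and leading dots."""
    site = entry.strip().lower()
    for scheme in ("http://", "https://"):
        if site.startswith(scheme):
            site = site[len(scheme):]
    site = site.split("/", 1)[0]  # keep only the host part
    if site.startswith("www."):
        site = site[4:]
    return site.lstrip(".")


def site_matches(entry, host):
    """True if host is the entry itself or any name inside the entry's domain."""
    host = host.lower().rstrip(".")
    return host == entry or host.endswith("." + entry)
```

So normalize_site("http://www.foobar.org") yields foobar.org, and site_matches("foobar.org", "ick.yuck.foobar.org") is true, while site_matches("www.foobar.org", "foobar.org") is false, which is the "DON'T DO THIS" case in the example.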

Operation Under NetBSD/FreeBSD/OpenBSD Is Somewhat Unreliable

For years there has been a low level but naggingly persistent series of reports that DansGuardian doesn't run as reliably under OpenBSD as it does under Linux. Most users never see any problem at all  …but a few unlucky ones do. Occasional failure of a DansGuardian child process may be tolerable, as recovery is automatic and the jerky operation is visible to only a single user. However frequent failure of all (or at least most) DansGuardian child processes, or failure of the DansGuardian parent process, will not be tolerable. (Also see questions Installation#26 and Installation#26b in the Wiki FAQ.)

To put it as briefly as possible (perhaps oversimplifying), BSD-derived kernels may need to be tuned in order to obtain stable DansGuardian operation. If there are kernel issues, DansGuardian is likely to start up and run for a while but then fail with a segmentation fault (SIGSEGV), usually in an “impossible” location. The kernel should be tuned for “peak” conditions; if it's been closely tuned for “average” conditions (or worse tuned to “minimize” kernel size), unstable DansGuardian operation is almost inevitable! Performance monitoring tools such as top may be misleading, as the spikes of activity that affect DansGuardian are far shorter than even the tools' minimum sample time.

Fortunately even though the problem has not yet been completely pinned down, it's fairly well understood. The current OpenBSD kernel doesn't handle a couple of conditions as well as a typical Linux kernel. One of those problem conditions is sustained high load; the other condition is long-lived processes whose memory address space gets very fragmented (usually because they handle lots and lots of different small requests). The three common applications that are most likely to expose these kernel limitations are Apache (the web server), Squid, and DansGuardian.

The programmers behind OpenBSD are very aware of these problems, and keep fiddling with the problematic parts of their kernel. As a result, the exact failure symptoms can change considerably from one kernel version to another. The kernel itself might panic (thus shutting down the entire system), or an application may just hang, or an application may disappear without proper warning or notice, or an application may shut itself down after receiving more failure return codes than it can handle.

The chief problems seem to be i) an apparent shortage of memory because of massive address space fragmentation and ii) a lack of socket structures. It may also be the case that iii) there are not enough “file descriptor” structures. Recoding applications to better handle the known OpenBSD limitations does not seem to be a reasonable option, as it would probably both a) require the tremendous effort of a complete rewrite and b) just trade stability under OpenBSD for instability under Linux.

Such problems are performance-related; the faster an application runs, the less time the kernel has to cover over these flaws before they grow large enough to become visible. Since performance is continually improved in most applications, later versions of most applications expose worse problems.

Problems are often noticed right after an application upgrade. Administrators focus more attention on the application right after an upgrade. The system load level has likely risen over time, but slowly so nobody noticed. Newer application versions typically provide somewhat better performance, making it more likely the kernel will exhibit problems. And other applications and services may have been changed at the same time. As a result of all these things, it's easy to wrongly conclude that the problems have something to do with a “bug” that was recently introduced into the application.

If you're one of the unlucky ones, you could of course either switch away from OpenBSD or learn to live with the occasional problem. But it's quite likely neither of these options is desirable. So what else can you do? Here's a thorough list of suggestions; most likely the first thing you should try is tuning the kernel, which by itself may completely resolve the problem. (Some suggestions mainly address high load, and probably won't help the memory fragmentation problem very much. Some suggestions mainly address process memory, and probably won't help very much if you suffer from frequent overloads.) Find the suggestions that fit your situation, and pursue them.

  • Upgrade your kernel
    Each kernel version seems to be an improvement over the previous one. (The problems may not be completely fixed yet though; problems have been reported on kernels at least as late as version 4.3 and perhaps later.)
  • Add RAM
    This helps the problem in three different ways. First, newer OpenBSD kernels (but not older ones) reconfigure themselves every boot depending on how much RAM they see; if there's more RAM, all the kernel configuration options are increased. Second, more memory allows applications to spread out a little more so problems don't become visible quite so soon. And third, more memory makes everything run a little faster, including the kernel which has a bit more time to repair small problems before they grow too large.
  • Purposely de-tune DansGuardian
    If DansGuardian doesn't handle requests quite as quickly, the kernel will have more time to cover its errors before they get out of hand. Whatever you did to improve DansGuardian performance, undo parts of it.
  • Cause DansGuardian child tasks to stop and restart more frequently (or alternatively less frequently)
    The idea behind stopping and restarting tasks more frequently is to reduce memory fragmentation and its subsequent problems. But restarting processes may not reduce memory fragmentation after all! So try this and see what happens. If there's no improvement, return the variables to their original values. Reduce minsparechildren, maxsparechildren, and preforkchildren. And reduce maxagechildren, perhaps to 300 or even 200. (Doing this will almost certainly have the side effect of reducing performance, perhaps noticeably.)
     Sometimes the opposite change of stopping and restarting child tasks less frequently will improve stability. So also experiment with increasing minsparechildren, maxsparechildren, and preforkchildren, and greatly increasing maxagechildren, perhaps to 10000 or even 50000.
  • Reduce the average load
    Maybe you can tune other things or provide other capabilities so your users don't hit the web quite so hard. Or maybe you'll just have to change your users' behavior  …if you can; if you're not sure, it may be better not to even try. Maybe users can be persuaded to drop their load just a few percent. But then again maybe they won't change and won't change and won't change until they suddenly drop out altogether (and even worse don't come back).
  • Cap the peak system load
    Use the DansGuardian maxips parameter to set a hard limit on how many computers can access the web at the same time. Set the number slightly lower than current peaks: high enough to not overly inconvenience users, but low enough to provide the desired reliability.
  • Manually tune the kernel
    This is frequently required for BSD-derived kernels.
     Perhaps all you need to do is increase maxusers (kern.maxusers ?) beyond 256, as most other parameters are connected to it. (Don't even do just this if you have a kernel that automatically adjusts maxusers at boot depending on how much RAM it finds.) If you want to get more detailed, consider increasing OPEN_MAX, BUFCACHEPERCENT, MAX_KMAPENT, NKMEMPAGES, and NKMEMPAGES_MAX, and decreasing NMBCLUSTERS.
     If just tailoring maxusers doesn't produce satisfactory results, you may need to go further and tune individual items. Pay attention to the number of network socket structures, the number of file descriptors, and the number of tasks. Especially (and perhaps surprisingly) pay attention to the number of shared memory structures. DansGuardian/Squid uses significantly more “shared memory” than most other server applications, including the IPC communication between DansGuardian and its backend proxy and the IPC communication between the DansGuardian parent and child processes.
     Remember you're tuning for peak conditions (not average conditions). Even a performance monitoring tool that displays every second won't show conditions that last less than 100 milliseconds. Yet these short load spikes on an inadequately tuned kernel may be the main cause of DansGuardian crashing.
     Only change runtime values; do not rebuild a kernel (except as a last resort if you really really know what you're doing). Rebuilding OpenBSD kernels is no longer recommended (or even acceptable in most cases). Currently manually re-tuning a BSD-derived kernel often involves either the sysctl command or modifying the file /etc/sysctl.conf.
  • Use an older version of DansGuardian
    If you have a borderline case (DansGuardian doesn't fail very often, and just a very small improvement in reliability would be enough), an easy way to slightly de-tune DansGuardian may be to run an older version which does not include recent performance improvements.
     (Note this may not be reasonably possible. All 2.10.x.x versions of DansGuardian provide such similar performance that changing would not be worthwhile. And the 2.8.x.x versions of DansGuardian are now several years old.)
  • Add an auto-restart capability (for example a 'cron' auto-restart script)
    Have a script wake up every few minutes and check if DansGuardian is still responsive. If not, stop both DansGuardian and Squid then start them fresh (Squid first then DansGuardian).
  • Add a 'cron' periodic restart script
    Have a script wake up once in a while (say every day or every few hours), and forcibly stop both DansGuardian and Squid then start them again (Squid first, then DansGuardian) no matter what. This will ensure that there's a properly functioning DansGuardian most of the time even if the server is unattended. It may however dramatically inconvenience users by aborting their web connection once in a while.
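
The auto-restart idea can be sketched as a small script run from cron. Everything specific here is an assumption: the function names are made up, the stop and start commands are passed in because they differ from system to system, and a bare TCP connect is only a weak stand-in for "responsive" (fetching a page through the filter would be a stronger check).

```python
import socket
import subprocess


def is_listening(host, port, timeout=5.0):
    """Return True if something accepts a TCP connection on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def restart_if_down(host, port, stop_cmds, start_cmds, run=subprocess.call):
    """If nothing answers on host:port, run all stop commands, then all start commands.

    Each command is an argument list such as ["/path/to/script", "stop"].
    Returns True if a restart was attempted, False if the port answered.
    """
    if is_listening(host, port):
        return False
    for cmd in list(stop_cmds) + list(start_cmds):
        run(cmd)
    return True
```

Following the order given above, stop_cmds would stop both DansGuardian and Squid, and start_cmds would start Squid first and then DansGuardian. The unconditional periodic restart is the same thing without the is_listening test.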