The original of this document is stored at http://contentfilter.futuragts.com/wiki/doku.php?id=faq
- Technical FAQ
(part of DansGuardian installation package)
These DansGuardian Wiki FAQs were updated in December 2009. They apply to the current 2.10.x.x versions of DansGuardian, which will probably be the only available version for the foreseeable future.
These DansGuardian Wiki FAQs are so closely related to the official ones they use the same names and numbering. However they have not been sanctioned by the DansGuardian team.
Complaint#1. Many DansGuardian systems started blocking (part of) my website at the same time for no apparent reason. What can I do about this?
Many DansGuardian systems subscribe to and use the same “blacklists” that categorize large numbers (millions!) of websites. If your website gets incorrectly categorized by one of these blacklists, many DansGuardian systems may start blocking it even though their administrators haven't taken any explicit action. Here's a lookup for the most common of these blacklists where you can type in the name of your website and see if this is indeed what has happened to you.
If your site appears on a blacklist, you need to follow the removal instructions for that list. DO NOT WRITE TO DANSGUARDIAN ASKING TO BE REMOVED. DansGuardian software development has no control over either the content or the use of any blacklist, and can't do anything about your request. (Note that even after your website is properly categorized by a blacklist, it may take several more weeks for the change to propagate to all the DansGuardians that use that blacklist.)
Complaint#1b. Many DansGuardian systems are blocking (part of) my website because they object to its content. But the content looks fine to me. How can I resolve this?
The easiest resolution to this uncommon problem is to tweak the content of your webpage so DansGuardian no longer finds it objectionable, perhaps by adding “good” phrases. Try to get a copy of the relevant log entry (or the access denied page) to find out which phrases DansGuardian is objecting to. And try to identify a web filter administrator who can test your tweaks. (Note DansGuardian uses a sophisticated scoring method called “weighted phrases” and doesn't just respond to a single occurrence of one particular phrase. See General#20 below.)
If you feel DansGuardian's standard/default list of words and phrases should be changed, you can submit phrase change requests at http://dansguardian.org/?page=contact. However even if your change is accepted it will affect only new DansGuardian installations; it will not affect existing installations.
Complaint#1c. A DansGuardian system is blocking (part of) my website. Why?
DansGuardian is not a service; there is no central organization controlling lots of web filters. Rather, different network administrators use the DansGuardian software tool in different ways. The only way to find out why any one particular DansGuardian is blocking your website is to contact the administrator of that particular DansGuardian. Most likely the administrator has configured additional phrase filtering that is either erroneous or too strict, and should correct it.
Complaint#2. Suddenly DansGuardian is blocking my web access. But I never installed DansGuardian. What's going on?
Usually this happens for one of the following reasons:
- If you're using wireless, without your realizing it your computer has attached itself to some neighboring network rather than your own.
- Some virus or jokester has modified the connections:proxy-server settings in your browser options.
- Your ISP has applied strict filtering to your web access, likely because some child filtering you requested was applied to the wrong connection or at the wrong time or behaves differently than you expected.
- If you're using a school or work network, the network is tightening its restrictions. Find out the new policy. And discuss the specifics of your problem with your network administrator, as this specific block may be just a technical mistake.
- Assuming it's Linux/Unix-based (but not Windows-based), your new computer (or new software distribution) by default runs DansGuardian locally …although you may not have realized it.
Complaint#3. My web browsing was interrupted by a display that said “If you have any queries contact your ICT Coordinator or Network Manager.” So I clicked on the nearest link, which brought me here. Can you help?
You will need to contact your web filter administrator, perhaps by some offline method such as telephone. Whoever installed your DansGuardian forgot to personalize the “Access Denied” page to suggest to users how to handle problems. There's a nearby (but in fact unrelated) link on the default “Access Denied” page that leads to the website of this general web filtering software, which of course has no way to address your problem.
Complaint#4. I used to email my homework to myself from home, then download it at school. But now I can't log in to webmail at all! What should I do?
Contact your network administrator and show them your problem. DansGuardian software is just a tool, used by different administrators for different purposes. Maybe your network policy is changing. Or maybe your problem is just an “accident” that can be repaired.
Complaint#5. I can't wait for the DansGuardian administrator to allow access to a webpage I need right away. What can I do on my own to access the website?
Check if your DansGuardian administrator has already provided a way to access websites right away. Study the “access denied” page closely, and ask other users if they know of a standard procedure. Try each of the following (in order).
- If your DansGuardian administrator provided a “bypass” link on the access denied page, just click it to acknowledge the DansGuardian warning and then access the page anyway. If your DansGuardian administrator provided on the “access denied page” some way to relax a site restriction, use it.
- If different classes of users have different filters (for example students and teachers have different web access), ask one of the more privileged users to try to display the web page for you.
- If you find out about some local procedure for accessing restricted pages (using a special computer, using a special identity, asking a librarian, etc.), follow it.
- Check if your DansGuardian administrator has authorized a deputy to allow page access when the administrator is absent, and if so contact that deputy.
- If nothing else works, and you can't reach any of the Information Technology staff, go somewhere else and use a different network with different restrictions.
(Trying to “hack” DansGuardian to access a website despite its restrictions thrusts you into a local cat-and-mouse game which you're not especially likely to win. Your DansGuardian administrator is probably very aware of the constant mutual escalation of hacks and counter-hacks, and has already implemented some local countermeasures.)
Complaint#6. When I try to browse the web with my new home PC, the content filter often objects with weird reasons like “Portuguese Pornography”. How can I tune the content filter to behave more appropriately?
A few new PCs (such as some Asus EeePC 900a's) come with a default mis-configuration. The problem you see might be only the tip of the iceberg, with many many other problems still submerged. (It's not known how extensive the mis-configuration is; it's not even known how to change it properly.) So although in theory DansGuardian can be tuned to behave more appropriately, in practice the only reasonable course of action seems to be to completely disable the DansGuardian content filter.
Complaint#7. When I try to browse the web with my new home PC, I receive nonsensical errors about bad content. What's really happening, and what can I do about it?
Many new PCs (such as the Asus Eee PC) give the buyer a choice of OS options: Windows or Linux. Most buyers apparently choose the Windows option, as the support channels focus there. It's true that DansGuardian currently can't run under Windows, so a support response that the filtering must have something to do with your ISP is understandable. However with the Linux option it's almost certainly wrong; it's almost certainly the case that your ISP does not have anything to do with the problem, and that DansGuardian is running locally on your new PC.
Be sure support understands you have the Linux option. If you're asking Linux questions but the support channel is giving you Windows answers, you can go around in circles for a long time without anybody realizing it, get very frustrated, and even accuse people of lying.
And check carefully the materials that came with your new PC for a Linux-specific (or even version-specific) support channel. Your Linux OS may have been added (or even overwritten) later by a reseller who offers their own support channel, but displays it less prominently than the manufacturer's support channel. It may be that only the reseller's support channel will know anything about the software they've installed (maybe even a different version of Linux!).
Complaint#8. I want to just completely disable DansGuardian on my new PC, but nobody else can give me any idea at all how to do so. Can you help me at all?
Either of these hacks, although not “clean” and likely not the “right” way to administer your PC, has been suggested to have the desired effect of disabling DansGuardian without messing up other applications too much:
Hack #1
- Press Ctrl-Alt-T to get a terminal.
- Get rid of the offending packages + configuration:
      sudo dpkg --purge dansguardian squid squid-common
- Get rid of the system forcing you through the (now missing) proxy:
      sudo rm /etc/policycontrol/firewall.conf
- Reboot the EeePC.
Hack #2
- Press Ctrl-Alt-T to get a terminal.
- Open the configuration file in an editor:
      cd /etc/dansguardian
      sudo nano dansguardianf1.conf
- Find the line that says:
      groupmode = 1
- Change the one to a two so the line is now:
      groupmode = 2
- Save the modified file.
- Reboot the EeePC.
(Doing this should cause DansGuardian to completely stop filtering. If instead you simply remove the DansGuardian package or executable, Firefox may stop working altogether. Apparently the proxy settings in Firefox are locked to expect to use DansGuardian, and only a “clean” procedure would unlock them.)
Complaint#9. Where might I obtain more support information about this problem?
Google is your friend. You might obtain some information from the distributor of your EeePC, or from something called “woot”, or from Asus, or from your OS distribution (see for example http://ubuntuforums.org/showthread.php?t=311089). The DansGuardian community knows almost nothing about the configuration of the Asus Eee PC (except that it's not generic DansGuardian) and so isn't able to offer much help.
General#1. I want differing degrees of filtering for different users. Can DansGuardian do this?
Yes. Multiple Filter Groups is a prominent part of DansGuardian and is frequently implemented. The effort invested now to set up Multiple Filter Groups will pay off handsomely later by simplifying yet enhancing ongoing administration. There is no other way except for Multiple Filter Groups to treat even just a few users (or computers) differently.
General#1b. I want some users to have access to “all” the web and some to have “no” access. How should I configure DansGuardian?
Extend your list of filter groups with an “all” group and a “no” group, then set up 'multiple filter groups' the normal way. With 2.10 a shortcut for these two filter groups, which both performs better and is easier to configure, is to use the groupmode configuration parameter in the relevant dansguardianfN.conf; that way you don't have to set a whole bunch of configuration lists just right. Usually the “no” filter group should be the default filter group f1, so unknown users get no web access at all until you firmly identify them.
(With the old 2.8 series, to configure a group with “no” access turn on blanket block [**, and maybe also blanket secure block: **s] in file 'bannedsitelist' for the group, and ensure all the 'exception…list' files for the group are empty [probably simply by setting each of them to /dev/null in the .conf file]. With the old 2.8 series, to configure a group with “all” access add a line that says simply .* into file 'exceptionregexpurllist', and ensure all the 'banned…list' files for the group are empty [probably simply by setting each of them to /dev/null in the .conf file].)
General#1c. Does DansGuardian use an ACL-like style of configuration?
Currently configuration of DansGuardian is done mostly with simple lists (not anything like “Access Control Lists”, for example as used in Squid). No DansGuardian feature is absolutely dependent on having any particular style of configuration though. (Some features may be either significantly easier to configure or significantly easier to code –or both– with certain styles of configuration. But having some other style of configuration does not simply make those features impossible.)
General#1d. I just want a “simple” difference in filtering (almost blanket block except one site, almost blanket exception except one site, all https:, all logins, only one computer, only one user, only one site, etc. etc.). Is there some other way besides the seemingly heavy-weight Multiple Filter Groups mechanism?
No. If different users (either computers or people) should be treated even slightly differently, you need Multiple Filter Groups. Period.
(If there are only two groups of filter rules, and one of them is either “all access” or “no access” with no exceptions, you may be able to fake the configuration by clever use of 'exceptioniplist' and 'bannediplist'. Otherwise you will need Multiple Filter Groups no matter what.)
General#2. When will NTLM authentication and Digest authentication be supported natively?
They already are. DansGuardian 2.10 supports both NTLM and Digest natively via its own authentication plugins. Of course like most user authentication methods, using NTLM or Digest authentication requires assistance from the local proxy, which should be configured to request and validate user credentials (auth_param and other Squid configuration options; maybe also “helper” modules, typically bundled with either the Squid release or a Samba release).
General#2b. Is a “sandwich” configuration necessary (or even recommended)?
Some earlier versions of DansGuardian provided NTLM and Digest authentication only via a “sandwich” configuration: User↔Squid↔DansGuardian↔Squid↔Internet (DansGuardian “meat” between two Squid “breads”, get it?). Such configurations are not necessary for either NTLM or Digest authentication with DansGuardian 2.10. Because they are more difficult to configure (disabling caching in the first copy of Squid, etc.), they are seldom even suggested.
General#3. Who are the developers?
See the Developers page.
General#5. Why does DansGuardian use some other backend proxy (Squid, Oops!, Tinyproxy, etc.)?
So DansGuardian does not have to re-implement web fetching, network timeouts, retrying, caching, password checking, etc. (The DansGuardian half of a DansGuardian/backend-proxy system is not really a proxy itself; it's more of a filtering pass-through.)
General#5b. Why is DansGuardian usually combined with Squid?
Squid seems to be the premier all-around OSS backend proxy for whole networks. Also, where web access is important enough to add filtering, speed is important too, so a caching backend proxy should be used. Squid fills the bill for a fast caching proxy.
(The relatively heavyweight Squid is most appropriate for use in filtering an entire network of computers [and for providing assistance with some Auth schemes]. For single-computer or single-user applications, marrying DansGuardian with some other backend proxy [Tinyproxy? Oops!?] may be more appropriate, for example providing better perceived performance.)
You must read the copyright notice if you wish to download this program.
General#7. Can I reverse the polarity?
If you mean, make DansGuardian filter out all the non-pornographic, non-profane sites to focus on only the real hardcore, then the answer is No.
But if you mean a “whitelist” style of operation, then the answer is Yes. Enable “blanket block” by un-commenting the settings ** and **s in the list file(s) bannedsitelist. Then add the sites you want to allow to greysitelist (or exceptionsitelist). (For blocking sites specified by IP address rather than by domain name, you probably wish to un-comment **ip as well as **, and **ips as well as **s.) Note that –despite what you might expect– the relevant settings are in a list file, not any conf file. When you use the “whitelist” style of operation you never need a “blacklist” subscription.
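The “whitelist” style described above boils down to a three-way decision for each requested site. The Python sketch below only illustrates that decision; it is not DansGuardian's own code. The function names are hypothetical, and it assumes that a list entry such as example.org also covers its subdomains, that exception sites skip further filtering, and that grey sites are still content-scanned after being fetched.

```python
def matches_site(host, site_list):
    """True if host equals a listed domain or is a subdomain of one (assumed rule)."""
    host = host.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in site_list)


def whitelist_decision(host, exception_sites, grey_sites, blanket_block=True):
    """Return 'allow', 'scan' or 'block' for one requested host."""
    if matches_site(host, exception_sites):
        return "allow"   # exception site: delivered without further filtering
    if matches_site(host, grey_sites):
        return "scan"    # grey site: fetched, but content filtering still applies
    # Everything else is rejected while blanket block is switched on.
    return "block" if blanket_block else "scan"
```

With blanket block on, only hosts named in one of the two lists ever get through, which is why no blacklist subscription is needed in this mode.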
General#8. Which PICS rating services does DansGuardian support?
DansGuardian supports rating services by watching for telltale phrases rather than by using .rat files. So adding more rating services is quite simple. Currently the rating services already set up in the default configuration are:
- Microsystems Software's Cyber Patrol CyberNOT List
- SafeLabel (The Safelabeling system is a subset of the META name rating system.)
(For upward compatibility, a few remnant capabilities for specifying various PICS filtering numeric parameters are still present in DansGuardian 2.10.x.x. But operationally they have been replaced by the “watching for telltale phrases” method, which is significantly easier to configure and use. These numeric parameters no longer do anything.)
General#9. Does DansGuardian check images (pictures) as well as words for pornography?
No, as doing this would take a lot of processing power (but it's a possible future feature). One way to do this might be to decode the image file and scan for high percentages of skin tone (I'm not sure how it would work for black or asian people who have a much darker skin tone though…). Another approach is to “bolt on” a separate tool, using the same content scanning mechanism as is used for anti-virus checking. If anyone wants to code this or point to a usable tool or point to a workable algorithm, contributions are welcomed.
General#11. What's the mailing list for?
All aspects of generic DansGuardian: Installation help, Usage help, GUI, Troubleshooting, Bug reporting, Suggestions, Offers of help, etc. Unlike with much OSS software, the generic DansGuardian mailing list is not de facto focused on just those running the latest versions of DansGuardian and/or building DansGuardian from source.
General#11b. Where should feature requests be submitted?
Feature requests are best added here: https://sourceforge.net/tracker/?group_id=131757&atid=722101
General#12. Are you against porn or something?
No. I am pro-free speech. I am anti-censorship. I am pro-classification. Nothing should be banned totally - ever. Everything should be classified so only what is appropriate can be viewed. I do not just mean web pages. I mean everything. DansGuardian applies classification where needed to web pages. An adult individual at home has every right to read, view and say what ever they want. A child in a school or library does not have this right. This is what I think. If you disagree - great; everyone is entitled to their own opinion.
General#14. Why is DansGuardian called 'true' content filtering?
Once a webpage URL passes pre-checks and is fetched, DansGuardian actually scans its content word by word against the configured phraselists. Some commercial web filters call themselves content filters when they are not - they are just glorified URL filters. They are lying through their teeth. People who don't realise this waste a lot of money on them.
General#15. Can DansGuardian do anti-virus filtering?
Yes. A standard feature in version 2.10 allows anti-virus checkers (and other content scanners) to be incorporated into DansGuardian. (Versions 2.8 and earlier required “the DGAV patch” to do anti-virus scanning. No patch nor special build or distribution is needed to use anti-virus scanning with version [2.9]2.10.)
General#15b. Are infected files quarantined or deleted?
With the newer method of incorporating anti-virus checking as an external scan (clamdscan), whether or not infected files are copied to quarantine is controlled by the configuration of the anti-virus scanning tool, not by DansGuardian. (With the DGAV patch and with the older integral [clamav with linked in ClamAV] method of including anti-virus checking, infected files were deleted.)
General#16. Is DansGuardian CIPA compliant?
Becoming CIPA compliant is a multi-step procedure that involves creating written policies, having open meetings for public input, technology planning and applying a technology protection measure. DansGuardian meets all of the requirements necessary for implementing a technology protection measure. If a school/library only installed DansGuardian and did none of the other steps, they would not be CIPA compliant, so technically, software cannot be labeled CIPA compliant. Legally, it is a significant distinction.
(Written by Tamara Georgick, Technology and Training Consultant, Washington State Library).
Also see this message on the subject.
General#17. Where can I get more information?
- There are a large number of informative comments in the default list and configuration files, which effectively act as a reference guide. (The comments in the DansGuardian default list and configuration files may be your best source of information; get used to looking in this different place, something you may not be used to doing with other software.)
- A thorough detailed technical FAQ (different from this document) is installed along with the DansGuardian binary. By default it will be found in “/usr/local/share/doc/dansguardian”.
- Some information on implementing various types of plugins (authentication, AV scanning, and download management) is installed in the same location as the detailed technical FAQ.
- There is a community-maintained wiki at http://contentfilter.futuragts.com/wiki
- Contact other DansGuardian users via the DansGuardian users public mailing list (http://tech.groups.yahoo.com/group/dansguardian/).
General#18. Can DansGuardian assign requests to different filter groups depending on other information, including group membership, IP address or address range or subnet, or LDAP entries?
Yes. Assignment to one of the multiple filter groups can be done many different ways, including by web browsing username, OS logon username, indirectly by OS group membership, by LDAP database entries, or by unchanging IP address, IP address range, or IP subnet. (Several of these choices are provided via the “IP” authentication plugin, which assumes unchanging [for example static] IP addresses.)
General#19. What exactly does DansGuardian check? Locations? Neighborhood? Search Terms? Words on the Page? Pictures on the Page? File Categories? Downloads? Uploads? Other Actions? Sequence of Requests?
DansGuardian separates out download and upload requests and handles them however its configuration specifies, typically allowing some actions and denying others, probably depending on the filename extension or its MIME type. All other individual web requests/transactions are subjected to DansGuardian's two stages of filtering:
- In the first stage (sometimes called triage or pre-filter), the URL is vetted against configuration lists to see if it should even be accessed. If not, the request is rejected immediately; if so, the web content is downloaded into DansGuardian but not yet made available to the user.
- In the second stage, DansGuardian uses its weighted phrase list algorithms to scan the word combinations on the fetched page. If not too much objectionable content was found, the page is then delivered to the user.
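The two stages above can be pictured as a short pipeline. This Python sketch is an illustration under stated assumptions, not the real implementation: url_is_banned, fetch and score_page are hypothetical callables standing in for the configuration lists, the backend proxy and the weighted phrase scan.

```python
def filter_request(url, url_is_banned, fetch, score_page, naughtyness_limit):
    """Return (verdict, body); body is None unless the page is delivered."""
    # Stage 1 (pre-filter): decide from the URL alone whether to fetch at all.
    if url_is_banned(url):
        return ("denied-by-url", None)

    # The content is pulled into the filter but not yet shown to the user.
    body = fetch(url)

    # Stage 2: score the fetched content and compare it with the limit.
    if score_page(body) > naughtyness_limit:
        return ("denied-by-content", None)
    return ("allowed", body)
```

Note that a request rejected in the first stage never reaches the backend proxy, so it costs no download at all.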
DansGuardian does not do any of these directly:
- vet web search terms
- check whether the URL is “near” objectionable URLs
- scan pictures routinely
- block whole categories of files regardless of name or MIME type (for example “email attachments”)
- link requests together into “webpages”
- block particular actions other than upload/download (for example “chatting”)
- link sequences of requests together (for example “web searching” followed by “playing a game”)
- scan parameters inside POST requests other than upload/download
Although DansGuardian doesn't do these filtering operations directly, it's usually possible to get the desired net result anyway by clever configuration (sometimes utilizing unobvious bits of information from the log, sometimes using View:Source to examine HTML, occasionally crafting rather hairy regular expressions, and once in a while even examining packets with a network sniffer program). However, most such hacks need to be re-created for each particular website, as they seldom transfer to similar websites very well. And it's common for such hacks to stop working whenever the website is reorganized, something most websites do at least every couple of years if not more often.
General#20. The center of DansGuardian seems to be its weighted phrase filtering. What does “weighted phrase filtering” really mean?
DansGuardian scans all the words on each webpage, looking for matches against its list of “phrases”. (In DansGuardian a “phrase” can be any sequence of one or more words, for example “Online Casino”.) Each time a phrase is found, its associated score is added to the page total. In other words, if the same phrase is found more than once, its associated score will be added to the page total again every time it's found. If it's a “good” phrase, the associated score will be negative, so it will effectively be subtracted from the page total. This allows one phrase (for example “breast cancer”) to offset another that would otherwise nix the page. The page total is then compared against the naughtynesslimit you configured; the page is accepted if its total score is lower and is rejected if its total score is higher.
DansGuardian comes with long lists of pre-configured scored phrases which are already broken into categories. You can choose to enable or to disable each category of phrase by simply un-commenting or commenting-out the .Include reference to it in file weightedphraselist. (You could also modify or add your own phrases and scores, although doing this is not so common.)
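The scoring rule described above can be sketched in a few lines of Python. This is a simplified illustration, not DansGuardian's actual matching code: the phrases and weights are made up, matching is plain case-insensitive substring counting, and treating a score exactly equal to the limit as acceptable is an assumption.

```python
def page_score(text, weighted_phrases):
    """Sum each phrase's weight once per occurrence found in the text."""
    lowered = text.lower()
    total = 0
    for phrase, weight in weighted_phrases.items():
        total += lowered.count(phrase.lower()) * weight
    return total


def is_blocked(text, weighted_phrases, naughtyness_limit):
    """Reject the page only when its total exceeds the configured limit."""
    return page_score(text, weighted_phrases) > naughtyness_limit


# Hypothetical weights: two "bad" phrases and one "good" (negative) phrase.
EXAMPLE_PHRASES = {"online casino": 40, "breast": 30, "breast cancer": -60}
```

With these example weights, a page mentioning “breast cancer” twice scores 2 × 30 + 2 × (−60) = −60, so the “good” phrase more than offsets the “bad” one, while a page repeating “online casino” twice scores 80 and would exceed a limit of 50.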
General#21. I want my web filter to behave differently at different times of day. Can DansGuardian do this?
Yes, the simplest way to have the filter behave differently at different times of day is to use the time limiting syntax inside most list files. This method does not require restarting DansGuardian at all …not even a “soft” restart with -g. Specifying times inside most list files will allow those files of filter restrictions to be made active only at certain times of day. Comments within most list configuration files provide a guide for adding time of day restrictions to that particular file. (If there are no comments about time limiting syntax inside a specific file, the behavior is not supported for that particular file.)
The time limiting syntax within a file is applied to all items in that file; there is no syntax for limiting individual items. So the conventional procedure is to add .Include statements to a list file which point at newly created additional list files which each contain their own time limits. For example, two newly created list files might be one for “sites that are always banned” and another for “sites that are banned only during business hours”. (The former contents of the existing base list file are usually distributed into the new list files.)
(Note this method may not be suitable for changes to items in a conf file, may not work for more than one time block per day, and may not be the best way to handle massive changes affecting many list files simultaneously. See the items below for other alternatives.)
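To make the idea of a per-file time limit concrete, here is a minimal Python sketch of the check involved. It assumes a single window per file given as start hour, start minute, end hour, end minute and a string of weekday digits with 0 meaning Monday; that layout is an assumption for illustration, and the comments in your own list files remain the authority on the real syntax.

```python
from datetime import datetime


def window_active(now, start_h, start_m, end_h, end_m, days):
    """True if 'now' falls inside the one daily window on an enabled weekday."""
    # datetime.weekday() numbers Monday as 0 and Sunday as 6.
    if str(now.weekday()) not in days:
        return False
    minutes = now.hour * 60 + now.minute
    return start_h * 60 + start_m <= minutes < end_h * 60 + end_m


# A "business hours" list: 09:00 to 17:00, Monday to Friday.
business_hours = (9, 0, 17, 0, "01234")
```

Because the window applies to the whole file, a list for “sites that are banned only during business hours” would carry this window, and a separate list for “sites that are always banned” would carry none.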
General#21b. I want to use different list files (not just turn them on or off) at different times of day. How can I do this?
One option is to use DansGuardian's .Include to split a list into more than one file. Then use clever settings of the “time” instructions within each file to enable one or the other (or both or neither) at different times of day.
A second option is to have the conf file point at a symbolic link which OS tools (rm …, ln -s …, etc.) manipulate before restarting DansGuardian. The symlink change is generally performed as part of a 'cron' job (one launched automatically at a certain time of day) that both changes configuration and restarts DansGuardian.
(A 'cron' job can also restart DansGuardian with a different configuration altogether, one that points at different list files. Use the -c … option to point DansGuardian at a different configuration [rather than the default one]. [For this option you may need to figure out how to specify options such as -c … through whatever mechanism your distribution uses to start services (daemons).])
The major factor in choosing which mechanism to use is your own comfort level. Some administrators find 'cron' jobs drop-dead simple, while other administrators are allergic to 'cron' jobs and much prefer the convenience of time-of-day instructions inside DansGuardian list files.
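For the symbolic link option, the part a 'cron' job has to get right is swapping the link without leaving a moment where it is missing. The Python sketch below shows one way to do that on a POSIX system; the function name and the file names in the usage note are hypothetical, and restarting DansGuardian afterwards is deliberately left out because how to do so depends on your distribution.

```python
import os


def repoint_symlink(link_path, new_target):
    """Point link_path at new_target, replacing any existing link in one step."""
    tmp_path = link_path + ".tmp"
    # Clear out a leftover temporary link from an earlier interrupted run.
    if os.path.lexists(tmp_path):
        os.remove(tmp_path)
    os.symlink(new_target, tmp_path)
    # Renaming over the old link is atomic on POSIX filesystems.
    os.replace(tmp_path, link_path)
```

A job scheduled for the start of business hours might call repoint_symlink("bannedsitelist.active", "bannedsitelist.daytime") and then restart DansGuardian; a second job in the evening would point the link back.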
General#21c. I want to turn a whole bunch of list files on and off all at once. What's another way to do this?
Set up a 'cron' job (one launched automatically at a certain time of day) to stop DansGuardian, then restart it with a different configuration that points at the alternate list files. Use the -c … option to point DansGuardian at a different configuration (rather than the default one). (You may need to figure out how to specify options such as -c … through whatever mechanism your distribution uses to start services [daemons].)
General#21d. I want to change one or more items specified in a conf file at different times of day. What's a way to do this?
Again, set up a 'cron' job (one launched automatically at a certain time of day) to stop DansGuardian, then restart it with a different configuration. Use the -c … option to point DansGuardian at a different configuration (rather than the default one). (You may need to figure out how to specify options such as -c … through whatever mechanism your distribution uses to start services [daemons].)
General#22. Can DansGuardian assign requests to different filter groups depending on MAC address?
No. DansGuardian as distributed does not use MAC addresses anywhere, including in any of its authentication plugins. (You could in theory code your own authentication plugin using MAC addresses, but it would be quite restricted.)
MAC addresses of browsing clients are only visible in the one network “hop” that terminates at the browser itself. MAC addresses are not carried end-to-end by IP, nor are they carried by the HTTP protocol. Thus no similar application could possibly reliably use MAC addresses. (The Squid capability, which at first seems to be an exception to this rule, in fact breaks down completely and becomes unusable in any network large enough to contain “level 3 devices” such as “IP routers”.)
General#23. Can I combine auth methods to assign requests to different filter groups by a combination of username and IP address?
Probably not usefully. Using more than one authentication method is focused toward environments of mixed user OSs (some Windows, some Linux, etc.), and is rather unusual otherwise. Attempts to use two auth methods simultaneously raise issues of precedence and conflict which probably won't be handled satisfactorily.
General#23b. More generally, can I intelligently chain auth methods so each hands off to the next one in an order I specify? In other words can I set up some sort of “fallback” mechanism with multiple auth methods?
Not really. Auth methods are indeed used in the order you specify and do indeed fall through to the next method on failure or mismatch or incompletion. But these rather limited capabilities aren't enough to compose intelligent chaining, partly because some auth methods (BASIC, Ident, and IP) are “one-pass” while others (Digest and NTLM) are “two-pass”.
(Even the seemingly straightforward case of identifying people if possible, then simply falling back to identifying the computer if the person is unknown, may require modifying source code just to make it work at all. And even then, it will probably mis-handle mistyped passwords so badly that it's effectively unusable.)
General#23c. What combinations of auth methods do work reliably?
There is one case of chaining two auth methods together that does work reliably. The first auth method must be “one-pass”, must use only computer-supplied information (so there is no chance of a user ever mistyping or delaying a password), must always provide the same answer immediately every time it's repeated, and must be configurable in Squid too. The second auth method is not so restricted; for example it may be “two-pass”.
If the first auth method identifies the user or computer, that identification will be used to assign the session to a filter group. But if the first auth method cannot identify the user or computer, the session will be handed off to the second auth method, which will be expected to identify all remaining users or computers, so that every session can be assigned to a filter group. You can use this capability for example to assign all computers on one IP subnet to a filter group immediately, while requiring all users on all other computers to provide credentials acceptable to the local proxy (for example a “Squid helper program”).
The Squid configuration must exactly match the DansGuardian auth configuration, something that's straightforward for one auth method but may appear more complicated for two auth methods. Also where IP is one of the DansGuardian auth methods, you will need to pass the original client IP address through to Squid (otherwise all requests will appear to come from 127.0.0.1). This specific example authorizes all the computers on subnet 192.168.3.x immediately, then requires all other computer users to enter their credentials.
    ...
    forwardedfor = on
    ...
    # Auth plugins
    authplugin = '/etc/dansguardian/authplugins/ip.conf'
    authplugin = '/etc/dansguardian/authplugins/proxy-basic.conf'
    ...

    # list of allowed IPs
    192.168.3.1-192.168.3.254=specialips

    ...
    follow_x_forwarded_for allow all # probably already on by default
    ...
    auth_param basic program ...
    auth_param basic children ...
    auth_param basic realm ...
    auth_param basic credentialsttl ...
    auth_param basic casesensitive ...
    ...
    acl password proxy_auth REQUIRED
    acl specialips src 192.168.3.0/24
    ...
    # note the order of the following two lines,
    # specialips first here to match IP auth being first in dansguardian.conf
    http_access allow specialips
    http_access allow password
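The fall-through behavior described in this answer can be modeled in a few lines. The following Python sketch is only an illustration of the decision order, not DansGuardian's or Squid's actual code. The subnet mirrors the example above, while the function name and the returned labels are invented for this sketch.

```python
import ipaddress
from typing import Optional

# Assumed value mirroring the example: one subnet is identified by IP alone.
SPECIAL_SUBNET = ipaddress.ip_network("192.168.3.0/24")


def assign_group(client_ip: str, proxy_user: Optional[str]) -> str:
    """Mimic the two-stage decision: IP first, proxy credentials second."""
    # First method: one-pass, computer-supplied, same answer every time.
    if ipaddress.ip_address(client_ip) in SPECIAL_SUBNET:
        return "specialips"
    # Second method: must identify everyone the first method did not.
    if proxy_user is not None:
        return f"user:{proxy_user}"
    # Nobody identified yet: the proxy would challenge for credentials.
    return "challenge"


print(assign_group("192.168.3.17", None))
print(assign_group("10.0.0.5", "alice"))
```

Note that the first check never depends on anything a person types, which is what makes this one combination dependable.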
General#24. Can DansGuardian log what would have been blocked, yet not actually block anything?
Yes. In fact you can do this separately for some filter groups while other filter groups operate normally. Going through all the motions but not actually blocking anything is sometimes called “stealth” mode or “learn” mode (or “surveillance” mode). To enable it for a filter group, simply configure reportinglevel = -1 in that filter group's dansguardianfN.conf. Information about what would have been blocked is written to the DansGuardian log. It will be summarized by log analysis programs just as though pages had actually been blocked.
General#25. Can DansGuardian allow a user to proceed despite a block?
Yes, DansGuardian provides a “bypass” feature. (Note though filtering different users differently with 'multiple filter groups' often pretty much removes the need to use this “bypass” feature.) DansGuardian's style in handling a bypass is “click to acknowledge”; an additional click is required, essentially asking “are you sure?” Simply requiring an additional conscious decision this way might meet your public access requirements.
DansGuardian can be set up to provide its bypass option just with its regular bypass and bypasskey configuration options, and it can deliver both the blocked message and the bypass option to the user at the same time simply via the -BYPASS- variable in the regular HTML “access denied” template. Like “stealth” mode, the bypass functionality can be enabled (or disabled and made invisible) separately for each filter group.
General#25b. Can the DansGuardian bypass capability be protected by a password?
Yes, DansGuardian provides two different ways to require a password to use its bypass capability.
- The newer way uses 'multiple filter groups'. Typically multiple filter groups requires a password when a web session is first started in order to select the right filter group; then it's not necessary to re-enter the password every time a bypass is desired. (You can present the bypass option on the “access denied” page to only some of your filter groups and not others.)
- The older way uses a CGI script on a separate website to generate the required “hash”. The older way may still be appropriate if you're very concerned about user anonymity and if your users don't already have their own userid and password.
General#26. How does DansGuardian handle media streams such as Internet telephone calls or real time video?
Once they are established, media data streams mostly use different ports than the web, and so don't go through DansGuardian at all. Their establishment though often uses the web. So by controlling whether the establishment pages are allowed or blocked, DansGuardian can often control whether media data streams get established in the first place.
General#27. Is there a mailing list for generic DansGuardian? What are appropriate subjects?
Yes, there's a DansGuardian mailing list. Any issue that a generic DansGuardian administrator might have is appropriate. Any issue a user of the Webmin DansGuardian module might have is also appropriate. Unlike many mailing lists, queries are welcome even about pre-built or older versions; the DansGuardian mailing list is not “geeks only”. (However distribution-specific queries are usually better targeted to a forum about that distribution; queries about DansGuardian running on any distribution that adds its own configuration tools for many applications [not only the OS] should be directed to that distribution.)
Whatever you send to the mailing list address is repeated to every one of the thousands of members of the mailing list. If you look closely at received messages, you can tell they came via an email list. But if you don't look carefully, you can mis-conclude you're getting email from people you never heard of.
The idea behind a mailing list is you as a reader will sometimes have the same question as a poster. You'll benefit from reading the answers even though you didn't post the question yourself. Furthermore, even eavesdropping on questions and answers that aren't immediately applicable to your situation will (hopefully) be educational.
General#28. What are the mechanics of the DansGuardian mailing list? What “gotchas” are there, and how can they be avoided?
- Only list members can post to the list (most mailing lists work this way these days). The most common reason why all your posts disappear is your email address is not subscribed to the list.
- Choose the once a day “digest” summary distribution option if you expect to respond rarely or not at all.
- In your email interface program, specify a rule that diverts everything with [dansguardian] in its subject to some alternate inbox you've created. And instruct your email interface program to sort messages in the new folder “by thread”.
- Always use the same email address when posting. And either post from only one computer, or use some kind of Webmail.
- An easy way to avoid potential problems (appearing to be a “junk” source, getting stuck in SPAM purgatory, etc.) is to create a Yahoo webmail account and use it just to subscribe to the mailing list.
- When you post a question, start the new email conversation with a brand new 'Subject:'; don't just “reply” to some previous posting.
- When you “reply” to an email, snip out the parts of the reply quote that aren't important or aren't very useful.
- Reply only very seldom, and only if you have something to add to the conversation. If you don't know the answer to the poster's query, just stay hidden (don't reply “I dunno”; if everybody did this, all the unhelpful responses would drown out the next question).
- If you became a member of the mailing list, but then had second thoughts, to unsubscribe either click on the “unsubscribe” hyperlink at the end of each message or follow the directions at the bottom of the mailing list web page. Do not post your unsubscribe request to the mailing list itself. It won't work. Worse, it will annoy all the other mailing list members (and may get you categorized as a hopeless noob too).
- The mailing list is about generic DansGuardian. Queries about any distribution-specific configuration tool should be directed to that distribution rather than to this mailing list.
General#29. Does DansGuardian support HTTP 1.1/Persistent Connections?
Let's break down the question:
- Does DansGuardian support persistent connections? YES
- Does DansGuardian support all aspects of HTTP 1.1? NO
- Does DansGuardian convert browser HTTP 1.1 requests into webserver HTTP 1.0 requests? YES
- Does DansGuardian appear to webservers to be an HTTP 1.1 client? PROBABLY NOT
(The popular interchangeability of the terms “HTTP 1.1” and “Persistent Connections” is a somewhat ambiguous over-generalization.)
HTTP 1.0 connections are assumed to be non-persistent unless they specifically request persistence with the Connection: Keep-Alive (or Proxy-Connection: Keep-Alive) header. HTTP 1.1 connections are assumed to be persistent unless they specifically decline persistence with the Connection: Close (or Proxy-Connection: Close) header. DansGuardian downgrades all connections to HTTP 1.0, because HTTP 1.1 has various features (chunked encoding, byte range retrieval) which it either doesn't support or which could potentially present a way around the filter. It does its best to preserve persistent connection support, though, by marking any HTTP 1.0 request as requesting persistence if it was translated from a normal HTTP 1.1 request.
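The persistence defaults and the downgrade step can be written out as a small model. This Python sketch is not DansGuardian's implementation. It assumes header names are already normalized to the capitalization shown and that the header carries a single token, both of which are simplifications.

```python
def wants_persistence(version: str, headers: dict[str, str]) -> bool:
    """Apply the default persistence rules to one request."""
    conn = headers.get("Connection", headers.get("Proxy-Connection", "")).lower()
    if version == "HTTP/1.1":
        return conn != "close"        # persistent unless declined
    return conn == "keep-alive"       # HTTP 1.0: only if requested


def downgrade(version: str, headers: dict[str, str]) -> tuple[str, dict[str, str]]:
    """Rewrite a request as HTTP 1.0 while preserving its persistence intent."""
    out = {k: v for k, v in headers.items()
           if k not in ("Connection", "Proxy-Connection")}
    out["Connection"] = "Keep-Alive" if wants_persistence(version, headers) else "Close"
    return "HTTP/1.0", out


print(downgrade("HTTP/1.1", {"Host": "example.com"}))
```

An ordinary HTTP 1.1 request with no Connection header comes out as HTTP 1.0 with an explicit Keep-Alive, which is the translation the answer describes.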
General#30. Can DansGuardian limit the amount of time each user can websurf each day? For example can I configure DansGuardian to restrict users to only one hour of browsing each day?
No, DansGuardian neither tracks nor uses time-per-user. (You may though be able to combine DansGuardian with some other software.)
Although DansGuardian can also be used on a single system to filter web content for just that one system, it's commonly used for filtering web content for an entire network with many systems and many (visiting) users. In the whole-network context, the concept of time-per-user is less relevant.
General#31. Which languages and character encodings does DansGuardian support?
DansGuardian is capable of supporting all alphabets, languages, and character encodings equally. Its processing is of simple sequences of raw bytes; it remains ignorant of the actual alphabet and/or language and/or encoding used by either any particular web page or any particular phrase configuration file.
(Note although many users choose to not go beyond the generic “default” phraselists, DansGuardian functionality is not limited to them. Currently the generic “default” phraselists are skewed toward a Roman alphabet, the English language, and a Latin1-style character encoding.)
Configuration may involve uncommon text editor functionality, incorrect or abbreviated displays, additional .Include statements, character encoding translation programs, environment settings, and the system default locale setting. You should thoroughly understand DansGuardian operation, in particular its interactions with language and encoding, before attempting to extend the configuration to a new web environment.
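Because matching works on raw bytes, a phrase only matches a page when both are stored in the same character encoding. The Python sketch below illustrates that consequence with a made-up phrase and page. It models plain byte search, not DansGuardian's weighted phrase engine.

```python
phrase = "caf\u00e9"  # the word "cafe" with an accented final e

# The same text is a different byte sequence in each encoding.
utf8_phrase = phrase.encode("utf-8")       # b'caf\xc3\xa9'
latin1_phrase = phrase.encode("latin-1")   # b'caf\xe9'

page_utf8 = "Le caf\u00e9 est ouvert".encode("utf-8")


def matches(phrase_bytes: bytes, page_bytes: bytes) -> bool:
    """Raw byte search, with no knowledge of alphabet or encoding."""
    return phrase_bytes in page_bytes


print(matches(utf8_phrase, page_utf8))    # True: encodings agree
print(matches(latin1_phrase, page_utf8))  # False: same word, different bytes
```

This is why a phrase list saved in one encoding can silently fail against pages served in another.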
Installation#1. What are some common installation problems that can cause DansGuardian to appear to not filter and/or not log?
- Flush browser caches. Redisplay of previously accessed webpages from a browser's cache can be extremely confusing. Always click the browser's “refresh” button (while holding down CTRL?) to force a reload before concluding anything.
- Every time you modify any of DansGuardian's …conf or …list or template… (or other language) files, stop and restart DansGuardian so it reads all your changes.
- Test from an end user computer, not from the DansGuardian or Squid or gateway or firewall computer.
- There should be a communications path through various ports all the way from the user's browser to the Internet. DansGuardian should be listening (on port 8080) and accept connections from browsers (redirected from port 80 if transparent-intercepting). The other side of DansGuardian should then connect to Squid (via port 3128). Squid should listen to DansGuardian (on port 3128), and the other side of Squid should reach the Internet via port 80.
- Very good DNS (Domain Name System) service should be provided to both all the end user computers and the DansGuardian/Squid computer.
- Browsers should connect to DansGuardian. In explicit-proxy configurations, the user's browser should have its proxy setting pointing to port 8080 on the DansGuardian computer. In transparent-intercepting configurations, the user's browser should not be set to use a proxy at all, rather a Shorewall/IPtables rule on the gateway/firewall should redirect the network traffic from port 80 into port 8080 on the DansGuardian computer (often the same computer).
- Comment-out or remove any example IP addresses from the files bannediplist and exceptioniplist. If an example is present and it happens to match your own IP address, DansGuardian operation may be puzzling.
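One quick way to check the listening ports in the communications path is to try opening a TCP connection to each. The following Python sketch does only that. The function names are invented for this example, and the ports are the ones mentioned in the list above. If Squid has been restricted to 127.0.0.1, its port will correctly appear unreachable from any other machine.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_path(filter_host: str) -> dict[str, bool]:
    """Check the two listening ports named in the text on one machine."""
    return {
        "DansGuardian (8080)": port_open(filter_host, 8080),
        "Squid (3128)": port_open(filter_host, 3128),
    }
```

Calling check_path() with the address of your DansGuardian/Squid machine (for example a hypothetical 192.168.1.1) from an end user computer shows which listener, if any, is missing.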
Installation#2. Why doesn't the Ident method of authentication for multiple filter groups seem to work in transparent-intercepting configurations?
Ident authentication does work in both explicit-proxy and transparent-intercepting configurations. Ident authentication can sometimes be a bit tricky to install and configure, and so contribute to the impression that it simply doesn't work. For example, even one internal network infrastructure component that doesn't pass network traffic via port 113 can cripple Ident authentication. If you're having trouble with Ident authentication, it may help to look at the Ident instructions in the doc wiki or posting #885 in the mailing list archives.
Installation#2b. Why do my web users experience very long delays and/or get mis-assigned to the default filter group when I try to use the Ident method of authentication?
In order to successfully use the Ident method of authentication, your entire internal network must pass port 113 traffic. Commonly the default configuration of firewall software on each individual computer causes a problem; in particular the firewall built into Windows by default blocks port 113 traffic to the local computer. Port 113 must be opened in every local firewall in order for you to use the Ident method of authentication successfully. (You may also need to open port 113 in your network infrastructure devices, especially “routers”.)
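To see why a blocked port 113 causes delays and mis-assignment, it helps to look at the exchange itself. Under the Ident protocol (RFC 1413) the querying side connects to port 113 on the user's computer, sends one line naming a pair of ports, and waits for a one-line reply. The Python sketch below formats such a query and parses a reply. The function names are invented here, and the sample lines are the ones given in the RFC. If the query never gets through, there is simply no reply to parse until a timeout expires.

```python
from typing import Optional


def ident_query(port_on_identd_host: int, port_on_querying_host: int) -> bytes:
    """Format the one-line query sent to port 113 on the user's computer."""
    return f"{port_on_identd_host}, {port_on_querying_host}\r\n".encode("ascii")


def parse_ident_reply(line: str) -> Optional[str]:
    """Return the user id from a USERID reply, or None for an ERROR reply."""
    fields = [f.strip() for f in line.strip().split(":", 3)]
    if len(fields) == 4 and fields[1] == "USERID":
        return fields[3]
    return None


print(ident_query(6193, 23))
print(parse_ident_reply("6193, 23 : USERID : UNIX : stjohns"))
print(parse_ident_reply("6195, 23 : ERROR : NO-USER"))
```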
Installation#4. What should be a key consideration when sketching out a DansGuardian installation?
Key consideration: Do I want a transparent-intercepting or an explicit-proxy style configuration?
It can make a big difference in network operation, and it's not so easy to change later. Some of the configuration options for one style don't do anything in the other style environment.
Transparent-intercepting environments protect the whole network all at once. Computers (even unknown ones and mobile ones) can casually move on and off the network with no reconfiguration, and they can use uncommon (or even multiple) web browsers with no special configuration. However secure/https/443 web traffic (and web traffic using ports other than 80 too) cannot be filtered even minimally. Explicit-proxy environments protect each computer individually. Even secure/https/443 connections can be controlled to some extent. However every web browser will require “proxy” settings. Fortunately, this can sometimes be accomplished with PAC or WPAD, or a more general Managed Browser Settings scheme such as Group Policy.
(Conventional wisdom may be that the most significant problem with explicit-proxy environments is that unknown and mobile computers casually joining the network can appear to have reached a “dead end”; web browsing won't work and the user will have no idea why or what to do about it. In fact this is wrong; a “network billboard” can greatly reduce or even eliminate “dead end” syndrome.)
Installation#5. Which machine should DansGuardian/Squid be installed on?
For a single user system, DansGuardian/Squid should be installed directly onto the single computer. For a network of users, DansGuardian/Squid is often installed on the gateway/firewall machine itself. DansGuardian/Squid can also be installed on a separate server, rather than directly on the gateway/firewall server. The separate DansGuardian/Squid server can either be on the LAN itself (probably near the gateway/firewall), or in a DMZ (De-Militarized Zone, i.e. half in/half out).
The slight additional complication of using a separate server may be necessary for any one of several reasons:
- limited gateway/firewall hardware cannot support the additional processing
- the gateway/firewall is a closed system
- policy or very high security concerns disallow running anything directly on the gateway/firewall
- a very large network requires more than one DansGuardian/Squid machine
Installation#6. Can DansGuardian filter the content of https: (encrypted/443/ssl/tls) traffic? Can it even control https: access to websites?
→ Access control depends on which configuration family you've chosen. ←
In explicit-proxy environments, DansGuardian uses its configured lists of sites (bannedsitelist, exceptionsitelist, blacklists) to vet connections for both http: and https: traffic (provided the https: traffic goes through DansGuardian). However the URL path and the content are encrypted so they cannot be analyzed (or even logged). In other words …urllist, …regexpurllist, and weighted… do not apply to https: traffic, not even in these environments.
In transparent-intercepting environments, https: traffic doesn't go through DansGuardian at all; DansGuardian doesn't even have access to the website name. (This is a fundamental restriction, not something that can be “fixed” - redirecting port 443 traffic into DansGuardian won't work.) DansGuardian options that control https: filtering (for example “Blanket SSL Block”) have no effect at all in these environments.
(Inability to look inside encrypted traffic is a generic restriction, not something specific to DansGuardian. After all, if some man-in-the-middle could intercept and analyze the traffic [and see your credit card number], it wouldn't really be “secure”, would it? Currently although there are a few commercial products that begin to address this issue, no open source software can scan encrypted content.)
Installation#6b. Why in techspeak can't https: traffic be transparently redirected through DansGuardian?
Sometimes it doesn't seem to make sense that https: traffic can't be transparently redirected through DansGuardian/Squid. Why should it make any difference whether traffic was sent to some port by the browser or was redirected to that port by an IPtables rule? In fact it does make a difference; here's a detailed inside explanation of why:
The browser behaves in a fundamentally different way when contacting an https: site through a local proxy such as DansGuardian/Squid, because it makes a small, unencrypted request to the local proxy itself, asking the proxy to open up a “tunnel” to the site for it to send encrypted traffic through. But in order to do this, the browser has to know that a local proxy is in use, which means the proxy settings have to be configured directly in the browser. Redirecting port 443 at the firewall level won't work, because DansGuardian/Squid won't understand the already-encrypted traffic which gets redirected to them.
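The difference shows up in the very first bytes a proxy receives. The Python sketch below contrasts the small plaintext tunnel request a proxy-aware browser sends with the opening bytes of an encrypted session that has merely been redirected. The function name and sample data are made up for illustration; the request line follows the standard HTTP CONNECT form.

```python
def visible_to_proxy(first_bytes: bytes) -> str:
    """Describe what a proxy can learn from the first bytes of a connection."""
    try:
        request_line = first_bytes.split(b"\r\n", 1)[0].decode("ascii")
        method, target, _version = request_line.split(" ")
    except (UnicodeDecodeError, ValueError):
        return "unreadable: not an HTTP request"
    if method == "CONNECT":
        host, _, port = target.rpartition(":")
        return f"tunnel request for host {host} port {port} (no path, no content)"
    return f"plain request: {method} {target}"


# Explicit proxy: the browser first sends a small unencrypted CONNECT.
explicit = b"CONNECT www.example.com:443 HTTP/1.1\r\nHost: www.example.com:443\r\n\r\n"
# Transparent redirect: the first bytes already belong to the encrypted session.
redirected = bytes([0x16, 0x03, 0x01, 0x02, 0x00, 0x01, 0x00, 0x01, 0xFC, 0x03, 0x03])

print(visible_to_proxy(explicit))
print(visible_to_proxy(redirected))
```

In the first case the site name is available for list checks; in the second there is nothing the proxy can interpret.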
Installation#7. Building DansGuardian seems to require setting up a full development environment (development versions of the standard libraries, Zlib, PCRE, etc. etc.), something I don't have to do for other software. Is this really necessary for DansGuardian?
Probably not. Official (or unofficial) pre-built DansGuardian packages are now available for most distributions. Check your distribution's archives (and other archives) before building DansGuardian yourself. As with most OSS software, building the software from the source obtained from the software's website is most relevant to “alpha” and “beta” testers, distribution maintainers, adventurous administrators, and others who want the very latest version.
(Note that in some cases distributions offer different versions they label things like “stable”, “experimental”, etc. If their “stable” version is older than 2.9.8 and their “experimental” version is 2.9.8.x or later or 2.10.x.x, ignore the distribution's outdated labels [the words don't really mean what they seem to mean] and fetch the newer version.)
Installation#8. I want to administer my DansGuardian via a GUI. Is a GUI available? Where can I get help with it?
A Webmin module (separate from the DansGuardian package) is available for GUI administration of DansGuardian. First install the Webmin framework for your distribution (if your distribution does not provide a package, get it from either www.webmin.org or www.sourceforge.net). Then fetch the “DansGuardian Webmin” module (if your distribution does not provide a package, get it from www.sourceforge.net). (Get the version marked “devel” for use with 2.9/2.10, as the version marked “stable” is almost certainly outdated, and the usual version mismatch issues will not arise.) To install it, use the “Webmin Modules” interface inside Webmin itself to add the DansGuardian module you just fetched.
(You might need to use the Webmin DansGuardian module itself to adjust some of its paths before it will come up all the way.)
The current version of the DansGuardian Webmin Module simplifies using Multiple Filter Groups. It will do most of the setup for you automatically. And it makes modifying one filter group or a set of filter groups straightforward. The DansGuardian Webmin Module also includes one of the best tools for producing reports on the activity of DansGuardian. Unlike most other uses of Webmin, this “Log Analysis” does not have to be run on the same machine that runs production DansGuardian, and it does not have to be run interactively.
Generally questions about the Webmin DansGuardian GUI are handled on the regular DansGuardian mailing list. (Note the Webmin GUI is for all administration including installation setup chores, but not for the initial DansGuardian software installation itself.)
Installation#9. Can DansGuardian be integrated with real-time external/remote verification services (similar to the RBL lists for email)?
Usually performance considerations make external URL verification impractical. If you think your badly outdated blacklists sometimes cause problems, upgrade your subscription to revise your blacklists more frequently. An exception to the rule of avoiding external services is to incorporate OpenDNS into your DNS lookups. Some of its protections -for example those against phishing attacks- can be usefully combined with DansGuardian without introducing significant performance or maintenance problems.
Installation#10. What userid/groupid does DansGuardian run under? What file permissions does DansGuardian need?
DansGuardian runs under whatever userid/groupid is specified by daemonuser and daemongroup in dansguardian.conf. Often these specify the same userid/groupid as Squid uses. The specified userid and groupid must exist; you may need to create a new userid or groupid with your usual administration tools (for example useradd and groupadd). If nothing at all is specified, DansGuardian's fallback is to run under the userid and groupid that were compiled into the executable, which is often nobody/nobody (or nouser/nogroup). Whatever userid and groupid DansGuardian uses should be able to access:
- The existing DansGuardian log files (perhaps everything under /usr/local/var/log/dansguardian/)
- All the new DansGuardian log files that will be created by log rotation
- The DansGuardian configuration files (perhaps everything under /usr/local/etc/dansguardian/)
(There is some disagreement about whether nobody/nobody is just a safety fallback value indicating that daemonuser= and daemongroup= have not yet been set when they should have been, or is a real value that should be made to work. In any case, if your distribution supplies a pre-built DansGuardian and says to not use daemonuser= and daemongroup=, follow their instructions.)
Installation#11. How do I give my users completely unfiltered access to local websites?
The best way to allow unfiltered access to a local website is to not have the traffic enter DansGuardian in the first place. For transparent-intercepting environments this will happen automatically (so long as the web server is directly accessible over the LAN without going through the DansGuardian gateway). For explicit-proxy environments a setting in each browser named something like “Bypass proxy server for local addresses” should be checked (it probably already is).
Installation#11b. How do I give my users completely unfiltered access to a specific remote website?
The best way to allow unfiltered access to a specific non-local website is again to not have the traffic enter DansGuardian in the first place. For transparent-intercepting environments this will require a moderately clever IPtables entry that routes only that traffic directly to the other network interface rather than through DansGuardian. For explicit-proxy environments, either provide an automatic configuration script that bypasses DansGuardian when accessing that website, or use a setting in each browser named something like “No-proxy for:” (this setting may not be available in some browsers). If you have restricted “skipping around”, you will also need to add an exception rule to your network server IPtables.
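For the explicit-proxy case, the decision an automatic configuration script makes is simple: answer “go direct” for the exempted site and “use the proxy” for everything else. Real automatic configuration (PAC) scripts are written in JavaScript; the Python sketch below only models the decision. The proxy address and site name are hypothetical, while the returned strings follow the usual PAC conventions.

```python
# Hypothetical values for illustration only.
PROXY = "PROXY 192.168.1.1:8080"
UNFILTERED_SITES = {"partner.example.org"}


def proxy_for(host: str) -> str:
    """Return 'DIRECT' for listed sites (and their subdomains), else the proxy."""
    host = host.lower().rstrip(".")
    for site in UNFILTERED_SITES:
        if host == site or host.endswith("." + site):
            return "DIRECT"
    return PROXY


print(proxy_for("www.partner.example.org"))
print(proxy_for("www.example.com"))
```

The suffix test is anchored on a leading dot so that an unrelated name merely ending in the same letters is still sent through the filter.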
Installation#12. What do I need a blacklist subscription for?
DansGuardian typically pre-checks each URL site and path, using very large lists of site and path categories, to decide whether or not to even fetch the webpage and proceed with content checks. DansGuardian will continue to work even without such lists, perhaps by relying mainly on OpenDNS for anti-proxy and anti-phishing, and perhaps by relying more heavily on its stage two weighted phrase filtering technology. Without blacklists, there may be some degradation in either performance or accuracy, perhaps even enough to be noticeable. And of course blacklists are not needed at all for the “whitelist” style of operation.
The term “blacklist” in the DansGuardian context can be a little misleading, as what many lists do is more like categorizing sites than listing bad sites. The DansGuardian administrator gets to choose which categories are considered bad in their environment and so should be immediately blocked, and which categories should be passed on to the second stage phrase filtering, by simply un-commenting or commenting-out the relevant .Include lines in the .list files bannedsitelist and bannedurllist.
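To review at a glance which categories are currently enabled, you can scan a list file for its .Include lines and note which ones are commented out. The Python sketch below does that. It assumes each line has the shape .Include<path> with an optional leading # for disabled entries, and the sample paths and categories are made up.

```python
import re

# Assumed line shape: .Include</path/to/blacklists/category/domains>
INCLUDE_RE = re.compile(r"^\s*(#?)\s*\.Include<(.+)>\s*$")


def include_status(list_text: str) -> dict[str, bool]:
    """Map each .Include path to True (active) or False (commented out)."""
    status = {}
    for line in list_text.splitlines():
        m = INCLUDE_RE.match(line)
        if m:
            status[m.group(2)] = (m.group(1) == "")
    return status


sample = """\
#.Include</etc/dansguardian/lists/blacklists/ads/domains>
.Include</etc/dansguardian/lists/blacklists/gambling/domains>
# an ordinary comment
badsite.example.com
"""

for path, active in include_status(sample).items():
    print("ON " if active else "off", path)
```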
Such lists change too often to be packaged with DansGuardian itself, change so frequently that updating them only when DansGuardian is updated wouldn't work, and are much too large for isolated individuals to maintain. So they are supplied separately, usually via some sort of subscription arrangement.
Installation#12b. What's the urlblacklist-update script called and where can I get it? And what does it do?
A very thorough script with a lot of error handling is available for automatically updating blacklists. It's called UpdateBL. There are several similar scripts, all with the same name! One version is available from urlblacklist.com; another version is available from dansguardian.org; at least a couple versions are available from the mailing list archive or its members.
All known versions of this script do thorough error checking and handling. The result is that no matter what goes wrong or when, your system won't be left with no blacklists at all. Even if you're very unlucky, the worst that can happen is you'll be left with the old version of the blacklists rather than the new version. Be warned that most of the other blacklist update scripts associated with other blacklists do not include extensive error checking and handling.
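The “never left with no blacklists” property usually comes from one pattern: write the new list to a temporary file, check it, and only then swap it into place. The Python sketch below shows that pattern in miniature. It is not taken from any version of UpdateBL; the function name and the size check are assumptions chosen for illustration.

```python
import os
import tempfile


def install_blacklist(new_data: bytes, target: str, min_size: int = 1) -> bool:
    """Replace 'target' with 'new_data' only if the data passes a sanity check."""
    if len(new_data) < min_size:
        return False                       # keep the old list untouched
    directory = os.path.dirname(os.path.abspath(target))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(new_data)
        os.replace(tmp_path, target)       # atomic swap on the same filesystem
        return True
    except OSError:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        return False
```

If the download is empty or the write fails part way, the old file is still in place, which matches the worst case described above.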
Most versions of UpdateBL are suitable for both one-time (manual/interactive) use and recurring (automatic) use.
Various versions of UpdateBL also:
- add an appropriate #listcategory statement to each file so you can easily tell right away which blacklist category caused a block
- fetch only “diffs” –which are much smaller than the entire blacklists– and process them locally to produce updated blacklists
- fetch some other blacklists too (especially those for phishing, malware, and proxies) and merge them into those from urlblacklist.com (and remove duplicates) before deploying them
- unsort blacklists (no longer necessary, see Installation#12d below)
Unfortunately it appears that no one version of UpdateBL does all these things.
UpdateBL is often very site-specific. So often you'll need to modify a script slightly –or even merge two different versions– to produce a script that's appropriate to your environment.
Installation#12c. Can I set up my system to update my blacklists automatically?
Yes, set up your update script (probably UpdateBL) as a 'cron' job. 'cron' jobs are a normal function of Linux; nothing special is required.
Installation#12d. Do I need to “unsort” my blacklists?
Probably not any more. Older versions of DansGuardian started up extremely slowly (many minutes!) when presented with pre-sorted blacklists. At one time urlblacklist.com catered to this misbehavior by unsorting (randomizing) their blacklists before distributing them.
Because this behavior made distribution of “diffs” and user processing of blacklists difficult, urlblacklist.com stopped doing this. Shortly thereafter, the problem of slow DansGuardian startup with pre-sorted blacklists was addressed. As a result it's no longer necessary to unsort blacklists.
Keeping blacklists sorted at all times makes it easier to
- find something
- merge blacklists from several sources
- locate duplicates
- pre-process blacklists to remove known local exceptions.
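Sorted input is what makes the last three tasks cheap: the lists can be merged in a single pass, and duplicates are always adjacent. The Python sketch below merges already-sorted lists, drops duplicates, and removes local exceptions. The function name and sample entries are invented for illustration.

```python
import heapq
from typing import Iterable, Iterator


def merge_blacklists(sources: Iterable[Iterable[str]],
                     exceptions: set[str]) -> Iterator[str]:
    """Merge already-sorted lists, dropping duplicates and local exceptions."""
    previous = None
    for entry in heapq.merge(*sources):
        if entry != previous and entry not in exceptions:
            yield entry
        previous = entry


list_a = ["a.example", "c.example"]
list_b = ["b.example", "c.example", "d.example"]
print(list(merge_blacklists([list_a, list_b], {"b.example"})))
```

With unsorted input the same job would require holding everything in memory or sorting first.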
Installation#12e. Where can I get blacklists suitable for use with DansGuardian?
As a convenience, DansGuardian is preconfigured to use the blacklists from http://urlblacklist.com/. With only minor tweaking, you can make DansGuardian use a different blacklist, multiple blacklists simultaneously, or no blacklist at all. Usually categories of blacklists are pulled in by .Include statements in .list files. (Note urlblacklist.com will almost certainly not respond at all to any emails, which doesn't necessarily reflect on the quality of the blacklists themselves.)
Another commonly used set of blacklists is the one available from http://www.shallalist.de/.
Many blacklist subscriptions either charge a small fee or require you to register in order to satisfy their license. Many blacklists are free to some kinds of sites while charging other kinds of sites. Examine their license carefully and understand it thoroughly. (Also, find out and follow what each blacklist wants. They're not all the same, for example some routinely request an email contact to start, while others do not routinely respond to emails at all.)
The service from http://www.opendns.com/ may be an alternative to using blacklists (or it may be used in combination with blacklists). It simply makes “bad” websites invisible and hence unreachable. All updates are handled immediately by the OpenDNS staff at their own website, so DansGuardian systems are always up-to-date even though their administrators never perform any update chores.
Installation#13. Do I need to set up my own weighted phrase lists, or can I just use the supplied ones?
Most likely you can just use the supplied weighted phrase lists. They are already divided into categories. Choose which categories should be locally enabled by simply un-commenting or commenting-out the relevant .Include lines in lists/weightedphraselist. (You can if you wish extend -or even modify- the distributed phrase lists, but doing so certainly isn't necessary.)
(Also see Usage#12 in the Configuration/Usage FAQ Portion below.)
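To see at a glance which phrase-list categories are currently enabled, something like the following sketch may help. It assumes the .Include lines have the form .Include<path> and that a disabled line is the same text with a leading #; the function name list_includes is hypothetical.

```shell
# list_includes FILE
# Print "on" or "off" plus the path for every .Include line in FILE.
# A line counts as "off" when a '#' precedes the .Include directive.
list_includes() {
    sed -n \
        -e 's/^[[:space:]]*\.Include<\(.*\)>.*$/on  \1/p' \
        -e 's/^[[:space:]]*#[[:space:]]*\.Include<\(.*\)>.*$/off \1/p' \
        "$1"
}
```

Running list_includes against your lists/weightedphraselist file (wherever your installation keeps it) prints one line per category, so a before/after comparison shows exactly what an edit changed.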
Installation#14. Do I need PCRE (the Perl-Compatible Regular Expression libraries, and sometimes also its development tools)?
Most pre-built packages require PCRE runtime shared libraries (maybe something like /lib/libpcre.so.0). And most distributions already contain the PCRE runtime, so there's seldom an issue.
If you build DansGuardian yourself, you may need to add a PCRE “development” package. You can also build DansGuardian to not require PCRE by adding the configuration directive --enable-PCRE=no. The resulting DansGuardian will still work, but you won't be able to use any Perl-style constructs in any regular expressions you add or modify, and you won't be able to use most of the pre-supplied content modifying recipes in contentregexplist (you probably don't care about this).
Installation#15. How can I install and run multiple instances of DansGuardian simultaneously?
Use a single instance of DansGuardian with multiple filter groups instead. This does everything you could do by running more than one copy of DansGuardian in parallel, and is much easier to configure and maintain. (Note different pieces of software are different. Just because DansGuardian prefers to avoid multiple instances doesn't mean all other software applications do too.)
Installation#16. Can I prevent my users from skipping around DansGuardian/Squid (or DansGuardian alone)?
Yes, every configuration of DansGuardian/Squid (under an OS like Linux) can be made impossible to skip around. Generally this is done by adding two or three Shorewall/IPtables rules (and tweaking the configurations to use 127.0.0.1 for communication between DansGuardian and Squid). Note that although preventing skipping around is always possible, it's not automatic; an initial installation of DansGuardian/Squid almost never includes what's needed to prevent it. (This is just as well, as premature “lockdown” makes troubleshooting a new installation needlessly difficult.)
Installation#17. DansGuardian and Squid services don't start automatically when the server is rebooted. How can I fix this?
The use of runlevels and the automatic launching of daemons (“services”) in Linux is not standardized. As a result, frequently DansGuardian and Squid installations cannot figure out what to do or how to do it. So you should manually set the daemons to start whenever your system restarts. Depending on your distribution, most likely the command you'll need is either update-rc.d or chkconfig ; the runlevel and ntsysv commands may also be useful.
Installation#18. How do I configure Shorewall/IPtables for my environment (transparent-intercepting or explicit-proxy; LAN or DMZ; gateway/firewall or separate server)?
For initial operation of a LAN installation, explicit-proxy configurations do not need Shorewall/IPtables at all; only transparent-intercepting configurations need Shorewall/IPtables. (You will need Shorewall/IPtables later to “lockdown” both explicit-proxy and transparent-intercepting configurations.) For operation (but not for “lockdown”), http://www.shorewall.net/Shorewall_Squid_Usage.html may be a helpful reference, although there are two known issues with using this reference in a DansGuardian context:
- Everything is presented only in the syntax of the Shorewall front-end, not that of IPtables commands, which may or may not meet your needs directly.
- It assumes browsers connect directly to Squid with no intervening DansGuardian (so you must translate everything to Browser←8080→DansGuardian, DansGuardian←3128→Squid, and Squid←80→Internet).
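For readers who prefer raw IPtables syntax, the following sketch prints (it does not apply) two commands that match the port layout described above. The interface name passed in (eth0 in the usage note) is an assumption about your LAN side, and the second rule is only one piece of a later “lockdown”, not a complete one.

```shell
# print_dg_rules LAN_IF
# Print (without applying) two iptables commands:
#   1. redirect port-80 traffic arriving on LAN_IF to DansGuardian on 8080
#   2. reject connections to Squid's port 3128 that do not arrive via loopback
print_dg_rules() {
    echo "iptables -t nat -A PREROUTING -i $1 -p tcp --dport 80 -j REDIRECT --to-ports 8080"
    echo "iptables -A INPUT ! -i lo -p tcp --dport 3128 -j REJECT"
}
```

print_dg_rules eth0 shows the commands for review; only after checking them against your own topology would you run them as root or translate them into Shorewall rules.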
Installation#19. To have Squid request NTLM credentials, there are TWO helper programs: one comes with Squid and the other comes in the Samba package. Which one should I use?
If possible use the NTLM authentication helper program that comes in the Samba package. The two are pretty much interchangeable. The Samba one works better, especially in “unusual” circumstances. And perhaps most importantly, to minimize duplication of effort the one that comes with Squid is being only minimally maintained.
Installation#20. Sometimes I see references to paths like /etc/dansguardian, and other times I see references like /usr/local/etc/dansguardian. Are these really the same thing, and if so why do they sometimes appear differently?
In the 2.9/2.10 DansGuardian series, the build (./configure) options have been set so a “default” build is appropriate for experimental use (but not necessarily for production or distribution use). Builds such as those done by distributions usually want to explicitly set several build options so the result better matches recommended administration procedures. Unfortunately often builders either don't realize the defaults have changed in DansGuardian 2.9/2.10, or conclude that the defaults must be “correct” (even though they're a bit unusual). As a result, the built in paths frequently become /usr/local/etc/dansguardian (for configuration, lists, and blacklists), /usr/local/var/log/dansguardian (for logs), /usr/local/sbin (for executables), etc.
Rebuilding DansGuardian with appropriate ./configure settings (for example --prefix= ) may produce paths that look more like what you desire and expect.
Where the existing DansGuardian references /etc/dansguardian/dansguardian.conf but the new DansGuardian references /usr/local/etc/dansguardian/dansguardian.conf, one possible problem is the upgrade process will not detect the previous installation of DansGuardian at all (and so will provide all brand new configuration values). You can minimize this problem (no matter what paths any newer DansGuardian might use) by creating a couple symlinks before you upgrade DansGuardian: ln -s /etc/dansguardian /usr/local/etc/dansguardian ; ln -s /var/log/dansguardian /usr/local/var/log/dansguardian .
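The two ln -s commands above can be wrapped so they refuse to overwrite anything that already exists at the new location. This is only a sketch; the function name link_legacy_path is hypothetical.

```shell
# link_legacy_path TARGET LINK
# Create the symlink LINK -> TARGET unless something already exists at LINK.
link_legacy_path() {
    if [ -e "$2" ] || [ -L "$2" ]; then
        echo "skipping $2: already exists" >&2
        return 1
    fi
    # Make sure the parent directory (for example /usr/local/etc) exists.
    mkdir -p "$(dirname "$2")" && ln -s "$1" "$2"
}
```

Run as root before upgrading, for example link_legacy_path /etc/dansguardian /usr/local/etc/dansguardian and link_legacy_path /var/log/dansguardian /usr/local/var/log/dansguardian.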
Installation#20b. I edit DansGuardian's configuration files and restart it. But my change still doesn't take effect; it's as though the old configuration were “stuck”. Why does this happen?
Probably your system has two copies of the configuration files; you're modifying one while DansGuardian is reading the other.
See if for example both /etc/dansguardian/dansguardian.conf and /usr/local/etc/dansguardian/dansguardian.conf exist. Determine which copy of the configuration files DansGuardian is really using. Move the other copy to some location where it's immediately clear that version has been “archived”, so you won't edit it by accident any more.
Installation#21. Which Kaspersky anti-virus package do I need for the “kavd” contentscanner? And what should I do after I've attempted to install Kaspersky A-V but just get the message “cannot perform virus scan”?
The aveserver program the “kavd” content scanner uses may have been moved to the 'kav4mailserver' package. Furthermore, the license terms provided by Kaspersky may no longer sanction its use for this purpose.
To use Kaspersky anti-virus with DansGuardian, use the ICAP server and the “icap” contentscanner configuration instead.
Installation#22. Which “contentscanner” option should I use with Clam Anti-Virus?
Use the second option, the one that references 'clamdscan.conf', which says 'plugname=clamdscan'. The 'clamdscan' option largely eliminates any sort of version dependency [build-time or run-time] between DansGuardian and ClamAV. It interfaces with the interprocess named pipe socket provided by the current version of ClamAV, and has no special requirements or restrictions.
The old 'clamav' runtime option remains present mainly for historical reasons (it may not even work at all any more with recent versions of ClamAV); the old 'clamav' runtime option is effectively deprecated. The --enable-clamav build option should not be specified (it's not necessary, and probably won't even work any more). Most builds should use only the --enable-clamd option. (In fact the unnecessary presence of build/configure option --enable-clamav will probably cause DansGuardian to emit a weird error message about a ClamAV library version mismatch, for example
dansguardian: error while loading shared libraries: libclamav.so.5: cannot open shared object file: No such file or directory
then refuse to start up, even if 'clamav' is not being used and so is not configured in dansguardian.conf. Executables without the old 'clamav' build option will not experience this problem.)
Installation#22b. On my system the 'clamdscan' option in dansguardian.conf says !!Not Compiled!!, but the 'clamav' option is present. Can I use the 'clamav' option instead? If not, what should I do?
The 'clamav' option is not exactly the same, probably will not work at all with more recent releases of ClamAV, unnecessarily introduces an overly tight version dependency, can be difficult to install and maintain, and for all these reasons is not recommended.
Instead, do one or more of the following:
- complain to your distribution about their DansGuardian package having been built inappropriately (builds should use only ./configure option --enable-clamd, not --enable-clamav too)
- obtain a more appropriate (and later?) DansGuardian package for your distribution from an “unofficial” repository (most distributions have one or more)
- rebuild DansGuardian from source, adding --enable-clamd to (and removing --enable-clamav from) its configuration (see Installation#24b for rebuilding “almost” the same)
- forego entirely the use of an Anti-Virus with DansGuardian
Installation#23. My system already runs the clam daemon. Can I just use the existing clam installation?
Yes, that's exactly what clamdscan does: it communicates with an already running clam daemon through the named pipe socket the daemon provides.
Installation#23b. I tried to enable clamdscan, but it just says “Could not perform virus scan!” What should I do?
Start by backing out of your hole. Debugging clamdscan through DansGuardian is usually needlessly difficult and is seldom necessary. It will work much better to debug clamdscan directly.
At a shell prompt you should be able to execute clamdscan [filename] and get a few lines of output –including an OK and a SCAN SUMMARY. Until this direct use of clamdscan works correctly for you, don't even bother trying to use it through DansGuardian. If you have problems, you may find the ClamAV log (follow LogFile from /etc/clamd.conf), the ClamAV options related to debugging (probably LogClean and Debug), and the ClamAV documentation helpful.
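The direct check described above can be scripted. The sketch below only inspects text: it assumes clamdscan prints a per-file line ending in ": OK" and a block containing "SCAN SUMMARY", as described; the function name scan_output_ok is hypothetical.

```shell
# scan_output_ok
# Read clamdscan output on stdin; succeed only if it contains both a
# per-file "OK" verdict line and the "SCAN SUMMARY" block.
scan_output_ok() {
    scan_text=$(cat)
    printf '%s\n' "$scan_text" | grep -q ': OK$' &&
        printf '%s\n' "$scan_text" | grep -q 'SCAN SUMMARY'
}
```

For example, clamdscan /etc/hostname | scan_output_ok && echo "clamdscan works directly" (any small, harmless, readable file will do). Only once this succeeds is it worth testing through DansGuardian.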
Installation#23c. What should I do to preempt common problems with using clamdscan from DansGuardian?
Most often with distribution-neutral installations of DansGuardian a virus scanning problem is due to file permissions. DansGuardian writes each file to be scanned with the owner and group DansGuardian itself is running as and permissions -r--r-----. In order to scan these files, ClamAV needs to have the OS's permission to read them.
(Note this may not be a problem with some distributions, as the distribution has already customized userid's so this issue doesn't arise.)
One way to resolve this problem is do both the following (then of course restart both applications):
- In clamd.conf, set AllowSupplementaryGroups yes.
- Add the ClamAV user (clamav ?) as a member of the group that DansGuardian runs as (proxy ? squid ? dansguardian ?).
Another common problem is that “LocalSocket …” in 'clamd.conf' and “clamdudsfile = '…'” in DansGuardian's 'clamdscan.conf' should refer to exactly the same filename (/tmp/clamd and /tmp/clamd.socket are not exactly the same; one includes an explicit .socket and the other just implies it).
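A quick way to compare the two socket settings is sketched below. It assumes the line formats quoted above (LocalSocket followed by a path in clamd.conf, and clamdudsfile = '...' in clamdscan.conf); the function names are hypothetical.

```shell
# clamd_socket FILE -> path from the first uncommented "LocalSocket" line
clamd_socket() {
    awk '$1 == "LocalSocket" { print $2; exit }' "$1"
}

# dg_socket FILE -> path inside the quotes of "clamdudsfile = '...'"
dg_socket() {
    sed -n "s/^[[:space:]]*clamdudsfile[[:space:]]*=[[:space:]]*'\([^']*\)'.*/\1/p" "$1" | head -n 1
}

# sockets_match CLAMD_CONF CLAMDSCAN_CONF
# Succeed only if both files name exactly the same, non-empty socket path.
sockets_match() {
    a=$(clamd_socket "$1")
    b=$(dg_socket "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

For example, sockets_match /etc/clamd.conf /etc/dansguardian/contentscanners/clamdscan.conf || echo "socket paths differ" (both paths are guesses; substitute the locations your installation uses).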
Installation#24. Can I build DansGuardian from source myself without much trouble?
Yes, building DansGuardian from source is easy and only takes a few minutes.
Your system probably already has almost all of the prerequisites needed: gcc, make, header files, logrotate.
You may get error messages about a handful of missing files and find that you need one or two additional packages: zlib-devel and pcre-devel. Although most systems already have the runtime packages for Zlib (compression) and PCRE (Perl-compatible regular expressions), those may only be sufficient to run but not to build DansGuardian.
Installation#24b. Can I configure the DansGuardian I'm building to be almost the same as the package I'm replacing? If so, how?
Yes, execute dansguardian -v on the executable that's your model (probably the one from the distributed package). Take all the --xxxxxx options it emits and feed them back into the ./configure for your new build (except of course change the parameter that makes your new build only “almost” the same).
The listing of configuration parameters from dansguardian -v is accurate and complete and is already in the right format for ./configure.
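If you want just the options without any surrounding banner text, a small filter such as the following sketch can pull out every token that begins with --. The exact layout of the dansguardian -v output is an assumption here, the function name configure_args is hypothetical, and option values containing spaces would not survive this simple pattern.

```shell
# configure_args
# Read `dansguardian -v` output on stdin and print every --option token,
# space-separated on one line, ready to review and pass to ./configure.
configure_args() {
    grep -o -E -e "--[A-Za-z0-9][A-Za-z0-9_-]*(=[^' ]*)?" | tr '\n' ' ' | sed 's/ $//'
}
```

For example, dansguardian -v | configure_args prints the option list; edit the one option you want to change, then append the result to ./configure.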
Installation#24c. Can I create a “debug” DansGuardian that exactly matches the package I'm using? If so, how?
Yes, adding just the --with-dgdebug to a new DansGuardian that's otherwise exactly the same as an existing pattern is just another variation on the technique in the previous question.
In fact, building a “debug” DansGuardian this way is so straightforward that a script in the Wiki document Using A Debug Version automates the process.
Installation#25. What are the significant differences between the 2.9.x.x series and the 2.10.x.x series?
DansGuardian odd-numbered series are “development” versions, while even-numbered series are “stable” versions. An odd-numbered series will go through a multitude of versions as features are tweaked. An even-numbered series, on the other hand, will be re-released only to fix significant flaws and will typically extend to only a handful of versions. The tip of the 2.10.x.x series might not change for years.
As a historical example, the 2.9.x.x series encompassed many versions as features were tweaked. When it became “stable” it was re-christened as the 2.10.x.x series. There is absolutely no difference in the code base.
Installation#25b. Are there good reasons to download and build one of the old 2.9.x.x series now that the 2.10.x.x series is available?
No. 2.9.x.x versions are of historical interest only.
Installation#26. My experience is my DansGuardian system runs amok once in a while! Why is operation not always 100% reliable, and what can I do about it?
DansGuardian keeps running for a very very long time. It greatly stresses the kernel by using lots of memory at the very same time as it's using sockets heavily. It continually creates and destroys processes. It uses gobs of interprocess communication via pipes. And typically all the software together (usually DansGuardian plus Squid) creates a high load. Because of all this, DansGuardian is very good at exposing bizarre and rare flaws in kernels, even though other software operates correctly all the time under that kernel. (Maybe DansGuardian should be part of a kernel's test suite:-) Such problems usually occur with BSD-derived kernels (see next question).
There have been anecdotal reports of DansGuardian's count of child processes becoming inaccurate on systems with more than one filterip statement in 'dansguardian.conf' (such configurations, while not common, are legitimate, however for unknown reasons they don't always work quite like they should), and possibly also only under BSD-derived kernels. These reports have never been completely understood, and may not highlight kernel problems. Still…
On versions up through 188.8.131.52, this problem sometimes caused the parent DansGuardian process (and hence the entire application) to SEGV (SIGSEGV, segment fault) crash. Later versions do not crash. A preliminary patch tagged 184.108.40.206 but never released frequently lost track of some of its child processes, usually eventually leading to noticeably poor performance. A more thorough change in 220.127.116.11 tries to recover more gracefully; it's not yet known how often (if ever) problems in 18.104.22.168 and later can result in poor performance. To minimize the risk of possible problems:
- Restart DansGuardian every day. Use your system's command for controlling services/daemons (for example /etc/rc.d/init.d/dansguardian restart), not the built-in shortcut dansguardian -r. Usually you can do this with a 'cron' job in the middle of the night.
- Configure your system so the kernel “swap” space and the Squid “cache” are on different drives. (Having the swap space on the same drive as the web cache and the application logs and the kernel logs and the kernel modules and base [and sometimes the virus scanning temp space too] can cause horrid disk head contention, which makes the system act like it's badly overloaded even though the CPU isn't very busy.)
- Set maxagechildren in 'dansguardian.conf' to a very high value (10000 or more), or, if that works better, to a very low value (only a few hundred).
Installation#26b. My large-scale DansGuardian/Squid installation fails frequently with a message about “segment fault” (SIGSEGV). What can I do about it?
Many BSD-derived kernels (NetBSD, FreeBSD, OpenBSD, etc.) default to being tuned for workstation or small server use, and need to be manually re-tuned for large-scale DansGuardian operation. If the kernel is not re-tuned, it will often appear that DansGuardian starts up but later fails with a SEGV/SIGSEGV –often in an “impossible” location. Changing DansGuardian options won't help very much; it's the kernel that needs attention. Consider other things too, but only after you've tuned your kernel. (For more detailed information see the document Operation Under NetBSD/FreeBSD/OpenBSD on the Wiki.)
Almost all recent reports of large-scale system instability have involved BSD-derived kernels (NetBSD, FreeBSD, OpenBSD, etc.). Unlike Linux-derived kernels, many versions of BSD-derived kernels do not automatically adjust their configuration to the amount of physical RAM. Reports suggest that kernels that are “closely tuned” are the most likely to exhibit problems. (Also, some versions of BSD-derived kernels contain some older networking code that may not be perfectly reliable when network turnover is very high.)
To obtain stable DansGuardian operation with BSD-derived kernels that automatically scale all kernel parameters to the amount of RAM, likely all that's needed is to add more memory to the system. With other BSD-derived kernels that scale all kernel parameters to a single size setting, all you will need to do is explicitly increase kern.maxusers. With some BSD-derived kernels though you will need to tweak individual kernel parameters. (Also, there's been one report that stable DansGuardian operation can be obtained by falling back to Squid 2.x rather than using Squid 3.x.)
Unfortunately it's not known exactly which parameters might need tweaking nor exactly what their values should be. Experiment. If something doesn't make a difference, put it back exactly like it was. But if it affects DansGuardian, search for the right value. Pay particular attention to the number of sockets, the number of file descriptors, and the number of kernel processes. Perhaps most importantly, experiment with increasing the kernel's amount of shared memory. Such kernel tuning of course interacts with tuning the number of DansGuardian child processes; if the kernel is tweaked until operation is stable, then maxchildren is increased, unstable operation can return.
One user was able to increase maxchildren as necessary, yet retain stable DansGuardian operation by adding the following lines to /etc/sysctl.conf:
Another user suggests a different tweak to shared memory:
kern.ipc.shmseg=512
kern.ipc.shmmni=512
kern.ipc.semmni=512
Yet another user found that in order to increase maxchildren, the kernel had to be tuned with
There have been reports that it's helpful to set:
Also possibly helpful is increasing the number of memory structures available for IPC communication (which is heavily used between the various DansGuardian processes):
kern.ipc.msgmnb=8192
kern.ipc.msgmni=40
kern.ipc.msgseg=512
kern.ipc.msgssz=64
kern.ipc.msgtql=2048
Installation#27. Why not just set all of the …children configuration options to very high values?
If the routine number of DansGuardian child processes is quite a bit greater than the number of simultaneous users, your system may do a very large amount of useless swapping. To put it another way, way too many unused child processes can lead to significantly poorer performance.
Installation#27b. Why not just set maxchildren as high as possible and rely on DansGuardian to adjust the actual number of child processes dynamically?
Once in a great while an error could cause DansGuardian to start as many child processes as it can. If there were really only a handful of simultaneous users when that happened, performance could be noticeably degraded for a time. The parameter maxchildren is a safety limit as well as a performance tuning setting.
Installation#28. My understanding is I can't use DansGuardian IP Auth if my network uses DHCP. Is this really true?
Not exactly… In order for IP Auth to function correctly and be maintainable, the IP addresses of your computers should not change. This is the real restriction on the use of IP Auth, which is sometimes (not quite accurately) expressed as “No DHCP”.
Even a naive DHCP server will try to assign the same IP address to the same computer every time …but it won't try hard enough. In order to use IP Auth, you could either forego DHCP altogether and use “static” IP addresses, or you could further increase the “stickiness” of DHCP-assigned IP addresses either of the following ways:
- force lease times to be very long (many months or even years!) - most DHCP servers can be configured to replace the lease time requested by the client computer with a value you configure on the server (so configuration of individual computers isn't necessary)
- assign a DHCP “reservation” to every computer that tells the DHCP server which IP address must always be assigned to that computer
Installation#29. When I try to view website X, my browser just displays an all white blank page. What's wrong?
Whenever a web browser encounters an unforeseen error, it just displays a blank page. Furthermore, browsers may just display the same blank page even in situations that are not clearly errors, such as an inaccessible Cascading Style Sheet. In other words you might think of a blank web page as the browser's universal error indication. Or you might think of a blank page as the web equivalent of the BSOD (Blue Screen Of Death).
A gazillion different things can lead to the display of a blank page. (You might even fix one problem only to uncover a second, and never realize you were on the right track, because both problems resulted in the same blank page display.) All a blank page tells you is that something is wrong, and that you need to do further troubleshooting on the computer where DansGuardian runs.
Installation#30. I expected the “trickle” download manager to provide the quickest download experience, but instead it seems to be the slowest! Is something getting in the way of the “trickle” download manager really “trickling”?
Yes, more or less. (Remember download managers matter more when anti-virus scanning is enabled.) Some previous versions of DansGuardian provided a “trickle” download manager that began to pass the downloaded file through to the client browser almost as soon as it started to arrive. However it was realized this was risky. In some circumstances the significant part of a virus payload could get clear through to the client browser before any virus was detected. So although the current version of DansGuardian provides a “trickle” download manager for upward compatibility, it doesn't really “trickle”. What it currently does is hold up the downloaded file until the entire file has arrived and been completely virus scanned before it sends even the first byte on to the client browser. Then –as before– it sends only one byte at a time. The net result can be that use of the standard “trickle” download manager in the current version of DansGuardian is very slow while offering no real benefit to the user.
(Some DansGuardian administrators disagree. They think either that the level of risk is so small it needn't be a concern, or that administrators should be given the opportunity to shoot themselves in the foot if they really wish to. As a result, some source code modifications to produce a “trickle” download manager that really “trickles” may be available via the mailing list.)
Installation#31. How can I find all the log entries related to a browser's attempt to load one page?
What the browser sees as just one page will show up as many lines in the log, one for each internal part of the page. Some of the entries may appear to be to unrelated domains, but in fact they're integral parts of displaying the overall page. As the “base page” is known only to the browser but not to DansGuardian (it often doesn't appear anywhere in the HTTP headers), many of the relevant log entries will not contain the domain name that appears in the browser's address bar. Nevertheless, you need to find all the relevant lines in order to troubleshoot a problem.
Perhaps the best alternative is to do your testing off hours, so there's nothing else in the log except the test browser's attempt to display the webpage. When you can do this, you know all the lines in the log are relevant.
But troubleshooting off hours isn't always possible. As an alternative, first extract only the portion of the log around the time the attempt was made. Then further extract only the entries that came from the client computer that ran the browser attempt. Depending on your configuration, the relevant entries might be identified by client IP address, by username, or by filter group.
(Be aware that if some internal parts of the page are in the browser's “cache”, requests for those parts won't go over the network and so won't appear in the DansGuardian logs. Thus you should either do a “maximal refresh” [for example CTRL-Refresh in both IE and Firefox], or flush the browser's entire cache immediately before making your test request.)
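The two-step extraction (first by time, then by client) might be sketched as follows. The sketch assumes each log line begins with its date and time and contains the client IP address (or username) as a separate whitespace-delimited field; your log format may differ, and the function name page_entries is hypothetical.

```shell
# page_entries LOG TIME_PREFIX CLIENT
# Print the lines of LOG that start with TIME_PREFIX (for example a date
# plus hour and minute) and contain CLIENT as a whole field, so that
# 192.168.1.10 does not also match 192.168.1.101.
page_entries() {
    awk -v t="$2" -v c="$3" '
        index($0, t) == 1 {
            for (i = 1; i <= NF; i++) if ($i == c) { print; next }
        }' "$1"
}
```

For example, page_entries /var/log/dansguardian/access.log '2009.12.1 14:30' 192.168.1.10 lists everything that client requested during that minute, whatever domain each request went to.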
Installation#32. When I try to extract the “relevant” log entries, many of them don't contain the domain name that appears in the browser's address bar. Can I just assume these log entries are extra, and ignore them?
No! What appears in a browser as just one webpage is almost always composed internally of many parts, and each part generates a separate log entry. As only the browser knows what the “base page” is, the domain name in the browser's address bar won't appear anywhere in many of the relevant DansGuardian log entries (even though they're all really part of the same webpage). Often it's a problem with one of these parts from some other site that causes a base webpage to display incorrectly. To be effective, your troubleshooting must attend to these seemingly extraneous parts too.
Installation#33. Phrases containing accented/special characters appear to never match against web pages using the UTF-8 encoding (even though I've taken care to correctly allow for both language and encoding). What else can I try?
If your phrases include accented/special characters and web pages you typically access use the UTF-8 encoding, you might need to either specify a Latin1-style character encoding for the environment of the DansGuardian process itself, or specify preservecase=2 in dansguardian.conf. (Although it's difficult to make much sense of this, experience indicates these changes do solve the possible problem.)
Usage#1. When using DansGuardian, the squid logs all point to localhost (127.0.0.1) as source IP address.
Good observation! The source IP of the request to Squid is localhost (127.0.0.1), as it is DansGuardian making the request and DansGuardian is running locally on the server. This question often suggests a deeper problem, either analyzing the wrong log files or attempting to reuse old ACLs from a Squid-only system in a DansGuardian/Squid system. In order to monitor which IP is going where, you should look at the DansGuardian logs rather than the Squid stub logs. (The DansGuardian logs are located in a folder something like /usr/local/var/log/dansguardian/.) The DansGuardian log contains more and different information, so of course its format is not the same as the Squid log.
If desired, a “quick fix” is to have DansGuardian add the X-Forwarded-For: HTTP extension header that most versions of Squid understand. For each request, Squid then has the IP address of the end user computer rather than that of DansGuardian. To do this, set forwardedfor = on in dansguardian.conf. (Also turn on follow_x_forwarded_for in squid.conf if it isn't already on by default, probably with a statement like follow_x_forwarded_for allow localhost.) Finally in Squid 3.1 and later add forwarded_for delete to squid.conf so it won't pass the information along to the external website. Otherwise websites may be able to see some of your internal IP addresses and be able to disentangle some of your users.
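To confirm that the settings named above are actually present, a check along these lines may be useful. It is a sketch only: it looks for uncommented lines by pattern, does not validate the rest of either file, and the function names are hypothetical. The third check applies only to Squid 3.1 and later.

```shell
# has_setting FILE REGEX -> succeeds if an uncommented line matches REGEX
has_setting() {
    grep -E -v '^[[:space:]]*#' "$1" | grep -E -q -e "$2"
}

# xff_missing DG_CONF SQUID_CONF
# Print one line for each X-Forwarded-For related setting not found.
xff_missing() {
    has_setting "$1" '^[[:space:]]*forwardedfor[[:space:]]*=[[:space:]]*on' ||
        echo "dansguardian.conf: forwardedfor = on"
    has_setting "$2" '^[[:space:]]*follow_x_forwarded_for[[:space:]]+allow' ||
        echo "squid.conf: follow_x_forwarded_for allow ..."
    has_setting "$2" '^[[:space:]]*forwarded_for[[:space:]]+delete' ||
        echo "squid.conf: forwarded_for delete (Squid 3.1 and later)"
    return 0
}
```

No output from xff_missing followed by the paths of your dansguardian.conf and squid.conf means all three lines were found; remember to restart both services after editing.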
Usage#2. I get “Error connecting to test proxy”, what's wrong?
The backend proxy server half must be started before the DansGuardian half. If you use startup scripts, re-arrange them so that they start in that order! (Often sequence numbers are provided for ordering service startups at each runlevel, in which case the Squid Snn number should be lower than the DansGuardian Snn number.)
Usage#4. DansGuardian is running, but it's not filtering.
Flush the browser's cache (or at least click “refresh” in the browser). Double-check everything in the installation and configuration of DansGuardian and Squid. Double-check the proxy settings of the browser you're using. Check that the computer you're testing from is a user computer (not the DansGuardian or gateway or firewall computer), and is not listed in exceptioniplist. Reread the installation instructions carefully. Check the DansGuardian log (probably /usr/local/var/log/dansguardian/access.log) to see if the request is even going through the filter. Review the General Troubleshooting Strategies suggested by the doc wiki. Post to the mailing list.
Usage#6. How do I use (BSD)newsyslog instead of (Linux)logrotate to rotate DansGuardian log files?
- User Jyri contributed:
If rotating DG logs with newsyslog, you have to configure newsyslog like this (in /etc/newsyslog.conf):
/var/log/dansguardian/access.log nobody.nobody 644 7 * 24 Z \
"/usr/local/etc/rc.d/dansguardian.sh stop && \
/usr/local/etc/rc.d/dansguardian.sh start"
Important things are:
- New access.log will have both UID and GID set to "nobody". Default is "root", which prevents DG from writing to the log, meaning that DG won't start after log rotation.
- DG will be restarted after rotating the log. If this is not done, DG runs ok but won't log anything after log rotation occurs.
Other options mean:
- New access.log will have permissions set to 644.
- Keep 7 latest logfiles in archive.
- Don't care about size of the log file.
- Rotate log every 24 hours.
- Compress archived logfiles.
Usage#7. Some online updates and installs (software, malware signatures, media decoders, new versions, etc.) don't work reliably through DansGuardian.
This is usually caused by the servers labeling their files as text with a gzip stream (DansGuardian checks all documents whose MIME type starts with “text/…”, including such poorly described download files). Regardless of the exact cause, the solution is to add the problem servers to the file exceptionsitelist. Either add the entire problem domain to exceptionsitelist, or check the logs and see which particular server is being blocked and add only that specific server to exceptionsitelist.
Usage#8. DansGuardian fails on startup, but it does not give any error messages explaining what the problem is.
DansGuardian is actually quite detailed at reporting errors; however, you are probably not seeing its messages. To see the detailed messages, try starting DansGuardian from the command line with just the single command dansguardian. It will tell you what the problem is and/or pinpoint which configuration file the problem is in.
The regular daemon control commands (service dansguardian start, /etc/init.d/dansguardian start, etc.) are preferred for normal use. But in this case they unhelpfully gobble up all the detailed error messages from DansGuardian to summarize them as simply OK or Failed. So in this case, use the dansguardian command directly rather than the usual daemon control commands.
Usage#8b. How should I interpret the DansGuardian startup messages?
If startup is successful, DansGuardian won't emit any messages at all. The presence of messages indicates there was some sort of startup error.
Often the messages will be in the form of a “backtrace”. If you can easily interpret two different messages and aren't sure which one to follow first, select the earlier/topmost one. The first message will be the most specific and detailed. The last message will often say something about “parsing”. It just means the error occurred while processing the configuration (well duh), not that the error necessarily has anything to do with the syntax in any file.
Usage#9. Help, it appears that strangers are using/abusing my DansGuardian or Squid!
Be sure in dansguardian.conf to specify filterip = … as the IP address of the computer interface to your internal LAN. Don't just rely on the filterip default, as it makes the filter available on all interfaces, including any that lead to the wide world. Also be sure in squid.conf http_port …:… is 127.0.0.1:3128 (or the IP address of the computer interface to your internal LAN plus the DansGuardian↔Squid port). Again don't just rely on the http_port default, as it makes Squid available on all interfaces, including any that lead to the wide world. For both DansGuardian and Squid if you need to listen on more than one network interface, you can repeat the entire configuration directive line (filterip or http_port) as many times as necessary.
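As a rough aid for the check described above, the following Python sketch lists the filterip values found in a dansguardian.conf style text, so an unset value is easy to spot. It assumes the usual "name = value" line layout with "#" comments; the function name and the sample excerpt are made up for illustration and are not part of DansGuardian. It does not cover squid.conf, whose http_port lines use a different layout.

```python
def filterip_values(conf_text):
    """Return every non-empty filterip value found in the given config text."""
    values = []
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "filterip":
            value = value.strip("'\"")  # tolerate optional quoting
            if value:
                values.append(value)
    return values


# Hypothetical excerpt, for illustration only.
sample = """
# Network settings
filterip = 192.168.0.1
filterport = 8080
"""

addresses = filterip_values(sample)
if addresses:
    print("filterip is set to:", ", ".join(addresses))
else:
    print("filterip is unset: the filter would listen on all interfaces")
```

Because the directive may be repeated, the function returns a list rather than a single value.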
Usage#10. Since using DansGuardian, my Squid ACLs no longer work.
That is correct. The source IP of the request to Squid is probably localhost (127.0.0.1), as it is DansGuardian making the request and DansGuardian is running locally on the same server. When upgrading from a Squid system to a DansGuardian system, reusing the old Squid ACL configuration (just because the proxy half happens to still be named Squid) may not be the best idea. A “quick fix” is available though; see Usage#1 above.
Usage#11. What about performance? Do you have any recommendations for system size? What is considered a large scale environment, and what does it take to support it?
System and network capabilities and speeds change so quickly and vary so much in different parts of the world it's hard to meaningfully recommend specifics. Anything with more than 350 simultaneous users should probably be considered a large scale environment. Until recently, very large DansGuardian/Squid systems were rare. But with the advent of very large servers with huge amounts of memory, SATA II or SAS drives, and multi-gigahertz multi-core CPUs, very large DansGuardian/Squid systems have become more common. Besides advice from the mailing list, see the performance tuning information in the doc wiki and the whole thread including posting #3231 in the mailing list archives.
A single copy of DansGuardian can sometimes scale up to about 1000 (or even more with a special build) simultaneous users when running under an appropriate OS on appropriate hardware. Another option for very large environments is to run DansGuardian on several systems, perhaps with all the client computers in each IP address range always using the same DansGuardian. (Although dynamic load balancing may be more interesting technically, simple static load balancing is usually adequate.) In these environments with more than one filter server, the DansGuardian machines will of course be different than your firewall/gateway.
Keep in mind that the full combined DansGuardian/Squid may have performance problems due to insufficient memory or inappropriate swapping even though the Squid half by itself did not exhibit these problems. Also keep in mind you should attend to not only DansGuardian itself but also Squid when tuning performance; a simple change in disk layout or mounting can sometimes considerably improve performance.
Usage#11b. What special tuning of large servers is needed?
Very large servers are probably already tuned for server operation. But if the hardware or kernel was intended for workstation use, it will probably not handle such a large number of users very well. Starting from a workstation, quite a few things may need re-tuning, including (but not limited to) the following.
- Increase the system overall number of file descriptors. First find out the current value: cat /proc/sys/fs/file-max. Then increase it if necessary: echo 65535 > /proc/sys/fs/file-max. (In “sysctl” the name is “fs.file-max”.)
- Increase the per-process limit on the number of file descriptors (make it the same as the [possibly revised] DansGuardian internal FD_SETSIZE). First find out the default value: in a terminal shell as “root” ulimit -n (the value will probably be 1024). Then increase it if necessary: again in a terminal shell as “root” ulimit -n 4096.
- Put the Squid data cache on a separate (fast) hard drive, preferably connected through a separate I/O channel.
(The procedures for tuning these things vary widely from one distribution to the next. For example some kernels prefer modifying the /proc file system, some kernels prefer modifying the /etc/sysctl file, and some kernels prefer using the sysctl command. [Often one of the “sysctl” methods is preferred for permanent changes, while the “/proc” method is only for temporary changes.] The example procedure shown for each item may not be the one your distribution prefers.)
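To see the two file descriptor limits side by side before changing anything, a small read-only Python sketch like the one below may help. It assumes a Linux style /proc file system and reports the limits of the process running the script (normally inherited from the shell that started it), not those of an already running DansGuardian. The function names and the value 4096 are only illustrative.

```python
import resource


def read_file_max(path="/proc/sys/fs/file-max"):
    """Return the system-wide descriptor limit, or None if it cannot be read."""
    try:
        with open(path) as handle:
            return int(handle.read().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def limits_report(needed):
    """Collect the system-wide and per-process limits and compare with `needed`."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return {
        "system_file_max": read_file_max(),
        "per_process_soft": soft,
        "per_process_hard": hard,
        # RLIM_INFINITY means "no limit", so it always satisfies the need.
        "soft_limit_ok": soft == resource.RLIM_INFINITY or soft >= needed,
    }


# 4096 is just the example value used for the per-process limit in the text.
print(limits_report(needed=4096))
```

The script changes nothing; raising the limits is still done with the distribution's own procedure.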
Usage#11c. What special tuning of DansGuardian for use on large servers may be needed?
For systems with less than about 1000 simultaneous users, no special version of DansGuardian is needed; just follow the usual performance tuning guidelines, especially those for “child” tasks. For systems with more than about 1000 concurrent users (specifically with more than 1018 child processes), you will need to obtain (or build) a special version of DansGuardian.
While most of DansGuardian uses the poll() system call which can handle an unlimited number of file descriptors, there are still a couple uses of the select() system call in DansGuardian 2.10. The net effect is the highest acceptable value of maxchildren is limited by the file descriptor set size that's acceptable to select(). (Vanilla DansGuardian can and does use all the file descriptors that are available, some of which at any instant in time will be waiting or closed. But DansGuardian cannot simultaneously access more than FD_SETSIZE file descriptors.)
maxchildren can be at most FD_SETSIZE-6. Don't simply circumvent this check; even if it appears to work at first, DansGuardian will soon become unresponsive (especially under high load). Usually the default value of FD_SETSIZE is 1024, so the highest acceptable value of maxchildren is 1018 (pay no attention to old texts that suggest the highest acceptable value of maxchildren is exactly 999 [or 1000], as this hasn't been true for a long time). Unlike most configuration problems, if the value of maxchildren is too high, a “system message” will be emitted not only onto the terminal that is starting DansGuardian but also through syslog (and the terminal message may include a colloquial reference to “rabbits”).
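The arithmetic above is simple enough to check by hand, but a tiny Python sketch makes it easy to test a planned maxchildren against a given FD_SETSIZE. The function names are illustrative only; the only rule encoded is the "FD_SETSIZE less 6" limit stated in the text.

```python
RESERVED_DESCRIPTORS = 6  # from the text: maxchildren may be at most FD_SETSIZE - 6


def highest_maxchildren(fd_setsize=1024):
    """Return the largest maxchildren acceptable for the given FD_SETSIZE."""
    return fd_setsize - RESERVED_DESCRIPTORS


def maxchildren_is_acceptable(maxchildren, fd_setsize=1024):
    """Tell whether a planned maxchildren fits within the select() limit."""
    return maxchildren <= highest_maxchildren(fd_setsize)


print(highest_maxchildren())                   # 1018 with the usual default of 1024
print(highest_maxchildren(4096))               # 4090 for a build using FD_SETSIZE=4096
print(maxchildren_is_acceptable(1500))         # False with the default FD_SETSIZE
print(maxchildren_is_acceptable(1500, 4096))   # True with the larger FD_SETSIZE
```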
FD_SETSIZE is normally defined by the system and constrained by the POSIX standard. There is no universal way to increase it beyond 1024 without risking binary compatibility. As 1024 is often “too small” these days, and as most modern kernels internally support much higher values, most distributions provide some way to change it just for a particular application build. Do not change the system include files directly (unless explicitly instructed to do so by your distribution), as doing so will affect (and possibly break) other applications. Also ignore the code in DansGuardian that seems to set the value to 256. It's nothing more than an “emergency fallback” in case the system doesn't set any value at all, and is virtually never actually used.
To increase maxchildren beyond the default FD_SETSIZE (less 6), you will need to obtain or rebuild DansGuardian with a different internal FD_SETSIZE used by all DansGuardian modules (not just dansguardian.cpp). To rebuild DansGuardian, first obtain the source tarball and set things up for building. The procedure for rebuilding an application with a different FD_SETSIZE differs from one distribution to another; hopefully your distribution offers directions that you can follow. In any case don't increase the value inside DansGuardian beyond what your kernel actually supports.
A procedure that seems reasonable and documents itself and works on some systems is to just extend the build instructions slightly as follows (note the syntax requires there be no spaces at all). (This example assumes the new value is 4096, and assumes this is supported by the kernel – if you need a different value with your kernel or in your environment, change the second line.):
    cd ./usr/local/src/...
    ./configure ... ... CXXFLAGS=-DFD_SETSIZE=4096
    make
    make install
(Note well that's CXXFLAGS [not CPPFLAGS], which is correct for DansGuardian [but probably not for some other applications].)
Unfortunately this procedure doesn't always work because some systems just overwrite your new value of FD_SETSIZE with the old system default value. Make sure your procedure really did use your new value of FD_SETSIZE everywhere in preference to the old system default value. If it didn't, look for a different procedure to use on your version of your distribution.
Usage#11d. Does DansGuardian make full use of all available cores and/or processors?
Yes. DansGuardian is structured as a set of loosely connected separate processes, a structure that's ideal for taking full advantage of multiple cores and/or multiple processors.
Usage#12. Can I just use the “default” configuration of DansGuardian exactly as distributed?
First let's be clear we're talking about the default configuration you'd see if you obtained DansGuardian directly. If instead DansGuardian was integrated into your distribution, it may have different defaults to which the following comments only partially apply.
The default configuration of DansGuardian is functional, and is quite useful for testing the operation of a new installation. The default configuration even provides “usable” filtering that's usually better than nothing at all (and may even be considered adequate for production use in some circumstances). Once DansGuardian is installed and functioning though, its configuration should be tuned to match your local usage patterns and policies; the default configuration is unlikely to suit any particular purpose very well.
Think of the DansGuardian “default” configuration as being a starting point that already handles lots of routine situations and saves you a lot of effort. Or think of the DansGuardian “default” configuration as being similar to the “default” configurations of most modern gateways, where initially everything is turned off and you have to open the ports you want. To put it another way, the DansGuardian default configuration emphasizes safety (not appropriateness). In particular, attend to the following:
- Un-comment only the lines you're interested in in weightedphraselist. Do not simply un-comment every line, as doing so can easily result in significant over-filtering.
- In almost all cases, un-comment the lines for the “goodphrases” category. Without these offsets DansGuardian's filtering may not be very intelligent.
- Do not just use the pre-supplied value of naughtynesslimit. The default value of 50 is extremely conservative. Some of the comments in dansguardianf1.conf provide guidelines for choosing values for naughtynesslimit in different environments. Experience is that for use by adults with filtering of only the most blatant webpages, even values larger than 200 are not unreasonable.
Usage#12b. Should I treat adjustments I have to make to the DansGuardian configuration as “bugs”?
Not usually. The DansGuardian “default” configuration is not a fixed canned configuration (it's more of a “starting point”). Some tweaking of the DansGuardian “default” configuration to better match your local usage patterns and policies is expected.
Usage#13. DansGuardian doesn't work the way I want it to on my IPCop system.
The Cop+ web interface that controls DansGuardian gives you limited ability to edit DansGuardian configuration files. The configuration files are all in /etc/dansguardian/ and subdirectories. You can edit the configuration files directly with vi from the command line or using WinSCP from a Windows workstation. Care must be taken to keep the files owned by “nobody” or the web interface will not be able to edit them anymore. For help with Cop+, try http://home.earthlink.net/~copplus/dghelp.html.
Usage#14. DansGuardian doesn't work the way I want it to on my SmoothWall Express or SmoothWall system.
For help with the homebrew DansGuardian package for SmoothWall Express, try http://community.smoothwall.org/forum/. If you are running one of SmoothWall's commercial products, either get in touch with your reseller, or if you bought directly from SmoothWall, try https://support.smoothwall.net/.
Usage#15. I have a system configuration problem on distribution X, and the messages mention DansGuardian. Can you help me?
Unfortunately maybe not. Many DansGuardian configuration tools are actually created by a distribution, even though they may appear to be “generic”. (Just because it says “DansGuardian” in the title doesn't mean the DansGuardian community had anything to do with it or knows anything about it.) If your problem involves a configuration program specific to distribution X, it's unlikely the DansGuardian community will be able to help. For these kinds of issues, use the recommended support channels for distribution X.
Usage#16. How can I tell the user (and me too) why their webpage was blocked?
Attend to both parts of this answer, first inserting information into the configuration lists, and second displaying that information to the user.
Every list file can have one #listcategory "…" statement in it. (Active statements have exactly one sharp, not two and not zero.) Several list files already do include a category label. The biggest problem is every time your blacklist subscription produces revised files, those new files probably do not contain this label. Typically for blacklist files the label should be simply the name of the parent directory, for example #listcategory "pornography". Maybe the script you use to download fresh blacklist files already contains functionality to insert these labels, and all you need to do is switch it on. In other cases you may need to enhance the script you use to download fresh blacklist files to insert these labels.
(Note there can be one #listcategory "…" statement in every file, not just in every list. Every file can [and probably should] have its own #listcategory "…" statement, even if the file is .Included in some list rather than being directly specified in a conf file. [This might be different from what you would expect if you drew an analogy from other software.])
Whatever label is in the list file that blocked a web page will be copied to the blocked page template as the replacement text for the -CATEGORY- variable. You can simply use the default blocked page HTML template unchanged, in which case the category information will be displayed in red just below the reason the page was blocked. Or you can modify the blocked page HTML template (and restart DansGuardian) to include this variable in any context and as often as you like.
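If your download script has no labeling feature, the step of inserting the labels could be sketched in Python roughly as follows. This is only an illustration: the function name, the demo directory layout, and the sample entry are made up. The label spelling used in the demo is the one shown in the text above; before relying on it, copy the exact spelling (including any punctuation after the directive name) from a shipped list file that already carries a label.

```python
import tempfile
from pathlib import Path


def ensure_category_label(list_file, template):
    """Prepend a label named after the parent directory unless one is present.

    Returns True if a label was added, False if the file already had one.
    """
    path = Path(list_file)
    data = path.read_bytes()  # bytes, so odd encodings pass through untouched
    # Only a line starting with exactly one sharp counts as an active label.
    if any(line.startswith(b"#listcategory") for line in data.splitlines()):
        return False
    label = template.format(name=path.parent.name).encode("utf-8")
    path.write_bytes(label + b"\n" + data)
    return True


# Assumed spelling, taken from the text above; verify against a shipped file.
LABEL_TEMPLATE = '#listcategory "{name}"'

# Demo on a throwaway directory that mimics blacklists/<category>/domains.
with tempfile.TemporaryDirectory() as root:
    demo = Path(root) / "pornography" / "domains"
    demo.parent.mkdir()
    demo.write_text("example.invalid\n")
    ensure_category_label(demo, LABEL_TEMPLATE)
    print(demo.read_text().splitlines()[0])
```

Running the function a second time on the same file leaves it unchanged, so it is safe to call after every blacklist refresh.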
Usage#17. How do I filter FTP (or remote logins, or Instant Messaging, or Peer To Peer File Sharing, or Folder and Printer sharing, or email, or etc.) with DansGuardian?
DansGuardian is purely a web filter. Use other tools to filter other kinds of network traffic. If non-web network traffic is forced through DansGuardian anyway, DansGuardian will have no idea what to do and will just throw it away, resulting in no connectivity at all rather than filtered connectivity.
Usage#17b. How do I filter outgoing as well as incoming web traffic with DansGuardian?
In general the current DansGuardian filters only incoming web traffic (although it does give you some control over “upload” operations). If you need to filter outgoing web traffic, you will need to use some other software tool.
Usage#17c. After I installed DansGuardian, Folder and Printer sharing (or remote logins, or Instant Messaging, or Peer To Peer File Sharing, or FTP, or email, or etc.) stopped working. How should I change my DansGuardian configuration to restore lost network usability?
Apparently installation of DansGuardian included manipulation of network firewalls such as iptables or local firewalls such as the one built into Windows. It's these manipulations that have interfered with your network usability; DansGuardian itself has no effect on anything other than web traffic. You need to figure out which firewall manipulations caused the problem, and further modify them to restore your network usability while still allowing DansGuardian to operate.
Usage#18. How can file downloads be controlled? By filename extension? By MIME type? By website?
File downloads can be controlled by either filename extension or MIME type (via files bannedextensionlist, exceptionextensionlist, bannedmimetypelist, and exceptionmimetypelist). You might wish for example to deny download of … .EXE files while allowing download of … .ZIP files.
ORing both file extensions and MIME types together adds needed robustness, since either bit of information will sometimes be incorrectly specified in the HTTP headers, but only once in a great while will both bits of information be incorrect simultaneously.
However, you are responsible for ensuring MIME type and filename extension instructions don't conflict. If you allow conflicts to creep in, download management can become impossibly tangled. A good management technique is to always make all modifications to the same list (probably MIME type), then always regenerate the other list (file extensions in this example) from scratch to be its equivalent.
(Another common way to avoid any possibility of conflicts [but which defeats the original purpose of having both] is to always use the same list alone [MIME types or filename extensions, whichever is more convenient for you], and never use the other list at all [in other words keep one of the two lists empty].)
Usage#18b. Can I specify different download restrictions (filename extensions and MIME types) for different websites?
File download control is pretty much the same for all websites; you cannot directly control file downloads differently for different file extensions or MIME types depending on which website is supplying the download (except to allow all downloads from that website).
One thing you can do is completely except some websites from all download restrictions. If you've configured for the whitelist style of controlling downloads (blockdownloads=on in dansguardianfN.conf), list all sites and URLs from which downloads are okay in 'exceptionfilesitelist' and 'exceptionfileurllist'. For the normal style of controlling downloads (blockdownloads=off in dansguardianfN.conf), regular download restrictions ('bannedmimetypelist', 'bannedextensionlist', 'exceptionmimetypelist', 'exceptionextensionlist') will apply in most cases, but not to the sites and URLs listed in 'exceptionfilesitelist' and 'exceptionfileurllist', from which all downloads will be allowed.
Another thing you can do is use “multiple filter groups” to produce a similar effect. Define some additional filter groups to be used just for downloading. Since each filter group can have its own lists of allowable file extensions and MIME types, you can give different restrictions to different filter groups and hence effectively to different websites.
Usage#19. Can I block certain terms from use in web searches, for example disallowing searches for “porn” everywhere?
Usually if an offensive search term is specified, the results page will be so objectionable that DansGuardian will filter it out before the user ever sees it. So the net result looks the same to the user as it would have if the search term had been preemptively blocked before it was ever sent.
(As almost all searches use the same GET-“query” syntax, and as the full URL is visible to DansGuardian for regular HTTP, theoretically you could block search terms by matching regular expressions against URLs. However experience suggests such regular expressions quickly become extremely complex and difficult to maintain.)
Usage#20. A teacher is asking that the web filter prepare to allow material for their next unit. Their next unit includes material that's easily mixed up with a banned category, and so is likely to run afoul of the web filter (for example “breast cancer”). What can I do?
Step back a little and try to better understand what's really being asked. One possibility is you're being asked to address a problem that doesn't exist. The teacher could be reacting to an experience at another school, or a rumor, or a bad dream, or… Sit down with the teacher at a filtered computer and try a dozen searches and displays. If the teacher says “gee, it works much better than I expected”, the best course of action may be to simply leave it alone.
Another possibility is this is a disguised request which really means: please make the web filter better at distinguishing good from bad. Phrased this way, it's clear the answer is: If I knew how to do that, I would have already done it. Finally, sometimes such a request really means: please reduce the false positives for a particular subject area, at least for a few days. If so, some of the suggestions below (which are very roughly ordered from top to bottom) may help. Note many of these suggestions are likely to also affect other subject areas and some of them may not be acceptable.
- Pre-vet a big bunch of websites. Help the teacher sit at an unfiltered computer and check out websites the students should be allowed to visit later (either because the teacher suggested them or because a websearch led to them). Write down each okayed website, and add all those websites to file exceptionsitelist (or better yet to a new file .Included by exceptionsitelist). And deliver the same list of websites as bookmarks for the students.
(Just this one suggestion may be sufficient to resolve the issue. And it doesn't affect other categories, and it has few or no negative side effects, so it can be left in place permanently, perhaps for another class or another teacher or another year.)
- Relax the entire web filter temporarily by tweaking the overall naughtynesslimit to a higher number for a few days.
- Ensure “good” phrases are sufficiently active. Be sure the various “Good Phrases” .Includes are un-commented in weightedphraselist. If they already are, try temporarily enlarging (maybe even doubling) the negative score in each line related to the unit's material in goodphrases/weighted_general (for this example the three lines containing the word “breast”). Of course save the original file before tweaking its contents.
- If your blacklists include a “good” category that lists a lot of the sites that the teacher wants, temporarily give it a free ride. For this example, add an .Include line for the hygiene category to greysitelist (or exceptionsitelist).
- If you already use multiple filter groups and identify individual users, get a list of the students in the affected class and temporarily promote them to a less restrictive filter group.
- If a lot of false positives involve one particular category, temporarily stop banning that category. For example, if there are a lot of “blog” postings that the teacher desires to be accessible, temporarily comment out the line referencing forums in bannedsitelist and bannedurllist.
- Temporarily remove the block on the other area that's likely to be confused with the desired area. Using the example of “breast cancer”, temporarily comment-out the pornography category in bannedsitelist and bannedurllist.
In any case, monitor the filter's behavior closely. Comb through the DENIED entries in the DansGuardian log every day during the problematic unit. If there appears to still be interference, take additional steps immediately.
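For the daily combing of DENIED entries, a minimal Python sketch such as the following may be enough to start with. It simply keeps the lines that contain the word DENIED; the two sample lines are invented for illustration and do not claim to reproduce the real log layout, and the function name is hypothetical.

```python
def denied_entries(lines, marker="DENIED"):
    """Return the log lines that contain the marker, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if marker in line]


# Invented sample lines; a real run would read the DansGuardian access.log.
sample_log = [
    "alice 192.168.0.20 http://example.invalid/a *DENIED* example reason\n",
    "alice 192.168.0.20 http://example.invalid/b GET 5120\n",
]

for entry in denied_entries(sample_log):
    print(entry)

# Against a real log file (path is an assumption; adjust to your installation):
# with open("/var/log/dansguardian/access.log", errors="replace") as log:
#     for entry in denied_entries(log):
#         print(entry)
```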
Usage#21. What's the difference between 'exception…list' and 'grey…list'?
Inclusion in any exception…list disables all filtering so webpages are allowed no matter what. On the other hand inclusion in any grey…list (greysitelist, greyurllist) disables only the first stage filter checks (triage or pre-filter or URL); the second stage weighted phrase list checks still occur and can still deny access to a webpage. So put it in grey…list if you want to see it only after it passes phrase filtering; put it in exception…list if you want to see it no matter what.
If you have also configured some kind of anti-virus content scanning, items included in any grey…list will still be anti-virus scanned (just as they are still phrase scanned). Whether or not items included in any exception…list will still be anti-virus scanned is controlled by the contentscanexceptions = yes or contentscanexceptions = no option in dansguardian.conf.
(Note the grey…list file names use the British spelling “grey”, not the American spelling “gray”.)
Usage#22. The **s and **ips options in bannedsitelist don't seem to do anything at all. What could interfere with https: filtering?
DansGuardian can only filter the traffic flowing through it. When the **s and **ips options don't do anything, it usually indicates ssl/tls/443/https: traffic isn't going through DansGuardian at all.
If you have a transparent-intercepting type configuration, there's no way to filter https: traffic at all with DansGuardian. (If you try to direct https: traffic through DansGuardian anyway, it simply won't work.) See Installation#6 and Installation#4 above.
If you have an explicit-proxy type configuration, it's possible you need to enter an additional setting into each of your browsers. Some browsers have a setting for SSL Proxy separate from their main proxy setting. If so, repeat in each browser your proxy setting (probably something like proxy.your.domain.tld 8080) a second time for SSL proxy.
If filtering of https: in general does work, but the blanket block for use of https: with IP addresses doesn't work, upgrade to release 22.214.171.124 or later (or [despite the comments] specify **ips rather than *ips in bannedsitelist; in other words two asterisks rather than just one).
Usage#23. When asked for NTLM credentials, Firefox pops up and asks the user to retype them even if they were already used to logon. How can I make Firefox handle NTLM authorization requests the same as IE (Internet Explorer), silently and automatically returning to DansGuardian/Squid the login username and password from the OS?
It is possible to make Firefox handle NTLM credential requests the same way as IE when it's running on any Windows system (but not on a Linux system). In each Firefox, type about:config in the address bar. A full screen of options (most of which aren't available through the Tools:Options mechanism) will be displayed. Then type ntlm into the blank line titled Filter:. The display will shrink to just a few relevant options. The option network.automatic-ntlm-auth.allow-proxies should have the value true; double-click on it and change it if not.
Most likely you can just ignore the option network.automatic-ntlm-auth.trusted-uris, which lists external websites NTLM should automatically respond to (Firefox doesn't just respond to everybody by default, as there's a small risk illicit sites could collect information to help them to break your domain passwords). Only add the name of the DansGuardian/Squid computer (the name by which the browser computer can ping it) if things don't work.
Leave the option named network.ntlm.send-lm-response set false. All it really does (although the name might suggest otherwise) is include the Lan Manager Password Hash in the appropriate places in NTLM responses. This is only necessary with some very old versions of the NTLM protocol and some very old (or naive) Squid NTLM authorization helper programs. It's not necessary for proper operation of NTLM authentication in most cases; in fact, it's most likely not even a good idea. The LM Hash is a password equivalent; in many cases it can be used in place of the password without needing to know the password itself. Including it in some NTLM responses may allow crackers to easily illicitly harvest password equivalents. So don't turn it on (unless for some reason it's really necessary and you thoroughly understand both its behavior and its ramifications).
Usage#24. Can I use NTLM authentication with a Safari-like browser, either on a Macintosh or on an iPhone?
Yes, but perhaps only with some additional effort. Both Macintosh Safari and iPhone Safari can understand NTLM requests and know they need to return some credentials. (Although NTLM probably won't automatically authenticate users in these cases, at least the manually entered username/password will be exchanged in NTLM format.)
However Apple has had a lot of bugs in this area (both in the browser itself and in parts of OS X) which often resulted in the browser hanging or looping or even crashing. You will probably need versions released at least as late as Summer 2008, or perhaps even later. Either try it with what you have, or check the current status of bugs in this area.
A different solution which works in all situations (and even works for more than just web surfing - things like reading email from an Exchange server) relies on an additional proxy. Safari should connect to the additional special proxy which handles NTLM but otherwise just chains requests on through to the regular DansGuardian/Squid proxy. An additional special proxy that does exactly this is already available; it's called NTLMAPS. If you have a lot of Macintoshes or iPhones on your network and can't sufficiently influence the software versions they use, you can serve them by installing NTLMAPS. It works both for providing NTLM authentication to the DansGuardian/Squid proxy itself and for internal websites that require NTLM authentication and even for the rather uncommon case of external/public websites that require NTLM authentication.
Usage#25. I want to exclude blocking of advertisements from the log. Everything indicates DansGuardian's default configuration will do what I want. But it's not working. Do I need to do something else?
The DansGuardian function to not log blocked advertisements is controlled by the logadblocks option in dansguardian.conf, which is probably already set to off. But that's only part of the mechanism. The other part is a #listcategory "ADs" statement in each list file that defines advertisements. Because those statements mostly go in files that are distributed outside of DansGuardian, they may not be there by default and you may need to add them. Unlike most things in DansGuardian, the tag "ADs" is case-sensitive, so the statement #listcategory "ads" won't do anything and you probably won't be able to figure out why not. Make 'A' and 'D' upper case, and 's' lower case.
Usage#26. Some of my config lines act like they're not there, even though other config lines behave as expected. It's as though DansGuardian were simply ignoring those lines. And sometimes a line will start working when I retype it, even though it looks exactly the same. Why is this happening and what can I do about it?
There was a bug in DansGuardian versions 126.96.36.199 through 188.8.131.52 (but not 184.108.40.206 and before, or 220.127.116.11 and after) that caused list lines containing an invisible TAB character to be mis-parsed. That caused all kinds of weird and wonderful behaviors, including what appeared to be the exact same line behaving differently when retyped, and lines being sensitive to the presence of comments! (Use something like dansguardian -v to find out the DansGuardian version, which isn't always the same as the package version.)
To solve the problem do any one of:
- Edit the list files to use only SPACEs, never TABs.
- Convert all existing TABs to the right number of SPACEs (maybe use something like sed or Perl).
- Upgrade to at least DansGuardian 18.104.22.168.
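If you choose to convert TABs, the conversion can be scripted. Here is a minimal sketch in Python rather than sed or Perl. The function names, the ".bak" backup suffix, and the tab width of 8 are all assumptions; pick the tab width your editor used so the columns stay aligned.

```python
from pathlib import Path

def expand_tabs(data: bytes, tabsize: int = 8) -> bytes:
    """Replace every TAB with the number of SPACEs needed to reach the next tab stop."""
    return data.expandtabs(tabsize)

def convert_file(path, tabsize: int = 8) -> bool:
    """Rewrite one list file without TABs; return True if the file was changed."""
    target = Path(path)
    original = target.read_bytes()
    converted = expand_tabs(original, tabsize)
    if converted == original:
        return False
    # Keep a copy of the untouched file next to it (hypothetical '.bak' suffix).
    target.with_name(target.name + ".bak").write_bytes(original)
    target.write_bytes(converted)
    return True

if __name__ == "__main__":
    import sys
    # Usage: python untab.py path/to/listfile [more list files...]
    for name in sys.argv[1:]:
        print(name, "converted" if convert_file(name) else "unchanged")
```

Working on raw bytes avoids guessing the character encoding of the list file. Restart or reload DansGuardian afterwards so the changed lists are read again.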
Usage#27. DansGuardian doesn't score a webpage the way I think it should. So should I “adjust” one of my phrase list files?
Most likely what you should adjust instead is naughtynesslimit. This single parameter has more effect on what is filtered and what is not than anything else.
In some cases you may need to further tune your configuration. If so, simply comment out or uncomment the various .Include lines in 'weightedphraselist'. For best results uncomment only the categories you're concerned about, as well as most or all of the “goodphrases” lines. (Most often you should think of the phraselists that were distributed with your DansGuardian as fixed.)
(Because of issues like making the scores for new phrases scale to naughtynesslimit in a way that fits in with existing scores and minimizing false matches, appropriate manipulation of individual phrase list entries can be “challenging”. Changing phrase list entries is usually done by users that either have unusual requirements in unusual environments or are constructing phraselists to be included in the next release.)
Sometimes you can even more simply just give the entire site a free ride. If it's clear to you that nothing on the entire site is objectionable or ever will be, rather than having DansGuardian try to figure out which webpages are “more equal than others”, just add the site name to 'exceptionsitelist'. (If you aren't satisfied with your existing phraselists, you can obtain –likely newer– replacement phraselists from http://contentfilter.futuragts.com/phraselists/.)
Usage#27b. How can I determine how a webpage is being weight scored and which of my phraselist files might be causing a problem?
To get some general insight into the operation of your phraselists, look at the calculated “weight” (or “score”) for requests in DansGuardian's access.log (the calculated weight is two fields to the left of the HTTP return code field, which stands out because it's always a three-digit number). For any webpage that exceeded naughtynesslimit, the log contains a complete list of words and phrases that contributed to the score, both positive and negative (assuming you've retained the default showweightedfound=on). The Log Analysis tool in some versions of the DansGuardian Webmin Module may make it even easier to display this information.
To obtain as much information as you easily can about suspected problems related to phraselists, temporarily reduce naughtynesslimit all the way down to 1, then access the sites of interest so those accesses appear in the log, and carefully examine the extended information that now appears in the log. If this still isn't sufficient (or isn't possible), to obtain full information about suspected problems related to phraselists whether or not naughtynesslimit was exceeded, replicate the suspect access while running a “debug” version of DansGuardian (see question number Installation#24c above). Capture the output; it will tell you exactly which phrases matched and exactly how they contributed to the weight score calculation.
(Note the weightedphrasemode option in 'dansguardian.conf' can select either of two different methods of calculating weight scores. Be sure you've selected the method you desire before you undertake detailed troubleshooting of problems. And don't change this option lightly, as its effect on filtering of borderline webpages [those with a score “close” to your naughtynesslimit] can be significant.)
Usage#28. One of my phrases matches some webpages, but not other webpages that look exactly the same. Why? What should I do about it?
DansGuardian simply compares all available phrases against all webpages without any reference to either the language or the character encoding of either the phrases or the webpage. In other words comparisons are done simply as raw strings of binary bytes (octets). This means for example the phrase word foobar stored in a Latin1 encoding will probably match most webpages but will not match webpages that use the UTF-16 encoding.
Mostly this issue doesn't matter, as each phrase word is in a particular language and almost all webpages in that language use similar encodings. But there are some exceptions. For example webpages in the Japanese language may be in any of several rather dissimilar encodings. For another example, occasionally a webpage in the English language uses the mostly-16-bit Unicode encoding UTF-16.
If you want a phrase word to match webpages in different encodings, you will need to specify the phrase word more than once, probably in more than one file. Fortunately there is quite a bit of similarity between encodings. A phrase word in any mostly-8-bit encoding is likely to match several webpages even though they use other mostly-8-bit encodings, and a phrase word in any mostly-16-bit encoding is likely to match webpages even though they use slightly different mostly-16-bit encodings. So specifying a phrase word just twice, once encoded as UTF-8 and a second time encoded as UTF-16, might be sufficient to match all webpages.
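The effect of comparing raw bytes is easy to reproduce outside DansGuardian. The short Python sketch below is not DansGuardian's own matching code; it only imitates the byte-for-byte comparison described above. The function names are hypothetical, and the little-endian form of UTF-16 ("utf-16-le") is an assumption chosen for the illustration.

```python
def phrase_matches(phrase: bytes, page: bytes) -> bool:
    """Raw byte-string containment, with no knowledge of language or encoding."""
    return phrase in page

def encode_variants(word: str, encodings=("utf-8", "utf-16-le")) -> dict:
    """Encode one phrase word once per target encoding."""
    return {name: word.encode(name) for name in encodings}

if __name__ == "__main__":
    page_text = "<p>some foobar text</p>"
    page_utf8 = page_text.encode("utf-8")
    page_utf16 = page_text.encode("utf-16-le")

    latin1_phrase = "foobar".encode("latin-1")
    # Same bytes as plain ASCII, so the mostly-8-bit page matches.
    print("latin-1 phrase vs UTF-8 page: ", phrase_matches(latin1_phrase, page_utf8))
    # In UTF-16 every letter is followed by a zero byte, so the 8-bit bytes are not found.
    print("latin-1 phrase vs UTF-16 page:", phrase_matches(latin1_phrase, page_utf16))

    for name, encoded in encode_variants("foobar").items():
        print(name, "variant vs UTF-16 page:", phrase_matches(encoded, page_utf16))
```

The byte sequences produced by encode_variants are what would have to appear in the phrase list files, which is why the same phrase word ends up being specified more than once.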
Usage#29. When access to a website isn't working right, why doesn't just adding the site to 'exceptionsitelist' resolve the problem?
If your browser shows you the DansGuardian “access denied” message, then adding the site to 'exceptionsitelist' should make a big difference. But for problems other than the DansGuardian “access denied” message (i.e. blank pages, etc.), adding the site to 'exceptionsitelist' probably won't work.
To say the same thing another way: The common meaning of “bypassing DansGuardian” is simply neutering filtering (i.e. operate pretty much normally, except don't deny access - rather like “stealth” mode). If instead you mean letting the browser and the server communicate as though DansGuardian wasn't even there (i.e. some sort of hands-off transparent pass-through), there's no way to do this from inside DansGuardian. If your problem does not result in a DansGuardian “access denied” message, listing the site in 'exceptionsitelist' won't help.
Problems other than “access denied” when accessing a site may occur for any of these reasons:
- the website itself refuses to work through proxies. If this is what's happening, consider contacting the website, as there's little you can do with the configuration to resolve the problem. Better yet simply arrange for the traffic to completely skip around DansGuardian/Squid (while still guaranteeing filtering for other sites).
- the website itself refuses to communicate with any client that doesn't appear to fully support HTTP 1.1. If this is what's happening, consider contacting the website, as there's little you can do with the DansGuardian configuration to resolve the problem. Better yet simply arrange for the traffic to completely skip around DansGuardian (while still guaranteeing filtering for other sites).
- the custom web application is “too fragile” and doesn't work right unless everything is exactly as expected. If this is what's happening, consider contacting the source of the custom web application, as they may not realize there's a problem. There may be communication on both web (80,443) and non-web ports, and they may (unreasonably?) be expected to stay in sync, which can be quite difficult when communication passes through a proxy or firewall such as DansGuardian/Squid. To get it to work for your users right away, arrange for the traffic to completely skip around DansGuardian (while still guaranteeing filtering for other sites), and be sure any non-web ports required by the application are also opened (at least to that site).
- there's a subtle flaw in the behavior of DansGuardian/Squid, often due to some lacuna of your DansGuardian/Squid configuration. It's usually best to simply arrange for access to the problem website to circumvent DansGuardian/Squid entirely (probably with either IPtables or some browser setting or both). The extremely detailed troubleshooting necessary to identify the subtle flaw may be quite awkward and slow even for an expert, and may not produce a solution within your required timeframe.
- although some of the communication uses the “web” ports (80,443), the rest of the communication uses a different port and that port is blocked by your firewall. Web applications should show you a meaningful error message in this case …but many don't. Perhaps you should find out what the port is and open it, probably only for communication originating in the outbound direction.
Usage#29b. How can I give my users guaranteed completely hands-off access to a special website?
If a special website will not tolerate even very minor tweaks to its traffic (see previous question), arrange it so that traffic never enters DansGuardian in the first place. (See question Installation#11b.) Specially intercept just that particular network traffic with IPtables, then route it either directly to the other network interface or directly to Squid without going through DansGuardian.
Usage#30. I want to except… some documents (ex: webmail attachments), yet I still want virus scanning to occur for these files. How can I do this?
If you can identify the documents simply by website or URL, just enter them in greysitelist or greyurllist (rather than exceptionsitelist or exceptionurllist).
If you have to identify them by something like MIME type, file extension, or a regular expression pattern, for which no grey…list is available, instead change from the default setting to contentscanexceptions = on in dansguardian.conf.
Usage#31. When using NTLM authentication to the DansGuardian/Squid proxy itself, users sometimes have trouble accessing remote websites that also use NTLM authentication, as though the two NTLM authentication processes were somehow interfering with each other. How can I fix this?
This problem comes and goes depending on many things, including which exact version of Squid you're using and exactly how Squid is configured. In some environments it occurs so infrequently it's not a concern. In other environments any one of these fixes may be needed:
- use IPtables (not anything inside the DansGuardian/Squid configuration) to allow users to access the remote site directly rather than through the filter
- set up a VPN so your users can establish a presence on the host network, then access the webserver privately rather than publicly
- change the webserver configuration to offer not only NTLM but also one of the W3C standard authentication methods (Basic or Digest) as an alternative
If you cannot use any of the above solutions, some other things to try are:
- check that your Squid configuration always requires the user to supply NTLM credentials before accessing the web
- upgrade to a recent version of DansGuardian (at least 22.214.171.124)
(Ideally this should never occur. NTLM is an inside protocol and should never be the only choice for authentication on publicly accessible websites. But the real world is not ideal - such things do happen.)
Usage#32. I've obtained a client web program that does what I want, but it knows nothing about DansGuardian and tries to reach the Internet directly through port 80. Is there some way I can get it to go through DansGuardian without changing it?
Almost certainly Yes. The vast majority of web programs respect the same environment variable that specifies the path to the proxy. Often this functionality is only minimally documented, so it's common to find that a program doesn't mention proxy support, yet understands this environment variable and does the right thing if it's set.
Set the environment variable like this, then try your program. (The syntax for setting an environment variable varies slightly with different operating systems and shells; you may need to modify the syntax for your particular system.)
Windows> set http_proxy=http://numeric.ipaddr.of.dansguardian:8080
Linux-bash$ export http_proxy="http://numeric.ipaddr.of.dansguardian:8080"
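To see what a program will pick up from the environment, you can ask a language runtime that honors the variable. The sketch below uses Python's standard urllib as one example of such a client. The function name is hypothetical and 192.0.2.10 is only a placeholder address; substitute the address and port of your own DansGuardian.

```python
import os
import urllib.request

def proxy_from_environment(scheme: str = "http"):
    """Return the proxy URL that urllib finds in the environment, or None."""
    return urllib.request.getproxies_environment().get(scheme)

if __name__ == "__main__":
    # Placeholder address, set here only so the demonstration is self-contained.
    os.environ.setdefault("http_proxy", "http://192.0.2.10:8080")
    print("http requests would use proxy:", proxy_from_environment())
```

If this prints the DansGuardian address, the variable is set correctly in that shell, and programs that respect http_proxy and are started from the same shell should send their requests through the filter.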
Usage#33. Even though I'm using an authplugin to identify users, neither usernames nor real client IP addresses (not counting the '0.0.0.0' placeholder) ever appear in my logs. How can I make usernames or client IP addresses appear in my logs?
Because the possibility of identifying an individual from information in the logs is often a legal or policy issue at different sites, it's a configuration option in DansGuardian. If you want to see usernames or client IP addresses in your logs, set anonymizelogs=no in dansguardian.conf; if on the other hand you want to ensure usernames and client IP addresses are not included in your logs, set anonymizelogs=yes in dansguardian.conf.
(Of course, if you're not using any local-proxy-assisted auth method at all, DansGuardian has no idea what usernames are, and so can't include that information in its logs [some client IP address information may still be present]. The issue noted in the previous paragraph though is much more likely to trouble DansGuardian administrators.)
Usage#34. Will the anonymizelogs=yes option have any effect on my troubleshooting?
Yes, having anonymized logs can make troubleshooting more difficult. If you can't temporarily set anonymizelogs=no when troubleshooting, you may need to use either the “Offhours” or the “Group/Time” method to identify and extract the relevant log entries.
Usage#35. How can I identify the log entry that was generated by a user's attempt to display a particular webpage?
You'll never find “the one” log entry no matter what, because it doesn't exist - rather there's a whole bundle of relevant log entries.
What appears in the browser to be a single webpage is actually composed internally of many different parts, and the fetching of each of these parts generates a separate log entry. So troubleshooting an attempt to display a single webpage will almost always generate a block of relevant log entries. There are several methods for extracting all the relevant log entries, including “Offhours”, “User/Time”, “Computer/Time”, and “Group/Time”.
Usage#36. A webpage from domain X won't display correctly, but I can't find any log entry for domain X that says DENIED. Where else should I look for troubleshooting information?
The information you need really is in the logs, not anywhere else. The reason you haven't been able to find it is that it doesn't include any mention of domain X. So long as you simply search for domain X, you'll miss it.
Where is it?
The Technical FAQ is one of the better references for resolving problems. It's inside the package and is installed along with the DansGuardian binary. Typically it's installed into /usr/local/share/dansguardian. Or, to read it directly from the source package, see …/doc/FAQ.