DansGuardian Documentation Wiki


Blacklists for DansGuardian


One filtering method used by DansGuardian is to check whether the web site being visited is listed in categorized blacklists. These are lists that are generated either automatically by computer programs (robots or spiders) or by humans. Below you'll see a number of lists that are compatible with DansGuardian.

Why Blacklists?

Blacklists help a lot in triaging which things DansGuardian needs to examine more closely. DansGuardian will work perfectly well without any blacklists, but since every request then falls through to the more expensive content filter, the same hardware will serve fewer users (or you will need faster hardware to serve the same number).

However, no blacklist is perfect. You will inevitably disagree with the categorization of a few sites. Trying to get a blacklist 100% “correct” (according to your own opinion) is a hopeless, thankless task.

Instead, have mechanisms for overriding individual blacklist listings in your own environment. In DansGuardian blacklists are typically .Include'd in banned…list's, and whenever something is listed in both a banned…list and an exception…list, the exception…list listing takes precedence. So just straightforward use of the exception…list's is often the only overriding mechanism you'll need. (A complete solution to providing overrides would be to use scripts to automatically apply local exception edits to every revised version of every blacklist as it is fetched and installed.)

Use comments to note the date and reason for each exception, supporting periodic cleanups of the cruft. Although you may have the best of intentions of just “remembering” why each override is in place, experience shows it very seldom turns out that way. So if in doubt, add more comments.
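For example, annotated override entries in 'exceptionsitelist' might look like this (the site names and reasons are of course just placeholders):

```
# 2009-03-15: art-history site needed by teachers; blacklist
#   miscategorizes it as porn -- recheck next school year
www.example-art-history.org
# 2009-04-02: vendor support portal wrongly listed under 'proxy'
support.example-vendor.com
```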

Comprehensive Blacklists

Comprehensive blacklists that are actively maintained and categorize millions of sites. Note that these require registration and/or payment for regular use.

Malware Blacklists

Blacklists that specifically block malware such as viruses, trojans, phishing attempts etc.

Proxy Blacklists

Blacklists that specifically block proxies.

Limited Blacklists

The following blacklists are either limited in scope or are not actively maintained.

Whitelist Mode

If DansGuardian is being used in a “whitelist” mode (i.e. with “blanket block” enabled), blacklists are neither necessary nor useful, and downloading, configuring, or subscribing to them would simply be a waste of time.

However it may appear that without some blacklist files present, DansGuardian will not start up. No placeholder blacklist files are necessary, though: simply find all the .Include statements that refer to any nonexistent file in the blacklist area, and comment them all out.

## To find all uncommented references to files in the blacklist area

prompt# cd /etc/dansguardian/lists  # or /usr/local/etc/dansguardian/lists
                                    # or whatever your system uses
prompt# grep -c '^[ 	]*\.Include.*blacklist' * | grep -v ':0$'

## Now edit each of the files reported above with your favorite text editor
##  and comment out every line that includes a blacklist file,
##  inserting a sharp (#) in the very first column of each relevant line.
## Example of a commented out (disabled) reference to a blacklist file
##  (this example in 'bannedsitelist'; the category name is just an example):
#.Include</etc/dansguardian/lists/blacklists/porn/domains>

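If there are many such lines, commenting them out can be scripted rather than done by hand. Here's a minimal sketch (back up your list files first; the pattern matches the same uncommented .Include lines as the grep above):

```shell
# comment_out_blacklist_includes FILE
# Prefix '#' to every uncommented .Include line mentioning "blacklist",
# leaving already-commented lines and everything else alone.
comment_out_blacklist_includes() {
    sed -i '/^[^#]*\.Include.*blacklist/s/^/#/' "$1"
}
```

Run it over each file the grep reported, then restart DansGuardian.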

Include Categories

Include statements in 'bannedsitelist' and 'bannedurllist' (and perhaps others) divide up blacklists by category so you can easily enable (uncomment) or disable (comment) different categories simply by manipulating the .Include line for that category. The distributed category .Include lines already match the blacklists from 'urlblacklist', but will need to be modified or replaced to work with other blacklists.

Suppose you've already fetched and installed the blacklist files you're going to use, and they contain only sites and URLs. And now you need to construct the relevant .Include lines. Here's a way to do it (as usual with Linux-style text tools including the Bash shell, be careful not to mix up double ["], single ['], and backquotes [`], as they all behave differently):

prompt# export L=/etc/dansguardian/lists  # or whatever path is correct on your system
prompt# export BL=/etc/dansguardian/lists/blacklists  # or whatever path is correct on your system
prompt# echo "### new category includes begin below ###" >> $L/bannedsitelist
prompt# echo "### new category includes begin below ###" >> $L/bannedurllist
prompt# for FULLCAT in `find $BL -type d -print`; do \
            [ -f $FULLCAT/domains ] && echo "#.Include<$FULLCAT/domains>" >> $L/bannedsitelist; \
            [ -f $FULLCAT/urls ] && echo "#.Include<$FULLCAT/urls>" >> $L/bannedurllist; \
        done

Downloading Blacklists

Once downloading of a blacklist subscription works reliably for you, it should most likely be scheduled to run automatically in the future. Make it a 'cron' job, with the frequency matching your blacklist subscription. If you need help with the scheduling syntax, see man crontab. (Remember automatically launched 'cron' jobs don't have the same environment as your interactive testing; only a skeleton environment is provided for 'cron' jobs. This is the most frequent cause of 'cron' job problems.)
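A typical crontab entry might look like the following (the script name and log path are placeholders; adjust the schedule to your subscription's update frequency):

```
# m  h  dom mon dow  command
 30  2   *   *   *   /usr/local/sbin/fetch-blacklists.sh >/var/log/blacklist-fetch.log 2>&1
```

Redirecting output to a log file is worthwhile precisely because cron's skeleton environment so often breaks scripts that worked interactively.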

It seems prudent for each blacklist subscription download script to leave the old version of its blacklists in place until the new version is fully ready (downloaded, verified, tagged, etc.). Don't just assume all the sample blacklist download scripts already act this way; FIXME unfortunately quite a few of them delete the old version of their blacklists before making sure the new versions are all ready. Suggestions for making some of the sample download scripts more cautious would be welcome.
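The cautious behavior described above can be factored into a small helper that a download script calls only after fetching and unpacking into a staging directory. This is just a sketch: checking the 'porn/domains' file is one example of a sanity test, and you may want stronger verification (signatures, file counts, etc.):

```shell
# install_blacklists STAGE LIVE
# Replace the live blacklist directory with a freshly staged one,
# but only after a sanity check, so the old lists survive a bad download.
install_blacklists() {
    stage="$1"; live="$2"
    # verify: the staged tree must contain at least one non-empty list
    # (checking the 'porn' category here is just an example)
    [ -s "$stage/porn/domains" ] || {
        echo "staged blacklists look broken; keeping old lists" >&2
        return 1
    }
    # the swap itself: the old lists remain in place until this moment
    rm -rf "$live.old" &&
    mv "$live" "$live.old" &&
    mv "$stage" "$live" &&
    rm -rf "$live.old"
}
```

A download script would fetch into a temporary directory, then finish with something like `install_blacklists /tmp/bl.$$ /etc/dansguardian/lists/blacklists`.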

If your DansGuardian system uses blacklists simply to improve performance, you can get away with updating your blacklists only rather infrequently: perhaps every month or so. If on the other hand your DansGuardian system uses blacklists as its primary method of forbidding proxy use and/or malware attacks, you need to update your blacklists frequently: at least every day.

Multiple Blacklists

Simultaneous use of multiple blacklists is straightforward and is often a good idea. The DansGuardian default configuration is set up to use only a single blacklist subscription, but this is simply a convenience rather than a requirement.

Using more than one blacklist simultaneously can provide several benefits:

  • Each blacklist can be updated without affecting the others
  • There's no need to find a script to combine blacklists or categories, or figure out when to execute it
  • The “reason” on the blocked page can (if you wish) include which blacklist as well as which category
  • Future problems obtaining one blacklist won't affect any other

The biggest drawback with setting up simultaneous use of multiple blacklists may be having to wrestle with alternate/renamed categories now rather than later. For example it may be prudent to figure out which categories are similar (even though they may have different names).

Setting Up Storage for Multiple Blacklists

Using multiple blacklists requires a little attention the first time to set it up, but then it runs all by itself. First, create additional directories as necessary:

prompt# cd /etc/dansguardian/lists
prompt# mkdir blacklists.shalla

Then make sure each blacklist update goes into its own directory. In many cases this just means twiddling the setting of a variable in a script. Here's a list of what's needed for some (but by no means all) blacklists:

  • BL (urlblacklist) → /etc/dansguardian/lists/blacklists
  • shalla_update.sh → dbhome="/etc/dansguardian/lists/blacklists.shalla"

Using Multiple Blacklists via Includes

Add additional .Include lines to the various ...list files as necessary. For example, here's what some of the lines added to 'bannedsitelist' might look like (the category names below are only illustrative; check which categories your blacklists actually provide):

## Use only urlblacklist
.Include</etc/dansguardian/lists/blacklists/porn/domains>
#.Include</etc/dansguardian/lists/blacklists.shalla/porn/domains>

## Use only shalla blacklist
#.Include</etc/dansguardian/lists/blacklists/gambling/domains>
.Include</etc/dansguardian/lists/blacklists.shalla/gambling/domains>

## Use _both_ blacklists
.Include</etc/dansguardian/lists/blacklists/violence/domains>
.Include</etc/dansguardian/lists/blacklists.shalla/violence/domains>

## Corresponding categories but spelled differently
.Include</etc/dansguardian/lists/blacklists/proxy/domains>
.Include</etc/dansguardian/lists/blacklists.shalla/redirector/domains>

## No corresponding category in shalla
.Include</etc/dansguardian/lists/blacklists/kidstimewasting/domains>

## No corresponding category in urlblacklist
.Include</etc/dansguardian/lists/blacklists.shalla/costtraps/domains>

## Shalla also has subcategories
.Include</etc/dansguardian/lists/blacklists.shalla/hobby/games-online/domains>

Labeling Blacklists

The -CATEGORIES- symbol in the blocked page template is replaced by the content of the #listcategory: "..." tags in the various DansGuardian configuration files.

You can make it very easy for any administrator to find the appropriate configuration file by making these tags for blacklist files simply list the parent directory name (pets, phishing, porn, proxy, etc.). If you're using multiple blacklists also add the name of the blacklist subscription to the tag, for example: #listcategory: "porn (shalla)"
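A sketch of what such a tagging step might look like, run after each download (it assumes the usual layout of one 'domains' and/or 'urls' file per category directory, and takes whatever subscription label you choose):

```shell
# tag_blacklist DIR LABEL
# Prepend '#listcategory: "<category> (<label>)"' to every domains/urls
# file under DIR, using the parent directory name as the category.
# Re-running is harmless: already-tagged files are skipped.
tag_blacklist() {
    dir="$1"; label="$2"
    find "$dir" \( -name domains -o -name urls \) -type f |
    while read -r f; do
        cat=$(basename "$(dirname "$f")")
        grep -q '^#listcategory:' "$f" ||
            sed -i "1i\\#listcategory: \"$cat ($label)\"" "$f"
    done
}
```

For example, `tag_blacklist /etc/dansguardian/lists/blacklists.shalla shalla` would tag the porn category as `#listcategory: "porn (shalla)"`.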

Make the script that downloads each set of fresh blacklists also insert tags into all the files so they'll always be there. In some blacklist download scripts (for example some versions of the urlblacklist update script) FIXME the tagging functionality is already there, so all you need to do is turn it on with a simple switch. What exactly is the name and location of the switch to turn on tagging in the urlblacklist.com script?

OpenDNS and Blacklists

The service provided by OpenDNS can assist or even replace your use of blacklists. One way to think of OpenDNS is as a centralized blacklist service providing near-real-time updates.

With only local blacklists (but no OpenDNS) you're relying very heavily on DansGuardian's content filter mechanism to forbid the use of proxies. This may be sufficient if you update your blacklists frequently and https: traffic goes through DansGuardian too (only possible with “explicit-proxy” style configurations).

OpenDNS applies only to systems referenced by name; it is not invoked at all when systems are referred to directly by IP address. So using OpenDNS doesn't make much sense unless you've applied the blanket block to IP addresses (i.e. you've uncommented *ip and *ips in 'bannedsitelist'), as otherwise it would be too easy to circumvent the OpenDNS restrictions. If your users legitimately access a few specific systems by IP address, you can list those IP addresses in 'exceptionsitelist' (if this isn't convenient, you can even make clever entries in 'exceptionregexpurllist' to allow access to systems by IP address for some particular purpose such as webmail).
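For example, an entry along these lines in that exception regex URL list would allow webmail on one specific internal server reached by IP address (the address and path here are placeholders, and exact matching details vary between DansGuardian versions, so test before relying on it):

```
# 2009-05-01: allow webmail on one internal server reached by IP
^10\.1\.2\.3/webmail/
```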

You can combine using OpenDNS and using local blacklists to improve the depth of your filtering. However to avoid horrid management confusion, customize OpenDNS to forbid only categories that will never be excepted (e.g. proxies, phishing, malware). In this configuration, whenever you add a new site exception you know it will always go in DansGuardian.

You can also use OpenDNS exclusively. Configure DansGuardian to not use any local blacklist at all (just as you would do for whitelist mode). In this case there is no need to customize OpenDNS to forbid only some categories; all you need to do is set the protection level as desired. In this configuration all new site exceptions will always be done through OpenDNS; the DansGuardian mechanism will not be used. (Note though that some have questioned the thoroughness and timeliness of OpenDNS updates, suggesting that relying on OpenDNS exclusively while neglecting constant improvement of content scanning recipes may leave a system vulnerable to circumvention by public proxies.)

In either case (OpenDNS combined with blacklists, or OpenDNS exclusively), you must arrange that all your computers ultimately get their DNS service from the OpenDNS servers, and most importantly that your computers cannot get DNS service from anywhere else. Otherwise users who wish to reach proxy systems can simply reconfigure their computers to use different DNS servers, then proceed to access systems you thought were forbidden.

One way to do this is with DHCP (to point all computers at the desired DNS servers), a local caching DNS service (to centralize your references to OpenDNS), and some iptables rules (to keep end user computers from accessing any DNS servers other than your local caching DNS system). By sending each DNS query over your ISP connection only the first time, a local caching DNS service also improves performance considerably: it frees up a great deal of ISP bandwidth for more web content, and it answers most requests much more quickly since responses come straight out of the local cache without ever entering the cloud.
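The iptables part of that arrangement might be sketched as follows, assuming the local caching resolver runs on the gateway itself (so clients query the gateway, which is the only host allowed to talk to OpenDNS); adapt the rules if your resolver lives elsewhere:

```
# on the gateway: drop forwarded DNS traffic from the LAN, so clients
# cannot bypass the local caching resolver running on this machine
iptables -A FORWARD -p udp --dport 53 -j DROP
iptables -A FORWARD -p tcp --dport 53 -j DROP
```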