DansGuardian Documentation Wiki


Allowing a Web Site

You can't get there from here.
Old Yankee Saying

Getting There

I want to get to website foo.com. In some cases I want to go even further, skipping “content filtering” for that website so I always get there without ever seeing “access denied”. How do I do that?

The procedures are the same for both of these:

  • overriding bans that DansGuardian might otherwise make
  • adding a “whitelist” entry if you have DansGuardian's “Blanket Block” enabled

You can think of this task as involving two parts: “how” and “what”, which are described in the two main sections below.

(Note, though, that these procedures probably will not solve problems that do not result in “access denied”. If users should always be able to access a particular website, but there's a problem other than “access denied” [such as a blank page], it may be best to simply arrange for traffic to that website to never enter DansGuardian in the first place.)

How To Tell DansGuardian

Add entries to DansGuardian configuration file “exceptionsitelist” if you always want to be able to go there. If on the other hand you want “content filtering” to still be enforced even though other checks are skipped, add entries to DansGuardian configuration file “greysitelist”.

Depending on your local administrative procedures, you may either edit the file directly with your favorite *nix text editor or edit it via a GUI administration tool such as Webmin.

Entries should be just a name: no slashes, no protocol (such as “http”), no “www”, no colon, no username or password or at sign, no port number. Entries for DansGuardian should never start with a period (other software, perhaps even Squid, may be different). Use a specific host name rather than the generic domain name if you only want to allow that one host (and if its name is not the mythical “www”).
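
As a rough illustration of the entry rules above, the following shell function reduces a pasted URL to the bare name form. The function name to_sitelist_entry is made up for this sketch and is not part of DansGuardian, and it only handles the common cases listed above.

```shell
# Hypothetical helper, not part of DansGuardian: reduce a URL to a bare
# site-list entry (no protocol, user/password, port, path, or leading "www.").
to_sitelist_entry() (
    entry=$1
    entry=${entry#*://}    # drop the protocol, e.g. "http://"
    entry=${entry%%/*}     # drop the first slash and everything after it
    entry=${entry##*@}     # drop "username:password@"
    entry=${entry%%:*}     # drop ":port"
    entry=${entry#www.}    # drop a leading "www."
    entry=${entry#.}       # entries must not start with a period
    printf '%s\n' "$entry"
)
```

For example, to_sitelist_entry 'http://www.foo.com:8080/index.html' prints foo.com, while a specific host name such as sc3.securelogin.com passes through unchanged.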

Use a sharp character ('#') to add a comment. Record the date, your name, and the reason for the exception. Here are some example entries:

foxyproxy.net           # 5/6/2008 Jim filter bypass
sc3.securelogin.com     # 8/22/2008 Sally webmail for all
liveconnect.com         # 3/18/2008 Joe Jane-approved chat
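
If you edit the file by hand often, a small helper can keep the comment format consistent. The sketch below is only an assumption about how you might do it: the function name add_entry is hypothetical, the file path is whatever you pass in, and the date comes out zero-padded (05/06/2008 rather than 5/6/2008). It skips the append when the name is already listed on an uncommented line.

```shell
# Hypothetical helper: add_entry FILE SITE WHO REASON
# Appends "SITE   # date WHO REASON" unless SITE is already an active entry.
add_entry() {
    file=$1 site=$2 who=$3 reason=$4
    # An entry counts as present when SITE is the first field of a line
    # once any trailing comment has been stripped.
    if [ -f "$file" ] &&
       awk -v s="$site" '{ sub(/#.*/, "") } $1 == s { found = 1 }
                         END { exit !found }' "$file"; then
        echo "already listed: $site" >&2
        return 0
    fi
    printf '%-23s # %s %s %s\n' "$site" "$(date +%m/%d/%Y)" "$who" "$reason" >> "$file"
}
```

A call might look like add_entry exceptionsitelist foxyproxy.net Jim "filter bypass". DansGuardian still has to reread its configuration afterwards.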

When you're done editing the file, force DansGuardian to reread its configuration by issuing the command

prompt# dansguardian -g

(The Webmin GUI will do this for you automatically.)

(Here's a more thorough description of the syntax of entries in the ...sitelist files.)

What To Tell DansGuardian

These days few websites are still monolithic; most of them refer to other websites. Some such referrals are clearly within the same domain name, so just one instruction works for all the hosts in the whole domain including those referrals. Many other referrals though are not clearly within the same domain name.

Some referrals are obscured so it's too hard to figure out on the fly which domain they're in. Usually these referrals call out hosts by IP address. They will be a problem if you have DansGuardian's “Blanket IP Block” enabled. You can add IP addresses to “exceptionsitelist” or “greysitelist” just the same way you add domain names and host names, so the DansGuardian configuration file might look like this:

domainname.com          # 8/8/2008 Monica fresh artichokes
64.28.95.76             # 9/9/2008 Jack grapefruit
hostname.foobar.net     # 10/10/2008 Shereen eggplant

Other referrals include bits and pieces of functionality stored on other websites. They may bring in a JavaScript library, a CSS/Style sheet, advertisements, stock images, bits of functionality such as a secure login, outsourced auxiliaries such as hit counters, tracking to calculate payments due a third party, and so forth. In most cases you will need to except these as well as the base domain name in order for the website to function correctly. The main problem is to find out what these subsidiary sites are.

Sometimes un-banning the first round of subsidiary websites will expose another layer of subsidiary websites. So you may need to perform more than one round of the process.
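
When you go through more than one round, it helps to see which of the newly denied hosts are not yet covered by your list. The following is a minimal sketch under some assumptions: not_yet_listed is a hypothetical name, the second file holds one host name or IP address per line (however you collected them), and a listed domain is taken to cover every host inside that domain.

```shell
# Hypothetical helper: not_yet_listed SITELIST HOSTS
# Prints the hosts from HOSTS that no entry in SITELIST covers.
not_yet_listed() {
    awk '
        FILENAME == ARGV[1] {            # first file: the site list
            sub(/#.*/, "")               # ignore comments
            if (NF) listed[$1] = 1
            next
        }
        NF {                             # second file: one host per line
            covered = 0
            for (entry in listed) {
                tail = "." entry         # "ads.foo.com" ends in ".foo.com"
                if ($1 == entry ||
                    (length($1) > length(tail) &&
                     substr($1, length($1) - length(tail) + 1) == tail))
                    covered = 1
            }
            if (!covered) print $1
        }' "$1" "$2"
}
```

Whatever this prints is the candidate list for the next round of entries.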

Often a subsidiary website is referenced by quite a few different base websites. So excepting it just once will improve the interaction of several websites. Do be careful not to create duplicates in “exceptionsitelist” or “greysitelist”; they will work, but they may cause you enormous maintenance headaches later.
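
One way to spot duplicates before they pile up is to list every name that appears more than once. This is only a sketch: duplicate_entries is a hypothetical name, and it assumes the name is the first field of each uncommented line.

```shell
# Hypothetical helper: duplicate_entries FILE...
# Prints each entry that occurs more than once across the given files.
duplicate_entries() {
    awk '{ sub(/#.*/, "") } NF { print $1 }' "$@" | sort | uniq -d
}
```

Run it against “exceptionsitelist” or “greysitelist”; no output means no duplicates were found.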

Figuring Out The Subsidiaries: Isolating Activity

In order to un-ban a website, you quite often need to also un-ban some other host names or domain names or IP addresses that aren't clearly related to the base website. But how do you figure out what these other host names or domain names or IP addresses are?

  • Searching the DansGuardian documentation won't help.
  • Searching the web probably won't help.
  • Searching the DansGuardian email list archives usually won't help.
  • Asking for help on the DansGuardian email list often won't help.

The only reasonable way to find out what the subsidiaries currently are is to look in your DansGuardian log! Everything else is just shooting in the dark.

Find all the DENIED entries related to an attempt to visit the website of interest. These are the subsidiaries that must be un-banned along with the base website name. The logfile is named something like /usr/local/var/log/dansguardian/access.log, and is accessible directly from the system where DansGuardian runs.
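
Once you have the relevant DENIED records in a file, something like the following can boil them down to a list of host names. It is a sketch under stated assumptions: denied_hosts is a hypothetical name, denied records are assumed to contain the text DENIED, and each record is assumed to contain the requested URL somewhere in the line. The exact log layout depends on your configuration, so check the output against the raw records.

```shell
# Hypothetical helper: denied_hosts LOGFILE
# Prints "count host" for the first URL found in each DENIED record.
denied_hosts() {
    grep 'DENIED' "$1" |
    awk 'match($0, "https?://[^/ \t:]+") {
             host = substr($0, RSTART, RLENGTH)
             sub("^https?://", "", host)     # keep only the host part
             print host
         }' |
    sort | uniq -c | sort -rn
}
```

The hosts that show up here, apart from the base website itself, are the candidate subsidiaries.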

Pretend to be a regular user and attempt to access the website of interest, then look at the log. Your log file will probably be quite large and rather confusing; you will probably need to isolate just the part related to one attempt to access the website of interest. Use any one of the methods below to isolate just the relevant lines.

If some internal parts of the page are already in the browser's “cache”, requests for those parts won't go over the network and so won't appear in the DansGuardian logs, and the logs will be hopelessly confusing. To forestall such problems, either do a “maximal refresh” (for example CTRL-Refresh in both IE and Firefox) or flush the browser's cache entirely immediately before attempting to access the website of interest.

(Note that just searching through your logs for the domain name that's in your browser's address bar will not work! What appears in your browser as just one page will appear in your logs as many separate lines [one for each internal part], and many of those lines won't contain the base domain name anywhere.)

Offhours Activity Isolation

Often the best way to clearly identify and separate the activity you (pretending to be a user) generate is to make your attempt offhours, when there's no other activity.

First find out how many lines long the log is:

cd /var/log/dansguardian
wc -l access.log

Say the output shows the log was 213656 lines long before your activity.

Now pretend to be a user and attempt to access the website. Proceed as far as you can before DansGuardian bans your activity.

Go back to the server where DansGuardian runs and extract the new portion of the log, starting with the first line after the old end (213657 in this example):

sed -n '213657,$ p' access.log > extract2
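
The same two steps can be wrapped up so you don't have to copy the line count by hand. This is a minimal sketch; count_lines and lines_after are hypothetical names, and the log path is whatever you pass in.

```shell
# Hypothetical helpers for offhours isolation.
# count_lines LOG        -> number of lines currently in LOG
# lines_after LOG COUNT  -> only the lines added after line COUNT
count_lines() { wc -l < "$1" | tr -d ' '; }
lines_after() { tail -n "+$(( $2 + 1 ))" "$1"; }
```

You would run before=$(count_lines access.log) first, make your test visit, then run lines_after access.log "$before" > extract2.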

Finally, if you're looking for subsidiary sites (but not if you're extracting relevant log entries so you can post them), you can extract just the records where DansGuardian banned something:

grep 'DENIED' extract2 > extract3

Computer/Time Activity Isolation

Another way to extract only the relevant portion of the log is to find all the activity your computer generated around the time of day when you were pretending to be a user. (You can only use this method if your DansGuardian is configured with anonymizelogs=off in dansguardian.conf, so you may wish to temporarily change your configuration.)

First use the timestamps on the records in the log to extract a portion that starts a little before your attempt and ends a little after your attempt. One way to do this is with the *nix text processing tools `grep` and `sed`. Suppose you attempted to access the website of interest about 3:10 pm (within the range 15:05:..-15:15:..):

cd /var/log/dansguardian
grep -n ' 15:05:' access.log

grep -n prefixes each matching record with its line number. Suppose the first matching record is number 12431.

grep -n ' 15:15:' access.log

Suppose the last matching record is number 12583.

sed -n '12431,12583 p' access.log > extract1
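
If you'd rather not read the record numbers off the screen, the lookup and the extraction can be combined. The sketch below assumes only what the commands above assume, namely that the time of day appears in each record preceded by a space; extract_window is a hypothetical name.

```shell
# Hypothetical helper: extract_window LOG START END
# Prints the records from the first match of START to the last match of END.
extract_window() {
    log=$1
    first=$(grep -n -- "$2" "$log" | head -n 1 | cut -d: -f1)
    last=$(grep -n -- "$3" "$log" | tail -n 1 | cut -d: -f1)
    if [ -z "$first" ] || [ -z "$last" ]; then
        echo "no record matches one of the timestamps" >&2
        return 1
    fi
    sed -n "${first},${last}p" "$log"
}
```

For the example above, the call would be extract_window access.log ' 15:05:' ' 15:15:' > extract1. If nothing was logged during one of those minutes, pick a neighboring minute instead.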

Now use the IP address of the computer you were using to further extract just the records your attempt generated. Suppose for example when pretending to be a user you were using the workstation with the IP address 172.16.1.183 (the -F option makes grep treat the dots literally rather than as wildcards):

grep -F '172.16.1.183' extract1 > extract2

(Remember, IP addresses [other than the placeholder 0.0.0.0] will only be present in the log if your DansGuardian is configured with anonymizelogs=off in dansguardian.conf.)

Finally, if you're looking for subsidiary sites (but not if you're extracting relevant log entries so you can post them), you can extract just the records where DansGuardian banned something:

grep 'DENIED' extract2 > extract3

User/Time Activity Isolation

A slight variation is to find all the activity your username (rather than your computer's IP address) generated around a certain time of day. (Again, you can only use this method if your DansGuardian is configured with anonymizelogs=off in dansguardian.conf, so you may wish to temporarily change your configuration.)

First use the timestamps on the records in the log to extract a portion that starts a little before your attempt and ends a little after your attempt. One way to do this is with the *nix text processing tools `grep` and `sed`. Suppose you attempted to access the website of interest about 3:10 pm (within the range 15:05:..-15:15:..):

cd /var/log/dansguardian
grep -n ' 15:05:' access.log

Suppose the first matching record is number 12431.

grep -n ' 15:15:' access.log

Suppose the last matching record is number 12583.

sed -n '12431,12583 p' access.log > extract1

Now use your username to further extract just the records your attempt generated. Suppose you were pretending to be user 'jack':

grep 'jack' extract1 > extract2

(Remember, usernames will only be present in the log if your DansGuardian is configured with anonymizelogs=off in dansguardian.conf.)

Finally, if you're looking for subsidiary sites (but not if you're extracting relevant log entries so you can post them), you can extract just the records where DansGuardian banned something:

grep 'DENIED' extract2 > extract3

Group/Time Activity Isolation

Another slight variation is to find all the activity for your test filtergroup (rather than your username or your computer's IP address) generated around a certain time of day. (This method has the advantage that it will work even if your DansGuardian is configured with anonymizelogs=on in dansguardian.conf.)

First use the timestamps on the records in the log to extract a portion that starts a little before your attempt and ends a little after your attempt. One way to do this is with the *nix text processing tools `grep` and `sed`. Suppose you attempted to access the website of interest about 3:10 pm (within the range 15:05:..-15:15:..):

cd /var/log/dansguardian
grep -n ' 15:05:' access.log

Suppose the first matching record is number 12431.

grep -n ' 15:15:' access.log

Suppose the last matching record is number 12583.

sed -n '12431,12583 p' access.log > extract1

Now use the name of your test filtergroup (groupname="…" in dansguardianfN.conf) to further extract just the records your attempt generated. Suppose your test filtergroup is named 'usertest':

grep 'usertest' extract1 > extract2

(If you've not yet specified a non-blank groupname in dansguardianfN.conf, the filter group will only be identified by its number in the log, making this extraction difficult, perhaps too difficult.)

Finally, if you're looking for subsidiary sites (but not if you're extracting relevant log entries so you can post them), you can extract just the records where DansGuardian banned something:

grep 'DENIED' extract2 > extract3

Figuring Out The Subsidiaries: Using Anonymized Logs

If you've configured anonymizelogs=on in dansguardian.conf, extracting relevant records for troubleshooting can be more challenging. Use any one of these techniques:

  1. Do your tests when there's no other web use activity originating from your network. In other words, use the “Offhours Activity Isolation” method.
  2. In a “multiple filter groups” installation, add an additional filtergroup with nobody in it except your test user. Then you can sift through the log looking for that filtergroup and find all the relevant entries.
  3. Temporarily set anonymizelogs=off, restart DansGuardian, do your troubleshooting, then restore anonymizelogs=on. (Note this will sometimes be unacceptable, in which case you will be forced to use one of the other techniques.)
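
For the third technique, a small helper makes it harder to forget the restore step. This is a sketch under assumptions: set_anonymizelogs is a hypothetical name, the setting is assumed to sit on a line of its own in dansguardian.conf, and the path to that file is whatever you pass in. The previous version of the file is kept with a .bak suffix.

```shell
# Hypothetical helper: set_anonymizelogs CONF on|off
# Rewrites the anonymizelogs line in CONF, keeping its original spacing.
set_anonymizelogs() {
    conf=$1 value=$2
    case $value in
        on|off) ;;
        *) echo "value must be on or off" >&2; return 1 ;;
    esac
    cp "$conf" "$conf.bak" || return 1
    sed "s/^\([[:space:]]*anonymizelogs[[:space:]]*=[[:space:]]*\).*/\1$value/" \
        "$conf.bak" > "$conf"
}
```

You would call it with off, restart DansGuardian by your usual method, do your troubleshooting, then call it with on and restart again.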