DSDF (Different Strokes for Different Folks)-Common Idiom
DansGuardian is an award-winning web content filtering proxy for Linux, FreeBSD/OpenBSD/NetBSD, Mac OS X, HP-UX, and Solaris that uses a web proxy such as Squid to do all the fetching. It uses multiple methods to filter incoming content (not outgoing content).
DansGuardian looks quite different depending on the level at which you approach it. (That's why you may get different answers to whether or not something is “supported”.)
It's common to interface to DansGuardian at more than one level: mostly at the “moderate” level for convenience, and at the same time a bit at the “low” level for special needs.
At its heart DansGuardian is a “filter engine”. Almost everything (including language and encoding) is configurable. (For further information see Filtering Methods below.)
Fairly often some function is best provided not by “direct” detection in DansGuardian but rather by “indirect” means. (For example one might block access to “over 18 only” YouTube videos by mangling the particular URL that's known to be accessed.) With quite a bit of software such approaches would be classed as “kludges” and would be avoided. But with DansGuardian such approaches are relatively common, and the negative connotation doesn't seem warranted.
The advantage of the low-level interface is that you can do pretty much anything. The disadvantage is that non-trivial configuration and testing effort is frequently required.
(Even though such a low level approach might seem pretty geeky, there's a GUI for it. Among other things, the Webmin DansGuardian Module provides full access to all low level configuration options.)
Commonly used blocks of functionality (implemented as preset configurations) are provided with DansGuardian. They can simply be “enabled” or “disabled”, without any need to understand how they work or to tweak their low-level configuration, just by commenting or uncommenting various “.Include” lines in the configuration files. These blocks of functionality are distributed as:
- “blacklists” of proscribed locations (as maintenance of such lists is so time-consuming the only reasonable approach is to share the effort across multiple user sites)
  - entire websites
  - portions of websites identifiable by URL
- “phraselists” (to eliminate duplicating the time and effort to construct weighted/scored lists of words) (currently [June 2009] most common Romance languages as well as Chinese and Japanese languages are covered)
  - a few words that should be completely banned
  - many words that should contribute to the “weight/score” of each webpage
- a few more always-illegitimate URL patterns
- Always-illegitimate URL patterns (such as any URL containing “xxx”)
- URL modifications (such as forcing Google “safe search”)
- Cookie modifications (such as forcing Bing “safe search”)
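The enable/disable step described above can be sketched as a small script. This is only an illustration: it assumes list files hold lines of the form `.Include<path>` with a leading `#` marking a disabled block, and the paths in `SAMPLE` and the helper name `set_include` are hypothetical.

```python
import re

# Assumed line shape: optional leading '#', then ".Include<path>".
INCLUDE_RE = re.compile(r"^(?P<hash>#\s*)?(?P<body>\.Include<(?P<path>[^>]+)>)\s*$")


def set_include(config_text: str, path_fragment: str, enabled: bool) -> str:
    """Comment or uncomment every .Include line whose path contains path_fragment."""
    out = []
    for line in config_text.splitlines():
        m = INCLUDE_RE.match(line.strip())
        if m and path_fragment in m.group("path"):
            # Rebuild the line with or without the leading '#'.
            line = m.group("body") if enabled else "#" + m.group("body")
        out.append(line)
    return "\n".join(out)


# Hypothetical list-file content, used only for illustration.
SAMPLE = """\
# hypothetical list file
.Include</etc/dansguardian/lists/blacklists/ads/domains>
#.Include</etc/dansguardian/lists/blacklists/gambling/domains>
"""

if __name__ == "__main__":
    print(set_include(SAMPLE, "gambling", enabled=True))
```

Lines that are not `.Include` lines, such as ordinary comments, pass through unchanged, so the sketch can be run repeatedly on the same text.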
One advantage of the moderate-level approach is that new and revised blocks of pre-configured functionality (especially current “phraselists” and also current “blacklists”) can be obtained and installed without affecting at all either the DansGuardian “filter engine” itself or its configuration.
DansGuardian is a True Content Filter in that it actively examines the content of web pages rather than filtering only on the location (URL) that the content is coming from. DansGuardian filters using the following methods:
- Black/White domain/URL list based filtering
- Regex (regular expression) URL blocking
- Regex substitution (can change words/phrases on sites)
- PICS web-site rating filtering
- Anti-virus filtering
- Meta tags filtering
- File extension and file type (MIME) filtering
- Words and phrases on sites
- POST limiting
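Two of the URL-based methods in this list, regular-expression blocking and URL modification, can be illustrated with a short sketch. It does not reproduce DansGuardian's own rule syntax or matching engine; the pattern list, the host `search.example`, the `safe=on` parameter, and the function names are all assumptions made up for the example.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical rule: treat any URL containing "xxx" as always illegitimate.
BANNED_URL_PATTERNS = [re.compile(r"xxx", re.IGNORECASE)]


def is_banned(url: str) -> bool:
    """Return True if any banned pattern occurs anywhere in the URL."""
    return any(p.search(url) for p in BANNED_URL_PATTERNS)


def force_param(url: str, host: str, key: str, value: str) -> str:
    """Force key=value in the query string of URLs on the given host."""
    parts = urlsplit(url)
    if parts.hostname != host:
        return url  # other sites are left untouched
    # Drop any existing value for the key, then append the forced one.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != key]
    query.append((key, value))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    print(is_banned("http://example.com/XXX/page"))
    print(force_param("http://search.example/find?q=cats&safe=off",
                      "search.example", "safe", "on"))
```

Rewriting the request rather than blocking it is the kind of “indirect” approach the page describes: the user still gets a result, just a filtered one.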
The phrase filtering will check for pages that contain profanities and phrases often associated with pornography and other undesirable content. DansGuardian's sophisticated approach of weighted phrase filtering with intelligent scoring is a considerable improvement over plain word and phrase filtering. (Even so, the remaining false positives may suggest DansGuardian is most appropriate for public terminals and children and younger teens.) The POST filtering allows you to block or limit web uploads. The URL and domain filtering is able to handle huge lists and is significantly faster than squidGuard.
The filtering has configurable domain, user and source IP exception lists. SSL tunneling is supported (but of course not content-filtered as DansGuardian isn't privy to the encryption). The configurable logging produces a log in an easy to read format which has the option to only log the text-based pages, thus significantly reducing redundant information such as every image on a page. Almost all parts of DansGuardian are configurable, thus giving the administrator, rather than some third-party company, total control over what is filtered.
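The idea behind weighted phrase filtering can be shown with a minimal sketch. It is not DansGuardian's actual algorithm: the phrases, weights, limit, and the choice to count each phrase at most once per page are assumptions for illustration only.

```python
from typing import Tuple

# Hypothetical weights: positive values raise the page score, negative values lower it.
WEIGHTED_PHRASES = {"casino": 30, "poker": 20, "addiction help": -40}
# Hypothetical phrases that block a page outright, regardless of score.
BANNED_PHRASES = {"forbidden phrase"}
# Assumed threshold at or above which a page is blocked.
SCORE_LIMIT = 50


def evaluate(text: str) -> Tuple[bool, int]:
    """Return (blocked, score) for a page's text."""
    lowered = text.lower()
    if any(phrase in lowered for phrase in BANNED_PHRASES):
        return True, 0  # banned outright; no score is computed
    # Each phrase contributes its weight once, however often it appears.
    score = sum(w for phrase, w in WEIGHTED_PHRASES.items() if phrase in lowered)
    return score >= SCORE_LIMIT, score


if __name__ == "__main__":
    print(evaluate("Casino and poker night"))
    print(evaluate("Poker addiction help line"))
```

The negative weight shows why scoring can beat plain word matching: a page that mentions “poker” in a support context is pulled below the limit instead of being blocked on a single word.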
Versions 2.8.x.x of the software and later support multiple filter configurations to easily provide different filter settings for groups of users without attempting to run “multiple instances”. (“Multiple instances” are neither necessary nor easily possible, at least not with 2.10.x.x and later.) Versions 2.10.x.x include integrated Google search filtering and integrated anti-virus scanning.
Even-numbered version series (2.8.x.x, 2.10.x.x, etc.) are “stable”. They change only very occasionally, as evidenced by the last two parts of their version number (…x.x) being quite low. Development and testing of new features occurs in the odd-numbered version series (2.9.x.x, 2.11.x.x, etc.). When a development/beta version is “completed”, it's rechristened; for example after development in the 2.9.x.x series culminating in 18.104.22.168, the very next version of the same code was called 2.10.