DansGuardian Phraselists
DansGuardian Phraselist Updates
Here you can find the latest development phraselists. Our testing shows that these lists generate fewer false positives and higher catch rates than the existing lists, but beware that, as development lists, they may still generate more false positives for you. Please send feedback if you find that phrases are causing false positives.
October 23rd 2007 Update
You can download the lists here in two formats: Zip File or TarBall.
Key Changes:
* Added new weighted phraselist categories: translation, music
* Small updates to various languages in the pornography phraselists
* Several small improvements to other lists
* Added and reorganized additional bannedregexpurllist entries
Note that some of the changes/lists are ALPHA quality. Some lists are empty, some incomplete - especially the new categories. I'd like feedback and additions if possible, especially for the following:
* peer2peer, news, sports, games and webmail lists
* foreign language lists - do they overblock other languages, e.g. Portuguese blocking Spanish or English pages?
Please direct any comments, criticism and suggestions to phrasemaster@dansguardian.org.
October 8th 2007 Update
You can download the new lists from:
http://contentfilter.futuragts.com/phraselists/phraselistsoct8.zip
http://contentfilter.futuragts.com/phraselists/phraselistsoct8.tar.gz
Key Changes:
* Added new weighted phraselist categories: conspiracy, domainsforsale, idtheft, secretsocieties, selflabelling, travel, upstreamfilter
* Lists now include #listcategory: "listname"
* Small updates to various languages in the pornography phraselists
* Updates and changes to the proxy list
* Many updates to the sports lists
* Many, many small improvements to other lists
* Added additional bannedregexpurllist entries
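As a rough illustration of the #listcategory header mentioned above, the following Python sketch reads the category label from the text of a list file. It assumes the header sits on a line of its own in exactly the form #listcategory: "listname"; the function name and the sample contents are hypothetical and are not part of DansGuardian.

```python
import re

# Matches a header line such as: #listcategory: "listname"
CATEGORY_RE = re.compile(r'^#listcategory:\s*"([^"]*)"')


def list_category(list_text: str):
    """Return the first #listcategory label found in list_text, or None."""
    for line in list_text.splitlines():
        match = CATEGORY_RE.match(line.strip())
        if match:
            return match.group(1)
    return None


# Hypothetical file contents, for illustration only.
sample = '#listcategory: "travel"\n# other comment lines and phrases would follow\n'
print(list_category(sample))
```

A script like this could be used to check that every list file in a download carries a category label before the lists are deployed.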
Note that some of the changes/lists are ALPHA quality. Some lists are empty, some incomplete - especially the new categories. I'd like feedback and additions if possible, especially for the following:
* peer2peer, news, sports, games and webmail lists
* foreign language lists - do they overblock other languages, e.g. Portuguese blocking Spanish or English pages?
* SpanishPornography list
Please direct any comments, criticism and suggestions to phrasemaster@dansguardian.org.
May 31st 2006 Update
You can download the new lists from:
http://contentfilter.futuragts.com/phraselists/phraselistsmay31.zip
http://contentfilter.futuragts.com/phraselists/phraselistsmay31.tar.gz
Key Changes:
***Almost too many to List***
* Added new weighted phraselist categories: forums
* Added new languages for pornography: Malay, Japanese, Chinese, Norwegian, Russian
* Added content to the previously empty SpanishPornography list
* Added phrases to the goodphrases list
* Many changes to the proxy and malware lists
* Many, many improvements to other lists - removed overblocking phrases
* Added additional bannedregexpurllist entries
Note that some of the changes/lists are ALPHA quality. Some lists are empty, some incomplete - especially the new categories. I'd like feedback and additions if possible, especially for the following:
* peer2peer, news, sports, games and webmail lists
* foreign language lists - do they overblock other languages, e.g. Portuguese blocking Spanish or English pages?
* SpanishPornography list
Please direct any comments, criticism and suggestions to phrasemaster@dansguardian.org.
August 10th 2005 Update
You can download the new lists from:
http://contentfilter.futuragts.com/phraselists/phraselistsaug10.zip
http://contentfilter.futuragts.com/phraselists/phraselistsaug10.tar.gz
Key Changes:
* Added Peer2Peer weighted phrases
* Added phrases to the goodphrases list
* Many, many small improvements to lists - removed overblocking phrases
Note that some of the changes/lists are ALPHA quality. Some lists are empty, some incomplete - especially the new categories. I'd like feedback and additions if possible, especially for the following:
* peer2peer, news, sports, games and webmail lists
* foreign language lists - do they overblock other languages, e.g. Portuguese blocking Spanish or English pages?
* Spanish Pornography list
Please direct any comments, criticism and suggestions to phrasemaster@dansguardian.org.
May 31st 2005 Update
You can download the new lists from:
http://contentfilter.futuragts.com/phraselists/phraselistsmay31.zip
http://contentfilter.futuragts.com/phraselists/phraselistsmay31.tar.gz
Key Changes:
* Added goodphrases_danish file - thanks to Frederik Dannemare
* Added porn_danish file - thanks to Frederik Dannemare
* Fixed more overblocking problems especially in the French porn list
Note that some of the changes/lists are ALPHA quality. Some lists are empty, some incomplete - especially the new categories. I'd like feedback and additions if possible, especially for the following:
* news, sports, games and webmail lists
* foreign language lists - do they overblock other languages, e.g. Portuguese blocking Spanish or English pages?
* SpanishPornography list
Please direct any comments, criticism and suggestions to phrasemaster@dansguardian.org.
May 12th 2005 Update
You can download the new lists from:
http://contentfilter.futuragts.com/phraselists/phraselistsmay12.zip
http://contentfilter.futuragts.com/phraselists/phraselistsmay12.tar.gz
Key Changes:
* Removed several duplicate phrases
* Fixed several phrases with no score
* Added goodphrases_portuguese file - thanks to Allan Gomes
* Added badwords_portuguese file - thanks to Allan Gomes
* Moved some phrases from /porn/weighted_portuguese to badwords_portuguese - thanks to Allan Gomes
* Fixed accents on several portuguese files - thanks to Allan Gomes
* Fixed more overblocking problems
Note that some of the changes/lists are ALPHA quality. Some lists are empty, some incomplete - especially the new categories. I'd like feedback and additions if possible, especially for the following:
* news, sports, games and webmail lists
* foreign language lists - do they overblock other languages, e.g. Portuguese blocking Spanish or English pages?
* SpanishPornography list
Please direct any comments, criticism and suggestions to phrasemaster@dansguardian.org.
May 2nd 2005 Update
Here is an update on the status of the weighted phrase lists. Substantial changes have been made and they are ready for some testing. Thank you to those who contributed lists and suggestions.
You can download the new lists from:
http://contentfilter.futuragts.com/phraselists/phraselistsmay2.zip
http://contentfilter.futuragts.com/phraselists/phraselistsmay2.tar.gz
Key Changes:
* Added MANY new lists - see below
* Changed the list file naming structure to weighted_[language] and banned_[language]
* Changed the organization of the weightedphraselist file
* Changed the organization of the bannedphraselist file
* Split content in the exceptionphrase and goodphrases lists to prevent conflict with sports, webmail and news lists
* Added terms to the bannedregexpurllist file for proxies and nudism
* Added notes to the tops of many files for easier reference/tracking when editing
Note that some of the changes/lists are ALPHA quality. Some lists are empty, some incomplete - especially the new categories. I'd like feedback and additions if possible, especially for the following:
* news, sports, games and webmail lists
* foreign language lists - do they overblock other languages, e.g. Portuguese blocking Spanish or English pages?
* Spanish Pornography list
Please direct any comments, criticism and suggestions to phrasemaster@dansguardian.org.
New Lists:
badwords_dutch
badwords_german
badwords_spanish
chat_italian
gambling_portuguese
games
gore_portuguese
illegaldrugs_portuguese
intolerance_portuguese
malware
news
nudism
personals_portuguese
porn_dutch
porn_spanish
porn_portuguese
porn_italian
proxies
sport
violence_portuguese
webmail
weapons_portuguese
** plus several Portuguese banned lists
Examples of words removed from existing lists:
< foto ><foto> in several lists generates too many false positives.
< adult> in Italian generates too many false positives.
<sperm> in Italian generates false positives - should be < sperm> < sperm >?
<chat><10> in Italian generates too many false positives.
< video ><10> in Italian generates too many false positives.
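The entries above turn on where the spaces sit inside the angle brackets. As a rough illustration only, and assuming each phrase is looked up as a plain case-insensitive substring of the page text (an assumption about the matching model, not DansGuardian's actual code), the Python sketch below shows why an unpadded phrase such as <sperm> can hit unrelated words while a padded one such as < sperm > does not. The function name and the sample sentence are hypothetical.

```python
def phrase_hits(phrase: str, text: str) -> int:
    """Count case-insensitive, non-overlapping substring occurrences of phrase in text."""
    return text.lower().count(phrase.lower())


# Hypothetical page text, used only for illustration.
innocent = "Spermaceti is a wax found in the head of some whales."

# Unpadded phrase: also matches inside longer words.
unpadded = phrase_hits("sperm", innocent)

# Padded phrase: only matches when the word stands alone between spaces.
padded = phrase_hits(" sperm ", innocent)

print(unpadded, padded)
```

Under this assumption, padding a phrase with spaces trades some catch rate (the word at the very start or end of the text, or next to punctuation, is missed) for fewer false positives.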
------------------------------------
New May 2nd Weighted Phrase List File
------------------------------------
#
# NOTE: New lists are commented out as ALPHA or BETA depending on how much the
# lists have been tested.
# ALPHA - Brand new - no testing has been done
# BETA - Relatively new - tested in at least one location
#

#Good Phrases (to allow medical, education, news and other good sites)
.Include</etc/dansguardian/phraselists/goodphrases/weighted_general>
.Include</etc/dansguardian/phraselists/goodphrases/weighted_news>

#Pornography
.Include</etc/dansguardian/phraselists/pornography/weighted>
.Include</etc/dansguardian/phraselists/pornography/weighted_dutch> #ALPHA#
.Include</etc/dansguardian/phraselists/pornography/weighted_french>
.Include</etc/dansguardian/phraselists/pornography/weighted_german>
.Include</etc/dansguardian/phraselists/pornography/weighted_italian>
.Include</etc/dansguardian/phraselists/pornography/weighted_portuguese>
.Include</etc/dansguardian/phraselists/pornography/weighted_spanish> #ALPHA#
.Include</etc/dansguardian/phraselists/nudism/weighted>

#Bad Words - swearing
.Include</etc/dansguardian/phraselists/badwords/weighted_dutch> #BETA#
.Include</etc/dansguardian/phraselists/badwords/weighted_french>
.Include</etc/dansguardian/phraselists/badwords/weighted_german> #ALPHA#
.Include</etc/dansguardian/phraselists/badwords/weighted_spanish> #ALPHA#

#Drugs
#.Include</etc/dansguardian/phraselists/drugadvocacy/weighted>
#.Include</etc/dansguardian/phraselists/illegaldrugs/weighted>
#.Include</etc/dansguardian/phraselists/illegaldrugs/weighted_portuguese>
#.Include</etc/dansguardian/phraselists/legaldrugs/weighted>

#Violence and intolerance
#.Include</etc/dansguardian/phraselists/intolerance/weighted>
#.Include</etc/dansguardian/phraselists/intolerance/weighted_portuguese>
#.Include</etc/dansguardian/phraselists/gore/weighted>
#.Include</etc/dansguardian/phraselists/gore/weighted_portuguese>
#.Include</etc/dansguardian/phraselists/violence/weighted>
#.Include</etc/dansguardian/phraselists/violence/weighted_portuguese>
#.Include</etc/dansguardian/phraselists/weapons/weighted>
#.Include</etc/dansguardian/phraselists/weapons/weighted_portuguese>

#Chat
#.Include</etc/dansguardian/phraselists/chat/weighted>
#.Include</etc/dansguardian/phraselists/chat/weighted_italian>

#Webmail
#.Include</etc/dansguardian/phraselists/webmail/weighted> #ALPHA#
#Note that if you enable the webmail weighted list you should also disable
#the "exception_email" list in the exceptionphraselist file.

#Productivity
#.Include</etc/dansguardian/phraselists/gambling/weighted>
#.Include</etc/dansguardian/phraselists/gambling/weighted_portuguese>
.Include</etc/dansguardian/phraselists/warezhacking/weighted>
#.Include</etc/dansguardian/phraselists/personals/weighted>
#.Include</etc/dansguardian/phraselists/personals/weighted_portuguese>
#.Include</etc/dansguardian/phraselists/games/weighted> #ALPHA#
#.Include</etc/dansguardian/phraselists/sport/weighted> #ALPHA#
#.Include</etc/dansguardian/phraselists/news/weighted> #ALPHA#

#System Management and Security
#.Include</etc/dansguardian/phraselists/malware/weighted> #ALPHA#
#.Include</etc/dansguardian/phraselists/proxies/weighted> #BETA#
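To review which lists a weightedphraselist file like the one above actually turns on, a small script can separate the active .Include lines from the commented-out ones. The Python sketch below is one possible approach, not a DansGuardian tool: it assumes one directive per line and that a leading # disables an include, and the function name and sample text are hypothetical.

```python
import re

# Optional leading '#', then .Include<path>; anything after '>' (e.g. #ALPHA#) is ignored.
INCLUDE_RE = re.compile(r"^\s*(#?)\s*\.Include<([^>]+)>")


def split_includes(config_text: str):
    """Return (enabled, disabled) lists of paths taken from .Include lines."""
    enabled, disabled = [], []
    for line in config_text.splitlines():
        match = INCLUDE_RE.match(line)
        if not match:
            continue  # section comment, blank line, or anything else
        commented, path = match.groups()
        (disabled if commented else enabled).append(path)
    return enabled, disabled


# A short excerpt in the same layout as the file above.
sample = """\
#Webmail
#.Include</etc/dansguardian/phraselists/webmail/weighted> #ALPHA#
#Productivity
.Include</etc/dansguardian/phraselists/warezhacking/weighted>
#.Include</etc/dansguardian/phraselists/games/weighted> #ALPHA#
"""

enabled, disabled = split_includes(sample)
print("enabled:", enabled)
print("disabled:", disabled)
```

Running this against a real configuration before and after an update gives a quick list of the ALPHA and BETA lists that are still switched off.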