[spambayes-dev] subjective assessment of bigrams
Toby Dickenson
tdickenson at devmail.geminidataloggers.co.uk
Wed Jan 7 06:15:16 EST 2004
I've been using bigrams since 2003-12-18, and thought you might be interested in
some subjective feedback. I am using my overnight-train-on-everything regime,
with 14000 hams and 2000 spams.
* My database size grew from 10M to 80M. Overnight training runs went from
5 minutes to 20 minutes.
* A much larger proportion of spams now score 0.99 or over (I filter these
into a folder that I never normally look at). Spams that score 0.98 or lower
I filter into a 'probable spam' folder and check manually every week; I am
seeing a much smaller proportion of messages in this category.
* I have seen a qualitative change in the type of spam that gets classified as
unsure. Most of my unsures used to be very small messages, spams selling
something I might otherwise be interested in, or other ones where 'unsure'
made sense. It had not missed a Nigerian or porn spam for many months...
until I enabled bigrams. With bigrams, a few have scored between 0.50 and
0.55. I tried untraining some of them, then reclassifying with bigrams turned
off; they all scored above 0.90.
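The folder routing described in the second point can be sketched roughly as follows. This is an illustration only, not the actual filter rules: the function name `route`, the `ham_cutoff` parameter, its example value of 0.20, and the "inbox" folder name are all assumptions, since the message does not say where the lower boundary of the 'probable spam' folder sits.

```python
def route(score, ham_cutoff):
    """Pick a folder for a message from its spam score (0.0 to 1.0).

    ham_cutoff is a hypothetical lower bound for 'probable spam';
    the original message does not state one.
    """
    if score >= 0.99:
        return "spam"           # folder that is never normally looked at
    if score >= ham_cutoff:
        return "probable spam"  # folder checked manually every week
    return "inbox"              # assumed destination for everything else


if __name__ == "__main__":
    # 0.52 stands in for the Nigerian/porn spams that scored 0.50-0.55
    # with bigrams enabled; 0.20 is an assumed cutoff, not a reported one.
    for s in (0.995, 0.98, 0.52, 0.05):
        print(s, route(s, ham_cutoff=0.20))
```

Under these assumptions, a spam scoring 0.50 to 0.55 still reaches the weekly-checked folder rather than the inbox, but it no longer reaches the folder that is never looked at.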
I am happy to experiment if anyone has any suggestions.
--
Toby Dickenson