[Spambayes] Web interface statistics
skip at pobox.com
Thu May 10 14:17:39 CEST 2007
Dave> There really is something very fishy going on. I actually added
Dave> instrumentation code to watch my training script train particular
Dave> words multiple times as ham or spam, but when I query those words
Dave> using the sb_imapfilter web interface, they always are shown as
Dave> having been trained 0 or 1 times, with one of two corresponding
Dave> probabilities.
Dave> I do a wildcard query with a single letter and returning 1000
Dave> results, and there's not a single number over 1 in the #spam or
Dave> #ham columns.
Dave> What could be going on?
I've no idea. It seems to be working for me. I have lots of singletons(*),
which is to be expected, but also lots of multiples:
% spamcounts -r spam
token,nspam,nham,spam prob
"spam,",2,1,0.5
spam.,2,0,0.908163265306
to:addr:spambayes,3,3,0.390338438268
"spamcop,",1,0,0.844827586207
email name:spambayes-dev,2,0,0.908163265306
to:addr:spambayes-dev,2,0,0.908163265306
spamabyes,0,1,0.155172413793
spamming?,1,0,0.844827586207
email name:spambayes,4,3,0.5
subject:spambayes,2,0,0.908163265306
spam,0,3,0.0652173913043
"spamassasin,",0,1,0.155172413793
message-id:@no.spam.plz,0,1,0.155172413793
sender:addr:spambayes-bounces+skip=pobox.com,0,1,0.155172413793
cc:addr:spambayes,1,0,0.844827586207
from:addr:nospam.org,0,1,0.155172413793
from:addr:no.spam.plz,0,1,0.155172413793
spammer,1,0,0.844827586207
spambayes,0,3,0.0652173913043
subject:spam,1,1,0.5
spammed,2,0,0.908163265306
from:addr:spamgourmet.com,0,1,0.155172413793
spammers,1,1,0.5
to:name:spambayes,1,0,0.844827586207
sender:addr:spambayes-dev-bounces,2,0,0.908163265306
subject:spam.,1,0,0.844827586207
url:spambayes-dev,2,0,0.908163265306
spamming,1,0,0.844827586207
spambayes.,0,1,0.155172413793
sender:addr:spambayes-bounces,4,2,0.5
url:spammer_id,1,0,0.844827586207
url:spambayes,4,3,0.5
anti-spam,0,1,0.155172413793
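
For what it's worth, the probabilities for tokens seen on only one side can be reproduced by hand. The sketch below assumes the classifier's usual Robinson-style adjustment with what I believe are the defaults (unknown_word_strength 0.45, unknown_word_prob 0.5). The function name and the message totals of 100 are made up for illustration. For one-sided tokens the totals cancel out, which is why those rows only ever show a handful of distinct values. The mixed rows depend on the real ham/spam message totals, which aren't shown here.

```python
def token_spamprob(nspam, nham, total_spam, total_ham, s=0.45, x=0.5):
    """Robinson-adjusted spam probability for one token.

    s and x are assumed defaults for the strength and prior of an
    unknown word; total_spam/total_ham are trained message counts.
    """
    n = nspam + nham
    if n == 0:
        return x  # never-seen token falls back to the prior
    spamratio = nspam / total_spam if total_spam else 0.0
    hamratio = nham / total_ham if total_ham else 0.0
    p = spamratio / (spamratio + hamratio)
    # Pull the raw estimate toward x; the pull weakens as n grows.
    return (s * x + n * p) / (s + n)


# Hypothetical totals; they do not affect one-sided tokens.
for nspam, nham in [(1, 0), (2, 0), (0, 1), (0, 3)]:
    print(nspam, nham, round(token_spamprob(nspam, nham, 100, 100), 6))
# 1 0 0.844828
# 2 0 0.908163
# 0 1 0.155172
# 0 3 0.065217
```

So a database where every token has a count of 0 or 1 would show only 0.844827586207 and 0.155172413793 (plus 0.5 for ties), which matches the "one of two corresponding probabilities" symptom.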
Skip
(*) Linguists call such singletons "hapax legomena". I guess they were
trying to be snooty when they came up with that term.