[Spambayes] filter misclassification

Tony Meyer tameyer at ihug.co.nz
Mon May 9 08:57:45 CEST 2005


> The clues were obtained using
> 
> /usr/local/bin/showclues.py -d .hammiedb specimen_file > out_filename
> 
> But before doing this I retrained my database .hammiedb using
> 
> sb_mboxtrain.py -d .hammiedb -g mail/ham
>                       -s mail/spam > /dev/null 2>&1
> 
> 
> to get the revised contents of out_filename as:
> 
> **************************************************************
> Combined Score: 6% (0.0565183)
> **************************************************************
> Internal ham score (*H*): 0.997835
> Internal spam score (*S*): 0.110872

I'm not sure what the problem is here.  Your original mail said that you
needed the message to be scored as ham, and that's what's happening here
(0.06 is pretty close to 0).  Is this message actually spam?
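
For what it's worth, the combined score in your output is consistent with
(S - H + 1) / 2 applied to the two internal scores, which is how the
chi-squared combining is usually described.  Here is a minimal sketch of
that arithmetic.  It assumes that formula, and it assumes ham/spam cutoffs
of 0.20 and 0.90 (check your own configuration).  The function names are
made up for illustration and are not part of SpamBayes.

```python
# Assumed cutoffs; your own configuration may use different values.
HAM_CUTOFF = 0.20
SPAM_CUTOFF = 0.90


def combined_score(ham_score, spam_score):
    """Fold the internal *H* and *S* scores into a single 0..1 score.

    Assumes the (S - H + 1) / 2 form; hypothetical helper for illustration.
    """
    return (spam_score - ham_score + 1.0) / 2.0


def verdict(score, ham_cutoff=HAM_CUTOFF, spam_cutoff=SPAM_CUTOFF):
    """Map a combined score onto ham / unsure / spam."""
    if score < ham_cutoff:
        return "ham"
    if score >= spam_cutoff:
        return "spam"
    return "unsure"


# Internal scores taken from the quoted showclues output.
score = combined_score(ham_score=0.997835, spam_score=0.110872)
print(f"{score:.2%} -> {verdict(score)}")
```

With the numbers quoted above this lands well below the assumed ham cutoff,
so the message would be filed as ham.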

> # ham trained on: 771
> # spam trained on: 56

Note that this is a considerable imbalance (about 14 ham for every spam),
and we generally recommend that the database is kept approximately
balanced.  See
<http://entrian.com/sbwiki/TrainingIdeas> for more information.

=Tony.Meyer

-- 
Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes.
http://www.massey.ac.nz/~tameyer/writing/reply_all.html explains this. 


