[Spambayes] RE: Spam Clues: Download It Free. Download Free Console
Video Games. Unlimited MP3's
tameyer at ihug.co.nz
Mon Jan 10 01:31:13 CET 2005
> I wrote to you last week about Spam with Ham tacked on the end.
> I've retrained my database using the "exception" method and
> turned on bigrams.
> This message came in at 2%.
> Thanks in advance for any further ideas you have. SpamBayes
> gets most of my Spam, just not Spam using this new technique.
> 'send' 0.0636887 12 2
> 'long' 0.0768659 6 1
> 'sorry' 0.0918367 2 0
> 'subject:Video' 0.0918367 2 0
> 'take' 0.109281 9 3
> 'let' 0.120729 8 3
These were in the body of the spam, not the tacked on bit. They're quite
strong ham clues for you. Training on a few more spams like this should
change that (assuming that they are in the same sort of format, and assuming
that your ham doesn't look like this).
> 'to:addr:above-the-garage.com' 0.3861 25 50
You've trained on twice as many spam messages as ham messages with this token,
but it still scores as a ham clue. That's not good. This is because of the imbalance
in training (this is one of the main issues that needs to be solved to make
SpamBayes easier to use). For example, if you had trained on 118 ham (the
same number as spam), and none of them had this token in it, then the score
for this token would be 0.67. Similar changes apply to the other tokens in
the clues list.
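
To make the arithmetic concrete, here is a rough sketch of how a per-token score like this can be computed. It is a simplified illustration, not the actual SpamBayes code: the function name is made up, the ham total of 37 is inferred from the figures in this message (118 spam minus the 81 extra ham suggested below), and s=0.45 and x=0.5 are assumed to be the default unknown-word strength and probability.

```python
def token_spam_prob(ham_count, spam_count, nham, nspam, s=0.45, x=0.5):
    """Approximate spam probability for one token (hypothetical helper).

    ham_count / spam_count: trained messages of each kind containing the token.
    nham / nspam: total trained ham / spam messages (both must be > 0).
    s, x: assumed prior strength and prior probability for unseen tokens.
    """
    n = ham_count + spam_count
    if n == 0:
        return x  # token never seen: fall back to the prior
    # Use ratios, not raw counts, so the totals of each class matter.
    ham_ratio = ham_count / nham
    spam_ratio = spam_count / nspam
    prob = spam_ratio / (ham_ratio + spam_ratio)
    # Pull the estimate towards x; the pull is weaker the more often
    # the token has been seen.
    return (s * x + n * prob) / (s + n)


# 'to:addr:above-the-garage.com' was seen in 25 ham and 50 spam.
current = token_spam_prob(25, 50, nham=37, nspam=118)    # inferred totals
balanced = token_spam_prob(25, 50, nham=118, nspam=118)  # after 81 more ham

print(f"{current:.4f}")   # 0.3861
print(f"{balanced:.2f}")  # 0.67
```

Under those assumptions the token counts stay the same in both calls; only the ham total changes, and that alone moves the token from the ham side of 0.5 to the spam side.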
Try grabbing a random selection of 81 ham messages (this will bring the numbers
into balance), training on them, and seeing if that helps.
Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes.
http://www.massey.ac.nz/~tameyer/writing/reply_all.html explains this.