[Spambayes] Tough to classify

David Shaw david at theresistance.net
Sun Apr 13 23:49:21 EDT 2003


> I have no doubt that it was obviously ham to you, but don't accept it would
> have been obvious ham to humans other than you.  For example,
>

You're right, of course.  The mail included this spammy bit:

Need to give a gift? Not sure what to buy? Amazon.com gift
certificates are available in any dollar amount from $5 to $5,000.
We'll deliver it via e-mail or physical mail-- so it's the perfect
last minute gift.
Learn more at http://www.amazon.com/gift-certificates/


> You must have many ham clues, else your *H* score wouldn't have been 0.98.

It had lots of strong clues for both.


> There are many ways to combine the individual word spamprobs so that the msg
> will come out as ham.  The trick is to do so in a way that doesn't also
> classify more spam as ham.  The combination method in spambayes is the end
> result of some intense work on the topic by several people, and beat about a
> dozen other combination methods in large tests.  That doesn't mean it's the
> best possible combination method, but does suggest it won't be trivial to
> do better.

Oh, I know.  I read the math behind the chi-squared code on Gary's page and 
quickly got in over my head.  I took some probability classes in 
college, but it's been a few years.
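
For anyone else in over their head, here is a minimal sketch of how 
chi-squared combining works as I understand it.  It is not a copy of 
classifier.py: the names chi2Q and combine are illustrative, and the 
clue probabilities at the bottom are made up to show what happens when 
a message has strong clues in both directions.

```python
import math

def chi2Q(x2, v):
    """Probability that a chi-squared variable with v degrees of
    freedom exceeds x2.  This closed form is valid only for even v."""
    assert v % 2 == 0
    m = x2 / 2.0
    total = term = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    # Rounding can push the sum a hair over 1.0.
    return min(total, 1.0)

def combine(probs):
    """Combine per-word spam probabilities into (S, H, score).

    S is the spam indicator, H the ham indicator, and score lies
    between 0.0 (ham) and 1.0 (spam).  Probabilities are assumed to
    lie strictly between 0 and 1.
    """
    n = len(probs)
    if n == 0:
        return 0.5, 0.5, 0.5
    ln_not_p = sum(math.log(1.0 - p) for p in probs)
    ln_p = sum(math.log(p) for p in probs)
    S = 1.0 - chi2Q(-2.0 * ln_not_p, 2 * n)
    H = 1.0 - chi2Q(-2.0 * ln_p, 2 * n)
    return S, H, (S - H + 1.0) / 2.0

# Hypothetical clue sets, not taken from any real message.
mixed = [0.99] * 5 + [0.01] * 5   # strong clues for both ham and spam
spammy = [0.99] * 10              # strong spam clues only

S_mixed, H_mixed, score_mixed = combine(mixed)
S_spam, H_spam, score_spam = combine(spammy)
```

With strong clues on both sides, S and H are both close to 1 and 
largely cancel, so the score lands in the middle of the range instead 
of at either end.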

Maybe I just need to adjust my thresholds.  This message scored:

X-Spambayes-Spam-Probability: 0.288224866953

I have my ham threshold at 0.2 and my spam threshold at 0.8.  When a 
message scores as unsure, it is almost always really spam.  This time it 
was ham.  I think I'll set the thresholds to 0.3 and 0.7 and see how that 
goes for a while.
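
To make the effect of that change concrete, here is a small sketch of 
the three-way split.  The classify function and its parameter names are 
hypothetical, and how a score exactly on a cutoff is treated is an 
assumption on my part rather than a statement about the spambayes code.

```python
def classify(score, ham_cutoff, spam_cutoff):
    """Map a combined score to 'ham', 'unsure', or 'spam'.

    Assumption: scores strictly below ham_cutoff are ham, scores
    strictly above spam_cutoff are spam, everything else is unsure.
    """
    if not 0.0 <= ham_cutoff <= spam_cutoff <= 1.0:
        raise ValueError("need 0 <= ham_cutoff <= spam_cutoff <= 1")
    if score < ham_cutoff:
        return "ham"
    if score > spam_cutoff:
        return "spam"
    return "unsure"

score = 0.288224866953  # the X-Spambayes-Spam-Probability value above

current = classify(score, ham_cutoff=0.2, spam_cutoff=0.8)
proposed = classify(score, ham_cutoff=0.3, spam_cutoff=0.7)
```

With the current cutoffs the message is unsure; with 0.3 and 0.7 it 
comes out as ham.  The narrower unsure band also means more borderline 
spam gets a definite label, so it needs watching for a while.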


> The combination code (in classifier.py) is about the easiest part of the
> system to change, so feel encouraged to test alternatives.  "I feel like"
> isn't really testable on its own <wink>.

I love Python for this very reason :)  If only I could figure out that 
Dibbler stuff.  It seems very complicated (and slow, at least on OS X) 
for what it's doing.  I'd love to replace it with something simpler and 
faster.



