[Spambayes] Tough to classify

David Shaw david at theresistance.net
Sun Apr 13 23:49:21 EDT 2003

> I have no doubt that it was obviously ham to you, but don't accept it would
> have been obvious ham to humans other than you.  For example,

You're right of course.  The mail included this spammie bit:

Need to give a gift? Not sure what to buy? Amazon.com gift
certificates are available in any dollar amount from $5 to $5,000.
We'll deliver it via e-mail or physical mail-- so it's the perfect
last minute gift.
Learn more at http://www.amazon.com/gift-certificates/

> You must have many ham clues, else your *H* score wouldn't have been 0.98.

It had lots of strong clues for both.

> There are many ways to combine the individual word spamprobs so that the msg
> will come out as ham.  The trick is to do so in a way that doesn't also
> classify more spam as ham.  The combination method in spambayes is the end
> result of some intense work on the topic by several people, and beat about a
> dozen other combination methods in large tests.  That doesn't mean it's the
> best possible combination method, but does suggest it won't be trivial to
> do better.

Oh, I know.  I read the math in the chi-squared code on Gary's page and
quickly got in over my head.  I took some probability classes in college,
but it's been a few years.
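For anyone else who got lost in the math: below is a minimal sketch of chi-squared (Fisher) combining as I understand the classifier.py approach, not the spambayes source itself.  The function names `chi2Q` and `chi2_combine` are illustrative, and it assumes every per-word probability is strictly between 0 and 1.  The idea: if the word probabilities were random (uniform), -2 times the sum of their logs would follow a chi-squared distribution with 2n degrees of freedom, so a very large value is evidence against randomness.  Doing that once on (1 - p) gives a spamminess S and once on p gives a hamminess H (the *H* score quoted above); the final score is (S - H + 1) / 2.

```python
import math


def chi2Q(x2: float, v: int) -> float:
    """P(chi-squared with v degrees of freedom >= x2), for even v only."""
    if v % 2:
        raise ValueError("v must be even")
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    # Closed form for even v: exp(-m) * sum_{i=0}^{v/2 - 1} m**i / i!
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def chi2_combine(probs: list[float]) -> tuple[float, float, float]:
    """Combine per-word spam probabilities (assumed strictly in (0, 1)).

    Returns (score, S, H): S measures spamminess, H hamminess.
    """
    n = len(probs)
    if n == 0:
        return 0.5, 0.0, 0.0  # no evidence either way
    # If the q's were independent uniform(0, 1) values, -2 * sum(ln q)
    # would be chi-squared distributed with 2n degrees of freedom.
    S = 1.0 - chi2Q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    H = 1.0 - chi2Q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (S - H + 1.0) / 2.0, S, H


# Ten strong spam clues plus ten strong ham clues: S and H both come out
# near 1, so the combined score lands near 0.5, i.e. squarely "unsure".
score, S, H = chi2_combine([0.99] * 10 + [0.01] * 10)
print(score, S, H)
```

This shows why a message with "lots of strong clues for both" gets pushed toward the middle: the method reports a conflict rather than letting one side win.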

Maybe I just need to adjust my thresholds.  This message scored:

X-Spambayes-Spam-Probability: 0.288224866953

I have my ham threshold at .2 and my spam threshold at .8.  Almost always,
when a message is rated unsure, it really is spam.  This time it was ham.
I think maybe I just need to set the thresholds to .3 and .7 and see how
that goes for a while.
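A minimal sketch of the three-way decision those thresholds drive.  The parameter names `ham_cutoff` and `spam_cutoff` are used here for illustration, and the exact boundary handling (below the ham cutoff is ham, at or above the spam cutoff is spam, everything else unsure) is an assumption rather than a quote of the spambayes code.

```python
def classify(score: float, ham_cutoff: float = 0.2, spam_cutoff: float = 0.8) -> str:
    """Map a combined spam probability to 'ham', 'unsure', or 'spam'.

    Assumed rule: score < ham_cutoff -> ham, score >= spam_cutoff -> spam,
    anything in between -> unsure.
    """
    if not 0.0 <= ham_cutoff <= spam_cutoff <= 1.0:
        raise ValueError("need 0 <= ham_cutoff <= spam_cutoff <= 1")
    if score < ham_cutoff:
        return "ham"
    if score >= spam_cutoff:
        return "spam"
    return "unsure"


score = 0.288224866953  # the X-Spambayes-Spam-Probability reported above
print(classify(score))            # .2/.8 cutoffs -> unsure
print(classify(score, 0.3, 0.7))  # .3/.7 cutoffs -> ham
```

The trade-off is worth keeping in mind: if unsures are usually spam, raising the ham cutoff to .3 also files any spam scoring between .2 and .3 as ham, which is exactly the kind of damage the combining method tries to avoid.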

> The combination code (in classifier.py) is about the easiest part of the
> system to change, so feel encouraged to test alternatives.  "I feel like"
> isn't really testable on its own <wink>.

I love Python for this very reason :)  If only I could figure out that
dibbler stuff -- it seems very complicated (and slow, at least on OS X)
for what it's doing.  I'd love to replace it with something simpler and
