[Spambayes] Tough to classify
david at theresistance.net
Sun Apr 13 23:49:21 EDT 2003
> I have no doubt that it was obviously ham to you, but don't accept it
> have been obvious ham to humans other than you. For example,
You're right, of course. The mail included this spammy bit:
Need to give a gift? Not sure what to buy? Amazon.com gift
certificates are available in any dollar amount from $5 to $5,000.
We'll deliver it via e-mail or physical mail-- so it's the perfect
last minute gift.
Learn more at http://www.amazon.com/gift-certificates/
> You must have many ham clues, else your *H* score wouldn't have been
It had lots of strong clues for both.
> There are many ways to combine the individual word spamprobs so that
> the msg will come out as ham. The trick is to do so in a way that
> doesn't also classify more spam as ham. The combination method in
> spambayes is the result of some intense work on the topic by several
> people, and beat about a dozen other combination methods in large
> tests. That doesn't mean it's the best possible combination method,
> but does suggest it won't be trivial to do better.
Oh, I know. I read the math in the chi-squared code on Gary's page and
quickly got in over my head. I took some probability math classes in
college, but it's been a few years.
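
For anyone else in the same spot, the chi-squared idea fits in a few
lines. This is only a rough sketch of Fisher-style chi-squared combining
as I understand it, not the actual code in classifier.py. The names
chi2q and combine are made up for illustration, and it assumes every
word spamprob is strictly between 0 and 1.

```python
import math

def chi2q(x2, v):
    """Chance that a chi-squared variable with v degrees of freedom
    (v must be even) comes out larger than x2."""
    assert v % 2 == 0
    m = x2 / 2.0
    total = term = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    # Rounding can push the sum a hair above 1.0.
    return min(total, 1.0)

def combine(probs):
    """Combine per-word spamprobs into one score between 0 and 1."""
    n = len(probs)
    if n == 0:
        return 0.5  # no clues at all: completely unsure
    # s gets close to 1 when the clues look spammy, h when they look hammy.
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    h = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (s - h + 1.0) / 2.0

mostly_spam = combine([0.99] * 10)
mostly_ham = combine([0.01] * 10)
both = combine([0.99] * 10 + [0.01] * 10)
```

With strong clues on both sides, s and h both end up near 1 and cancel
out, so the score lands near 0.5 instead of being pushed to one end.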
Maybe I just need to adjust my thresholds. This message scored:
I have my ham threshold at .2 and my spam at .8. Almost always when a
message is unsure it is really spam. This time it was ham. I think
maybe I just need to set the thresholds to .3 and .7 and see how that
goes for a while.
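
The effect of moving the thresholds can be sketched like this. It is a
toy illustration, not spambayes code: the function name classify and
the handling of scores that sit exactly on a cutoff are my own
assumptions.

```python
def classify(score, ham_cutoff=0.2, spam_cutoff=0.8):
    """Map a combined score in [0, 1] to 'ham', 'unsure' or 'spam'.

    Assumption: scores exactly on a cutoff count as unsure.
    """
    if score < ham_cutoff:
        return "ham"
    if score > spam_cutoff:
        return "spam"
    return "unsure"

# The same borderline score under the old and the proposed thresholds.
old = classify(0.25, ham_cutoff=0.2, spam_cutoff=0.8)
new = classify(0.25, ham_cutoff=0.3, spam_cutoff=0.7)
```

Narrowing the unsure band from .2/.8 to .3/.7 means fewer messages to
review by hand, at the cost of a higher chance that a borderline message
gets filed on the wrong side.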
> The combination code (in classifier.py) is about the easiest part of
> system to change, so feel encouraged to test alternatives. "I feel
> isn't really testable on its own <wink>.
I love Python for this very reason :) If only I could figure out that
Dibbler stuff -- it seems very complicated (and slow, at least on OS X)
for what it's doing. I'd love to replace it with something simpler and