[Spambayes] x-hammie-disposition in pop3proxy

Tim Peters tim.one@comcast.net
Sat Nov 2 17:57:39 2002


[Tim@mail.powweb.com]
> Ok, I've got the pop3proxy up and running on my machine.  Very
> simple to get running.

Good!  I haven't had time to try it yet, so I won't be much help, but I'm
glad it ran easily for you.

> I don't have a trained database (the real challenge)

The difficulty of bootstrapping a database is generally overstated, and
especially by those who haven't yet done it <wink>.  Train on everything you
get for a few days.  I predict you'll find it gets most things right after
just a dozen msgs of each kind.  But it will also make howling mistakes
until you've trained on much more than that.  So don't take the
classifications too seriously at the start; even so, it should become very
helpful quickly.

> at this point, and it's adding the x-hammie-disposition header with
> value of 'no'.  I presume that this means that the classifier thinks
> this is NOT ham?

More accurately, that the score didn't exceed the value of spam_cutoff
you've set or, if you haven't set one yet, the default value of

spam_cutoff: 0.90

The relevant code appears to be in pop3proxy BayesProxy.onRetr():

            prob = self.bayes.spamprob(tokenizer.tokenize(message))
            if prob > options.spam_cutoff:
                disposition = "Yes"
            else:
                disposition = "No "
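
To see what that comparison does with concrete numbers, here's a minimal
standalone sketch.  The function name two_way_disposition is hypothetical
(it isn't in pop3proxy), and the scores are made up for illustration; only
the comparison and the 0.90 default come from the text above.

```python
SPAM_CUTOFF = 0.90  # the default value quoted above


def two_way_disposition(prob, spam_cutoff=SPAM_CUTOFF):
    """Hypothetical helper mirroring the comparison in the quoted excerpt."""
    if prob > spam_cutoff:
        return "Yes"
    # Anything at or below the cutoff lands here, however uncertain it is.
    return "No "


# Made-up scores, chosen only to exercise both branches and the boundary.
for score in (0.05, 0.5, 0.90, 0.97):
    print(score, repr(two_way_disposition(score)))
```

Because the comparison is strict, a score of exactly 0.90 still gets "No ",
and so does everything below it.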

> So if there's no database, then it assumes everything is spam?

There's always a database, but at the start it's empty.  An empty database
isn't a special case to the code:  the math simply works out to give a score
of 0.5 to every msg (which makes
sense:  in the absence of any evidence at all, it has no reason to favor any
specific conclusion).  Whatever you set ham_cutoff and spam_cutoff to be,
0.5 should definitely be in your Unsure category.  However, it doesn't look
like pop3proxy is paying attention to ham_cutoff yet, nor is it currently
capable of generating an "I'm lost -- help me!" Unsure disposition.  Someone
needs to teach it about the middle ground.
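
A rough sketch of what that middle ground might look like, as a standalone
function rather than a patch to pop3proxy.  The name three_way_disposition,
the "Unsure" header value, the 0.20 used for ham_cutoff, and the handling of
scores exactly at a cutoff are all assumptions made for illustration; only
spam_cutoff's 0.90 default comes from the text above.

```python
def three_way_disposition(prob, ham_cutoff=0.20, spam_cutoff=0.90):
    """Hypothetical three-way split of a spam score.

    ham_cutoff=0.20 is an illustrative value, not a documented default.
    """
    if not 0.0 <= ham_cutoff <= spam_cutoff <= 1.0:
        raise ValueError("need 0 <= ham_cutoff <= spam_cutoff <= 1")
    if prob > spam_cutoff:
        return "Yes"      # confidently spam
    if prob < ham_cutoff:
        return "No"       # confidently ham
    return "Unsure"       # the middle ground, including the cutoffs themselves


# An untrained database scores every msg at 0.5, which lands in Unsure.
print(three_way_disposition(0.5))
```

With cutoffs like these, the all-0.5 scores from an empty database would be
reported as Unsure instead of being lumped in with "No".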

> Or am I reading the meaning of the header backwards?

No, you're reading it right.