[Spambayes] x-hammie-disposition in pop3proxy

Tim@mail.powweb.com Tim@mail.powweb.com
Sat Nov 2 19:51:02 2002

Ok, so Tim says I'm not reading it backwards, Richie says I am...  I think the x-hammie-disposition header should be ham|spam|unsure rather than 'yes|no|unsure'.
That is much clearer and leaves little room for misinterpretation.  Furthermore, the header itself should be x-spambayes-disposition, because that says
clearly where the header came from.  I can make that change too, if the collective wills it, but if I'm gonna make many changes, it might be reasonable to
bring me up to speed on the cvs checkin thing...
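
To make the proposal concrete, here is a minimal sketch of the three-way header, not the actual pop3proxy code. The function names are hypothetical, the 0.20 ham_cutoff is an assumed value, and the 0.90 spam_cutoff is the default quoted below. The comparison against spam_cutoff follows the quoted onRetr() snippet; the comparison against ham_cutoff is my guess at the mirror image.

```python
import email

# Header name as proposed in this message; pop3proxy currently uses
# X-Hammie-Disposition instead.
HEADER = "X-Spambayes-Disposition"

def disposition(prob, ham_cutoff=0.20, spam_cutoff=0.90):
    """Map a spam probability to 'ham', 'spam' or 'unsure'.

    ham_cutoff=0.20 is an assumed value for illustration only;
    spam_cutoff=0.90 is the default mentioned in the quoted reply.
    """
    if prob > spam_cutoff:
        return "spam"
    if prob < ham_cutoff:
        return "ham"
    return "unsure"

def tag_message(raw_text, prob):
    """Return the message text with the disposition header added."""
    msg = email.message_from_string(raw_text)
    msg[HEADER] = disposition(prob)
    return msg.as_string()
```

With these cutoffs a score of 0.5 lands in "unsure", so a filter rule only needs to match the literal word after the header name.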

- Tim

11/2/2002 11:57:39 AM, Tim Peters <tim.one@comcast.net> wrote:

>> Ok, I've got the pop3proxy up and running on my machine.  Very
>> simple to get running.
>Good!  I haven't had time to try it yet, so I won't be much help, but I'm
>glad it ran easily for you.
>> I don't have a trained database (the real challenge)
>The difficulty of bootstrapping a database is generally overstated, and
>especially by those who haven't yet done it <wink>.  Train on everything you
>get for a few days.  I predict you'll find it gets most things right after
>just a dozen msgs of each kind.  But it will also make howling mistakes
>until you've trained on much more than that.  Even so, don't take the
>classifications too seriously at the start, and it should be very helpful
>> at this point, and it's adding the x-hammie-disposition header with
>> value of 'no'.  I presume that this means that the classifier thinks
>> this is NOT ham?
>More accurately, that the score fell below the value of spam_cutoff you've
>set, and if you didn't set one yet, the default value of
>spam_cutoff: 0.90
>The relevant code appears to be in pop3proxy BayesProxy.onRetr():
>            prob = self.bayes.spamprob(tokenizer.tokenize(message))
>            if prob > options.spam_cutoff:
>                disposition = "Yes"
>            else:
>                disposition = "No "
>> So if there's no database, then it assumes everything is spam?
>There's always a database, but at the start it's empty.  If there are no
>words in the database, that's not a special case to the code, the math
>simply works out to give a score of 0.5 to every msg then (which makes
>sense:  in the absence of any evidence at all, it has no reason to favor any
>specific conclusion).  Whatever you set ham_cutoff and spam_cutoff to be,
>0.5 should definitely be in your Unsure category.  However, it doesn't look
>like pop3proxy is paying attention to ham_cutoff yet, nor is it currently
>capable of generating an "I'm lost -- help me!" Unsure disposition.  Someone
>needs to teach it about the middle ground.
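
The 0.5 result for an empty database can be illustrated with a small sketch. It assumes a Robinson-style scheme: each word's probability is pulled toward a prior of 0.5 according to how often the word has been seen, and the per-word values are then combined with geometric means. The constants and function names are hypothetical, and this is not the classifier's actual code, which may combine evidence differently.

```python
import math

# Assumed illustrative values, not read from any real configuration.
UNKNOWN_WORD_PROB = 0.5       # prior guess for a word with no history
UNKNOWN_WORD_STRENGTH = 0.45  # how much weight that prior carries

def adjusted_word_prob(raw_prob, times_seen):
    """Blend the observed probability with the prior.

    With times_seen == 0 the observed value drops out and the result
    is exactly the prior.
    """
    s, x, n = UNKNOWN_WORD_STRENGTH, UNKNOWN_WORD_PROB, times_seen
    return (s * x + n * raw_prob) / (s + n)

def combined_score(word_probs):
    """Robinson-style combination of per-word probabilities.

    Inputs must lie strictly between 0 and 1, which the adjustment
    above guarantees. Returning 0.5 for an empty list is a choice made
    for this sketch.
    """
    if not word_probs:
        return 0.5
    n = len(word_probs)
    p = 1.0 - math.exp(sum(math.log(1.0 - w) for w in word_probs) / n)
    q = 1.0 - math.exp(sum(math.log(w) for w in word_probs) / n)
    return (1.0 + (p - q) / (p + q)) / 2.0

# Every token in a message is unknown when nothing has been trained.
untrained = [adjusted_word_prob(0.0, 0) for _ in range(3)]
score = combined_score(untrained)
```

Under these assumptions every token contributes 0.5 and the combined score stays at 0.5, which is the "no evidence either way" behaviour described above.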
>> Or am I reading the meaning of the header backwards?
>No, you're reading it right.