[Spambayes] Need some clarification on the training database

Tim Peters tim.one@comcast.net
Mon Nov 4 06:21:45 2002


[Tim@mail.powweb.com]
> Never mind... I figured it out.  Ok, so I have the pop3proxy
> looking at the training database that my smtpproxy creates when I
> send it spam and ham via redirect, which preserves headers in my
> mailer.  Unfortunately, it also adds my headers, so I'm training
> spambayes to recognize mail from myself to myself as spam or ham,
> depending on how many of each i send... ;)

At least that part shouldn't matter much.  The spamprob guesses are based on
the percentages of spam and ham in which a word appears.  If a particular
header clue about you appears in 100% of the forwarded spam, and 100% of the
forwarded ham, it will get spamprob 0.5 regardless of the absolute numbers
involved.  That is, it will be 100% neutral.  This can still distort scores,
but I expect in a very minor way, and by pulling them toward Unsure rather
than toward either side.
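
To make the arithmetic concrete, here is a minimal sketch of a ratio-based
spamprob guess.  It is an illustration only, not the actual spambayes code:
the function name and its parameters are hypothetical, and it leaves out any
adjustment for rarely seen words.  It assumes at least one spam and one ham
have been trained, and that the word appears in at least one message.

```python
def spamprob(spam_with_word, ham_with_word, nspam, nham):
    """Hypothetical helper: guess a word's spam probability from the
    fraction of spam and the fraction of ham in which it appears."""
    spam_ratio = spam_with_word / nspam   # fraction of spam containing the word
    ham_ratio = ham_with_word / nham      # fraction of ham containing the word
    return spam_ratio / (spam_ratio + ham_ratio)

# A header clue that shows up in every forwarded message, with lopsided
# training counts: 300 spam and only 20 ham.
print(spamprob(spam_with_word=300, ham_with_word=20, nspam=300, nham=20))
```

Because both ratios are 1.0, the result is 0.5 no matter how many of each
kind were forwarded.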