[Spambayes] Need some clarification on the training database
Tim Peters
tim.one@comcast.net
Mon Nov 4 06:21:45 2002
[Tim@mail.powweb.com]
> Never mind... I figured it out. Ok, so I have the pop3proxy
> looking at the training database that my smtpproxy creates when I
> send it spam and ham via redirect, which preserves headers in my
> mailer. Unfortunately, it also adds my headers, so I'm training
> spambayes to recognize mail from myself to myself as spam or ham,
> depending on how many of each i send... ;)
At least that part shouldn't matter much. The spamprob guesses are based on
the percentages of spam and ham in which a word appears. If a particular
header clue about you appears in 100% of the forwarded spam, and 100% of the
forwarded ham, it will get spamprob 0.5 regardless of the absolute numbers
involved. That is, it will be 100% neutral. This can still distort scores,
but I expect only in a very minor way, and by pulling them toward Unsure
rather than toward either side.
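
To make the arithmetic concrete, here is a minimal sketch of the ratio
described above. It is a simplification for illustration, not the actual
Spambayes classifier code: the function name and arguments are hypothetical,
it assumes at least one spam and one ham have been trained, and it leaves out
any adjustment the real classifier applies for words seen only a few times.

```python
def spamprob(spam_with_word, ham_with_word, nspam, nham):
    """Rough spam probability for one word (hypothetical helper).

    Uses the fraction of spam and the fraction of ham containing the
    word, not the raw counts. Assumes nspam > 0 and nham > 0.
    """
    spam_ratio = spam_with_word / nspam   # fraction of spam with the word
    ham_ratio = ham_with_word / nham      # fraction of ham with the word
    return spam_ratio / (spam_ratio + ham_ratio)


# A header clue present in every forwarded message, spam or ham,
# comes out neutral no matter how lopsided the training set is.
print(spamprob(10, 200, 10, 200))
print(spamprob(500, 3, 500, 3))
```

Both calls give 0.5, because each ratio is 1.0 whatever the absolute numbers
of spam and ham are.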