[Spambayes] Question on "Save and shutdown" button
tim.peters at gmail.com
Wed Jul 21 06:55:25 CEST 2004
[samnicholls at appintec.com]
> COMMENT #2 -- The discipline of having to train has made me realize that, in
> my case at least, there is a third category of email, that which is not ham, yet it
> does not fall into the pure definition of spam either.
SpamBayes has no built-in definitions of ham or spam, of course -- they're
whatever you train it on as ham and as spam. You can use any definitions you
like. For example, I used to get a particular kind of "joke of the
day" spam, hawking everything from "male enhancement" products to
human growth hormone, but I liked the jokes. I trained on those as
ham. SB didn't report me to the authorities <wink> -- and it didn't
start believing that all ads for HGH were ham either.
> Examples of this are newsletters from organizations with which I have a loose
> association, and advertisements from companies who have apparently
> purchased a legitimate (?) email list of those who are in my rather narrow vertical
> industry. Yet for accurate training it is obviously so important to be consistent
> when categorizing the "unsures", and what might be undesirable in the "ham".
> When an email like this does make it through to my Inbox, I'm in a quandary as
> to how to categorize it.
You don't *have* to categorize it. SB should be a helpful servant,
not a dictator. I get a significant amount of email in this grey area
too, and I'm happy to delete it untrained, leaving it as Unsure. IOW,
if I have to think twice about whether a given msg "is ham" or "is
spam", then it is in fact Unsure even to me -- so be it.
I've tried consistently training on such things as "ham" or "spam",
but it doesn't do any good: because I change *my* mind about it from
day to day (highly correlated with how busy I am on different days!),
I end up training the same kinds of things into both categories, and
then they end up scoring Unsure anyway.
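
To see why training the same kind of msg both ways leaves it Unsure, here
is a minimal toy sketch in Python. It is not SpamBayes' actual scoring
code (SB combines per-token evidence in a more sophisticated way); the
ToyClassifier class, the smoothing constants, the plain averaging, and
the 0.2/0.9 cutoffs are all assumptions made for illustration only.

```python
from collections import Counter


class ToyClassifier:
    """Hypothetical per-token spam/ham counter, NOT the SpamBayes classifier."""

    def __init__(self):
        self.ham = Counter()   # token -> number of ham msgs containing it
        self.spam = Counter()  # token -> number of spam msgs containing it

    def train(self, tokens, is_spam):
        # Count each distinct token once per trained message.
        target = self.spam if is_spam else self.ham
        target.update(set(tokens))

    def token_prob(self, token):
        # Smoothed estimate that a msg containing this token is spam.
        # An unseen token gets 0.5 (no evidence either way).
        s = self.spam[token]
        h = self.ham[token]
        return (s + 0.5) / (s + h + 1.0)

    def score(self, tokens):
        # Toy combining rule: average the per-token probabilities.
        probs = [self.token_prob(t) for t in set(tokens)]
        return sum(probs) / len(probs) if probs else 0.5


def label(score, ham_cutoff=0.2, spam_cutoff=0.9):
    # Illustrative cutoffs: everything strictly between them is "unsure".
    if score <= ham_cutoff:
        return "ham"
    if score >= spam_cutoff:
        return "spam"
    return "unsure"


clf = ToyClassifier()

# Clear-cut mail, trained consistently.
for _ in range(5):
    clf.train(["hgh", "enhancement", "pills"], is_spam=True)
    clf.train(["meeting", "agenda", "patch"], is_spam=False)

# Grey-area newsletter, trained as ham on some days and spam on others.
newsletter = ["industry", "newsletter", "webinar", "subscribe"]
for _ in range(3):
    clf.train(newsletter, is_spam=False)
    clf.train(newsletter, is_spam=True)

grey_score = clf.score(newsletter)
spam_score = clf.score(["enhancement", "pills"])
ham_score = clf.score(["meeting", "patch"])

print(label(grey_score), label(spam_score), label(ham_score))
```

In this toy, the newsletter tokens have been seen equally often in both
categories, so each one carries no evidence in either direction and the
msg lands between the cutoffs, while the consistently trained tokens
still push their msgs firmly to one side.
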
Ambiguity is a fact of life in many areas. I think email is one of
them. With Unsures left alone, SB's idea of "unsure" has gotten quite
close to my own.