[Spambayes] More "spam of the future" lately?

Michael N. Nitabach mnitabach at acedsl.com
Wed Dec 17 16:00:39 EST 2003

> -----Original Message-----
> From: Tim Peters [mailto:tim.one at comcast.net]
> Sent: Wednesday, December 17, 2003 3:42 PM
> To: Michael N. Nitabach; spambayes at python.org
> Subject: RE: [Spambayes] More "spam of the future" lately?
> >> 0.7 maybe, but you'd eventually regret dropping [spam_cutoff] to 0.5.
> [Michael N. Nitabach]
> > What makes you say that? I have my certain-spam cutoff at .30, and
> > my uncertain at .01. My training database has about 8000 hams and
> > 3000 spams. I have only ever received ten hams that scored over
> > .01, and only one over .20.
> Unless you've eyeballed every message scored as spam, then it's almost
> certain you've suffered false positives due to those settings.

I just looked in my certain-spam folder at all e-mails that scored below 0.70. Only a single one was a false positive: a SpamBayes mailing list digest that contained a complete actual spam e-mail that someone had posted, which scored 0.49.
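The two settings discussed here split every score into three bands: ham, unsure, and spam. The sketch below is a minimal illustration, not SpamBayes code; the `classify` function is hypothetical, and whether a score exactly equal to a cutoff lands in the lower or upper band is an assumption.

```python
def classify(score: float, ham_cutoff: float, spam_cutoff: float) -> str:
    """Map a score in [0, 1] to 'ham', 'unsure', or 'spam' (hypothetical helper)."""
    if not 0.0 <= ham_cutoff <= spam_cutoff <= 1.0:
        raise ValueError("need 0 <= ham_cutoff <= spam_cutoff <= 1")
    # Assumption: scores strictly below ham_cutoff are ham.
    if score < ham_cutoff:
        return "ham"
    # Assumption: scores at or above spam_cutoff are spam.
    if score >= spam_cutoff:
        return "spam"
    return "unsure"


# The 0.49 digest lands in "spam" with a 0.30 spam cutoff,
# but would have been "unsure" with a 0.70 spam cutoff.
print(classify(0.49, 0.01, 0.30))
print(classify(0.49, 0.01, 0.70))
```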

> There's more info on the project's background page:
>     http://spambayes.sourceforge.net/background.html
> Note especially the third graph.  The way spamprobs are combined in
> SpamBayes guarantees that a highly ambiguous message will score very near
> 0.5 (explained in more detail before the third graph, and much more at
>     http://www.linuxjournal.com/article.php?sid=6467
> ).

I receive a substantial amount of e-mail that scores between 0.30 and 0.70, but so far it has *all* been spam.
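Why ambiguous mail clusters near 0.5 can be seen in a small sketch of chi-squared (Fisher) combining, the scheme the background page describes. This is a minimal illustration under stated assumptions, not SpamBayes's own code: the names `chi2q` and `chi2_combine` are hypothetical, token selection is ignored, and every per-token probability is assumed to lie strictly between 0 and 1.

```python
import math
from typing import Sequence


def chi2q(x2: float, v: int) -> float:
    """P(chi-squared with v degrees of freedom >= x2), for even v."""
    if v % 2:
        raise ValueError("v must be even")
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    # Closed-form series for the survival function when v is even.
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def chi2_combine(probs: Sequence[float]) -> float:
    """Combine per-token spam probabilities into one score in [0, 1] (sketch)."""
    n = len(probs)
    if n == 0:
        return 0.5  # no evidence either way
    if any(not 0.0 < p < 1.0 for p in probs):
        raise ValueError("probabilities must lie strictly between 0 and 1")
    # Spam indicator: tokens with p near 1 make ln(1 - p) very negative.
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    # Ham indicator: tokens with p near 0 make ln(p) very negative.
    h = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    # Strong evidence on both sides pushes s and h toward 1, so the score -> 0.5.
    return (s - h + 1.0) / 2.0
```

For example, a message mixing ten strongly spammy tokens with ten strongly hammy ones, much like a ham digest quoting a complete spam, gets `chi2_combine([0.99] * 10 + [0.01] * 10)` very close to 0.5, while twenty spammy tokens alone score close to 1.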

> The kinds of email people get vary widely, though, and it's possible your
> mix is extremely well-suited to this classifier, devoid of any significant
> ambiguity.

Well, the interesting thing is that a lot of my spam is relatively technical sales-pitch e-mail that discusses the same sorts of things I discuss in my professional ham e-mails.

> (I'll note that if you use your SpamBayes'd email only for professional
> purposes, and no personal ones (like chatting with friends and relatives),
> it doesn't strain my imagination that your ham could be *so* uniform that
> ambiguity doesn't arise -- but then your email mix would be atypical too.)

No, I use it for equal parts professional and personal correspondence.

Michael N. Nitabach, Ph.D., J.D.
Assistant Professor
Department of Cellular and Molecular Physiology
Yale University School of Medicine
(203) 737-2939
mnitabach at acedsl.com
