[Spambayes] Feedback

Tim Peters tim.one at comcast.net
Mon Jan 5 21:04:31 EST 2004


[Nick Kell]
> I am just writing to give a little feedback after using your software
> for the last few weeks.  I had what can only be described as a
> ‘minor’ spam problem, but annoying all the same.  Spambayes is pretty
> good, it does what it says on the packet!  I am also a Bayesian
> statistician who is not that horrified at the nomenclature used.  I
> think I can see how you are working this clever program.

An odd thing is that this filter is Bayesian in a way most "Bayesian
filters" aren't, but isn't Bayesian in the way most are.  For an
explanation, see Gary Robinson's article:

    http://www.linuxjournal.com/article.php?sid=6467

> I was so impressed though with the results that I advised a friend
> from work who was receiving over 300 spams a day to give it a go.  It
> made very easy work of this huge problem and he is very pleased
> indeed.  Do you want to use any of his log files for research?  I can
> send them to you.

Nope, the purpose of the log files is to record clues about what might have
gone wrong if the code fails.  Mostly this ends up revealing undocumented
(mis)features of Microsoft's Outlook and MAPI APIs (gratuitous differences
among Outlook and OS versions).  The logs deliberately don't record any
useful <wink> information, because if they did, people would hesitate to
show them to us.

> Finally, I think I can safely say that the amount of spam being
> received by both of us has been reduced also since using the
> software.  No idea why that should be, but happy all the same.

If it helps you delete more spam *without* looking at the body of the spam
(either by opening the msg, or displaying it in a preview pane), then you're
not triggering as many "web beacons".  That's a gimmick encoding your email
address in a URL pointing to a spammer website -- such a thing can
communicate your email address to a spammer as a side effect of "merely"
fetching, e.g., an image to display in the body of HTML email.  When a web
beacon fires, the spammer knows both that your email address is live, and
that you looked at the body of the message, and that combination makes you
very attractive for more spamming.

That's one theory, anyway <wink>.  I seem to be getting a lot less spam
lately too, but am not sure why.
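
To make the web beacon idea concrete, here is a minimal sketch of how one
might spot the crudest form of it: an image URL in an HTML body that carries
the recipient's address in plain or percent-encoded form.  The names
ImageSrcCollector and possible_beacons, and the example hosts, are
hypothetical and are not part of Spambayes.  Real beacons often use an opaque
ID instead of the address itself, and this check would not catch those.

```python
from html.parser import HTMLParser
from urllib.parse import unquote


class ImageSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def possible_beacons(html_body, recipient):
    """Return image URLs that contain the recipient's address.

    Assumption: the address appears literally or percent-encoded in the URL.
    """
    collector = ImageSrcCollector()
    collector.feed(html_body)
    collector.close()
    needle = recipient.lower()
    return [src for src in collector.sources
            if needle in unquote(src).lower()]


if __name__ == "__main__":
    body = ('<p>Hi</p>'
            '<img src="http://spammer.example/pixel.gif?id=nick%40example.org">'
            '<img src="http://pics.example/logo.png">')
    for url in possible_beacons(body, "nick@example.org"):
        print("possible beacon:", url)
```

Merely rendering the first image above would send the address to the remote
server, which is why deleting spam unopened avoids the problem entirely.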

> Once again, thanks again.  If there is anything I can do to
> contribute to the project please let me know.

If you'd like a more theoretical challenge, we have an ongoing problem with
"unbalanced" training data (training on much more ham than spam, or vice
versa).  The current method works best when a user trains on approximately
equal numbers of each, regardless of the ham:spam ratio they get in their
real-life email stream.  We haven't found a good way to
deal with imbalance.  To the contrary, the one adjustment we tried turned
out to make matters worse.
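
One way to see how the training ratio could matter is a small sketch of a
per-word spam estimate in the style described in Gary Robinson's article.
This is an illustration under stated assumptions, not the project's actual
code: the function name word_spam_score is hypothetical, and the values
s=0.45 and x=0.5 are assumed defaults chosen for the example.

```python
def word_spam_score(spam_hits, ham_hits, nspam, nham, s=0.45, x=0.5):
    """Robinson-style estimate of how spammy a word is, in [0, 1].

    spam_hits / ham_hits: trained spam / ham messages containing the word.
    nspam / nham: total trained spam / ham messages.
    s, x: strength and value of the prior for rarely seen words
          (assumed values, for illustration only).
    """
    # Normalize raw counts by corpus size so each class is compared
    # on a per-message basis.
    spam_ratio = spam_hits / nspam if nspam else 0.0
    ham_ratio = ham_hits / nham if nham else 0.0
    total = spam_ratio + ham_ratio
    n = spam_hits + ham_hits
    if n == 0 or total == 0:
        return x  # no usable evidence: fall back to the prior
    p = spam_ratio / total
    # Pull the raw estimate toward the prior x; the pull weakens as n grows.
    return (s * x + n * p) / (s + n)


if __name__ == "__main__":
    # The same raw counts (1 spam hit, 10 ham hits) under two training mixes.
    print(word_spam_score(1, 10, nspam=1000, nham=1000))  # balanced
    print(word_spam_score(1, 10, nspam=10, nham=1000))    # 100x more ham
```

With balanced training the word scores as strongly hammy, but with 100 times
more ham than spam trained, the single spam hit is scaled up enough to push
the same word to strongly spammy.  Whether that scaling is the right thing to
do is exactly the open question.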



