[Spambayes] Feature Request

Wed Apr 20 02:39:30 CEST 2005

[Michal Vitecek]
> i think you trust spambayes way too much. 

Some people, like Tim, Alex, Skip (and I guess me by now) are pretty
justified in trusting SpamBayes because we have run very extensive tests,
with our own mail, that demonstrate that SpamBayes will work as expected
(for our mail, in the way we use it).

> i also think pre-filtering mail before spambayes is quite strange
> - what's the use of a filtering application if it needs such
> workarounds? 

A lot of people are quite in favour of the multiple filter approach.
Different filtering techniques have different strengths, and so if you
combine them in appropriate ways, you'll hopefully get an end system that's
better than any of the individual filters (much as Skip described his results).

> 1) it's tiresome to train spambayes several weeks that certain mails
> with a pattern (== automatically generated) are indeed ham when one has
> a largish spam/ham database.

I think just about all the SpamBayes developers recommend a small database
these days.  Testing has shown (although this isn't as extensive as some of
the other testing) that results are better that way.  Have you read the wiki page
about training?

<http://entrian.com/sbwiki/TrainingIdeas>

>  2) some mails from certain addresses must not be missed (or put in
> spam folder where they receive much lesser attention), now if you
> combine it with sb_server.py on a firm's pop3 server

SpamBayes is really designed as an individual (probably client-side) filter.
I know various people use it for multiple users (server-side etc), but there
are many issues with that.  There are other things (e.g. the database and
concurrency) that would be higher up in priority than whitelisting, though.

> you see
> whitelisting would help immensely because there are no easier ways
> how to do it (filter a mail to a mailbox before spambayes) - it's
> certainly more difficult than adding whitelisting to spambayes.

So do the whitelisting in the client - since SpamBayes just tags mail (i.e.
everything is still delivered), you can do whitelisting with the client's
rule system before you activate the rules based on the SpamBayes tags.
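
To make the rule ordering concrete, here is a rough sketch in Python of what
the client's rules amount to.  It is only an illustration, not anything
SpamBayes ships: the whitelist addresses and folder names are made up, and it
assumes the tag arrives as an X-Spambayes-Classification header of the form
"label; score" (adjust if your setup tags mail differently).

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical whitelist; these addresses are placeholders.
WHITELIST = {"boss@example.com", "billing@example.com"}


def choose_folder(raw_message, whitelist=WHITELIST):
    """Pick a folder: whitelist rule first, SpamBayes tag rules second."""
    msg = message_from_string(raw_message)

    # Rule 1: whitelisted senders always go to the inbox, whatever the tag.
    _, sender = parseaddr(msg.get("From", ""))
    if sender.lower() in whitelist:
        return "Inbox"

    # Rule 2: otherwise act on the tag (assumed format: "spam; 0.99").
    tag = msg.get("X-Spambayes-Classification", "")
    label = tag.split(";", 1)[0].strip().lower()
    if label == "spam":
        return "Spam"      # hypothetical folder name
    if label == "unsure":
        return "Unsure"    # hypothetical folder name
    return "Inbox"
```

The only point is the order: the whitelist check runs before anything looks at
the tag, so a whitelisted message tagged as spam still lands in the inbox.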

> 3) it's a feature some might not use but which would be listed when
> comparing with different filtering application. i've read some reviews
> on spambayes on the internet and this missing feature was adding
> unnecessary cons to spambayes.

If people really want whitelisting, then their mailer can do it, or they can
buy (e.g.) InBoxer.  All the developers have their own reasons for working
on SpamBayes, but I doubt getting good magazine reviews comes into play for
many.

> 1) it can't be that difficult to implement

Read Mark's comments about implementing whitelisting.  That's just for the
Outlook plug-in (probably the biggest group of SpamBayes users), which is
probably the easiest in some ways (because there's access to an address book
of some type).  They quite clearly outline the problems that make it
difficult to implement.

> 2) it's up to the users to decide what tokens (even compound ones) they
> consider so important that they take over the whole classification.

The thing is that it's up to the developers (in the case of volunteer work,
like SpamBayes) to decide what they want to spend their time implementing.
A few things have been done as a result of user requests, but the bulk is
done because one (or more) of the developers wants it done for
themselves.  Again, if someone submitted a good patch that
implemented whitelisting, then I'm sure that one of the developers would
review it and integrate it.

=Tony.Meyer

-- 
Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes.
http://www.massey.ac.nz/~tameyer/writing/reply_all.html explains this.