[Spambayes] outlook express

Wed Nov 19 00:52:05 EST 2003

> Somewhere in the documentation, probably for the POP3 proxy, 
> it's stated that the forwarding (to the SMTP proxy) method of 
> training the server doesn't work with Outlook Express because 
> OE may not forward all headers.  I was wondering if anyone 
> has reviewed this lately for the most recent version(s) of 
> Outlook Express to see if it's still a valid statement.

Yes, it is definitely still valid.  Realistically, it probably always will
be - Microsoft doesn't really have any reason to improve OE; it's only there
to give people a free option - if you want something good (sic?), you need
Outlook.  The only way around this that I can see is to once again offer to
include the spambayes id in the message body (which is included in forwarded
messages, obviously).  This is particularly inelegant, though, and at the
time the option was removed (because it didn't really fit when the code was
refactored), not all that many people used the SMTP proxy.

> And what are the problems caused in the training when these 
> headers are lost?

The spambayes id is not included (it's one of the headers added by the
proxy), so spambayes can't find the message in the cache to do the
appropriate training.  You could manually write the id into the message
body, but that seems like it would be even less convenient.
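
To make the failure concrete, here is a rough sketch (not the actual
spambayes code) of the lookup going wrong.  The header name
X-Spambayes-MailId, the id value, the find_training_id helper and both
sample messages are assumptions made up for illustration.

```python
from email import message_from_string

# Assumed header name, for illustration only.
ID_HEADER = "X-Spambayes-MailId"


def find_training_id(raw_message):
    """Return the cache id from the top-level headers, or None if absent."""
    msg = message_from_string(raw_message)
    return msg.get(ID_HEADER)


# A message as it might look after passing through the POP3 proxy.
proxied = (
    "From: someone@example.com\n"
    "Subject: cheap pills\n"
    "X-Spambayes-MailId: 1069221125\n"
    "\n"
    "Buy now!\n"
)

# The same message forwarded by a client that drops the original headers:
# only a short summary of them survives, and only inside the body.
forwarded = (
    "From: you@example.com\n"
    "Subject: Fw: cheap pills\n"
    "\n"
    "----- Original Message -----\n"
    "From: someone@example.com\n"
    "Subject: cheap pills\n"
    "\n"
    "Buy now!\n"
)

proxied_id = find_training_id(proxied)      # the id is found
forwarded_id = find_training_id(forwarded)  # None: nothing to look up
```

With no id there is no key to look up in the cache, which is why the
forwarded copy can't be matched to the original message.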

The SMTP proxy can simply train on the raw message forwarded to it, rather
than looking up the message in the cache, but this will more than likely
result in poor training - for example, all messages will be 'from' you, and
all the useful data in the headers will be unavailable for training.
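
As a small illustration of why the raw forwarded message trains badly,
this sketch lists the headers that anything training on it would see.
The sample message and the header_clues helper are hypothetical, and the
real tokenizer does far more than this.

```python
from email import message_from_string


def header_clues(raw_message):
    """Map each top-level header name (lower-cased) to its value."""
    msg = message_from_string(raw_message)
    return {name.lower(): value for name, value in msg.items()}


# Hypothetical forwarded spam: the top-level headers belong to the person
# forwarding it, not to the original sender.
forwarded = (
    "From: you@example.com\n"
    "To: spambayes_spam@localhost\n"
    "Subject: Fw: cheap pills\n"
    "\n"
    "----- Original Message -----\n"
    "From: someone@example.com\n"
    "\n"
    "Buy now!\n"
)

clues = header_clues(forwarded)
```

Here clues["from"] is your own address, and the original sender only
appears in the body, so every message trained this way looks like it
came from you.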

> I'd like to use this method to do the 
> occasional training of Spambayes, unless it's unwise.

It is definitely unwise.

> I get over 1000 messages a day (mostly from technical mailing 
> lists) with about 30% spam.  Using the 'Review messages' web 
> page with cached messages is getting more and more unworkable 
> with this volume of mail.

Do you still need to train?  I would have thought that you'd get pretty
acceptable results quite quickly with that volume of mail.  On a more
positive note:

 * Versions of spambayes post-1.0 have additional options for reviewing
messages via the web interface.  In particular, you can limit the number of
messages that appear on one screen (maybe this made it into 1.0a7?), and you
can change the default training for messages classified as ham/spam/unsure.

 * I really will finish the sb_pop3dnd.py script soon.  This lets you do
drag'n'drop training in any mailer that lets you drag'n'drop mail between
accounts, and works with both POP3 and IMAP (this includes OE).  The main
reason it's taken so long to finish is that the IMAP support in Twisted (a
communications toolkit for Python) has been pretty poor, and extremely
volatile - hopefully that has changed by now.

There are probably other ways you could train, too.  For example, if you are
only training on a small percentage of mail (which is presumably the case),
then you could store copies of mail to be trained in a couple of folders
specifically for the task.  Once in a while (every week, say), you could use
the web interface to train on each folder's dbx file, and then clear out the
folder.

=Tony Meyer