[spambayes-dev] RE: [Spambayes] How low can you go?

Tony Meyer tameyer at ihug.co.nz
Tue Dec 23 21:16:19 EST 2003


[Alex]
> The reason why Outlook is a particular problem is that 
> Outlook mutilates mail, irretrievably destroying the RFC 822 
> structure that it may have once been delivered in.  A similar 
> structure can theoretically be recreated, but like many 
> recreations, some information (like the separators used in 
> MIME encapsulation, etc) is not the same.
[...]
> I'm strongly in favor of ditching Outlook entirely.

The export.py script does a reasonable job of putting everything back
together again.  Actually, I believe it does exactly the same job as the
code that builds a message to pass to the tokenizer in general use.  So
although popping a proxy in between Outlook and the POP3 server to catch raw
messages would certainly be more pure and correct (sb_server can do this,
BTW: just set the cache expiry limit *really* high and don't bother
classifying any messages), for practical purposes the data that Outlook
gives is just as useful, since if anything got accepted into the core, those
using the Outlook plug-in would be dealing with those effects anyway.  This
is another good reason for us to try each other's patches (and I will get to
the incremental ones soon, honest! <wink>), since some of us have
Outlook-altered messages to test, and others have nice pure message streams.
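
To give an idea of the sort of reassembly I mean, here is a minimal sketch
using the standard email package.  This is not the export.py code; the
rebuild_message function and its headers/body arguments are made up for
illustration, standing in for whatever pieces the mail client still holds.

```python
from email import message_from_string
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def rebuild_message(headers, plain_body, html_body=None):
    """Reassemble an RFC 822-style message from separately stored parts.

    Hypothetical helper: 'headers' is a dict of header name -> value.
    """
    if html_body is None:
        msg = MIMEText(plain_body, "plain")
    else:
        # The MIME boundary is generated fresh when the message is
        # serialized, so it will not match the one originally delivered.
        msg = MIMEMultipart("alternative")
        msg.attach(MIMEText(plain_body, "plain"))
        msg.attach(MIMEText(html_body, "html"))
    for name, value in headers.items():
        msg[name] = value
    return msg.as_string()


rebuilt = rebuild_message(
    {"From": "a@example.com", "To": "b@example.com", "Subject": "hello"},
    "plain text part",
    "<p>html part</p>",
)

# The rebuilt text parses as a normal multipart message again.
parsed = message_from_string(rebuilt)
```

The result has the same overall shape as the original message, which is what
the tokenizer cares about, even though details like the boundary string
differ.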

> I'm not even sure if the test harnesses use a database 
> backend at all; I think they may be keeping everything in 
> memory.  Dunno.  I haven't looked at that in ages.

They keep everything in memory unless you've enabled the 'save the
classifier' option (can't remember what it's called; too lazy to check), in
which case the classifier gets pickled.
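
Roughly this pattern, in other words.  This is a sketch only, not the actual
test harness code: the learn/save_state/load_state names, the dict layout and
the file name are all invented for the example.

```python
import os
import pickle
import tempfile


def learn(state, tokens, is_spam):
    """Update in-memory token counts; nothing touches disk here."""
    counts = state["spam"] if is_spam else state["ham"]
    for token in tokens:
        counts[token] = counts.get(token, 0) + 1


def save_state(state, path):
    """Optional step: pickle the trained state to a file."""
    with open(path, "wb") as f:
        pickle.dump(state, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_state(path):
    with open(path, "rb") as f:
        return pickle.load(f)


# Training happens entirely in memory.
state = {"spam": {}, "ham": {}}
learn(state, ["cheap", "pills", "cheap"], is_spam=True)
learn(state, ["meeting", "tuesday"], is_spam=False)

# Only when saving is enabled does anything get written out.
path = os.path.join(tempfile.mkdtemp(), "classifier.pik")
save_state(state, path)
restored = load_state(path)
```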

=Tony Meyer



