[Spambayes] Spam overload

Meyer, Tony T.A.Meyer at massey.ac.nz
Tue May 20 16:44:30 EDT 2003


> - spambayes still doesn't gracefully handle outlook crashing
> or being unable to save the DB for any reason.

Mark has fixed this (at least to a certain extent) with the changes he
checked in over the last week.

> - you're going to increase your network load somewhat due to
> all the database load/saves over the network.

It really all depends on how your system is set up, but as an
alternative to storing the database on a network you could sync a local
copy from the network (this sort of thing is done in the labs here).
You might lose some training, but that shouldn't really matter,
especially if the aim is to get rid of 95% of spam rather than 99.9%.
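
As a rough illustration of the sync idea, here is a minimal sketch that
refreshes a local copy whenever the network copy is newer.  The function
name and the paths are made up for this example and are not part of
spambayes; it also assumes the database is a single file.

```python
import os
import shutil


def sync_local_copy(network_path, local_path):
    """Copy the network database to local_path if the network copy is newer.

    Hypothetical helper, not part of spambayes.  Returns True if a copy
    was made and False if the local copy was already up to date.
    """
    if (os.path.exists(local_path)
            and os.path.getmtime(local_path) >= os.path.getmtime(network_path)):
        return False
    # Copy to a temporary name first, then rename, so an interrupted
    # copy never leaves a half-written database at local_path.
    tmp_path = local_path + ".tmp"
    shutil.copy2(network_path, tmp_path)  # copy2 also preserves the mtime
    os.replace(tmp_path, local_path)
    return True
```

In this sketch, training saved only to the local copy is discarded the
next time the network copy is updated, which is one way some training
could be lost.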

> I was thinking that it might be useful to do a SQL version of
> spambayes where the words and probabilities are stored in 
> separate tables:
[...]
> Once you have this then you can either use it as a backend
> for a client-based tool, or write an exchange server plugin 
> that works similarly to the outlook plugin, but runs on the server.
> 
> Anyone up for it? <wink>

Coincidentally, I was working with Python and MySQL today, so I threw
this together as well (based on the given outline).  My SQL is pretty
rusty, so I'm not claiming that it's the fastest/most efficient
implementation, but I have a SQLClassifier that can be used just like
the PickleClassifier and DBClassifiers.  (It works in my testing.)
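
To give a feel for what an SQL-backed word count store involves, here is
a minimal sketch.  It is not the SQLClassifier described above: the
class name, table name and columns are all hypothetical, it keeps only
per-word spam/ham counts in one table, and it uses the sqlite3 module
from the standard library in place of MySQL so that it runs on its own.

```python
import sqlite3


class SQLWordStore:
    """Hypothetical SQL-backed store of per-word spam/ham counts."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS wordinfo ("
            "word TEXT PRIMARY KEY, "
            "nspam INTEGER NOT NULL DEFAULT 0, "
            "nham INTEGER NOT NULL DEFAULT 0)"
        )

    def learn(self, words, is_spam):
        """Count each distinct word once for a spam or ham message."""
        # The column name comes from this fixed pair, never from user
        # input, so formatting it into the SQL text is safe here.
        column = "nspam" if is_spam else "nham"
        with self.conn:  # one transaction per message
            for word in set(words):
                self.conn.execute(
                    "INSERT OR IGNORE INTO wordinfo (word) VALUES (?)",
                    (word,),
                )
                self.conn.execute(
                    f"UPDATE wordinfo SET {column} = {column} + 1 "
                    "WHERE word = ?",
                    (word,),
                )

    def counts(self, word):
        """Return (nspam, nham) for word, or (0, 0) if it is unseen."""
        row = self.conn.execute(
            "SELECT nspam, nham FROM wordinfo WHERE word = ?", (word,)
        ).fetchone()
        return row if row is not None else (0, 0)
```

For example, after store.learn(["free", "offer"], True) and
store.learn(["free", "meeting"], False), store.counts("free") gives one
spam count and one ham count.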

If more than one person is interested in the SQL classifier, I can
commit it as an update to storage.py; otherwise, if anyone does want to
use SQL as their database, I can leave it as a patch on SF (let me know).

The SQL classifier can be used with any of the existing apps (pop3proxy,
imapfilter, Outlook plugin), or with any new thing [1].  As for writing
an Exchange server plugin/SMTP classifying proxy, I don't have any
facility to do testing, so someone else will have to volunteer there.

=Tony Meyer

[1] Well, actually, it needs a little bit more work.  We have _two_
databases - the word count one, and a message info one (well,
pop3proxy, imapfilter and the Outlook plugin do).  The SQL bit is only
done for the word count one, not the message info one, which could lead
to rather odd results.  It would be easy enough to fix this, but not
until the 'master database' idea is resolved.


