[Spambayes] SpamBayes as a gateway solution

Joerg Beyer job at webde-ag.de
Thu Aug 28 09:06:23 EDT 2003


Bobby Wilkins wrote:
> My bayes DB is 17MB; we have over 1,000 users on an Exchange server; we might have upwards of 17GB of disk for bayes databases alone.  Considering we currently limit users to 100MB of storage, this would represent an increase of ~20% in our Exchange server disk.  I'd also warrant that it would be a more-than-20% increase in disk I/O for the database activity...


OK, that's what Berkeley DB makes of it. It optimizes for speed, as far
as I can see. Could you make a dump of your bayes DB (to ASCII, just
word and ham/spam count)? I assume that the dump is _much_ smaller.

I tested with a rather small bayes DB, and the dump was smaller by a
factor of more than 15.
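
To make the kind of dump I mean concrete, here is a minimal sketch in
Python. It is not the SpamBayes storage code: it assumes you have already
read your database into a plain dict mapping each word to a
(ham count, spam count) pair, and the names dump_counts and load_counts
are made up for this illustration. It writes one tab-separated line per
word and returns the size of the resulting file, which you can compare
with the size of the database file on disk.

```python
import os
import tempfile


def dump_counts(counts, path):
    """Write one 'word<TAB>ham<TAB>spam' line per word; return file size in bytes.

    `counts` is assumed to be a dict of word -> (ham_count, spam_count).
    Words containing tabs or newlines are not handled by this sketch.
    """
    with open(path, "w", encoding="utf-8", newline="\n") as out:
        for word in sorted(counts):
            ham, spam = counts[word]
            out.write(f"{word}\t{ham}\t{spam}\n")
    return os.path.getsize(path)


def load_counts(path):
    """Read a dump written by dump_counts back into a dict."""
    counts = {}
    with open(path, encoding="utf-8") as src:
        for line in src:
            word, ham, spam = line.rstrip("\n").split("\t")
            counts[word] = (int(ham), int(spam))
    return counts


if __name__ == "__main__":
    # Made-up sample data, only to show the round trip.
    sample = {"viagra": (0, 41), "meeting": (37, 1), "free": (12, 58)}
    with tempfile.TemporaryDirectory() as tmp:
        dump_path = os.path.join(tmp, "bayes_dump.txt")
        size = dump_counts(sample, dump_path)
        print(size, "bytes in the ASCII dump")
        assert load_counts(dump_path) == sample
```

Dividing the size of the original database file by the number returned
from dump_counts gives the kind of factor I mentioned above.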

> To be commercially-viable, a server-based bayes implementation that kept individual databases would:
>  - have to be able to store its data efficiently (grouping 
>    common message information while keeping individual
>    scores sounds difficult off-the-cuff)
>  - potentially use a compressed database engine (don't know
>    if there is one of those around, but I've seen a few
>    proprietary ones)
>  - NOT hammer the server in either CPU or disk I/O
>  - NOT grow infinitely (automatically trim old data;
>    deciding what "old data" is, however, sounds hard)

You could even use one or more dedicated boxes.


	Joerg



