[Spambayes] SpamBayes as a gateway solution
Joerg Beyer
job at webde-ag.de
Thu Aug 28 09:06:23 EDT 2003
Bobby Wilkins wrote:
> My bayes DB is 17MB; we have over 1,000 users on an Exchange server; we might have upwards of 17GB of disk for bayes databases alone. Considering we currently limit users to 100MB of storage, this would represent an increase of ~20% in our Exchange server disk. I'd also warrant that it would be a more-than-20% increase in disk I/O for the database activity...
OK, that's what Berkeley DB makes of it. As far as I can see, it
optimizes for speed rather than size. Could you make a dump of your
bayes DB (to ASCII, just each word and its ham/spam counts)? I assume
the dump is _much_ smaller: I tested with a rather small bayes DB and
the dump was more than 15 times smaller.
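To make the suggested size comparison concrete, here is a minimal sketch of such an ASCII dump. It assumes a hypothetical in-memory view of the database as a dict mapping each token to a `(ham_count, spam_count)` pair, and assumes tokens contain no tabs or newlines. The real SpamBayes storage format may differ, so `dump_ascii` and `load_ascii` are illustrative names, not part of SpamBayes.

```python
import zlib

# Hypothetical view of a bayes DB: token -> (ham_count, spam_count).
# Assumes tokens contain no tab or newline characters.
def dump_ascii(counts: dict[str, tuple[int, int]]) -> str:
    """Write one 'token<TAB>ham<TAB>spam' line per token, sorted for stable output."""
    lines = [f"{tok}\t{ham}\t{spam}" for tok, (ham, spam) in sorted(counts.items())]
    return "\n".join(lines) + "\n"


def load_ascii(text: str) -> dict[str, tuple[int, int]]:
    """Parse the format written by dump_ascii back into a dict."""
    counts: dict[str, tuple[int, int]] = {}
    for line in text.splitlines():
        if not line:  # skip blank lines, e.g. from an empty dump
            continue
        tok, ham, spam = line.split("\t")
        counts[tok] = (int(ham), int(spam))
    return counts


if __name__ == "__main__":
    counts = {"viagra": (0, 42), "meeting": (17, 1), "subject:re": (120, 5)}
    text = dump_ascii(counts)
    assert load_ascii(text) == counts  # the dump round-trips losslessly
    raw = text.encode("utf-8")
    # Compare the plain dump with a zlib-compressed copy of it.
    print(len(raw), "bytes as ASCII,", len(zlib.compress(raw)), "bytes compressed")
```

Comparing `len(raw)` against the size of the database file on disk gives the kind of ratio described above; compressing the dump shows how much a compressed store might save on top of that.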
> To be commercially-viable, a server-based bayes implementation that kept individual databases would:
> - have to be able to store its data efficiently (grouping
> common message information while keeping individual
> scores sounds difficult off-the-cuff)
> - potentially use a compressed database engine (don't know
> if there is one of those around, but I've seen a few
> proprietary ones)
> - NOT hammer the server in either CPU or disk I/O
> - NOT grow infinitely (automatically trim old data;
> deciding what "old data" is, however, sounds hard)
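One possible answer to "what is old data" can be sketched as a pruning policy. This is an assumption, not something SpamBayes is stated to do: it supposes each token record carries a hypothetical `last_seen` timestamp, and it drops a token only when it is both stale (not seen for `max_age_days`) and rare (total count below `min_total`). The names `TokenStats` and `trim_old_tokens` and both thresholds are illustrative.

```python
from dataclasses import dataclass


@dataclass
class TokenStats:
    ham: int
    spam: int
    last_seen: float  # hypothetical: UNIX timestamp when the token was last seen


def trim_old_tokens(db: dict[str, TokenStats], now: float,
                    max_age_days: float = 90.0, min_total: int = 2) -> int:
    """Remove tokens that are both stale and rare; return how many were removed.

    A token is dropped only if it has not been seen for more than
    max_age_days AND its total ham+spam count is below min_total.
    """
    cutoff = now - max_age_days * 86400  # seconds per day
    stale = [tok for tok, s in db.items()
             if s.last_seen < cutoff and (s.ham + s.spam) < min_total]
    for tok in stale:
        del db[tok]
    return len(stale)
```

Requiring both conditions keeps old but frequently seen tokens, which still carry strong evidence, while clearing out tokens seen only once long ago, which tend to make up much of a token database's bulk.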
You could even use one or more dedicated boxes.
Joerg