[spambayes-dev] improving dumbdbm's survival chances...
Tim Peters
tim.one at comcast.net
Tue Jul 15 11:54:51 EDT 2003
[G. Armour Van Horn]
> Hey, I'm nowhere near "Tim's sister" capability, but I still want to
> just download the zip, extract it, and run the proxy. If dumbdbm is a
> dumb way to go, it shouldn't be the default.
It's a last resort, but it shouldn't be even that.
> I wouldn't be too upset to be retraining, I've only been running this
> install for a week and could just start from scratch again. I was
> planning on keeping up the training for a week or so anyway, although
> my database is already up to 27 megs.
The large size of your database is (just) one of the bad consequences of
using dumbdbm. A dumbdbm database consists of a .dir file and a .dat file,
and I assume your 27MB refers to the .dat file. The .dir file holds keys
and the .dat file values. A dumbdbm .dat file consumes at least 512 bytes
for each value, so a 27MB .dat file can't represent more than about 50,000
tokens -- which is actually on the small end for a spambayes database.
That's outrageous overhead, since there's only about 4 bytes of information
in a spambayes database value, 128x smaller than dumbdbm requires. Now only
a custom-designed database could actually achieve that, but a good
general-purpose database should be able to get away with much less than 512
bytes per spambayes database value.
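
To make the arithmetic above concrete, here is a rough sketch in Python. The constant and function names are made up for illustration, and it assumes "27MB" means 27 * 2**20 bytes and that every value takes exactly one 512-byte block, the minimum described above, so the count is an upper bound rather than a measurement.

```python
# Back-of-the-envelope check of the numbers above.
# Assumptions (not measured from a real database):
#   - "27MB" is read as 27 * 2**20 bytes
#   - each value costs exactly the 512-byte minimum
DAT_FILE_BYTES = 27 * 1024 * 1024
BLOCK_BYTES = 512   # minimum .dat space per value
INFO_BYTES = 4      # rough information content of one spambayes value


def max_values(dat_bytes, block_bytes=BLOCK_BYTES):
    """Upper bound on how many values fit in a .dat file of this size."""
    return dat_bytes // block_bytes


def overhead_factor(block_bytes=BLOCK_BYTES, info_bytes=INFO_BYTES):
    """How many times more space is used than the information requires."""
    return block_bytes // info_bytes


tokens = max_values(DAT_FILE_BYTES)
ideal_bytes = tokens * INFO_BYTES

print(tokens)             # 55296, i.e. "about 50,000" tokens
print(overhead_factor())  # 128
print(ideal_bytes)        # 221184 bytes, roughly 216KB instead of 27MB
```

The last figure is the custom-designed ideal; a general-purpose database would land somewhere between that and the 27MB dumbdbm needs.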