[spambayes-dev] improving dumbdbm's survival chances...

G. Armour Van Horn vanhorn at whidbey.com
Tue Jul 15 09:12:50 EDT 2003

Tim Peters wrote:

> The large size of your database is (just) one of the bad consequences of
> using dumbdbm.  A dumbdbm database consists of a .dir file and a .dat file,
> and I assume your 27MB refers to the .dat file.  The .dir file holds keys
> and the .dat file values.  A dumbdbm .dat file consumes at least 512 bytes
> for each value, so a 27MB .dat file can't represent more than about 50,000
> tokens -- which is actually on the small end for a spambayes database.
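The arithmetic behind the "about 50,000 tokens" bound can be checked directly. Below is a minimal sketch, assuming Python 3, where dumbdbm lives on as `dbm.dumb`. Based on our reading of that module's source, it pads the .dat file so that each newly stored value starts on a 512-byte boundary. The block size is an implementation detail, and the helper names `max_values` and `dat_size_for` are hypothetical, used only for illustration.

```python
import dbm.dumb
import os
import tempfile

# Assumption: matches the "at least 512 bytes per value" figure quoted above;
# dbm.dumb aligns each newly added value to a 512-byte block in the .dat file.
BLOCK_SIZE = 512


def max_values(dat_bytes: int, block_size: int = BLOCK_SIZE) -> int:
    """Hypothetical helper: upper bound on values a .dat file of this size holds."""
    return dat_bytes // block_size


def dat_size_for(n_values: int) -> int:
    """Hypothetical helper: store n tiny values in a throwaway dbm.dumb
    database and return the resulting .dat file size in bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo")
        db = dbm.dumb.open(path, "c")
        for i in range(n_values):
            db[str(i)] = b"x"  # a 1-byte value, far smaller than one block
        db.close()
        return os.path.getsize(path + ".dat")


if __name__ == "__main__":
    print(max_values(27 * 1024 * 1024))  # 27 MiB -> 55296 values at most
    print(max_values(27 * 10**6))        # 27 MB  -> 52734 values at most
    print(dat_size_for(100))             # 100 one-byte values, ~50KB on disk
```

Storing 100 one-byte values should still produce a .dat file of roughly 100 blocks, which illustrates why the file size tracks the number of tokens rather than the amount of data in them.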

Yesterday, my proxy reported that it had trained on something like 800 ham and
700 spam; that was the 27MB hammie.dat. I just did the export/import routine per
Tony's message. Both steps reported 140 spam and 128 ham, and produced a 262KB
hammie.export file and a new hammie.pkl file of 498KB.

When I go back to the web interface, sure enough, it thinks it has only trained
on those 268 messages, and despite the reduction in content, the hammie.dat file
is over 32MB!


