[spambayes-dev] improving dumbdbm's survival chances...

G. Armour Van Horn vanhorn at whidbey.com
Tue Jul 15 09:12:50 EDT 2003

Tim Peters wrote:

> The large size of your database is (just) one of the bad consequences of
> using dumbdbm.  A dumbdbm database consists of a .dir file and a .dat file,
> and I assume your 27MB refers to the .dat file.  The .dir file holds keys
> and the .dat file values.  A dumbdbm .dat file consumes at least 512 bytes
> for each value, so a 27MB .dat file can't represent more than about 50,000
> tokens -- which is actually on the small end for a spambayes database.
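
The arithmetic in the quoted paragraph can be checked with a rough sketch. It assumes Python 3, where the old dumbdbm module is available as dbm.dumb, and it assumes "27MB" means 27 * 1024 * 1024 bytes. The helper names max_values and dat_size_for are hypothetical, made up for this illustration; they are not part of spambayes.

```python
import dbm.dumb
import os
import tempfile

BLOCK_SIZE = 512  # per-value granularity quoted above


def max_values(dat_bytes, block_size=BLOCK_SIZE):
    """Upper bound on stored values if each one occupies at least one block."""
    return dat_bytes // block_size


def dat_size_for(n_values):
    """Store n_values one-byte values via dbm.dumb; return the .dat size in bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        base = os.path.join(tmp, "demo")  # dbm.dumb adds .dat/.dir/.bak itself
        db = dbm.dumb.open(base, "c")
        try:
            for i in range(n_values):
                db["token%d" % i] = b"x"
        finally:
            db.close()
        return os.path.getsize(base + ".dat")


# A 27MB .dat file at 512 bytes per value:
print(max_values(27 * 1024 * 1024))  # 55296, i.e. "about 50,000"

# Even one-byte values make the .dat file grow by roughly a block each.
print(dat_size_for(100))
```

The second figure printed depends on how the installed dbm.dumb pads its values, so no exact output is claimed for it here.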

Yesterday, my proxy reported that it had trained on something like 800 ham and
700 spam; that was with the 27MB hammie.dat. I just did the export/import routine
per Tony's message. Both steps reported that there were 140 spam and 128 ham, and
I ended up with a 262KB hammie.export file and a new hammie.pkl file of 498KB.

When I go back to the web interface, sure enough, it thinks it has only trained
on those 268 messages, and despite the reduction in content, the hammie.dat file
is now over 32MB!


Sign up now for Quotes of the Day, a handful of quotations
on a theme delivered every morning.
Enlightenment! Daily, for free!
mailto:twisted at whidbey.com?subject=Subscribe_QOTD

For web hosting and maintenance,
visit Van's home page: http://www.domainvanhorn.com/van/
