[Spambayes] Database reduction
Skip Montanaro
skip@pobox.com
Fri Nov 1 01:34:31 2002
Neale> When pickling a Bayes object, the pickler is smart enough not to
Neale> repeatedly say "this is a wordinfo object" but rather, I assume,
Neale> "this is of type 2", only having to name type 2 once. However,
Neale> hammie pickles each wordinfo individually, keyed by a string.
Neale> This makes for fast lookups, but giant databases.
You can always define your own __getstate__ and __setstate__ methods for the
Wordinfo class that pickle and restore a more compact form of the object's
state. Or am I misunderstanding what you said?
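A minimal sketch of that idea follows. It assumes a hypothetical WordInfo with only spamcount and hamcount fields (the real class may carry more). __getstate__ returns a bare tuple, so attribute names are not written out for every record, and __setstate__ rebuilds the object from it. Because each record is still pickled on its own, the class reference is written once per record; the saving comes only from the smaller state payload.

```python
import pickle


class WordInfo:
    """Hypothetical per-word record; the real class may have more fields."""
    __slots__ = ("spamcount", "hamcount")

    def __init__(self, spamcount=0, hamcount=0):
        self.spamcount = spamcount
        self.hamcount = hamcount

    def __getstate__(self):
        # Save a bare tuple of counts instead of a name -> value mapping,
        # so no attribute names end up in each record's pickle.
        return (self.spamcount, self.hamcount)

    def __setstate__(self, state):
        # Rebuild the object from the tuple produced by __getstate__.
        self.spamcount, self.hamcount = state


# Round-trip one record the way a per-key store would.
w = WordInfo(spamcount=3, hamcount=1)
restored = pickle.loads(pickle.dumps(w, protocol=2))
print(restored.spamcount, restored.hamcount)  # 3 1

# Compare just the state payloads: compact tuple vs. attribute dict.
tuple_size = len(pickle.dumps((3, 1), protocol=2))
dict_size = len(pickle.dumps({"spamcount": 3, "hamcount": 1}, protocol=2))
print(tuple_size < dict_size)  # True
```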
Neale> Tim just mentioned a performance tweak; is this an indicator that
Neale> now would be a good time to resume trying to reduce hammie's
Neale> database size?
I reduced the size of my database significantly after my training run by
deleting the wordinfo records whose hamcount was 1 and spamcount was 0, or
vice versa, i.e. words seen exactly once in the whole training set.
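That pruning step might look roughly like the following. This is a sketch under assumptions: the database is treated as a plain dict mapping each word to a hypothetical WordInfo holding spamcount and hamcount, and prune_hapaxes is an illustrative name, not an existing Spambayes function. The keys to drop are collected first because a mapping cannot be resized while it is being iterated.

```python
from dataclasses import dataclass


@dataclass
class WordInfo:
    # Hypothetical stand-in for the per-word record; only the counts matter here.
    spamcount: int = 0
    hamcount: int = 0


def prune_hapaxes(wordinfo):
    """Delete words seen exactly once in training (1 ham/0 spam or 0 ham/1 spam).

    `wordinfo` is a dict (or similar mutable mapping) of word -> WordInfo.
    Returns the number of entries removed.
    """
    # Gather the doomed keys first; deleting while iterating raises an error.
    doomed = [word for word, info in wordinfo.items()
              if (info.hamcount, info.spamcount) in ((1, 0), (0, 1))]
    for word in doomed:
        del wordinfo[word]
    return len(doomed)


db = {
    "zzyzx": WordInfo(spamcount=1, hamcount=0),
    "meeting": WordInfo(spamcount=0, hamcount=1),
    "free": WordInfo(spamcount=5, hamcount=1),
    "python": WordInfo(spamcount=0, hamcount=2),
}
removed = prune_hapaxes(db)
print(removed, sorted(db))  # 2 ['free', 'python']
```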
Skip