[Spambayes] Database reduction

Skip Montanaro skip@pobox.com
Fri Nov 1 01:34:31 2002


    Neale> When pickling a Bayes object, the pickler is smart enough not to
    Neale> repeatedly say "this is a wordinfo object" but rather, I assume,
    Neale> "this is of type 2", only having to name type 2 once.  However,
    Neale> hammie pickles each wordinfo individually, keyed by a string.
    Neale> This makes for fast lookups, but giant databases.

You can always define your own __getstate__ and __setstate__ methods for the
Wordinfo class that read and write a more compact form of the object's state.
Or am I misunderstanding what you said?
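
A minimal sketch of what I mean, not hammie's actual code: the WordInfo and
PlainWordInfo classes and their two fields are hypothetical stand-ins. The
compact class pickles its state as a bare tuple instead of the default
attribute dictionary, so the attribute names are not repeated in every record.

```python
import pickle


class PlainWordInfo:
    """Hypothetical record pickled the default way (as an attribute dict)."""

    def __init__(self, spamcount=0, hamcount=0):
        self.spamcount = spamcount
        self.hamcount = hamcount


class WordInfo:
    """Hypothetical record that pickles its state as a compact tuple."""

    def __init__(self, spamcount=0, hamcount=0):
        self.spamcount = spamcount
        self.hamcount = hamcount

    def __getstate__(self):
        # Store only the values; the field names live in the class, not the pickle.
        return (self.spamcount, self.hamcount)

    def __setstate__(self, state):
        # Rebuild the attributes from the tuple written by __getstate__.
        self.spamcount, self.hamcount = state


plain_size = len(pickle.dumps(PlainWordInfo(3, 1)))
compact_size = len(pickle.dumps(WordInfo(3, 1)))
restored = pickle.loads(pickle.dumps(WordInfo(3, 1)))
```

The saving per record is small, but it is paid once for every word in the
database, which is where individually pickled records get expensive.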

    Neale> Tim just mentioned a performance tweak; is this an indicator that
    Neale> now would be a good time to resume trying to reduce hammie's
    Neale> database size?

I reduced the size of my database significantly after my training run by
deleting every wordinfo entry where the hamcount was 1 and the spamcount was
0, or vice versa.
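
Roughly, the pruning looks like the sketch below. It assumes the database can
be treated as a plain dict mapping each word to a (spamcount, hamcount) pair;
prune_singletons and the sample data are made up for illustration.

```python
def prune_singletons(wordinfo):
    """Return a copy of wordinfo without words seen exactly once in training."""
    kept = {}
    for word, (spamcount, hamcount) in wordinfo.items():
        # Drop words that appeared in one spam and no ham, or one ham and no spam.
        if (spamcount, hamcount) in ((1, 0), (0, 1)):
            continue
        kept[word] = (spamcount, hamcount)
    return kept


# Made-up counts, purely for illustration.
db = {
    "viagra": (12, 0),
    "meeting": (0, 9),
    "zxqv": (1, 0),
    "ephemeral": (0, 1),
    "free": (1, 1),
}
pruned = prune_singletons(db)
```

Words like "free" above, seen once in each category, survive the cut because
neither condition matches.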

Skip