[Spambayes] Database reduction
Fri Nov 1 00:14:41 2002
So then, Tim Peters <email@example.com> is all like:
> [cool database trick]
The bigger problem, at least for hammie, is that pickling wordinfo
instances makes huge strings, the majority of which is redundant
information. When pickling a Bayes object, the pickler is smart enough
not to repeatedly say "this is a wordinfo object" but rather, I assume,
"this is of type 2", only having to name type 2 once. However, hammie
pickles each wordinfo individually, keyed by a string, so every stored
record repeats that type information. This makes for fast lookups, but
giant databases.
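
To make the size difference concrete, here is a rough sketch of the
effect. It is not hammie's actual code: the WordInfo class and its
spamcount/hamcount fields are hypothetical stand-ins, and the exact
byte counts will vary with the pickle protocol in use.

```python
import pickle


class WordInfo:
    """Hypothetical stand-in for a per-word record (not the Spambayes class)."""

    def __init__(self, spamcount, hamcount):
        self.spamcount = spamcount
        self.hamcount = hamcount


# A small made-up vocabulary; real databases hold far more words.
words = {"word%d" % i: WordInfo(i, i + 1) for i in range(1000)}

# One pickle for the whole mapping: the pickler names the class once
# and refers back to it through its memo for every later instance.
whole = len(pickle.dumps(words))

# One pickle per record, as in a string-keyed database: every value
# starts from an empty memo, so each one spells out the class again.
separate = sum(len(pickle.dumps(info)) for info in words.values())

print("single pickle:    %d bytes" % whole)
print("per-word pickles: %d bytes (keys not counted)" % separate)
```

The per-word total comes out larger even though it leaves out the
keys, which is the redundancy described above.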
Tim just mentioned a performance tweak; is this an indicator that now
would be a good time to resume trying to reduce hammie's database size?