[spambayes-dev] A new and altogether different bsddb breakage
Tim Peters
tim.one at comcast.net
Wed Dec 17 20:36:48 EST 2003
[Barry]
> ...
> (I haven't looked closely at exactly what data spambayes wants to store).
The token statistics database now is a single (but large) mapping from short
8-bit strings to 2-tuples of little integers. The strings are usually less
than 16 characters, and never a lot longer than that (the tokenizer
truncates very long strings, synthesizing short "skip" tokens as proxies).
It would be nice to have other mappings too, like forward and inverse msgid
<-> bag_of_tokens maps. A little-integer timestamp may get added to the
2-tuples.
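
As a rough illustration of that shape (a sketch only, not the actual spambayes code), the mapping can be pictured as a dict from short byte strings to 2-tuples of small ints. Several things here are assumptions: that the two integers are per-token spam and ham counts, the 16-byte cutoff, the format of the "skip" proxy, and the helper names shorten() and record().

```python
# Assumed cutoff; the post only says keys are "usually less than 16 characters".
MAX_TOKEN_LEN = 16


def shorten(token: bytes) -> bytes:
    """Return a short token unchanged, else a short "skip" proxy for it."""
    if len(token) <= MAX_TOKEN_LEN:
        return token
    # Hypothetical proxy format: first byte plus the length rounded down to 10.
    return b"skip:%c %d" % (token[0], len(token) // 10 * 10)


def record(stats: dict, token: bytes, is_spam: bool) -> None:
    """Bump the (spam, ham) pair stored under the token's (possibly proxied) key."""
    key = shorten(token)
    spam, ham = stats.get(key, (0, 0))
    stats[key] = (spam + 1, ham) if is_spam else (spam, ham + 1)


stats = {}
record(stats, b"cheap", True)
record(stats, b"cheap", False)
record(stats, b"x" * 45, True)  # too long, so stored under a skip proxy
```

Because both keys and values stay small and fixed in shape, any string-keyed store can hold them once the tuple is serialized; adding a timestamp would just turn the 2-tuple into a 3-tuple.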