[spambayes-dev] A new and altogether different bsddb breakage

Tim Peters tim.one at comcast.net
Wed Dec 17 20:36:48 EST 2003


[Barry]
> ...
> (I haven't looked closely at exactly what data spambayes wants to store).

The token statistics database is now a single (but large) mapping from short
8-bit strings to 2-tuples of little integers.  The strings are usually shorter
than 16 characters, and never a lot longer than that (the tokenizer
truncates very long strings, synthesizing short "skip" tokens as proxies).
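
A rough sketch of that shape, in case it helps to see it concretely.  A plain
dict stands in for the on-disk database.  The meaning of the two integers
(taken here to be spam and ham counts), the 16-byte cutoff, and the
proxy_for() helper with its "skip:" format are all assumptions made for
illustration, not a description of the real tokenizer.

```python
# Sketch only: a plain dict stands in for the on-disk mapping.
MAX_TOKEN_LEN = 16  # assumed cutoff; the post only says "usually" under 16


def proxy_for(token: bytes) -> bytes:
    """Hypothetical stand-in for the tokenizer's short "skip" proxies."""
    if len(token) <= MAX_TOKEN_LEN:
        return token
    # Keep the first byte and the length rounded down to a multiple of 10,
    # so the key stays short no matter how long the original string was.
    return b"skip:%c %d" % (token[0:1], len(token) // 10 * 10)


def record(db: dict, token: bytes, is_spam: bool) -> None:
    """Bump one of the two small integers stored for this token."""
    key = proxy_for(token)
    spam, ham = db.get(key, (0, 0))  # assumed meaning: (spam count, ham count)
    db[key] = (spam + 1, ham) if is_spam else (spam, ham + 1)


db = {}
record(db, b"viagra", True)
record(db, b"viagra", True)
record(db, b"python", False)
record(db, b"x" * 45, True)  # too long, so it is stored under a proxy key
```

Every key is a short byte string and every value is a pair of small integers,
which is about as simple a workload as a key/value store could be asked to
handle.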

It would be nice to have other mappings too, like forward and inverse msgid
<-> bag_of_tokens maps.  A little-integer timestamp may get added to the
2-tuples.
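
One possible reading of those extra mappings, again with plain dicts standing
in for real storage.  The names remember(), forget() and touch() are made up
for this sketch, the "bag" is simplified to a frozenset, the inverse map is
taken to mean token -> msgids, and the timestamp is assumed to be a small
day number.

```python
from collections import defaultdict

forward = {}                # msgid -> frozenset of tokens seen in that message
inverse = defaultdict(set)  # token -> set of msgids containing that token


def remember(msgid: str, tokens) -> None:
    """Record a message's tokens in both directions."""
    bag = frozenset(tokens)
    forward[msgid] = bag
    for tok in bag:
        inverse[tok].add(msgid)


def forget(msgid: str) -> None:
    """Drop a message and clean up inverse entries that become empty."""
    for tok in forward.pop(msgid, ()):
        inverse[tok].discard(msgid)
        if not inverse[tok]:
            del inverse[tok]


def touch(stats: dict, token: bytes, day: int) -> None:
    """Widen a (spam, ham) entry to (spam, ham, day); day is an assumed unit."""
    spam, ham = stats.get(token, (0, 0, 0))[:2]
    stats[token] = (spam, ham, day)


remember("<1@example.invalid>", [b"free", b"money"])
remember("<2@example.invalid>", [b"free", b"lunch"])
forget("<1@example.invalid>")

stats = {b"free": (3, 1)}
touch(stats, b"free", 12404)
```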



