[spambayes-dev] Deprecated options

Tim Peters tim.peters at gmail.com
Mon Aug 9 03:29:31 CEST 2004


[Ryan Malayter]
...
> I remember looking then, and I am still unable to find those patches (in
> CVS) or the statistical results. Only anecdotal references to "hashing
> performing poorly" seem to appear throughout a bunch of threads. My
> google search was "CRM-114 site:python.org", there were 93 results that
> I looked at, nothing pointing to the original tests of these ideas.

There were many threads that tried hashing for one reason or another. 
Sorry, I can't make time to search for them.  One experiment clearly
related to CRM-114, with patch, is here:

    http://mail.python.org/pipermail/spambayes/2002-November/001504.html

For whatever reason, pipermail gave the attachment an .exe extension. 
Rename it to .txt (or whatever works for you for a patch file).

> I guess the failure of the whole hashing issue was never really settled
> in my mind, since it seems to work so well for CRM-114. But SB has been
> working "good enough" for me for over a year now, so I never pursued
> things further.

The CRM experiment had much more to do with generating huge piles of
highly correlated features than with hashing.  CRM-114 does everything
differently, from tokenization through combining rule.  The experiment
only changed one thing in SB, and that experiment was such a disaster
that there was no incentive to try to figure out whether changing N
other things too might have helped.
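
To make the "huge piles of highly correlated features" point concrete,
here is a rough Python sketch of CRM-114-style windowed phrase features.
It is not the patch from the message linked above: the function names,
the 5-token window, the "<skip>" placeholder, and the use of CRC-32 as
the hash are all assumptions made for illustration. The hash the
experiment actually used is described in that message.

```python
import zlib
from itertools import product


def windowed_features(tokens, window=5):
    """Yield phrase features for each token.

    For every token, each of the (up to) window-1 preceding tokens is
    either kept or replaced by a placeholder, and the token itself is
    always kept. A full window therefore yields 2**(window-1) features
    that all end in the same token, so they are strongly correlated.
    """
    for i, tok in enumerate(tokens):
        start = max(0, i - (window - 1))
        prior = tokens[start:i]
        for mask in product((False, True), repeat=len(prior)):
            parts = [w if keep else "<skip>" for w, keep in zip(prior, mask)]
            parts.append(tok)
            yield " ".join(parts)


def hashed_key(feature):
    """Return a hex string usable as a key in a string-keyed database.

    CRC-32 is only a stand-in here (an assumption), not the hash used
    by CRM-114 or by the experiment.
    """
    return format(zlib.crc32(feature.encode("utf-8")) & 0xFFFFFFFF, "08x")


tokens = "click here to claim your free prize".split()
features = list(windowed_features(tokens))
keys = [hashed_key(f) for f in features]
print(len(tokens), "tokens ->", len(features), "features")
```

With this sketch, seven tokens turn into 63 features, and each token
past the fourth contributes 16 features that overlap heavily with one
another. A combining rule that treats features as independent evidence
ends up counting much the same clue many times over.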

> ...
> Did the test just store the hash value as hex/base64/whatever in the
> regular SpamBayes DB format?

Yes.

> What hash was used? The same "fast hash" used in CRM114?

Answered in the msg linked to above.

