[spambayes-dev] Using crm114-style hash files
T. Alexander Popiel
popiel at wolfskeep.com
Tue Jul 15 13:04:09 EDT 2003
In message: <3F1446CB.AC7A4070 at dobesland.com>
Dobes Vandermeer <dobes at dobesland.com> writes:
>
>Hearing all this talk of 27MB databases makes me want to suggest (sorry,
>I'm not a python hacker - yet - or I'd submit a file instead) trying
>crm114-style hash files.
Some time ago, we investigated using crm114-like hashes.
The drop in accuracy was very distinct; several percentage
points, as I recall, and the errors that it made were
bizarre (since the collisions don't conform to anything like
common sense). Because of this, and because it made the
debug header output fairly unenlightening (since you could
never be sure if the 'whodunnit' token had a .9 rating
because you got a lot of detective novel spam or because it
collided with 'NAKED!!!'), we decided to abandon that line
of exploration.
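
To make the collision problem concrete, here is a minimal sketch in
Python. It is not the code we tested and not crm114's actual hashing
scheme: the names (bucket_of, HashedCounts, find_collision), the use of
zlib.crc32, the tiny bucket count, and the simple spam/(spam+ham) ratio
are all assumptions chosen purely for illustration. The point is only
that once counts are stored per bucket rather than per token, a token
inherits the statistics of whatever else hashed to the same bucket.

```python
import zlib

NUM_BUCKETS = 8  # deliberately tiny (hypothetical) so collisions are easy to find


def bucket_of(token, num_buckets=NUM_BUCKETS):
    """Map a token to a bucket index; crc32 is a stand-in hash, not crm114's."""
    return zlib.crc32(token.encode("utf-8")) % num_buckets


class HashedCounts:
    """Per-bucket spam/ham counts; the token strings themselves are not stored."""

    def __init__(self, num_buckets=NUM_BUCKETS):
        self.num_buckets = num_buckets
        self.spam = [0] * num_buckets
        self.ham = [0] * num_buckets

    def train(self, tokens, is_spam):
        counts = self.spam if is_spam else self.ham
        for tok in tokens:
            counts[bucket_of(tok, self.num_buckets)] += 1

    def spam_ratio(self, token):
        """Naive ratio for illustration only; not the spambayes scoring formula."""
        b = bucket_of(token, self.num_buckets)
        total = self.spam[b] + self.ham[b]
        return self.spam[b] / total if total else 0.5


def find_collision(token, candidates, num_buckets=NUM_BUCKETS):
    """Return the first candidate that lands in the same bucket as token."""
    target = bucket_of(token, num_buckets)
    for cand in candidates:
        if cand != token and bucket_of(cand, num_buckets) == target:
            return cand
    return None


if __name__ == "__main__":
    db = HashedCounts()
    db.train(["NAKED!!!"] * 10, is_spam=True)
    # Search made-up tokens for one that shares a bucket with 'NAKED!!!'.
    other = find_collision("NAKED!!!", ("word%d" % i for i in range(10000)))
    # 'other' was never trained on, yet it reports the spam ratio of 'NAKED!!!'.
    print(other, db.spam_ratio(other))
```

With a realistic table size collisions are much rarer than in this
toy, but when one does happen the table alone cannot tell you which
token was responsible for the score, which is exactly what made the
debug headers hard to interpret.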
Details of the discussion should be in the list archives,
if you're curious.
- Alex