
Tim Peters <tim.one@comcast.net>:
> [Eric S. Raymond]
> > It's a freaking *ideal* use for Judy arrays. Platonically perfect. They couldn't fit better if they'd been designed for this application. Bogofilter was actually born in the moment that I realized this.
>
> I believe that so long as it stays in memory.

VM, dude, VM is your friend. I thought this through carefully. The size of bogofilter's working set isn't limited by core. And because it's a B-tree variant, the access frequency will be proportional to log2 of the wordlist size and the patterns will be spatially bursty. This is a memory access pattern that plays nice with an LRU pager.
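The access-pattern argument can be illustrated with a toy model. The sketch below is not a Judy array, whose internals differ. It substitutes a binary search over a sorted word list for the tree and routes every probe through a simple LRU page cache. The page size, cache capacity, query sample, and the names `LRUPager` and `lookup` are all hypothetical choices for illustration. Each lookup makes at most `n.bit_length()` (that is, floor(log2 n) + 1) probes. Page faults stay below the probe count for two reasons: the pages holding the top of the search path are shared by every lookup and stay resident, and the final narrowing probes tend to land on the same page.

```python
import random
from collections import OrderedDict


class LRUPager:
    """Toy model of an LRU pager: tracks which fixed-size pages are resident."""

    def __init__(self, capacity):
        self.capacity = capacity       # number of pages that fit in "core"
        self.resident = OrderedDict()  # page number -> None, least recent first
        self.faults = 0

    def touch(self, page):
        if page in self.resident:
            self.resident.move_to_end(page)        # hit: mark most recently used
        else:
            self.faults += 1                       # miss: page it in
            self.resident[page] = None
            if len(self.resident) > self.capacity:
                self.resident.popitem(last=False)  # evict least recently used


def lookup(words, target, pager, page_size):
    """Binary search over a sorted list, reporting each probed slot's page.

    Returns (found, number_of_probes)."""
    lo, hi = 0, len(words)
    probes = 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        pager.touch(mid // page_size)
        if words[mid] < target:
            lo = mid + 1
        elif words[mid] > target:
            hi = mid
        else:
            return True, probes
    return False, probes


# Hypothetical parameters: 100,000 tokens, 64 slots per page, 64 resident pages.
words = [f"tok{i:06d}" for i in range(100_000)]  # zero-padded, so already sorted
pager = LRUPager(capacity=64)
queries = random.Random(0).sample(words, 1_000)

results = [lookup(words, q, pager, page_size=64) for q in queries]
total_probes = sum(p for _, p in results)
max_probes = max(p for _, p in results)

print(f"max probes per lookup: {max_probes} (bound {len(words).bit_length()})")
print(f"probes: {total_probes}, page faults: {pager.faults}")
```

Because a real LRU pager evicts at page granularity, the relevant cost is faults, not probes. Shrinking `capacity` or randomizing which words are queried shows how far the hot top-of-tree pages carry the load.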

> But, as you mention in your manpage:
>
>     "startup is too slow for sites handling thousands of mails an hour"
>
> That likely makes a Zope OOBTree stored under ZODB a better choice still, as that's designed for efficient update and access in a persistent database.

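The persistent-store idea can be sketched without ZODB. The example below is not OOBTree or ZODB code. It swaps in the standard library's `sqlite3` module, whose tables and `PRIMARY KEY` indexes are stored on disk as B-trees. As a result, each message updates only its own tokens, and there is no up-front load of the whole wordlist at startup. The table layout and the helper names `open_wordlist`, `register`, and `counts` are hypothetical.

```python
import sqlite3


def open_wordlist(path=":memory:"):
    """Open (or create) a token-count table; pass a file path to persist it."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tokens ("
        " word TEXT PRIMARY KEY,"
        " spam INTEGER NOT NULL DEFAULT 0,"
        " good INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def register(conn, words, is_spam):
    """Bump the counts for one message's distinct tokens in a single transaction."""
    column = "spam" if is_spam else "good"  # fixed choice, safe to interpolate
    with conn:  # commits on success, rolls back on exception
        for word in set(words):
            conn.execute("INSERT OR IGNORE INTO tokens (word) VALUES (?)", (word,))
            conn.execute(
                f"UPDATE tokens SET {column} = {column} + 1 WHERE word = ?", (word,)
            )


def counts(conn, word):
    """Return (spam, good) counts for a token; (0, 0) if it was never seen."""
    row = conn.execute(
        "SELECT spam, good FROM tokens WHERE word = ?", (word,)
    ).fetchone()
    return row if row is not None else (0, 0)


conn = open_wordlist()
register(conn, "buy cheap pills now".split(), is_spam=True)
register(conn, "meeting notes for now".split(), is_spam=False)
print(counts(conn, "now"))     # (1, 1)
print(counts(conn, "pills"))   # (1, 0)
print(counts(conn, "python"))  # (0, 0)
```

Passing a filename such as `"wordlist.db"` to `open_wordlist` keeps the counts across runs. Each `register` call touches only the index pages for that message's tokens.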
I'm working on a simpler solution, one which might have a Pythonic spinoff. Stay tuned.
--
Eric S. Raymond <http://www.tuxedo.org/~esr/>