[Python-Dev] Re: [Python-checkins] python/nondist/sandbox/spambayes GBayes.py,1.7,1.8
Eric S. Raymond
esr@thyrsus.com
Tue, 20 Aug 2002 21:14:29 -0400

Tim Peters <tim.one@comcast.net>:
> [Eric S. Raymond]
> > It's a freaking *ideal* use for Judy arrays. Platonically perfect. They
> > couldn't fit better if they'd been designed for this application.
> > Bogofilter was actually born in the moment that I realized this.
>
> I believe that so long as it stays in memory.

VM, dude, VM is your friend. I thought this through carefully. The
size of bogofilter's working set isn't limited by core. And because
it's a B-tree variant, the access frequency will be proportional to
log2 of the wordlist size and the patterns will be spatially bursty.
This is a memory access pattern that plays nice with an LRU pager.
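
To make the log2 claim concrete, here is a rough sketch that counts the probes
needed to look up a word. It assumes a plain sorted Python list searched by
binary search as a stand-in for the tree; it does not model Judy arrays or
bogofilter's actual storage, and the names probes_for_lookup and
worst_case_probes are made up for illustration.

```python
def probes_for_lookup(sorted_words, target):
    """Binary-search sorted_words for target.

    Returns (found, probes), where probes is the number of list
    elements that were examined during the search.
    """
    lo, hi = 0, len(sorted_words)
    probes = 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_words[mid] == target:
            return True, probes
        if sorted_words[mid] < target:
            lo = mid + 1   # target, if present, is in the upper half
        else:
            hi = mid       # target, if present, is in the lower half
    return False, probes


def worst_case_probes(n):
    """Upper bound on probes for a list of n items: floor(log2(n)) + 1."""
    return n.bit_length()  # equals floor(log2(n)) + 1 for n > 0, and 0 for n == 0


if __name__ == "__main__":
    # Hypothetical wordlists; zero-padding keeps string order equal to numeric order.
    for n in (1000, 100000):
        words = ["w%07d" % i for i in range(n)]
        worst = max(probes_for_lookup(words, w)[1] for w in words)
        print(n, worst, worst_case_probes(n))
```

Growing the wordlist a hundredfold adds only a handful of probes per lookup,
which is the sense in which access cost tracks log2 of the wordlist size.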
> But, as you mention in your manpage
>
> startup is too slow for sites handling thousands of mails an hour
>
> That likely makes a Zope OOBTree stored under ZODB a better choice still, as
> that's designed for efficient update and access in a persistent database

I'm working on a simpler solution, one which might have a Pythonic spinoff.
Stay tuned.
--
<a href="http://www.tuxedo.org/~esr/">Eric S. Raymond</a>