[Python-Dev] Re: [Python-checkins] python/nondist/sandbox/spambayes GBayes.py,1.7,1.8

Eric S. Raymond esr@thyrsus.com
Tue, 20 Aug 2002 21:14:29 -0400

Tim Peters <tim.one@comcast.net>:
> [Eric S. Raymond]
> > It's a freaking *ideal* use for Judy arrays.  Platonically perfect.  They
> > couldn't fit better if they'd been designed for this application.
> > Bogofilter was actually born in the moment that I realized this.
> I believe that so long as it stays in memory. 

VM, dude, VM is your friend.  I thought this through carefully.  The
size of bogofilter's working set isn't limited by core.  And because
it's a B-tree variant, the access frequency will be proportional to
log2 of the wordlist size and the patterns will be spatially bursty.
This is a memory access pattern that plays nice with an LRU pager.
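
To put rough numbers on the log2 claim, here is a minimal sketch that counts
probes per lookup. It uses a plain binary search over a sorted Python list as
a stand-in; it is not bogofilter's code and does not model a Judy array's
actual layout. The names `lookup` and `words` are made up for illustration.

```python
import math


def lookup(words, target):
    """Binary-search a sorted list; return (found, probes).

    `probes` counts how many list elements were compared inside the
    search loop, as a crude proxy for memory touches per lookup.
    """
    lo, hi = 0, len(words)
    probes = 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if words[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(words) and words[lo] == target
    return found, probes


if __name__ == "__main__":
    # Synthetic, already-sorted wordlists of increasing size (hypothetical data).
    for bits in (10, 14, 18):
        n = 1 << bits
        words = ["w%07d" % i for i in range(n)]
        found, probes = lookup(words, words[n // 3])
        print(n, found, probes, "bound:", int(math.log2(n)) + 1)
```

Under these assumptions the probe count never exceeds floor(log2(n)) + 1, so
growing the wordlist 256-fold adds only about eight probes per lookup.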

> But, as you mention in your manpage
>     startup is too slow for sites handling thousands of mails an hour
> That likely makes a Zope OOBTree stored under ZODB a better choice still, as
> that's designed for efficient update and access in a persistent database

I'm working on a simpler solution, one which might have a Pythonic spinoff.
Stay tuned.
Eric S. Raymond <http://www.tuxedo.org/~esr/>