[Python-Dev] Re: [Python-checkins] python/nondist/sandbox/spambayes GBayes.py,1.7,1.8

Tim Peters tim.one@comcast.net
Tue, 20 Aug 2002 20:09:57 -0400


[Eric S. Raymond]
> It's a freaking *ideal* use for Judy arrays.  Platonically perfect.  They
> couldn't fit better if they'd been designed for this application.
> Bogofilter was actually born in the moment that I realized this.

I believe that, so long as it stays in memory.  But, as you mention in your
manpage:

    startup is too slow for sites handling thousands of mails an hour

That likely makes a Zope OOBTree stored under ZODB a better choice still, as
that's designed for efficient update and access in a persistent database
(the version of this we've got now does update during scoring, to keep track
of when tokens were last used, and how often they've proved useful in
discriminating -- there needs to be a way to expire tokens over time, else
the database will grow without bound).
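
A minimal sketch of the bookkeeping described above, assuming a plain dict as
a stand-in for the persistent OOBTree.  TokenRecord, TokenStore, and every
method name here are hypothetical illustrations, not the GBayes.py code: each
token carries a last-used timestamp and a count of how often it helped, and
expire() drops tokens that have gone unused for too long.

```python
from dataclasses import dataclass
import time


@dataclass
class TokenRecord:
    # Hypothetical per-token record; field names are illustrative only.
    spam_count: int = 0
    ham_count: int = 0
    last_used: float = 0.0   # timestamp of the most recent training or scoring use
    useful: int = 0          # times the token helped discriminate


class TokenStore:
    """In-memory stand-in for a persistent token database (assumption)."""

    def __init__(self):
        self.records = {}  # token -> TokenRecord

    def train(self, token, is_spam, now=None):
        now = time.time() if now is None else now
        rec = self.records.setdefault(token, TokenRecord())
        if is_spam:
            rec.spam_count += 1
        else:
            rec.ham_count += 1
        rec.last_used = now

    def touch(self, token, was_useful, now=None):
        # Called during scoring: this is the "update during scoring" step.
        rec = self.records.get(token)
        if rec is None:
            return
        rec.last_used = time.time() if now is None else now
        if was_useful:
            rec.useful += 1

    def expire(self, max_age, now=None):
        # Remove tokens not used within max_age seconds; return their names.
        now = time.time() if now is None else now
        stale = sorted(t for t, r in self.records.items()
                       if now - r.last_used > max_age)
        for t in stale:
            del self.records[t]
        return stale
```

Running expire() periodically is what keeps the table bounded.  With a real
persistent mapping, the same updates would also have to be committed.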

I've corresponded with Douglas Baskins about "this kind of thing", and he's
keen to address it (along with every other problem in the world <0.9 wink>);
it would help if HP weren't laying off the people who have worked on this
code.