[Spambayes] don't update if you don't want to retrain

Richie Hindle richie@entrian.com
Sun Dec 1 20:20:10 2002


[Neale]
> I've just checked in a new anydbm that has a more appropriate list of
> database back-ends to try on the Windows platform.  [...]
> This should eliminate any dbm concerns for Windows folk.

You left dbhash in the list - that's just another interface to the broken
bsddb.  And if that gets removed, Windows users will be left with dumbdbm -
the name doesn't inspire confidence, and the docstring says "XXX TO DO: -
seems to contain a bug when updating..."

As far as I can see there's a complete solution available to these DBM
problems.  Perhaps I've missed something, but I've been back over all the
discussions and I can't see anything wrong with it:

 o We demand bsddb 3 or better on platforms where bsddb is the dbm
   implementation that gets picked up.  So until Python 2.3 is released,
   Windows users need to install pybsddb.  I've just done this and it's
   trivial.  (We already demand a new "email" library and no-one's
   complained.)  Would this cause problems on any other platforms?

 o If training goes slowly, we implement Tim Peters' idea: "Bulk training
   could be taught to use a new classifier based on an in-memory dict.
   When that's done, the in-memory dict's ham and spam counts would be
   added into the persistent DB (rewriting only those WordInfo records
   corresponding to words that appeared in the bulk training data), and
   then the in-memory dict could be thrown away."

 o Or (Neale) you were talking about writing a caching front-end for the
   DBM (regardless of which actual DBM was behind it) - that would work
   as well.
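
The bulk-training idea in the second bullet can be sketched roughly as
follows.  This is only an illustration, not the Spambayes classifier code:
bulk_train and merge_into_db are hypothetical names, the persistent DB is
assumed to be any dict-like object mapping a word to a (hamcount, spamcount)
pair, and each word is assumed to count once per message.

```python
from collections import defaultdict


def bulk_train(messages):
    """Count words in an in-memory dict.

    `messages` yields (tokens, is_spam) pairs.  Hypothetical helper,
    not part of Spambayes.
    """
    counts = defaultdict(lambda: [0, 0])  # word -> [hamcount, spamcount]
    for tokens, is_spam in messages:
        # Assumption: a word counts once per message, however often it occurs.
        for word in set(tokens):
            counts[word][1 if is_spam else 0] += 1
    return counts


def merge_into_db(db, counts):
    """Add the in-memory counts into the persistent mapping `db`.

    Only records for words seen during bulk training are rewritten.
    """
    for word, (ham, spam) in counts.items():
        old_ham, old_spam = db.get(word, (0, 0))
        db[word] = (old_ham + ham, old_spam + spam)
```

After the merge the in-memory dict can simply be dropped.  The persistent
DB is written once per distinct word rather than once per word occurrence,
which is where the speed-up would come from.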

Wouldn't that solve *everything*?  Startup times would be quick, training
would be quick, no buggy DBM implementations would be used, and different
components wouldn't default to different storage formats (hammie vs.
pop3proxy).  Installing pybsddb on Windows is trivial, and once Python 2.3
comes out you won't even need to do that.
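
For the caching front-end option, a minimal sketch might look like the
class below.  CachingDB is a hypothetical name and this is not Neale's
design; it assumes only that the backend supports dict-style item get and
set, as the DBM modules do.

```python
class CachingDB:
    """Hypothetical write-back cache in front of any dict-like DBM object."""

    def __init__(self, backend):
        self.backend = backend
        self.cache = {}     # records read or written so far
        self.dirty = set()  # keys changed since the last flush

    def __getitem__(self, key):
        if key not in self.cache:
            # A missing key raises KeyError from the backend, as usual.
            self.cache[key] = self.backend[key]
        return self.cache[key]

    def __setitem__(self, key, value):
        # Writes stay in memory until flush() is called.
        self.cache[key] = value
        self.dirty.add(key)

    def flush(self):
        """Write every changed record to the backend, once each."""
        for key in self.dirty:
            self.backend[key] = self.cache[key]
        self.dirty.clear()
```

Training would update the cache and call flush() once at the end, so it
would not matter which DBM sits behind it.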

I've probably missed something - it's hard to keep up!

-- 
Richie Hindle
richie@entrian.com



