[Spambayes] Current version

Paul Moore lists@morpheus.demon.co.uk
Mon Nov 25 22:32:48 2002


Neale Pickett <neale@woozle.org> writes:

> So then, "Moore, Paul" <Paul.Moore@atosorigin.com> is all like:
>
>> From: Neale Pickett [mailto:neale@woozle.org]
>> > I'll check anydbm back in.
>> 
>> Better not.
>> 
>> Dumbdbm doesn't support first() and next() for key iteration. (And
>> dbhash doesn't support iterkeys()).
>
> Ya know, now that I think about it, we don't need key iteration
> anymore.  Since we're now storing only the counters associated with
> a word, there's no reason I can think of that anything would need to
> iterate over the keys.
>
> This is why François could use anydbm without problems--we're not
> using the first() and next() constructs anymore.

Actually, iteritems() is used in update_probabilities(), which is
still called in pop3proxy. I'm not sure why François didn't see the
problem - maybe he hasn't trained any data with the change in place...

[BTW, __iter__ should be implemented as iterkeys, not as iteritems, if
it's to be compatible with "real" dictionaries...]
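
To make the __iter__ point concrete, here is a minimal sketch. WordStore
is a hypothetical stand-in, not the actual Spambayes DBDict class; the
only thing it is meant to show is __iter__ delegating to iterkeys() so
that "for x in store" behaves the way it does for a real dictionary.

```python
class WordStore:
    """Hypothetical dict-like store (an illustration, not Spambayes' DBDict)."""

    def __init__(self):
        self._data = {}

    def __setitem__(self, key, value):
        self._data[key] = value

    def __getitem__(self, key):
        return self._data[key]

    def iterkeys(self):
        # Yield keys only, as dict.iterkeys() does.
        return iter(self._data)

    def iteritems(self):
        # Yield (key, value) pairs, as dict.iteritems() does.
        for key in self._data:
            yield key, self._data[key]

    def __iter__(self):
        # Match real dictionaries: iterating the mapping yields keys.
        return self.iterkeys()


store = WordStore()
store["spam"] = (3, 1)
keys = list(store)
```

With this arrangement, code written against a plain dict ("for word in
d") keeps working if it is handed the store instead.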

Annoyingly, as far as I can see, anydbm doesn't actually offer any
decent guarantees. It says that it will use one of dbhash, gdbm, dbm,
or dumbdbm. And it says "The object returned by open() supports most
of the same functionality as dictionaries; keys and their
corresponding values can be stored, retrieved, and deleted, and the
has_key() and keys() methods are available. Keys and values must
always be strings." Nothing about iteration.

And the individual dbm modules are no help:

dbhash: documents first(), last(), next(), previous() and sync()
dbm: "the items() and values() methods are not supported". keys() is
     slow, and iterkeys() isn't supported.
gdbm: documents firstkey(), nextkey(), reorganize() and sync()
dumbdbm: says nothing (but supports iterkeys, not itervalues or
         iteritems - __iter__ is iterkeys())
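
Given that, about the only portable way I can see to walk a database is
to stick to what the anydbm text does promise: keys() plus item lookup.
A rough sketch follows. iter_items is a hypothetical helper, not
something in Spambayes, and dumbdbm is used only because it is always
available; the try/except import is there so the sketch also runs on
later Pythons, where the module is named dbm.dumb.

```python
import os
import tempfile

try:
    import dumbdbm                 # module name used in this thread
except ImportError:
    import dbm.dumb as dumbdbm     # same module under its later name


def iter_items(db):
    """Yield (key, value) pairs using only keys() and item lookup."""
    for key in db.keys():
        yield key, db[key]


# Build a small throwaway database to iterate over.
path = os.path.join(tempfile.mkdtemp(), "words")
db = dumbdbm.open(path, "c")
db[b"spam"] = b"3"
db[b"ham"] = b"1"

pairs = sorted(iter_items(db))
db.close()
```

The cost is that keys() builds the whole key list up front, which is
exactly the slow path the dbm documentation warns about.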

Also, whichdb on a pybsddb3 hash database reports it as dbhash, so
that reopening an existing db file will (on Windows) use the broken
built-in DB implementation, rather than the pybsddb3 one.
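
For anyone who wants to check what whichdb says about a given file, a
small sketch is below. It uses a dumbdbm file rather than a pybsddb3 one
(so that it runs anywhere), and the try/except import is only there to
cover the later renaming of these modules to dbm.dumb and dbm.whichdb.
The point is that whichdb returns nothing more than a module name, and
whatever reopens the file trusts that name.

```python
import os
import tempfile

try:
    import dumbdbm                  # module names used in this thread
    from whichdb import whichdb
except ImportError:
    import dbm.dumb as dumbdbm      # the same modules under later names
    from dbm import whichdb

# Create a small database, then ask whichdb to identify it.
path = os.path.join(tempfile.mkdtemp(), "words")
db = dumbdbm.open(path, "c")
db[b"spam"] = b"3"
db.close()

kind = whichdb(path)   # a module name as a string
print(kind)
```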

[later]

Doh. I've been spending a lot of time looking at this now, trying out
implementations, and I just read the help for hammiebulk.py - which
points out that for pop3proxy, the pickle store is recommended over
DBM. I was using DBM at the stage when I was using a custom fetcher
which ran the classifier as a filter. I didn't think to switch when I
changed to pop3proxy :-(

So all of this, while of theoretical interest, is not in fact of
practical value to me...

I still feel that anydbm is not well suited to user customisation, and
that DBDict could do with the ability for the user to specify a
particular DBM implementation. But I don't have a real need any more,
so while I'm happy to help, I'm no longer driven by the need :-)
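
Roughly what I have in mind is sketched below. open_store and its
backend argument are hypothetical names of mine, not part of DBDict or
of anydbm; the idea is just that the caller may name the DBM module to
use, and anydbm's own choice is only the fallback.

```python
import importlib

try:
    import anydbm               # module name used in this thread
except ImportError:
    import dbm as anydbm        # the same module under its later name


def open_store(path, backend=None, flag="c"):
    """Open a DBM file, optionally forcing a specific implementation.

    backend is the name of a module that provides open(path, flag),
    for example "dumbdbm". When it is None, anydbm chooses.
    """
    if backend is None:
        return anydbm.open(path, flag)
    module = importlib.import_module(backend)
    return module.open(path, flag)
```

A user stuck with a broken default could then call something like
open_store("hammie.db", backend="dumbdbm") and bypass whichdb's guess
entirely.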

Paul.
-- 
This signature intentionally left blank


