[pypy-dev] gdbm

Dan Stromberg drsalists at gmail.com
Thu Nov 18 01:17:09 CET 2010


On Tue, Nov 16, 2010 at 4:17 AM, Antonio Cuni <anto.cuni at gmail.com> wrote:

> Hi Dan,
> first: thanks for your help :-)
>
>
> On 16/11/10 03:17, Dan Stromberg wrote:
>
>>
>> Yes, the dbm module in pypy is basically like the bsddb module in cpython.
>>
>> cpython includes modules for bsddb, gdbm, and more.
>>
>> I tend to prefer gdbm over bsddb, because I've seen bsddb databases get
>> corrupt too many times (e.g., when a filesystem overflows).  bsddb might be
>> a little faster though; I've never compared their performance.
>>
>
> So, if I understand correctly you are saying that we should rename our
> dbm.py to bsddb.py, and write a new dbm.py which can use either bsddb or gdbm?
> Sounds fine, do you feel like implementing it? :-)
>
> Moreover, I also agree with Amaury that your code is very similar to the
> code in the current dbm.py, so we should maybe try to refactor things to
> share common parts between the two.
>
> ciao,
> Anto
>

Wow, CPython's Berkeley DB interface is actually quite a bit more
comprehensive and complex than I'd realized.  This isn't just a matter of
renaming dbm.py to bsddb.py and refactoring a bit.  It's more of a time
commitment to something I don't use than I'd thought.

Although pypy's current dbm.py superficially resembles cpython's Berkeley DB
interface, the resemblance is shallow.  It uses a subset of the same on-disk
formats, but the API appears to be quite different.  This is based on
experimenting a bit with the unit tests and the bsddb module.

I actually suggest:
1) Using svn rename to move dbm.py into some unused directory, for history's
sake.  It implements the ndbm _interface_ well (somewhat oddly called "dbm" in
cpython, though I believe the original "dbm" allowed only one database per
program), but it's not really all that similar to bsddb.
2) Adding the gdbm.py module I wrote, more or less verbatim.
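For readers who haven't used these modules: the ndbm-style interface that gdbm, ndbm and dumbdbm share is a small, dict-like mapping of byte strings, which is why a gdbm.py can sit alongside them. A minimal sketch, assuming Python 3, where these modules became `dbm.gnu`, `dbm.ndbm` and `dbm.dumb`; it uses `dbm.dumb` only because that backend is always available, and the helper `roundtrip` is hypothetical.

```python
import dbm.dumb
import os
import tempfile


def roundtrip(path):
    """Write two entries, reopen the database and return its contents.

    Hypothetical helper. dbm.gnu or dbm.ndbm could be swapped in here,
    since they share the same open()/mapping interface.
    """
    # "c": open read/write, creating the database files if missing.
    with dbm.dumb.open(path, "c") as db:
        db[b"spam"] = b"eggs"  # keys and values are stored as bytes
        db["ham"] = "toast"    # str is accepted and encoded as UTF-8
    # "r": reopen read-only and read the entries back from disk.
    with dbm.dumb.open(path, "r") as db:
        return {key: db[key] for key in sorted(db.keys())}


with tempfile.TemporaryDirectory() as tmp:
    print(roundtrip(os.path.join(tmp, "example")))
```

Note that both keys come back as bytes, even the one written as a str; the same holds for the other backends in the family.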

I got into the project of merging these two things thinking that bsddb was
mostly just like the gdbm module, but bsddb is actually quite a bit more
involved, and is something I've pretty much stopped using due to bad
experiences with bsddb database corruption, from both cpython and C.

Given #1 and #2 above, anydbm should continue working, due to the presence
of gdbm and dumbdbm.
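anydbm keeps working because, when creating a new database, it uses the first backend module it managed to import (Python 2's anydbm tries dbhash, gdbm, dbm and dumbdbm, in that order), so removing one backend simply shifts it to the next. A minimal sketch of that fallback idea, assuming Python 3 module names; `pick_backend` and `CANDIDATES` are hypothetical, and the list is not the standard library's exact ordering.

```python
import dbm
import importlib
import os
import tempfile

# Hypothetical preference order, using the Python 3 names of the modules
# discussed above (gdbm -> dbm.gnu, dbm -> dbm.ndbm, dumbdbm -> dbm.dumb).
CANDIDATES = ("dbm.gnu", "dbm.ndbm", "dbm.dumb")


def pick_backend(candidates=CANDIDATES):
    """Return the first backend module that can be imported."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # backend not built into this interpreter; try the next
    raise ImportError("no dbm backend available")


backend = pick_backend()
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example")
    with backend.open(path, "c") as db:
        db[b"key"] = b"value"
    # dbm.whichdb() inspects the files on disk to name the backend used.
    print(backend.__name__, dbm.whichdb(path))
```

Because dumbdbm (`dbm.dumb`) is pure Python, the loop always has a last resort, which is the property the argument above relies on.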

I think that if someone needs bsddb (and its assorted interfaces), they
should probably be the one to work on it.

Sound reasonable?

