[Python-3000] Immutable bytes type and dbm modules

Thu Aug 23 01:59:29 CEST 2007

On Tue, Aug 07, 2007 at 06:53:32AM +0200, "Martin v. Löwis" wrote:
> > I guess we have to rethink our use of these databases somewhat.
> 
> Ok. In the interest of progress, I'll be looking at coming up with
> some fixes for the code base right now; as we agree that the
> underlying semantics is bytes:bytes, any encoding wrappers on
> top of it can be added later.

The underlying Modules/_bsddb.c today uses PyArg_Parse(..., "s#", ...)
which, if I read Python/getargs.c correctly, is very lenient about the
input types it accepts.  It appears to accept anything with a buffer
API, auto-converting unicode to the default encoding as needed.

IMHO that leniency is desirable in many situations, but it is not strict.
bytes:bytes or int:bytes (depending on the database type) is
fundamentally all the C berkeleydb library knows.  Attaching meaning
to the keys and values is up to the user.  In my sandbox I'm about to
try a _bsddb.c that strictly enforces bytes as values for the
underlying bsddb.db API provided by _bsddb, on the assumption that
being strict about bytes is what we want at that level.  I predict lots
of Lib/bsddb/test/ edits.
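
To illustrate what "strict" would mean at that level, here is a minimal
Python sketch.  It is not the _bsddb.c change itself: StrictBytesDB is a
made-up name, the dict is only a stand-in for the on-disk database, and the
real check would live in the C argument parsing.  It is written against
current Python 3, where bytes is a distinct type from str.

```python
class StrictBytesDB:
    """Hypothetical stand-in for a strict bytes:bytes database handle."""

    def __init__(self):
        # An in-memory dict stands in for the on-disk database (assumption).
        self._data = {}

    @staticmethod
    def _check(obj, what):
        # Accept only real bytes objects: no str, no implicit encoding.
        if not isinstance(obj, bytes):
            raise TypeError(
                f"{what} must be bytes, not {type(obj).__name__}")
        return obj

    def __setitem__(self, key, value):
        self._data[self._check(key, "key")] = self._check(value, "value")

    def __getitem__(self, key):
        return self._data[self._check(key, "key")]
```

With this, db[b"k"] = b"v" works, while db["k"] = b"v" raises TypeError
instead of silently encoding the str key.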

> My concern is that people need to access existing databases. It's
> all fine that the code accessing them breaks, and that they have
> to actively port to Py3k. However, telling them that they have to
> represent the keys in their dbm disk files in a different manner
> might cause a revolt...

Agreed.  Hence the importance of allowing bytes:bytes.
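
For what an encoding wrapper layered on top of bytes:bytes could look like,
here is a rough sketch.  EncodedView is a hypothetical name, not an existing
module or class, and the raw store is assumed to be any mutable mapping
whose keys and values are bytes (a plain dict in the usage note).  The
on-disk representation stays whatever the raw store already holds; only the
view encodes and decodes.

```python
from collections.abc import MutableMapping


class EncodedView(MutableMapping):
    """Hypothetical str:str view over a bytes:bytes mapping."""

    def __init__(self, raw, encoding="utf-8"):
        self._raw = raw            # underlying bytes:bytes mapping
        self._encoding = encoding  # how str keys/values map to bytes

    def __getitem__(self, key):
        return self._raw[key.encode(self._encoding)].decode(self._encoding)

    def __setitem__(self, key, value):
        self._raw[key.encode(self._encoding)] = value.encode(self._encoding)

    def __delitem__(self, key):
        del self._raw[key.encode(self._encoding)]

    def __iter__(self):
        # Decode keys lazily as the raw store yields them.
        return (k.decode(self._encoding) for k in self._raw)

    def __len__(self):
        return len(self._raw)
```

Usage would be along the lines of view = EncodedView(raw_db, "latin-1");
the existing keys in raw_db are left untouched, so old files remain
readable with whatever encoding they were written in.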