[Python-3000] Immutable bytes type and dbm modules

Guido van Rossum guido at python.org
Tue Aug 7 05:56:35 CEST 2007


On 8/6/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> > Or perhaps a special value of the encoding argument passed to
> > *dbm.open() (maybe None, maybe the default, maybe "raw" or "bytes"?)
> > to specify that the key values are to be bytes?
>
> This is essentially the state of the bsddb module in the struni branch
> right now. The default is bytes keys and values; if you want string
> keys, you write
>
>    db = bsddb.open(...)
>    db = bsddb.StringKeys(db)
>
> which arranges for transparent UTF-8 encoding;

Ah. I hadn't realized that this was the API. It sounds like as good a
solution as mine.

> it would be possible to extend this to
>
>    db = bsddb.open(...)
>    db = bsddb.StringKeys(db, encoding="latin-1")

This would be even better.

> However, this has the view that there is a single "proper" key
> representation, which is bytes, and then reinterpretations.
>
> Now if you say that the dbm files are dicts conceptually, and
> bytes are not allowed as dict keys, then any API that allows
> for bytes as dbm keys (whether by default or as an option) is
> conceptually inconsistent - as you now do have dict-like objects
> which use bytes keys. This causes confusion if you pass one of
> them to, say, .update of a "real" dict, which then fails. IOW,
> I couldn't do
>
>    d = {}
>    d.update(db)
>
> if db is in the "keys are bytes" mode.

I guess we have to rethink our use of these databases somewhat. I
think I'm fine with the model that the basic dbm implementations map
bytes to bytes, and aren't particularly compatible with dicts. (They
aren't, really, anyway -- the key and value types are typically
restricted, and the reference semantics are different.)

But, just like for regular files we have TextIOWrapper, which wraps a
binary file with a layer for encoded text I/O, I think it would be
very useful to have a layer around the *dbm modules for making them
handle text.
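
A minimal sketch of what such a layer might look like, assuming only
that the underlying object maps bytes to bytes and offers keys(). The
name TextMapping and its parameters are hypothetical, not an existing
API, and a plain dict stands in for a real *dbm object here:

```python
from collections.abc import MutableMapping


class TextMapping(MutableMapping):
    """Hypothetical text layer over a bytes-to-bytes mapping."""

    def __init__(self, raw, encoding="utf-8", errors="strict"):
        self._raw = raw            # e.g. an open *dbm object
        self._encoding = encoding
        self._errors = errors

    def _enc(self, text):
        # Only str is accepted at this layer; bytes belong to the raw layer.
        if not isinstance(text, str):
            raise TypeError("keys and values must be str")
        return text.encode(self._encoding, self._errors)

    def _dec(self, data):
        return bytes(data).decode(self._encoding, self._errors)

    def __getitem__(self, key):
        return self._dec(self._raw[self._enc(key)])

    def __setitem__(self, key, value):
        self._raw[self._enc(key)] = self._enc(value)

    def __delitem__(self, key):
        del self._raw[self._enc(key)]

    def __iter__(self):
        # Use keys() because not every dbm object supports direct iteration.
        return (self._dec(k) for k in self._raw.keys())

    def __len__(self):
        return len(self._raw)


raw = {}                                   # stand-in for a bytes-to-bytes db
db = TextMapping(raw, encoding="latin-1")
db["café"] = "crème"

d = {}
d.update(db)                               # works: the wrapper yields str keys
```

With the wrapper in place, the raw store still holds only encoded
bytes, while the text view behaves enough like a dict to be passed to
dict.update().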

Perhaps the StringKeys and/or StringValues wrappers can be
generalized? Or perhaps we could borrow from io.open(), and use a
combination of the mode and the encoding to determine how to stack
wrappers.
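
One way the io.open()-style idea could be sketched, purely as an
illustration: open_db, its mode letters "b" and "t", and the _Text
helper are all made up for this example, and dbm.dumb is used only
because it is always available.

```python
import dbm.dumb
import os
import tempfile
from collections.abc import MutableMapping


class _Text(MutableMapping):
    """Hypothetical str layer stacked on a bytes-to-bytes db."""

    def __init__(self, raw, encoding):
        self.raw, self.encoding = raw, encoding

    def __getitem__(self, key):
        return self.raw[key.encode(self.encoding)].decode(self.encoding)

    def __setitem__(self, key, value):
        self.raw[key.encode(self.encoding)] = value.encode(self.encoding)

    def __delitem__(self, key):
        del self.raw[key.encode(self.encoding)]

    def __iter__(self):
        return (k.decode(self.encoding) for k in self.raw.keys())

    def __len__(self):
        return len(self.raw)

    def close(self):
        self.raw.close()


def open_db(path, flag="c", mode="b", encoding=None):
    """Hypothetical opener: "b" returns the raw db, "t" adds a text layer."""
    if mode not in ("b", "t"):
        raise ValueError("mode must be 'b' or 't'")
    if mode == "b" and encoding is not None:
        # Same rule io.open() applies to binary files.
        raise ValueError("binary mode doesn't take an encoding argument")
    raw = dbm.dumb.open(path, flag)
    if mode == "b":
        return raw
    return _Text(raw, encoding or "utf-8")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo")

    db = open_db(path, mode="t", encoding="latin-1")
    db["é"] = "ü"
    db.close()

    # Reopening in binary mode shows what was actually stored.
    raw = open_db(path, mode="b")
    raw_items = {k: raw[k] for k in raw.keys()}
    raw.close()
```

The choice of wrapper is made once, at open time, so the caller never
has to stack the layers by hand.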

Another approach might be to generalize shelve. It already supports
pickling values. There could be a few variants for dealing with keys
that are either strings or arbitrary immutables; the keys used for the
underlying *dbm file would then be either an encoding (if the keys are
limited to strings) or a pickle (if they aren't). (The latter would
require some kind of canonical pickling version, so may not be
practical; there also may not be enough of a use case to bother.)
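
For illustration, the string-key variant can be tried with
shelve.Shelf as it exists in current Python 3 releases, where it
accepts a keyencoding argument; that is not necessarily the API
anyone had in mind in this thread. A plain dict again stands in for
the *dbm file. The last two lines hint at why pickled keys are
awkward: the same key pickles to different bytes under different
protocol versions.

```python
import pickle
import shelve

raw = {}                                     # stand-in for a bytes-to-bytes db
shelf = shelve.Shelf(raw, keyencoding="latin-1")
shelf["café"] = {"price": 3.5, "tags": ["hot"]}

# The underlying store sees an encoded key and a pickled value.
stored_key = next(iter(raw))
stored_value = pickle.loads(raw[stored_key])

# Pickled keys would need one fixed protocol to be usable for lookups.
key_v0 = pickle.dumps("café", protocol=0)
key_v2 = pickle.dumps("café", protocol=2)
same_bytes = key_v0 == key_v2
```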

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

