[Python-3000] Immutable bytes type and dbm modules
Mike Klaas
mike.klaas at gmail.com
Tue Aug 7 03:57:07 CEST 2007
On 6-Aug-07, at 5:39 PM, Guido van Rossum wrote:
>
> I thought about this issue some more.
>
> Given that the *dbm types strive for emulating dicts, I think it makes
> sense to use strings for the keys, and bytes for the values; this
> makes them more plug-compatible with real dicts. (We should ideally
> also change the keys() method etc. to return views.) This of course
> requires that we know the encoding used for the keys. Perhaps it would
> be acceptable to pick a conservative default encoding (e.g. ASCII) and
> add an encoding argument to the open() method.
>
> Perhaps this will work? It seems better than using str8 or bytes
> for the keys.
There are some scenarios that might be difficult under such a regime.
The Berkeley DB API provides a means of efficiently mapping a bytestring
to another bytestring. Often, the data is not text, and the
performance of the database is sensitive to the means of serialization.
For instance, it is quite common to use integers as keys. If you are
inserting keys in order, it is about a hundred times faster to encode
the ints in big-endian byte order than in little-endian:
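    import struct

    class MyIntDB(object):
        # self.db, serializer and unserializer are assumed to be defined elsewhere.
        def __setitem__(self, key, item):
            # '>Q' packs the key as a big-endian unsigned 64-bit integer.
            self.db.put(struct.pack('>Q', key), serializer(item))

        def __getitem__(self, key):
            return unserializer(self.db.get(struct.pack('>Q', key)))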
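
To illustrate why the byte order matters, here is a minimal sketch. It assumes the database compares keys as raw bytes, lexicographically, which is my reading of the situation rather than something stated above. Under that assumption, big-endian keys sort in the same order as the integers they encode, so in-order inserts always land at the end of the key space, while little-endian keys scatter. The helper names pack_be and pack_le are made up for the example.

```python
import struct

def pack_be(n):
    # Big-endian unsigned 64-bit: most significant byte comes first.
    return struct.pack('>Q', n)

def pack_le(n):
    # Little-endian unsigned 64-bit: least significant byte comes first.
    return struct.pack('<Q', n)

# Integers already in increasing order, as in an in-order bulk insert.
keys = [1, 255, 256, 65536, 2**40]

# Sort the integers by the byte strings a bytewise-ordered store would see.
be_sorted = sorted(keys, key=pack_be)
le_sorted = sorted(keys, key=pack_le)

print(be_sorted == keys)  # big-endian byte order matches numeric order
print(le_sorted == keys)  # little-endian byte order does not
```

With big-endian packing the sorted order is unchanged; with little-endian packing the same keys come out reordered.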
How do you envision these types of tasks being accomplished with
unicode keys? It is conceivable that one could write a custom
unicode encoding that accomplishes this, convert the key to unicode,
and pass the custom encoding name to the constructor.
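
As a rough sketch of that workaround, and only as an illustration: instead of writing a custom codec, the example below uses the built-in latin-1 codec as a stand-in, since it maps each byte value 0-255 to the code point with the same number. The function names are hypothetical, and no encoding argument to open() is assumed to exist; the sketch only shows the str/bytes round trip such an argument would have to perform.

```python
import struct

def int_to_str_key(n):
    # Pack as big-endian bytes, then reinterpret each byte as one character.
    return struct.pack('>Q', n).decode('latin-1')

def str_key_to_int(s):
    # Reverse the mapping: characters back to bytes, bytes back to an int.
    return struct.unpack('>Q', s.encode('latin-1'))[0]

ints = [0, 1, 255, 256, 2**63 + 5]
str_keys = [int_to_str_key(n) for n in ints]

# The round trip is lossless, and str ordering (by code point) matches
# the ordering of the underlying bytes.
round_trip_ok = all(str_key_to_int(k) == n for k, n in zip(str_keys, ints))
order_ok = sorted(str_keys) == str_keys
print(round_trip_ok, order_ok)
```

This works, but every key is converted twice on the way to the database, and the resulting strings are not meaningful text.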
regards,
-Mike