[Python-3000] Immutable bytes type and dbm modules

Tue Aug 7 03:57:07 CEST 2007

On 6-Aug-07, at 5:39 PM, Guido van Rossum wrote:

>
> I thought about this issue some more.
>
> Given that the *dbm types strive for emulating dicts, I think it makes
> sense to use strings for the keys, and bytes for the values; this
> makes them more plug-compatible with real dicts. (We should ideally
> also change the keys() method etc. to return views.) This of course
> requires that we know the encoding used for the keys. Perhaps it would
> be acceptable to pick a conservative default encoding (e.g. ASCII) and
> add an encoding argument to the open() method.
>
> Perhaps this will work? It seems better than using str8 or bytes  
> for the keys.
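The quoted proposal (str keys, bytes values, a conservative default key encoding plus an encoding argument) can be sketched roughly as follows. This assumes a hypothetical wrapper class `StrKeyDB` around any mapping of bytes keys to bytes values; a plain dict stands in for the *dbm file, and the `encoding` parameter only mirrors the suggested open() argument. Because it derives from `collections.abc.MutableMapping`, `keys()` already returns a view.

```python
from collections.abc import MutableMapping


class StrKeyDB(MutableMapping):
    """Hypothetical dict-like wrapper: str keys, bytes values.

    `store` is assumed to map bytes keys to bytes values, as a *dbm
    database does; `encoding` mirrors the proposed open() argument.
    """

    def __init__(self, store, encoding="ascii"):
        self._store = store
        self._encoding = encoding

    def __getitem__(self, key):
        return self._store[key.encode(self._encoding)]

    def __setitem__(self, key, value):
        if not isinstance(value, bytes):
            raise TypeError("values must be bytes")
        self._store[key.encode(self._encoding)] = value

    def __delitem__(self, key):
        del self._store[key.encode(self._encoding)]

    def __iter__(self):
        # Decode stored keys back to str so iteration matches dict semantics.
        for raw in self._store:
            yield raw.decode(self._encoding)

    def __len__(self):
        return len(self._store)


db = StrKeyDB({})
db["spam"] = b"eggs"
print(db["spam"])        # b'eggs'
print(list(db.keys()))   # ['spam']
```

With the ASCII default, a key such as "café" raises UnicodeEncodeError; passing encoding="utf-8" accepts it.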

There are some scenarios that might be difficult under such a regime.

The Berkeley DB API provides an efficient means of mapping one bytestring  
to another.  Often the data is not text, and the performance of the  
database is sensitive to how it is serialized.

For instance, it is quite common to use integers as keys.  If you are  
inserting keys in order, it is about a hundred times faster to encode  
the ints in big-endian byte order than in little-endian:

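import struct

class MyIntDB(object):
    # self.db: an open Berkeley DB handle; serializer/unserializer: any value codec
    def __setitem__(self, key, item):
        # '>Q' = unsigned 64-bit big-endian, so keys sort in numeric order
        self.db.put(struct.pack('>Q', key), serializer(item))

    def __getitem__(self, key):
        return unserializer(self.db.get(struct.pack('>Q', key)))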
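The ordering property behind the big-endian recommendation can be checked without a database: under a byte-wise comparison of keys (the default B-tree ordering in Berkeley DB), big-endian encodings of unsigned ints sort in numeric order, while little-endian encodings do not. The sketch below also makes the class above runnable by assuming a hypothetical dict-backed `FakeDB` with `put`/`get` methods and using `pickle` in place of the unspecified `serializer`/`unserializer`.

```python
import pickle
import struct

keys = [1, 255, 256, 65536]

big = [struct.pack('>Q', k) for k in keys]
little = [struct.pack('<Q', k) for k in keys]

# Sorting the raw bytes mimics a byte-wise key comparison.
print(sorted(big) == big)        # True: byte order matches numeric order
print(sorted(little) == little)  # False


class FakeDB(object):
    """Hypothetical stand-in for a Berkeley DB handle with put/get."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class MyIntDB(object):
    def __init__(self, db):
        self.db = db

    def __setitem__(self, key, item):
        # pickle stands in for the serializer in the original message.
        self.db.put(struct.pack('>Q', key), pickle.dumps(item))

    def __getitem__(self, key):
        return pickle.loads(self.db.get(struct.pack('>Q', key)))


d = MyIntDB(FakeDB())
d[42] = {'answer': True}
print(d[42])  # {'answer': True}
```

Because sequential big-endian keys compare in insertion order, in-order inserts always land at the end of the key space.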

How do you envision these kinds of tasks being accomplished with  
unicode keys?  It is conceivable that one could write a custom  
unicode encoding that does this, convert the key to unicode,  
and pass the custom encoding name to the constructor.
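The custom-encoding workaround suggested above might look roughly like this, assuming a hypothetical codec name `be_uint64` registered through the standard `codecs.register` hook. Encoding a decimal string yields the same 8-byte big-endian key as `struct.pack('>Q', ...)`, and decoding turns it back into a decimal string. The integer has to round-trip through text, which is part of the awkwardness being described.

```python
import codecs
import struct


def _encode(text, errors='strict'):
    # Decimal string -> 8-byte big-endian unsigned integer.
    return struct.pack('>Q', int(text)), len(text)


def _decode(data, errors='strict'):
    # 8-byte big-endian unsigned integer -> decimal string.
    (value,) = struct.unpack('>Q', data)
    return str(value), len(data)


def _search(name):
    # Name normalization passed to search functions differs slightly
    # between Python versions, so accept hyphen and underscore spellings.
    if name.replace('-', '_') == 'be_uint64':
        return codecs.CodecInfo(_encode, _decode, name='be_uint64')
    return None


codecs.register(_search)

key = str(1234).encode('be_uint64')
print(key == struct.pack('>Q', 1234))  # True
print(key.decode('be_uint64'))         # 1234
```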

regards,
-Mike