[Python-Dev] unicode hell/mixing str and unicode as dictionary keys

M.-A. Lemburg mal at egenix.com
Thu Aug 3 18:51:21 CEST 2006


Ralf Schmitt wrote:
> Ralf Schmitt wrote:
>> Still trying to port our software. here's another thing I noticed:
>>
>> d = {}
>> d[u'm\xe1s'] = 1
>> d['m\xe1s'] = 1
>> print d
>>
>> With python 2.4 I can add those two keys to the dictionary and get:
>> $ python2.4 t2.py
>> {u'm\xe1s': 1, 'm\xe1s': 1}
>>
>> With python 2.5 I get:
>>
>> $ python2.5 t2.py
>> Traceback (most recent call last):
>>    File "t2.py", line 3, in <module>
>>      d['m\xe1s'] = 1
>> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 1: 
>> ordinal not in range(128)
>>
>> Is this intended behaviour? I guess this might break lots of programs 
>> and the way python 2.4 works looks right to me.
>> I think it should be possible to mix str/unicode keys in dicts and let 
>> non-ascii strings compare not-equal to any unicode string.
> 
> Also this behaviour makes your programs break randomly, that is, it will 
> break when the string you add hashes to the same value that the unicode 
> string has (at least that's what I guess..)

This is because Unicode and 8-bit string keys behave the same
way as dictionary keys if and only if they are plain ASCII.

The reason lies in the hash function used by Unicode: it is
crafted to make hash(u) == hash(s) for all ASCII s, such
that s == u.
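
To illustrate the invariant, here is a minimal sketch meant for
Python 2.x, where 'abc' is an 8-bit string and u'abc' is a Unicode
object. The variable names are only for illustration:

```python
# Intended for Python 2.x: s is an 8-bit str, u is a unicode object.
# (On Python 3 both literals have the same type, so the checks
# below hold trivially.)
s = 'abc'
u = u'abc'

# Equal keys have to hash equally, otherwise a dict lookup with one
# of them could miss an entry stored under the other.
keys_equal = (s == u)
hashes_equal = (hash(s) == hash(u))

# Both keys therefore address the same dictionary entry.
d = {u: 1}
d[s] = 2
entry_count = len(d)
```

With plain ASCII keys the second assignment overwrites the first
entry instead of adding a new one.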

For non-ASCII strings, there are no guarantees as to the
hash value of the strings or whether they match or not.
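
One possible way to sidestep the problem is to decode 8-bit keys
to Unicode before they reach the dictionary. The sketch below
assumes that the 8-bit strings hold Latin-1 encoded text and that
a str key and a unicode key with the same text are meant to be the
same key. The helper as_unicode_key is hypothetical, and the code
assumes an interpreter that provides the bytes name (on Python 2.6
and later it is an alias for str):

```python
def as_unicode_key(key, encoding='latin-1'):
    # Hypothetical helper: decode 8-bit strings with an assumed
    # encoding and pass Unicode objects through unchanged.
    if isinstance(key, bytes):
        return key.decode(encoding)
    return key

d = {}
d[as_unicode_key(u'm\xe1s')] = 1
d[as_unicode_key(b'm\xe1s')] = 1   # decoded, so it is the same key
```

After both assignments the dictionary holds a single Unicode key,
so no str/unicode comparison ever takes place during lookup. If the
two spellings are supposed to remain distinct keys, this approach
does not apply.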

This has been like that since Unicode was introduced, so it's
not new in Python 2.5.

-- 
Marc-Andre Lemburg
eGenix.com


