[Python-Dev] unicode hell/mixing str and unicode as dictionary keys

M.-A. Lemburg mal at egenix.com
Fri Aug 4 12:16:04 CEST 2006


Greg Ewing wrote:
> M.-A. Lemburg wrote:
> 
>> If a string
>> is not ASCII and thus causes the exception, there's not a lot you
>> can say, since you don't know the encoding of the string.
> 
> That's one way of looking at it.
> 
> Another is that any string containing chars > 127 is not
> text at all, but binary data, in which case it's not equal
> to *any* unicode string -- just like bytes objects will
> not be equal to strings in Py3k.
> 
>> All you
>> know is that it's not ASCII. Instead of guessing, Python then raises
>> an exception to let the programmer decide.
> 
> There's no disputing that an exception should be raised
> if the string *must* be interpretable as characters in
> order to continue. But that's not true here if you allow
> for the interpretation that they're simply objects of
> different (duck) type and therefore unequal.

Hmm, given that interpretation, 1 == 1.0 would have to be
False.
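For comparison, here is a quick Python 3 check, which was not part of the original discussion. `int` and `float` are distinct types that still compare equal and hash identically, so a dict treats 1 and 1.0 as the same key:

```python
# int and float are different types, yet compare equal by value.
assert type(1) is not type(1.0)
assert 1 == 1.0

# Equal objects must hash equally, so the dict sees one key, not two.
assert hash(1) == hash(1.0)

d = {1: "int"}
d[1.0] = "float"  # updates the existing entry; the original key 1 is kept
print(d)  # {1: 'float'}
```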

Note that you do have to interpret the string as characters
if you compare it to Unicode, and there's nothing wrong with
that.
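Below is a minimal Python 3 sketch of that implicit interpretation step. It assumes the Python 2 default ASCII codec, and `py2_style_equal` is a hypothetical helper, not a real API:

```python
def py2_style_equal(data: bytes, text: str) -> bool:
    """Hypothetical helper: compare a byte string to text the way Python 2
    compared str to unicode, by first decoding the bytes as ASCII."""
    # Raises UnicodeDecodeError if any byte is > 127, because ASCII
    # cannot say what such bytes mean.
    return data.decode("ascii") == text


print(py2_style_equal(b"abc", "abc"))  # True
try:
    py2_style_equal(b"caf\xc3\xa9", "caf\u00e9")
except UnicodeDecodeError as exc:
    print("cannot compare:", exc.reason)
```

The exception is the "let the programmer decide" behavior: without a known encoding, the bytes have no defined character meaning.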

What makes this particular case interesting is that
the comparison is hidden in the dictionary implementation
and only triggers if you get a hash collision, which makes
the whole issue appear to happen at random.
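The hidden comparison can be simulated in Python 3. This is a sketch only: `Py2Str` is a hypothetical stand-in for a Python 2 byte string, and its hash is supplied explicitly so a collision can be forced deterministically. A dict only calls `__eq__` on a stored key whose hash matches the lookup hash, so the error surfaces only when hashes collide:

```python
class Py2Str:
    """Hypothetical stand-in for a Python 2 byte string used as a dict key.

    The hash is supplied explicitly so a collision can be forced on demand.
    """

    def __init__(self, data: bytes, fake_hash: int) -> None:
        self.data = data
        self.fake_hash = fake_hash

    def __hash__(self) -> int:
        return self.fake_hash

    def __eq__(self, other):
        if isinstance(other, Py2Str):
            return self.data == other.data
        if isinstance(other, str):
            # Implicit ASCII decode, mimicking Python 2's str/unicode coercion.
            return self.data.decode("ascii") == other
        return NotImplemented


d = {Py2Str(b"caf\xc3\xa9", fake_hash=hash("x")): 1}

# Different hash: the dict never calls __eq__, so this is just a miss.
print("y" in d)  # False (barring an astronomically unlikely hash collision)

# Same hash: the dict must call __eq__ to resolve the collision,
# and the hidden comparison raises.
try:
    "x" in d
except UnicodeDecodeError:
    print("lookup of 'x' raised UnicodeDecodeError")
```

Which lookups fail depends entirely on hash values, which is why the problem looks random from the outside.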

This whole thread aside: mixing strings and Unicode is never
recommended unless you really have to.
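One common way to avoid the mix, sketched here in Python 3, is to decode byte strings once at the program boundary so a dict only ever holds text keys. `normalize_key` is a hypothetical helper, and UTF-8 is an assumption standing in for whatever encoding the data actually uses:

```python
def normalize_key(key, encoding: str = "utf-8") -> str:
    """Hypothetical helper: turn byte-string keys into text once, at the
    boundary, using an encoding the caller knows (UTF-8 is an assumption)."""
    if isinstance(key, bytes):
        return key.decode(encoding)
    return key


d = {}
d[normalize_key(b"caf\xc3\xa9")] = 1  # bytes arriving from, say, a file or socket
print(d[normalize_key("caf\u00e9")])  # 1: both spellings map to the same text key
```

The encoding decision is made explicitly in one place instead of being guessed during a dictionary lookup.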

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Aug 04 2006)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________


