[Python-Dev] unicode hell/mixing str and unicode as dictionarykeys

Fri Aug 4 13:41:21 CEST 2006

The "string" isn't necessarily text, so selecting latin-1 doesn't help (in fact, what happens is that the current default encoding is used; in his case this was ascii). What if it is image data? What if you are using a dict to implement a singleton set for arbitrary objects?
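A rough Python 3 sketch of why neither codec is a real answer. It assumes that a Python 2 str/unicode comparison behaves like "decode the byte string with some codec, then compare as text"; `mixed_eq` is a hypothetical helper, not a real API. With ascii, any non-ASCII byte raises. With latin-1, every byte sequence decodes, so arbitrary binary data (here, the first bytes of a PNG header) silently compares equal to a text string.

```python
def mixed_eq(raw: bytes, text: str, encoding: str) -> bool:
    """Hypothetical model of a Python 2 str == unicode comparison:
    decode the byte string with `encoding`, then compare as text."""
    return raw.decode(encoding) == text


png_magic = b"\x89PNG"  # binary data, not text

# latin-1 maps every byte 0x00-0xFF to U+0000-U+00FF, so decoding never fails
print(mixed_eq(png_magic, "\x89PNG", "latin-1"))  # True: image bytes "equal" a text string

# ascii rejects any byte >= 0x80, so the comparison itself blows up
try:
    mixed_eq(png_magic, "\x89PNG", "ascii")
except UnicodeDecodeError as exc:
    print("ascii:", exc.reason)
```

latin-1 trades the exception for false positives between unrelated binary and text values, which is the concern about non-text "strings" above.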

The point is that if the comparison operator raises an exception, the two objects are most likely dissimilar. We could even define that as the behaviour. Propagating the exception means that objects which raise an exception when compared can't reliably be used as keys in the same dictionary. This goes beyond any unicode vs. string question.
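To make the failure concrete, here is a small Python 3 sketch; the class `Touchy` is invented for illustration. Two distinct keys with equal hashes force the dict to call `__eq__`, and modern CPython propagates an exception raised there, aborting the whole insertion instead of treating the pair as "not equal".

```python
class Touchy:
    """Hypothetical key type whose equality test always fails loudly."""

    def __hash__(self):
        return 42  # identical hashes force the dict to fall back on __eq__

    def __eq__(self, other):
        raise TypeError("cannot compare Touchy objects")


d = {Touchy(): "first"}  # first key: nothing to compare against yet
try:
    d[Touchy()] = "second"  # same hash, different object -> dict calls __eq__
except TypeError as exc:
    print("insert aborted:", exc)
```

Looking up the very same object still works, because the dict checks identity before calling `__eq__`; only a second, distinct key with a colliding hash triggers the error.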

If propagating the exception was a conscious change made for debugging purposes, why not make it optional somehow? A flag on the dict object? Or special lookup methods for that?
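One possible reading of the "flag on the dict object" idea, sketched in Python 3. `FlaggedDict`, `_LenientKey`, `Touchy`, and the `suppress_compare_errors` parameter are all hypothetical names, not part of any real API. When the flag is set, each key is wrapped so that an exception from `==` counts as "not equal" and the dict simply keeps probing; by default the exception propagates, as it does for a plain dict.

```python
class _LenientKey:
    """Hypothetical wrapper: a comparison that raises counts as 'not equal'."""
    __slots__ = ("obj",)

    def __init__(self, obj):
        self.obj = obj

    def __hash__(self):
        return hash(self.obj)

    def __eq__(self, other):
        if not isinstance(other, _LenientKey):
            return NotImplemented
        if self.obj is other.obj:
            return True
        try:
            return bool(self.obj == other.obj)
        except Exception:
            return False  # treat a failing comparison as dissimilar


class FlaggedDict:
    """Hypothetical mapping whose error policy is fixed at construction."""

    def __init__(self, suppress_compare_errors=False):
        self._suppress = suppress_compare_errors
        self._data = {}

    def _key(self, key):
        # Wrap keys only in lenient mode; strict mode uses plain dict semantics.
        return _LenientKey(key) if self._suppress else key

    def __setitem__(self, key, value):
        self._data[self._key(key)] = value

    def __getitem__(self, key):
        return self._data[self._key(key)]

    def __contains__(self, key):
        return self._key(key) in self._data

    def __len__(self):
        return len(self._data)


class Touchy:
    """Hypothetical key type whose equality test always raises."""

    def __hash__(self):
        return 42

    def __eq__(self, other):
        raise TypeError("cannot compare")


lenient = FlaggedDict(suppress_compare_errors=True)
lenient[Touchy()] = "first"
lenient[Touchy()] = "second"  # no exception: stored as a separate key
print(len(lenient))
```

Fixing the flag at construction keeps every stored key consistently wrapped or unwrapped; toggling it on a populated mapping would mix the two key types. The cost is one extra object per key, plus the loss of the debugging signal: a genuinely broken `__eq__` goes unnoticed.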

Cheers,
Kristján

> -----Original Message-----
> From: python-dev-bounces+kristjan=ccpgames.com at python.org 
> [mailto:python-dev-bounces+kristjan=ccpgames.com at python.org] 
> On Behalf Of Josiah Carlson
> Sent: 4 August 2006 04:34
> To: Bob Ippolito; python-dev at python.org
> Subject: Re: [Python-Dev] unicode hell/mixing str and unicode as dictionarykeys
> 
> 
> Bob Ippolito <bob at redivi.com> wrote:
> > On Aug 3, 2006, at 6:51 PM, Greg Ewing wrote:
> > 
> > > M.-A. Lemburg wrote:
> > >
> > >> Perhaps we ought to add an exception to the dict lookup mechanism
> > >> and continue to silence UnicodeErrors ?!
> > >
> > > Seems to me that comparison of unicode and non-unicode strings for
> > > equality shouldn't raise exceptions in the first place.
> > 
> > Seems like a slightly better idea than having dictionaries suppress
> > exceptions. Still not ideal though, because sticking non-ASCII strings
> > that are supposed to be text and unicode in the same data structures
> > is *probably* still an error.
> 
> If/when 'python -U -c "import test.testall"' runs without unexpected
> error (I doubt it will happen prior to the "all strings are unicode"
> conversion), then I think that we can say that there aren't any
> use-cases for strings and unicode being in the same dictionary.
> 
> As an alternate idea, rather than attempting to 
> .decode('ascii') when strings and unicode compare, why not 
> .decode('latin-1')?  We lose the unicode decoding error, but 
> "the right thing" happens (in my opinion) when u'\xa1' and 
> '\xa1' compare.
> 
>  - Josiah
> 
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> http://mail.python.org/mailman/options/python-dev/kristjan%40ccpgames.com
>