Why KeyError ???

Geoff Gerrietts geoff at gerrietts.net
Wed Mar 6 13:51:30 EST 2002


Quoting Ozren Lasic (ozren_lasic at yahoo.com):
> Sorry, but I didn't get it.
> Maybe you can explain it to me with another example?
> 

This is all very interesting.

Consider this:

>>> a = '\xe7\xd0\x9f\x86\xa7'
>>> b = unicode(a,'cp1250')
>>> c = "abcde"
>>> d = unicode(c,'cp1250')
>>> a.__hash__()
-1420316064
>>> b.__hash__()
2044161023
>>> c.__hash__()
-1332677140
>>> d.__hash__()
-1332677140
>>> a == b
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: ASCII decoding error: ordinal not in range(128)


The only hypothesis I can put forward here is that the Unicode
machinery is trying to preserve the actual identity of the glyphs being
referenced. I'm not familiar with the encodings under discussion here,
but I'm assuming that in codepage 1250, as in the default C locale, the
characters "a" through "e" refer to the same glyphs.

On the other hand, I'm guessing that in codepage 1250, the glyphs
referenced by "\xe7\xd0\x9f\x86\xa7" are different from the glyphs
referenced by those same bytes in the default encoding. In fact, the
traceback at the end there suggests that those glyphs have NO meaning
in the default encoding.
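
A minimal check of that hypothesis, as a sketch against a Python 2.x
interpreter (assuming the cp1250 codec is available): decoding "abcde"
with cp1250 gives the same result as decoding it with the default ASCII
codec, while the high bytes won't decode under the default codec at all.

>>> unicode("abcde", "cp1250")
u'abcde'
>>> unicode("abcde", "ascii")
u'abcde'
>>> unicode("\xe7\xd0\x9f\x86\xa7", "ascii")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: ASCII decoding error: ordinal not in range(128)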

Consequently, the underlying unicode representation -- 2 bytes per
glyph? I'm not sure -- tracks the two strings differently. When you go
to hash them, the difference is exposed, but when you're looking at
them, they don't look all that different.
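
That difference in hash values is, I'd guess, exactly where the
KeyError in the subject line comes from: a dictionary files each key
under the slot its hash selects, so a lookup with the other form of the
string never matches the stored entry. A rough sketch of the scenario
(Python 2.x again, with a made-up dictionary name):

>>> key = "\xe7\xd0\x9f\x86\xa7"
>>> table = {key: "some value"}
>>> table[unicode(key, "cp1250")]   # different hash, so this raises KeyError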

I'm not sure why the dictionary's key would report as "abcde"
instead of u"abcde"; that's pretty baroque.
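
My best guess there (not something established in this thread, just how
I understand CPython's dictionaries): when you assign under a key that
compares equal to one already present, the dictionary keeps the original
key object and only replaces the value. Since "abcde" and u"abcde" hash
identically and compare equal, whichever form went in first is the one
the dictionary keeps showing you. A tiny illustration:

>>> table = {"abcde": 1}
>>> table[u"abcde"] = 2
>>> table
{'abcde': 2}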

--G.

-- 
Geoff Gerrietts             "I am always doing that which I can not do, 
<geoff at gerrietts net>     in order that I may learn how to do it." 
http://www.gerrietts.net                    --Pablo Picasso
