Unicode and dictionaries

Sat Jan 16 22:06:01 EST 2010

Carl Banks <pavlovevidence at gmail.com> writes:

> On Jan 16, 3:56 pm, Ben Finney <ben+pyt... at benfinney.id.au> wrote:
> > gizli <mehm... at gmail.com> writes:
> > > >>> test_dict = {u'öğe':1}
> > > >>> u'öğe' in test_dict.keys()
> > > True
> > > >>> 'öğe' in test_dict.keys()
> > > True
> >
> > I would call this a bug. The two objects are different, so the latter
> > expression should return ‘False’.
>
> Except the two objects are not different if default encoding is utf-8.

They are different, because a Unicode object is *not* encoded in any
character encoding, whereas the byte string object is.
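Here is a minimal sketch of that distinction. It assumes Python 3, where the thread's ‘unicode’ type is ‘str’ and byte strings are ‘bytes’. Python 3 also drops the implicit conversion between the two that Python 2 performs using the default encoding. One text object has many possible byte encodings. None of them equals the text, and none is found in its place as a dictionary key. The choice of ISO-8859-9 (Turkish Latin-5) as the second encoding is illustrative only.

```python
# Python 3 sketch: str holds code points; bytes hold encoded octets.
text = "öğe"                                # U+00F6, U+011F, U+0065
assert text == "\u00f6\u011fe"

# The same text encoded two ways gives two different byte strings.
as_utf8 = text.encode("utf-8")              # b'\xc3\xb6\xc4\x9fe'
as_latin5 = text.encode("iso-8859-9")       # b'\xf6\xf0e'

# Neither byte string is the text itself...
print(text == as_utf8, text == as_latin5)   # False False

# ...so a dict keyed by the text does not find them.
test_dict = {text: 1}
print(text in test_dict, as_utf8 in test_dict)  # True False

# Decoding with the matching codec recovers an equal text object.
print(as_utf8.decode("utf-8") == as_latin5.decode("iso-8859-9") == text)  # True
```

The encoding only enters when converting between the two types. It is never a property of the text object itself.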

The source code shows a Unicode *literal* represented in some encoding;
but, just as the source code sequence ‘1.0’ results in a
floating-point object, the source code sequence ‘u'öğe'’ results in a
Unicode object. Neither the floating-point object nor the Unicode object
has a character encoding, even though their representations in source
code did have one.
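To illustrate the analogy, here is a sketch that assumes Python 3 and PEP 263 coding declarations. The same one-line module, using the literal from this thread, is saved in two encodings. The source bytes differ, but compiling each produces equal string objects, just as different spellings of a float literal produce the same float. The template text and the helper name ‘run_source’ are hypothetical, chosen for this illustration.

```python
# A hypothetical one-line module, saved in whichever encoding its
# PEP 263 coding declaration names.
TEMPLATE = "# -*- coding: {enc} -*-\nword = u'öğe'\n"

def run_source(encoding):
    """Encode the module text, compile the raw bytes (the compiler
    decodes them per the coding declaration), and return
    (source_bytes, value bound to `word`)."""
    source = TEMPLATE.format(enc=encoding).encode(encoding)
    namespace = {}
    exec(compile(source, "<example>", "exec"), namespace)
    return source, namespace["word"]

utf8_src, utf8_word = run_source("utf-8")
latin5_src, latin5_word = run_source("iso-8859-9")

# The source representations differ byte for byte...
print(utf8_src == latin5_src)                         # False
# ...but the resulting objects are equal and carry no encoding.
print(utf8_word == latin5_word == "\u00f6\u011fe")    # True

# Same idea for floats: different source spellings, one value.
print(1.0 == 1.00 == 1e0 == float("1.0"))             # True
```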

The Effbot explains it <URL:http://effbot.org/zone/unicode-objects.htm>
in more detail.

-- 
 \           “[W]hoever is able to make you absurd is able to make you |
  `\                                                unjust.” —Voltaire |
_o__)                                                                  |
Ben Finney