[Python-Dev] Re: Multibyte repr()

Wed, 09 Oct 2002 17:07:33 -0400

> Guido van Rossum <guido@python.org> writes:
> 
> >   >>> u = u'\u1f40'
> >   >>> s = u.encode('utf8')
> >   >>> s
> >   'a=\x80'
> >   >>>
> > 
> > The latter output is not helpful, because the encoding of s is not the
> > locale's encoding.

[Martin]
> [Somehow, the accents got lost in your message]
> 
> It isn't helpful, but it isn't strictly wrong, either. In this
> specific case, people are used to seeing utf-8 being interpreted as
> Latin-1 - that form of "mojibake" is very common, so they will know
> what happened.
> 
> I question whether the hex representation is more helpful: it depends
> on how you need to interpret the result you get.

Well, if you *want* to see the hex codes for all non-ASCII characters,
repr() used to be your friend.  No more.  If you *want* to see the
printable characters, you could always use print.
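
A minimal sketch of the two views, written for present-day Python 3
(an assumption; the session quoted above is Python 2, where the details
differ).  In Python 3, repr() of a bytes object always shows hex escapes,
repr() of a str keeps printable non-ASCII characters, and ascii() escapes
every non-ASCII character.  The variable names are illustrative only.

```python
# Python 3 sketch; the thread itself is about Python 2 str/unicode.
u = '\u1f40'                    # GREEK SMALL LETTER OMICRON WITH PSILI
s = u.encode('utf-8')           # three bytes: 0xe1 0xbd 0x80

# Hex view: repr() of bytes escapes every non-ASCII byte.
hex_view = repr(s)              # "b'\\xe1\\xbd\\x80'"

# "Mojibake" view: the same bytes misread as Latin-1 text.
mojibake = s.decode('latin-1')  # U+00E1, U+00BD, U+0080

# repr() of str keeps the printable characters and escapes only U+0080;
# ascii() escapes all three.
printable_view = repr(mojibake)
escaped_view = ascii(mojibake)  # "'\\xe1\\xbd\\x80'"

print(hex_view)
print(printable_view)
print(escaped_view)
```

The last two lines show the trade-off under discussion: one form shows
the characters as the terminal would render them, the other shows the
hex codes regardless of locale.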

I'd be okay with this change if the default locale wasn't changed by
readline.  Did you see my patch for that?  Then people who want to see
their encoding from repr() can learn to put

    import locale
    locale.setlocale(locale.LC_CTYPE, "")

in their $PYTHONSTARTUP file (or in their app's main()).
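
As a hedged sketch of what such a startup file might look like, the
snippet below wraps the same call so that a broken locale setting in the
environment does not abort startup.  The function name adopt_user_ctype
is hypothetical, and the fallback behaviour is an assumption, not part of
the proposal above.

```python
import locale

def adopt_user_ctype():
    """Switch LC_CTYPE to the user's environment setting, if possible.

    Returns the name of the LC_CTYPE locale in effect afterwards.
    """
    try:
        # An empty string means "take the setting from the environment".
        return locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        # Unsupported setting: leave the locale alone and report the
        # current one (calling setlocale without a value only queries).
        return locale.setlocale(locale.LC_CTYPE)

current = adopt_user_ctype()
print("LC_CTYPE is now", current)
```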

--Guido van Rossum (home page: http://www.python.org/~guido/)