Unicode Objects in Tuples
Ned Batchelder
ned at nedbatchelder.com
Fri Oct 11 05:22:36 EDT 2013
On 10/11/13 4:16 AM, Stephen Tucker wrote:
> I am using IDLE, Python 2.7.2 on Windows 7, 64-bit.
>
> I have four questions:
>
> 1. Why is it that
> print unicode_object
> displays non-ASCII characters in the unicode object correctly, whereas
> print (unicode_object, another_unicode_object)
> displays non-ASCII characters in the unicode objects as escape
> sequences (as repr() does)?
>
> 2. Given that this is actually /deliberately/ the case (which I, at
> the moment, am finding difficult to accept), what is the neatest (that
> is, the most Pythonic) way to get non-ASCII characters in unicode
> objects in tuples displayed correctly?
>
> 3. A similar thing happens when I write such objects and tuples to a
> file opened by
> codecs.open ( ..., "utf-8")
> I have also found that, even though I use write to send the text to
> the file, unicode objects not in tuples get their non-ASCII characters
> sent to the file correctly, whereas, unicode objects in tuples get
> their characters sent to the file as escape sequences. Why is this the
> case?
>
> 4. As for question 1 above, I ask here also: What is the neatest way
> to get round this?
>
> Stephen Tucker.
>
Although Python 3 is better than Python 2 at Unicode, as the others have
said, the most important point is one that you hit upon yourself: the
escapes you see are what repr() produces.

When you print an object x that is not itself a string, you are actually
printing str(x). The str() of a tuple is an opening paren, followed by the
repr()s of its elements, separated by commas, then a closing paren. Tuples
and lists use the repr() of their elements when producing either their own
str() or their own repr().
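A minimal sketch of that rule, using two made-up sample strings (the names pair and rebuilt are only for illustration). It rebuilds the tuple's str() by hand from the repr() of each element and checks that the two agree. It should behave the same way on Python 2.7 and on Python 3.3+, although the repr() text itself differs between them:

```python
# Two sample unicode strings containing non-ASCII characters:
# \xe9 is e-acute and \xef is i-diaeresis.
pair = (u"caf\xe9", u"na\xefve")

# Rebuild by hand what str() does for a tuple: parens around the
# comma-separated repr() of each element.
rebuilt = "(" + ", ".join(repr(item) for item in pair) + ")"

# str() and repr() of the tuple give the same text, and that text
# is built from the repr() of the elements, never from their str().
assert str(pair) == rebuilt
assert repr(pair) == rebuilt
```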

Python 3 does better at this because repr() in Python 3 will gladly
include non-ASCII characters in its output, while Python 2 will only
include ASCII characters, and so must resort to escape sequences. (BTW:
if you like the ASCII-only idea from Python 2, Python 3 has the ascii()
function and the %a string formatting directive for that very purpose.)
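Here is a small sketch of those Python 3 spellings, assuming a sample string of my own choosing. It is Python 3 only, since ascii() and %a do not exist in Python 2. The {!a} conversion for str.format() is not mentioned above, but it does the same job:

```python
# Python 3 only: ascii() and %a are not available in Python 2.
word = "caf\u00e9"  # sample string ending in e-acute

as_repr = repr(word)               # keeps the non-ASCII character as is
as_ascii = ascii(word)             # escapes it, the way Python 2's repr() would
via_percent = "%a" % word          # same escaping through %-formatting
via_format = "{!a}".format(word)   # same escaping through str.format()

assert as_repr == "'caf\u00e9'"
assert as_ascii == "'caf\\xe9'"
assert via_percent == as_ascii
assert via_format == as_ascii
```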

The two string representation alternatives str() and repr() can be
confusing. Think of it as: str() is for customers, repr() is for
developers, or: str() is for humans, repr() is for geeks. The reason
tuples use the repr() of their elements is that the parens+commas
representation of a tuple is geeky to begin with, so it uses repr() of
its elements, even for str(tuple).
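The split is easy to see with a type whose str() and repr() differ a lot. This is only an illustration with datetime.date, which is my choice of example and not something from the question. It runs on both Python 2.7 and Python 3:

```python
import datetime

day = datetime.date(2013, 10, 11)

# str() is the human-friendly form, repr() the developer-friendly one.
assert str(day) == "2013-10-11"
assert repr(day) == "datetime.date(2013, 10, 11)"

# Containers show the repr() of their elements, even under str().
assert str((day,)) == "(datetime.date(2013, 10, 11),)"
assert str([day]) == "[datetime.date(2013, 10, 11)]"
```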

The way to avoid repr() for the elements is to format the tuple yourself.
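One possible way to do that, as a sketch rather than the one right answer: join the elements yourself and add the parens by hand. The helper name format_tuple and the sample data are hypothetical, and the helper assumes every element is already a unicode string. The file part uses io.open where the question used codecs.open; both accept an encoding, and io.open behaves the same on Python 2.7 and Python 3:

```python
import io
import os
import shutil
import tempfile


def format_tuple(items):
    """Return a tuple-like display built from the items themselves.

    Hypothetical helper: assumes every item is a unicode string, so
    no repr() is involved and non-ASCII characters stay unescaped.
    """
    return u"(" + u", ".join(items) + u")"


pair = (u"caf\xe9", u"na\xefve")  # sample data
line = format_tuple(pair)

# Write the formatted text to a UTF-8 file and read it back.
workdir = tempfile.mkdtemp()
try:
    path = os.path.join(workdir, "out.txt")
    with io.open(path, "w", encoding="utf-8") as handle:
        handle.write(line + u"\n")
    with io.open(path, encoding="utf-8") as handle:
        read_back = handle.read()
finally:
    shutil.rmtree(workdir)  # remove the temporary directory
```

The same line can be handed to print, which answers questions 2 and 4 with one helper. If the tuple may hold non-string elements, each one would need converting to unicode before the join.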

--Ned.