On Wed, May 17, 2017 at 02:41:29PM -0700, Craig Rodrigues wrote:
e = "{}".format(u"hi") [...] type(e) == str
The confusion for me is why is type(e) of type str, and not unicode?
I think that's one of the reasons why the Python 2.7 string model is
(1) convenient to those using purely ASCII, but (2) ultimately broken.
You can see why it's broken if you do this:

py> "{}".format(u"hiµ")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb5' in
position 2: ordinal not in range(128)

So it tries to encode the Unicode string to ASCII, and if that succeeds,
format returns a byte str. I'm not sure if that was a deliberate design
choice for format, or just a side-effect of it calling str() on its
arguments by default.

I'm not sure if I've answered your question or not. Are you looking for
justification of this misfeature, or an explanation of the historical
reasons why it exists, or something else?

(If you're looking for the same behaviour in Python 3 and 2.7, probably
the best thing you can do is just religiously use unicode strings u''
in both. You might try:

    from __future__ import unicode_literals

in 2.7, but I'm not sure that's enough.)

--
Steve
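
P.S. Here's a quick 2.7 sketch of the difference, off the top of my
head, so treat it as illustrative rather than gospel:

    # Byte-str template: format() tries to ASCII-encode any unicode
    # argument, so it works for pure-ASCII text and raises otherwise.
    e = "{}".format(u"hi")
    print type(e)                    # <type 'str'>
    # "{}".format(u"hi\xb5")         # would raise UnicodeEncodeError

    # Unicode template: nothing gets encoded, the result stays unicode.
    f = u"{}".format(u"hi\xb5")
    print type(f)                    # <type 'unicode'>

With unicode_literals in effect, the plain "{}" template above becomes
a unicode literal too, so format() calls like this at least stop
byte-encoding their arguments, even if that import doesn't fix
everything else.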