[Python-Dev] readd u'' literal support in 3.3?

Fri Dec 9 05:24:33 CET 2011

On Thu, 2011-12-08 at 22:34 -0500, Barry Warsaw wrote:
> On Dec 09, 2011, at 03:50 AM, Lennart Regebro wrote:
> 
> >One reason is that you need to be able to say "This should be str in
> >Python 2, and binary in Python 3, that should be Unicode in Python 2
> >and str in Python 3, and that over there should be str in both
> >versions", and the future import doesn't support that.
> 
> Sorry, I don't understand this.  What does it mean to be "str in both
> versions"?  And why would you want that?
> 
> As for "str in Python 2 and binary in Python 3", b'' prefixes do that in
> Python >= 2.6 without the future import (if I take "binary" to mean bytes
> type).
> 
> As for "Unicode in Python 2 and str in Python 3", unadorned strings with the
> future import in Python >= 2.6 do that just fine.
> 
> One of the nice things too is that with #include <bytesobject.h> in Python >=
> 2.6, changing all your PyStrings to PyBytes, you can get the same behavior in
> your extension modules.
> 
> You still need to be clear about what are bytes and what are strings.  The
> problem comes when you aren't or can't be sure, i.e. you have objects that are
> sometimes one and sometimes the other.  Such as email headers.  In that case,
> you're kind of screwed.  Python 2's str type let you cheat, but not without
> consequences.  Those consequences are spelled "UnicodeErrors" and I'll be glad
> to be rid of them.

The PEP 3333 WSGI protocol *requires* that you present its APIs with
"native strings" (the type named str in each version: a byte string on
Python 2, Unicode text on Python 3).  So while the oversimplification
"don't do that" sounds great here, in real life, not so much.
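
To make the three cases from this thread concrete, here is a minimal
sketch of what a module using the future import ends up writing.  The
helper name native() is hypothetical (it is not part of PEP 3333 or the
stdlib), and the latin-1 codec is an assumption based on PEP 3333's rule
that native strings on Python 3 may only contain code points
representable in ISO-8859-1.

```python
from __future__ import unicode_literals

import sys

PY2 = sys.version_info[0] == 2


def native(s):
    """Return s as the type named ``str`` on the running interpreter.

    Hypothetical helper for illustration only.
    """
    if isinstance(s, str):
        # Already the native type (byte str on 2.x, text str on 3.x).
        return s
    if PY2:
        # On Python 2 a non-str argument here is unicode: encode it.
        return s.encode("latin-1")
    # On Python 3 a non-str argument here is bytes: decode it.
    return s.decode("latin-1")


data = b"payload"            # str on Python 2, bytes on Python 3
text = "caf\u00e9"           # unicode on Python 2, str on Python 3
status = native("200 OK")    # str on both, as WSGI expects
```

With the future import active there is no literal spelling for the third
case, so every native string has to go through a call like the one above.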

- C