[Python-Dev] readd u'' literal support in 3.3?
Nick Coghlan
ncoghlan at gmail.com
Fri Dec 9 06:30:36 CET 2011
On Fri, Dec 9, 2011 at 2:33 PM, Chris McDonough <chrism at plope.com> wrote:
> Continuing to not support u'' in Python 3 will be like having an
> immigration station where folks who have a b'ritish' passport can get
> through right away, but folks with a u'kranian' passport need to get
> back on a plane that appears to come from the Ukraine before they
> receive another tag that says they are indeed from the Ukraine. It's
> just pointless makework.
OK, I think I finally understand your point. You want to be able, in
your Python 2.x code, to write modules that use *all three* kinds of
string literal:
----------
foo = u"this is a Unicode string in both Python 2.x and 3.x"
bar = "this is an 8-bit string in Python 2.x and a Unicode string in 3.x"
baz = b"this is an 8-bit string in Python 2.x and a bytes object in 3.x"
----------
This is driven by the desire to use APIs (like the PEP 3333 version of
WSGI) that are defined in terms of "native strings" in the context of
applications that already include a strong binary/text separation.
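For anyone less familiar with the term, here is a minimal sketch of what "native strings" means for a PEP 3333 application: the status line and the header names and values are of type 'str' on both series, while the response body is always binary. The names `application` and `fake_start_response` are purely illustrative and are not taken from any framework.

```python
def application(environ, start_response):
    # Status and headers are "native strings":
    # 8-bit str in 2.x, Unicode str in 3.x.
    status = "200 OK"
    headers = [("Content-Type", "text/plain; charset=utf-8")]
    start_response(status, headers)
    # The response body is always binary data.
    return [b"Hello, world\n"]


captured = {}


def fake_start_response(status, headers, exc_info=None):
    # Stand-in for the server side, used here only to exercise the sketch.
    captured["status"] = status
    captured["headers"] = headers


body = application({"REQUEST_METHOD": "GET", "PATH_INFO": "/"},
                   fake_start_response)
```

An application that keeps its own text as Unicode everywhere therefore needs a third kind of literal for the status and headers, which is where the pressure on u'' comes from.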
Currently, in modules shared between the two series, you can't use the
"u" marker at all, since Python 3.x leaves it out as being redundant -
instead, you have a binary switch (in the form of the future import)
that lets you toggle the behaviour of basic string literals between
the first two forms:
----------
bar = "this is an 8-bit string in Python 2.x and a Unicode string in 3.x"
baz = b"this is an 8-bit string in Python 2.x and a bytes object in 3.x"
----------
from __future__ import unicode_literals
foo = "this is a Unicode string in both Python 2.x and 3.x"
baz = b"this is an 8-bit string in Python 2.x and a bytes object in 3.x"
----------
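One way to watch that switch operate without maintaining two separate files is to compile the same source text with and without the compiler flag the future import sets. This is only a sketch: the variable names are made up for illustration, and on 3.x both results are the same type since the flag has nothing left to change.

```python
import __future__

# The text type: unicode in 2.x, str in 3.x. Spelled without a u''
# literal so the snippet also parses on 3.0 through 3.2.
text_type = type(b"".decode("ascii"))

source = '"plain literal"'
flag = __future__.unicode_literals.compiler_flag

# dont_inherit=True keeps any future imports active in this module
# from leaking into the compiled snippets.
default_code = compile(source, "<default>", "eval", 0, True)
toggled_code = compile(source, "<toggled>", "eval", flag, True)

default_type = type(eval(default_code))  # the native str type
toggled_type = type(eval(toggled_code))  # always the text type
```

On 2.x `default_type` is the 8-bit 'str' and `toggled_type` is 'unicode'; the switch applies to every unprefixed literal in the module, so there is no way to get both in one file.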
Currently, to get all 3 kinds of behaviour in a shared codebase
without additional function calls at runtime, you need to pick one set
of strings (either "always Unicode" or "native string type") and move
them out to a separate module. So, for example, depending on which set
you decided to move:
----------
from unicode_strings import foo
bar = "this is an 8-bit string in Python 2.x and a Unicode string in 3.x"
baz = b"this is an 8-bit string in Python 2.x and a bytes object in 3.x"
----------
from __future__ import unicode_literals
foo = "this is a Unicode string in both Python 2.x and 3.x"
from native_strings import bar
baz = b"this is an 8-bit string in Python 2.x and a bytes object in 3.x"
----------
Or, alternatively, you use 'six' (or a similar compatibility module)
and ensure unicode at runtime, using native or binary strings
otherwise:
----------
from six import u
foo = u("this is a Unicode string in both Python 2.x and 3.x")
bar = "this is an 8-bit string in Python 2.x and a Unicode string in 3.x"
baz = b"this is an 8-bit string in Python 2.x and a bytes object in 3.x"
----------
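For reference, a helper of this kind can be approximated in a few lines. What follows is a rough sketch of the idea under the assumption that the argument is an ASCII-only literal; it is not six's actual source.

```python
import sys

if sys.version_info[0] >= 3:
    def u(literal):
        # Plain literals are already Unicode in 3.x: nothing to do.
        return literal
else:
    def u(literal):
        # In 2.x the argument is an 8-bit str, so decode it,
        # interpreting any \uXXXX escapes along the way.
        return literal.decode("unicode_escape")

# The text type: unicode in 2.x, str in 3.x.
text_type = type(b"".decode("ascii"))

foo = u("this is a Unicode string in both Python 2.x and 3.x")
```

The cost is one function call each time such a line is executed, which is exactly the runtime overhead the separate-module approach avoids.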
If you want to target 3.2, you *have* to use one of those mechanisms -
any potential restoration of u'' syntax support won't help you (and
even after 3.3 gets released in the latter half of next year, it's
still going to be a fair while before it makes its way into the
various distros, especially the ones that include long term support
from major vendors).
So, instead of attempting to paper over the problem by reintroducing
u'', perhaps the discussion we should be having is whether or not PEP
3333's superficially appealing concept of defining an API in terms of
"native strings" is a loser in practice, and we should instead be
looking more closely at PEP 444 (since that goes the route of using
'str' in 2.x and 'bytes' in 3.x, thus rendering "from __future__
import unicode_literals" an adequate solution for 2.6+ compatibility).
The amount of pain that PEP 3333 seems to be causing in the web
development world suggests to me we may simply have been *wrong* to
think that PEP 3333 would be a workable long term approach.
Cheers,
Nick.
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia