[Python-Dev] PEP 414 - Unicode Literals for Python 3

Tue Feb 28 02:45:48 CET 2012

On Tue, Feb 28, 2012 at 9:19 AM, Terry Reedy <tjreedy at udel.edu> wrote:
> Since writing the above, I realized that the following is a realistic
> scenario. 2.6 or 2.7 code a) uses has/set/getattr, so unicode literals would
> require a change; b) uses non-ascii chars in unicode literals; c) uses (or
> could be converted to use) print as a function; and d) otherwise uses a
> common 2-3 subset. Such would only need the u prefix addition to run under
> both Pythons. This works the other way, of course, for backporting code. So
> I am replacing 'most' with 'some unknown-to-me fraction' ;-).

Yep, that's exactly the situation I'm in with PulpDist (a web app that
primarily targets deployment on RHEL 6, which means Python 2.6). Since
I preformat all my print output with either str.format or str.join (or
use the logging module) and always use "except exc as var" to catch
exceptions, the natural way to write Python 2 code for me is *almost*
source compatible with Python 3. The only big discrepancy I'm
currently aware of? Unicode literals.
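To make that "almost compatible" style concrete, here is a minimal sketch, assuming Python 2.6/2.7 or Python 3.3+ (where PEP 414 accepts the u prefix again). It uses str.format for output, logging for diagnostics, "except ... as ..." for exception handling and explicit u"" literals for text. The function names and data are hypothetical and only for illustration; they are not taken from PulpDist.

```python
import logging

log = logging.getLogger(__name__)


def describe_repo(name, count):
    # Hypothetical helper: explicit u"" marks text literals; on Python 3.3+
    # the prefix is accepted and the result is an ordinary str.
    label = u"repository"
    # Preformat the message with str.format so printing it works the same
    # with the Python 2 print statement and the Python 3 print() function.
    return u"{0} {1}: {2} items".format(label, name, count)


def safe_int(value):
    # Hypothetical helper: "except ... as ..." is valid on 2.6+ and 3.x.
    try:
        return int(value)
    except ValueError as exc:
        log.warning("could not parse %r: %s", value, exc)
        return None


# Single-argument print(...) parses the same way on 2.6+ and 3.x.
print(describe_repo(u"pulp-main", 3))
```

Only the u prefix ties this code to Python 3.3+ rather than earlier 3.x releases, which is the gap PEP 414 closes.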

Now, I could retrofit the entire code base with the unicode_literals
import and str("") for native strings, but that has problems of its
own:
- it doesn't match the Pulp upstream, so it would make it harder for
them to review my plugins and client API usage code (or integrate them
into the default plugin set or client support API if they decide they
like them). Given that I'm one of the guinea pigs for experimental
Pulp APIs and have to dive into *their* code on occasion, it would
also be a challenge for *me* to switch modes when debugging.
- it doesn't match Django (at least, not 1.3, which is the version
I'm using), which is another potential annoyance when debugging
- it doesn't match any of the other Django applications I use (once
again, debugging may lead to me looking at this code)
- it doesn't match the standard library (yep, you guessed it, I'd have
to mode switch when looking at standard library code, too)
- it doesn't match the intuitions of current Python 2 developers that
aren't up to speed with the niceties of Python 3 porting
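For comparison, here is a rough sketch of the unicode_literals retrofit described above, assuming code that must run on both Python 2.6+ and 3.x. Every unprefixed literal becomes text, so any spot that needs the native str type (bytes on Python 2, text on Python 3) has to be wrapped in str(""). The Config class and attribute name are hypothetical.

```python
from __future__ import unicode_literals

# With the import above, unprefixed literals are unicode on Python 2
# (and plain str on Python 3, where the import has no effect).
greeting = "hello"

# Wrapping in str() yields the native string type on both versions:
# a byte string on Python 2, a text string on Python 3.
attr_name = str("version")


class Config(object):
    # Hypothetical container used only to show attribute access by name.
    pass


cfg = Config()
setattr(cfg, attr_name, "1.0")
print(getattr(cfg, attr_name))
```

Every literal in a module is affected by the import, which is why the retrofit touches the whole code base rather than just the places that handle text.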

Basically, using the unicode_literals import would significantly raise
the barrier to entry for PulpDist *as a Python 2 project*, as well as
forcing me to switch mental models for text processing whenever I have
to look at the code in a dependency during a debugging session.
Therefore, given that Python 2 will be my primary target for the
immediate future (and any collaborators are likely to be RHEL 6 and
hence Python 2 focused), I don't want to use that particular future
import. The downside of that choice (currently) is that it kills any
possibility of running any of it on Python 3, even the command line
client or the web front end after Django gets ported. With explicit
unicode literals being restored in Python 3.3, though, I'm a lot more
optimistic about the feasibility of porting it without too much effort
(as well as the prospect of other Django app dependencies gaining
Python 3 support).

In terms of third-party upstreams, Python 3 compatibility patches that
affect *every single string literal in the entire project* (either
directly or converting the entire project to the "unicode_literals"
import) aren't likely to even get reviewed, let alone accepted. By
contrast (for a project that already only supports 2.6+), cleaning up
print statements and exception handling should be a much smaller patch
that is easy to both review and accept. Making it as easy as possible
for maintainers that don't really care about Python 3 to accept
patches from people that *do* care is a very good thing.

There are still other problems that are going to affect the folks
playing at the wire protocol level, but the lack of unicode literals
is a big one that affects the entire application stack.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia