
Guido van Rossum <guido <at> python.org> writes:
I noticed there were some complaints about unnecessarily offensive language in PEP 414. Have those passages been edited to everyone's satisfaction?

I'm not sure if Nick has finished his updates, but I for one would like to see some improvements in a few places:

"Many thought that the unicode_literals future import might make a common source possible, but it turns out that it's doing more harm than good."

Rather than talking about it doing more harm than good, it would be better to say that unicode_literals is not the best solution in some scenarios (specifically WSGI, but any other scenarios can also be mentioned). The "more harm than good" claim is not true in all scenarios, but as it's worded now, it makes unicode_literals seem like it is always a bad approach.

"(either by having a u function that marks things as unicode without future imports or the inverse by having a n function that marks strings as native). Unfortunately, this has the side effect of slowing down the runtime performance of Python and makes for less beautiful code."

The uses of u() and n() are not equivalent: n() only has to be used when unicode_literals is in effect, and the incidence of n() calls in an application would be much lower than that of u() calls in the absence of unicode_literals. In at least some cases, it is possible that the APIs which fail unless native strings are provided are themselves broken (e.g. some database adapters expect datetimes in ISO format as native strings, where there is no apparent reason why they couldn't accept them as text).

As far as "less beautiful" code is concerned, it's subjective: I see nothing especially ugly about 'xxx' for text, and certainly don't find u'xxx' "more" beautiful - and I doubt if I'm the only person with that view. The point about the added cognitive burden of semantic-changing __future__ imports is, however, quite valid.

"As it stands, when chosing between 2.7 and Python 3.2, Python 3 is currently not the best choice for certain long-term investments, since the ecosystem is not yet properly developed, and libraries are still fighting with their API decisions for Python 3."

This looks to become a self-fulfilling prophecy, if you take it seriously. You would expect that, if Python 3 is the future of Python, then Python 3 is *precisely* the choice for *long*-term investments. The ecosystem is not yet fully developed, true: but that is because some people aren't ready to grasp the nettle and undergo the short-term pain required to get things in order. By "things", I mean places in existing 2.x code where no distinction was made between bytes and text, which you could get away with because of 2.x's forgiving nature. Whether you're using unicode_literals and 'xxx' or u'xxx', these things will need to be sorted out, and the syntax element is only one possible focus. If that entire sentence is removed, it does the PEP no harm, and the PEP will antagonise fewer people.

"A valid point is that this would encourage people to become dependent on Python 3.3 for their ports. Fortunately that is not a big problem since that could be fixed at installation time similar to how many projects are currently invoking 2to3 as part of their installation process."

Yes, but avoiding the very pain of running 2to3 is what (at least in part) motivates the PEP in the first place. This appears to be moving the pain that 2.x developers feel when trying to move to 3.x, to people who want to support 3.2 and 3.3 and 2.6+ in the same codebase.

"For Python 3.1 and Python 3.2 (even 3.0 if necessary) a simple on-installation hook could be provided that tokenizes all source files and strips away the otherwise unnecessary u prefix at installation time."

There's some confusion about this hook - the PEP calls it an on-installation hook (like 2to3) but Nick said it was an import-time hook. I'm more comfortable with the latter - it has a chance of providing acceptable performance for a large codebase, as it will only kick in when .py files are newer than their .pyc.
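
To make the hook discussion concrete, here is a minimal sketch of the prefix-stripping step, assuming it is built on the standard tokenize module. The name strip_u_prefix is hypothetical (the PEP does not define it), and the sketch relies on a tokenizer that already recognizes the u prefix, as in Python 3.3+; a hook actually running under 3.1 or 3.2 might have to detect the prefix differently.

```python
import io
import tokenize


def strip_u_prefix(source):
    """Return `source` with the u/U prefix removed from string literals.

    Hypothetical helper for illustration only. It works on tokens, so
    names such as `u` and the text *inside* other strings are untouched.
    """
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.STRING and tok.string[:1] in ("u", "U"):
            # Drop only the prefix character; the original start/end
            # positions are kept so untokenize preserves the spacing
            # between tokens.
            tok = tok._replace(string=tok.string[1:])
        tokens.append(tok)
    return tokenize.untokenize(tokens)


if __name__ == "__main__":
    print(strip_u_prefix("x = u'abc' + U\"def\"\n"), end="")
```

Whether something like this runs at installation time or at import time, the per-file work is the same; what differs is when, and how often, that cost is paid during development.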

A 2to3-like hook, when working with a large codebase like Django, is likely to be about as painful as people are finding 2to3 now (when used in an edit-test-edit-test workflow).

"Possible Downsides" does not mention any possible adverse impact on a single codebase for 3.2/3.3, which I mention only because it's still not clear how the hook which is to make 3.2 development easier will work (in terms of its impact on development workflow).

In the section on "Modernizing code":

"but to make strings cheap for both 2.x and 3.x it is nearly impossible. The way it currently works is by abusing the unicode-escape codec on Python 2.x native strings."

IIUC, the unicode-escape codec is only needed if you don't use unicode_literals - am I wrong about that? How are strings not equally cheap (near enough) on 2.x and 3.x if you use unicode_literals?

In the "Runtime overhead of wrappers" section, the times may be valid, but a rider should be added to the effect that in a realistic workload, the wrapper overhead will be somewhat diluted where wrapper calls are fairly infrequent (i.e. the unicode_literals and n() case). Of course, if the PEP is targeting Python 2.5 and earlier, where unicode_literals is not available, then it should say so.

I would say that the overall impression given by the PEP is that "the unicode_literals approach is not worth bothering with", and I do not find that to be true based on my own experience.

Regards,

Vinay Sajip
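
P.S. For concreteness, a minimal sketch of the two wrapper styles compared above. The names u and n come from the PEP text, but the bodies here are my assumptions, not code taken from the PEP or from any particular library; in particular, the latin-1 encoding in n() is just one plausible choice.

```python
from __future__ import unicode_literals

import sys

PY2 = sys.version_info[0] == 2

if PY2:
    def u(s):
        # Without unicode_literals, 'xxx' is a byte string on 2.x, so
        # every text literal must be decoded at runtime. This is the
        # unicode-escape codec use the PEP refers to.
        return s.decode("unicode_escape")

    def n(s):
        # With unicode_literals in effect, 'xxx' is already text; only
        # the few values passed to APIs that insist on native strings
        # need converting. The latin-1 encoding is an assumption.
        return s.encode("latin-1")
else:
    def u(s):
        # On 3.x, 'xxx' is already text.
        return s

    def n(s):
        # On 3.x, the native string type is text as well.
        return s


if __name__ == "__main__":
    # Because this module uses unicode_literals, only the value that
    # must be a native string needs wrapping.
    header_name = n("Content-Type")
    print(type(header_name).__name__)
```

With unicode_literals, ordinary text literals pay no wrapper cost on either version; only the n() call sites do.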