PEP 414 - Unicode Literals for Python 3
Hi, I just uploaded PEP 414 which proposes an optional 'u' prefix for string literals for Python 3. You can read the PEP online: http://www.python.org/dev/peps/pep-0414/ This is a followup to the discussion about this topic here on the mailing list and on twitter/IRC over the last few weeks. Regards, Armin
If this can encourage more projects to support Python 3 (even if it's only 3.3 and later) and hence improve adoption of Python 3, I'm all for it. A small quibble: I'd like to see a benchmark of a 'u' function implemented in C.
--Guido
On Sat, Feb 25, 2012 at 12:23 PM, Armin Ronacher <armin.ronacher@active-4.com> wrote:
Hi,
I just uploaded PEP 414 which proposes an optional 'u' prefix for string literals for Python 3.
You can read the PEP online: http://www.python.org/dev/peps/pep-0414/
This is a followup to the discussion about this topic here on the mailing list and on twitter/IRC over the last few weeks.
Regards, Armin
-- --Guido van Rossum (python.org/~guido)
On Sun, Feb 26, 2012 at 1:13 PM, Guido van Rossum <guido@python.org> wrote:
A small quibble: I'd like to see a benchmark of a 'u' function implemented in C.
Even if it was quite fast, I don't think such a function would bring the same benefits as restoring support for u'' literals.
Using myself as an example, my work projects (such as PulpDist [1]) are currently written to target Python 2.6, since that's the system Python on RHEL 6. As a web application, PulpDist has unicode literals *everywhere*, but (as Armin pointed out to me), turning on "from __future__ import unicode_literals" in every file would be incorrect, since many of them also include native strings (mostly related to attribute names and subprocess invocation, but probably a few WSGI related ones as well). The action-at-a-distance of that future import can also make the code hard to read and review (in particular, a diff doesn't tell you whether or not the future import is present in the original file).
It's going to be quite some time before I look at porting that code to Python 3, but, given the style of forward compatible code that I write (e.g. "print (X)", never "print X" or "print (X, Y)"; "except A as B:", never "except A, B:"), the lack of unicode literals in 3.x is the only significant sticking point I expect to encounter. If 3.3+ has Unicode literals, I expect that PulpDist *right now* would be awfully close to being source compatible (and any other discrepancies would just be simple fixes like adding conditional imports from new locations).
IIRC, I've previously opposed the restoration of unicode literals as a retrograde step. Looking at the implications for the future migration of PulpDist has changed my mind.
Regards, Nick.
[1] https://fedorahosted.org/pulpdist/
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
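As an illustration of the forward-compatible style Nick describes, here is a minimal sketch of code restricted to the common subset of Python 2.6+ and Python 3.3+. The function name `describe` and its messages are made up for the example; only the three conventions (u'' literals, single-argument print, "except A as B") come from the message above.

```python
def describe(value):
    """Return a text description of VALUE (hypothetical example function)."""
    try:
        number = int(value)
    except ValueError as exc:  # "except A as B:", never "except A, B:"
        # u'' literals are text on Python 2 and, with PEP 414, on 3.3+.
        return u"not a number: %s" % (exc,)
    return u"number %d" % number


# Single-argument print(X) behaves the same on both major versions.
print(describe(u"42"))
```

The same file would be rejected by Python 3.0 to 3.2, which do not accept the u prefix.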
Am 26.02.2012 07:06, schrieb Nick Coghlan:
On Sun, Feb 26, 2012 at 1:13 PM, Guido van Rossum <guido@python.org> wrote:
A small quibble: I'd like to see a benchmark of a 'u' function implemented in C.
Even if it was quite fast, I don't think such a function would bring the same benefits as restoring support for u'' literals.
You claim that, but your argument doesn't actually support that claim (or I fail to see the argument).
Using myself as an example, my work projects (such as PulpDist [1]) are currently written to target Python 2.6, since that's the system Python on RHEL 6. As a web application, PulpDist has unicode literals *everywhere*, but (as Armin pointed out to me), turning on "from __future__ import unicode_literals" in every file would be incorrect,
Right. So you shouldn't use the __future__ import, but the u() function.
IIRC, I've previously opposed the restoration of unicode literals as a retrograde step. Looking at the implications for the future migration of PulpDist has changed my mind.
Did you try to follow the path of the u() function? Regards, Martin
Martin v. Löwis wrote:
Am 26.02.2012 07:06, schrieb Nick Coghlan:
On Sun, Feb 26, 2012 at 1:13 PM, Guido van Rossum <guido@python.org> wrote:
A small quibble: I'd like to see a benchmark of a 'u' function implemented in C.
Even if it was quite fast, I don't think such a function would bring the same benefits as restoring support for u'' literals.
You claim that, but your argument doesn't actually support that claim (or I fail to see the argument).
Python 2.6 code: this = u'that'
Python 3.3 code: this = u('that')
Not source compatible, not elegant. (Even though 2to3 could make this fix, it's still kinda ugly.)
~Ethan~
On Mon, 27 Feb 2012 09:05:54 -0800, Ethan Furman <ethan@stoneleaf.us> wrote:
Martin v. Löwis wrote:
Am 26.02.2012 07:06, schrieb Nick Coghlan:
On Sun, Feb 26, 2012 at 1:13 PM, Guido van Rossum <guido@python.org> wrote:
A small quibble: I'd like to see a benchmark of a 'u' function implemented in C.
Even if it was quite fast, I don't think such a function would bring the same benefits as restoring support for u'' literals.
You claim that, but your argument doesn't actually support that claim (or I fail to see the argument).
Python 2.6 code: this = u'that'
Python 3.3 code: this = u('that')
Not source compatible, not elegant. (Even though 2to3 could make this fix, it's still kinda ugly.)
Eh? The 2.6 version would also be u('that'). That's the whole point of the idiom. You'll need a better counter argument than that. --David
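For readers unfamiliar with the idiom David refers to, a minimal sketch of a u() helper follows. It is modelled on the approach taken by compatibility libraries such as six; the exact body shown here is an assumption for illustration, not code quoted from the thread.

```python
import sys

if sys.version_info[0] >= 3:
    def u(s):
        # Python 3: an unprefixed literal is already text.
        return s
else:
    def u(s):
        # Python 2: decode the native byte string to unicode.
        # Assumes the literal contains only ASCII and escape sequences.
        return s.decode("unicode_escape")

# The same spelling is used on every supported version.
this = u('that')
```

With such a helper the source is identical on 2.x and 3.x, which is the point David makes; the cost is a function call wherever a literal used to be.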
On Mon, 2012-02-27 at 12:41 -0500, R. David Murray wrote:
On Mon, 27 Feb 2012 09:05:54 -0800, Ethan Furman <ethan@stoneleaf.us> wrote:
Martin v. Löwis wrote:
Am 26.02.2012 07:06, schrieb Nick Coghlan:
On Sun, Feb 26, 2012 at 1:13 PM, Guido van Rossum <guido@python.org> wrote:
A small quibble: I'd like to see a benchmark of a 'u' function implemented in C.
Even if it was quite fast, I don't think such a function would bring the same benefits as restoring support for u'' literals.
You claim that, but your argument doesn't actually support that claim (or I fail to see the argument).
Python 2.6 code: this = u'that'
Python 3.3 code: this = u('that')
Not source compatible, not elegant. (Even though 2to3 could make this fix, it's still kinda ugly.)
Eh? The 2.6 version would also be u('that'). That's the whole point of the idiom. You'll need a better counter argument than that.
The best argument is that there already exists tons and tons of Python 2 code that already does:
u'that'
Needing to change it to:
u('that')
1) Requires effort on the part of a from-Python-2-porter to service the aesthetic and populist goal of not having an explicit but redundant-under-Py3 literal syntax that says "this is text".
2) Won't actually meet the aesthetic goal, as it's uglier and slower under *both* Python 2 and Python 3.
So the populist argument remains.. "it's too confusing for people who learn Python 3 as a new language to have a redundant syntax". But we've had such a syntax in Python 2 for years with b'', and, as mentioned by Armin's PEP, single-quoted vs. triple-quoted strings forever.
I just don't understand the pushback here at all. This is such a nobrainer.
- C
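The "slower" claim can be checked locally. The sketch below times a u'' literal against a call to a pure-Python u() helper using the standard timeit module. It assumes Python 3.5 or later (for the `globals` argument) and a trivial identity helper; it does not measure the C implementation Guido asked about, and no particular result is asserted here.

```python
import timeit


def u(s):
    # Stand-in for a compatibility helper; identity on Python 3.
    return s


ITERATIONS = 200000

# Evaluating a literal that the compiler has already turned into a constant.
literal_time = timeit.timeit("u'that'", number=ITERATIONS)

# Evaluating a function call that returns the same value.
call_time = timeit.timeit("u('that')", globals={"u": u}, number=ITERATIONS)

print("literal: %.4fs  call: %.4fs" % (literal_time, call_time))
```

Absolute numbers depend on the interpreter and machine, so they are best read as a ratio.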
On Mon, Feb 27, 2012 at 10:01 AM, Chris McDonough <chrism@plope.com> wrote:
The best argument is that there already exists tons and tons of Python 2 code that already does:
u'that'
+1
Needing to change it to:
u('that')
1) Requires effort on the part of a from-Python-2-porter to service the aesthetic and populist goal of not having an explicit but redundant-under-Py3 literal syntax that says "this is text".
2) Won't actually meet the aesthetic goal, as it's uglier and slower under *both* Python 2 and Python 3.
So the populist argument remains.. "it's too confusing for people who learn Python 3 as a new language to have a redundant syntax". But we've had such a syntax in Python 2 for years with b'', and, as mentioned by Armin's PEP single-quoted vs. triple-quoted strings forever.
I just don't understand the pushback here at all. This is such a nobrainer.
I agree. Just let's start deprecating it too, so that once Python 2.x compatibility is no longer relevant we can eventually stop supporting it (though that may have to wait until Python 4...). We need to send *some* sort of signal that this is a compatibility hack and that no new code should use it. Maybe a SilentDeprecationWarning? -- --Guido van Rossum (python.org/~guido)
On Mon, 27 Feb 2012 10:17:57 -0800, Guido van Rossum <guido@python.org> wrote:
On Mon, Feb 27, 2012 at 10:01 AM, Chris McDonough <chrism@plope.com> wrote:
The best argument is that there already exists tons and tons of Python 2 code that already does:
u'that'
+1
Needing to change it to:
u('that')
1) Requires effort on the part of a from-Python-2-porter to service the aesthetic and populist goal of not having an explicit but redundant-under-Py3 literal syntax that says "this is text".
2) Won't actually meet the aesthetic goal, as it's uglier and slower under *both* Python 2 and Python 3.
So the populist argument remains.. "it's too confusing for people who learn Python 3 as a new language to have a redundant syntax". But we've had such a syntax in Python 2 for years with b'', and, as mentioned by Armin's PEP single-quoted vs. triple-quoted strings forever.
I just don't understand the pushback here at all. This is such a nobrainer.
It's obviously not a *no*-brainer or you wouldn't be getting pushback :) I view most of the pushback as people wanting to make sure all the options have been carefully considered. This should all be documented in the PEP.
I agree. Just let's start deprecating it too, so that once Python 2.x compatibility is no longer relevant we can eventually stop supporting it (though that may have to wait until Python 4...). We need to send *some* sort of signal that this is a compatibility hack and that no new code should use it. Maybe a SilentDeprecationWarning?
Isn't that what PendingDeprecationWarning is? This seems like the kind of use case it was introduced for (though it is less used now that DeprecationWarnings are silent by default). --David
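To make the category David mentions concrete, the sketch below raises and captures a PendingDeprecationWarning with the standard warnings module. The helper `warn_legacy_prefix` and its message are hypothetical; nothing here claims that the interpreter emits such a warning for u'' literals.

```python
import warnings


def warn_legacy_prefix():
    # Hypothetical helper: signals "this will be deprecated later".
    warnings.warn(
        "the u'' prefix is a compatibility aid",
        PendingDeprecationWarning,
        stacklevel=2,
    )


# The default filters hide this category, so enable it explicitly
# while recording what was raised.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_legacy_prefix()

categories = [w.category for w in caught]
print(categories)
```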
On 2/27/2012 1:17 PM, Guido van Rossum wrote:
On Mon, Feb 27, 2012 at 10:01 AM, Chris McDonough<chrism@plope.com> wrote:
The best argument is that there already exists tons and tons of Python 2 code that already does:
u'that'
+1
I just don't understand the pushback here at all. This is such a nobrainer.
I agree. Just let's start deprecating it too, so that once Python 2.x compatibility is no longer relevant we can eventually stop supporting it (though that may have to wait until Python 4...). We need to send *some* sort of signal that this is a compatibility hack and that no new code should use it. Maybe a SilentDeprecationWarning?
One possibility: leave Ref Man 2.4.1. *String and Bytes literals* as is. Add
'''
2.4.1.1 Deprecated u prefix.
To aid people who want to update Python 2 code to also run under Python 3, string literals may optionally be prefixed with "u" or "U". For this purpose, but only for this purpose, the grammar actually reads
stringprefix ::= "r" | "R" | "ur" | "Ur" | "uR" | "UR"
Since "u" and "U" will go away again some year, they should only be used for such multi-version code and not in code only intended for Python 3. See PEP 414.
Version added: 3.3
'''
I think the PEP should have exaggerated statements removed, perhaps be shortened, explain how to patch code on installation for 3.1/2, and have something at the top pointing to that explanation.
-- Terry Jan Reedy
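Which prefixes a given interpreter actually accepts can be probed directly rather than read off a grammar. The sketch below is an illustration only: the helper name `prefix_accepted` is made up, and the results depend on the Python version running it. Python 3.3 as released accepts "u" and "U" but not the combined "ur" forms listed in the draft wording above.

```python
import ast


def prefix_accepted(prefix):
    """Return True if PREFIX + "'x'" parses as a literal on this interpreter."""
    try:
        ast.literal_eval(prefix + "'x'")
    except SyntaxError:
        return False
    return True


# Report each candidate prefix for the running interpreter.
for prefix in ("", "u", "U", "r", "R", "ur", "UR", "b"):
    print("%-4r %s" % (prefix, prefix_accepted(prefix)))
```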
On 2/27/2012 1:17 PM, Guido van Rossum wrote:
I just don't understand the pushback here at all. This is such a nobrainer.
I agree. Just let's start deprecating it too, so that once Python 2.x compatibility is no longer relevant we can eventually stop supporting it (though that may have to wait until Python 4...). We need to send *some* sort of signal that this is a compatibility hack and that no new code should use it. Maybe a SilentDeprecationWarning?
Before we make this change, I would like to know if this is Armin's last proposal to revert Python 3 toward Python 2 or merely the first in a series. I question this because last December Armin wrote "And in my absolutely personal opinion Python 3.3/3.4 should be more like Python 2* and Python 2.8 should happen and be a bit more like Python 3." * he wrote '3' but obviously means '2'. http://lucumr.pocoo.org/2011/12/7/thoughts-on-python3/ Chris has also made it clear that he (also?) would like more reversions. -- Terry Jan Reedy
On Mon, 27 Feb 2012 16:54:51 -0500 Terry Reedy <tjreedy@udel.edu> wrote:
On 2/27/2012 1:17 PM, Guido van Rossum wrote:
I just don't understand the pushback here at all. This is such a nobrainer.
I agree. Just let's start deprecating it too, so that once Python 2.x compatibility is no longer relevant we can eventually stop supporting it (though that may have to wait until Python 4...). We need to send *some* sort of signal that this is a compatibility hack and that no new code should use it. Maybe a SilentDeprecationWarning?
Before we make this change, I would like to know if this is Armin's last proposal to revert Python 3 toward Python 2 or merely the first in a series. I question this because last December Armin wrote
"And in my absolutely personal opinion Python 3.3/3.4 should be more like Python 2* and Python 2.8 should happen and be a bit more like Python 3." * he wrote '3' but obviously means '2'. http://lucumr.pocoo.org/2011/12/7/thoughts-on-python3/
Chris has also made it clear that he (also?) would like more reversions.
Please. While I'm not strongly in favour of the PEP, this kind of argument is dishonest. Whatever Armin's secret wishes may be, his PEP should be judged on its own grounds. Thank you Antoine.
Well said Antoine.
--Guido van Rossum (sent from Android phone)
On Feb 27, 2012 2:03 PM, "Antoine Pitrou" <solipsis@pitrou.net> wrote:
On Mon, 27 Feb 2012 16:54:51 -0500 Terry Reedy <tjreedy@udel.edu> wrote:
On 2/27/2012 1:17 PM, Guido van Rossum wrote:
I just don't understand the pushback here at all. This is such a nobrainer.
I agree. Just let's start deprecating it too, so that once Python 2.x compatibility is no longer relevant we can eventually stop supporting it (though that may have to wait until Python 4...). We need to send *some* sort of signal that this is a compatibility hack and that no new code should use it. Maybe a SilentDeprecationWarning?
Before we make this change, I would like to know if this is Armin's last proposal to revert Python 3 toward Python 2 or merely the first in a series. I question this because last December Armin wrote
"And in my absolutely personal opinion Python 3.3/3.4 should be more like Python 2* and Python 2.8 should happen and be a bit more like Python 3." * he wrote '3' but obviously means '2'. http://lucumr.pocoo.org/2011/12/7/thoughts-on-python3/
Chris has also made it clear that he (also?) would like more reversions.
Please. While I'm not strongly in favour of the PEP, this kind of argument is dishonest. Whatever Armin's secret wishes may be, his PEP should be judged on its own grounds.
Thank you
Antoine.
Hi, On 2/27/12 9:54 PM, Terry Reedy wrote:
Before we make this change, I would like to know if this is Armin's last proposal to revert Python 3 toward Python 2 or merely the first in a series. I question this because last December Armin wrote
You're saying that as if providing a sane upgrade path was a bad thing. That said, if I had other proposals I would have submitted them *now* since waiting for another Python version to go by would not be helpful.
I only have myself to blame for providing that PEP now instead of earlier which would have been a lot more useful. Regards, Armin
On 2/27/2012 1:01 PM, Chris McDonough wrote:
On Mon, 2012-02-27 at 12:41 -0500, R. David Murray wrote:
Eh? The 2.6 version would also be u('that'). That's the whole point of the idiom. You'll need a better counter argument than that.
The best argument is that there already exists tons and tons of Python 2 code that already does:
u'that'
Needing to change it to:
u('that')
1) Requires effort on the part of a from-Python-2-porter to service the aesthetic and populist goal of not having an explicit but redundant-under-Py3 literal syntax that says "this is text".
This is a point, though this would be a one-time conversion by a 2to23 converter that would be part of other needed conversions, some by hand. I presume that most 2.6 code has problems other than u'' when attempting to run under 3.x.
2) Won't actually meet the aesthetic goal, as it's uglier and slower under *both* Python 2 and Python 3.
Less relevant. The minor ugliness would be in dual-version code, but not Python 3 itself.
So the populist argument remains.. "it's too confusing for people who learn Python 3 as a new language to have a redundant syntax". But we've had such a syntax in Python 2 for years with b'', and, as mentioned by Armin's PEP single-quoted vs. triple-quoted strings forever.
I just don't understand the pushback here at all.
For one thing, u'' does not solve the problem for 3.1 and 3.2, while u() does. 3.2 will be around for years. For one example, it will be in the April long-term-support release of Ubuntu. For another, PyPy is working on a 3.2 compatible version to come out and be put into use this year.
This is such a nobrainer.
I could claim that a solution that also works for 3.1 and 3.2 is a nobrainer. It depends on how one weighs different factors. -- Terry Jan Reedy
Terry Reedy <tjreedy <at> udel.edu> writes:
This is a point, though this would be a one-time conversion by a 2to23 converter that would be part of other needed conversions, some by hand. I presume that most 2.6 code has problems other than u'' when attempting to run under 3.x.
Right. In doing the Django port, the u() stuff took very little time - I wrote a lib2to3 fixer to do it. A lot more time was spent in areas where the bytes/text interfaces had not been thought through carefully, e.g. in the crypto/hashing stuff - this is stuff that an automatic tool couldn't do. After it was decided in the Django team to drop 2.5 support after Django 1.4 was released, the u('xxx') calls weren't needed any more. Another lib2to3 fixer converted them back to 'xxx' for use with "from __future__ import unicode_literals".
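The kind of mechanical rewrite Vinay describes can be sketched briefly. He used lib2to3 fixers; the version below instead uses the standard tokenize module, since lib2to3 is not available in every current Python. The function name `wrap_unicode_literals` is made up, and the sketch makes simplifying assumptions: it handles only single-line literals with a bare u or U prefix and leaves everything else untouched.

```python
import io
import tokenize


def wrap_unicode_literals(source):
    """Rewrite u'...' literals in SOURCE as u('...') calls (sketch only)."""
    lines = source.splitlines(keepends=True)
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    # Work backwards so earlier column offsets stay valid after each edit.
    for tok in reversed(tokens):
        if tok.type != tokenize.STRING or tok.string[0] not in "uU":
            continue
        (start_row, start_col), (end_row, end_col) = tok.start, tok.end
        if start_row != end_row:
            continue  # multi-line literals are skipped in this sketch
        line = lines[start_row - 1]
        wrapped = "u(" + tok.string[1:] + ")"
        lines[start_row - 1] = line[:start_col] + wrapped + line[end_col:]
    return "".join(lines)


print(wrap_unicode_literals("this = u'that'\n"), end="")
# prints: this = u('that')
```

Because the rewrite works on tokens rather than raw text, a u'...' sequence inside a comment or inside another string is not changed. The reverse conversion, u('xxx') back to 'xxx', needs a parser-level tool because it has to match call expressions.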
2) Won't actually meet the aesthetic goal, as it's uglier and slower under *both* Python 2 and Python 3.
Less relevant. The minor ugliness would be in dual-version code, but not Python 3 itself.
And it would be reasonably easy to transition from u('xxx') -> 'xxx' when support for 2.5 is dropped by a particular project, again using automation via a lib2to3 fixer.
I could claim that a solution that also works for 3.1 and 3.2 is a nobrainer. It depends on how one weighs different factors.
Yes. I feel the same way as Martin and Barry have expressed - it's a shame that people are talking up the potential difficulties of porting to a single code-base without the PEP change. Having been in the trenches with the Django port, I don't feel that the Unicode literal part was really a major problem. And I've now done *two* Django ports - one to a 2.5-compatible codebase with u('xxx'), and one to a 2.6+ compatible codebase with unicode_literals and plain 'xxx'. I'm only keeping the latter one up to date with changes in Django trunk, but both ports, though far from complete from a whole-project point of view, got to the point where they passed the very large test suite. On balance, though, I don't oppose the PEP. We can wish all we want for people to do the right thing (as we see it), but wishing don't make it so. Do I sense a certain amount of worry about the pace of the 2.x -> 3.x transition? It feels like we're blinking first ;-) Regards, Vinay Sajip
On Mon, 2012-02-27 at 13:44 -0500, Terry Reedy wrote:
On 2/27/2012 1:01 PM, Chris McDonough wrote:
On Mon, 2012-02-27 at 12:41 -0500, R. David Murray wrote:
Eh? The 2.6 version would also be u('that'). That's the whole point of the idiom. You'll need a better counter argument than that.
The best argument is that there already exists tons and tons of Python 2 code that already does:
u'that'
Needing to change it to:
u('that')
1) Requires effort on the part of a from-Python-2-porter to service the aesthetic and populist goal of not having an explicit but redundant-under-Py3 literal syntax that says "this is text".
This is a point, though this would be a one-time conversion by a 2to23 converter that would be part of other needed conversions, some by hand. I presume that most 2.6 code has problems other than u'' when attempting to run under 3.x.
2) Won't actually meet the aesthetic goal, as it's uglier and slower under *both* Python 2 and Python 3.
Less relevant. The minor ugliness would be in dual-version code, but not Python 3 itself.
So the populist argument remains.. "it's too confusing for people who learn Python 3 as a new language to have a redundant syntax". But we've had such a syntax in Python 2 for years with b'', and, as mentioned by Armin's PEP single-quoted vs. triple-quoted strings forever.
I just don't understand the pushback here at all.
For one thing, u'' does not solve the problem for 3.1 and 3.2, while u() does. 3.2 will be around for years. For one example, it will be in the April long-term-support release of Ubuntu. For another, PyPy is working on a 3.2 compatible version to come out and be put into use this year.
I suspect not everyone lives and dies by OS distribution release support policies. Many folks are both willing and capable to install a newer Python on an older OS. It's unfortunate that Python 3 < 3.3 does not have the syntax, and people like me who have a long-term need to "straddle" are to blame; we didn't provide useful feedback early enough to avoid the mistake. That said, it seems like preventing a reintroduction of u'' literal syntax would presume that two wrongs make a right. By our own schedule estimate of Python 3 takeup, many people won't be even thinking about porting any Python 2 code to 3 until years from now.
This is such a nobrainer.
I could claim that a solution that also works for 3.1 and 3.2 is a nobrainer. It depends on how one weighs different factors.
An argument for the reintroduction of u'' literal syntax in Python >= 3.3 is not necessarily an argument against the utility of some automated tool conversion support for porting a Python 2 app to a function-based u() syntax so it can run in Python 3 < 3.3. Tools like "2to23" or whatever can obviously be parameterized to emit slightly different 3.2-compatible and 3.3-compatible code. It's almost certain that it will need forward-version-aware modes like this anyway as newer idioms are added to 3.X that make code prettier or more efficient completely independent of u'' support.
Currently we handle 3.2 compatibility in packages that "straddle" via six-like functions. We can continue doing this as necessary. If the stdlib tooling helps, great. In an emit-function-based-syntax mode, the conversion code would almost certainly need to rely on the import of an externally downloadable module like six, for compatibility under both Python 2 and 3 because there's no opportunity to go back in time and make "u()" available for older releases unless it was like inlined in every module during the conversion.
But if somebody only wants to target 3.3+, and it means they don't have to rely on a six-like module to provide u(), great.
- C
Chris McDonough <chrism <at> plope.com> writes:
I suspect not everyone lives and dies by OS distribution release support policies. Many folks are both willing and capable to install a newer Python on an older OS.
But many folks aren't, and lament the slow pace of Python version adoption on e.g. Red Hat and CentOS.
It's unfortunate that Python 3 < 3.3 does not have the syntax, and people like me who have a long-term need to "straddle" are to blame; we didn't provide useful feedback early enough to avoid the mistake. That said, it seems like preventing a reintroduction of u'' literal syntax would presume that two wrongs make a right. By our own schedule estimate of Python 3 takeup, many people won't be even thinking about porting any Python 2 code to 3 until years from now.
If the lack of u'' literal is what's holding them back, that's germane to the discussion of the PEP. If it's not, then why propose the PEP?
An argument for the reintroduction of u'' literal syntax in Python >= 3.3 is not necessarily an argument against the utility of some automated tool conversion support for porting a Python 2 app to a function-based u() syntax so it can run in Python 3 < 3.3.
I thought the argument was more about backtracking (or not) from Python 3's design decision to use 'xxx' for text and b'yyy' for bytes. That's the only "wrong" we're talking about for this PEP, right?
Currently we handle 3.2 compatibility in packages that "straddle" via six-like functions. We can continue doing this as necessary. If the stdlib tooling helps, great. In an emit-function-based-syntax mode, the conversion code would almost certainly need to rely on the import of an externally downloadable module like six, for compatibility under both Python 2 and 3 because there's no opportunity to go back in time and make "u()" available for older releases unless it was like inlined in every module during the conversion.
But if somebody only wants to target 3.3+, and it means they don't have to rely on a six-like module to provide u(), great.
If you only need to straddle from 2.6 onwards, then u('') isn't an issue at all, right now, is it? If you need to straddle from 2.5 downwards, there are other issues to be addressed, like exception syntax, 'with' and so forth - so making u'' available doesn't make the port a no-brainer. And if you bite the bullet and decide to do the port anyway, converting u'' to u('') won't be a problem unless you (a) can't use a fixer to automate the conversion or (b) the function call overhead cannot be borne. I'm not sure either of those objections (can't use fixer, call overhead excessive) have been made with sufficient force (i.e., data) in the discussion so far. Regards, Vinay Sajip
On Mon, 2012-02-27 at 20:18 +0000, Vinay Sajip wrote:
Chris McDonough <chrism <at> plope.com> writes:
I suspect not everyone lives and dies by OS distribution release support policies. Many folks are both willing and capable to install a newer Python on an older OS.
But many folks aren't, and lament the slow pace of Python version adoption on e.g. Red Hat and CentOS.
It's great to have software that installs easily. That said, the versions of Python that my software supports are (and have to be) my choice. As far as I can tell, there are maybe three or four people (besides me) using my software on Python 3 right now. They have it pretty rough: lackluster library support and they have to constantly mentally transliterate third-party example code to code that works under Python 3. They are troopers! None of them would so much as bat an eyelash if I told them today they had to use Python 3.3 (if it existed in a final released form anyway) to use my software. It's just a minor drop in the bucket of inconvenience they have to currently withstand.
It's unfortunate that Python 3 < 3.3 does not have the syntax, and people like me who have a long-term need to "straddle" are to blame; we didn't provide useful feedback early enough to avoid the mistake. That said, it seems like preventing a reintroduction of u'' literal syntax would presume that two wrongs make a right. By our own schedule estimate of Python 3 takeup, many people won't be even thinking about porting any Python 2 code to 3 until years from now.
If the lack of u'' literal is what's holding them back, that's germane to the discussion of the PEP. If it's not, then why propose the PEP?
Like I said in an earlier email, u'' literal support is by no means the only issue for people who want to straddle. But it *is* an issue, and it's incredibly low-hanging fruit with near-zero real-world impact if it is reintroduced.
An argument for the reintroduction of u'' literal syntax in Python >= 3.3 is not necessarily an argument against the utility of some automated tool conversion support for porting a Python 2 app to a function-based u() syntax so it can run in Python 3 < 3.3.
I thought the argument was more about backtracking (or not) from Python 3's design decision to use 'xxx' for text and b'yyy' for bytes. That's the only "wrong" we're talking about for this PEP, right?
You cast it as "backtracking" to reintroduce the syntax, but things have changed from when the decision to omit it was first made. Its omission introduces pain in a world where it's expected that we don't use 2to3 to automatically translate code at installation time.
Currently we handle 3.2 compatibility in packages that "straddle" via six-like functions. We can continue doing this as necessary. If the stdlib tooling helps, great. In an emit-function-based-syntax mode, the conversion code would almost certainly need to rely on the import of an externally downloadable module like six, for compatibility under both Python 2 and 3 because there's no opportunity to go back in time and make "u()" available for older releases unless it was like inlined in every module during the conversion.
But if somebody only wants to target 3.3+, and it means they don't have to rely on a six-like module to provide u(), great.
If you only need to straddle from 2.6 onwards, then u('') isn't an issue at all, right now, is it?
If you look at a piece of code as something that exists in one of the two states "ported" or "not-ported", sure. But code often needs to be changed, and people of varying buy-in levels need to understand and change such code. It's just much easier for them to assume that the same syntax works on some versions of Python 2 and Python 3 and be done with it rather than need to explain the introduction of a function that only exists to paper over a syntax omission.
If you need to straddle from 2.5 downwards, there are other issues to be addressed, like exception syntax, 'with' and so forth - so making u'' available doesn't make the port a no-brainer. And if you bite the bullet and decide to do the port anyway, converting u'' to u('') won't be a problem unless you (a) can't use a fixer to automate the conversion or (b) the function call overhead cannot be borne. I'm not sure either of those objections (can't use fixer, call overhead excessive) have been made with sufficient force (i.e., data) in the discussion so far.
Regards,
Vinay Sajip
Chris McDonough <chrism <at> plope.com> writes:
It's great to have software that installs easily. That said, the versions of Python that my software supports are (and have to be) my choice.
Of course. And if I understand correctly, that's 2.6, 2.7, 3.2 and later versions. I'll ignore 2.5 and earlier in this specific reply.
None of them would so much as bat an eyelash if I told them today they had to use Python 3.3 (if it existed in a final released form anyway) to use my software. It's just a minor drop in the bucket of inconvenience they have to currently withstand.
Their pain (lacklustre library support and transliterating examples from 2.x to 3.x) would be the same under 3.2 and 3.3 (unless for some perverse reason people only made libraries work under one of 3.2 and 3.3, but not both). Is it really that hard to transliterate 2.x examples to 3.x in the literal-string dimension? I can't believe it is, as the target audience is programmers.
If the lack of u'' literal is what's holding them back, that's germane to the discussion of the PEP. If it's not, then why propose the PEP?
Like I said in an earlier email, u'' literal support is by no means the only issue for people who want to straddle. But it *is* an issue, and it's incredibly low-hanging fruit with near-zero real-world impact if it is reintroduced.
But the implication of the PEP is that lack of u'' support is a major hindrance to porting, justifying the production of the PEP and this discussion. And it's not low-hanging fruit with near-zero real-world impact if we're going to deprecate it at some point (which Guido was talking about) - you're just moving the pain to a later date, unless we don't ever deprecate. I feel, like some others, that 'xxx' is natural for text, u'xxx' is inelegant by comparison, and u('xxx') a little more inelegant still. However, allowing u'' syntax in 3.3 as per this PEP, but allowing it to be optional, allows any combination of u'xxx' and 'xxx' in code in a 3.x context, which doesn't seem to me to be an ideal situation, especially if you have hit-and-run contributors who are not necessarily attuned to project conventions.
You cast it as "backtracking" to reintroduce the syntax, but things have changed from when the decision to omit it was first made. Its omission introduces pain in a world where it's expected that we don't use 2to3 to automatically translate code at installation time.
I'm calling it like it is. "reintroduce" in this case means undoing something already done, so it's appropriate to say "backtracking". I don't agree that things have changed. If I want to write code that works on 2.x and 3.x without the pain of running 2to3 after every change, and I'm only interested in supporting >= 2.6 (your situation, IIUC), then I use "from __future__ import unicode_literals" - that's what it was created for, wasn't it? - and use 'xxx' where I need text, b'xxx' where I need bytes, and a function to deliver native strings where they're needed. If I have a 2.x project full of u'' code which I need to bring into this approach, then I run 2to3, review what it tells me, and make the changes necessary (as far as literals go, that's adding the unicode_literals import to all files, and converting u'xxx' -> 'xxx'). When I test the result, I will find numerous failures, some of which point to places where I should have used native strings (e.g. kwargs keys), which I then fix. Other areas will be where I needed to use bytes (e.g. encoding/decoding/hashing), which I will also fix. I use six or a similar approach to sort out any other issues which crop up, e.g. metaclass syntax, execfile, and so on. After a relatively modest amount of work, I have a codebase that works on 2.x and 3.x, and all I have to remember is that 'xxx' is Unicode, and if I create a new module, I need to add the future import (on the assumption that I might add literal strings later, if not now). After that, it seems to be plain sailing, and I don't have to switch mental gears re. string literals.
If you look at a piece of code as something that exists in one of the two states "ported" or "not-ported", sure. But code often needs to be changed, and people of varying buy-in levels need to understand and change such code. It's just much easier for them to assume that the same syntax works on some versions of Python 2 and Python 3 and be done with it rather than need to explain the introduction of a function that only exists to paper over a syntax omission.
Well, according to the approach I described above, that one thing needs to be the present 3.x syntax - 'xxx' is text, b'xxx' is bytes, and f('xxx') is native string (or whatever name you want instead of f). With the unicode_literals import, that syntax works on 2.6+ and 3.2+, so ISTM it should work within the constraints you mentioned for your software. Regards, Vinay Sajip
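A minimal sketch of the convention described above, for readers who want to see it spelled out. The helper name native() is an assumption standing in for f(), and the ASCII default encoding is likewise an assumption; neither comes from the thread.

```python
from __future__ import unicode_literals

import sys

PY2 = sys.version_info[0] == 2


def native(text, encoding="ascii"):
    """Return text as the interpreter's native str type.

    Hypothetical helper standing in for f() in the message above.
    """
    if PY2 and not isinstance(text, str):
        # Python 2: unicode -> byte str, which is the native type there.
        return text.encode(encoding)
    # Python 3: str is already the native (text) type.
    return text


greeting = 'hello'           # text on both 2.6+ and 3.x
payload = b'\x00\x01'        # bytes on both
attr_name = native('upper')  # native string, e.g. for getattr()

result = getattr(greeting, attr_name)()
```

With this layout only the native-string call sites carry extra markup; ordinary text and bytes literals read the same on both lines.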
On Mon, 2012-02-27 at 21:43 +0000, Vinay Sajip wrote:
Chris McDonough <chrism <at> plope.com> writes:
It's great to have software that installs easily. That said, the versions of Python that my software supports are (and have to be) my choice.
Of course. And if I understand correctly, that's 2.6, 2.7, 3.2 and later versions. I'll ignore 2.5 and earlier in this specific reply.
None of them would so much as bat an eyelash if I told them today they had to use Python 3.3 (if it existed in a final released form anyway) to use my software. It's just a minor drop in the bucket of inconvenience they have to currently withstand.
Their pain (lacklustre library support and transliterating examples from 2.x to 3.x) would be the same under 3.2 and 3.3 (unless for some perverse reason people only made libraries work under one of 3.2 and 3.3, but not both).
If I had it to do all over again and a Python 3.X with unicode literals had been available, I might not have targeted Python 3.2 at all. I don't consider that perverse, I just consider it "Python 3 water under the bridge". Python 3.0 and 3.1 were this for me; I paid almost no attention to them at all. Python 3.2 will be that thing for many other people.
Like I said in an earlier email, u'' literal support is by no means the only issue for people who want to straddle. But it *is* an issue, and it's incredibly low-hanging fruit with near-zero real-world impact if it is reintroduced.
But the implication of the PEP is that lack of u'' support is a major hindrance to porting, justifying the production of the PEP and this discussion. And it's not low-hanging fruit with near-zero real-world impact if we're going to deprecate it at some point (which Guido was talking about) - you're just moving the pain to a later date, unless we don't ever deprecate.
I personally see no need to deprecate. I can't conceive of an actual downside to eternal backwards compatibility here. All the arguments for its omission presume that there's some enormous untapped market full of people yearning for its omission who would be either horrified to see u'' or who would not understand it on some fundamental level. I don't think such a market actually exists. However, there *is* a huge market for people who already understand it instinctively.
I feel, like some others, that 'xxx' is natural for text, u'xxx' is inelegant by comparison, and u('xxx') a little more inelegant still.
Yes, the aesthetics argument seems to be the remaining argument. I have no problem with the aesthetics of u'' myself. But I have no problem with the aesthetics of u('') for that matter either; if it had been used as the prevailing style to declare something being text in Python 2 and it had been omitted I'd be arguing for that instead. But it wasn't, of course. Anyway. I think I'm done doing the respond-point-for-point thing; it's becoming diminishing returns. - C
On Feb 27, 2012, at 09:43 PM, Vinay Sajip wrote:
Well, according to the approach I described above, that one thing needs to be the present 3.x syntax - 'xxx' is text, b'xxx' is bytes, and f('xxx') is native string (or whatever name you want instead of f). With the unicode_literals import, that syntax works on 2.6+ and 3.2+, so ISTM it should work within the constraints you mentioned for your software.
I agree, this works for me and it's what I do in all my code now. Strings adorned with u-prefixes just look unnatural, and there's no confusion that unadorned strings mean "unicode". And yes, I have had to use str('') occasionally to mean "native strings", but it's so rare and constant cost that I didn't even think twice about it after I discovered this trick. But it seems like this is just not an acceptable solution for proponents of the PEP. Given that the above is the most generally accepted way to spell these things in the Python versions we care about today (>= 2.6, 3.2), at the very least, the PEP needs to be rewritten to make it clear why the above is unacceptable. That's the only way IMO that the PEP can be judged on its own merits. (I'll concede for the sake of argument that 2to3 is unacceptable. I also think it's unnecessary though.) Cheers, -Barry
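The str('') trick mentioned above might look like the following sketch. The names title, class_name and Point are made up for illustration, and the use of type() as an example of an API that wants a native string on Python 2 is my assumption, not something stated in the thread.

```python
from __future__ import unicode_literals

# Unadorned literal: unicode on 2.6+ (because of the import), str on 3.x.
title = 'PEP 414'

# str('...') yields the native string type on either line:
# a byte str on Python 2, a text str on Python 3.
# On Python 2 this relies on the literal being pure ASCII.
class_name = str('Point')

# Build a class dynamically, passing native strings for its name and attributes.
Point = type(class_name, (object,), {str('x'): 0, str('y'): 0})
```

The cost is one str() call per native-string site, which matches the "rare and constant cost" description above.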
On Mon, 27 Feb 2012 14:50:21 -0500, Chris McDonough <chrism@plope.com> wrote:
Currently we handle 3.2 compatibility in packages that "straddle" via six-like functions. We can continue doing this as necessary. If the
It seems to me that this undermines your argument in favor of u''. Why can't you just continue to do the above for 3.3 and beyond? Frankly, *I'm* not worried about the uptake pace of Python3. It feels to me like it is pretty much on schedule, if not ahead of it. But to repeat, I'm not voting -1 here, I'm playing devil's advocate. --David
On Mon, 2012-02-27 at 15:23 -0500, R. David Murray wrote:
On Mon, 27 Feb 2012 14:50:21 -0500, Chris McDonough <chrism@plope.com> wrote:
Currently we handle 3.2 compatibility in packages that "straddle" via six-like functions. We can continue doing this as necessary. If the
It seems to me that this undermines your argument in favor of u''. Why can't you just continue to do the above for 3.3 and beyond?
I really don't know how long I'll need to do future development in the subset language of Python 2 and Python 3 because I can't predict the future. It could be two years, it might be five. Who knows. But I do know that I'm going to be developing in the subset of Python that currently runs on Python 2 >= 2.6 and Python 3 >= 3.2 for at least a year. And that will suck, because that language is a much less fun language in which to develop than either Python 2 or Python 3. Frankly, it's a pretty bad language. If we make this change now, it means a year from now I'll be able to develop in a slightly less sucky subset language if I choose to drop support for 3.2. And people who don't try to support Python 3 at all til then will never have to program in the suckiest subset like I will have had to. Note that u'' literals are sort of the tip of the iceberg here; supporting them will obviously not make development under the subset an order of magnitude less sucky, just a tiny little bit less sucky. There are other extremely annoying things, like str(bytes) returning the repr of a bytestring on Python 3. That's almost as irritating as the absence of u'' literals, but we have to evaluate one thing at a time. - C
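The str(bytes) annoyance mentioned above can be shown in a few lines. The helper to_text() and its UTF-8 default are assumptions added for illustration; they are one possible workaround, not something proposed in the thread.

```python
data = b'abc'

# On Python 2, str(data) is just 'abc' (bytes is str there).
# On Python 3, str(data) is the repr of the bytes object instead.
as_str = str(data)


def to_text(value, encoding="utf-8"):
    """Hypothetical helper: decode bytes explicitly, pass text through."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


decoded = to_text(data)
```

Straddling code therefore cannot use str() as a generic "make this text" call and has to decode explicitly.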
Chris McDonough <chrism <at> plope.com> writes:
I really don't know how long I'll need to do future development in the subset language of Python 2 and Python 3 because I can't predict the future. It could be two years, it might be five. Who knows.
But I do know that I'm going to be developing in the subset of Python that currently runs on Python 2 >= 2.6 and Python 3 >= 3.2 for at least a year. And that will suck, because that language is a much less fun language in which to develop than either Python 2 or Python 3. Frankly, it's a pretty bad language.
What exactly is it that makes it so bad? Since you're developing for >= 2.6, what stops you from using "from __future__ import unicode_literals" and 'xxx' for text and b'yyy' for bytes? Then you would be working in essentially Python 3.x, at least as far as string literals go. The conversion time will be very small compared to the year time-frame you're talking about.
If we make this change now, it means a year from now I'll be able to develop in a slightly less sucky subset language if I choose to drop support for 3.2. And people who don't try to support Python 3 at all til then will never have to program in the suckiest subset like I will have had to.
And if we don't make the change now and you change your code to use unicode_literals, convert u'xxx' -> 'xxx' and then change the places where you really meant to use bytes, that'll be a one-off change after which you will be working on a common codebase which works on 2.6+ and 3.0+, and as far as string literals are concerned you'll be working in the hopefully non-sucky 3.x syntax.
Note that u'' literals are sort of the tip of the iceberg here; supporting them will obviously not make development under the subset an order of magnitude less sucky, just a tiny little bit less sucky. There are other extremely annoying things, like str(bytes) returning the repr of a bytestring on Python 3. That's almost as irritating as the absence of u'' literals, but we have to evaluate one thing at a time.
Yes, but making a backward step like reintroducing u'' just to make things a tiny little bit less sucky doesn't seem to me to be worth it, because then >= 3.3 is different to 3.2 and earlier. Armin's suggestion of an install-time fixer is analogous to running 2to3 after every change, if you're trying to support 3.2 and 3.3+ at the same time, isn't it? You can't just edit-and-test, which to me is the main benefit of a single codebase. Regards, Vinay Sajip
On Mon, 2012-02-27 at 21:03 +0000, Vinay Sajip wrote:
Chris McDonough <chrism <at> plope.com> writes:
I really don't know how long I'll need to do future development in the subset language of Python 2 and Python 3 because I can't predict the future. It could be two years, it might be five. Who knows.
But I do know that I'm going to be developing in the subset of Python that currently runs on Python 2 >= 2.6 and Python 3 >= 3.2 for at least a year. And that will suck, because that language is a much less fun language in which to develop than either Python 2 or Python 3. Frankly, it's a pretty bad language.
What exactly is it that makes it so bad? Since you're developing for >= 2.6, what stops you from using "from __future__ import unicode_literals" and 'xxx' for text and b'yyy' for bytes? Then you would be working in essentially Python 3.x, at least as far as string literals go. The conversion time will be very small compared to the year time-frame you're talking about.
If we make this change now, it means a year from now I'll be able to develop in a slightly less sucky subset language if I choose to drop support for 3.2. And people who don't try to support Python 3 at all til then will never have to program in the suckiest subset like I will have had to.
And if we don't make the change now and you change your code to use unicode_literals, convert u'xxx' -> 'xxx' and then change the places where you really meant to use bytes, that'll be a one-off change after which you will be working on a common codebase which works on 2.6+ and 3.0+, and as far as string literals are concerned you'll be working in the hopefully non-sucky 3.x syntax.
Note that u'' literals are sort of the tip of the iceberg here; supporting them will obviously not make development under the subset an order of magnitude less sucky, just a tiny little bit less sucky. There are other extremely annoying things, like str(bytes) returning the repr of a bytestring on Python 3. That's almost as irritating as the absence of u'' literals, but we have to evaluate one thing at a time.
Yes, but making a backward step like reintroducing u'' just to make things a tiny little bit less sucky doesn't seem to me to be worth it, because then >= 3.3 is different to 3.2 and earlier. Armin's suggestion of an install-time fixer is analogous to running 2to3 after every change, if you're trying to support 3.2 and 3.3+ at the same time, isn't it? You can't just edit-and-test, which to me is the main benefit of a single codebase.
The downsides of a unicode_literals future import are spelled out in the PEP: http://www.python.org/dev/peps/pep-0414/#rationale-and-goals - C
On Mon, 27 Feb 2012 16:16:39 -0500, Chris McDonough <chrism@plope.com> wrote:
On Mon, 2012-02-27 at 21:03 +0000, Vinay Sajip wrote:
Yes, but making a backward step like reintroducing u'' just to make things a tiny little bit less sucky doesn't seem to me to be worth it, because then >= 3.3 is different to 3.2 and earlier. Armin's suggestion of an install-time fixer is analogous to running 2to3 after every change, if you're trying to support 3.2 and 3.3+ at the same time, isn't it? You can't just edit-and-test, which to me is the main benefit of a single codebase.
The downsides of a unicode_literals future import are spelled out in the PEP:
http://www.python.org/dev/peps/pep-0414/#rationale-and-goals
But the PEP doesn't address the unicode_literals plus str() approach. That is, the rationale currently makes a false claim. --David
On 28.02.12 00:11, Armin Ronacher wrote:
On 2/27/12 9:58 PM, R. David Murray wrote:
But the PEP doesn't address the unicode_literals plus str() approach. That is, the rationale currently makes a false claim.
Which would be exactly what that u() does not do?
No.
1. u() is trivial for Python 3 and relatively expensive (and doubtful for non-ASCII literals) for Python 2; unicode_literals plus str() is trivial for Python 3 and cheap for Python 2.
2. Text strings are natural and prevalent, but "native" strings are domain-specific and archaic.
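To illustrate point 1, here is a sketch of what a six-style u() shim can look like. This is a simplified illustration written for this note, not the actual code of six or of any other library; the timing call is only there to show how one might measure the per-call overhead, and no result is claimed.

```python
import sys
import timeit

if sys.version_info[0] >= 3:
    def u(s):
        # Python 3: the literal is already text, so this is an identity call.
        return s
else:
    def u(s):
        # Python 2: decode the byte string at run time, on every call.
        # Non-ASCII characters have to be written as escapes for this to work.
        return s.decode("unicode_escape")

label = u('caf\u00e9')

# Rough way to measure the call overhead on a given interpreter.
elapsed = timeit.timeit(lambda: u('xxx'), number=10000)
```

On Python 2 the byte-string literal keeps the backslash sequence as written and the decode step turns it into the accented character, which is the run-time cost and the non-ASCII caveat referred to above.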
On 2/27/12 9:58 PM, R. David Murray wrote:
But the PEP doesn't address the unicode_literals plus str() approach. That is, the rationale currently makes a false claim.
Which would be exactly what that u() does not do?
Armin, I propose that you correct the *factual* deficits of the PEP (i.e. remove all claims that cannot be supported by facts, or are otherwise incorrect or misleading). Many readers here would be more open to accepting the PEP if it was factual rather than polemic. The PEP author is supposed to collect all arguments, even the ones he doesn't agree with, and refute them. In this specific issue, the PEP states "the unicode_literals import the native string type is no longer available and has to be incorrectly labeled as bytestring" This is incorrect: even though the native string type indeed is no longer available, it is *not* consequential that it has to be labeled as byte string. Instead, you can use the str() function. It may be that you don't like that solution for some reason. If so, please mention the approach in the PEP, along with your reason for not liking it. Regards, Martin
On 28.2.2012 01:16, martin@v.loewis.de wrote:
Armin, I propose that you correct the *factual* deficits of the PEP
He cannot, because he would have to throw away the whole PEP ... it is all based on the nonsensical concept of a "native string". There is no such animal (there are only strings and bytes, although they are incorrectly named Unicode strings and strings in Python 2), and the whole PEP is just "I don't like Python 3 and I want it to be reverted back to Python 2". It doesn't matter anymore now, but I just needed to get it off my chest. Matěj
On Tue, Feb 28, 2012 at 5:56 PM, Matej Cepl <mcepl@redhat.com> wrote:
He cannot, because he would have to throw away the whole PEP ... it is all based on the nonsensical concept of a "native string". There is no such animal (there are only strings and bytes, although they are incorrectly named Unicode strings and strings in Python 2), and the whole PEP is just "I don't like Python 3 and I want it to be reverted back to Python 2".
It doesn't matter anymore now, but I just needed to get it off my chest.
If you don't know what a native string is, then you need to study more to understand why Armin's PEP exists and why it is useful. I suggest starting with PEP 3333 (the WSGI update to v1.0.1 that first clearly defined the concept of a native string: http://www.python.org/dev/peps/pep-3333/#a-note-on-string-types). There are concrete, practical reasons why the lack of Unicode literals in Python 3 makes porting harder than it needs to be. Are they insurmountable? No, of course not - there are plenty of successful ports already that demonstrate porting is quite feasible with existing tools. But the existing approaches require that, in order to be forward compatible with Python 3, a program must be made *worse* in Python 2 (i.e. harder to read and harder to write correctly for someone that hasn't learned Python 3 yet). Restoring unicode literal support in 3.3 is a pragmatic step that allows a lot of code to *just work* on Python 3. Most 2.6+ code that still doesn't work on Python 3 even after this change will be made *better* (or at least not made substantially worse) by the additional changes necessary for forward compatibility. Unicode literals are somewhat unique in their impact on porting efforts, as they show up *everywhere* in Unicode correct code in Python 2. The diffs that will be needed to correctly tag bytestrings in such code under Python 2 are tiny compared to those that would be needed to strip the u"" prefixes. Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
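For concreteness, this is the kind of source the PEP is meant to keep working unchanged. The variable names and values are invented for this sketch.

```python
# Valid on Python 2.x and, with PEP 414, on Python 3.3+.
# On Python 3.0 to 3.2 the u prefix is a SyntaxError.
name = u'Zo\xeb'    # text; on 3.3+ the prefix is accepted and has no effect
plain = 'Zo\xeb'    # native string: bytes on 2.x, text on 3.x
raw = b'\x00\xff'   # bytes; the b prefix is the tagging that may need adding
```

Under the PEP, existing u'' literals stay as they are, and the porting diff is limited to the places where a b prefix has to be added.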
On Tue, 28 Feb 2012 21:42:54 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:
But the existing approaches require that, in order to be forward compatible with Python 3, a program must be made *worse* in Python 2 (i.e. harder to read and harder to write correctly for someone that hasn't learned Python 3 yet).
Wrong. The separate branches approach allows you to have a clean Python 3 codebase without crippling the Python 2 codebase. Of course that approach was downplayed from the start in favour of using 2to3 on a single codebase, and now we discover that this approach is cumbersome. Note that 2to3 is actually helpful when you choose the dual branches approach, and it isn't a serial dependency in that case. (see https://bitbucket.org/pitrou/t3k/) Regards Antoine.
On Tue, Feb 28, 2012 at 9:52 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Tue, 28 Feb 2012 21:42:54 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:
But the existing approaches require that, in order to be forward compatible with Python 3, a program must be made *worse* in Python 2 (i.e. harder to read and harder to write correctly for someone that hasn't learned Python 3 yet).
Wrong. The separate branches approach allows you to have a clean Python 3 codebase without crippling the Python 2 codebase. Of course that approach was downplayed from the start in favour of using 2to3 on a single codebase, and now we discover that this approach is cumbersome.
If you're using separate branches, then your Python 2 code isn't being made forward compatible with Python 3. Yes, it avoids making your Python 2 code uglier, but it means maintaining two branches in parallel until you drop Python 2 support. You've once again raised the barrier to entry: either people contribute two patches, or they accept that their patch may languish until someone else writes the patch for the other version. Again, as with 2to3, that approach obviously *works* (we've done it ourselves for years with the standard library), but it's hardly a low friction approach to porting. That's all PEP 414 is about - lowering the friction of porting to Python 3. Is it *necessary*? No, there are already enough successful ports to prove that, if sufficiently motivated, porting to Python 3 is feasible with the current toolset. However, that's the wrong question. The right question is "Does PEP 414 make porting substantially *easier*, by significantly reducing the volume of code that needs to change in order to attain Python 3 compatibility?". And the answer to *that* question is "Absolutely." Porting the web frameworks themselves to Python 3 is only the first step in migrating those ecosystems to Python 3, and because the web APIs exposed by those frameworks are so heavily Unicode based this is an issue that will hit pretty much every Python web app and library on the planet. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Tuesday 28 February 2012 at 22:14 +1000, Nick Coghlan wrote:
If you're using separate branches, then your Python 2 code isn't being made forward compatible with Python 3. Yes, it avoids making your Python 2 code uglier, but it means maintaining two branches in parallel until you drop Python 2 support.
IMO, maintaining two branches shouldn't be much more work than maintaining hacks so that a single codebase works with two different programming languages.
You've once again raised the barrier to entry: either people contribute two patches, or they accept that their patch may languish until someone else writes the patch for the other version.
Again that's wrong. If you cleverly use 2to3 to port between branches, patches only have to be written against the 2.x version. Regards Antoine.
On 28 February 2012 13:19, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Tuesday 28 February 2012 at 22:14 +1000, Nick Coghlan wrote:
If you're using separate branches, then your Python 2 code isn't being made forward compatible with Python 3. Yes, it avoids making your Python 2 code uglier, but it means maintaining two branches in parallel until you drop Python 2 support.
IMO, maintaining two branches shouldn't be much more work than maintaining hacks so that a single codebase works with two different programming languages.
Would that mean distributing 2 separate tarballs? How would tools such as easy_install and pip work in respect of that? Is there a naming convention they can rely on? --- Giampaolo http://code.google.com/p/pyftpdlib/ http://code.google.com/p/psutil/ http://code.google.com/p/pysendfile/
On 28/02/2012 13:48, Giampaolo Rodolà wrote:
On 28 February 2012 13:19, Antoine Pitrou <solipsis@pitrou.net> wrote:
IMO, maintaining two branches shouldn't be much more work than maintaining hacks so that a single codebase works with two different programming languages.
Would that mean distributing 2 separate tarballs? How would tools such as easy_install and pip work in respect of that? Is there a naming convention they can rely on?
Sadly, PyPI and the packaging tools don’t play nice with non-single-codebase projects, so you have to use a different name for your 3.x-compatible release, like “unittestpy3k”. Some bdists include the Python version in the file name, but sdists don’t. Regards
On Tue, Feb 28, 2012 at 10:19 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Tuesday 28 February 2012 at 22:14 +1000, Nick Coghlan wrote:
If you're using separate branches, then your Python 2 code isn't being made forward compatible with Python 3. Yes, it avoids making your Python 2 code uglier, but it means maintaining two branches in parallel until you drop Python 2 support.
IMO, maintaining two branches shouldn't be much more work than maintaining hacks so that a single codebase works with two different programming languages.
Aside from the unicode literal problem, I find that the Python 2.6+/3.2+ subset is still a fairly nice language for an application level web program. Most of the rest of the bytes/text ugliness is hidden away below the framework layer where folks like Chris, Armin and Jacob have to deal with it, but it doesn't affect me as a framework user.
You've once again raised the barrier to entry: either people contribute two patches, or they accept that their patch may languish until someone else writes the patch for the other version.
Again that's wrong. If you cleverly use 2to3 to port between branches, patches only have to be written against the 2.x version.
Apparently *you* know how to do that, but I don't. If I, as a CPython core developer, don't know how to do that, how is it reasonable to expect J. Random Hacker to become a Python 3 porting expert? PEP 414 is all about lowering the barrier to entry for successful Python 3 ports. OK, fine, some very clever people have invested a lot of time in finding ways to deal with the status quo that make it less painful. That doesn't mean it isn't painful - it just means the early adopters have steeled themselves against the pain and learned to suck it up and cope. Now that we've discovered some of the key sources of pain, we can live with a few pragmatic concessions in the purity of Python 3's language definition to ease the transition for the vast number of Python 3 ports which have yet to begin. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Feb 28, 2012, at 10:49 PM, Nick Coghlan wrote:
On Tue, Feb 28, 2012 at 10:19 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Again that's wrong. If you cleverly use 2to3 to port between branches, patches only have to be written against the 2.x version.
Apparently *you* know how to do that, but I don't. If I, as a CPython core developer, don't know how to do that, how is it reasonable to expect J. Random Hacker to become a Python 3 porting expert?
They don't need to, but *we* do, and it's incumbent on us to educate our users. I strongly believe that *now* is the time to be porting to Python 3. It's critical to the long-term health of Python. It's up to us to learn the strategies for accomplishing this, spread the message that it is not only possible, but usually easy (and yes even, from my own experience, fun!). Oh and here's how in three easy steps, 1, 2, 3. I've blogged about my own porting experiences extensively. My strategies may not work for everyone, but they will work for a great many projects. If they work for yours, spread the word. If they don't, figure out something better, write about it, and spread the word. We really need to stop saying that porting to Python 3 is hard, or should be delayed. It's not in the vast majority of cases. Yes, there are warts, and we should continue to improve Python 3 so it gets easier, but by no means is it impossible for most code to be working very nicely on Python 3 today. -Barry
On 28/02/2012 14.19, Antoine Pitrou wrote:
On Tuesday 28 February 2012 at 22:14 +1000, Nick Coghlan wrote:
If you're using separate branches, then your Python 2 code isn't being made forward compatible with Python 3. Yes, it avoids making your Python 2 code uglier, but it means maintaining two branches in parallel until you drop Python 2 support.
IMO, maintaining two branches shouldn't be much more work than maintaining hacks so that a single codebase works with two different programming languages.
+10 For every CPython bug that I fix I first apply the patch on 2.7, then on 3.2 and then on 3.3. Most of the time I don't even need to change anything while applying the patch to 3.2, sometimes I have to do some trivial fixes. This is also true for another personal 12kloc project* where I'm using the two-branches approach. For me, the costs of having two branches are: 1) a one-time conversion when the Python3-compatible branch is created (can be done easily with 2to3); 2) merging the fix I apply to the Python2 branch (and with modern DVCS this is not really an issue). With the shared code base approach, the costs are: 1) a one-time conversion to "fix" the code base and make it run on both 2.x and 3.x; 2) keep using and having to deal with hacks in order to keep it running. With the first approach, you also have two clean and separate code bases, with no hacks; when you stop using Python 2, you end up with a clean Python 3 branch. The one-time conversion also seems easier in the first case. (Note: there are also other costs -- e.g. releasing -- that I haven't considered because they don't affect me personally, but I'm not sure they are big enough to make the two-branches approach worse.)
You've once again raised the barrier to entry: either people contribute two patches, or they accept that their patch may languish until someone else writes the patch for the other version.
Again that's wrong. If you cleverly use 2to3 to port between branches, patches only have to be written against the 2.x version.
After the initial conversion of the code base, the fixes are mostly trivial, so people don't need to write two patches (most of the patches we get for CPython are either against 2.7 or 3.2, and sometimes they even apply cleanly to both). Using 2to3 to generate the 3.x code automatically for every change applied to the 2.x branch (or to convert everything when a new package is installed) sounds wrong to me. I wouldn't trust generated code even if 2to3 was a better tool. That said, I successfully used the shared code base approach with print_function, unicode_literals, and a couple of try/except for the imports for a few one-file scripts (for 2.7/3.2) that I wrote recently. TL;DR the two-branches approach usually works better (at least IME) than the shared code base approach, doesn't necessarily require more work, and doesn't need ugly hacks to work. * in this case all the string literals I had were already text (rather than bytes) and even without using unicode_literals they worked out of the box when I moved the code to 3.x. There was however a place where it didn't work, and that turned out to be a bug even in Python 2 because I was mixing bytes and text. Best Regards, Ezio Melotti
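The shared code base recipe mentioned above for one-file scripts (print_function, unicode_literals, and try/except around imports) might look roughly like this. The choice of urlparse as the module that moved between 2.x and 3.x is an assumption made for the example; the original scripts are not shown in the thread.

```python
from __future__ import print_function, unicode_literals

try:
    # Python 3 location
    from urllib.parse import urlparse
except ImportError:
    # Python 2 location
    from urlparse import urlparse

parts = urlparse('http://www.python.org/dev/peps/pep-0414/')
print(parts.netloc, parts.path)
```

For a small script the compatibility code is confined to the first few lines, which may be why the approach was found workable at that scale.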
Regards
Antoine.
On 28 February 2012 15:20, Ezio Melotti <ezio.melotti@gmail.com> wrote:
On 28/02/2012 14.19, Antoine Pitrou wrote:
On Tuesday 28 February 2012 at 22:14 +1000, Nick Coghlan wrote:
If you're using separate branches, then your Python 2 code isn't being made forward compatible with Python 3. Yes, it avoids making your Python 2 code uglier, but it means maintaining two branches in parallel until you drop Python 2 support.
IMO, maintaining two branches shouldn't be much more work than maintaining hacks so that a single codebase works with two different programming languages.
+10
For every CPython bug that I fix I first apply the patch on 2.7, then on 3.2 and then on 3.3. Most of the time I don't even need to change anything while applying the patch to 3.2, sometimes I have to do some trivial fixes. This is also true for another personal 12kloc project* where I'm using the two-branches approach.
For me, the costs of having two branches are: 1) a one-time conversion when the Python3-compatible branch is created (can be done easily with 2to3); 2) merging the fix I apply to the Python2 branch (and with modern DVCS this is not really an issue).
With the shared code base approach, the costs are: 1) a one-time conversion to "fix" the code base and make it run on both 2.x and 3.x; 2) keep using and having to deal with hacks in order to keep it running.
With the first approach, you also have two clean and separate code bases, with no hacks; when you stop using Python 2, you end up with a clean Python 3 branch. The one-time conversion also seems easier in the first case.
(Note: there are also other costs -- e.g. releasing -- that I haven't considered because they don't affect me personally, but I'm not sure they are big enough to make the two-branches approach worse.)
They are. With that kind of approach you're basically forced to include the python version number as part of the tarball name (e.g. foo-0.3.1-py2.tar.gz and foo-0.3.1-py3.tar.gz). Just to name one, that means "foo" can't be installed via pip/easy_install. Regards, --- Giampaolo http://code.google.com/p/pyftpdlib/ http://code.google.com/p/psutil/ http://code.google.com/p/pysendfile/
On Tue, Feb 28, 2012 at 16:30, Giampaolo Rodolà <g.rodola@gmail.com> wrote:
On 28 February 2012 15:20, Ezio Melotti <ezio.melotti@gmail.com> wrote:
(Note: there are also other costs -- e.g. releasing -- that I haven't considered because they don't affect me personally, but I'm not sure they are big enough to make the two-branches approach worse.)
They are. With that kind of approach you're basically forced to include the python version number as part of the tarball name (e.g. foo-0.3.1-py2.tar.gz and foo-0.3.1-py3.tar.gz).
Not at all. You can include both code bases in one package. http://python3porting.com/2to3.html#distributing-packages //Lennart
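One way to picture "both code bases in one package" is a setup script that picks a source tree by interpreter version. This is only a minimal sketch, not taken from the linked chapter: the directory names `src2` and `src3` and the helper `source_dir_for` are hypothetical.

```python
import sys


def source_dir_for(version_info):
    """Return the (hypothetical) source directory for this interpreter."""
    # Assumption: the 2.x tree lives in src2/, the 3.x tree in src3/.
    return "src3" if version_info[0] >= 3 else "src2"


# In a real setup.py the result would be passed on, for example as
# setup(..., package_dir={"": source_dir_for(sys.version_info)}).
chosen = source_dir_for(sys.version_info)
print(chosen)
```

With this layout a single source tarball (foo-0.3.1.tar.gz) can carry both trees, so the version number does not need to appear in the file name.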
Ezio Melotti <ezio.melotti <at> gmail.com> writes:
For every CPython bug that I fix I first apply the patch on 2.7, then on 3.2 and then on 3.3. Most of the time I don't even need to change anything while applying the patch to 3.2, sometimes I have to do some trivial fixes. This is also true for another personal 12kloc project* where I'm using the two-branches approach.
I hear what you say about the personal project, but IMO CPython is atypical (as far as this discussion is concerned), not least because it's not a pure-Python project.
For me, the costs of having two branches are: 1) a one-time conversion when the Python3-compatible branch is created (can be done easily with 2to3);
Yes, but the amount of ease is project-dependent. For example, 2to3 wraps values() method calls with list(), which is a reasonable thing to do for dicts; when presented with Django's querysets, which have a values() method which should not be wrapped, then you have to go through and sort things out. I'm not knocking 2to3, which I think is great. Just that things go well sometimes, and less well at other times.
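The values() problem can be illustrated without Django. The class below is a hypothetical stand-in for a queryset-like object (it is not Django's API); it only shows why a mechanical list() wrap, which is harmless for dicts, breaks method chaining on objects whose values() returns something lazy.

```python
class LazyRows(object):
    """Hypothetical stand-in for a queryset-like object (not Django's API)."""

    def __init__(self, rows):
        self._rows = rows

    def values(self):
        # Returns another lazy object that can still be refined.
        return LazyRows([dict(r) for r in self._rows])

    def filter(self, **conditions):
        kept = [r for r in self._rows
                if all(r.get(k) == v for k, v in conditions.items())]
        return LazyRows(kept)

    def __iter__(self):
        return iter(self._rows)


rows = LazyRows([{"name": "a", "ok": True}, {"name": "b", "ok": False}])

# Original code: chaining works because values() returns a LazyRows.
chained = rows.values().filter(ok=True)

# After a mechanical list() wrap: a plain list, which has no filter().
wrapped = list(rows.values())
print(hasattr(chained, "filter"), hasattr(wrapped, "filter"))
```

Each such call site has to be reviewed by hand, which is the project-dependent cost described above.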
With the shared code base approach, the costs are: 1) a one-time conversion to "fix" the code base and make it run on both 2.x and 3.x; 2) keep using and having to deal with hacks in order to keep it running.
Which hacks do you mean, if you're only interested in 2.6+?
With the first approach, you also have two clean and separate code bases, with no hacks; when you stop using Python 2, you end up with a clean Python 3 branch. The one-time conversion also seems easier in the first case.
(Note: there are also other costs -- e.g. releasing -- that I haven't considered because they don't affect me personally, but I'm not sure they are big enough to make the two-branches approach worse.)
I don't believe there's a one-size-fits-all. The two branches approach is appealing, and I have no quarrel with it: but I contend that big projects like Django would be reluctant to switch, or take much longer to switch to 3.x, if they had to maintain separate branches. Given the size of their user community, they have to follow strict release procedures, which (even with just running on 2.x) smaller projects can be more relaxed about.
You forgot to mention the part which is most time-consuming day-to-day: making changes and testing. For the two-branch approach, it's
1. Change on 2.x
2. Test on 2.x
3. Commit on 2.x
4. Merge to 3.x
5. Possibly change on 3.x
6. Test on 3.x
7. Commit on 3.x
where each "test" step, if failures occur, might take you back to a previous "change" step. For the single codebase, that's
1. Change
2. Test on 2.x
3. Test on 3.x
4. Commit
This, to me, is the single big advantage of the single codebase approach, and the productivity improvements outweigh code purity issues which are, in the grand scheme of things, not all that large.
Another advantage is DRY: you don't have to worry about forgetting to merge some changes from 2.x to 3.x. Haven't we all been there one time or another? I know I have, though I try not to make a habit of it ;-)
After the initial conversion of the code base, the fixes are mostly trivial, so people don't need to write two patches (most of the patches we get for CPython are either against 2.7 or 3.2, and sometimes they even apply cleanly to both).
Fixes may be trivial, but new features might not always be so. Regards, Vinay Sajip
On 28/02/2012 18.08, Vinay Sajip wrote:
Ezio Melotti <ezio.melotti <at> gmail.com> writes:
For every CPython bug that I fix I first apply the patch on 2.7, then on 3.2 and then on 3.3. Most of the time I don't even need to change anything while applying the patch to 3.2, sometimes I have to do some trivial fixes. This is also true for another personal 12kloc project* where I'm using the two-branches approach.
I hear what you say about the personal project, but IMO CPython is atypical (as far as this discussion is concerned), not least because it's not a pure-Python project.
Most of the things I fix are pure Python, I wasn't considering the C patches and doc fixes here.
For me, the costs of having two branches are: 1) a one-time conversion when the Python3-compatible branch is created (can be done easily with 2to3);
Yes, but the amount of ease is project-dependent. For example, 2to3 wraps values() method calls with list(), which is a reasonable thing to do for dicts; when presented with Django's querysets, which have a values() method which should not be wrapped, then you have to go through and sort things out. I'm not knocking 2to3, which I think is great. Just that things go well sometimes, and less well at other times.
With the personal project this is what I did:
1) make a separate branch;
2) run 2to3 and let it overwrite the file;
3) review the changes as I would do with any other patch before committing;
4) fix things that 2to3 missed and other minor glitches;
5) fix a few bugs that surfaced after the port (and were in the original code too).
The fixes made by 2to3 were mostly:
* removing u'' from strings;
* renaming imports and methods (like .iteritems);
* adding 'as' in the "except"s;
* adding () for a few "print"s.
These changes affected about 500 lines of code (out of 12kloc). The changes I did manually after running 2to3 were (some were not strictly necessary):
* removing 'object' from classes;
* removing ord() in a few places;
* removing the content of super(...);
* replacing codecs.open() with open();
* removing a few .decode('utf-8');
* adding a couple of b''.
After a couple of days almost everything was working fine.
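For readers who have not done such a port, the kinds of change listed above look roughly like this on the 3.x side. This is an illustrative sketch with made-up names (`Base`, `Child`, `first_byte`, `read_text`), not code from the project in question; the 2.x spelling is given in the comments.

```python
class Base:                       # 2.x: class Base(object):
    def greet(self):
        return "hello"


class Child(Base):
    def greet(self):
        # 2.x: super(Child, self).greet()
        return super().greet() + " world"


def first_byte(data):
    # Indexing bytes gives an int on 3.x, so 2.x's ord(data[0]) becomes data[0].
    return data[0]


def read_text(path):
    # 2.x: codecs.open(path, encoding="utf-8")
    with open(path, encoding="utf-8") as f:
        return f.read()


counts = {"a": 1, "b": 2}
total = sum(v for _, v in counts.items())    # 2.x: counts.iteritems()
try:
    counts["missing"]
except KeyError as exc:                      # 2.x: except KeyError, exc:
    print("missing key:", exc)               # 2.x: print statement
```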
With the shared code base approach, the costs are: 1) a one-time conversion to "fix" the code base and make it run on both 2.x and 3.x; 2) keep using and having to deal with hacks in order to keep it running. Which hacks do you mean, if you're only interested in 2.6+?
Things like try/except for names that changed and wrappers for bytes/strings. Of course the situation is worse for projects that have to support earlier versions.
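A minimal sketch of the kind of hacks meant here, assuming a code base that targets 2.6+ and 3.x at once. The wrapper names `b` and `u` and the alias `text_type` are hypothetical; real projects spell them in various ways.

```python
import sys

PY3 = sys.version_info[0] >= 3

# try/except for a module that was renamed between 2.x and 3.x
try:
    from urllib.parse import urlparse      # 3.x location
except ImportError:
    from urlparse import urlparse          # 2.x location

# try/except for a builtin name that disappeared in 3.x
try:
    text_type = unicode                    # 2.x
except NameError:
    text_type = str                        # 3.x


def b(literal):
    """Hypothetical wrapper: turn a Latin-1 str literal into bytes."""
    return literal.encode("latin-1") if PY3 else literal


def u(literal):
    """Hypothetical wrapper: turn a str literal into text."""
    return literal if PY3 else literal.decode("unicode_escape")
```

None of this is needed in a branch that targets only one major version, which is the cost being weighed in this sub-thread.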
With the first approach, you also have two clean and separate code bases, with no hacks; when you stop using Python 2, you end up with a clean Python 3 branch. The one-time conversion also seems easier in the first case.
(Note: there are also other costs -- e.g. releasing -- that I haven't considered because they don't affect me personally, but I'm not sure they are big enough to make the two-branches approach worse.) I don't believe there's a one-size-fits-all. The two branches approach is appealing, and I have no quarrel with it: but I contend that big projects like Django would be reluctant to switch, or take much longer to switch to 3.x, if they had to maintain separate branches.
I would actually feel safer doing the port in a separate branch and keeping it there. Changing all the code in the main branch just to make it work for 3.x too doesn't strike me as a really good idea.
Given the size of their user community, they have to follow strict release procedures, which (even with just running on 2.x) smaller projects can be more relaxed about.
I don't have much experience regarding releases, but developing on a separate branch shouldn't affect the release of the 2.x version. The developers will have to merge the changes to the py3 branch too, and eventually they will be able to ship an additional release for py3 too. Sure, there's more work for the developers, but that's no news.
You forgot to mention the part which is most time-consuming day-to-day: making changes and testing. For the two-branch approach, it's
1. Change on 2.x
2. Test on 2.x
3. Commit on 2.x
4. Merge to 3.x
5. Possibly change on 3.x
6. Test on 3.x
7. Commit on 3.x
where each "test" step, if failures occur, might take you back to a previous "change" step.
For the single codebase, that's
1. Change
2. Test on 2.x
3. Test on 3.x
4. Commit
And if something fails here, you will have to repeat both steps 2 and 3, until you get it right for both at the same time. Step 1 of the single codebase is in the end more or less equivalent to steps 1+4+5, just in a different way. The remaining extra commit takes no time, and since the branches are independent, if you find a problem with py3 you don't have to run the test suite for 2.x again. In my experience with CPython, the most time-consuming part is making the patch work on one of the branches in the first place. Once it works, porting it to the other branches is just a mechanical step that doesn't really take much. The problems during the porting arise when the two codebases have diverged. (Also keep in mind that we are not actually merging from 2.x to 3.x in CPython, otherwise it would be even easier.)
This, to me, is the single big advantage of the single codebase approach, and the productivity improvements outweigh code purity issues which are, in the grand scheme of things, not all that large.
ISTM that the amount of time is pretty much the same, so I don't see this as a point in favor of the single codebase approach. I might be wrong (I don't have much experience with the single codebase approach), but having to deal with 2+ branches never bothered me (I might be biased though, since I was already used to maintaining 3-4 branches with Python).
Another advantage is DRY: you don't have to worry about forgetting to merge some changes from 2.x to 3.x. Haven't we all been there one time or another? I know I have, though I try not to make a habit of it ;-)
I don't think it has ever happened to me, but I see how this could happen, especially in the first period after the second branch is introduced. Your DVCS should warn you about this though, so, at worst, you'll end up having to merge someone else's commit.
After the initial conversion of the code base, the fixes are mostly trivial, so people don't need to write two patches (most of the patches we get for CPython are either against 2.7 or 3.2, and sometimes they even apply cleanly to both).
Fixes may be trivial, but new features might not always be so.
True, but especially if the feature is complicated, I would rather spend a bit more time and have two clean, separate versions than a single version that tries to work on both. Best Regards, Ezio Melotti
Regards,
Vinay Sajip
28.02.12 14:14, Nick Coghlan wrote:
However, that's the wrong question. The right question is "Does PEP 414 make porting substantially *easier*, by significantly reducing the volume of code that needs to change in order to attain Python 3 compatibility?".
Another pertinent question: "What are the disadvantages if PEP 414 is adopted?"
Serhiy Storchaka <storchaka <at> gmail.com> writes:
Another pertinent question: "What are the disadvantages if PEP 414 is adopted?"
It's moot, but as I see it: the purpose of PEP 414 is to facilitate a single codebase across 2.x and 3.x. However, it only does this if your 3.x interest is 3.3+. If you also want to or need to support 3.0 - 3.2, it makes your workflow more painful, because you can't run tests on 2.x or 3.3 and then run them on 3.2 without an intermediate source conversion step - just like the 2to3 step that people find painful when it's part of maintenance workflow, and which in part prompted the PEP in the first place. Regards, Vinay Sajip
Vinay Sajip wrote:
Serhiy Storchaka <storchaka <at> gmail.com> writes:
Another pertinent question: "What are the disadvantages if PEP 414 is adopted?"
It's moot, but as I see it: the purpose of PEP 414 is to facilitate a single codebase across 2.x and 3.x. However, it only does this if your 3.x interest is 3.3+. If you also want to or need to support 3.0 - 3.2, it makes your workflow more painful, because you can't run tests on 2.x or 3.3 and then run them on 3.2 without an intermediate source conversion step - just like the 2to3 step that people find painful when it's part of maintenance workflow, and which in part prompted the PEP in the first place.
I don't think it's fair to say it makes it *more* painful. Fair to say it doesn't make it less painful, but adding u'' to 3.3+ doesn't make it harder to port from 2.x to 3.1+. You're merely no better off with it than without it. Aside: in my opinion, people shouldn't actively support 3.0, or at least not advertise support for it, as it was end-of-lifed on the release of 3.1. As I see it, it is best to pretend that 3.0 never existed :) -- Steven
Steven D'Aprano <steve <at> pearwood.info> writes:
I don't think it's fair to say it makes it *more* painful. Fair to say it doesn't make it less painful, but adding u'' to 3.3+ doesn't make it harder to port from 2.x to 3.1+. You're merely no better off with it than without it.
No, it actually does make it *more* painful in some scenarios. Let's say Django decides to move to 3.x using a single codebase starting with 3.3, using PEP 414 to avoid changing u'xxx' in their source code. This is dandy for 3.3, and say I have to work with Django on 2.6, 2.7 and 3.3. Great - I make some changes, I run tests on 2.x, 3.3 - make changes as needed to fix failures, then commit. And on to the next set of changes. Now, suppose I also need to support 3.2, in addition to the other versions. I don't get the same easy workflow I had before: for 3.2, I have to run Armin's hook to remove the u'' prefixes between making changes and running tests, *every time*, but the output will be written to a separate directory, and I may have to maintain a separate test environment there in terms of test data files etc. It's exactly the complaint the PEP makes about having to have 2to3 in the workflow, and how that hurts your productivity! Though the experience may differ in degree because Armin's tool is faster, it's not going to make for a seamless workflow. Especially not if it has to run over all the files in the Django codebase. And if it's going to know only which files have changed and run only on those, how does it propose to do that, independently of my editing tools? Regards, Vinay Sajip
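To make the extra conversion step concrete, here is a rough sketch of what a prefix-stripping pass could look like. It is not Armin's hook; `strip_u_prefixes` is a hypothetical helper built on the standard tokenize module, and it assumes it is run under an interpreter whose tokenizer accepts the u'' prefix (3.3+), with the output then handed to 3.2.

```python
import io
import tokenize


def strip_u_prefixes(source):
    """Return source text with u''/U'' string prefixes removed."""
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.STRING and tok.string[:1] in ("u", "U"):
            # Drop only the prefix; positions are left untouched so that
            # untokenize() keeps the original spacing around the literal.
            tok = tok._replace(string=tok.string[1:])
        result.append(tok)
    return tokenize.untokenize(result)


print(strip_u_prefixes("greeting = u'hello'\n"))
```

Even a pass this small has to be run over every changed file and its output kept somewhere, which is the workflow cost described above.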
In http://mail.python.org/pipermail/python-dev/2012-February/117070.html Vinay Sajip wrote:
It's moot, but as I see it: the purpose of PEP 414 is to facilitate a single codebase across 2.x and 3.x. However, it only does this if your 3.x interest is 3.3+
For many people -- particularly those who haven't ported yet -- 3.x will mean 3.3+. There are some who will support 3.2 because it is a LTS release on some distribution, just as there were some who supported Python 1.5 (but not 1.6) long into the 2.x cycle, but I expect them to be the minority. I certainly don't expect 3.2 to remain a primary development target, the way that 2.7 is. IIRC, the only ways to use 3.2 even today are: (a) Make an explicit choice to use something other than the default (b) Download directly and choose 3.x without OS support (c) Use Arch Linux These are the sort of people who can be expected to upgrade. Now also remember that we're talking specifically about projects that have *not* been ported to 3.x (==> no existing users to support), and that won't be ported until 3.2 is already in maintenance mode.
If you also want to or need to support 3.0 - 3.2, it makes your workflow more painful,
Compared to dropping 3.2, yes. Compared to supporting 3.2 today? I don't see how.
because you can't run tests on 2.x or 3.3 and then run them on 3.2 without an intermediate source conversion step - just like the 2to3 step that people find painful when it's part of maintenance workflow, and which in part prompted the PEP in the first place.
So the only differences compared to today are that: (a) Fewer branches are after the auto-conversion. (b) No "current" branches are after the auto-conversion. (c) The auto-conversion is much more limited in scope. -jJ -- If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ
On Feb 28, 2012, at 10:59 AM, Jim J. Jewett wrote:
For many people -- particularly those who haven't ported yet -- 3.x will mean 3.3+. There are some who will support 3.2 because it is a LTS release on some distribution, just as there were some who supported Python 1.5 (but not 1.6) long into the 2.x cycle, but I expect them to be the minority.
I certainly don't expect 3.2 to remain a primary development target, the way that 2.7 is. IIRC, the only ways to use 3.2 even today are:
(a) Make an explicit choice to use something other than the default (b) Download directly and choose 3.x without OS support (c) Use Arch Linux
On Debian and Ubuntu, installing Python 3.2 is easy, even if it isn't the default. However, once installed, 'python3' is Python 3.2. I personally think Python 3.2 makes for a fine platform for new code, and just as good for porting most existing libraries and applications to. You can get many Python 3.2 compatible packages from the Debian and Ubuntu archives by using the normal installation procedures, and generally, if there is a 'python-foo' package, the Python 3.2 compatible version will be called 'python3-foo'. I would expect other Linux distros to be in generally the same boat. There's a lot already available, and this will definitely increase over time. Although on Ubuntu we'll be planning future developments at UDS in May, I would expect Ubuntu 12.10 to have Python 3.3 (probably in addition to Python 3.2 since we can do that easily), and looking ahead at the expected Python release schedule, I'm expecting our next LTS in 2014 (Ubuntu 14.04) will probably ship with Python 3.4, either with or without the earlier Python 3 versions. So I think if you're starting a new project, write it in Python 3 and target Python 3.2. The only reason not to do that is if some critical part of your dependency stack hasn't yet been ported, and in that case, help them get there! IME, most are grateful for a patch or branch that added Python 3 support.
These are the sort of people who can be expected to upgrade.
Now also remember that we're talking specifically about projects that have *not* been ported to 3.x (==> no existing users to support), and that won't be ported until 3.2 is already in maintenance mode.
I really hope most people won't wait. Sure, the big frameworks by their nature are going to have more inertia, but if you are the author of a Python library, you can and should port *now* and target Python 3.2. Only this way will we as a community be able to build up the dependency stack so that when the large frameworks are ready, your library which they may depend on, will have a long and stable history on Python 3. -Barry
On Tue, Feb 28, 2012 at 16:39, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
Serhiy Storchaka <storchaka <at> gmail.com> writes:
Another pertinent question: "What are the disadvantages if PEP 414 is adopted?"
It's moot, but as I see it: the purpose of PEP 414 is to facilitate a single codebase across 2.x and 3.x.
The bytes/native/unicode issue is an issue even if you use 2to3. But of course that *is* a form of "single codebase" so maybe that's what you meant. :-) //Lennart
All the various strategies for supporting Python 2 and Python 3, as well as their various drawbacks and ways around them, are covered in my book, chapter 2. :-) http://python3porting.com/strategies.html I may be too late to point this out, but it feels like this discussion could have been shorter if everyone had read this first. :-) //Lennart
Antoine Pitrou <solipsis <at> pitrou.net> writes:
Wrong. The separate branches approach allows you to have a clean Python 3 codebase without crippling the Python 2 codebase.
There may be warts in a single codebase (you usually can't have something for nothing), but it's not necessarily *crippled* when running in 2.x. Of course two branches allow you to have a no-compromise approach for the code style, but you might pay for that in time spent doing merges etc.
Note that 2to3 is actually helpful when you choose the dual branches approach, and it isn't a serial dependency in that case. (see https://bitbucket.org/pitrou/t3k/)
Yes, 2to3 is very useful when doing an initial porting exercise. I've used it just once in each port I've done. It also works well for a single codebase approach, only I just follow its advice rather than letting it do the conversion automatically. Regards, Vinay Sajip
Nick Coghlan <ncoghlan <at> gmail.com> writes:
tools. But the existing approaches require that, in order to be forward compatible with Python 3, a program must be made *worse* in Python 2 (i.e. harder to read and harder to write correctly for someone that hasn't learned Python 3 yet). Restoring unicode literal
How so? In the case of string literals, are you saying that it's worse in that you use 'xxx' instead of u'xxx' for text, and have to add a unicode_literals import? I don't feel that either of those make a 2.x program worse.
support in 3.3 is a pragmatic step that allows a lot of code to *just work* on Python 3. Most 2.6+ code that still doesn't work on Python 3 even after this change will be made *better* (or at least not made substantially worse) by the additional changes necessary for forward compatibility.
Remember, the PEP advocates what it does in the name of a single codebase. If you want to (or have to) support 3.2 in addition to 3.3, 2.6, 2.7, the PEP does not work for you. It only works for you if you're interested in 2.6+ and 3.3+.
Unicode literals are somewhat unique in their impact on porting efforts, as they show up *everywhere* in Unicode correct code in Python 2. The diffs that will be needed to correctly tag bytestrings in such code under Python 2 are tiny compared to those that would be needed to strip the u"" prefixes.
But that's a one-time operation using a lib2to3 fixer, and even for a big project like Django, we're not talking about a lot of time spent on this (at least, in my experience). Having a good test suite helps catch those byte-string cases more easily, of course. Regards, Vinay Sajip
Hi, On 2/28/12 12:16 AM, martin@v.loewis.de wrote:
Armin, I propose that you correct the *factual* deficits of the PEP (i.e. remove all claims that cannot be supported by facts, or are otherwise incorrect or misleading). Many readers here would be more open to accepting the PEP if it was factual rather than polemic.
Please don't call this PEP polemic.
The PEP author is supposed to collect all arguments, even the ones he doesn't agree with, and refute them.
I brought up all the arguments that I knew about before I submitted this mailing list thread, and I have not updated it since.
In this specific issue, the PEP states "the unicode_literals import the native string type is no longer available and has to be incorrectly labeled as bytestring"
This is incorrect: even though the native string type indeed is no longer available, it is *not* consequential that it has to be labeled as byte string. Instead, you can use the str() function.
Obviously it means not available by syntax.
It may be that you don't like that solution for some reason. If so, please mention the approach in the PEP, along with your reason for not liking it.
If by str() you mean using "str('x')" as a replacement for 'x' in both 2.x and 3.x with __future__ imports as a replacement for native string literals, please mention why this is better than u(), s(), n() etc. It would be as slow as a custom wrapper function and it would not support non-ASCII characters.
Regards, Armin
The PEP author is supposed to collect all arguments, even the ones he doesn't agree with, and refute them. I brought up all the arguments that I knew about before I submitted this mailing list thread, and I have not updated it since.
This is fine, of course. I still hope you will update it now, even though it has been accepted.
This is incorrect: even though the native string type indeed is no longer available, it is *not* consequential that it has to be labeled as byte string. Instead, you can use the str() function. Obviously it means not available by syntax.
I agree that the native string type is no longer supported by syntax in that approach.
It may be that you don't like that solution for some reason. If so, please mention the approach in the PEP, along with your reason for not liking it. If by str() you mean using "str('x')" as a replacement for 'x' in both 2.x and 3.x with __future__ imports as a replacement for native string literals, please mention why this is better than u(), s(), n() etc. It would be as slow as a custom wrapper function and it would not support non-ASCII characters.
That's not the point. The point is that the PEP ought to mention it as an alternative, instead of making the false claim that "it has to be labeled as byte string" (which I take as using a b"" prefix). Feel free to write something like "... it either has to be labelled as a byte string, or wrapped into a function call, e.g. using the str() function. This would be slow and would not support non-ascii characters" My whole point here is that I want the PEP to mention it, not this email thread. In addition, if you are using this very phrasing that I propose, I would then claim that a) it is not slow (certainly not as slow as a custom wrapper (*)), and b) it's not a problem that it is ASCII-only, since native strings are *practically* restricted to ASCII, anyway (even though not theoretically) In turn, I would ask that this counter-argument of mine is also reflected in the PEP. The whole point of the PEP process is that it settles disputes. Part of that settling is to avoid arguments which go in circles. To that effect, the PEP author ideally should *quickly* update the PEP, along with writing responses, so that anybody repeating an argument could be pointed to the PEP in order to shut up. HTH, Martin (*) This is also something that Guido requested at some point from the PEP: that it fairly analyses efficient implementations of potential wrapper functions, taking C implementations into account as well.
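The performance question raised here can at least be probed with a microbenchmark. The sketch below uses the standard timeit module and a hypothetical pure-Python wrapper `n`; it says nothing about a C implementation, and as noted elsewhere in the thread a microbenchmark is not a representative workload, so the numbers should be read as a rough indication only.

```python
import timeit


def n(literal):
    """Hypothetical pure-Python wrapper; a no-op when run on Python 3."""
    return literal


def time_call(stmt, number=100000):
    # Pass n explicitly so the timed statement can see it regardless of
    # how this module is loaded.
    return timeit.timeit(stmt, globals={"n": n}, number=number)


bare = time_call("'x'")
via_str = time_call("str('x')")
via_wrapper = time_call("n('x')")
print("literal %.4fs, str() %.4fs, n() %.4fs" % (bare, via_str, via_wrapper))
```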
Armin Ronacher <armin.ronacher <at> active-4.com> writes:
If by str() you mean using "str('x')" as a replacement for 'x' in both 2.x and 3.x with __future__ imports as a replacement for native string literals, please mention why this is better than u(), s(), n() etc. It would be as slow as a custom wrapper function and it would not support non-ASCII characters.
Well, you can give it any name you like, but

    import sys
    PY3 = sys.version_info[0] >= 3  # defined here so the snippet stands alone

    if PY3:
        def n(literal):
            return literal
    else:
        # used along with "from __future__ import unicode_literals" in client code
        def n(literal):
            return literal.encode('utf-8')

will support non-ASCII characters. You have not provided anything other than a microbenchmark regarding performance - as you are well aware, this does not illustrate what the performance might be on a representative workload. While there might be the odd percent in it, I didn't see any major degradation when running the Django test suite - which I would think is a more balanced workload than just benchmarking the wrapper. Of course, I don't claim to have studied the performance characteristics closely - I haven't. AFAICT, the incidence of native strings in an application is not that great (of course there can be pathological cases), so the number of calls to n() or whatever it's called is unlikely to have any significant impact. Even when I was using u() calls with the 2.5 port - which are of course much more common - the performance impact was unremarkable. Regards, Vinay Sajip
On Mon, 27 Feb 2012 22:11:36 +0000, Armin Ronacher <armin.ronacher@active-4.com> wrote:
On 2/27/12 9:58 PM, R. David Murray wrote:
But the PEP doesn't address the unicode_literals plus str() approach. That is, the rationale currently makes a false claim. Which would be exactly what that u() does not do?
The rationale claims there's no way to spell "native string" if you use unicode_literals, which is not true. It would be different from u('') in that I would expect that there are far fewer instances where 'native string' is required than there are places where unicode strings work (and should therefore be preferred). This only matters now in order to make the PEP more accurate, but I think that is a good thing to do. --David
R. David Murray <rdmurray <at> bitdance.com> writes:
The rationale claims there's no way to spell "native string" if you use unicode_literals, which is not true.
It would be different from u('') in that I would expect that there are far fewer instances where 'native string' is required than there are places where unicode strings work (and should therefore be preferred).
A couple of people have said that 'native string' is spelt 'str', but I'm not sure that's the right answer. For example, 2.x's cStringIO.StringIO expects native strings, not Unicode:

    >>> from cStringIO import StringIO
    >>> s = StringIO(u'\xe9')
    >>> s
    <cStringIO.StringI object at 0x232de40>
    >>> s.getvalue()
    '\xe9\x00\x00\x00'

Of course, you can't call str() on that value to get a native string:

    >>> str(u'\xe9')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 0: ordinal not in range(128)
So I think using str will not give the desired effect in some situations: on Django, I used a function that resolves differently depending on Python version: something like

    def native(literal): return literal

on Python 3, and

    def native(literal): return literal.encode('utf-8')

on Python 2. I'm not saying this is the right thing to do for all cases - just that str() may not be, either. This should be elaborated in the PEP. Regards, Vinay Sajip
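The difference between the two spellings comes down to which codec is applied. On 2.x, str(u'\xe9') implicitly encodes with the ASCII codec, whereas the native() helper above encodes explicitly with UTF-8. The sketch below reproduces just that codec behaviour with explicit encode() calls, so it can be run on 3.x as well; the variable names are illustrative.

```python
text = "\xe9"  # text on 3.x; equivalent to u'\xe9' on 2.x

# What 2.x's str(u'\xe9') attempts implicitly: an ASCII encode.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# What the native() helper does on 2.x: an explicit UTF-8 encode.
utf8_bytes = text.encode("utf-8")
print(ascii_ok, len(utf8_bytes))
```

UTF-8 is the choice made in the helper above; whether it is the right encoding for a given API is a separate question.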
I'm +1 on the PEP, for reasons already repeated here. We need three types of strings when supporting both Python 2 and Python 3. A binary string, a unicode string and a "native" string, ie one that is the old 8-bit str in python 2 but a Unicode str in Python 3. Adding back the u'' prefix is the easiest, most obvious/intuitive/pythonic/whatever way of getting that support, that requires the least amount of code change, and the least ugly code. -- Lennart Regebro: http://regebro.wordpress.com/ Porting to Python 3: http://python3porting.com/
Lennart Regebro <regebro <at> gmail.com> writes:
I'm +1 on the PEP, for reasons already repeated here. We need three types of strings when supporting both Python 2 and Python 3. A binary string, a unicode string and a "native" string, ie one that is the old 8-bit str in python 2 but a Unicode str in Python 3.
Well it's a done deal, and as I said elsewhere on the thread, I wasn't opposing the PEP, but wanting some improvements in it. ISTM that given the PEP as it is, working across 3.2 and 3.3 on a single codebase may not always be the easiest process (IIUC you have to run a mini2to3 process, and it'll need to be cleverer than 2to3 about running over the entire codebase if it's to appear seamless), but I guess that's a smaller number of people you'd upset, and those people are committed to 3.x anyway. It's the 2.x porters we're trying to win over - I see that. It will be very nice if this leads to an increase in the rate at which libraries are ported to 3.x.
Adding back the u'' prefix is the easiest, most obvious/intuitive/pythonic/whatever way of getting that support, that requires the least amount of code change, and the least ugly code.
"Least ugly" is subjective; I find u'xxx' less pretty than 'xxx' for text. Regards, Vinay Sajip
On Tue, Feb 28, 2012 at 08:51, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
Lennart Regebro <regebro <at> gmail.com> writes:
I'm +1 on the PEP, for reasons already repeated here. We need three types of strings when supporting both Python 2 and Python 3. A binary string, a unicode string and a "native" string, ie one that is the old 8-bit str in python 2 but a Unicode str in Python 3.
Well it's a done deal, and as I said elsewhere on the thread, I wasn't opposing the PEP, but wanting some improvements in it. ISTM that given the PEP as it is, working across 3.2 and 3.3 on a single codebase may not always be the easiest process (IIUC you have to run a mini2to3 process, and it'll need to be cleverer than 2to3 about running over the entire codebase if it's to appear seamless),
Distribute helps with this. I think we might have to add support in distribute to easily exclude the fixer that removes u''-prefixes; I don't remember if there is an "exclude" feature.
Lennart Regebro <regebro <at> gmail.com> writes:
Distribute helps with this. I think we might have to add support in distribute to easily exclude the fixer that removes u''-prefixes; I don't remember if there is an "exclude" feature.
We might be at cross purposes here. I don't see how Distribute helps, because the use case I'm talking about is not about distributing or installing stuff, but iteratively changing and testing code which needs to work on 2.6+, 3.2 and 3.3+. If the 2.x code depends on having u'xxx' literals, then 3.2 testing will potentially involve running a fixer on all files in the project every time a change is made, writing to a separate directory, or else a fixer which is integrated into the editing environment so it knows what changed. This is painful, and what motivated PEP 414 in the first place - which seems ironic. The PEP 414 approach seems to assume that if things work on 3.3, they will work on 3.2/3.1/3.0 without any changes other than replacing u'xxx' with 'xxx'. In other words, you aren't supposed to want to e.g. test 3.2 and 3.3 iteratively, using a workflow which intersperses edits with running tests using 3.2 and running tests with 3.3. In any case, a single code base seems not to be possible across 2.6+/3.0/3.1/3.2/3.3+ using the PEP 414 approach, though of course one will be possible for just 2.6+/3.3+. Early adopters of 3.x seem to be penalised by this approach: I for one will try to use the unicode_literals approach wherever I can. Regards, Vinay Sajip
On Tue, Feb 28, 2012 at 10:10 PM, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
If the 2.x code depends on having u'xxx' literals, then 3.2 testing will potentially involve running a fixer on all files in the project every time a change is made, writing to a separate directory, or else a fixer which is integrated into the editing environment so it knows what changed. This is painful, and what motivated PEP 414 in the first place - which seems ironic.
No, the real idea behind PEP 414 is that most ports that rely on it simply won't support 3.2 - they will only target 3.3+. The u"" fixer will just be one more tool in the arsenal of those that *do* want to support 3.2 (either because they want to target Ubuntu's LTS 3.2 stack, or for their own reasons). All of the other alternatives (such as separate branches or the unicode_literals future import) will also remain available to them. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
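To give a feel for how small such a u"" fixer can be, here is a rough sketch built on the standard tokenize module. It is not the 2to3 fixer and not part of any tool named in this thread; `strip_u_prefixes` is a hypothetical name, and the sketch assumes it is run under Python 3.3 or later on source that already tokenizes cleanly.

```python
import io
import tokenize


def strip_u_prefixes(source):
    """Return source with the u/U prefix removed from string literals."""
    lines = source.splitlines(True)
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    # Walk the tokens backwards so that deleting a character never shifts
    # the column of a token that has not been processed yet.
    for tok in reversed(tokens):
        tok_type, tok_string, (row, col) = tok[0], tok[1], tok[2]
        if tok_type == tokenize.STRING and tok_string[0] in 'uU':
            line = lines[row - 1]
            lines[row - 1] = line[:col] + line[col + 1:]
    return ''.join(lines)


sample = "x = u'caf\\xe9'\ny = U\"b\" + 'plain'\nu = 1\n"
converted = strip_u_prefixes(sample)
```

Because only STRING tokens are touched, a variable that happens to be called u is left alone. Writing the converted files to a separate directory for the 3.2 test run is left out here.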
Nick Coghlan <ncoghlan <at> gmail.com> writes:
On Tue, Feb 28, 2012 at 10:10 PM, Vinay Sajip <vinay_sajip <at> yahoo.co.uk> wrote:
If the 2.x code depends on having u'xxx' literals, then 3.2 testing will potentially involve running a fixer on all files in the project every time a change is made, writing to a separate directory, or else a fixer which is integrated into the editing environment so it knows what changed. This is painful, and what motivated PEP 414 in the first place - which seems ironic.
No, the real idea behind PEP 414 is that most ports that rely on it simply won't support 3.2 - they will only target 3.3+.
Well, yes in that the PEP will only be implemented in 3.3+, but the motivation was to make a single codebase easier to achieve. It does that if you take the narrow view of 2.6+/3.3+, but not if you factor 3.2 into the mix. Maybe 3.2 adoption is too low for us to worry about here, but I for one certainly wish it hadn't been relegated to being a 2nd-class citizen.
The u"" fixer will just be one more tool in the arsenal of those that *do* want to support 3.2 (either because they want to target Ubuntu's LTS 3.2 stack, or for their own reasons). All of the other alternatives (such as separate branches or the unicode_literals future import) will also remain available to them.
Right, I get that - as I said, unicode_literals is my preferred path of the options available. It's a shame to see this sort of Balkanisation, though. For example, if Django retains u'xxx' literals (even though I've ported it using unicode_literals, they may choose a different path officially), users wanting to work with it using 2.6/2.7/3.2/3.3 (as I do now) are SOL as far as a single codebase is concerned. Of course, when you're working on your own project, you can call the shots. But problems can arise if you have to work with an external project, as many of us frequently do. Regards, Vinay Sajip
On Tue, 28 Feb 2012 22:21:11 +1000, Nick Coghlan <ncoghlan@gmail.com> wrote:
On Tue, Feb 28, 2012 at 10:10 PM, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
If the 2.x code depends on having u'xxx' literals, then 3.2 testing will potentially involve running a fixer on all files in the project every time a change is made, writing to a separate directory, or else a fixer which is integrated into the editing environment so it knows what changed. This is painful, and what motivated PEP 414 in the first place - which seems ironic.
No, the real idea behind PEP 414 is that most ports that rely on it simply won't support 3.2 - they will only target 3.3+.
Hmm. It seems to me that this argument implies that PEP 414 is just as likely to *slow down* adoption of Python3 as it is to speed it up, since if this issue is as big a barrier as indicated, many potential porters may choose to wait until OS vendors are supporting 3.3 widely before starting their ports. We are clearly expecting that the reality is that the impact will be at worst neutral, and hopefully positive. --David
On Feb 28, 2012, at 08:41 AM, R. David Murray wrote:
Hmm. It seems to me that this argument implies that PEP 414 is just as likely to *slow down* adoption of Python3 as it is to speed it up, since if this issue is as big a barrier as indicated, many potential porters may choose to wait until OS vendors are supporting 3.3 widely before starting their ports. We are clearly expecting that the reality is that the impact will be at worst neutral, and hopefully positive.
If PEP 414 helps some projects migrate to Python 3, great. But I really hope we as a community don't perpetuate the myth that you cannot port to Python 3 without this, and I hope that we spend as much effort on educating other Python developers on how to port to Python 3 *right now* supporting Python 2.6, 2.7, and 3.2. That's the message we should be spreading and we should be helping developers understand exactly how to do this effectively, among the many great options that exist today. Only in the most extreme cases or the most inertially challenged projects should we say "wait for Python 3.3". Cheers, -Barry
On Tue, Feb 28, 2012 at 09:53, Barry Warsaw <barry@python.org> wrote:
On Feb 28, 2012, at 08:41 AM, R. David Murray wrote:
Hmm. It seems to me that this argument implies that PEP 414 is just as likely to *slow down* adoption of Python3 as it is to speed it up, since if this issue is as big a barrier as indicated, many potential porters may choose to wait until OS vendors are supporting 3.3 widely before starting their ports. We are clearly expecting that the reality is that the impact will be at worst neutral, and hopefully positive.
If PEP 414 helps some projects migrate to Python 3, great.
But I really hope we as a community don't perpetuate the myth that you cannot port to Python 3 without this, and I hope that we spend as much effort on educating other Python developers on how to port to Python 3 *right now* supporting Python 2.6, 2.7, and 3.2. That's the message we should be spreading and we should be helping developers understand exactly how to do this effectively, among the many great options that exist today. Only in the most extreme cases or the most inertially challenged projects should we say "wait for Python 3.3".
Well, when the code is committed I will update the porting HOWTO and push the __future__ imports first since they cover more versions of Python (i.e. Python 3.2). But I will mention the options that skip the __future__ imports for those that choose not to use them (or have already done the work of using the u prefix in their code). Plus that doc probably will need an update of caveats that seem to bite everyone (e.g. the str(bytes) thing which I didn't know about) when trying to do source-compatible versions.
On Feb 28, 2012, at 10:23 AM, Brett Cannon wrote:
Well, when the code is committed I will update the porting HOWTO and push the __future__ imports first since they cover more versions of Python (i.e. Python 3.2). But I will mention the options that skip the __future__ imports for those that choose not to use them (or have already done the work of using the u prefix in their code). Plus that doc probably will need an update of caveats that seem to bite everyone (e.g. the str(bytes) thing which I didn't know about) when trying to do source-compatible versions.
See, I think the emphasis should be on using the future imports and unadorning your unicode literals. Forget about this PEP except as a footnote. This strategy works today for most packages. You might think that this is ugly, but really, I think that doesn't matter (or maybe better: get over it! :). Definitely don't let that stop you from porting *now*. In the small minority of cases where this strategy cannot work for you (and I admit to not really understanding what those cases are), then the footnote about the reintroduction of the u-prefix should be enough. And yes, the str(bytes) thing is a pain, but it too can be worked around, and is such a minor wart that it should not delay your porting efforts. -Barry
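For anyone who has not run into the str(bytes) wart yet, a small sketch of the problem and one possible workaround follows. The helper name `to_text` and the UTF-8 default are assumptions made for the example, not something proposed in this thread.

```python
def to_text(value, encoding='utf-8'):
    """Return text for either a byte string or a text string."""
    if isinstance(value, bytes):
        # Decode explicitly instead of calling str() on the bytes.
        return value.decode(encoding)
    return value


payload = b'status: ok'

# On Python 3, str() of a bytes object gives its repr: "b'status: ok'".
# On Python 2 the same call simply returns the byte string unchanged.
naive = str(payload)

# Explicit decoding gives the same text on both major versions.
decoded = to_text(payload)
```

The point is only that the conversion has to be spelled as a decode; which encoding is right depends on where the bytes came from.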
If PEP 414 helps some projects migrate to Python 3, great.
But I really hope we as a community don't perpetuate the myth that you cannot port to Python 3 without this, and I hope that we spend as much effort on educating other Python developers on how to port to Python 3 *right now* supporting Python 2.6, 2.7, and 3.2.
One thing that the PEP will certainly achieve is to spread the myth that you cannot port to Python 3 if you also want to support Python 2.5. That's because people will accept the "single source" approach as the one right way, and will accept that this only works well with Python 2.6. Regards, Martin
<martin <at> v.loewis.de> writes:
One thing that the PEP will certainly achieve is to spread the myth that you cannot port to Python 3 if you also want to support Python 2.5. That's because people will accept the "single source" approach as the one right way, and will accept that this only works well with Python 2.6.
Let's hope not. We can mitigate that by spelling out in the docs that there's no one right way, how to choose which approach is best for a given project, and so on. Regards, Vinay Sajip
On Tue, Feb 28, 2012 at 12:07, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
<martin <at> v.loewis.de> writes:
One thing that the PEP will certainly achieve is to spread the myth that you cannot port to Python 3 if you also want to support Python 2.5. That's because people will accept the "single source" approach as the one right way, and will accept that this only works well with Python 2.6.
Let's hope not. We can mitigate that by spelling out in the docs that there's no one right way, how to choose which approach is best for a given project, and so on.
Changes to http://docs.python.org/howto/pyporting.html are welcome. I tried to make sure it exposed all possibilities with tips on how to support as far back as Python 2.5.
Brett Cannon <brett <at> python.org> writes:
Changes to http://docs.python.org/howto/pyporting.html are welcome. I tried to make sure it exposed all possibilities with tips on how to support as far back as Python 2.5.
Right, will take a look. FYI a Google search for "python 3 porting guide" shows the Wiki PortingToPy3K page, then Brian Curtin's Python 3 Porting Guide, then Lennart Regebro's porting book website, and then the howto referred to above. Possibly the Wiki page and Brian's guide need to link to the howto, as I presume that's the canonical go-to guide - they don't seem to do so currently. Regards, Vinay Sajip
On Tue, Feb 28, 2012 at 11:51, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
Brett Cannon <brett <at> python.org> writes:
Changes to http://docs.python.org/howto/pyporting.html are welcome. I tried to make sure it exposed all possibilities with tips on how to support as far back as Python 2.5.
Right, will take a look. FYI a Google search for "python 3 porting guide" shows the Wiki PortingToPy3K page, then Brian Curtin's Python 3 Porting Guide, then Lennart Regebro's porting book website, and then the howto referred to above. Possibly the Wiki page and Brian's guide need to link to the howto, as I presume that's the canonical go-to guide - they don't seem to do so currently.
Funny that you mention this: just a few minutes ago someone mentioned on twitter that they found and liked the guide I wrote, then I mentioned the howto/porting page since Brett's last message reminded me of it, and I mentioned that I should update and link to howto/porting. In the words of Guido, I will "make it so".
Brett Cannon <brett@python.org> wrote:
Changes to http://docs.python.org/howto/pyporting.html are welcome. I tried to make sure it exposed all possibilities with tips on how to support as far back as Python 2.5.
I'd like to add a section that highlights the advantages of separate branches. Starting perhaps with:

Advantages of separate branches:

1) The two code bases are cleaner.
2) Neither version is a second class citizen.
3) New Python3 features can be adopted without worrying about conversion tools.
4) For the developer: psychologically, slowly the py3k version becomes the master branch (as it should).
5) For the user: running 2to3 on install sends the signal that version 2 is the real version. This is not the case if there are, say, src2/ and src3/ directories in the distribution.

Stefan Krah
On 1 March 2012 12:11, Stefan Krah <stefan@bytereef.org> wrote:
Advantages of separate branches:
Even though I agree on most of your points, I disagree with 2) Neither version is a second class citizen. In my experience, this is only true if you have a very strict discipline, or if both branches are used a lot. If there are two branches (say: py2 and py3), and one is used much less (say: py3), that one will always be the second class citizen - the py2 branch, which is used by 'most people' gets more feature requests and bug reports. People will implement the features and bug fixes in the py2 branch, and sometimes forget to port them to the py3 branch, which means the branches start diverging. This divergence makes applying newer changes even more difficult, leading to further divergence. Another cause for this is the painful merging in most version control systems. I'm guessing you all know the pain of 'svn merge' - and there are a *lot* of projects still using SVN or even CVS. As such, you need to impose the discipline to always apply changes to both branches. This is a reasonable thing for larger projects, but it is generally harder to implement it for smaller projects, as you're already lucky people are actually contributing. Best, Merlijn
I also don't agree with the claim that a py3 version using 2to3 is a "second class citizen". You need to adapt the Python 2 code to Python 3 in that case too, and neither of them overrules the other. //Lennart
Lennart Regebro <regebro@gmail.com> wrote:
I also don't agree with the claim that a py3 version using 2to3 is a "second class citizen". You need to adapt the Python 2 code to Python 3 in that case too, and neither of them overrules the other.
That's a fair point. Then of course *both* versions do not use their full potential, but that is strongly related to the "using all (new) features" item in the list. Stefan Krah
Merlijn van Deen <valhallasw@arctus.nl> wrote:
Another cause for this is the painful merging in most version control systems. I'm guessing you all know the pain of 'svn merge' - and there are a lot of projects still using SVN or even CVS.
As such, you need to impose the discipline to always apply changes to both branches. This is a reasonable thing for larger projects, but it is generally harder to implement it for smaller projects, as you're already lucky people are actually contributing.
What you say is all true, but I wonder if the additional work is really that much of a problem. Several people have said here that applying changes to both versions becomes second nature, and this is also my experience. While mercurial may be nicer, svnmerge.py isn't that bad. Projects have different needs and priorities. From my own experience with cdecimal I can positively say that the amount of work required to keep two branches [1] in sync is completely dwarfed by first figuring out what to write and then implementing it correctly. After doing all that, the actual synchronization work feels like a vacation. Another aspect, which may be again cdecimal-specific, is that keeping 2.5 compatibility is *at least* as bothersome as supporting 2.6/2.7 and 3.x. As an example for a pretty large project, it looks like Antoine is making good progress with Twisted: https://bitbucket.org/pitrou/t3k/wiki/Home I certainly can't say what's possible or best for other projects. I do think though that choosing the separate branches strategy will pay off eventually (at the very latest when Python-2.7 will reach the status that Python-1.5 currently has). Stefan Krah [1] I don't even use two branches but 2.c/3.c and 2.py/3.py file name patterns.
On Thu, 1 Mar 2012 16:31:14 +0100 Stefan Krah <stefan@bytereef.org> wrote:
As an example for a pretty large project, it looks like Antoine is making good progress with Twisted:
Well, to be honest, "making good progress" currently means "bored and not progressing at all" :-) But that's not due to the strategy I adopted, only to the sheer amount of small changes needed, and lack of immediate motivation to continue this work. However, merging actually ended up easier than I expected. The last time, merging one month's worth of upstream changes took me around one hour (including fixing additional tests and regressions). Regards Antoine.
On Mar 01, 2012, at 04:42 PM, Antoine Pitrou wrote:
Well, to be honest, "making good progress" currently means "bored and not progressing at all" :-) But that's not due to the strategy I adopted, only to the sheer amount of small changes needed, and lack of immediate motivation to continue this work.
For any porting strategy, the best thing to do is to get as many changes into upstream as possible that prepare the way for Python 3 support. For example, when I did the dbus-python port, upstream (rightly so) rejected my big all-together-now patch. Instead, we took a number of smaller steps, many of which were incorporated before the Python 3 support landed. These included:

- Agreeing to Python 2.6 as a minimum base
- #include <bytesobject.h> and global PyString_* -> PyBytes_* conversion
- (yes) adding future imports for unicode_literals, unadorning unicodes and adding b'' prefixes where necessary
- fixing except clauses to use 'as'
- removing L suffix on integer literals
- lots of other little syntactic nits

You could add to that things like print functions (although IIRC dbus-python had few if any of these), etc. So really, it was the same strategy as any porting process, but the key was breaking these up into reviewable chunks that could be applied while still keeping the code base Python 2 only. I really do think that to the extent that you can do that kind of thing, you may end up with essentially Python 3 support without even realizing it. :) Cheers, -Barry
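The Python-level items in the list above can be put together in one small module that runs unchanged on 2.6+ and 3.x. This is only an illustration; `parse_port` and the constants are invented for the example and have nothing to do with dbus-python itself.

```python
from __future__ import print_function, unicode_literals

import sys


def parse_port(raw):
    """Parse a port number, reporting bad input on stderr."""
    try:
        return int(raw)
    except ValueError as exc:  # 'as' form; 'except ValueError, exc' is 2.x-only
        print('bad port:', exc, file=sys.stderr)
        return None


HEADER = b'\x00\x01'  # explicit bytes prefix for binary data
LABEL = 'port'        # unadorned literal, text under unicode_literals
LIMIT = 65535         # no trailing L on integer literals
```

Each of these changes can be submitted on its own while the code base is still Python 2 only, which is what makes the patches reviewable.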
On Thu, 1 Mar 2012 11:24:19 -0500 Barry Warsaw <barry@python.org> wrote:
I really do think that to the extent that you can do that kind of thing, you may end up with essentially Python 3 support without even realizing it. :)
That's unlikely. Twisted processes bytes data a lot, and the bytes indexing behaviour of 3.x is a chore for porting. Regards Antoine.
On Thu, 01 Mar 2012 17:24:31 +0100, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Thu, 1 Mar 2012 11:24:19 -0500 Barry Warsaw <barry@python.org> wrote:
I really do think that to the extent that you can do that kind of thing, you may end up with essentially Python 3 support without even realizing it. :)
That's unlikely. Twisted processes bytes data a lot, and the bytes indexing behaviour of 3.x is a chore for porting.
The dodges you have to use work fine in python2 as well, though, so I think Barry's point stands, even if it does make the python2 code a bit uglier...but not as bad as the 2.5 exception hacks. Still, I'll grant that it would be a harder sell to upstream than the changes Barry mentioned. On the other hand, it's not like the code will get *prettier* once you drop Python2 support :(. --David
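As a concrete illustration of the kind of dodge meant here (a sketch under the assumption that 2.6 is the minimum version, not code taken from Twisted): slicing instead of indexing gives the same result on both major versions.

```python
data = b'\x01abc'

# Indexing differs: data[0] is the integer 1 on Python 3 but the
# one-character string '\x01' on Python 2.
# Slicing returns a bytes object of length one on both.
first = data[0:1]

# Indexing a bytearray yields integers on both 2.6+ and 3.x.
first_ordinal = bytearray(data)[0]


def iter_single_bytes(buf):
    """Yield length-one bytes objects from buf on either major version."""
    for i in range(len(buf)):
        yield buf[i:i + 1]
```

It is uglier than plain indexing, which is the trade-off being described, but it removes the version check.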
martin@v.loewis.de writes:
One thing that the PEP will certainly achieve is to spread the myth that you cannot port to Python 3 if you also want to support Python 2.5. That's because people will accept the "single source" approach as the one right way, and will accept that this only works well with Python 2.6.
Please, Martin, I dislike this idea as much as you do. (There was no -1 from me, though, because I don't work in the context of the claimed use cases at all, but lots of people obviously find them persuasive.) But in respect of myth-spreading, the problem with the PEP is the polemic tone. (Yeah, I've seen Armin's claim that it's not polemic. I disagree.) The unqualified claims that "2to3 is insufficient" and the PEP will "enable side-by-side support" of Python 2 and Python 3 by libraries are too extreme, and really unnecessary in light of Guido's logic for acceptance. As far as I can see, like 2to3, like u()/b(), this PEP introduces a device that will be the most *convenient* approach for *some* use cases. If it were presented that way, with recommendation for its use restricted to the particular intended use case, I don't think it would have a huge effect on people's perception of the difficulty of porting in general, including multiversion support including 2.5. If others want to use it, even though you and I think that's a bad idea, well, we can blog, and "consenting adults" covers those users. On the other hand, implementation of the PEP itself should have a positive effect on the community's perception of python-dev's responsiveness to its pain. Ie, a lot of us feel strongly that this is the wrong thing to do in principle -- but we're gonna do it anyway, because part of the community wants it. So, let's work on integrating this PEP into the more general framework of recommendations for porting Python 2 code to Python 3 and/or developing libraries targeting both.
On Wed, Feb 29, 2012 at 5:23 PM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
martin@v.loewis.de writes:
One thing that the PEP will certainly achieve is to spread the myth that you cannot port to Python 3 if you also want to support Python 2.5. That's because people will accept the "single source" approach as the one right way, and will accept that this only works well with Python 2.6.
Please, Martin, I dislike this idea as much as you do. (There was no -1 from me, though, because I don't work in the context of the claimed use cases at all, but lots of people obviously find them persuasive.)
But in respect of myth-spreading, the problem with the PEP is the polemic tone. (Yeah, I've seen Armin's claim that it's not polemic. I disagree.) The unqualified claims that "2to3 is insufficient" and the PEP will "enable side-by-side support" of Python 2 and Python 3 by libraries are too extreme, and really unnecessary in light of Guido's logic for acceptance.
FWIW, I agree that much of the rhetoric in the current version of PEP 414 is excessive. Armin has given me permission to create an updated version of PEP 414 and toning down the hyperbole (or removing it entirely in cases where it's irrelevant to the final decision) is one of the things that I will be changing. I also plan to add a link to Lennart's guide to the various porting strategies that are currently available, more clearly articulate the cases where the new approach can most help (i.e. when there are project specific reasons to avoid the unicode_literals import), as well as name drop Pyramid (Chris McDonough), Flask (Armin), Django (Jacob Kaplan-Moss) and requests (Kenneth Reitz) as cases where key developers of web-related third party frameworks or libraries have indicated that PEP 414 will help greatly with bringing the sections of the Python ecosystem they're involved with into the Python 3 fold over the next few years. My aim is for the end result to better reflect the reasons why Guido *accepted* the PEP, moreso than Armin's own reasons for *wanting* it. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
FWIW, I agree that much of the rhetoric in the current version of PEP 414 is excessive.
Armin has given me permission to create an updated version of PEP 414 and toning down the hyperbole (or removing it entirely in cases where it's irrelevant to the final decision) is one of the things that I will be changing. I also plan to add a link to Lennart's guide to the various porting strategies that are currently available, more clearly articulate the cases where the new approach can most help (i.e. when there are project specific reasons to avoid the unicode_literals import), as well as name drop Pyramid (Chris McDonough), Flask (Armin), Django (Jacob Kaplan-Moss) and requests (Kenneth Reitz) as cases where key developers of web-related third party frameworks or libraries have indicated that PEP 414 will help greatly with bringing the sections of the Python ecosystem they're involved with into the Python 3 fold over the next few years.
My aim is for the end result to better reflect the reasons why Guido *accepted* the PEP, moreso than Armin's own reasons for *wanting* it.
Thank you Nick and Armin. I think toning down the rhetoric is a very amicable solution. Let me know if I need to add anything to http://getpython3.com/ (have linked porting guides there too if you want) jesse
On 2/28/2012 7:10 AM, Vinay Sajip wrote:
The PEP 414 approach seems to assume that if things work on 3.3, they will work on 3.2/3.1/3.0 without any changes other than replacing u'xxx' with 'xxx'.
(Delete 3.0. 3.1 is also less of a concern.) It actually assumes that if things work on 3.3 *and* 2.7 (or .6), then ... . At first glance, this seems reasonable. If the code works on 2.7, then it does not use any new 3.3 features. Nor does it depend on any 3.3-only bug fixes that were part of a feature patch. 2.6, of course, is essentially not getting any bugfixes.
In other words, you aren't supposed to want to e.g. test 3.2 and 3.3 iteratively, using a workflow which intersperses edits with running tests using 3.2 and running tests with 3.3.
Anyone who is also targeting 3.2 could run a test32 script whenever they need to take a break. Or it could be run in the background (perhaps on a different core) while editing continues. People will work this out on a project by project basis, or use one of the other solutions.
In any case, a single code base seems not to be possible across 2.6+/3.0/3.1/3.2/3.3+ using the PEP 414 approach, though of course one will be possible for just 2.6+/3.3+. Early adopters of 3.x seem to be penalised by this approach: I for one will try to use the unicode_literals approach wherever I can.
Early adoption of new tech typically has costs as well as benefits ;-). -- Terry Jan Reedy
On Tue, Feb 28, 2012 at 13:10, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
We might be at cross purposes here. I don't see how Distribute helps, because the use case I'm talking about is not about distributing or installing stuff, but iteratively changing and testing code which needs to work on 2.6+, 3.2 and 3.3+.
Make sure you can run the tests with python setup.py test, and you're "in the butter", as we say in Sweden. :-)
If the 2.x code depends on having u'xxx' literals, then 3.2 testing will potentially involve running a fixer on all files in the project every time a change is made, writing to a separate directory, or else a fixer which is integrated into the editing environment so it knows what changed. This is painful
Sure, and distribute does this for you. http://python3porting.com/2to3.html //Lennart
A couple of people have said that 'native string' is spelt 'str', but I'm not sure that's the right answer. For example, 2.x's cStringIO.StringIO expects native strings, not Unicode:
Your counter-example is non-ASCII characters/bytes. I doubt that this is a valid use case; in a "native" string, these shouldn't occur (i.e. native strings should always be ASCII), since the semantics of non-ASCII changes drastically between 2.x and 3.x. So whoever defines some API to take "native" strings can't have defined a valid use of non-ASCII in that interface.
I'm not saying this is the right thing to do for all cases - just that str() may not be, either. This should be elaborated in the PEP.
Indeed it should. If there is a known application of non-ASCII native strings, I surely would like to know what that is. Regards, Martin
<martin <at> v.loewis.de> writes:
A couple of people have said that 'native string' is spelt 'str', but I'm not sure that's the right answer. For example, 2.x's cStringIO.StringIO expects native strings, not Unicode:
Your counter-example is non-ASCII characters/bytes. I doubt that this is a valid use case; in a "native" string, these shouldn't occur (i.e. native strings should always be ASCII), since the semantics of non-ASCII changes drastically between 2.x and 3.x. So whoever defines some API to take "native" strings can't have defined a valid use of non-ASCII in that interface.
It might not be a valid usage, but the 2.x ecosystem has numerous occurrences of invalid usages, which tend to crop up when porting because of 3.x's increased strictness. In the example I gave, cStringIO.StringIO should be able to cope with text strings, but doesn't. Of course there are StringIO.StringIO and io.StringIO in 2.6, but when porting a project, you can't be sure which of these you might run into.
Indeed it should. If there is a known application of non-ASCII native strings, I surely would like to know what that is.
I can't think of a specific instance off-hand, but I seem to recall having problems with some of the cookie APIs insisting on native strings (rather than text, which is validated against ASCII where appropriate). Regards, Vinay Sajip
On Feb 27, 2012, at 03:39 PM, Chris McDonough wrote:
Note that u'' literals are sort of the tip of the iceberg here; supporting them will obviously not make development under the subset an order of magnitude less sucky, just a tiny little bit less sucky. There are other extremely annoying things, like str(bytes) returning the repr of a bytestring on Python 3. That's almost as irritating as the absence of u'' literals, but we have to evaluate one thing at a time.
Yeah, that one has bitten me many times, and for me it *is* more irritating because it's harder to work around. -Barry
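The str(bytes) behaviour mentioned above can be illustrated with a short sketch. The helper name to_text and the sample payload are assumptions made up for this example; the point is only that on Python 3 str() of a bytes object yields its repr, so straddling code has to decode explicitly.

```python
def to_text(value, encoding="utf-8"):
    """Decode bytes explicitly; pass text through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


payload = b"caf\xc3\xa9"

# On Python 3 this is the repr of the bytes object (text starting with b'),
# not the decoded text. On Python 2, str(payload) is simply payload itself.
as_str = str(payload)

# Explicit decoding gives the intended text on Python 3.
as_text = to_text(payload)
```

Note that on Python 2 bytes is an alias for str, so to_text would also decode every native string it is handed; that may or may not be what a given code base wants.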
On 27 February 2012 20:39, Chris McDonough <chrism@plope.com> wrote:
Note that u'' literals are sort of the tip of the iceberg here; supporting them will obviously not make development under the subset an order of magnitude less sucky, just a tiny little bit less sucky. There are other extremely annoying things, like str(bytes) returning the repr of a bytestring on Python 3. That's almost as irritating as the absence of u'' literals, but we have to evaluate one thing at a time.
So. Am I misunderstanding here, or are you suggesting that this particular PEP doesn't help you much, but if it's accepted, it represents "the thin end of the wedge" for a series of subsequent PEPs suggesting fixes for a number of other "extremely annoying things"...? I'm sure that's not what you meant, but it's certainly what it sounded like to me! Paul.
On Mon, 2012-02-27 at 21:07 +0000, Paul Moore wrote:
On 27 February 2012 20:39, Chris McDonough <chrism@plope.com> wrote:
Note that u'' literals are sort of the tip of the iceberg here; supporting them will obviously not make development under the subset an order of magnitude less sucky, just a tiny little bit less sucky. There are other extremely annoying things, like str(bytes) returning the repr of a bytestring on Python 3. That's almost as irritating as the absence of u'' literals, but we have to evaluate one thing at a time.
So. Am I misunderstanding here, or are you suggesting that this particular PEP doesn't help you much, but if it's accepted, it represents "the thin end of the wedge" for a series of subsequent PEPs suggesting fixes for a number of other "extremely annoying things"...?
I'm sure that's not what you meant, but it's certainly what it sounded like to me!
I'm way too lazy. The political wrangling is just too draining (especially over something so trivial). But I will definitely support other proposals that make it easier to straddle, sure. - C
On Mon, 27 Feb 2012 16:10:25 -0500, Chris McDonough <chrism@plope.com> wrote:
On Mon, 2012-02-27 at 21:07 +0000, Paul Moore wrote:
On 27 February 2012 20:39, Chris McDonough <chrism@plope.com> wrote:
Note that u'' literals are sort of the tip of the iceberg here; supporting them will obviously not make development under the subset an order of magnitude less sucky, just a tiny little bit less sucky. There are other extremely annoying things, like str(bytes) returning the repr of a bytestring on Python 3. That's almost as irritating as the absence of u'' literals, but we have to evaluate one thing at a time.
So. Am I misunderstanding here, or are you suggesting that this particular PEP doesn't help you much, but if it's accepted, it represents "the thin end of the wedge" for a series of subsequent PEPs suggesting fixes for a number of other "extremely annoying things"...?
I'm sure that's not what you meant, but it's certainly what it sounded like to me!
I'm way too lazy. The political wrangling is just too draining (especially over something so trivial). But I will definitely support other proposals that make it easier to straddle, sure.
"tip of the iceberg", eh? Or the nose of the camel in the tent. This pushes me in the direction of a -1 vote. --David
Indeed, the wrangling has gone too far already. I'm accepting the PEP. It's about as harmless as they come. Make it so. --Guido van Rossum (sent from Android phone) On Feb 27, 2012 1:12 PM, "Chris McDonough" <chrism@plope.com> wrote:
On Mon, 2012-02-27 at 21:07 +0000, Paul Moore wrote:
On 27 February 2012 20:39, Chris McDonough <chrism@plope.com> wrote:
Note that u'' literals are sort of the tip of the iceberg here; supporting them will obviously not make development under the subset an order of magnitude less sucky, just a tiny little bit less sucky. There are other extremely annoying things, like str(bytes) returning the repr of a bytestring on Python 3. That's almost as irritating as the absence of u'' literals, but we have to evaluate one thing at a time.
So. Am I misunderstanding here, or are you suggesting that this particular PEP doesn't help you much, but if it's accepted, it represents "the thin end of the wedge" for a series of subsequent PEPs suggesting fixes for a number of other "extremely annoying things"...?
I'm sure that's not what you meant, but it's certainly what it sounded like to me!
I'm way too lazy. The political wrangling is just too draining (especially over something so trivial). But I will definitely support other proposals that make it easier to straddle, sure.
- C
On Feb 27, 2012, at 02:06 PM, Guido van Rossum wrote:
Indeed, the wrangling has gone too far already. I'm accepting the PEP. It's about as harmless as they come. Make it so.
I've learned that once a PEP is pronounced upon, it's usually to my personal (if not all of our mutual :) benefit to stop arguing. I still urge the PEP author to clean up the PEP and specifically address the issues brought up in this thread. That will be useful for the historical record. -Barry
Armin Ronacher <armin.ronacher <at> active-4.com> writes:
Hi,
On 2/27/12 10:29 PM, Barry Warsaw wrote:
I still urge the PEP author to clean up the PEP and specifically address the issues brought up in this thread. That will be useful for the historical record.
That is a given.
Great. My particular interest is w.r.t. the installation hook for 3.2 and the workflow for testing code in 3.2 and 3.3 at the same time. Regards, Vinay Sajip
On 2/27/2012 4:10 PM, Chris McDonough wrote:
On Mon, 2012-02-27 at 21:07 +0000, Paul Moore wrote:
On 27 February 2012 20:39, Chris McDonough<chrism@plope.com> wrote:
Note that u'' literals are sort of the tip of the iceberg here; supporting them will obviously not make development under the subset an order of magnitude less sucky, just a tiny little bit less sucky. There are other extremely annoying things, like str(bytes) returning the repr of a bytestring on Python 3. That's almost as irritating as the absence of u'' literals, but we have to evaluate one thing at a time.
So. Am I misunderstanding here, or are you suggesting that this particular PEP doesn't help you much, but if it's accepted, it represents "the thin end of the wedge" for a series of subsequent PEPs suggesting fixes for a number of other "extremely annoying things"...?
Last December, Armin wrote "And in my absolutely personal opinion Python 3.3/3.4 should be more like Python 2* and Python 2.8 should happen and be a bit more like Python 3." * he wrote '3' but obviously means '2'. http://lucumr.pocoo.org/2011/12/7/thoughts-on-python3/
I'm sure that's not what you meant, but it's certainly what it sounded like to me!
I'm way too lazy. The political wrangling is just too draining (especially over something so trivial).
Turning Python 3 back into Python 2, or even moving in that direction, is neither 'trivial' nor a 'no-brainer'.
But I will definitely support other proposals that make it easier to straddle, sure.
-- Terry Jan Reedy
In http://mail.python.org/pipermail/python-dev/2012-February/116953.html Terry J. Reedy wrote:
I presume that most 2.6 code has problems other than u'' when attempting to run under 3.x.
Why? If you're talking about generic code that has seen minimal changes since 2.0, sure. But I think this request is specifically for projects that are thinking about python 3, but are trying to use a single source base regardless of version. Using an automatic translation step means that python (or at least python 3) would no longer be the actual source code. I've worked with enough generated "source" code in other languages that it is worth some pain to avoid even a slippery slope. By the time you drop 2.5, the "subset" language is already pretty good; if I have to write something version-specific, I prefer to treat that as a sign that I am using the wrong approach. -jJ -- If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ
On 2/27/2012 4:56 PM, Jim J. Jewett wrote:
In http://mail.python.org/pipermail/python-dev/2012-February/116953.html Terry J. Reedy wrote:
I presume that most 2.6 code has problems other than u'' when attempting to run under 3.x.
Why?
Since writing the above, I realized that the following is a realistic scenario. 2.6 or 2.7 code a) uses has/set/getattr, so unicode literals would require a change; b) uses non-ascii chars in unicode literals; c) uses (or could be converted to use) print as a function; and d) otherwise uses a common 2-3 subset. Such would only need the u prefix addition to run under both Pythons. This works the other way, of course, for backporting code. So I am replacing 'most' with 'some unknown-to-me fraction' ;-). -- Terry Jan Reedy
On Tue, Feb 28, 2012 at 9:19 AM, Terry Reedy <tjreedy@udel.edu> wrote:
Since writing the above, I realized that the following is a realistic scenario. 2.6 or 2.7 code a) uses has/set/getattr, so unicode literals would require a change; b) uses non-ascii chars in unicode literals; c) uses (or could be converted to use) print as a function; and d) otherwise uses a common 2-3 subset. Such would only need the u prefix addition to run under both Pythons. This works the other way, of course, for backporting code. So I am replacing 'most' with 'some unknown-to-me fraction' ;-).
Yep, that's exactly the situation I'm in with PulpDist (a web app that primarily targets deployment on RHEL 6, which means Python 2.6). Since I preformat all my print output with either str.format or str.join (or use the logging module) and always use "except exc as var" to catch exceptions, the natural way to write Python 2 code for me is *almost* source compatible with Python 3. The only big discrepancy I'm currently aware of? Unicode literals.
Now, I could retrofit the entire code base with the unicode_literals import and str("") for native strings, but that has problems of its own:
- it doesn't match the Pulp upstream, so it would make it harder for them to review my plugins and client API usage code (or integrate them into the default plugin set or client support API if they decide they like them). Given that I'm one of the guinea pigs for experimental Pulp APIs and have to dive into *their* code on occasion, it would also be a challenge for *me* to switch modes when debugging.
- it doesn't match Django (at least, not in 1.3, which is the version I'm using) (another potential annoyance when debugging)
- it doesn't match any of the other Django applications I use (once again, debugging may lead to me looking at this code)
- it doesn't match the standard library (yep, you guessed it, I'd have to mode switch when looking at standard library code, too)
- it doesn't match the intuitions of current Python 2 developers that aren't up to speed with the niceties of Python 3 porting
Basically, using the unicode_literals import would significantly raise the barrier to entry for PulpDist *as a Python 2 project*, as well as forcing me to switch mental models for text processing whenever I have to look at the code in a dependency during a debugging session. Therefore, given that Python 2 will be my primary target for the immediate future (and any collaborators are likely to be RHEL 6 and hence Python 2 focused), I don't want to use that particular future import.
The downside of that choice (currently) is that it kills any possibility of running any of it on Python 3, even the command line client or the web front end after Django gets ported. With explicit unicode literals being restored in Python 3.3, though, I'm a lot more optimistic about the feasibility of porting it without too much effort (as well as the prospect of other Django app dependencies gaining Python 3 support). In terms of third party upstreams, python 3 compatibility patches that affect *every single string literal in the entire project* (either directly or converting the entire project to the "unicode_literals" import) aren't likely to even get reviewed, let alone accepted. By contrast (for a project that already only supports 2.6+), cleaning up print statements and exception handling should be a much smaller patch that is easy to both review and accept. Making it as easy as possible for maintainers that don't really care about Python 3 to accept patches from people that *do* care is a very good thing. There are still other problems that are going to affect the folks playing at the wire protocol level, but the lack of unicode literals is a big one that affects the entire application stack. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
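As a rough illustration of the "common subset" style described in this message (preformatted print output, "except exc as var"), the following sketch should run unchanged on Python 2.6+ and Python 3. The function name count_lines is hypothetical and not taken from PulpDist.

```python
import sys


def count_lines(path):
    """Print and return the number of lines in path, or None on failure."""
    try:
        with open(path) as handle:
            count = sum(1 for _ in handle)
    except IOError as exc:  # "except A as B:", never "except A, B:"
        sys.stderr.write("cannot read {0}: {1}\n".format(path, exc))
        return None
    # A single preformatted argument, so print(...) means the same thing
    # whether it is parsed as a statement (2.x) or a function call (3.x).
    print("{0}: {1} lines".format(path, count))
    return count
```

Code written this way has no print or exception-syntax differences left to patch, which is why the string literal prefix stands out as the remaining obstacle.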
On 2/27/2012 1:01 PM, Chris McDonough wrote:
I just don't understand the pushback here at all. This is such a nobrainer.
Last December, Armin wrote in http://lucumr.pocoo.org/2011/12/7/thoughts-on-python3/ "And in my absolutely personal opinion Python 3.3/3.4 should be more like Python 2* and Python 2.8 should happen and be a bit more like Python 3." * he wrote '3' but obviously meant '2'. Today, you made it clear that you regard this PEP as one small step in reverting Python 3 toward Python 2 and that you support the above goal. *That* is what some are pushing back against. -- Terry Jan Reedy
R. David Murray wrote:
On Mon, 27 Feb 2012 09:05:54 -0800, Ethan Furman wrote:
Martin v. Löwis wrote:
On 26.02.2012 07:06, Nick Coghlan wrote:
On Sun, Feb 26, 2012 at 1:13 PM, Guido van Rossum wrote:
A small quibble: I'd like to see a benchmark of a 'u' function implemented in C.
Even if it was quite fast, I don't think such a function would bring the same benefits as restoring support for u'' literals.
You claim that, but your argument doesn't actually support that claim (or I fail to see the argument).
Python 2.6 code: this = u'that'
Python 3.3 code: this = u('that')
Not source compatible, not elegant. (Even though 2to3 could make this fix, it's still kinda ugly.)
Eh? The 2.6 version would also be u('that'). That's the whole point of the idiom. You'll need a better counter argument than that.
So the idea is to convert the existing 2.6 code to use parenthesis as well? (I obviously haven't read the PEP -- my apologies.) Then I primarily object on ergonomic reasons, but I still think it's kinda ugly. ;) ~Ethan~
Eh? The 2.6 version would also be u('that'). That's the whole point of the idiom. You'll need a better counter argument than that.
So the idea is to convert the existing 2.6 code to use parenthesis as well? (I obviously haven't read the PEP -- my apologies.)
Well, if you didn't, you wouldn't have the same sources on 2.x and 3.x. And if that was ok, you wouldn't need the u() function in 3.x at all, since plain string literals are *already* unicode strings there. Regards, Martin
Martin v. Löwis wrote:
Eh? The 2.6 version would also be u('that'). That's the whole point of the idiom. You'll need a better counter argument than that.
So the idea is to convert the existing 2.6 code to use parenthesis as well? (I obviously haven't read the PEP -- my apologies.)
Well, if you didn't, you wouldn't have the same sources on 2.x and 3.x. And if that was ok, you wouldn't need the u() function in 3.x at all, since plain string literals are *already* unicode strings there.
True -- but I would rather have u'' in 2.6 and 3.3 than u('') in 2.6 and 3.3. ~Ethan~
On Mon, 27 Feb 2012 13:09:24 -0800 Ethan Furman <ethan@stoneleaf.us> wrote:
Martin v. Löwis wrote:
Eh? The 2.6 version would also be u('that'). That's the whole point of the idiom. You'll need a better counter argument than that.
So the idea is to convert the existing 2.6 code to use parenthesis as well? (I obviously haven't read the PEP -- my apologies.)
Well, if you didn't, you wouldn't have the same sources on 2.x and 3.x. And if that was ok, you wouldn't need the u() function in 3.x at all, since plain string literals are *already* unicode strings there.
True -- but I would rather have u'' in 2.6 and 3.3 than u('') in 2.6 and 3.3.
You don't want to be 3.2-compatible? Antoine.
Armin Ronacher <armin.ronacher <at> active-4.com> writes:
On 2/27/12 9:36 PM, Antoine Pitrou wrote:
You don't want to be 3.2-compatible?
See the PEP. It shows how it would still be 3.2 compatible at installation time due to an installation hook that would be provided.
I thought Antoine was just responding to the fact that Ethan's comment didn't mention 3.2. Re. the installation hook, let me get this right. If I have to work with code that needs to run under 3.2 or earlier *and* 3.3, and say that because this PEP has been accepted, the code contains both u'xxx' and 'yyy' forms of Unicode literal, then I can't just edit-save-test, right? I have to run your hook every time I want to switch between testing with 3.3 and 3.2 (say). Isn't this exactly the same problem as with running 2to3, except that your hook might run faster? I'm not convinced you can guarantee a seamless testing experience ;-) Regards, Vinay Sajip
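To give a feel for the kind of rewriting step being discussed, here is a minimal sketch that strips u/U prefixes from string literals using the standard tokenize module. This is NOT the installation hook from the PEP; the function name strip_u_prefixes is hypothetical, and the sketch assumes it is run under an interpreter whose tokenizer accepts the prefix (2.x or 3.3+).

```python
import io
import tokenize


def strip_u_prefixes(source):
    """Return source with the u/U prefix removed from string literals."""
    lines = source.splitlines(True)
    positions = []
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    for tok_type, tok_string, start, _end, _line in tokens:
        if tok_type == tokenize.STRING and tok_string[0] in "uU":
            positions.append(start)  # (1-based row, 0-based column)
    # Edit from the end backwards so earlier columns stay valid.
    for row, col in reversed(positions):
        line = lines[row - 1]
        lines[row - 1] = line[:col] + line[col + 1:]
    return "".join(lines)
```

Whatever the implementation, a step like this has to be re-run after every edit before the code can be tested on 3.2, which is the workflow concern raised above.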
Antoine Pitrou wrote:
On Mon, 27 Feb 2012 13:09:24 -0800 Ethan Furman <ethan@stoneleaf.us> wrote:
Martin v. Löwis wrote:
Eh? The 2.6 version would also be u('that'). That's the whole point of the idiom. You'll need a better counter argument than that.
So the idea is to convert the existing 2.6 code to use parenthesis as well? (I obviously haven't read the PEP -- my apologies.)
Well, if you didn't, you wouldn't have the same sources on 2.x and 3.x. And if that was ok, you wouldn't need the u() function in 3.x at all, since plain string literals are *already* unicode strings there.
True -- but I would rather have u'' in 2.6 and 3.3 than u('') in 2.6 and 3.3.
You don't want to be 3.2-compatible?
Unfortunately I do. However, at some point 3.2 will fall off the edge of the earth and then u'' will be just fine. This is probably a dumb question, but why can't we add u'' back to 3.2? It seems an incredibly minor change, and we are not in security-only fix stage, are we? ~Ethan~
Brian Curtin wrote:
On Mon, Feb 27, 2012 at 17:15, Ethan Furman <ethan@stoneleaf.us> wrote:
This is probably a dumb question, but why can't we add u'' back to 3.2? It seems an incredibly minor change, and we are not in security-only fix stage, are we?
We don't add features to bug-fix releases.
Ah. Well that's easy then! Call it a bug! ;) ~Ethan~
Ethan Furman <ethan <at> stoneleaf.us> writes:
True -- but I would rather have u'' in 2.6 and 3.3 than u('') in 2.6 and 3.3.
You don't need u('') in 2.6 - why do you think you need it there? If you don't implement this PEP, you can have, *uniformly* across 2.6, 2.7 and all 3.x versions, 'xxx' for text and b'yyy' for bytes. For 2.6 you would have to add "from __future__ import unicode_literals", and this might uncover places where you need to change things to use bytes or native strings - either because of bugs in the original code, or drawbacks in a Python version where you can't use Unicode as keys in a kwargs dictionary, or some API that wants you to use str explicitly. But at least some of those places will be things you would have to address anyway, when porting, whatever the state of Unicode literal support. Regards, Vinay Sajip
On 27.02.2012 18:05, Ethan Furman wrote:
Martin v. Löwis wrote:
On 26.02.2012 07:06, Nick Coghlan wrote:
On Sun, Feb 26, 2012 at 1:13 PM, Guido van Rossum <guido@python.org> wrote:
A small quibble: I'd like to see a benchmark of a 'u' function implemented in C.
Even if it was quite fast, I don't think such a function would bring the same benefits as restoring support for u'' literals.
You claim that, but your argument doesn't actually support that claim (or I fail to see the argument).
Python 2.6 code: this = u'that'
Python 3.3 code: this = u('that')
Not source compatible, not elegant. (Even though 2to3 could make this fix, it's still kinda ugly.)
No:
Python 2.6 code: this = u('that')
Python 3.3 code: this = u('that')
It *is* source compatible, and 100% so. As for elegance: I find the u prefix fairly inelegant already; the function removes just a little more elegance. Regards, Martin
On Saturday, February 25, 2012 at 10:13 PM, Guido van Rossum wrote:
If this can encourage more projects to support Python 3 (even if it's only 3.3 and later) and hence improve adoption of Python 3, I'm all for it.
A small quibble: I'd like to see a benchmark of a 'u' function implemented in C.
--Guido
After having this explained quite a bit to me by the more web-savvy folks such as Armin and Chris M/etc, I am a +1, the rationale makes sense, and much for the same reason that Guido cites, I think this will help with code bases using the single code base approach, and assist with overall adoption. +1 jesse
On Sat, 25 Feb 2012 19:13:26 -0800 Guido van Rossum <guido@python.org> wrote:
If this can encourage more projects to support Python 3 (even if it's only 3.3 and later) and hence improve adoption of Python 3, I'm all for it.
A small quibble: I'd like to see a benchmark of a 'u' function implemented in C.
Even without implementing it in C, caching the results makes it much less prohibitive in tight loops:
    if sys.version_info >= (3, 0):
        def u(value):
            return value
    else:
        def u(value, _lit_cache={}):
            if value in _lit_cache:
                return _lit_cache[value]
            s = _lit_cache[value] = unicode(value, 'unicode-escape')
            return s
    u'\N{SNOWMAN}barbaz' -> 100000000 loops, best of 3: 0.00928 usec per loop
    u('\N{SNOWMAN}barbaz') -> 10000000 loops, best of 3: 0.15 usec per loop
    u'foobarbaz_%d' % x -> 1000000 loops, best of 3: 0.424 usec per loop
    u('foobarbaz_%d') % x -> 1000000 loops, best of 3: 0.598 usec per loop
Regards
Antoine.
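A comparison along these lines could be reproduced with the timeit module. The sketch below is an assumption about how one might measure it, not the script that produced the figures quoted above; the helper best_usec and its default loop counts are made up for this example, and absolute numbers will vary by machine and interpreter.

```python
import sys
import timeit

if sys.version_info >= (3, 0):
    def u(value):
        return value
else:
    def u(value, _lit_cache={}):
        # Cache decoded literals so repeated calls skip the codec.
        if value in _lit_cache:
            return _lit_cache[value]
        s = _lit_cache[value] = unicode(value, 'unicode-escape')  # noqa: F821
        return s


def best_usec(func, number=100000, repeat=3):
    """Best-of-`repeat` time per call of func, in microseconds."""
    timer = timeit.Timer(func)
    return min(timer.repeat(repeat, number)) / number * 1e6


x = 7  # a variable operand, so the formatting cannot be folded at compile time
plain_usec = best_usec(lambda: 'foobarbaz_%d' % x)
wrapped_usec = best_usec(lambda: u('foobarbaz_%d') % x)
print("plain literal: %.3f usec, u() call: %.3f usec" % (plain_usec, wrapped_usec))
```

Both measurements include the overhead of calling the lambda, so only the difference between them says anything about the cost of u().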
On Sat, Feb 25, 2012 at 22:13, Guido van Rossum <guido@python.org> wrote:
If this can encourage more projects to support Python 3 (even if it's only 3.3 and later) and hence improve adoption of Python 3, I'm all for it.
+1 from me for the same reasons. If this were to go in then for Python 3.3 the section of the porting HOWTO on what to do when you support Python 2.6 and later (http://docs.python.org/howto/pyporting.html#python-2-3-compatible-source) would change to:
* Use ``from __future__ import print_function`` OR use ``print(x)`` but always with a single argument OR use six
* Use ``from __future__ import unicode_literals`` OR make sure to use the 'u' prefix for all Unicode strings (and then mention the concept of native strings) OR use six
* Use the 'b' prefix for byte literals OR use six
All understandable, and with either a __future__ import solution or a syntactic support solution for all issues, giving people the choice of whichever approach they prefer for each issue. I would also be willing to move the Python 2/3 compatible source section to the top, so that it implicitly becomes the preferred way to port, since people in the community have seemingly been gravitating towards that approach even without this help.
-Brett
A small quibble: I'd like to see a benchmark of a 'u' function implemented in C.
--Guido
On Sat, Feb 25, 2012 at 12:23 PM, Armin Ronacher <armin.ronacher@active-4.com> wrote:
Hi,
I just uploaded PEP 414 which proposes an optional 'u' prefix for string literals for Python 3.
You can read the PEP online: http://www.python.org/dev/peps/pep-0414/
This is a followup to the discussion about this topic here on the mailinglist and on twitter/IRC over the last few weeks.
Regards, Armin
-- --Guido van Rossum (python.org/~guido)
On Feb 26, 2012, at 05:44 PM, Brett Cannon wrote:
On Sat, Feb 25, 2012 at 22:13, Guido van Rossum <guido@python.org> wrote:
If this can encourage more projects to support Python 3 (even if it's only 3.3 and later) and hence improve adoption of Python 3, I'm all for it.
+1 from me for the same reasons.
Just to be clear, I'm solidly +1 on anything we can do to increase the pace of Python 3 migration. -Barry
2012/2/27 Barry Warsaw <barry@python.org>
On Feb 26, 2012, at 05:44 PM, Brett Cannon wrote:
On Sat, Feb 25, 2012 at 22:13, Guido van Rossum <guido@python.org> wrote:
If this can encourage more projects to support Python 3 (even if it's only 3.3 and later) and hence improve adoption of Python 3, I'm all for it.
+1 from me for the same reasons.
Just to be clear, I'm solidly +1 on anything we can do to increase the pace of Python 3 migration.
+1 I think this is a great proposal that has the potential to remove one of the (for me at least, _the_) main obstacles to writing code compatible with both 2.7 and 3.x. -- /f I reject your reality and substitute my own. http://blaag.haard.se
On 27.02.2012 00:07, Barry Warsaw wrote:
On Feb 26, 2012, at 05:44 PM, Brett Cannon wrote:
On Sat, Feb 25, 2012 at 22:13, Guido van Rossum <guido@python.org> wrote:
If this can encourage more projects to support Python 3 (even if it's only 3.3 and later) and hence improve adoption of Python 3, I'm all for it.
+1 from me for the same reasons.
Just to be clear, I'm solidly +1 on anything we can do to increase the pace of Python 3 migration.
I find this rationale a bit sad: it's not that there is any (IMO) good technical reason for the feature - only that people "hate" the many available alternatives for some reason. But then, practicality beats purity, so be it. Regards, Martin
On Feb 27, 2012, at 11:21 AM, Martin v. Löwis wrote:
I find this rationale a bit sad: it's not that there is any (IMO) good technical reason for the feature - only that people "hate" the many available alternatives for some reason.
It makes me sad too, and as I've said, I personally have no problem with the existing solutions. They work just fine for me. But I also consistently hear from folks doing web frameworks that there's a big missing piece in the Python 3 story for them. Maybe restoring u-prefix solves their problem, or maybe there's another better solution out there. I don't do a lot of web development these days so I can't say. -Barry
On Mon, 27 Feb 2012 11:21:16 +0100, "Martin v. Löwis" <martin@v.loewis.de> wrote:
I find this rationale a bit sad: it's not that there is any (IMO) good technical reason for the feature - only that people "hate" the many available alternatives for some reason.
But then, practicality beats purity, so be it.
Agreed on both counts (but only reluctantly on the second :)
The PEP does not currently contain a discussion of the unicode_literals + str() alternative and why that is not considered acceptable. That should be added (and I'm very curious why it isn't acceptable, it seems very elegant to me). In fact, I'd like to see the PEP contain a bullet list of alternatives with a discussion of why each is unacceptable or insufficient. The text as organized now is hard to follow for that purpose.
Other comments: I disagree that "it is clear that 2to3 as a tool is insufficient" and that *therefore* people are attempting to use unified source. I think the truth is that people just prefer the unified source approach, because that is more Pythonic.
I also strongly disagree with the statement that unicode_literals is doing more harm than good. Many people are using it very successfully. In *certain contexts* (WSGI) it may be problematic, but that doesn't mean it was a bad idea or that it shouldn't be used (given that a project uses it consistently, as noted previously in this thread). As noted above, the native string type *is* available with unicode_literals, it is spelled "str('somestring')".
I don't understand the "Who Benefits?" section at all. For example, I think you'll agree I'm experienced working with email issues, and I don't understand how this proposal would help at all in dealing with email. The PEP would be strengthened by providing specific examples of the claims made in this section.
I am -0 on this proposal. I will bow to the experience of those actually trying to port and support web code, which I am not doing myself. But I'd like to see the PEP improved so that the proposal is as strong as possible.
--David
The PEP does not consider an alternative idea such as using "from __future__ import unicode_literals" in code which needs to run on 2.x, together with e.g. a callable n('xxx') which can be used where native strings are needed. This avoids the need to reintroduce the u'xxx' literal syntax, makes it explicit where native strings are needed, is less obtrusive than u('xxx') or u'xxx' because typically there will be vastly fewer places where you need native strings, and is unlikely to impose a major runtime penalty when compared with u('xxx') (again, because of the lower frequency of occurrence). Even if you have arguments against this idea, I think it's at least worth mentioning in the PEP with any counter-arguments you have. Regards, Vinay Sajip
On 26.02.12 11:05, Vinay Sajip wrote:
The PEP does not consider an alternative idea such as using "from __future__ import unicode_literals" in code which needs to run on 2.x, together with e.g. a callable n('xxx') which can be used where native strings are needed. This avoids the need to reintroduce the u'xxx' literal syntax, makes it explicit where native strings are needed, is less obtrusive than u('xxx') or u'xxx' because typically there will be vastly fewer places where you need native strings, and is unlikely to impose a major runtime penalty when compared with u('xxx') (again, because of the lower frequency of occurrence).
n = str
Vinay Sajip wrote:
Serhiy Storchaka <storchaka <at> gmail.com> writes:
n = str
Well, n to indicate that native string is required.
str indicates the native string type, because it *is* the native string type. By definition, str = str in both Python 2.x and Python 3.x. There's no point in aliasing it to "n". Besides, "n" is commonly used for ints. It would be disturbing for me to read code with n a function or type, particularly one that returns a string. I think your suggestion is not well explained. You suggested a function n, expected to take a string literal. The example you gave earlier was: n('xxx') But it seems to me that this is a no-op, because 'xxx' is already the native string type. In Python 2, it gives a str (byte-string), which the n() function converts to a byte-string. In Python 3, it gives a str (unicode-string), which the n() function converts to a unicode-string. -- Steven
On Sun, Feb 26, 2012 at 9:00 PM, Steven D'Aprano <steve@pearwood.info> wrote:
I think your suggestion is not well explained. You suggested a function n, expected to take a string literal. The example you gave earlier was:
n('xxx')
But it seems to me that this is a no-op, because 'xxx' is already the native string type. In Python 2, it gives a str (byte-string), which the n() function converts to a byte-string. In Python 3, it gives a str (unicode-string), which the n() function converts to a unicode-string.
Vinay's suggestion was that it be used in conjunction with the "from __future__ import unicode_literals" import, so that you could write:
    b"" # Binary data
    "" # Text (unicode) data
    str("") # Native string type
It reduces the problem (compared to omitting the import and using a u() function), but it's still ugly and still involves the "action at a distance" of the unicode literals import. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
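The three spellings listed above can be sketched as follows. This is only a minimal illustration, assuming a module that opts in to the future import; the variable names and values are made up for the example.

```python
from __future__ import unicode_literals  # no effect on Python 3; changes "" on 2.6/2.7

binary_data = b"payload"        # bytes on both 2.x and 3.x
text_data = "caf\xe9"           # unicode text on both, because of the import
native = str("Content-Type")    # the native str type of whichever interpreter runs this

# On Python 3, text and native strings are the same type. On Python 2 with the
# import, text_data would be `unicode` while native would be the byte-based `str`.
print(type(binary_data).__name__, type(text_data).__name__, type(native).__name__)
```

Note that the str() call is only needed in the (usually few) places where an API insists on the native type.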
On Feb 26, 2012, at 09:20 PM, Nick Coghlan wrote:
It reduces the problem (compared to omitting the import and using a u() function), but it's still ugly and still involves the "action at a distance" of the unicode literals import.
Frankly, that doesn't bother me at all. I've been using the future import in all my code pretty successfully for a long while now. It's much more important for a project to use or not use the future import consistently, and then there really should be no confusion when looking at the code for that project. I'm not necessarily saying I'm opposed to the purpose of the PEP. I do think it's unnecessary for most Python problem domains, but can appreciate that WSGI apps are feeling a special pain here that should be addressed somehow. It would be nice however if the solution were in the form of a separate module that could be used in earlier Python versions. -Barry
On Sun, 2012-02-26 at 16:06 -0500, Barry Warsaw wrote:
On Feb 26, 2012, at 09:20 PM, Nick Coghlan wrote:
It reduces the problem (compared to omitting the import and using a u() function), but it's still ugly and still involves the "action at a distance" of the unicode literals import.
Frankly, that doesn't bother me at all. I've been using the future import in all my code pretty successfully for a long while now. It's much more important for a project to use or not use the future import consistently, and then there really should be no confusion when looking at the code for that project.
That's completely reasonable in a highly controlled project with relatively few highly-bought-in contributors. In projects with lots of hit-and-run contributors, though, it's more desirable to have things meet a rule of least surprise. Much of the software I work on is Python 3 compatible, but it's still used primarily on Python 2. Because most people still care primarily about Python 2, and most don't have a lot of Python 3 experience, it's extremely common to see folks submitting patches with u'' literals in them.
I'm not necessarily saying I'm opposed to the purpose of the PEP. I do think it's unnecessary for most Python problem domains, but can appreciate that WSGI apps are feeling a special pain here that should be addressed somehow. It would be nice however if the solution were in the form of a separate module that could be used in earlier Python versions.
If we use the unicode_literals future import, or some other external module strategy, it doesn't help much with the hit-and-run contributor thing, I fear. - C
Chris McDonough <chrism <at> plope.com> writes:
If we use the unicode_literals future import, or some other external module strategy, it doesn't help much with the hit-and-run contributor thing, I fear.
Surely some curating of hit-and-run contributions takes place? If you accept contributions from hit-and-run contributors without changes, ISTM that could compromise the quality of the codebase somewhat. Also, is not the overall impact on the codebase of hit-and-run contributors small compared to the impact from more involved contributors? Regards, Vinay Sajip
On Sun, 2012-02-26 at 23:06 +0000, Vinay Sajip wrote:
Chris McDonough <chrism <at> plope.com> writes:
If we use the unicode_literals future import, or some other external module strategy, it doesn't help much with the hit-and-run contributor thing, I fear.
Surely some curating of hit-and-run contributions takes place? If you accept contributions from hit-and-run contributors without changes, ISTM that could compromise the quality of the codebase somewhat.
Nah. Real developers just accept all pull requests and let god sort it out. ;-) But seriously, the less time it takes me to review and fix a pull request from a casual contributor, the better. - C
Much of the software I work on is Python 3 compatible, but it's still used primarily on Python 2. Because most people still care primarily about Python 2, and most don't have a lot of Python 3 experience, it's extremely common to see folks submitting patches with u'' literals in them.
These can be easily fixed, right? Regards, Martin
On Sun, Feb 26, 2012 at 7:05 PM, Vinay Sajip <vinay_sajip@yahoo.co.uk> wrote:
The PEP does not consider an alternative idea such as using "from __future__ import unicode_literals" in code which needs to run on 2.x, together with e.g. a callable n('xxx') which can be used where native strings are needed. This avoids the need to reintroduce the u'xxx' literal syntax, makes it explicit where native strings are needed, is less obtrusive that u('xxx') or u'xxx' because typically there will be vastly fewer places where you need native strings, and is unlikely to impose a major runtime penalty when compared with u('xxx') (again, because of the lower frequency of occurrence).
Even if you have arguments against this idea, I think it's at least worth mentioning in the PEP with any counter-arguments you have.
The PEP already mentions that. In fact, all bar the first paragraph in the "Rationale and Goals" section discuss it. However, it's the last paragraph that explains why using that particular future import is, in and of itself, a bad idea:
============
Additionally, the vast majority of people who maintain Python 2.x codebases are more familiar with Python 2.x semantics, and a per-file difference in literal meanings will be very annoying for them in the long run. A quick poll on Twitter about the use of the division future import supported my suspicions that people opt out of behaviour-changing future imports because they are a maintenance burden. Every time you review code you have to check the top of the file to see if the behaviour was changed. Obviously that was an unscientific informal poll, but it might be something worth considering.
============
As soon as you allow the use of "from __future__ import unicode_literals" or a module level "__metaclass__ = type", you can't review diffs in isolation any more - whether the diff is correct or not will depend on the presence or absence of a module level tweak to the language semantics.
Future imports work well for things like absolute imports, new keywords, or statements becoming functions - if the future import is missing when you expected it to be present (or vice-versa), it will result in a quick SyntaxError or ImportError that will point you directly to the offending code. Unicode literals and implicitly creating new-style classes are a different matter - for those, if the module level modification takes place (or doesn't take place when you expected it to be there), you get unexpected changes in behaviour instead of a clear exception that refers directly to the source of the problem. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
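The "check the top of the file" step described above can be sketched with the standard ast module. This is only an illustration of the review burden, assuming a hypothetical helper name (uses_unicode_literals); it is not something the PEP or any tool mentioned in this thread provides.

```python
import ast


def uses_unicode_literals(source):
    """Return True if `source` has `from __future__ import unicode_literals`."""
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, ast.ImportFrom) and node.module == "__future__":
            if any(alias.name == "unicode_literals" for alias in node.names):
                return True
    return False


# Two modules whose diffs could look identical in review, yet whose
# unprefixed literals mean different things on Python 2.
with_import = "from __future__ import unicode_literals\nname = 'x'\n"
without_import = "name = 'x'\n"

print(uses_unicode_literals(with_import))
print(uses_unicode_literals(without_import))
```

A reviewer looking only at a diff hunk containing name = 'x' cannot tell which of the two modules it came from.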
Nick Coghlan <ncoghlan <at> gmail.com> writes:
The PEP already mentions that. In fact, all bar the first paragraph in the "Rationale and Goals" section discusses it. However, it's the last
I didn't mean the __future__ import bit, but a discussion re. alternatives to u('xxx').
Future imports work well for things like absolute imports, new keywords, or statements becoming functions - if the future import is missing when you expected it to be present (or vice-versa), it will result in a quick SyntaxError or ImportError that will point you directly to the offending code. Unicode literals and implicitly creating new-style classes are a different matter - for those, if the module level modification takes place (or doesn't take place when you expected it to be there), you get unexpected changes in behaviour instead of a clear exception that refers directly to the source of the problem.
I don't disagree with anything you said here. Perhaps I've been doing too much work recently with single 2.x/3.x codebase projects, so I've just gotten to like using Unicode literals without the u prefix. However, as the proposal doesn't force one to use u prefixes, I'm not really objecting, especially if it speeds transition to 3.x. Regards, Vinay Sajip
On 2/26/2012 6:14 AM, Nick Coghlan wrote:
As soon as you allow the use of "from __future__ import unicode_literals" or a module level "__metaclass__ = type", you can't review diffs in isolation any more - whether the diff is correct or not will depend on the presence or absence of a module level tweak to the language semantics.
Future imports work well for things like absolute imports, new keywords, or statements becoming functions - if the future import is missing when you expected it to be present (or vice-versa), it will result in a quick SyntaxError or ImportError that will point you directly to the offending code. Unicode literals and implicitly creating new-style classes are a different matter - for those, if the module level modification takes place (or doesn't take place when you expected it to be there), you get unexpected changes in behaviour instead of a clear exception that refers directly to the source of the problem.
There are already __future__ imports that violate this principle: from __future__ import division. That doesn't mean I'm in favor of this new __future__, just keeping a wide angle on the viewfinder.
--Ned.
On Sun, Feb 26, 2012 at 10:34 PM, Ned Batchelder <ned@nedbatchelder.com> wrote:
There are already __future__ imports that violate this principle: from __future__ import division. That doesn't mean I'm in favor of this new __future__, just keeping a wide angle on the viewfinder.
Armin's straw poll was actually about whether or not people used the future import for division, rather than unicode literals. It is indeed the same problem - and several of us had a strong preference for forcing float division with "float(x) / y" over relying on the long distance effect of the future import (although it was only in this thread that I figured out exactly *why* I don't like those two, but happily used many of the other future imports when they were necessary). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
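The explicit spelling mentioned above can be sketched as follows. It is a minimal example with made-up values, showing forms that give the same result on 2.x and 3.x without relying on the division future import.

```python
x, y = 7, 2

true_quotient = float(x) / y   # true division on both 2.x and 3.x
floor_quotient = x // y        # floor division, spelled explicitly on both

print(true_quotient, floor_quotient)
```

Both lines can be reviewed in isolation: their meaning does not depend on anything at the top of the module.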
Nick Coghlan wrote:
Armin's straw poll was actually about whether or not people used the future import for division, rather than unicode literals. It is indeed the same problem
There are differences, though. Personally I'm very glad of the division import -- it's the only thing that keeps me sane when using floats. The alternative is not only butt-ugly but imposes an annoying performance penalty. I don't mind occasionally needing to glance at the top of a module in order to get the benefits. On the other hand, it's not much of a burden to put 'u' in front of string literals, and there is no performance difference. -- Greg
Hi, On 2/26/12 12:34 PM, Ned Batchelder wrote:
There are already __future__ imports that violate this principle: from __future__ import division. That doesn't mean I'm in favor of this new __future__, just keeping a wide angle on the viewfinder.
That's actually mentioned in the PEP :-)
A quick poll on Twitter about the use of the division future import supported my suspicions that people opt out of behaviour-changing future imports because they are a maintenance burden. Every time you review code you have to check the top of the file to see if the behaviour was changed.
Regards, Armin
This seems like too strong a statement: "Python 2.6 and Python 2.7 support syntax features from Python 3 which for the most part make a unified code base possible. Many thought that the unicode_literals future import might make a common source possible, but it turns out that it's doing more harm than good." While it may be true for *some* problem domains, such as WSGI apps, it is not true in general, IMO. I use this future import all the time in both libraries and applications and it's almost always helpful. Cheers, -Barry
Some microbenchmarks:
$ python -m timeit -n 10000 -r 100 -s "x = 123" "'foobarbaz_%d' % x"
10000 loops, best of 100: 1.24 usec per loop
$ python -m timeit -n 10000 -r 100 -s "x = 123" "str('foobarbaz_%d') % x"
10000 loops, best of 100: 1.59 usec per loop
$ python -m timeit -n 10000 -r 100 -s "x = 123" "str(u'foobarbaz_%d') % x"
10000 loops, best of 100: 1.58 usec per loop
$ python -m timeit -n 10000 -r 100 -s "x = 123; n = lambda s: s" "n('foobarbaz_%d') % x"
10000 loops, best of 100: 1.41 usec per loop
$ python -m timeit -n 10000 -r 100 -s "x = 123; s = 'foobarbaz_%d'" "s % x"
10000 loops, best of 100: 1.22 usec per loop
There is no significant overhead to using converters.
Hi, On 2/26/12 12:35 PM, Serhiy Storchaka wrote:
Some microbenchmarks:
$ python -m timeit -n 10000 -r 100 -s "x = 123" "'foobarbaz_%d' % x"
10000 loops, best of 100: 1.24 usec per loop
$ python -m timeit -n 10000 -r 100 -s "x = 123" "str('foobarbaz_%d') % x"
10000 loops, best of 100: 1.59 usec per loop
$ python -m timeit -n 10000 -r 100 -s "x = 123" "str(u'foobarbaz_%d') % x"
10000 loops, best of 100: 1.58 usec per loop
$ python -m timeit -n 10000 -r 100 -s "x = 123; n = lambda s: s" "n('foobarbaz_%d') % x"
10000 loops, best of 100: 1.41 usec per loop
$ python -m timeit -n 10000 -r 100 -s "x = 123; s = 'foobarbaz_%d'" "s % x"
10000 loops, best of 100: 1.22 usec per loop
There are no significant overhead to use converters.
That's because what you're benchmarking here more than anything is the overhead of eval() :-) See the benchmark linked in the PEP for one that measures the actual performance of the string literal / wrapper.
Regards, Armin
26.02.12 14:42, Armin Ronacher wrote:
On 2/26/12 12:35 PM, Serhiy Storchaka wrote:
Some microbenchmarks:
$ python -m timeit -n 10000 -r 100 -s "x = 123" "'foobarbaz_%d' % x"
10000 loops, best of 100: 1.24 usec per loop
$ python -m timeit -n 10000 -r 100 -s "x = 123" "str('foobarbaz_%d') % x"
10000 loops, best of 100: 1.59 usec per loop
$ python -m timeit -n 10000 -r 100 -s "x = 123" "str(u'foobarbaz_%d') % x"
10000 loops, best of 100: 1.58 usec per loop
$ python -m timeit -n 10000 -r 100 -s "x = 123; n = lambda s: s" "n('foobarbaz_%d') % x"
10000 loops, best of 100: 1.41 usec per loop
$ python -m timeit -n 10000 -r 100 -s "x = 123; s = 'foobarbaz_%d'" "s % x"
10000 loops, best of 100: 1.22 usec per loop
There are no significant overhead to use converters. That's because what you're benchmarking here more than anything is the overhead of eval() :-) See the benchmark linked in the PEP for one that measures the actual performance of the string literal / wrapper.
$ python -m timeit -n 10000 -r 100 ""
10000 loops, best of 100: 0.087 usec per loop
The overhead of eval is 5%. Real code is not a single string literal; every string literal occurs together with a lot of other code (getting and setting variables, attribute access, function calls, binary operators, unconditional and conditional jumps, etc.), so the total effect of using a simple converter will be insignificant.
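The measurements above can be reproduced from a script, with the empty-loop baseline subtracted explicitly. This is a rough sketch using the standard timeit module, not the benchmark linked in the PEP; the helper name best_usec and the repeat counts are assumptions chosen to keep the run short.

```python
import timeit

SETUP = "x = 123; n = lambda s: s"
STATEMENTS = {
    "bare literal": "'foobarbaz_%d' % x",
    "str() wrapper": "str('foobarbaz_%d') % x",
    "identity function": "n('foobarbaz_%d') % x",
}


def best_usec(stmt, number=10000, repeat=5):
    """Best-of-`repeat` time for one execution of `stmt`, in microseconds."""
    timer = timeit.Timer(stmt, setup=SETUP)
    return min(timer.repeat(repeat=repeat, number=number)) / number * 1e6


baseline = best_usec("pass")  # cost of the timing loop itself
for label, stmt in STATEMENTS.items():
    total = best_usec(stmt)
    print("%-18s %.3f usec (%.3f above the empty loop)" % (label, total, total - baseline))
```

Absolute numbers depend heavily on the interpreter and machine, so only the relative differences between the three statements are meaningful.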
There are no significant overhead to use converters. That's because what you're benchmarking here more than anything is the overhead of eval() :-) See the benchmark linked in the PEP for one that measures the actual performance of the string literal / wrapper.
There are a few other unproven performance claims in the PEP. Can you kindly provide the benchmarks you have been using? In particular, I'm interested in the claim " In many cases 2to3 runs one or two orders of magnitude slower than the testsuite for the library or application it's testing." Regards, Martin
Hi, On 2/27/12 10:17 AM, "Martin v. Löwis" wrote:
There are a few other unproven performance claims in the PEP. Can you kindly provide the benchmarks you have been using? In particular, I'm interested in the claim " In many cases 2to3 runs one or two orders of magnitude slower than the testsuite for the library or application it's testing."
The benchmarks used are linked in the PEP.
Regards, Armin
Quoting Armin Ronacher <armin.ronacher@active-4.com>:
Hi,
On 2/27/12 10:17 AM, "Martin v. Löwis" wrote:
There are a few other unproven performance claims in the PEP. Can you kindly provide the benchmarks you have been using? In particular, I'm interested in the claim " In many cases 2to3 runs one or two orders of magnitude slower than the testsuite for the library or application it's testing." The benchmarks used are linked in the PEP.
Maybe I'm missing something, but there doesn't seem to be a benchmark that measures the 2to3 performance, supporting the claim that it runs "two orders of magnitude" slower (which I'd interpret as a factor of 100). If the claim actually cannot be supported, please remove it from the PEP. Regards, Martin
Hi, On 2/27/12 4:44 PM, martin@v.loewis.de wrote:
Maybe I'm missing something, but there doesn't seem to be a benchmark that measures the 2to3 performance, supporting the claim that it runs "two orders of magnitude" slower (which I'd interpret as a factor of 100).
My Jinja2+Werkzeug's testsuite combined takes 2 seconds to run (Werkzeug actually takes 3 because it pauses for two seconds in a cache expiration test). 2to3 takes 45 seconds to run. And those are small code bases (15K lines combined).
It's not exactly two orders of magnitude so I will probably change the writing to "just" 20 times slower but it illustrates the point. Regards, Armin
Armin Ronacher wrote:
Hi,
On 2/27/12 4:44 PM, martin@v.loewis.de wrote:
Maybe I'm missing something, but there doesn't seem to be a benchmark that measures the 2to3 performance, supporting the claim that it runs "two orders of magnitude" slower (which I'd interpret as a factor of 100). My Jinja2+Werkzeug's testsuite combined takes 2 seconds to run (Werkzeug actually takes 3 because it pauses for two seconds in a cache expiration test). 2to3 takes 45 seconds to run. And those are small code bases (15K lines combined).
It's not exactly two orders of magnitude so I will probably change the writing to "just" 20 times slower but it illustrates the point.
That would be one order of magnitude. -- Steven
On 27.02.2012 22:35, Armin Ronacher wrote:
Hi,
On 2/27/12 4:44 PM, martin@v.loewis.de wrote:
Maybe I'm missing something, but there doesn't seem to be a benchmark that measures the 2to3 performance, supporting the claim that it runs "two orders of magnitude" slower (which I'd interpret as a factor of 100). My Jinja2+Werkzeug's testsuite combined takes 2 seconds to run (Werkzeug actually takes 3 because it pauses for two seconds in a cache expiration test). 2to3 takes 45 seconds to run. And those are small code bases (15K lines combined).
I'm not quite able to reproduce that. I don't know how to run the Jinja2 and Werkzeug test suites combined (Werkzeug's setup.py install gives SyntaxError on Python3). So taking Jinja2 alone, this is what I get:
- test suite run: 0.86s (python setup.py test)
- 2to3 run: 6.7s (python3 setup.py build, using default:3328e388cb28)
So this is less than a factor of ten, but more importantly, much shorter than 45s. I also claim that the example is atypical, in that the test suite completes so quickly. Taking distribute 0.6.24 as a counter-example:
- test suite run: 9s
- 2to3 run: 7s
So the test suite runs longer than the build process. Therefore, even a claim "In many cases 2to3 runs 20 times slower than the testsuite for the library or application it's testing" cannot be substantiated, as cannot the claim "This for instance is the case for the Jinja2 library". On the contrary, I'd expect that the build time using 2to3 is significantly shorter than the test suite run times, *in particular* for large projects. For example, for Django, 2to3 takes less than 3 minutes (IIRC), and the test suite runs an hour or so (depending on how many tests get skipped). Regards, Martin
On Tue, 28 Feb 2012 10:02:46 +0100 "Martin v. Löwis" <martin@v.loewis.de> wrote:
On the contrary, I'd expect that the build time using 2to3 is significantly shorter than the test suite run times, *in particular* for large projects. For example, for Django, 2to3 takes less than 3 minutes (IIRC), and the test suite runs an hour or so (depending on how many tests get skipped).
In the end, that's not particularly relevant, because you don't have to run the test suite entirely; when working on small changes, you usually re-run the impacted parts of the test suite until everything goes fine; on the other hand, 2to3 *has* to run on the entire code base. So, really, it's a couple of seconds to run a single bunch of tests vs. several minutes to run 2to3 on the code base. And it's not just the test suite: every concrete experiment with the library you're porting has a serial dependency on running 2to3. Regards Antoine.
In the end, that's not particularly relevant, because you don't have to run the test suite entirely; when working on small changes, you usually re-run the impacted parts of the test suite until everything goes fine; on the other hand, 2to3 *has* to run on the entire code base.
Not at all. If you are working on the code, 2to3 only needs to run on the parts of the code that you changed, since the unmodified parts will not need to be re-transformed using 2to3.
So, really, it's a couple of seconds to run a single bunch of tests vs. several minutes to run 2to3 on the code base.
Not in my experience. The incremental run-time of 2to3 after a single change is in the order of fractions of a second.
And it's not just the test suite: every concrete experiment with the library you're porting has a serial dependency on running 2to3.
Therefore, your build process should support incremental changes. Fortunately, distribute does support this approach. Regards, Martin
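The idea of an incremental build can be sketched as a simple timestamp comparison. This is only an illustration of the principle; the helper names (needs_conversion, stale_sources) are hypothetical, and nothing here claims to show how distribute actually decides which files to re-run through 2to3.

```python
import os


def needs_conversion(source_path, converted_path):
    """True if the converted copy is missing or older than its source."""
    if not os.path.exists(converted_path):
        return True
    return os.path.getmtime(source_path) > os.path.getmtime(converted_path)


def stale_sources(pairs):
    """Filter (source, converted) path pairs down to the sources to re-convert."""
    return [src for src, dst in pairs if needs_conversion(src, dst)]
```

With a check like this, editing one module means only that module is transformed again, which is why the incremental cost can be far below the cost of a full run.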
On Sun, 26 Feb 2012 12:42:53 +0000 Armin Ronacher <armin.ronacher@active-4.com> wrote:
Hi,
On 2/26/12 12:35 PM, Serhiy Storchaka wrote:
Some microbenchmarks:
$ python -m timeit -n 10000 -r 100 -s "x = 123" "'foobarbaz_%d' % x"
10000 loops, best of 100: 1.24 usec per loop
$ python -m timeit -n 10000 -r 100 -s "x = 123" "str('foobarbaz_%d') % x"
10000 loops, best of 100: 1.59 usec per loop
$ python -m timeit -n 10000 -r 100 -s "x = 123" "str(u'foobarbaz_%d') % x"
10000 loops, best of 100: 1.58 usec per loop
$ python -m timeit -n 10000 -r 100 -s "x = 123; n = lambda s: s" "n('foobarbaz_%d') % x"
10000 loops, best of 100: 1.41 usec per loop
$ python -m timeit -n 10000 -r 100 -s "x = 123; s = 'foobarbaz_%d'" "s % x"
10000 loops, best of 100: 1.22 usec per loop
There are no significant overhead to use converters. That's because what you're benchmarking here more than anything is the overhead of eval() :-) See the benchmark linked in the PEP for one that measures the actual performance of the string literal / wrapper.
Could you update your benchmarks with the caching version of u()? Thanks Antoine.
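For reference, one way a caching u() wrapper could look is sketched below. This is an assumption about what "the caching version" means, not code taken from the PEP or its benchmarks; the cache name _decoded is made up for the example.

```python
import sys

_decoded = {}  # hypothetical module-level cache: literal -> decoded text

if sys.version_info[0] >= 3:
    def u(literal):
        # Unprefixed literals are already text on Python 3.
        return literal
else:
    def u(literal):
        # Decode once per distinct literal, then reuse the cached result.
        try:
            return _decoded[literal]
        except KeyError:
            text = _decoded[literal] = literal.decode("unicode-escape")
            return text

greeting = u("caf\xe9")
print(repr(greeting))
```

Even with the cache, every use on Python 2 still pays for a function call and a dictionary lookup, which a real literal does not.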
Hi, On Sat, 25 Feb 2012 20:23:39 +0000 Armin Ronacher <armin.ronacher@active-4.com> wrote:
I just uploaded PEP 414 which proposes an optional 'u' prefix for string literals for Python 3.
You can read the PEP online: http://www.python.org/dev/peps/pep-0414/
I don't understand this sentence:
The automatic upgrading of binary strings to unicode strings that would be enabled by this proposal would make it much easier to port such libraries over.
What "automatic upgrading" is that talking about?
For instance, the urllib module in Python 2 is using byte strings, and the one in Python 3 is using unicode strings.
Are you talking about urllib.parse perhaps?
By leveraging a native string, users can avoid having to adjust for that.
What does "leveraging a native string" mean here?
The following is an incomplete list of APIs and general concepts that use native strings and need implicit upgrading to unicode in Python 3, and which would directly benefit from this support
I'm confused. This PEP talks about unicode literals, not native string literals, so why would these APIs "directly benefit from this support"? Thanks Antoine.
On 26 Feb 2012, at 17:45, Antoine Pitrou wrote:
Hi,
On Sat, 25 Feb 2012 20:23:39 +0000 Armin Ronacher <armin.ronacher@active-4.com> wrote:
I just uploaded PEP 414 which proposes an optional 'u' prefix for string literals for Python 3.
You can read the PEP online: http://www.python.org/dev/peps/pep-0414/
I don't understand this sentence:
The automatic upgrading of binary strings to unicode strings that would be enabled by this proposal would make it much easier to port such libraries over.
What "automatic upgrading" is that talking about?
If you use native string syntax (no prefix) then moving from Python 2 to Python 3 automatically "upgrades" (I agree an odd choice of word) byte string literals to unicode string literals.
For instance, the urllib module in Python 2 is using byte strings, and the one in Python 3 is using unicode strings.
Are you talking about urllib.parse perhaps?
By leveraging a native string, users can avoid having to adjust for that.
What does "leveraging a native string" mean here?
By using native string syntax (without the unicode literals future import), APIs that take a binary string in Python 2 and a unicode string in Python 3 "just work" with the same syntax. You are "leveraging" native syntax to use the same APIs with different types across the different versions of Python.
The following is an incomplete list of APIs and general concepts that use native strings and need implicit upgrading to unicode in Python 3, and which would directly benefit from this support
I'm confused. This PEP talks about unicode literals, not native string literals, so why would these APIs "directly benefit from this support"?
Because sometimes in your code you want to specify "native strings" and sometimes you want to specify Unicode strings. There is no single *syntax* that is compatible with both Python 2 and Python 3 that permits this. (If you use "u" for Unicode in Python 2 and no prefix for native strings then your code is Python 3 incompatible, if you use the future import so that your strings are unicode in both Python 2 and Python 3 then you lose the syntax for native strings.) Michael
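What the proposal makes possible in a single source file can be sketched as follows. It is a minimal example with made-up names and values, assuming Python 2.6+ or Python 3.3+, where all three spellings are accepted.

```python
text = u"Gr\xfc\xdfe"       # always text: unicode on 2.x, str on 3.3+
payload = b"\x00\x01"       # always binary data
header_name = "Accept"      # native string: bytes on 2.x, text on 3.x

print(type(text).__name__, type(payload).__name__, type(header_name).__name__)
```

No future import is involved, so each literal can be read on its own without looking at the top of the module.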
Thanks
Antoine.
-- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html
Hi, On 2/26/12 5:45 PM, Antoine Pitrou wrote:
The automatic upgrading of binary strings to unicode strings that would be enabled by this proposal would make it much easier to port such libraries over.
What "automatic upgrading" is that talking about?
The word "upgrade" is probably something that should be changed. It refers to the fact that 'foo' is a bytestring in 2.x and the same syntax means a unicode string in Python 3. This is exactly what is necessary for interfaces that were promoted to unicode interfaces in Python 3 (for instance Python identifiers, URLs etc.)
Are you talking about urllib.parse perhaps?
Not only the parsing module. Headers on the urllib.request module are unicode as well. What the PEP is referring to is the urllib/urlparse and cgi modules, which were largely consolidated into the urllib package in Python 3.
What does "leveraging a native string" mean here?
It means using a native string to achieve the automatic upgrading, which "does the right thing" in a lot of situations.
I'm confused. This PEP talks about unicode literals, not native string literals, so why would these APIs "directly benefit from this support"?
The native string literal already exists. It disappears if `unicode_literals` is future imported, which is why this is relevant: the unicode literals future import in 2.x is recommended by some for making libraries run in both 2.x and 3.x.
Regards, Armin
On 25 February 2012 21:23, Armin Ronacher <armin.ronacher@active-4.com> wrote:
Hi,
I just uploaded PEP 414 which proposes an optional 'u' prefix for string literals for Python 3.
You can read the PEP online: http://www.python.org/dev/peps/pep-0414/
This is a followup to the discussion about this topic here on the mailinglist and on twitter/IRC over the last few weeks.
Regards, Armin
If the main point of this proposal is avoiding an explicit 2to3 run on account of 2to3 being too slow then I'm -1. That should be fixed at 2to3 level, not at python syntax level. A common strategy to distribute code able to run on both python 2 and python 3 is using the following hack in setup.py: http://docs.python.org/dev/howto/pyporting.html#during-installation That's what I used in psutil and it works just fine. Also, I believe it's the *right* strategy as it lets you freely write python 2 code and avoid using ugly hacks such as "sys.exc_info()[1]" and "if PY3: ..." all around the place. 2to3 might be slow but introducing workarounds encouraging not to use it is only going to cause a proliferation of ugly and hackish code in the python ecosystem. Now, psutil is a relatively small project and the 2to3 conversion doesn't take much time. Having users "unawarely" run 2to3 at installation time is an acceptable burden in terms of speed. That's going to be different on larger code bases such as Twisted's. One way to fix that might be making 2to3 generate and rely on a "2to3.diff" file containing all the differences. That would be generated the first time "python setup.py build/install" is run and then partially re-calculated every time a file is modified. Third-party library vendors can include 2to3.diff as part of the tarball they distribute so that the end user won't experience any slow down deriving from the 2to3 conversion. --- Giampaolo http://code.google.com/p/pyftpdlib/ http://code.google.com/p/psutil/ http://code.google.com/p/pysendfile/
On Mon, Feb 27, 2012 at 9:34 PM, Giampaolo Rodolà <g.rodola@gmail.com> wrote:
If the main point of this proposal is avoiding an explicit 2to3 run on account of 2to3 being too slow then I'm -1.
No, the main point is that adding a compile step to the Python development process sucks. The slow speed of 2to3 is one factor, but single source is just far, far, easier to maintain than continually running 2to3 to get a working Python 3 version. When we have the maintainers of major web frameworks and libraries telling us that this is a painful aspect for their ports (and, subsequently, the ports of their users), it would be irresponsible of us to ignore their feedback. Sure, some early adopters are happy with the 2to3 process, that's not in dispute. However, many developers are not, and (just as relevant) many folks that haven't started their ports yet have highlighted it as one of the aspects that bothers them. Is restoring support for unicode literals a small retrograde step that partially undoes the language cleanup that occurred in 3.0? Yes, it is. However, it really does significantly increase the amount of 2.x code that will *just run* on Python 3 (or will run with minor tweaks). I can live with that - as MvL said, this is a classic case of practicality beating purity. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Feb 27, 2012, at 12:34 PM, Giampaolo Rodolà wrote:
On 25 February 2012 at 21:23, Armin Ronacher wrote: [...] If the main point of this proposal is avoiding an explicit 2to3 run on account of 2to3 being too slow then I'm -1.
2to3's speed isn't the only problem with the tool, although it's a big one. It also doesn't always work, and it makes packaging libraries dependent on it more difficult. As for the "working" part, I forget the details, but let's say you have a test suite in your package. If you run `python setup.py test` in a Python 2 world, then `python3 setup.py test` may fail to build properly. IIRC this was due to some confusion that 2to3 had. I've no doubt that these things can be fixed, but why? I'd much rather see the effort put into allowing us to write Python 3 code natively, with some accommodations for Python 2 from a single code base for the last couple of years that that will still be necessary <wink>. Cheers, -Barry
Barry Warsaw <barry <at> python.org> writes:
As for the "working" part, I forget the details, but let's say you have a test suite in your package. If you run `python setup.py test` in a Python 2 world, then `python3 setup.py test` may fail to build properly. IIRC this was due to some confusion that 2to3 had.
There are other things, too, which make 2to3 a good advisory tool rather than a fully automated solution. 2to3 does a pretty good job of solving a difficult problem, but there are some things it just won't be able to do. For example, it assumes that certain method names belong to dictionaries and wraps their result with a list() because 3.x produces iterators where 2.x produces lists. This has caused problems in practice, e.g. with Django where IIRC calls to the values() method of querysets were wrapped with list(), when it was wrong to do so. Regards, Vinay Sajip
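A minimal sketch of the failure mode described above, assuming a lazily evaluated query object whose `values()` method returns another chainable query. `LazyQuery` is a hypothetical stand-in written for this example; it is not Django's queryset API.

```python
class LazyQuery(object):
    """Hypothetical chainable query object (not Django's API)."""

    def __init__(self, rows):
        self._rows = list(rows)

    def values(self):
        # Same method name as dict.values(), but it returns a query object.
        return LazyQuery(self._rows)

    def filter(self, predicate):
        return LazyQuery(row for row in self._rows if predicate(row))

    def __iter__(self):
        return iter(self._rows)


query = LazyQuery([{"id": 1}, {"id": 2}, {"id": 3}])

# Original code: chaining works because values() returns a LazyQuery.
chained = query.values().filter(lambda row: row["id"] > 1)
original_result = [row["id"] for row in chained]

# After a name-based rewrite of x.values() to list(x.values()), the result
# is a plain list, so a later .filter(...) call would raise AttributeError.
wrapped = list(query.values())
wrapped_supports_filter = hasattr(wrapped, "filter")
```

A purely syntactic tool cannot tell that `query` is not a dictionary, which is why the wrapping is applied here as well.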
On 02/27/2012 06:34 AM, Giampaolo Rodolà wrote:
On 25 February 2012 at 21:23, Armin Ronacher <armin.ronacher@active-4.com> wrote:
Hi,
I just uploaded PEP 414 which proposes an optional 'u' prefix for string literals for Python 3.
You can read the PEP online: http://www.python.org/dev/peps/pep-0414/
This is a followup to the discussion about this topic here on the mailinglist and on twitter/IRC over the last few weeks.
Regards, Armin
If the main point of this proposal is avoiding an explicit 2to3 run on account of 2to3 being too slow then I'm -1.
The main point is that 2to3 as a strategy for "straddling" python2 and python3 is a showstopper for folks who actually need to straddle (as opposed to one-time conversion):
- 2to3 performance on large projects sucks.
- 2to3 introduces oddities in testing, coverage, etc.
- 2to3 creates problems with stack traces / bug reports from Py3k users.
There are a *lot* of folks who have abandoned 2to3 in favor of "single codebase": the PEP addresses one of the last remaining issues to making such codebases clean and easy to maintain (the sys.exc_info hack is not needed in Python >= 2.6). Tres. -- Tres Seaver +1 540-429-0999 tseaver@palladion.com Palladion Software "Excellence by Design" http://palladion.com
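To make the parenthetical about Python >= 2.6 concrete, here is a small sketch. The `except ... as ...` form is accepted by Python 2.6, 2.7, and 3.x alike, so code that only needs to support those versions can bind the exception directly. The function name `safe_divide` is invented for the example.

```python
def safe_divide(numerator, denominator):
    """Return the quotient, or an error message on division by zero."""
    try:
        return numerator / denominator
    except ZeroDivisionError as err:
        # Valid syntax on Python 2.6+ and 3.x; no sys.exc_info() needed.
        return "error: %s" % err
```

Only projects that still support Python 2.5 or earlier need the `sys.exc_info()[1]` spelling.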
participants (32)
- "Martin v. Löwis"
- Antoine Pitrou
- Armin Ronacher
- Barry Warsaw
- Brett Cannon
- Brian Curtin
- Chris McDonough
- Ethan Furman
- Ezio Melotti
- Fredrik Håård
- Giampaolo Rodolà
- Greg Ewing
- Guido van Rossum
- Jesse Noller
- Jim J. Jewett
- Lennart Regebro
- martin@v.loewis.de
- Matej Cepl
- Merlijn van Deen
- Michael Foord
- Ned Batchelder
- Nick Coghlan
- Paul Moore
- R. David Murray
- Serhiy Storchaka
- Stefan Krah
- Stephen J. Turnbull
- Steven D'Aprano
- Terry Reedy
- Tres Seaver
- Vinay Sajip
- Éric Araujo