Smoothing the transition from Python 2 to 3
[I've posted something about this on python-ideas but since I now have some basic working code, I think it is more than an idea.]
I think the uptake of Python 3 is starting to accelerate. That's good. However, there are still millions or maybe billions of lines of Python code that still needs to be ported. It is beneficial to the Python ecosystem if this code can get ported.
My idea is to make a stepping stone version of Python, between 2.7.x and 3.x that eases the porting job. The high level goals are:
- code coming out of 2to3 runs correctly on this modified Python
- code that runs without warnings on this modified Python will run correctly on Python 3.x.
Achieving these goals is not technically possible. Still, I want to reduce as much as possible the manual work involved in porting. Incrementally fixing code that generates warnings is a lot easier than trying to fix an entire application or library at once.
I have a very early version on github:
https://github.com/nascheme/ppython
I'm hoping if people find it useful then they would contribute backwards compatibility fixes that help their applications or libraries run. I am currently running a newly 2to3 ported application on it. At this time there is no warning generated but I would rather get a warning than have one of my customers run into a porting bug.
To be clear, I'm not proposing that these backwards compatibility features go into Python 3.x or that this modified Python becomes the standard version. It is purely an intermediate step in getting code ported to Python 3.
I've temporarily named it "Pragmatic Python". I'd like a better name if someone can suggest one. Maybe something like Perverted, Debauched or Impure Python.
Regards,
Neil
On Jun 8, 2016 4:04 PM, "Neil Schemenauer"
[I've posted something about this on python-ideas but since I now have some basic working code, I think it is more than an idea.]
I think the uptake of Python 3 is starting to accelerate. That's good. However, there are still millions or maybe billions of lines of Python code that still needs to be ported. It is beneficial to the Python ecosystem if this code can get ported.
My idea is to make a stepping stone version of Python, between 2.7.x and 3.x that eases the porting job. The high level goals are:
- code coming out of 2to3 runs correctly on this modified Python
- code that runs without warnings on this modified Python will run correctly on Python 3.x.
Achieving these goals is not technically possible. Still, I want to reduce as much as possible the manual work involved in porting. Incrementally fixing code that generates warnings is a lot easier than trying to fix an entire application or library at once.
I have a very early version on github:
https://github.com/nascheme/ppython
I'm hoping if people find it useful then they would contribute backwards compatibility fixes that help their applications or libraries run. I am currently running a newly 2to3 ported application on it. At this time there is no warning generated but I would rather get a warning than have one of my customers run into a porting bug.
To be clear, I'm not proposing that these backwards compatibility features go into Python 3.x or that this modified Python becomes the standard version. It is purely an intermediate step in getting code ported to Python 3.
I've temporarily named it "Pragmatic Python". I'd like a better name if someone can suggest one. Maybe something like Perverted, Debauched or Impure Python.
...Perverted Python? Ouch. What about something like "unpythonic" or similar?
Regards,
Neil _______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/rymg19%40gmail.com
-- Ryan [ERROR]: Your autotools build scripts are 200 lines longer than your program. Something’s wrong. http://kirbyfan64.github.io/
On 06/08/2016 02:40 PM, Fred Drake wrote:
On Wed, Jun 8, 2016 at 5:33 PM, Ryan Gonzalez
wrote: What about something like "unpythonic" or similar?
Or perhaps... antipythy?
That's awfully close to antipathy [1], my path module on PyPI. Besides, I liked the suggestion from the -ideas list: Python 2therescue. ;) -- ~Ethan~ [1] https://pypi.python.org/pypi/antipathy
On Thu, Jun 9, 2016 at 6:16 PM, Ethan Furman
That's awfully close to antipathy [1], my path module on PyPI.
Good point. Increasing confusion would not help.
Besides, I liked the suggestion from the -ideas list: Python 2therescue. ;)
Nice; I like that too. :-) -Fred -- Fred L. Drake, Jr. <fred at fdrake.net> "A storm broke loose in my mind." --Albert Einstein
On Jun 8, 2016 4:04 PM, "Neil Schemenauer" <neil@python.ca> wrote:
I've temporarily named it "Pragmatic Python". I'd like a better name if someone can suggest one. Maybe something like Perverted, Debauched or Impure Python.
Python Two and Three Quarters. -- Greg
On Thu, Jun 09, 2016 at 10:08:50AM +1200, Greg Ewing
On Jun 8, 2016 4:04 PM, "Neil Schemenauer" <neil@python.ca> wrote:
I've temporarily named it "Pragmatic Python". I'd like a better name if someone can suggest one. Maybe something like Perverted, Debauched or Impure Python.
Python Two and Three Quarters.
QOTW! :-D
-- Greg
Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.
2016-06-08 23:01 GMT+02:00 Neil Schemenauer
- code coming out of 2to3 runs correctly on this modified Python
Stop using 2to3. This tool adds many useless changes when you only care about Python 2.7 and Python 3.4+. I suggest using better tools like 2to6, modernize or my own tool: https://pypi.python.org/pypi/sixer
"Add Python 3 support to Python 2 applications using the six module."
Victor
Or write your own set of 2to3 fixers that *are* necessary.
On Wed, Jun 8, 2016 at 6:11 PM, Victor Stinner
2016-06-08 23:01 GMT+02:00 Neil Schemenauer
- code coming out of 2to3 runs correctly on this modified Python
Stop using 2to3. This tool adds many useless changes when you only care about Python 2.7 and Python 3.4+. I suggest using better tools like 2to6, modernize or my own tool: https://pypi.python.org/pypi/sixer
"Add Python 3 support to Python 2 applications using the six module."
Victor
-- --Guido van Rossum (python.org/~guido)
On 8 June 2016 at 14:01, Neil Schemenauer
[I've posted something about this on python-ideas but since I now have some basic working code, I think it is more than an idea.]
I think the uptake of Python 3 is starting to accelerate. That's good. However, there are still millions or maybe billions of lines of Python code that still needs to be ported. It is beneficial to the Python ecosystem if this code can get ported.
My idea is to make a stepping stone version of Python, between 2.7.x and 3.x that eases the porting job. The high level goals are:
- code coming out of 2to3 runs correctly on this modified Python
- code that runs without warnings on this modified Python will run correctly on Python 3.x.
As Victor noted, and as the porting guide describes in https://docs.python.org/3/howto/pyporting.html#update-your-code, we've determined that 2to3 isn't the best choice of tool for folks that can't afford to immediately drop Python 2 support.
Once you switch to those now recommended more conservative migration tools, the tool suite you request already exists:
- update your code with modernize or futurize
- check it still runs on Python 2.7
- check it doesn't generate warnings under 2.7's "-3" switch
- check it passes "pylint --py3k"
- check if it runs on Python 3.5
Cheers,
Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Thu, 9 Jun 2016 at 14:56 Nick Coghlan
On 8 June 2016 at 14:01, Neil Schemenauer
wrote: [I've posted something about this on python-ideas but since I now have some basic working code, I think it is more than an idea.]
I think the uptake of Python 3 is starting to accelerate. That's good. However, there are still millions or maybe billions of lines of Python code that still needs to be ported. It is beneficial to the Python ecosystem if this code can get ported.
My idea is to make a stepping stone version of Python, between 2.7.x and 3.x that eases the porting job. The high level goals are:
- code coming out of 2to3 runs correctly on this modified Python
- code that runs without warnings on this modified Python will run correctly on Python 3.x.
As Victor noted, and as the porting guide describes in https://docs.python.org/3/howto/pyporting.html#update-your-code, we've determined that 2to3 isn't the best choice of tool for folks that can't afford to immediately drop Python 2 support.
Once you switch to those now recommended more conservative migration tools, the tool suite you request already exists:
- update your code with modernize or futurize
- check it still runs on Python 2.7
- check it doesn't generate warnings under 2.7's "-3" switch
- check it passes "pylint --py3k"
- check if it runs on Python 3.5
`python3.5 -bb` is best to help keep Python 2.7 compatibility, otherwise what Nick said. :)
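[Editorial illustration, not part of Brett's message.] A minimal sketch of what the `-bb` switch buys you, assuming a Python 3.7+ interpreter is the one running the script. The snippet and the helper name `run_with_flags` are made up for this example; only the `-bb` option itself comes from the thread.

```python
import subprocess
import sys

# A one-line program that compares bytes with str, as unported code often does.
SNIPPET = "print(b'abc' == 'abc')"


def run_with_flags(*flags):
    """Run SNIPPET in a child interpreter started with the given flags."""
    return subprocess.run(
        [sys.executable, *flags, "-c", SNIPPET],
        capture_output=True,
        text=True,
    )


# Without -bb the comparison quietly evaluates to False.
plain = run_with_flags()
# With -bb the same comparison is turned into a BytesWarning error.
strict = run_with_flags("-bb")

print("plain :", plain.returncode, plain.stdout.strip())
print("strict:", strict.returncode, strict.stderr.strip().splitlines()[-1])
```

Running a test suite under `-bb` surfaces this kind of silent mismatch as a failure, but only for the operations that emit `BytesWarning` (such as comparisons and `str()` on bytes).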
On 2016-06-09, Brett Cannon wrote:
On Thu, 9 Jun 2016 at 14:56 Nick Coghlan
wrote: Once you switch to those now recommended more conservative migration tools, the tool suite you request already exists:
- update your code with modernize or futurize
- check it still runs on Python 2.7
- check it doesn't generate warnings under 2.7's "-3" switch
- check it passes "pylint --py3k"
- check if it runs on Python 3.5
`python3.5 -bb` is best to help keep Python 2.7 compatibility, otherwise what Nick said. :)
I have to wonder if you guys actually ported a lot of Python 2 code. Maybe you somehow avoided the problematic behavior. Below is a pretty trivial set of functions. The tools you recommend do not help at all. One problem is that the str literals should be bytes literals. Comparison with None needs to be avoided.
With Python 2 code runs successfully. With Python 3 the code crashes with a traceback. With my modified Python 3.6, the code runs successfully but generates the following warnings:
test.py:13: DeprecationWarning: encoding bytes to str
  output.write('%d:' % len(s))
test.py:14: DeprecationWarning: encoding bytes to str
  output.write(s)
test.py:15: DeprecationWarning: encoding bytes to str
  output.write(',')
test.py:5: DeprecationWarning: encoding bytes to str
  if c == ':':
test.py:9: DeprecationWarning: encoding bytes to str
  size += c
test.py:24: DeprecationWarning: encoding bytes to str
  data = data + s
test.py:26: DeprecationWarning: encoding bytes to str
  if input.read(1) != ',':
test.py:31: DeprecationWarning: default compare is depreciated
  if a > 0:
It is very easy for me to find code written for Python 2 that will fail in the same way. According to you guys, there is no problem and we already have good enough tooling. ;-(
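[Editorial illustration, not part of Neil's message.] The functions Neil refers to did not survive in this archive. The sketch below is a guess at what a Python 3 "bytes-clean" version of such code might look like, reconstructed only from the source lines quoted in the warnings (they resemble a netstring reader and writer). The function names and the `is_positive` helper are hypothetical and are not Neil's code.

```python
import io


def write_netstring(output, s):
    """Write the bytes payload `s` as b'<length>:<payload>,'."""
    # Bytes %-formatting needs Python 3.5+ (PEP 461).
    output.write(b"%d:" % len(s))
    output.write(s)
    output.write(b",")


def read_netstring(input):
    """Read one netstring from a binary stream and return its payload."""
    size = b""
    while True:
        c = input.read(1)
        if c == b":":          # bytes literal, not ':'
            break
        if not c:
            raise IOError("short netstring read")
        size += c
    size = int(size)
    data = b""
    while len(data) < size:
        s = input.read(size - len(data))
        if not s:
            raise IOError("short netstring read")
        data = data + s
    if input.read(1) != b",":  # bytes literal, not ','
        raise IOError("missing netstring terminator")
    return data


def is_positive(a):
    """Avoid ordering None against an int, which raises TypeError on Python 3."""
    return a is not None and a > 0


buf = io.BytesIO()
write_netstring(buf, b"hello")
buf.seek(0)
print(read_netstring(buf))
```

The only changes relative to the quoted Python 2 lines are the `b` prefixes on the literals and the explicit `None` check, which is exactly the kind of fix a static tool cannot safely guess.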
On Thu, 9 Jun 2016 at 16:08 Neil Schemenauer
On 2016-06-09, Brett Cannon wrote:
On Thu, 9 Jun 2016 at 14:56 Nick Coghlan
wrote: Once you switch to those now recommended more conservative migration tools, the tool suite you request already exists:
- update your code with modernize or futurize
- check it still runs on Python 2.7
- check it doesn't generate warnings under 2.7's "-3" switch
- check it passes "pylint --py3k"
- check if it runs on Python 3.5
`python3.5 -bb` is best to help keep Python 2.7 compatibility, otherwise what Nick said. :)
I have to wonder if you guys actually ported a lot of Python 2 code.
Yes I have, including code that needed to be 2.4-3.4 compatible of all things. Plus I'm the author of the porting HOWTO so I know the edge cases pretty well. I don't think you meant for what you said to sound insulting, Neil, but it did feel like it upon first reading.
Maybe you somehow avoided the problematic behavior. Below is a pretty trivial set of functions. The tools you recommend do not help at all. One problem is that the str literals should be bytes literals.
At least for Modernize that's on purpose as it can't tell semantically what is meant to be binary data vs. textual ASCII data (which you obviously know, else you wouldn't be trying to add runtime warnings for this sort of stuff).
Comparison with None needs to be avoided.
With Python 2 code runs successfully. With Python 3 the code crashes with a traceback. With my modified Python 3.6, the code runs successfully but generates the following warnings:
test.py:13: DeprecationWarning: encoding bytes to str
  output.write('%d:' % len(s))
test.py:14: DeprecationWarning: encoding bytes to str
  output.write(s)
test.py:15: DeprecationWarning: encoding bytes to str
  output.write(',')
test.py:5: DeprecationWarning: encoding bytes to str
  if c == ':':
test.py:9: DeprecationWarning: encoding bytes to str
  size += c
test.py:24: DeprecationWarning: encoding bytes to str
  data = data + s
test.py:26: DeprecationWarning: encoding bytes to str
  if input.read(1) != ',':
test.py:31: DeprecationWarning: default compare is depreciated
  if a > 0:
It is very easy for me to find code written for Python 2 that will fail in the same way. According to you guys, there is no problem and we already have good enough tooling. ;-(
That's not what I'm saying at all (nor what I think Nick is saying); more tooling to ease the transition is always welcomed. The point we are trying to make is 2to3 is not considered best practice anymore, and so targeting its specific output might not be the best use of your time.
I'm totally happy to have your fork work out and help give warnings for situations where runtime semantics are the only way to know there will be a problem that static analysis tools can't handle, and to have the porting HOWTO updated so that people can run their test suite with your interpreter to help with that final bit of porting. I personally just don't want to see you waste time on warnings that are handled by the tools already or ignore the fact that six, modernize, and futurize can help more than 2to3 typically can with the easy stuff when trying to keep 2/3 compatibility. IOW some of us have become allergic to the word "2to3" with regard to porting. :) But if you want to target 2to3 output then by all means please do and your work will still be appreciated.
And I should also mention in case you don't know -- and assuming I'm remembering correctly -- that adding new Py3kWarning cases to Python 2.7 is still allowed, so if there is a warning you want to add that makes sense to be upstream then we can consider adding it in Python 2.7.12 (or later).
On 2016-06-09, Brett Cannon wrote:
I don't think you meant for what you said to sound insulting, Neil, but it did feel like it upon first reading.
Sorry, I think I misunderstood what you and Nick were saying. I've experienced a fair amount of negative feedback on my idea so I'm pretty cranky at this point.
Amber Brown claimed that she spent $60k of her time porting Twisted to Python 3. I think there is lots of room to make our porting tools better.
Using something like modernize, 2to6, or sixer seems like a better idea than trying to improve on 2to3. I agree on that point. However, those tools combined with my modified Python 3.6 make for a much easier migration path than going directly to Python 3.x. My runtime warnings catch many common problems and make it easy to see what needs fixing.
We have a lot more freedom to put ugly, backwards compatibility hacks into this stepping stone version, rather than changing either Python 2.7.x or the main 3.x line. I'm hoping to get community contributions to add more backwards compatibility and runtime warnings.
On Jun 09, 2016, at 05:35 PM, Neil Schemenauer wrote:
Amber Brown claimed that she spent $60k of her time porting Twisted to Python 3. I think there is lots of room to make our porting tools better.
Amber gave a presentation at the language summit and a Pycon talk. The latter video is up on YouTube but the former wasn't recorded. I'm hoping Jake will post a summary of it though. She's done a truly impressive amount of work in porting Twisted and has a lot of good insight.
I've ported a fair bit, but nothing of the size and complexity of Twisted. FWIW, I did port the Mailman 3 core, which is now Python 3.4 and 3.5 compatible.
In my own experience, and IIRC Amber had a similar experience, the ease of porting to Python 3 really comes down to how bytes/unicode clean your code base is. Almost all the other pieces are either pretty manageable or fairly easily automated. But if your code isn't bytes-clean you're in for a world of hurt because you first have to decide how to represent those things. Twisted's job is especially fun because it's all about wire protocols, which I think Amber described as (paraphrasing) bytes that happen to have contents that look like strings.
I've ported some libraries that weren't bytes-clean. With one of them, I actually failed twice before I hit on the correct representation. Once I got that right the rest went much more quickly.
There does seem to be a wide variety of experiences in porting to Python 3. I think it is worth accepting, acknowledging, and promoting that for a lot of code, it's really not that hard, but that for some code it's really painful. It's within our job to help understand the remaining pain and address it in some way. But let's also not scare people away from Python 3, because it *can* be very easy to port, and I think there's fairly widespread agreement that once you're in the Python 3 world, you don't want to look back.
Cheers,
-Barry
On 10 June 2016 at 03:13, Barry Warsaw
In my own experience, and IIRC Amber had a similar experience, the ease of porting to Python 3 really comes down to how bytes/unicode clean your code base is. Almost all the other pieces are either pretty manageable or fairly easily automated. But if your code isn't bytes-clean you're in for a world of hurt because you first have to decide how to represent those things. Twisted's job is especially fun because it's all about wire protocols, which I think Amber described as (paraphrasing) bytes that happen to have contents that look like strings.
Although I have much less experience with porting than many others in this thread, that's my experience as well. Get a clear and well-understood separation of bytes and strings, and the rest of the porting exercise is (relatively!) straightforward. But if you just once think "I'm not quite sure, but I think I just need to decode here to be safe", you'll be fighting Unicode errors forever.
My hope is that static typing tools like MyPy could help here. I typically review Python 2 code by mentally categorising which functions (theoretically) take bytes, which take strings, and which are confused. And sort things out from there. Type annotations seem like they'd help that process. But I've yet to use typing in practice, so it may not be that simple.
Paul
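[Editorial illustration, not part of Paul's message.] A small sketch of the categorisation Paul describes, written as Python 3 type annotations. The function names and the header format are hypothetical, chosen only to show one function that takes bytes and one that takes text.

```python
from typing import Dict


def parse_header(raw: bytes) -> Dict[str, str]:
    """Takes wire bytes, decodes once, and returns text only."""
    text = raw.decode("ascii")
    name, _, value = text.partition(":")
    return {name.strip().lower(): value.strip()}


def render_header(name: str, value: str) -> bytes:
    """Takes text and encodes once on the way out."""
    return f"{name}: {value}\r\n".encode("ascii")


print(parse_header(b"Content-Type: text/plain"))
print(render_header("Host", "example.org"))
```

With signatures like these, a static checker such as mypy can report a call like `render_header(b"Host", "example.org")` before the code ever runs, which is the "confused" category made visible.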
On 10/06/2016 00:43, Brett Cannon wrote:
That's not what I'm saying at all (nor what I think Nick is saying); more tooling to ease the transition is always welcomed. The point we are trying to make is 2to3 is not considered best practice anymore, and so targeting its specific output might not be the best use of your time. I'm totally happy to have your fork work out and help give warnings for situations where runtime semantics are the only way to know there will be a problem that static analyzing tools can't handle and have the porting HOWTO updated so that people can run their test suite with your interpreter to help with that final bit of porting. I personally just don't want to see you waste time on warnings that are handled by the tools already or ignore the fact that six, modernize, and futurize can help more than 2to3 typically can with the easy stuff when trying to keep 2/3 compatibility. IOW some of us have become allergic to the word "2to3" in regards to porting. :) But if you want to target 2to3 output then by all means please do and your work will still be appreciated.
Given the above and that 2to3 appears to be unsupported*, is there a case for deprecating it?
* There are 46 outstanding issues on the bug tracker. Is the above the reason for this? I don't know.
-- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language.
Mark Lawrence
On Thu, 9 Jun 2016 at 19:53 Mark Lawrence via Python-Dev < python-dev@python.org> wrote:
On 10/06/2016 00:43, Brett Cannon wrote:
That's not what I'm saying at all (nor what I think Nick is saying); more tooling to ease the transition is always welcomed. The point we are trying to make is 2to3 is not considered best practice anymore, and so targeting its specific output might not be the best use of your time. I'm totally happy to have your fork work out and help give warnings for situations where runtime semantics are the only way to know there will be a problem that static analyzing tools can't handle and have the porting HOWTO updated so that people can run their test suite with your interpreter to help with that final bit of porting. I personally just don't want to see you waste time on warnings that are handled by the tools already or ignore the fact that six, modernize, and futurize can help more than 2to3 typically can with the easy stuff when trying to keep 2/3 compatibility. IOW some of us have become allergic to the word "2to3" in regards to porting. :) But if you want to target 2to3 output then by all means please do and your work will still be appreciated.
Given the above and that 2to3 appears to be unsupported* is there a case for deprecating it?
I don't think so because it's still a useful transpiler tool. Basically the community has decided the standard rewriters included with 2to3 aren't how people prefer to port, but 2to3 as a tool is the basis of both modernize and futurize (as are some of those rewriters, but tweaked to do something different).
* There are 46 outstanding issues on the bug tracker. Is the above the reason for this? I don't know.
Typically the bugs are for the rewrite rules and they are for edge cases that no one wants to try and tackle as they are tough to cover (although this is based on what comes through my inbox so my generalization could be wrong). -Brett
-- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language.
Mark Lawrence
On 9 June 2016 at 16:43, Brett Cannon
That's not what I'm saying at all (nor what I think Nick is saying); more tooling to ease the transition is always welcomed.
What Brett said is mostly accurate for me, except with one slight caveat: I've been explicitly trying to nudge you towards making the *existing tools better*, rather than introducing new tools. With modernize and futurize we have a fairly clear trade-off ("Do you want your code to look more like Python 2 or more like Python 3?"), and things like "pylint --py3k" and the static analyzers are purely additive to the migration process (so folks can take them or leave them), but alternate interpreter builds and new converters have really high barriers to adoption. More -3 warnings in Python 2.7 are definitely welcome (since those can pick up runtime behaviors that the static analysers miss), and if there are things the existing code converters and static analysers *could* detect but don't, that's a fruitful avenue for improvement as well. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 6/10/2016 10:49 AM, Nick Coghlan wrote:
What Brett said is mostly accurate for me, except with one slight caveat: I've been explicitly trying to nudge you towards making the *existing tools better*, rather than introducing new tools. With modernize and futurize we have a fairly clear trade-off ("Do you want your code to look more like Python 2 or more like Python 3?"), and things like "pylint --py3k" and the static analyzers are purely additive to the migration process (so folks can take them or leave them), but alternate interpreter builds and new converters have really high barriers to adoption.
I agree with that idea. If there is anything that is "clean" enough, it should be merged with either 2.7.x or 3.x. There is nothing in my tree that can be usefully merged though.
More -3 warnings in Python 2.7 are definitely welcome (since those can pick up runtime behaviors that the static analysers miss), and if there are things the existing code converters and static analysers *could* detect but don't, that's a fruitful avenue for improvement as well.
We are really limited on what can be done with the bytes/string issue because in Python 2 there is no distinct type for bytes. Also, the standard library does all sorts of unclean mixing of str and unicode so a warning would spew a lot of noise.
Likewise, a warning about comparison behavior (None, default ordering of types) would also not be useful because there is so much standard library code that would spew warnings.
On 10 June 2016 at 11:00, Neil Schemenauer
On 6/10/2016 10:49 AM, Nick Coghlan wrote:
More -3 warnings in Python 2.7 are definitely welcome (since those can pick up runtime behaviors that the static analysers miss), and if there are things the existing code converters and static analysers *could* detect but don't, that's a fruitful avenue for improvement as well.
We are really limited on what can be done with the bytes/string issue because in Python 2 there is no distinct type for bytes. Also, the standard library does all sorts of unclean mixing of str and unicode so a warning would spew a lot of noise.
Likewise, a warning about comparison behavior (None, default ordering of types) would also not be useful because there is so much standard library code that would spew warnings.
Implicitly enabling those warnings universally with -3 might not be an option then, but it may be feasible to have those warnings ignored by default, and allow people to enable them selectively for their own code via the warnings module. Failing that, you may be right that there's value in a permissive Python 3.x variant as an optional compatibility testing tool (I admit I originally thought you were proposing such an environment as a production deployment target for partially migrated code, which I'd be thoroughly against, but as a tool for running a test suite or experimentally migrated instance it would be closer in spirit to the -3 switch and the static analysers - folks can use it if they think it will help them, but they don't need to worry about it if they don't need it themselves) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
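[Editorial illustration, not part of Nick's message.] A minimal sketch of "ignored by default, enabled selectively for your own code" using the standard `warnings` module. The package name `myapp` is hypothetical, and `warnings.warn_explicit` is used here only to simulate warnings coming from two different modules.

```python
import warnings


def record_app_warnings(emit):
    """Run emit() and return (filename, lineno) for warnings from myapp only."""
    with warnings.catch_warnings(record=True) as caught:
        # Default: stay quiet, so noisy standard library code is not reported.
        warnings.simplefilter("ignore", DeprecationWarning)
        # Opt in for our own package; filters added later take precedence.
        warnings.filterwarnings(
            "always", category=DeprecationWarning, module=r"myapp(\.|$)"
        )
        emit()
    return [(w.filename, w.lineno) for w in caught]


def emit_sample_warnings():
    """Simulate the same warning raised in application and library code."""
    warnings.warn_explicit(
        "encoding bytes to str", DeprecationWarning,
        "myapp/net.py", 13, module="myapp.net",
    )
    warnings.warn_explicit(
        "encoding bytes to str", DeprecationWarning,
        "json/decoder.py", 99, module="json.decoder",
    )


print(record_app_warnings(emit_sample_warnings))
```

The `module` argument is a regular expression matched against the name of the module that triggers the warning, so the same pattern works from the command line as part of a `-W` option or the `PYTHONWARNINGS` variable only in simplified, literal form; doing it in code keeps the regex.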
Neil Schemenauer writes:
I have to wonder if you guys actually ported a lot of Python 2 code.
Python 3 (including stdlib) itself is quite a bit of code.
According to you guys, there is no problem
No, according to us, there are problems, but in the code, not in the language or its implementation. This is a Brooksian "no silver bullet" problem: it's very hard to write reliable code that handles multiple text representations (as pretty much everything does nowadays), except by converting to internal text on input and back to encoded text on output.
The warnings you quote (and presumably the code that generates them) make assumptions (cf. Barry's post) that are frequently invalid. I don't know about cross-type comparisons, but as Barry and Brett both pointed out, mixtures of bytes and text are *rarely* easy to fix, because it's often extremely difficult to know which is the appropriate representation for a given variable unless you do a complete refactoring as described above. When I've tried to fix such warnings one at a time, it's always been whack-a-mole.
The experience in GNU Emacs and Mailman 2 has been that it took about ten years to get to the point where they went a whole year without an encoding bug once non-Latin-1 encodings were being handled. XEmacs OTOH took only about 3 years from the proof-of-concept introduction of multibyte characters to essentially no bugs (except in C code, of course!) because we had the same policy as Python 3: bytes and text don't mix, and in development we also would abort on mixing integers and characters (in GNU Emacs, the character type was the same as the integer type until very recently). We affectionately referred to those bugs as "Ebola" (not very polite, but it gets the point across about how seriously we took the idea of making the internal text representation completely opaque).
In Mailman 2, we still can't say confidently that there are no Unicode bugs left even today. We still need an outer "except UnicodeError: quarantine_and_call_for_help(msg)" handler, although AFAIK it hasn't been reported for a couple years.
It's not that you can't continue to run the potentially buggy code in Python 2. Mailman 2 does; you can, too. What we don't support (and I personally hope we never support) is running that code in Python 3 (warnings or no). If you want to support that yourself, more power to you, but I advise you that my experience suggests that it's not going to be a panacea, and I do believe it's going to be more trouble than biting the bullet and just thoroughly porting your code. Even if that takes as much time as it took Amber to port Twisted.
and we already have good enough tooling. ;-(
Nobody said that, just that the existing tooling is pretty good for the problems that tools can help with, while no tool is likely to be much help with some of the code your tool allows to run. You're welcome to try to prove that claim wrong -- if you do, it would indeed be very valuable! But I personally, based on my own experience, think that the chance of success is too low to justify the cost. (Granted, I don't have to port Twisted, so in that sense I'm biased. :-/ )
BTW tools continue to be added, as well as language changes (PEP 461!). There is no resistance to that. What you're running into here is that several of us have substantial experience with various of the issues raised, and that experience convinces us that there's no silver bullet, just hard work, if you face them.
Steve
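[Editorial illustration, not part of Steve's message.] A minimal sketch of the discipline described above: decode to internal text on input, work on text only, encode on output, and keep one outer `UnicodeError` handler as a safety net. The function names are hypothetical, and the plain list stands in for whatever a real `quarantine_and_call_for_help` would do.

```python
def handle_message(raw: bytes, charset: str = "utf-8") -> bytes:
    """Decode at the input boundary, process text, encode at the output boundary."""
    text = raw.decode(charset)        # bytes -> text, exactly once
    reply = text.strip().upper()      # internal processing sees only text
    return reply.encode(charset)      # text -> bytes, exactly once


def process(raw: bytes, quarantine: list):
    """Outer safety net: set aside anything that fails to decode or encode."""
    try:
        return handle_message(raw)
    except UnicodeError:
        quarantine.append(raw)        # stand-in for quarantine_and_call_for_help
        return None


held = []
print(process("héllo ".encode("utf-8"), held))
print(process(b"\xff\xfe", held), held)
```

Because the encoding decision is made in one place, a bad input ends up in the quarantine list instead of surfacing as an error deep inside unrelated code.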
One problem is that the str literals should be bytes literals. Comparison with None needs to be avoided.
With Python 2 code runs successfully. With Python 3 the code crashes with a traceback. With my modified Python 3.6, the code runs successfully but generates the following warnings:
test.py:13: DeprecationWarning: encoding bytes to str
  output.write('%d:' % len(s))
test.py:14: DeprecationWarning: encoding bytes to str
  output.write(s)
test.py:15: DeprecationWarning: encoding bytes to str
  output.write(',')
test.py:5: DeprecationWarning: encoding bytes to str
  if c == ':':
test.py:9: DeprecationWarning: encoding bytes to str
  size += c
test.py:24: DeprecationWarning: encoding bytes to str
  data = data + s
test.py:26: DeprecationWarning: encoding bytes to str
  if input.read(1) != ',':
test.py:31: DeprecationWarning: default compare is depreciated
  if a > 0:
This seems _very_ useful; I'm surprised that other people don't think so too. Currently, the easiest way to find bytes/str errors in a big application is by running the program, finding where it crashes, fixing that one line (or hopefully wherever the data entered the system if you can find it), and repeating the process. This is nice because you can get in "fix my encoding errors" mode for more than just one traceback at a time; the new method would be to run the program, look at the millions of bytes/str errors, and fix everything that showed up in this round at once. That seems like a big win for productivity to me. Cody
On 10 June 2016 at 15:09, Cody Piersall
This seems _very_ useful; I'm surprised that other people don't think so too. Currently, the easiest way to find bytes/str errors in a big application is by running the program, finding where it crashes, fixing that one line (or hopefully wherever the data entered the system if you can find it), and repeating the process.
It *is* very nice. But...
This is nice because you can get in "fix my encoding errors" mode for more than just one traceback at a time; the new method would be to run the program, look at the millions of bytes/str errors, and fix everything that showed up in this round at once. That seems like a big win for productivity to me.
If you're fixing encoding errors at the point they occur, rather than looking at the high-level design of the program's handling of textual and bytestring data, you're likely to end up in a bit of a mess no matter how you locate the issues, most likely because at the point in the code where the warning occurs, you no longer know what the correct encoding should be.

But absolutely, anything that gives extra information about where the encoding hotspots are in your code is of value.

Paul
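One way to read the design point above is "decode once where the data enters, encode once where it leaves, and keep everything in between as str". The sketch below illustrates that reading only; the function names and the Latin-1 input are assumptions made up for the example, not anything from the thread.

```python
import io


def read_text(stream, encoding):
    """Decode at the input boundary, where the encoding is still known."""
    return stream.read().decode(encoding)


def shout(text):
    # Core logic works on str only and never sees bytes.
    return text.upper()


def write_text(stream, text, encoding):
    """Encode at the output boundary, with an explicit encoding."""
    stream.write(text.encode(encoding))


if __name__ == "__main__":
    source = io.BytesIO("caf\u00e9".encode("latin-1"))  # assumed input encoding
    sink = io.BytesIO()
    write_text(sink, shout(read_text(source, "latin-1")), "utf-8")
    print(sink.getvalue())
```

With this layout a bytes/str warning deep inside the program points to a boundary that was missed, rather than to a line where the encoding has to be guessed.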
On 10 June 2016 at 07:09, Cody Piersall
This seems _very_ useful; I'm surprised that other people don't think so too. Currently, the easiest way to find bytes/str errors in a big application is by running the program, finding where it crashes, fixing that one line (or hopefully wherever the data entered the system if you can find it), and repeating the process.
It could be very interesting to add an "ascii-warn" codec to Python 2.7, and then set that as the default encoding when the -3 flag is set.

The expressed lack of interest has been in the idea of recommending people use an alternate interpreter build (which has nothing to do with the usefulness of the added warnings, and everything to do with the logistics of distributing and adopting alternate runtimes), rather than in the concept of improving the available runtime compatibility warnings.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
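To make the "ascii-warn" idea concrete, here is a rough sketch of what such a codec could look like. It is an assumption-laden illustration, not the proposal itself: it is written for Python 3 using `codecs.register`, the warning text is invented, and it only demonstrates the codec mechanics. Making it the implicit default encoding, as suggested for Python 2.7 with -3, is not something this sketch does.

```python
import codecs
import warnings

# Codec lookup normalizes names differently across Python versions
# (hyphens become underscores from 3.9 on), so accept both spellings.
_NAMES = {"ascii-warn", "ascii_warn"}


def _decode(data, errors="strict"):
    # Behaves like ASCII, but flags every use so it can be tracked down.
    warnings.warn("decoding bytes via ascii-warn", DeprecationWarning, stacklevel=2)
    return codecs.ascii_decode(data, errors)


def _encode(text, errors="strict"):
    warnings.warn("encoding str via ascii-warn", DeprecationWarning, stacklevel=2)
    return codecs.ascii_encode(text, errors)


def _search(name):
    if name in _NAMES:
        return codecs.CodecInfo(encode=_encode, decode=_decode, name="ascii-warn")
    return None


codecs.register(_search)

if __name__ == "__main__":
    warnings.simplefilter("always")
    print(b"abc".decode("ascii-warn"))
```

The codec itself is simple; the hard part raised in the replies is where the warnings fire, since library code that legitimately relies on the default encoding would trigger them too.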
Nick Coghlan
It could be very interesting to add an "ascii-warn" codec to Python 2.7, and then set that as the default encoding when the -3 flag is set.
I don't think that can work. The library code in Python would spew out warnings even in the cases when nothing is wrong with the application code. I think warnings have to be added to a Python where str and bytes have been properly separated. Without extreme backporting efforts, that means 3.x. We don't want to saddle 3.x with a bunch of backwards compatibility cruft. Maybe some of my runtime warning changes could be merged using a command line flag to enable them. It would be nice to have the stepping stone version just be normal 3.x with a command line option. However, for the sanity of people maintaining 3.x, I think perhaps we don't want to do it.
On 10 June 2016 at 16:36, Neil Schemenauer
Nick Coghlan
It could be very interesting to add an "ascii-warn" codec to Python 2.7, and then set that as the default encoding when the -3 flag is set.
I don't think that can work. The library code in Python would spew out warnings even in the cases when nothing is wrong with the application code. I think warnings have to be added to a Python where str and bytes have been properly separated. Without extreme backporting efforts, that means 3.x.
We don't want to saddle 3.x with a bunch of backwards compatibility cruft. Maybe some of my runtime warning changes could be merged using a command line flag to enable them. It would be nice to have the stepping stone version just be normal 3.x with a command line option. However, for the sanity of people maintaining 3.x, I think perhaps we don't want to do it.
Right, my initial negative reactions were mainly to the idea of having these kinds of capabilities in the mainline 3.x codebase (where we'd then have to support them for everyone, not just the folks that genuinely need them to help in migration from Python 2).

The standard porting instructions currently assume code bases that are *mostly* bytes/unicode clean, with perhaps a few oversights where Python 3 rejects ambiguity that Python 2 tolerates. In that context, "run your test suite, address the test failures" should generally be sufficient, without needing to use a custom Python build.

However, there are a couple of cases those standard instructions still don't cover:

- if there's no test suite, exploratory discovery is problematic when the app falls over at the first type ambiguity
- even if there is a test suite, sufficiently pervasive type ambiguity may make it difficult to use for fault isolation

That's where I now agree your proposal for a variant build specifically aimed at compatibility testing is potentially interesting:

- the tool would become an escalation path for folks that aren't in a position to use their own test suite to isolate type ambiguity problems under Python 3
- using Python 3 as a basis means you get a clean standard library that shouldn't emit any false alarms
- the necessary feature set is defined by the common subset of Python 2.7 and a chosen minimum Python 3 version, not any future 3.x release, so you should be able to maintain the changes as a stable patch set without needing to chase CPython trunk (with the attendant risk of merge conflicts)

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Nick Coghlan writes:
- even if there is a test suite, sufficiently pervasive [str/bytes] type ambiguity may make it difficult to use for fault isolation
Difficult yes, but I would argue that that difficulty is inherent[1]. I.e., if it's pervasive, the fault should be isolated to the whole module. Such a fault *will* regress, often in the exact same place, but if not there, elsewhere due to the same ambiguity.

That was my experience in both GNU Emacs and Mailman. In GNU Emacs's case there's a paired, much more successful (in respect of encoding problems) experience with XEmacs to compare.[2] We'll see how things go in Mailman 3 (which uses a nearly completely rewritten email package), but I'll bet the experience there is even more successful.[3]

If you're looking for a band-aid that will get you back running asap, then you're better off bisecting the change history than going through a slew of warnings one-by-one, as a recent error is likely due to a recent change.

If Neil still wants to go ahead, more power to him. I don't know everything. It's just that my experience in this area is sufficiently extensive and sufficiently bad that it's worth repeating (just this once!)

Footnotes:
[1] Or as Brooks would have said, "of the essence".
[2] GNU Emacs has a multilingualization specialist in Ken Handa whose day job is writing multilingualization libraries, so their encoding detection, accuracy of implementation, and codec coverage is and always was better than XEmacs's. I'm referring here to internal bugs in the Lisp primitives dealing with text, as well as the difficulty of writing applications that handled both internal text and external bytes without confusing them.
[3] Though not strictly comparable to the XEmacs experience, due to (1) being a second implementation, not a parallel implementation, and (2) the Internet environment being much more standards-conformant, even in email, these days.
participants (15)
- Barry Warsaw
- Brett Cannon
- Cody Piersall
- Ethan Furman
- Fred Drake
- Greg Ewing
- Guido van Rossum
- Mark Lawrence
- Neil Schemenauer
- Nick Coghlan
- Oleg Broytman
- Paul Moore
- Ryan Gonzalez
- Stephen J. Turnbull
- Victor Stinner