On 2014-01-17, Andrew Barnert wrote:
What exactly do you mean by "the bytes/unicode changes"?
I mean all of those things that you listed. bytes = the Python 2.7 str object, str object is Python 2.7 unicode object.
I suspect that, whatever your exact answers, it would be a lot easier to fork 3.4 and port the 2.7 behavior you want than to fork 2.7 and backport almost all of 3.4.
It's a lot of work no matter which way you do it. That's one of the biggest problems with this idea.
And if you do it that way, you could even adapt the idea someone proposed a few weeks ago—not popular on this list, but maybe popular with your target audience—of turning each change on and off with a "from __past__ import misfeature" statement, so people could pick and choose the ones they need, and gradually remove past statements as they port from your forked 2.8 to real 3.4.
You can't make those changes with __future__/__past__ imports. They effect the whole runtime, not single module.
However, I also suspect that, whatever your exact answers, it won't be that useful. Look at people's reasons for not moving to 3.x:
Imagine I'm a developer with the Python 2.x codebase. I'm either lazy or I'm too busy with other company projects that I can't put the effort into porting my 2.x code to 3.x, even if all the 3rd party libraries have been ported.
How can we make it easier for them to move their code towards Python 3.x rather than keeping it as 2.x? A maintained interpreter to run Python 2.x code is going to continue to exist. Some python-dev people seem to suggest we can suggest that end of maintenance of Python 2.7 is going to force people to port their code. That's ridiculous.
I want to make it more attractive for these developers to move towards Python 3 rather than stalling out on Python 2.7 forever. How best to do that is still to be determined. I think my 2.8 idea might be better than the status quo but it's just a crazy idea.
I'm having a hard time imagining code that would be easy to port to 2.8, but not to 3.x. For example:
payload = <some object with a __str__ method to serialize it> sock.sendall('Header: {}\r\nAnother: {}\r\n\r\n{}'.format( headers['header'], headers['another'], payload))
Even with just the two changes you already suggested: First, you have to change the literal to a bytes literal.
That part is easy, could even be done with an automated tool (change u' to ' and ' to b').
More seriously, you have to rename that payload type's __str__ method to __bytes__.
Nope, no __bytes__ in my proposed 2.8.
And if it does any string stuff internally, like encoding JSON, that has to change. Meanwhile, your logging code probably relies on the same _str__ method actually returning a str, so you have to add one of those. Assuming headers is a dict of strs, you either need to go back up the chain (or into the API that provides it) and change that so it's been a dict of bytes all along, or you need to explicitly encode the headers here. That doesn't sound too hard overall… but that gives you working Python 3.5 code (assuming PEP 460 goes through). And there doesn't seem to be any shortcut that would give you working 2.8 code without also working in 3.5.
I think you are misunderstanding my proposal, no problems like the ones you suggest, bytes() would be the Python 2.7 str class. All the internal bytes/unicode internals would be like 2.7. That's basically the whole idea of this proposal, the bytes/str change in 3.x is the really disruptive one, separate it into separate interpreter versions to make porting easier.
Neil