Trying to focus the whole bytes/str formatting discussion

I don't know about the rest of you but I feel like the discussion is heading off the rails (if it hasn't already jumped the tracks). Let's try to bring this back around to something actionable which people can focus their energy on as the amount of developer time spent arguing could have led to several coded-up solutions.
I see it as a practicality-beats-purity vs. explicit-is-better-than-implicit. The PBP group want bytes.format() (just assume I include interpolation support if you want that) to work as close to a drop-in replacement for current str.format() use in Python 2 to ease porting. The argument is that code looks cleaner and the amount of changes in Python 2 code being ported to Python 3 is much smaller.
The EIBTI group are willing to support PEP 460 but beyond that don't want to have in Python itself anything for bytes.format() which takes in a string and spits out bytes. It's bytes in->bytes out and not bytes & str in->bytes out as the PBP group is after. The EIBTI group are arguing that letting str into bytes.format() and then automatically be converted to strict ASCII leads to conflating the text/bytes divide as well as being too magical, e.g. what if you actually wanted UTF-16 for your number string instead of ASCII; the EIBTI group **wants** to force people to make a decision. They are also less concerned with making users update Python 2 code to handle this as it already needs to be updated for other Python 3 things anyway.
From where I'm sitting, the EIBTI group and their PEP 460 proposal from Antoine (and no longer Victor) are not controversial. Everyone seems to agree that PEP 460 **at minimum** is acceptable and should happen for Python 3.5. The people with the uphill battle and something to prove are those arguing for str in->bytes out support in bytes.format(). The added features that the PBP group want are the ones being argued over.
As the onus is on the PBP group to convince the EIBTI group (or Guido), I think the PBP group should code up a solution that does what they want and put it on PyPI to see what the community thinks. If the PBP group wants to convince the EIBTI group that str in->bytes out for bytes.format() is critical in getting a key group of users to start using Python 3 then I think that needs to be demonstrated through real-world usage by some people.
If there is serious pickup of the solution from PyPI by projects then we can discuss integrating it into Python 3.5. That gives at least **one year** to come up with a solution which gets picked up by the community (standard requirement for stdlib inclusion). At worst some projects use the PyPI project and find it useful but it doesn't go into Python 3.5. At best lots of people find it useful enough that we add it to Python 3.5. But regardless, a PyPI project helps people **no matter what** the EIBTI group thinks. That's more forward momentum than this conversation currently has.
This has split down philosophical lines and does not look to be tilting one way or the other by simply using words. I think it has reached the point that showing code is going to be the only way to tilt the favour towards the PBP group at this point. Guido has not spoken up so either he is ignoring it because he's busy, he doesn't care, or he's mulling things over still. Assuming he doesn't speak up then it comes down to getting a clear majority on the side of the PBP group and that is not going to happen the way this discussion is going.
So, action items are:
* Get PEP 460 pronounced on **as is**
* A PyPI project containing PBP ideas and see if the community seizes on it or not (benefit to people regardless)
* Do a separate PEP that builds on PEP 460 if people really want to continue down that road at this time
Don't forget, we are talking about Python 3.5; we have not even hit Python 3.4rc1 yet so this level of arguing seems a bit premature and going nowhere.
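
To make the two positions described above concrete, here is a minimal sketch that uses only operations str and bytes already have. The helper names `header_eibti` and `header_pbp` are hypothetical; neither is part of PEP 460 or of any proposal in this thread, and `header_pbp` only approximates the implicit strict-ASCII conversion the PBP group is asking for.

```python
def header_eibti(length: int) -> bytes:
    # EIBTI style: the caller chooses the encoding explicitly,
    # and only bytes objects are ever combined.
    return b"Content-Length: " + str(length).encode("ascii") + b"\r\n"


def header_pbp(length: int) -> bytes:
    # Rough stand-in for the PBP wish: format as text, then convert
    # the whole result to strict ASCII without asking the caller.
    return "Content-Length: {}\r\n".format(length).encode("ascii")


# The EIBTI objection: ASCII is not the only possible encoding for a number.
ascii_number = str(42).encode("ascii")
utf16_number = str(42).encode("utf-16-le")

print(header_eibti(42))
print(header_pbp(42))
print(ascii_number, utf16_number)
```

Both helpers produce the same bytes for this input. The disagreement is about whether the `.encode("ascii")` step should be written by the caller or assumed by the formatting method.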

On 12Jan2014 17:46, Brett Cannon <brett@python.org> wrote:
The EIBTI group are willing to support PEP 460 but beyond that don't want to have in Python itself anything for bytes.format() which takes in a string and spits out bytes. It's bytes in->bytes out and not bytes & str in->bytes out as the PBP group is after. The EIBTI group are arguing that letting str into bytes.format() and then automatically be converted to strict ASCII leads to conflating the text/bytes divide as well as being too magical, e.g. what if you actually wanted UTF-16 for your number string instead of ASCII; the EIBTI group **wants** to force people to make a decision. They are also less concerned with making users update Python 2 code to handle this as it already needs to be updated for other Python 3 things anyway. [...]
I'm in the EIBTI camp on the whole, but I would also be happy for the bytes.format() function to accept strings (and floats or whatever str.format supports) _provided_ it required an explicit encoding= parameter to enable it.
i.e. make it easy to use, _but_ require an overt specification of the str->bytes encoding. You don't even need a special mode; just have it raise a ValueError if the (default) encoding is None when an encoding becomes needed.
Just my 2c on Brett's EIBTI vs PBP divide. I'll try to stay off this thread now and bikeshed only in the others...
--
Cameron Simpson <cs@zip.com.au>
You can blip it twice to clear the bore,
But blip it thrice, and you've sinned once more.
- Tom Warner <tom@dfind.demon.co.uk>
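
The encoding= suggestion in the message above could look roughly like the following. This is only a sketch under stated assumptions: `format_bytes` is a hypothetical free function standing in for the proposed bytes.format(), it supports only bare positional `{}` fields, and the choice to convert non-bytes values with `str()` before encoding is mine, not something specified in the thread.

```python
def format_bytes(template: bytes, *args, encoding=None) -> bytes:
    """Hypothetical stand-in for bytes.format() with an explicit encoding.

    Only bare b"{}" fields are supported, filled in positional order.
    """
    literals = template.split(b"{}")
    if len(literals) - 1 != len(args):
        raise ValueError("number of {} fields does not match number of arguments")

    out = [literals[0]]
    for arg, literal in zip(args, literals[1:]):
        if isinstance(arg, (bytes, bytearray)):
            # bytes in -> bytes out: no encoding is ever consulted.
            out.append(bytes(arg))
        else:
            # An encoding has become necessary; refuse to guess one.
            if encoding is None:
                raise ValueError("encoding= is required to format non-bytes values")
            out.append(str(arg).encode(encoding))
        out.append(literal)
    return b"".join(out)


print(format_bytes(b"GET {} HTTP/1.1", b"/index.html"))
print(format_bytes(b"Content-Length: {}", 42, encoding="ascii"))
```

With no encoding given, passing a str or an int raises ValueError, so the caller is forced to make the decision the EIBTI group wants made, while pure-bytes calls stay as short as before.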

Sorry, I started my own "PEP 460 reboot" thread -- I wrote that message before yours arrived, even if maybe I posted after you. I'm in the PBP camp myself for this. I won't pronounce on PEP 460 as-is. Please follow up in the other thread if you need clarifications.
On Sun, Jan 12, 2014 at 2:46 PM, Brett Cannon <brett@python.org> wrote:
I don't know about the rest of you but I feel like the discussion is heading off the rails (if it hasn't already jumped the tracks). Let's try to bring this back around to something actionable which people can focus their energy on as the amount of developer time spent arguing could have led to several coded-up solutions.
I see it as a practicality-beats-purity vs. explicit-is-better-than-implicit. The PBP group want bytes.format() (just assume I include interpolation support if you want that) to work as close to a drop-in replacement for current str.format() use in Python 2 to ease porting. The argument is that code looks cleaner and the amount of changes in Python 2 code being ported to Python 3 is much smaller.
The EIBTI group are willing to support PEP 460 but beyond that don't want to have in Python itself anything for bytes.format() which takes in a string and spits out bytes. It's bytes in->bytes out and not bytes & str in->bytes out as the PBP group is after. The EIBTI group are arguing that letting str into bytes.format() and then automatically be converted to strict ASCII leads to conflating the text/bytes divide as well as being too magical, e.g. what if you actually wanted UTF-16 for your number string instead of ASCII; the EIBTI group **wants** to force people to make a decision. They are also less concerned with making users update Python 2 code to handle this as it already needs to be updated for other Python 3 things anyway.
From where I'm sitting, the EIBTI group and their PEP 460 proposal from Antoine (and no longer Victor) are not controversial. Everyone seems to agree that PEP 460 **at minimum** is acceptable and should happen for Python 3.5. The people with the uphill battle and something to prove are those arguing for str in->bytes out support in bytes.format(). The added features that the PBP group want are the ones being argued over.
As the onus is on the PBP group to convince the EIBTI group (or Guido), I think the PBP group should code up a solution that does what they want and put it on PyPI to see what the community thinks. If the PBP group wants to convince the EIBTI group that str in->bytes out for bytes.format() is critical in getting a key group of users to start using Python 3 then I think that needs to be demonstrated through real-world usage by some people.
If there is serious pickup of the solution from PyPI by projects then we can discuss integrating it into Python 3.5. That gives at least **one year** to come up with a solution which gets picked up by the community (standard requirement for stdlib inclusion). At worst some projects use the PyPI project and find it useful but it doesn't go into Python 3.5. At best lots of people find it useful enough that we add it to Python 3.5. But regardless, a PyPI project helps people **no matter what** the EIBTI group thinks. That's more forward momentum than this conversation currently has.
This has split down philosophical lines and does not look to be tilting one way or the other by simply using words. I think it has reached the point that showing code is going to be the only way to tilt the favour towards the PBP group at this point. Guido has not spoken up so either he is ignoring it because he's busy, he doesn't care, or he's mulling things over still. Assuming he doesn't speak up then it comes down to getting a clear majority on the side of the PBP group and that is not going to happen the way this discussion is going.
So, action items are:
* Get PEP 460 pronounced on **as is**
* A PyPI project containing PBP ideas and see if the community seizes on it or not (benefit to people regardless)
* Do a separate PEP that builds on PEP 460 if people really want to continue down that road at this time
Don't forget, we are talking about Python 3.5; we have not even hit Python 3.4rc1 yet so this level of arguing seems a bit premature and going nowhere.
--Guido van Rossum (python.org/~guido)

On 13 January 2014 08:46, Brett Cannon <brett@python.org> wrote:
I don't know about the rest of you but I feel like the discussion is heading off the rails (if it hasn't already jumped the tracks). Let's try to bring this back around to something actionable which people can focus their energy on as the amount of developer time spent arguing could have led to several coded-up solutions.
I see it as a practicality-beats-purity vs. explicit-is-better-than-implicit. The PBP group want bytes.format() (just assume I include interpolation support if you want that) to work as close to a drop-in replacement for current str.format() use in Python 2 to ease porting. The argument is that code looks cleaner and the amount of changes in Python 2 code being ported to Python 3 is much smaller.
The EIBTI group are willing to support PEP 460 but beyond that don't want to have in Python itself anything for bytes.format() which takes in a string and spits out bytes. It's bytes in->bytes out and not bytes & str in->bytes out as the PBP group is after. The EIBTI group are arguing that letting str into bytes.format() and then automatically be converted to strict ASCII leads to conflating the text/bytes divide as well as being too magical, e.g. what if you actually wanted UTF-16 for your number string instead of ASCII; the EIBTI group **wants** to force people to make a decision. They are also less concerned with making users update Python 2 code to handle this as it already needs to be updated for other Python 3 things anyway.
From where I'm sitting, the EIBTI group and their PEP 460 proposal from Antoine (and no longer Victor) are not controversial. Everyone seems to agree that PEP 460 **at minimum** is acceptable and should happen for Python 3.5. The people with the uphill battle and something to prove are those arguing for str in->bytes out support in bytes.format(). The added features that the PBP group want are the ones being argued over.
As the onus is on the PBP group to convince the EIBTI group (or Guido), I think the PBP group should code up a solution that does what they want and put it on PyPI to see what the community thinks. If the PBP group wants to convince the EIBTI group that str in->bytes out for bytes.format() is critical in getting a key group of users to start using Python 3 then I think that needs to be demonstrated through real-world usage by some people.
Note that I am now fine with Guido's more lenient proposal *so long as* explicitly bytes-only formatb and formatb_map methods are also included. That would give us the following situation in 3.5:
* Text interpolation: str.__mod__, str.format, str.format_map
* ASCII compatible interpolation: bytes.__mod__, bytes.format, bytes.format_map
* Arbitrary binary interpolation: bytes.formatb, bytes.formatb_map
Those are all reasonable operations for the language to support natively, and by providing convenient access to all three, we avoid the attractive nuisance that would be created by providing *only* ASCII interpolation without providing strict binary interpolation (since people would inevitably use the former when they should really be using the latter, because interpolation is such a convenient construct), while still addressing the interests of both groups (people like me and Antoine that like PEP 460 as it stands, as well as those that favour the ASCII encoding features).
It's only the introduction of ASCII compatible interpolation support *without* binary interpolation support that I am adamantly opposed to - that's the kind of attractive nuisance that leads to people inappropriately using ASCII compatible only APIs and then discovering that their code breaks when confronted with ASCII incompatible encodings like UTF-16, ShiftJIS and ISO-2022.
Originally I was opposed to the idea entirely, but then Antoine wrote the binary only version of PEP 460 and I found it to be a *very* elegant solution that didn't compromise the Python 3 text model. As long as this pure API remains available in some form (such as formatb and formatb_map methods), then I'm OK with the ASCII only version existing in parallel - at that point, it *is* analogous to all the other existing bytes methods that assume the use of ASCII compatible data.
** The caveat **
However, note that there were *two* significant issues that were raised in the recent broader discussions.
PEP 460 only tackles the more tractable of the two: the fact that Twisted and Mercurial both consider bytes.__mod__ support a blocker for switching to Python 3. That's a useful discussion to have, but it's important for people to realise that the mod-formatting feature is utterly irrelevant to the concerns Armin Ronacher raised in http://lucumr.pocoo.org/2014/1/5/unicode-in-2-and-3/ that kicked off this whole recent spate of interest in the topic.
Obviously, I disagree with his conclusions (and personally wish Python 2 Unicode experts would show a little more humility in trying to understand the core team's motivations for Python 3 design decisions rather than assuming that we're clueless idiots that decided to maintain 4 parallel branches in Subversion for a couple of years just because we thought it might be fun), but I can certainly understand his pain.
I'm the one who actually *made* the changes to restore dual bytes/unicode support in urllib.parse for Python 3 (one of Armin's favourite examples of the difficulty of writing that kind of code using the Python 3 text model), and I agree entirely with Armin's assessment of that code: it isn't pretty, and it wasn't fun to write. Yes, I got it to work, and yes, it was satisfying when the tests finally passed, and yes there is now a smaller number of cases where errors will pass silently, but that's far from the same thing as finding the process of getting there a pleasant one, or considering the result an elegant approach to porting hybrid APIs from Python 2 such that bytes in = bytes out and str in = str out.
The only difference between Armin and myself in this respect is that I know the reasons for the changes to the text model, and I think the increased difficulty in implementing that particular use case was worth it, given the pay-off in finally being able to remove the implicit encoding and decoding operations from the text model (Note that the unicode input handling in urlparse in Python 2 breaks entirely if you turn off implicit decoding. You can still get hits from the cache, but if you have to actually parse anything, it will fail: http://python-notes.curiousefficiency.org/en/latest/python3/binary_protocols...).
The fact remains, however, that in Python 2 the code you need for that kind of hybrid API was *easy* to write - you just made all your internal constants 8-bit strings, and the implicit decoding to Unicode took care of the case of str inputs.
There are still valid use cases for such hybrid APIs, even in Python 3 (urllib.parse is one of them), and the reason I helped Benno start the asciicompat project (https://github.com/jeamland/asciicompat) is because I want to make that kind of code almost as effortless as it was in Python 2 - all you should need to do is make your constants asciistr instances rather than builtin bytes or str objects.
My ambition here is not "good enough to get people to stop complaining", it's "there's no actual reason Python 3 needs to be worse at this than Python 2, it just doesn't need to be part of the core builtin types, because we're in a better position to fix interoperability issues now that we don't have to deal with the close coupling between str and unicode that existed in Python 2, and the bytes type will generally play nice with anything that exposes the PEP 3118 buffer interface".
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
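
For readers who have not written one, the kind of hybrid API discussed in the message above (bytes in = bytes out, str in = str out) can be sketched in Python 3 as follows. This is an illustration only: `split_scheme` is a hypothetical function, it is not the urllib.parse implementation, and it does not use the asciicompat project's asciistr type. It takes the approach of decoding bytes input as strict ASCII, working in text, and encoding the result back.

```python
def split_scheme(url):
    """Split "scheme://rest" into (scheme, rest).

    bytes in gives bytes out, str in gives str out. Bytes input must be
    ASCII compatible; anything else raises UnicodeDecodeError.
    """
    if isinstance(url, (bytes, bytearray)):
        is_bytes = True
        text = bytes(url).decode("ascii")  # strict: fail loudly on non-ASCII
    elif isinstance(url, str):
        is_bytes = False
        text = url
    else:
        raise TypeError("url must be str or bytes, not %s" % type(url).__name__)

    # All internal constants are text, so the parsing logic is written once.
    scheme, sep, rest = text.partition("://")
    if not sep:
        scheme, rest = "", text

    if is_bytes:
        return scheme.encode("ascii"), rest.encode("ascii")
    return scheme, rest


print(split_scheme("http://example.com/a"))
print(split_scheme(b"http://example.com/a"))
```

The explicit type check and the decode/encode round trip are the boilerplate that Python 2's implicit decoding used to hide. Every hybrid function needs some version of them, which is the ergonomic cost being weighed in this part of the thread.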
participants (4)
- Brett Cannon
- Cameron Simpson
- Guido van Rossum
- Nick Coghlan