[Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5

Sat Jan 11 17:33:17 CET 2014

On 11.01.2014 16:34, Nick Coghlan wrote:
> On 12 January 2014 01:15, M.-A. Lemburg <mal at egenix.com> wrote:
>> On 11.01.2014 14:54, Georg Brandl wrote:
>>> Am 11.01.2014 14:49, schrieb Georg Brandl:
>>>> Am 11.01.2014 10:44, schrieb Stephen Hansen:
>>>>
>>>>> I mean, it's not like the "bytes" type lacks knowledge of the subset of bytes
>>>>> that happen to be 7-bit ascii-compatible and can't perform text-ish operations
>>>>> on them--
>>>>>
>>>>>   Python 3.3.3 (v3.3.3:c3896275c0f6, Nov 18 2013, 21:18:40) [MSC v.1600 32 bit
>>>>> (Intel)] on win32
>>>>>   Type "help", "copyright", "credits" or "license" for more information.
>>>>>   >>> b"stephen hansen".title()
>>>>>   b'Stephen Hansen'
>>>>>
>>>>> How is this not a practical recognition that yes, while bytes are byte streams
>>>>> and not text, a huge subset of bytes are text-y, and as long as we maintain the
>>>>> barrier between higher characters and implicit conversion therein, we're fine?
>>>>>
>>>>> I don't see the difference here. There is a very real, practical need to
>>>>> interpolate bytes. This very real, practical need includes the very real
>>>>> recognition that converting 12345 to b'12345' is not something weird, unusual,
>>>>> and subject to the thorny issues of Encodings. It is not violating the doctrine
>>>>> of separation of powers between Text and Bytes.
>>>>
>>>> This. Exactly. Thanks for putting it so nicely, Stephen.
>>>
>>> To elaborate: if the bytes type didn't have all this ASCII-aware functionality
>>> already, I think we would have (and be using) a dedicated "asciistr" type right
>>> now.  But it has the functionality, and it's way too late to remove it.
>>
>> I think we need to step back a little from the purist view
>> of things and give more emphasis on the "practicality beats
>> purity" Zen.
>>
>> I completely agree with Stephen that bytes are in fact often
>> an encoding of text. If that text is ASCII compatible, I don't
>> see any reason why we should not continue to expose the C lib
>> standard string APIs available for text manipulations on bytes.
>>
>> We don't have to be pedantic about the bytes/text separation.
>> It doesn't help in real life.
> 
> Yes, it bloody well does. The number of people who have told me that
> using Python 3 is what allowed them to finally understand how Unicode
> works vastly exceeds the number of wire protocol and file format devs
> that have complained about working with binary formats being
> significantly less tolerant of the "it's really like ASCII text"
> mindset.
> 
> We are NOT going back to the confusing incoherent mess that is the
> Python 2 model of bolting Unicode onto the side of POSIX:
> http://python-notes.curiousefficiency.org/en/latest/python3/questions_and_answers.html#what-actually-changed-in-the-text-model-between-python-2-and-python-3
> 
> While that was an *expedient* (and, in fact, necessary) solution at
> the time, the fact it is still thoroughly confusing people 13 years
> later shows it is not a *comprehensible* solution.

FWIW: I quite liked the Python 2 model, but perhaps that's because
I already knew how Unicode works, so I could use it to make my
life easier ;-)

Seriously, Unicode has always caused heated discussions and
I don't expect this to change in the next 5-10 years.

The point is: there is no 100% perfect solution either way and
when you acknowledge this, things don't look black and white anymore,
but instead full of colors :-)

Python 3 forces people to actually use Unicode; in Python 2 they
could easily avoid it. It's good to educate people on how it's
used and the issues you can run into, but let's not forget
that people are trying to get work done and we all love readable
code.

PEP 460 just adds two more methods to the bytes object which come
in handy when formatting binary data; I don't think it has the potential
to muddy the Python 3 text model, given that the bytes
object already exposes a dozen other ASCII text methods :-)
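
To illustrate the last point, here is a small sketch of a few of the
ASCII-aware methods that bytes already provides in Python 3. The
variable names are made up for the example; the methods themselves
are existing bytes methods, and they only touch bytes in the ASCII
range.

```python
# A few of the ASCII text methods already available on bytes.
name = b"stephen hansen"

titled = name.title()          # capitalizes ASCII words
shouted = name.upper()         # upper-cases ASCII letters
is_number = b"12345".isdigit() # checks for ASCII digits
fields = b"Host: example.org".split(b": ")

# Bytes outside the ASCII range are passed through unchanged;
# no encoding is guessed or applied.
mixed = b"caf\xe9".upper()

print(titled, shouted, is_number, fields, mixed)
```

None of these calls decode anything; they simply treat the ASCII
subset of byte values as letters and digits.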

>> If you give programmers the choice they will - most of the time -
>> do the right thing. If you don't give them the tools, they'll
>> work around the missing features in a gazillion different
>> ways of which many will probably miss a few edge cases.
>>
>> bytes already have most of the 8-bit string methods from Python 2,
>> so it doesn't hurt adding some more of the missing features
>> from Python 2 on top to make life easier for people dealing
>> with multiple/unknown encoding data.
> 
> Because people that aren't happy with the current bytes type
> persistently refuse to experiment with writing their own extension
> type to figure out what the API should look like. Jamming speculative
> API design into the core text model without experimenting in a third
> party extension first is a straight up stupid idea.
> 
> Anyone that is pushing for this should be checking out Benno's first
> draft experimental prototype for asciistr and be working on getting it
> passing the test suite I created:
> https://github.com/jeamland/asciicompat
> 
> The "Wah, you broke it and now I have completely forgotten how to
> create custom types, so I'm just going to piss and moan until somebody
> else fixes it" infantilism of the past five years in this regard has
> frankly pissed me off.

Ah, you see: we're entering heated discussions again :-)

asciistr is interesting in that it coerces to bytes instead
of to Unicode (as is the case in Python 2).

At the moment it doesn't cover the more common case bytes + str,
just str + bytes, but let's assume it did. Then you'd write:

...
headers += asciistr('Length: %i bytes\n' % 123)
headers += b'\n\n'
body = b'...'
socket.send(headers + body)
...

With PEP 460, you could write the above as:

...
headers += b'Length: %i bytes\n' % 123
headers += b'\n\n'
body = b'...'
socket.send(headers + body)
...

IMO, that's more readable.

Both variants essentially do the same thing: they implicitly
coerce ASCII text strings to bytes, so conceptually, there's
little difference.
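
As a rough, self-contained sketch of that equivalence: the following
assumes an interpreter in which bytes % args is available (as proposed
here for Python 3.5). Since I can't rely on asciistr being installed,
the second variant uses a plain str.encode('ascii') as a stand-in for
it; that substitution and the two function names are assumptions made
purely for illustration.

```python
def headers_with_bytes_interpolation(length):
    # Interpolate the integer directly into a bytes template.
    headers = b'Length: %i bytes\n' % length
    headers += b'\n\n'
    return headers


def headers_with_explicit_encode(length):
    # Stand-in for the asciistr variant: format as text first,
    # then coerce the ASCII-only result to bytes explicitly.
    headers = ('Length: %i bytes\n' % length).encode('ascii')
    headers += b'\n\n'
    return headers


a = headers_with_bytes_interpolation(123)
b = headers_with_explicit_encode(123)
print(a == b, a)
```

Both functions build the same byte string; the difference is only in
where the text-to-bytes step is spelled out.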

-- 
Marc-Andre Lemburg
eGenix.com
