[Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5

M.-A. Lemburg mal at egenix.com
Sat Jan 11 16:15:35 CET 2014

On 11.01.2014 14:54, Georg Brandl wrote:
> Am 11.01.2014 14:49, schrieb Georg Brandl:
>> Am 11.01.2014 10:44, schrieb Stephen Hansen:
>>> I mean, it's not like the "bytes" type lacks knowledge of the subset of bytes
>>> that happen to be 7-bit ascii-compatible and can't perform text-ish operations
>>> on them--
>>>   Python 3.3.3 (v3.3.3:c3896275c0f6, Nov 18 2013, 21:18:40) [MSC v.1600 32 bit (Intel)] on win32
>>>   Type "help", "copyright", "credits" or "license" for more information.
>>>   >>> b"stephen hansen".title()
>>>   b'Stephen Hansen'
>>> How is this not a practical recognition that yes, while bytes are byte streams
>>> and not text, a huge subset of bytes are text-y, and as long as we maintain the
>>> barrier between higher characters and implicit conversion therein, we're fine?
>>> I don't see the difference here. There is a very real, practical need to
>>> interpolate bytes. This very real, practical need includes the very real
>>> recognition that converting 12345 to b'12345' is not something weird, unusual,
>>> and subject to the thorny issues of Encodings. It is not violating the doctrine
>>> of separation of powers between Text and Bytes.
>> This. Exactly. Thanks for putting it so nicely, Stephen.
> To elaborate: if the bytes type didn't have all this ASCII-aware functionality
> already, I think we would have (and be using) a dedicated "asciistr" type right
> now.  But it has the functionality, and it's way too late to remove it.
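To make Stephen's 12345 -> b'12345' point concrete: in Python 3.3 (the version in his session), bytes have no % or format(), so a common workaround is to render the integer as str and encode it as ASCII, then concatenate. This is a minimal sketch; the helper name `int_to_ascii_bytes` is hypothetical and only for illustration.

```python
def int_to_ascii_bytes(n: int) -> bytes:
    """Render an integer as its ASCII decimal digits (hypothetical helper)."""
    # str() of an int only ever yields ASCII digits and an optional '-',
    # so this ASCII encode cannot fail: no real "encoding question" arises.
    return str(n).encode("ascii")


# Without bytes % args, building a byte string means concatenating pieces.
header = b"Content-Length: " + int_to_ascii_bytes(12345) + b"\r\n"
print(header)  # b'Content-Length: 12345\r\n'
```

The encode step is pure boilerplate here, which is the kind of friction the thread is arguing about.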

I think we need to step back a little from the purist view
of things and give more emphasis to the "practicality beats
purity" Zen.

I completely agree with Stephen that bytes are in fact often
an encoding of text. If that text is ASCII-compatible, I don't
see any reason why we should not continue to expose the C library's
standard string APIs for text manipulation on bytes.
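As an illustration of what "ASCII-compatible" buys you today: the existing ASCII-aware bytes methods only act on bytes in the ASCII range and pass every other byte through untouched, with no decoding step. The sample data below (a Latin-1 encoded 'é', byte 0xE9) is an assumption chosen for the sketch.

```python
# Mixed data: ASCII letters plus a Latin-1 encoded 'é' (byte 0xE9).
data = b"caf\xe9 au lait"

# Case and whitespace handling consider ASCII bytes only; 0xE9 is neither
# a cased letter nor whitespace, so it is left exactly as it was.
print(data.upper())  # b'CAF\xe9 AU LAIT'
print(data.title())  # b'Caf\xe9 Au Lait'
print(data.split())  # [b'caf\xe9', b'au', b'lait']

# Digit checks are ASCII-only as well.
print(b"  12345 ".strip().isdigit())  # True
```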

We don't have to be pedantic about the bytes/text separation.
It doesn't help in real life.

If you give programmers the choice, they will, most of the time,
do the right thing. If you don't give them the tools, they'll
work around the missing features in a gazillion different
ways, many of which will probably miss a few edge cases.
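One such workaround, shown here as an illustrative sketch and not as a recommendation, is a Latin-1 round trip: Latin-1 maps byte values 0-255 one-to-one onto code points U+0000-U+00FF, so you can decode the template, use str % formatting, and encode back without losing a byte. The helper `bformat` is hypothetical. Note how easily it hides edge cases: bytes arguments must be decoded first (otherwise %s inserts "b'...'"), and a str argument with characters above U+00FF still blows up at the final encode.

```python
def bformat(template: bytes, *args) -> bytes:
    """Interpolate args into a bytes template via a Latin-1 round trip.

    Hypothetical helper for illustration only.
    """
    # Decode bytes arguments too; otherwise '%s' would insert their repr,
    # e.g. "b'Content-Length'" instead of "Content-Length".
    text_args = tuple(
        a.decode("latin-1") if isinstance(a, (bytes, bytearray)) else a
        for a in args
    )
    # Edge case: a str argument containing e.g. '\u20ac' (euro sign) makes
    # this encode raise UnicodeEncodeError.
    return (template.decode("latin-1") % text_args).encode("latin-1")


msg = bformat(b"%s: %d bytes of \xff\xfe data\r\n", b"Content-Length", 12345)
print(msg)  # b'Content-Length: 12345 bytes of \xff\xfe data\r\n'
```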

bytes already have most of the 8-bit string methods from Python 2,
so it doesn't hurt to add a few more of the missing Python 2
features on top to make life easier for people dealing
with data in multiple or unknown encodings.

BTW: I don't know why so many people keep asking for use cases.
Isn't it obvious that text data in an unknown (but ASCII-compatible)
encoding, or with multiple different encodings in a single data chunk,
is a fact of life? Most HTTP packets fall into this category,
as do many email messages. And let's not forget that we don't
live in a perfect world. Broken encodings are everywhere around
you; just have a look at your spam folder for a decent chunk
of example data :-)
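A sketch of the kind of data meant here: a header block whose field names are plain ASCII, but whose values mix encodings (a Latin-1 'é' next to a UTF-8 '€' below, which is assumed sample data). Working purely on bytes, the structure can still be parsed with ASCII-aware methods while the values stay raw. The function `parse_headers` is hypothetical and deliberately simplified: no continuation lines, and duplicate names overwrite earlier ones.

```python
def parse_headers(raw: bytes) -> dict:
    """Split a raw HTTP/email-style header block into {name: value} bytes.

    Hypothetical simplified sketch, not a full RFC-compliant parser.
    """
    headers = {}
    for line in raw.split(b"\r\n"):
        if not line:
            break  # a blank line ends the header block
        name, sep, value = line.partition(b":")
        if not sep:
            continue  # skip malformed lines without a colon
        # Field names are ASCII, so lowercasing the bytes is safe; the value
        # is kept as raw bytes, whatever (mix of) encodings it contains.
        headers[name.strip().lower()] = value.strip()
    return headers


raw = (b"Content-Type: text/plain\r\n"
       b"Subject: caf\xe9 \xe2\x82\xac\r\n"  # Latin-1 'é' next to UTF-8 '€'
       b"\r\n"
       b"body")
print(parse_headers(raw))
# {b'content-type': b'text/plain', b'subject': b'caf\xe9 \xe2\x82\xac'}
```

No single codec decodes that Subject value correctly: UTF-8 rejects the lone 0xE9 byte, and Latin-1 turns the euro sign into mojibake.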

Marc-Andre Lemburg

Professional Python Services directly from the Source  (#1, Jan 11 2014)
>>> Python Projects, Consulting and Support ...   http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
