[Python-Dev] RFC: PEP 460: Add bytes % args and bytes.format(args) to Python 3.5

Nick Coghlan ncoghlan at gmail.com
Sat Jan 11 09:17:07 CET 2014


On 11 January 2014 08:58, Ethan Furman <ethan at stoneleaf.us> wrote:
> On 01/10/2014 02:42 PM, Antoine Pitrou wrote:
>>
>> On Fri, 10 Jan 2014 17:33:57 -0500
>> "Eric V. Smith" <eric at trueblade.com> wrote:
>>>
>>> On 1/10/2014 5:29 PM, Antoine Pitrou wrote:
>>>>
>>>> On Fri, 10 Jan 2014 12:56:19 -0500
>>>> "Eric V. Smith" <eric at trueblade.com> wrote:
>>>>>
>>>>>
>>>>> I agree. I don't see any reason to exclude int and float. See Guido's
>>>>> messages http://bugs.python.org/issue3982#msg180423 and
>>>>> http://bugs.python.org/issue3982#msg180430 for some justification and
>>>>> discussion.
>>>>
>>>>
>>>> If you are representing int and float, you're really formatting a text
>>>> message, not bytes. Basically if you allow the formatting of int and
>>>> float instances, there's no reason not to allow the formatting of
>>>> arbitrary objects through __str__. It doesn't make sense to
>>>> special-case those two types and nothing else.
>>>
>>>
>>> It might not for .format(), but I'm not convinced. But for %-formatting,
>>> str is already special-cased for these types.
>>
>>
>> That's not what I'm saying. str.__mod__ is able to represent all kinds
>> of types through %s and calling __str__. It doesn't make sense for
>> bytes.__mod__ to only support int and float. Why only them?
>
>
> Because embedding the ASCII equivalent of ints and floats in byte streams is
> a common operation?

It's emphatically *NOT* a binary interpolation operation though - the
binary representation of the integer 1 is the byte value 1, not the
byte value 49. If you want the byte value 49 to appear in the stream,
then you need to interpolate the *ASCII encoding* of the string "1",
not the integer 1.
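
To make the distinction concrete, here is a minimal sketch contrasting the raw byte value 1 with the ASCII encoding of the string "1". The variable names are illustrative only:

```python
# The integer 1 as a raw byte versus the ASCII encoding of the text "1".
binary_one = bytes([1])               # a single byte whose value is 1
ascii_one = str(1).encode("ascii")    # text "1" encoded to ASCII: byte value 49

print(binary_one)     # b'\x01'
print(ascii_one)      # b'1'
print(ascii_one[0])   # 49
```

Getting from the integer to b"1" takes a detour through the text domain (str(1)) and an explicit encoding step. That detour is exactly what int/float support in bytes formatting would hide.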

If you want to manipulate text representations, do it in the text
domain. If you want to manipulate binary representations, do it in the
binary domain. The *whole point* of the text model change in Python 3
is to force programmers to *decide* which domain they're operating in
at any given point in time - while the approach of blurring the
boundaries between the two can be convenient for wire protocol and
file format manipulation, it is a horrendous bug magnet everywhere
else.
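
As a rough illustration of "deciding which domain you're in", the sketch below builds a protocol line in the text domain and crosses into the binary domain exactly once, with an explicit codec. The HTTP status line and the helper name `build_status_line` are assumptions chosen for illustration. It also shows that Python 3 refuses to mix the two domains implicitly:

```python
def build_status_line(code: int, reason: str) -> bytes:
    # Hypothetical helper: format in the text domain...
    line = "HTTP/1.1 {} {}\r\n".format(code, reason)
    # ...then cross into the binary domain once, with an explicit codec.
    return line.encode("ascii")

status = build_status_line(200, "OK")

# Mixing bytes and str raises TypeError instead of guessing an encoding.
try:
    _ = b"HTTP/1.1 " + "200"
except TypeError:
    mixed_rejected = True
else:
    mixed_rejected = False
```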

PEP 460 is just about adding back some missing functionality in the
binary domain (interpolating binary sequences together), not about
bringing back the problematic text model that allows particular text
representations to be interpreted as if they were also binary data.
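
For contrast, here is a sketch of purely binary interpolation that uses only existing tools (bytes.join and the struct module), not the bytes % / bytes.format API the PEP proposes. The "MSG" tag and the 4-byte big-endian length prefix are an assumed wire format, and `frame` is a hypothetical helper. The length goes in as its binary representation, not as ASCII digits:

```python
import struct

def frame(payload: bytes) -> bytes:
    # Hypothetical framing: a 3-byte tag, then a 4-byte big-endian
    # unsigned length (the *binary* representation of len(payload)),
    # then the payload itself.
    header = struct.pack(">I", len(payload))
    # Every piece is already bytes, so no text encoding is involved.
    return b"".join([b"MSG", header, payload])

framed = frame(b"\x00\x01hello")
```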

That said, I actually think there's a valid use case for a Python 3
type that allows the bytes/text boundary to be blurred, making it
easier to port certain kinds of Python 2 code to Python 3
(specifically, code working with wire protocols and file formats that
contain a mixture of encodings, all of which are *known* to be at
least ASCII compatible). It is highly unlikely that such a type
will *ever* be part of the standard library, though - idiomatic Python
3 code shouldn't need it, affected Python 2 code *can* be ported
without it (but may look more complicated due to the use of explicit
decoding and encoding operations, rather than relying on implicit
ones), and it should be entirely possible to implement it as an
extension module (modulo one bug in CPython that may impact the
approach, but we won't know for sure until people actually try it
out).
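
The explicit decode/edit/encode style of porting mentioned above might look like the following sketch. It does not use, and is not meant to represent, the hypothetical ASCII-compatible type. The helper name `rewrite_header` is an assumption for illustration, and so is the choice of latin-1: because latin-1 maps every byte value 0 to 255 to exactly one code point, any non-ASCII bytes elsewhere in the data round-trip unchanged:

```python
def rewrite_header(raw: bytes, name: str, value: str) -> bytes:
    # Decode explicitly: latin-1 never fails and round-trips every byte.
    text = raw.decode("latin-1")
    out = []
    for line in text.split("\r\n"):
        key, sep, _ = line.partition(":")
        # Replace the value of the matching header (case-insensitive).
        if sep and key.strip().lower() == name.lower():
            line = "{}: {}".format(key, value)
        out.append(line)
    # Encode explicitly on the way back into the binary domain.
    return "\r\n".join(out).encode("latin-1")

updated = rewrite_header(b"Host: a\r\nContent-Length: 5\r\n\r\n",
                         "content-length", str(12))
```

This is more verbose than the Python 2 equivalent, but every conversion between bytes and text is visible at the point where it happens.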

Fortunately, after years of my suggesting the idea to almost everyone
who complained about the move away from the broken POSIX text model
in Python 3, Benno Rice has started experimenting with such a type
based on a preliminary test case I wrote at linux.conf.au last week:
https://github.com/jeamland/asciicompat/blob/master/tests/ncoghlan.py

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

