[Python-Dev] PEP 460 reboot

Tue Jan 14 03:25:50 CET 2014

On 1/13/2014 4:32 PM, Guido van Rossum wrote:

> I will doggedly keep posting to this thread rather than creating more
> threads.

Please permit me to doggedly keep pointing you toward the possible
solution I posted on the tracker last October.

> But formatb() feels absurd to me. PEP 460 has neither a precise
> specification or any actual examples, so I can't tell whether the

Two days ago, I reposted byteformat() here on pydev with a precise text 
specification added to the code, and with an expanded test example. I 
have just added another example based on your question below.

> intention is that the format string can *only* contain {...} sequences
> or whether it can also contain "regular" characters. Translating to
> formatb(), my question comes down to the legality of the following
> example:
>
>    b'Hello, {}'.formatb(name)  # Where name is some bytes object
>
> If this is allowed, it reintroduces the ASCII bias (since the
> substring 'Hello' is clearly ASCII).

Since byteformat() uses re to find {<format-spec>} replacement fields,
it has only as much ASCII bias as re itself has, which I believe is
little, if any. As far as re and byteformat() are concerned, everything
outside the {...} fields is uninterpreted bytes. As far as bytes.join is
concerned, both the joiner and the joined items are uninterpreted bytes.

>>> byteformat(b'\x00{}\x02{}def', (b'\x01', b'abc',))
b'\x00\x01\x02abcdef'

re.split produces [b'\x00', b'', b'\x02', b'', b'def']. The only ASCII
bias is the one already present in the representation of bytes, plus
the fact that Python source code must use an ASCII-compatible encoding.
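A minimal sketch of the splitting step, assuming a capturing pattern such as rb'\{:?([^{}]*)\}' (an illustrative guess, not necessarily the regex in the posted code). Because the spec is captured as a group, re.split interleaves literal chunks with spec chunks, and bytes other than '{', ':' and '}' are never examined:

```python
import re

# Illustrative pattern (an assumption): '{', an optional ':', a brace-free
# format spec captured as group 1, then '}'.
FIELD = re.compile(rb'\{:?([^{}]*)\}')

parts = FIELD.split(b'\x00{}\x02{}def')
print(parts)  # [b'\x00', b'', b'\x02', b'', b'def']

# Even indices hold literal template bytes; odd indices hold captured specs.
literals, specs = parts[0::2], parts[1::2]
print(literals)  # [b'\x00', b'\x02', b'def']
print(specs)     # [b'', b'']

# Non-ASCII bytes outside the fields pass through uninterpreted.
print(FIELD.split(b'\xff{}\xfe'))  # [b'\xff', b'', b'\xfe']
```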

The advantage of
byteformat(b'\x00{}\x02{}def', (b'\x01', b'abc',))
over directly writing
b''.join([b'\x00', b'\x01', b'\x02', b'abc', b'def'])
is that one does not have to manually split the presumably constant
template into chunks and interleave them with the presumably variable
chunks.
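To make that comparison concrete, here is a sketch of the interleaving that byteformat() saves one from writing by hand. interleave() is a hypothetical helper that handles only blank {} fields filled with raw bytes, and the field regex is an illustrative assumption rather than the posted code:

```python
import re

# Illustrative field pattern (an assumption, not the posted code).
FIELD = re.compile(rb'\{:?([^{}]*)\}')

def interleave(template, values):
    """Hypothetical helper: fill blank {} fields of a bytes template with raw bytes."""
    literals = FIELD.split(template)[0::2]  # constant chunks between fields
    if len(values) != len(literals) - 1:
        raise ValueError('need %d values, got %d'
                         % (len(literals) - 1, len(values)))
    pieces = [literals[0]]
    for value, literal in zip(values, literals[1:]):
        pieces.append(value)    # variable chunk
        pieces.append(literal)  # next constant chunk
    return b''.join(pieces)

manual = b''.join([b'\x00', b'\x01', b'\x02', b'abc', b'def'])
automatic = interleave(b'\x00{}\x02{}def', (b'\x01', b'abc'))
print(manual == automatic)  # True
print(automatic)            # b'\x00\x01\x02abcdef'
```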

Here is the example that I used for testing, including non-blank format 
specs.

>>> bformat = b"bytes: {}; bytearray: {:}; unicode: {:s}; int: {:5d}; float: {:7.2f}; end"
>>> objects = (b'abc', bytearray(b'def'), u'ghi', 123, 12.3)
>>> byteformat(bformat, objects)
b'bytes: abc; bytearray: def; unicode: ghi; int:   123; float:   12.30; end'

The additional advantage here is the automatic encoding of formatted 
strings to bytes. As posted, byteformat() uses the str.encode defaults 
(encoding='utf-8', errors='strict'). But as I said in the post, these 
could become parameters to the function that are passed on to str.encode.
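A sketch of a byteformat()-style function with encoding and errors exposed as parameters, written here for illustration; it is not the code posted to the tracker. It assumes the field regex below, treats {} and {:} as blank fields that copy bytes or bytearray objects unchanged, and sends every non-blank spec through the built-in format() before calling str.encode:

```python
import re

# Illustrative field pattern (an assumption): '{', optional ':', a brace-free
# format spec captured as group 1, then '}'.
FIELD = re.compile(rb'\{:?([^{}]*)\}')

def byteformat(form, objects, encoding='utf-8', errors='strict'):
    """Sketch only: interpolate objects into a bytes template.

    Blank fields ({} or {:}) copy a bytes or bytearray object unchanged.
    Non-blank fields ({:spec}) use format(obj, spec), then str.encode.
    """
    parts = FIELD.split(form)
    literals, specs = parts[0::2], parts[1::2]
    if len(specs) != len(objects):
        raise ValueError('template has %d fields but %d objects were given'
                         % (len(specs), len(objects)))
    chunks = [literals[0]]
    for spec, obj, literal in zip(specs, objects, literals[1:]):
        if spec:
            # Reuse the standard format-spec mini-language, then encode.
            text = format(obj, spec.decode('ascii'))
            chunks.append(text.encode(encoding, errors))
        elif isinstance(obj, (bytes, bytearray)):
            chunks.append(bytes(obj))  # raw bytes, copied as is
        else:
            raise TypeError('blank field needs bytes or bytearray, got %s'
                            % type(obj).__name__)
        chunks.append(literal)
    return b''.join(chunks)

bformat = b"bytes: {}; bytearray: {:}; unicode: {:s}; int: {:5d}; float: {:7.2f}; end"
objects = (b'abc', bytearray(b'def'), u'ghi', 123, 12.3)
print(byteformat(bformat, objects))
# b'bytes: abc; bytearray: def; unicode: ghi; int:   123; float:   12.30; end'

# encoding and errors are passed straight on to str.encode.
print(byteformat(b'<{:s}>', ('\xe9',)))                      # b'<\xc3\xa9>'
print(byteformat(b'<{:s}>', ('\xe9',), encoding='latin-1'))  # b'<\xe9>'
```

Raising TypeError for a non-bytes object in a blank field is a design choice of this sketch; it keeps {} from silently falling back to str(), which is what str.format would do.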

The design reuses re.split, bytes.join, format, and the format
specification. Because the format-spec is reused as is, the only new
thing to learn is that blank specs correspond to bytes instead of
strings. This is easier to design, implement, and learn than a
format-spec restricted to disallow some things (after much bike-shedding
over what to eliminate ;-).
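One way to check what reusing the format-spec as is buys: for any non-blank spec, format(obj, spec) gives exactly the text str.format produces for '{:spec}', so the encoded bytes follow from existing knowledge. The blank field is the one place the rule must differ, because str.format falls back to str() and renders a bytes object as its repr-like text. The spec list below is an arbitrary sample:

```python
# Arbitrary sample of (format spec, object) pairs.
samples = [('s', 'ghi'), ('5d', 123), ('7.2f', 12.3), ('>6', 'xy'), ('#x', 255)]

for spec, obj in samples:
    via_format = format(obj, spec).encode('utf-8')
    via_str_format = ('{:' + spec + '}').format(obj).encode('utf-8')
    assert via_format == via_str_format  # same mini-language, same text
    print(spec, via_format)

# With a blank spec, str.format uses str(), which for bytes yields the
# 6-character str "b'abc'" rather than the raw bytes (under python -b it
# also emits a BytesWarning).
print('{}'.format(b'abc'))  # b'abc'
```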

I would appreciate your comment on this proposal.

-- 
Terry Jan Reedy