[Python-Dev] byteformat() proposal: please critique

Sun Jan 12 02:20:28 CET 2014

The following function interpolates bytes, bytearrays, and formatted 
strings, the latter two auto-converted to bytes, into a bytes (or 
auto-converted bytearray) format. This function automates much of what 
some people have recommended for combining ascii text and binary blobs. 
The test passes on 2.7.6 as well as 3.3.3, though a 2.7-only version 
would be simpler.
===============

# bf.py -- Terry Jan Reedy, 2014 Jan 11
"Define byteformat(): a bytes version of str.format as a function."
import re

def byteformat(form, obs):
     '''Return bytes-formatted objects interpolated into bytes format.

     The bytes or bytearray format has two types of replacement fields.
     b'{}' and b'{:}': The object can be any raw bytes or bytearray object.
     b'{:<format_spec>}': The object can be any object ob that can be
     string-formatted with <format_spec>. Bytearrays are converted to bytes.

     The text encoding is the default (encoding="utf-8", errors="strict").
     Users should explicitly encode to bytes for any other encoding.
     The struct module can be used to produce bytes, such as binary-formatted
     integers, that are not encoded text.

     Test passes on both 2.7.6 and 3.3.3.
     '''

     if isinstance(form, bytearray):
         form = bytes(form)
     fields = re.split(b'{:?([^}]*)}', form)
     # print(fields)
     if len(fields) != 2*len(obs)+1:
         raise ValueError(
             'Number of replacement fields not same as len(obs)')
     j = 1 # index into fields
     for ob in obs:
         if isinstance(ob, bytearray):
             ob = bytes(ob)
         field = fields[j]
         fields[j] = format(ob, field.decode()).encode() if field else ob
         j += 2
     return b''.join(fields)

# test code
bformat = (b"bytes: {}; bytearray: {:}; unicode: {:s}; int: {:5d}; "
           b"float: {:7.2f}; end")
objects = (b'abc', bytearray(b'def'), u'ghi', 123, 12.3)
result = byteformat(bformat, objects)
result2 = byteformat(bytearray(bformat), objects)
strings = (ob.decode() if isinstance(ob, (bytes, bytearray)) else ob
           for ob in objects)
expect = bformat.decode().format(*strings).encode()

#print(result)
#print(result2)
print(expect)
assert result == result2 == expect

=====
This has been edited from what I posted to issue 3982 to expand the 
docstrings and to work the same with both bytes and bytearrays on both 
2.7 and 3.3. When I posted before, I thought of it merely as a 
proof-of-concept prototype. After reading the seemingly endless 
discussion of possible variations of byte formatting with % and .format, 
I now present it as a real, concrete proposal.

There are, of course, details that could be tweaked. The encoding uses 
the default, which on 3.x is (encoding='utf-8', errors='strict').  This 
could be changed to an explicit encoding='ascii'. If that were done, the 
encoding could be made a parameter that defaults to 'ascii'. The joiner 
could be defined as type(form)() so the output type matches the input 
form type. I did not do that because it complicates the test.
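
To make those two tweaks concrete, here is a rough sketch of what such a 
variant might look like. The name byteformat_enc and the encoding/errors 
parameters are my own illustration, not part of bf.py above, and I have 
only reasoned about it on 3.x.

```python
import re

def byteformat_enc(form, obs, encoding='ascii', errors='strict'):
    """Hypothetical variant of byteformat(): explicit encoding, and the
    output type (bytes or bytearray) follows the type of form."""
    joiner = type(form)()  # b'' for bytes, bytearray(b'') for bytearray
    fields = re.split(b'{:?([^}]*)}', bytes(form))
    if len(fields) != 2 * len(obs) + 1:
        raise ValueError(
            'Number of replacement fields not same as len(obs)')
    # Odd indexes of fields hold the captured format specs.
    for j, ob in zip(range(1, len(fields), 2), obs):
        spec = fields[j]
        if spec:
            # Format as text, then encode with the requested encoding.
            text = format(ob, spec.decode('ascii'))
            fields[j] = text.encode(encoding, errors)
        else:
            # Empty spec: raw bytes pass through untouched.
            fields[j] = bytes(ob) if isinstance(ob, bytearray) else ob
    return joiner.join(fields)
```

With the 'ascii' default, formatting non-ascii text raises 
UnicodeEncodeError instead of silently producing utf-8, and a caller who 
wants utf-8 has to say so.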

The coercion of interpolated bytearray objects to bytes is needed for 
2.7 because in 2.7, str/bytes.join raises TypeError for bytearrays in 
the input sequence. A 3.x-only version could drop this.
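
For reference, a small illustration of the 3.x behavior that makes the 
coercion unnecessary there. This is meant to be run on 3.x only; on 2.7 
the same join is where the TypeError mentioned above comes from.

```python
# On 3.x, bytes.join accepts a mix of bytes and bytearray items
# and returns bytes.
parts = [b'abc', bytearray(b'def'), b'ghi']
joined = b''.join(parts)
print(type(joined).__name__, joined)
```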

One objection to the function is that it is neither % nor .format. To me, 
this is an advantage in that a new function will not be expected to 
exactly match the % or .format behavior in either 2.x or 3.x. It 
eliminates the 'matching the old' arguments so we can focus on what 
actual functionality is needed. There is no need to convert true binary 
bytes to text with either latin-1 or surrogates. There is no need to add 
anything to bytes. The code above uses the built-in facilities that we 
already have, which to me should be the first thing to try, not the last.

One new feature that does not match old behavior is that {} and {:} are 
changed (in 3.x) to indicate bytes whereas {:s} continues to indicate 
(in 3.x) unicode text. ({:s} might be changed to mean unicode for 2.7 
also, but I did not explore that idea.) Similarly, a new function is 
free to borrow only the format_spec part of replacement fields and use 
format(ob, format_spec) to format each object. Anyone who 
needs the full power of str.format is free to use it explicitly. I think 
format_specs cover most of what people have asked for.
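
To show the mechanics step by step, here is a small sketch of what the 
split produces and how an empty spec and a non-empty spec are treated 
differently. The format bytes and the struct-packed payload are made up 
for illustration; the regular expression is the one from byteformat().

```python
import re
import struct

form = b'len: {:5d}; payload: {}; end'

# The capturing group keeps each format_spec in the split result, so
# literal pieces sit at even indexes and specs at odd indexes.
fields = re.split(b'{:?([^}]*)}', form)
# fields == [b'len: ', b'5d', b'; payload: ', b'', b'; end']

# True binary data, produced by struct rather than by encoding text.
payload = struct.pack('>H', 258)  # two bytes: 0x01 0x02

# Non-empty spec: format as text with format(ob, spec), then encode.
length_text = format(len(payload), fields[1].decode()).encode()

# Empty spec: the bytes object is interpolated as is.
result = b''.join([fields[0], length_text, fields[2], payload, fields[4]])
print(result)
```

The binary payload never passes through a text type, which is the point 
of reserving {} and {:} for bytes.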

For future releases, the function could go in the string module. It 
could otherwise be added to existing or future 2&3 porting packages.

-- 
Terry Jan Reedy