[Python-Dev] byteformat() proposal: please critique
Terry Reedy
tjreedy at udel.edu
Sun Jan 12 02:20:28 CET 2014
The following function interpolates bytes, bytearrays, and formatted
strings, the latter two auto-converted to bytes, into a bytes (or
auto-converted bytearray) format. This function automates much of what
some people have recommended for combining ascii text and binary blobs.
The test passes on 2.7.6 as well as 3.3.3, though a 2.7-only version
would be simpler.
===============
# bf.py -- Terry Jan Reedy, 2014 Jan 11
"Define byteformat(): a bytes version of str.format as a function."
import re

def byteformat(form, obs):
    '''Return bytes-formatted objects interpolated into bytes format.

    The bytes or bytearray format has two types of replacement fields.
    b'{}' and b'{:}': The object can be any raw bytes or bytearray object.
    b'{:<format_spec>}': The object can be any object ob that can be
    string-formatted with <format_spec>. Bytearrays are converted to bytes.

    The text encoding is the default (encoding="utf-8", errors="strict").
    Users should explicitly encode to bytes for any other encoding.
    The struct module can be used to produce bytes, such as binary-formatted
    integers, that are not encoded text.

    Test passes on both 2.7.6 and 3.3.3.
    '''
    if isinstance(form, bytearray):
        form = bytes(form)
    # Literal text lands at even indexes, captured format_specs at odd ones.
    fields = re.split(b'{:?([^}]*)}', form)
    # print(fields)
    if len(fields) != 2*len(obs)+1:
        raise ValueError('Number of replacement fields not same as len(obs)')
    j = 1  # index into fields
    for ob in obs:
        if isinstance(ob, bytearray):
            ob = bytes(ob)
        field = fields[j]
        # Empty spec: insert raw bytes; otherwise format as text, then encode.
        fields[j] = format(ob, field.decode()).encode() if field else ob
        j += 2
    return b''.join(fields)

# test code
bformat = b"bytes: {}; bytearray: {:}; unicode: {:s}; int: {:5d}; float: {:7.2f}; end"
objects = (b'abc', bytearray(b'def'), u'ghi', 123, 12.3)
result = byteformat(bformat, objects)
result2 = byteformat(bytearray(bformat), objects)
strings = (ob.decode() if isinstance(ob, (bytes, bytearray)) else ob
           for ob in objects)
expect = bformat.decode().format(*strings).encode()
#print(result)
#print(result2)
print(expect)
assert result == result2 == expect
=====
This has been edited from what I posted to issue 3982 to expand the
docstrings and to work the same with both bytes and bytearrays on both
2.7 and 3.3. When I posted before, I thought of it merely as a
proof-of-concept prototype. After reading the seemingly endless
discussion of possible variations of byte formatting with % and .format,
I now present it as a real, concrete proposal.
There are, of course, details that could be tweaked. The encoding uses
the default, which on 3.x is (encoding='utf-8', errors='strict'). This
could be changed to an explicit encoding='ascii'. If that were done, the
encoding could be made a parameter that defaults to 'ascii'. The joiner
could be defined as type(form)() so the output type matches the input
form type. I did not do that because it complicates the test.
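
To illustrate the two tweaks just mentioned, here is a rough sketch of what
such a variant might look like. The name byteformat_enc and its encoding and
errors parameters are hypothetical, not part of the posted function, and the
sketch assumes Python 3 only.

```python
import re

def byteformat_enc(form, obs, encoding='ascii', errors='strict'):
    """Hypothetical variant: explicit encoding, output type follows form."""
    joiner = type(form)()  # b'' for bytes input, bytearray() for bytearray
    # Split on a bytes copy so the fields are plain bytes either way.
    fields = re.split(b'{:?([^}]*)}', bytes(form))
    if len(fields) != 2 * len(obs) + 1:
        raise ValueError('Number of replacement fields not same as len(obs)')
    for j, ob in zip(range(1, len(fields), 2), obs):
        spec = fields[j]
        if spec:
            # Text formatting, then encoding with the caller's choice.
            fields[j] = format(ob, spec.decode('ascii')).encode(encoding, errors)
        else:
            # Raw field: bytes or bytearray is inserted unchanged.
            fields[j] = bytes(ob) if isinstance(ob, bytearray) else ob
    return joiner.join(fields)
```

With the 'ascii' default, formatting non-ASCII text raises UnicodeEncodeError
unless the caller passes another encoding, which is the stricter behavior the
paragraph above has in mind.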
The coercion of interpolated bytearray objects to bytes is needed for
2.7 because in 2.7, str/bytes.join raises TypeError for bytearrays in
the input sequence. A 3.x-only version could drop this.
One objection to the function is that it is neither % nor .format. To me,
this is an advantage in that a new function will not be expected to
exactly match the % or .format behavior in either 2.x or 3.x. It
eliminates the 'matching the old' arguments so we can focus on what
actual functionality is needed. There is no need to convert true binary
bytes to text with either latin-1 or surrogates. There is no need to add
anything to bytes. The code above uses the built-in facilities that we
already have, which to me should be the first thing to try, not the last.
One new feature that does not match old behavior is that {} and {:} are
changed (in 3.x) to indicate bytes whereas {:s} continues to indicate
(in 3.x) unicode text. ({:s} might be changed to mean unicode for 2.7
also, but I did not explore that idea.) Similarly, a new function is
free to borrow only the format_spec part of replacement
fields and use format(ob, format_spec) to format each object. Anyone who
needs the full power of str.format is free to use it explicitly. I think
format_specs cover most of what people have asked for.
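
As a small illustration of borrowing only the format_spec part, the following
sketch shows what the split produces and how the built-in format() handles
each captured spec. The variable names are only for illustration.

```python
import re

form = b"int: {:5d}; float: {:7.2f}; raw: {}"

# The capturing group keeps each format_spec in the result of the split:
# literal text sits at even indexes, specs at odd indexes.
fields = re.split(b'{:?([^}]*)}', form)
literals = fields[0::2]
specs = fields[1::2]   # an empty spec marks a raw-bytes field

# Each non-empty spec is handed to the built-in format(), then encoded.
formatted_int = format(123, specs[0].decode()).encode()
formatted_float = format(12.3, specs[1].decode()).encode()

print(fields)
print(formatted_int, formatted_float)
```

Field names, indexes, attribute access and conversions such as !r are not
parsed at all, which is the part of str.format deliberately left out.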
For future releases, the function could go in the string module. It
could otherwise be added to existing or future 2&3 porting packages.
--
Terry Jan Reedy