[Python-ideas] Bytes formatting (was Re: Adding 'bytes' as alias for 'latin_1' codec)

MRAB python at mrabarnett.plus.com
Tue May 31 04:11:59 CEST 2011


On 30/05/2011 21:04, Terry Reedy wrote:
> Changing the subject to what it has actually become.
> PROPOSAL
>
> A bytes template uses b'{' and b'}' to mark interpolation fields and
> other ascii bytes within as needed. It uses the ascii equivalent of the
> string field_name spec. It does not have a conversion spec. The
> format_spec should have the minimum needed for existing public
> protocols. How much more is up for discussion. We need use cases.
>
> One possibility to keep in mind is that a bytes template could be
> constructed by an ascii-compatible encoding of formatted text. Specs for
> bytes fields can be protected in a text template by doubling the braces.
>
>  >>> '{} {{byte-field-spec}}'.format(1).encode()
> b'1 {byte-field-spec}'
>
> A major issue is what to do with numbers. Sometimes they need to be
> ascii encoded, sometimes binary encoded. The baseline is to do nothing
> extra and require all args to be bytes. I think this may be appropriate
> for floats as they are seldom specifically used in protocols. I think
> the same may be true for ints with signs. So I think we mainly need to
> consider counts (unsigned ints) for possible exceptional processing.
>
> Option 0. As stated, no special number specs.
>
> Option 1. Use a subset of the current int spec to produce ascii
> encodings; use struct.pack for binary encodings. (How many of the
> current integer presentation types would be needed?)
>
> Option 2. Use an adaptation of the struct.pack mini-language to produce
> binary encodings; use encoded str.format for ascii encodings. (The
> latter might be done as part of a text-to-bytes-template process as
> indicated above.)
>
> Option 3. Combine options 1 and 2. This might best be done by replacing
> the omitted 'conversion' field with a 'number-encoding' field, b'!a' or
> b'!b', to indicate ascii or binary conversion and corresponding
> interpretation of the format spec. (In other words, do not try to
> combine the number to text and number to binary mini-languages, but add
> a 'prefix' to specify which is being used.)
>
Perhaps something like this:

# Format int as byte.
b"{:b}".format(128) returns b"\x80"

# Format int as double-byte.
b"{:2b}".format(0x100) returns b"\x00\x01" or b"\x01\x00"

# Format int as double-byte, little-endian.
b"{:<2b}".format(0x100) returns b"\x00\x01"

# Format int as double-byte, big-endian.
b"{:>2b}".format(0x100) returns b"\x01\x00"

# Format list of ints as signed bytes.
b"{:s}".format([1, -2, 3]) returns b"\x01\xFE\x03"

# Format list of ints as unsigned bytes.
b"{:u}".format([1, 254, 3]) returns b"\x01\xFE\x03"

# Format ASCII-only string as bytes.
b"{:a}".format("abc") returns b"abc"
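
For comparison, here is a rough sketch of how the same results can be
obtained with what Python 3 already provides (int.to_bytes, struct.pack,
bytes() and str.encode). The variable names are only illustrative, and
none of this is meant as the implementation of the proposed format
codes; it just pins down the byte values the examples above describe.

```python
import struct

# Int as a single unsigned byte (the proposed "b" code).
one_byte = (128).to_bytes(1, "big")

# Int as two bytes with an explicit byte order (proposed "<2b" and ">2b").
little = (0x100).to_bytes(2, "little")
big = (0x100).to_bytes(2, "big")

# The same two-byte encodings via the struct mini-language.
little_s = struct.pack("<H", 0x100)
big_s = struct.pack(">H", 0x100)

# List of ints as signed bytes (proposed "s" code).
signed_values = [1, -2, 3]
signed = struct.pack("%db" % len(signed_values), *signed_values)

# List of ints as unsigned bytes (proposed "u" code).
unsigned = bytes([1, 254, 3])

# ASCII-only string as bytes (proposed "a" code); raises
# UnicodeEncodeError if the string contains non-ASCII characters.
ascii_bytes = "abc".encode("ascii")

print(one_byte, little, big, signed, unsigned, ascii_bytes)
```

Note that the unadorned "2b" case has no direct equivalent here: both
int.to_bytes and struct.pack make you pick a byte order (or accept the
native one), which is the ambiguity the "<" and ">" prefixes are meant
to resolve.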


