Changing the subject to what it has actually become.

On 5/27/2011 5:27 AM, Nick Coghlan wrote:
We can almost certainly do better when it comes to constructing byte sequences from component parts, but simply saying "oh, just add a format() method to bytes objects" doesn't cut it, since the associated magic methods for str.format are all string based,
STRING FORMATTING

From a modern Python viewpoint, string formatting is about interpolating text representations of objects into a text template. By default, the text representation is str(object).

Exception 1. str.format has an optional conversion specifier ("!s", "!r", or "!a") to specify repr(object) or ascii(object) instead of str(object). (It can also be used to override exception 2.) This is not relevant to bytes formatting.

Exception 2. str.format, like % formatting, does special processing of numbers. Electronic computing was originally used only to compute numbers, and text formatting was originally about formatting numbers, usually in tables, with optional text decoration. That is why the maximum field size for string interpolation is still called 'precision'. There are numerous variations in number formatting, and most of the complication of format specifications arises from them.

BYTES FORMATTING

If the desired result consists entirely of text encoded with one encoding, the current recommended method is to construct the text and encode it. I think this is the proper method and do not think that anything we add should be aimed at this use case.

There are two other current methods to assemble bytes from pieces. One is concatenation; it has the same advantages and disadvantages as string concatenation. Another, overlooked in the current discussion so far, is in-place editing of a bytearray by index and slice assignment. It has the disadvantage of having to know the correct indexes and slice points.

If we add another bytes formatting function or method, I think it should be about interpolating bytes into a bytes template. The use cases would be anything other than mono-encoded text -- text with multiple encodings or non-text bytes possibly intermixed with encoded text.
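For illustration, the three existing methods described above might look like the following. This is only a sketch; the variable names and the sample data are invented for the example and do not come from any particular protocol.

```python
# Method 1: mono-encoded text -- build the str, then encode once.
name, count = "spam", 3
line = "{}: {}\r\n".format(name, count).encode("ascii")

# Method 2: concatenation of bytes pieces (here, the same text in two
# different encodings, separated by a non-text NUL byte).
mixed = "caf\u00e9".encode("latin-1") + b"\x00" + "caf\u00e9".encode("utf-8")

# Method 3: in-place editing of a bytearray by index and slice assignment.
packet = bytearray(8)      # eight zero bytes
packet[0] = 0x7E           # index assignment: a made-up start marker
packet[1:5] = b"DATA"      # slice assignment: the offsets must be known
result = bytes(packet)
```

Method 3 shows the disadvantage mentioned above: the slice points 1 and 5 have to be worked out by hand and kept in sync with the layout.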
and bytes interpolation also needs to address encoding issues for anything that isn't already a byte sequence.
As indicated above, I disagree if 'encoding' means 'text encoding'. Let .encode handle encoding issues.

PROPOSAL

A bytes template uses b'{' and b'}' to mark interpolation fields, with other ascii bytes within the braces as needed. It uses the ascii equivalent of the string field_name spec. It does not have a conversion spec. The format_spec should have the minimum needed for existing public protocols. How much more is up for discussion. We need use cases.

One possibility to keep in mind is that a bytes template could be constructed by an ascii-compatible encoding of formatted text. Specs for bytes fields can be protected in a text template by doubling the braces:
>>> '{} {{byte-field-spec}}'.format(1).encode()
b'1 {byte-field-spec}'
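To make the proposal concrete, here is a rough sketch of what bytes-into-bytes interpolation could look like. Nothing here is an existing API: interpolate_bytes is a hypothetical helper written for this example, it supports only empty, numeric, and simple name fields with no format_spec, and it takes the baseline position that every argument must already be bytes.

```python
import itertools
import re

# Matches a doubled brace or a simple {field}; stray single braces are
# left untouched, which a real implementation would probably reject.
_FIELD = re.compile(rb"\{\{|\}\}|\{([A-Za-z0-9_]*)\}")

def interpolate_bytes(template, *args, **kwargs):
    """Hypothetical helper: substitute bytes arguments into a bytes template."""
    auto = itertools.count()  # automatic numbering for empty {} fields

    def substitute(match):
        token = match.group(0)
        if token == b"{{":
            return b"{"
        if token == b"}}":
            return b"}"
        name = match.group(1).decode("ascii")
        if name == "":
            value = args[next(auto)]
        elif name.isdigit():
            value = args[int(name)]
        else:
            value = kwargs[name]
        if not isinstance(value, (bytes, bytearray)):
            raise TypeError("bytes template fields accept only bytes")
        return bytes(value)

    return _FIELD.sub(substitute, template)

# Two-stage use: format and encode the text part first, leaving a
# protected field for the bytes stage.
template = "{} {{}}".format(1).encode("ascii")   # b'1 {}'
message = interpolate_bytes(template, b"\xff\xfe")
```

Because the first stage is ordinary str.format followed by .encode, all text encoding stays in .encode, and the second stage never has to guess an encoding.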
A major issue is what to do with numbers. Sometimes they need to be ascii encoded, sometimes binary encoded. The baseline is to do nothing extra and require all args to be bytes. I think this may be appropriate for floats, as they are seldom specifically used in protocols. I think the same may be true for ints with signs. So I think we mainly need to consider counts (unsigned ints) for possible exceptional processing.

Option 0. As stated, no special number specs.

Option 1. Use a subset of the current int spec to produce ascii encodings; use struct.pack for binary encodings. (How many of the current integer presentation types would be needed?)

Option 2. Use an adaptation of the struct.pack mini-language to produce binary encodings; use encoded str.format for ascii encodings. (The latter might be done as part of a text-to-bytes-template process as indicated above.)

Option 3. Combine options 1 and 2. This might best be done by replacing the omitted 'conversion' field with a 'number-encoding' field, b'!a' or b'!b', to indicate ascii or binary conversion and the corresponding interpretation of the format spec. (In other words, do not try to combine the number-to-text and number-to-binary mini-languages, but add a 'prefix' to specify which is being used.)

--
Terry Jan Reedy
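As a point of reference for the options above, this is how a count can be turned into ascii-encoded or binary-encoded bytes with what already exists (encoded str.format and struct.pack). It is only an illustration; the value and the particular widths and byte orders are arbitrary choices for the example.

```python
import struct

count = 258  # 0x0102, an arbitrary unsigned count

# ascii encoding: format as text with the int spec, then encode.
ascii_decimal = "{:d}".format(count).encode("ascii")
ascii_hex = "{:04x}".format(count).encode("ascii")

# binary encoding: struct.pack with an explicit byte order and size.
binary_be16 = struct.pack(">H", count)   # big-endian unsigned 16-bit
binary_le32 = struct.pack("<I", count)   # little-endian unsigned 32-bit

# A length-prefixed frame assembled by concatenation under Option 0.
payload = b"hello"
frame = struct.pack(">H", len(payload)) + payload
```

The two mini-languages ('04x' versus '>H') describe different things, field layout of text versus width and byte order of binary, which is the reason Option 3 suggests a prefix to say which one applies.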