[Python-ideas] Ideas for improving the struct module

Elizabeth Myers elizabeth at interlinked.me
Fri Jan 20 11:51:35 EST 2017


On 20/01/17 10:47, Elizabeth Myers wrote:
> On 19/01/17 20:54, Cameron Simpson wrote:
>> On 19Jan2017 16:04, Yury Selivanov <yselivanov.ml at gmail.com> wrote:
>>> This is a neat idea, but this will only work for parsing framed
>>> binary protocols.  For example, if your protocol prefixes all packets
>>> with a length field, you can write an efficient read buffer and
>>> use your proposal to decode all of a message's fields in one shot.
>>> Which is good.
>>>
>>> Not all protocols use framing though.  For instance, your proposal
>>> won't help to write Thrift or Postgres protocol parsers.
>>
>> Sure, but a lot of things fit the proposal. Seems a win: both simple and
>> useful.
>>
>>> Overall, I'm not sure that this is worth the hassle.  With proposal:
>>>
>>>   data, = struct.unpack('!H$', buf)
>>>   buf = buf[2+len(data):]
>>>
>>> with the current struct module:
>>>
>>>   len, = struct.unpack('!H', buf)
>>>   data = buf[2:2+len]
>>>   buf = buf[2+len:]
>>>
>>> Another thing: struct.calcsize won't work with structs that use
>>> variable length fields.
>>
>> True, but it would be enough for it to raise an exception of some kind.
>> It won't break any code already in use, and it will prevent accidents
>> for users of the new variable-size formats.
>>
>> We've all got things we wish struct might cover (I have a few, but
>> strangely the top of the list is nonsemantic: I wish it let me put
>> meaningless whitespace inside the format for readability).
>>
>> +1 on the proposal from me.
>>
>> Oh: subject to one proviso: reading a struct will need to return how
>> many bytes of input data were scanned, not merely returning the decoded
>> values.
> 
> This is a little difficult without breaking backwards compatibility,
> but it is not difficult to compute the lengths yourself. That said,
> calcsize could require an extra parameter when given a format string
> with variable-length specifiers in it, e.g.:
> 
>   struct.calcsize("z", (b'test'))
> 
> would return 5 (four bytes of data plus the one-byte zero terminator),
> so you don't have to compute it yourself.
> 
> Also, I filed a bug, and proposed use of Z and z.
>
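
To make the calcsize idea concrete, here is a rough sketch of what a
null-terminated "z" field could compute, emulated with the current struct
module. The helper names (calcsize_z, pack_z, unpack_z) are hypothetical and
not part of struct, and the "z" semantics are only assumed from the proposal:

```python
import struct

def calcsize_z(value: bytes) -> int:
    """Packed size of `value` as a null-terminated field (hypothetical 'z')."""
    return len(value) + 1  # payload plus one terminating zero byte

def pack_z(value: bytes) -> bytes:
    """Pack `value` followed by a single zero byte."""
    if b"\x00" in value:
        raise ValueError("embedded null byte in value")
    # 'Ns' packs exactly N bytes; 'x' emits one zero pad byte as the terminator.
    return struct.pack(f"{len(value)}sx", value)

def unpack_z(buf: bytes, offset: int = 0):
    """Return (value, bytes_consumed) for a null-terminated field at `offset`."""
    end = buf.index(b"\x00", offset)  # raises ValueError if no terminator
    return buf[offset:end], end - offset + 1

size = calcsize_z(b"test")
packed = pack_z(b"test")
value, consumed = unpack_z(packed + b"rest")
```

Returning the consumed byte count from the unpack helper is one way to cover
the "how many bytes were scanned" point without changing what struct.unpack
itself returns.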

Should I write up a PEP about this? I am not sure if it's justified or
not. It's 3 changes (calcsize and two format specifiers), but it might
be useful to codify it.
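
For comparison, this is roughly what the length-prefixed case looks like
today without any new specifiers. It is only a sketch: read_prefixed is a
hypothetical helper, and the '!H' prefix is taken from the example quoted
above. It returns the number of bytes consumed alongside the data:

```python
import struct

_HEADER = struct.Struct("!H")  # 2-byte big-endian unsigned length prefix

def read_prefixed(buf: bytes, offset: int = 0):
    """Read one length-prefixed field; return (data, bytes_consumed)."""
    (length,) = _HEADER.unpack_from(buf, offset)
    start = offset + _HEADER.size
    end = start + length
    if end > len(buf):
        raise ValueError("buffer is shorter than the declared length")
    return buf[start:end], end - offset

buf = _HEADER.pack(5) + b"hello" + b"tail"
data, used = read_prefixed(buf)
buf = buf[used:]  # advance past the field that was just read
```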

