[Python-ideas] Ideas for improving the struct module
Elizabeth Myers
elizabeth at interlinked.me
Fri Jan 20 11:42:28 EST 2017
On 19/01/17 20:40, Cameron Simpson wrote:
> On 19Jan2017 12:08, Elizabeth Myers <elizabeth at interlinked.me> wrote:
>> I also didn't mention that when you are unpacking iteratively (e.g., you
>> have multiple strings), the code becomes a bit more hairy:
>>
>> >>> test_bytes = b'\x00\x05hello\x00\x07goodbye\x00\x04test'
>> >>> offset = 0
>> >>> while offset < len(test_bytes):
>> ...     length = struct.unpack_from('!H', test_bytes, offset)[0]
>> ...     offset += 2
>> ...     string = struct.unpack_from('{}s'.format(length), test_bytes,
>> ...                                 offset)[0]
>> ...     offset += length
>>
>> It actually gets a lot worse when you have to unpack a set of strings in
>> a context-sensitive manner. You have to be sure to update the offset
>> constantly so you can always unpack strings appropriately. Yuck!
>
> Whenever I'm doing iterative stuff like this, either variable length
> binary or lexical stuff, I always end up with a bunch of functions which
> can be called like this:
>
> datalen, offset = get_bs(chunk, offset=offset)
>
> The notable thing here is just that they return the data and the new
> offset, which makes updating the offset impossible to forget, and also
> makes the calling code more succinct, like the internal call to get_bs()
> in this decoder for a length-encoded field:
>
>     def get_bsdata(chunk, offset=0):
>         ''' Fetch a length-prefixed data chunk.
>             Decodes an unsigned value from a bytes at the specified `offset`
>             (default 0), and collects that many following bytes.
>             Return those following bytes and the new offset.
>         '''
>         ##is_bytes(chunk)
>         offset0 = offset
>         datalen, offset = get_bs(chunk, offset=offset)
>         data = chunk[offset:offset+datalen]
>         ##is_bytes(data)
>         if len(data) != datalen:
>             raise ValueError(
>                 "bsdata(chunk, offset=%d): insufficient data:"
>                 " expected %d bytes, got %d bytes"
>                 % (offset0, datalen, len(data)))
>         offset += datalen
>         return data, offset
Gotta be honest, this seems less elegant than just adding something like
what netstruct does to the struct module, and it's also far more verbose.
Perhaps some kind of higher-level module could be built on top of struct at
some point, maybe in the stdlib, maybe not (construct, IMO, is not that
library, given the objections raised previously).
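
For comparison, here is a rough sketch of the offset-returning pattern
applied to the example bytes from my earlier message. It assumes a 2-byte
big-endian length prefix ('!H'), as in that example. The names
unpack_prefixed() and iter_prefixed() are hypothetical helpers made up for
illustration; they are not part of struct or netstruct.

```python
import struct


def unpack_prefixed(data, offset=0):
    """Return (payload, new_offset) for one length-prefixed byte string.

    Assumes a 2-byte big-endian unsigned length prefix ('!H').
    """
    (length,) = struct.unpack_from('!H', data, offset)
    offset += 2
    payload = data[offset:offset + length]
    if len(payload) != length:
        raise ValueError(
            "insufficient data: expected %d bytes, got %d bytes"
            % (length, len(payload)))
    return payload, offset + length


def iter_prefixed(data):
    """Yield every length-prefixed byte string found in data."""
    offset = 0
    while offset < len(data):
        # The helper hands back the new offset, so the caller
        # never has to advance it by hand.
        payload, offset = unpack_prefixed(data, offset)
        yield payload


test_bytes = b'\x00\x05hello\x00\x07goodbye\x00\x04test'
strings = list(iter_prefixed(test_bytes))
```

It works, but every caller still has to carry a helper like this around,
which is the boilerplate I would rather the struct module absorbed.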
>
> Cheers,
> Cameron Simpson <cs at zip.com.au>