[Python-ideas] Ideas for improving the struct module

Elizabeth Myers elizabeth at interlinked.me
Fri Jan 20 18:37:33 EST 2017


On 20/01/17 17:26, Elizabeth Myers wrote:
> On 20/01/17 16:46, Cameron Simpson wrote:
>> On 20Jan2017 14:47, Elizabeth Myers <elizabeth at interlinked.me> wrote:
>>> 1) struct.unpack and struct.unpack_from should remain
>>> backwards-compatible. I don't want to return extra values from it like
>>> (length unpacked, (data...)) for that reason.
>>
>> Fully agree with this.
>>
>>> If the calcsize solution
>>> feels a bit weird (it isn't much less efficient, because strings store
>>> their length with them, so it's constant-time), there could also be new
>>> functions that *do* return the length if you need it. To me though, this
>>> feels like a use case for struct.iter_unpack.
>>
>> Often, maybe, but there are still going to be protocols that the new
>> format doesn't support, where the performant thing to do (in pure
>> Python) is to scan what you can with struct and "hand scan" the special
>> bits with special code. 
>> Consider, for example, a format like MP4/ISO14496, where there's a
>> regular block structure (which is somewhat struct parsable) that can
>> contain embedded arbitrarily weird information. Or the flipside where
>> struct parsable data are embedded in a format not supported by struct.
>>
>> The mixed situation is where you need to know where the parse got up
>> to.  Calling calcsize or its variable size equivalent after a parse
>> seems needlessly repetitive of the parse work.
>>
>> For myself, I would want there to be some kind of call that returned the
>> parse and the length scanned, with the historic interface preserved for
>> the fixed size formats or for users not needing the length.
>>
>>> 2) I want to avoid making a weird incongruity, where only
>>> variable-length strings return the length actually parsed.
>>
>> Fully agree. Arguing for two API calls: the current one and one that
>> also returns the scan length.
>>
>> Cheers,
>> Cameron Simpson <cs at zip.com.au>
> 
> Some of the responses on the bug are discouraging... it mostly seems to
> boil down to people either not wanting to expand the struct module or
> wanting to discourage its use. Everyone is a critic. I didn't know adding two
> format specifiers was going to be this controversial. You'd think I
> proposed adding braces or something :/.
> 
> I'm hesitant to go forward on this until the bug has a resolution.

Also, btw, adding 128-bit length specifiers sounds like a good idea in
theory, but the difficulty stems from the fact that there's no real native
128-bit type that's portable. I don't know much about how Python handles
big ints internally, either, but I could learn.
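
To illustrate the 128-bit point at the Python level: struct has no 128-bit
format code today, but a 128-bit value can be split into two 64-bit words by
hand. This is only a sketch; the helper names pack_u128_le and unpack_u128_le
are hypothetical and not part of struct or of the proposal.

```python
import struct

_MASK64 = 0xFFFFFFFFFFFFFFFF

def pack_u128_le(value):
    """Pack an unsigned 128-bit int as two little-endian 64-bit words.

    Hypothetical helper: struct itself has no 128-bit format code.
    """
    if not 0 <= value < (1 << 128):
        raise ValueError("value out of range for an unsigned 128-bit integer")
    low = value & _MASK64      # least significant 64 bits come first
    high = value >> 64         # most significant 64 bits come second
    return struct.pack("<QQ", low, high)

def unpack_u128_le(data, offset=0):
    """Inverse of pack_u128_le: read 16 bytes starting at offset."""
    low, high = struct.unpack_from("<QQ", data, offset)
    return (high << 64) | low
```

The same bytes come out of int.to_bytes(16, "little"), so Python's
arbitrary-precision int is not the obstacle here; the portability problem
is on the C side.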

I was looking into implementing this already, and it appears it should
be possible by teaching the module that "not all data is fixed length"
and allowing functions to report back (via a Py_ssize_t *) how much data
was actually unpacked/packed. But again, waiting on that bug to have a
resolution before I do anything. I don't want to waste hours of effort
on something the developers ultimately decide they don't want and will
just reject.
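
For reference, the "return the parse and the length scanned" interface
discussed above can be approximated in pure Python with the existing API.
This is a minimal sketch under assumptions: unpack_with_length and
unpack_prefixed_bytes are hypothetical names, and the 16-bit big-endian
length prefix is just one example of a variable-length field that struct
cannot describe today.

```python
import struct

def unpack_with_length(fmt, buffer, offset=0):
    """Return (values, bytes_consumed) for a fixed-size struct format.

    Hypothetical wrapper; for fixed-size formats the consumed length is
    simply the compiled format's size, so no re-parsing is needed.
    """
    s = struct.Struct(fmt)
    return s.unpack_from(buffer, offset), s.size

def unpack_prefixed_bytes(buffer, offset=0):
    """Hand-scan a field made of a 16-bit big-endian length plus payload.

    Returns (payload, bytes_consumed). The field layout is an assumption
    chosen for illustration.
    """
    (length,) = struct.unpack_from(">H", buffer, offset)
    start = offset + 2
    end = start + length
    if end > len(buffer):
        raise struct.error("buffer too short for length-prefixed field")
    return bytes(buffer[start:end]), 2 + length

# Usage: mix struct-parsable data with a hand-scanned field,
# advancing the offset by the reported length each time.
data = struct.pack(">I", 7) + struct.pack(">H", 5) + b"hello" + struct.pack(">B", 9)
offset = 0
(header,), used = unpack_with_length(">I", data, offset)
offset += used
payload, used = unpack_prefixed_bytes(data, offset)
offset += used
(trailer,), used = unpack_with_length(">B", data, offset)
offset += used
```

The existing struct.unpack and struct.unpack_from are left untouched here,
which matches the backwards-compatibility goal in point 1.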

--
Elizabeth

