[Python-ideas] Ideas for improving the struct module

Nick Timkovich prometheus235 at gmail.com
Thu Jan 19 13:41:46 EST 2017


ctypes.Structure is *literally* the interface to the C struct, which, as
Chris mentions, has fixed offsets for all members. I don't think that should
(can?) be altered.
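
To make the fixed-offset point concrete, here is a minimal sketch. The
Record class, its field names, and the 12-byte capacity are made up for
illustration and are not from anyone's code in this thread:

```python
import ctypes

class Record(ctypes.Structure):
    # Hypothetical record: the payload has to be declared with a fixed
    # capacity (12 bytes, chosen only for this example). The layout cannot
    # depend on the runtime value of `length`.
    _fields_ = [
        ("length", ctypes.c_uint32),
        ("payload", ctypes.c_char * 12),
    ]

# Offsets and total size are properties of the class, known before any
# data is seen.
offsets = (Record.length.offset, Record.payload.offset)
size = ctypes.sizeof(Record)

# A shorter payload still occupies the full fixed-size field.
short = Record(5, b"hello")
print(offsets, size, ctypes.sizeof(short), short.payload)
```

The instance holding a 5-byte payload is the same size as the class says
every instance is, which is exactly what a length-prefixed wire format does
not do.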

In file formats (not just network protocols) the string size + variable-length
string motif comes up often, and I find myself frequently re-implementing the
two-line read-an-int + read-'{}s'.format(length)-bytes idiom.
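
For reference, the idiom I keep rewriting looks roughly like the sketch
below. read_prefixed and its prefix_format parameter are names I made up
for this example; only io.BytesIO and the struct calls are real stdlib API:

```python
import io
import struct

def read_prefixed(stream, prefix_format="!I"):
    """Read one length-prefixed byte string from a binary stream.

    Hypothetical helper: prefix_format is a struct format for a single
    integer (default: 4-byte big-endian unsigned).
    """
    prefix_size = struct.calcsize(prefix_format)
    prefix = stream.read(prefix_size)
    if len(prefix) != prefix_size:
        raise EOFError("truncated length prefix")
    (length,) = struct.unpack(prefix_format, prefix)
    payload = stream.read(length)
    if len(payload) != length:
        raise EOFError("truncated payload")
    return payload

# One string with a 4-byte length prefix.
single = read_prefixed(io.BytesIO(b'\x00\x00\x00\x0chello world!'))

# Several strings with 2-byte prefixes; the stream tracks the offset.
multi = io.BytesIO(b'\x00\x05hello\x00\x07goodbye\x00\x04test')
strings = [read_prefixed(multi, "!H") for _ in range(3)]
print(single, strings)
```

Reading from a stream means the offset bookkeeping disappears, but it is
still boilerplate that every project ends up carrying around.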

On Thu, Jan 19, 2017 at 12:17 PM, Joao S. O. Bueno <jsbueno at python.org.br>
wrote:

> I am for upgrading struct to these, if possible.
>
> But besides my +1, I am writing in to remind folks that there is another
> "struct" model in the stdlib:
>
> ctypes.Structure  -
>
> For reading a lot of records with the same structure it is much more handy
> than struct, since it gives one a suitable Python object on instantiation.
>
> However, it also can't handle variable-length fields automatically.
>
> But maybe the improvement could be made on that side, or in another package
> altogether that works more like it than the current "struct".
>
>
>
> On 19 January 2017 at 16:08, Elizabeth Myers <elizabeth at interlinked.me>
> wrote:
> > On 19/01/17 06:47, Elizabeth Myers wrote:
> >> On 19/01/17 05:58, Rhodri James wrote:
> >>> On 19/01/17 08:31, Mark Dickinson wrote:
> >>>> On Thu, Jan 19, 2017 at 1:27 AM, Steven D'Aprano <steve at pearwood.info>
> >>>> wrote:
> >>>>> [...] struct already supports
> >>>>> variable-width formats.
> >>>>
> >>>> Unfortunately, that's not really true: the Pascal strings it supports
> >>>> are in some sense variable length, but are stored in a fixed-width
> >>>> field. The internals of the struct module rely on each field starting
> >>>> at a fixed offset, computable directly from the format string. I don't
> >>>> think variable-length fields would be a good fit for the current
> >>>> design of the struct module.
> >>>>
> >>>> For the OP's use-case, I'd suggest a library that sits on top of the
> >>>> struct module, rather than an expansion to the struct module itself.
> >>>
> >>> Unfortunately as the OP explained, this makes the struct module a poor
> >>> fit for protocol decoding, even as a base layer for something.  It's one
> >>> of the things I use python for quite frequently, and I always end up
> >>> rolling my own and discarding struct entirely.
> >>>
> >>
> >> Yes, for variable-length fields the struct module is worse than useless:
> >> it actually reduces clarity a little. Consider:
> >>
> >>>>> test_bytes = b'\x00\x00\x00\x0chello world!'
> >>
> >> With this, you can do:
> >>
> >>>>> length = int.from_bytes(test_bytes[:4], 'big')
> >>>>> string = test_bytes[4:4 + length]
> >>
> >> or you can do:
> >>
> >>>>> length = struct.unpack_from('!I', test_bytes)[0]
> >>>>> string = struct.unpack_from('{}s'.format(length), test_bytes, 4)[0]
> >>
> >> Which looks more readable without consulting the docs? ;)
> >>
> >> Building anything on top of the struct library like this would lead to
> >> worse-looking code for minimal gains in efficiency. To quote Jamie
> >> Zawinski, it is like building a bookshelf out of mashed potatoes as it
> >> stands.
> >>
> >> If we had an extension similar to netstruct:
> >>
> >>>>> length, string = struct.unpack('!I$', test_bytes)
> >>
> >> MUCH improved readability, and also less verbose. :)
> >
> > I also didn't mention that when you are unpacking iteratively (e.g., you
> > have multiple strings), the code becomes a bit more hairy:
> >
> >>>> test_bytes = b'\x00\x05hello\x00\x07goodbye\x00\x04test'
> >>>> offset = 0
> >>>> while offset < len(test_bytes):
> > ...     length = struct.unpack_from('!H', test_bytes, offset)[0]
> > ...     offset += 2
> > ...     string = struct.unpack_from('{}s'.format(length), test_bytes, offset)[0]
> > ...     offset += length
> >
> > It actually gets a lot worse when you have to unpack a set of strings in
> > a context-sensitive manner. You have to be sure to update the offset
> > constantly so you can always unpack strings appropriately. Yuck!
> >
> > It's worth mentioning that a few years ago, a coworker and I found
> > ourselves needing variable length strings in the context of a binary
> > protocol (DHCP), and wound up abandoning the struct module entirely
> > because it was unsuitable. My co-worker said the same thing I did: "it's
> > like building a bookshelf out of mashed potatoes."
> >
> > I do understand it might require a major rewrite of, or major changes
> > to, the struct module, but in the long run, I think it's worth it
> > (especially because the struct module is not all that big in scope). As
> > it stands, the struct module simply is not suited for protocols where
> > you have variable-length strings, and in my experience, that is the vast
> > majority of modern binary protocols on the Internet.
> >
> > --
> > Elizabeth
> > _______________________________________________
> > Python-ideas mailing list
> > Python-ideas at python.org
> > https://mail.python.org/mailman/listinfo/python-ideas
> > Code of Conduct: http://python.org/psf/codeofconduct/