<div dir="ltr">Construct has radical API changes and should remain apart. It feels to me like a straw-man to introduce a large library to the discussion as justification for it being too-specialized. <div><br></div><div>This proposal to me seems much more modest: add another format character (or two) to the existing set of a dozen or so that will be packed/unpacked just like the others. It also has demonstrable use in various formats/protocols.</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Jan 19, 2017 at 12:50 PM, Nathaniel Smith <span dir="ltr"><<a href="mailto:njs@pobox.com" target="_blank">njs@pobox.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="auto">I haven't had a chance to use it myself yet, but I've heard good things about<div dir="auto"><br><div dir="auto"><a href="https://construct.readthedocs.io/en/latest/" target="_blank">https://construct.readthedocs.<wbr>io/en/latest/</a><br></div><div dir="auto"><br></div><div dir="auto">It's certainly far more comprehensive than struct for this and other problems.</div><div dir="auto"><br></div><div dir="auto">As usual, there's some tension between adding stuff to the stdlib versus using more specialized third-party packages. 
The existence of packages like construct doesn't automatically mean that we should stop improving the stdlib, but OTOH not every useful thing can or should be in the stdlib.</div><div dir="auto"><br></div><div dir="auto">Personally, I find myself parsing uleb128-prefixed strings more often than u4-prefixed strings.</div></div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><div class="gmail_quote">On Jan 19, 2017 10:42 AM, "Nick Timkovich" <<a href="mailto:prometheus235@gmail.com" target="_blank">prometheus235@gmail.com</a>> wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">ctypes.Structure is *literally* the interface to the C struct that as Chris mentions has fixed offsets for all members. I don't think that should (can?) be altered.<div><br></div><div>In file formats (beyond net protocols) the string size + variable length string motif comes up often and I am frequently re-implementing the two-line read-an-int + read-{}.format-bytes.</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Jan 19, 2017 at 12:17 PM, Joao S. O. Bueno <span dir="ltr"><<a href="mailto:jsbueno@python.org.br" target="_blank">jsbueno@python.org.br</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I am for upgrading struct to these, if possible.<br>
<br>
But besides my +1, I am writing in to remind folks that there is another<br>
"struct" model in the stdlib:<br>
<br>
ctypes.Structure -<br>
<br>
For reading a lot of records with the same structure it is much more handy than<br>
struct, since it gives one a suitable Python object on instantiation.<br>
<br>
However, it also can't handle variable-length fields automatically.<br>
<br>
But maybe the improvement could be made on that side, or in another package<br>
altogether that works more like it than the current "struct".<br>
<div class="m_-5475656896304801177m_8581298839046506960HOEnZb"><div class="m_-5475656896304801177m_8581298839046506960h5"><br>
<br>
<br>
On 19 January 2017 at 16:08, Elizabeth Myers <<a href="mailto:elizabeth@interlinked.me" target="_blank">elizabeth@interlinked.me</a>> wrote:<br>
> On 19/01/17 06:47, Elizabeth Myers wrote:<br>
>> On 19/01/17 05:58, Rhodri James wrote:<br>
>>> On 19/01/17 08:31, Mark Dickinson wrote:<br>
>>>> On Thu, Jan 19, 2017 at 1:27 AM, Steven D'Aprano <<a href="mailto:steve@pearwood.info" target="_blank">steve@pearwood.info</a>><br>
>>>> wrote:<br>
>>>>> [...] struct already supports<br>
>>>>> variable-width formats.<br>
>>>><br>
>>>> Unfortunately, that's not really true: the Pascal strings it supports<br>
>>>> are in some sense variable length, but are stored in a fixed-width<br>
>>>> field. The internals of the struct module rely on each field starting<br>
>>>> at a fixed offset, computable directly from the format string. I don't<br>
>>>> think variable-length fields would be a good fit for the current<br>
>>>> design of the struct module.<br>
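A minimal sketch of the fixed-offset behaviour described above, using only the existing struct API. The format string '!I10pH' is an arbitrary example chosen for illustration, not one taken from the thread:<br>
<pre>
```python
import struct

# The total size (and so every field offset) follows from the format alone.
fixed = struct.Struct('!I10pH')
print(fixed.size)  # 4 + 10 + 2 = 16

# 'p' stores a length byte plus the data, padded out to the declared width,
# so a short and a longer string occupy exactly the same number of bytes.
short = fixed.pack(1, b'hi', 2)
longer = fixed.pack(1, b'hello', 2)
print(len(short), len(longer))  # 16 16

# Unpacking strips the padding again.
print(fixed.unpack(short))  # (1, b'hi', 2)
```
</pre>
The string is variable in content only; the field that holds it is always 10 bytes wide, which is why it does not help with length-prefixed wire formats.<br>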
>>>><br>
>>>> For the OP's use case, I'd suggest a library that sits on top of the<br>
>>>> struct module, rather than an expansion to the struct module itself.<br>
>>><br>
>>> Unfortunately as the OP explained, this makes the struct module a poor<br>
>>> fit for protocol decoding, even as a base layer for something. It's one<br>
>>> of the things I use python for quite frequently, and I always end up<br>
>>> rolling my own and discarding struct entirely.<br>
>>><br>
>><br>
>> Yes, for variable-length fields the struct module is worse than useless:<br>
>> it actually reduces clarity a little. Consider:<br>
>><br>
>>>>> test_bytes = b'\x00\x00\x00\x0chello world!'<br>
>><br>
>> With this, you can do:<br>
>><br>
>>>>> length = int.from_bytes(test_bytes[:4], 'big')<br>
>>>>> string = test_bytes[4:4 + length]<br>
>><br>
>> or you can do:<br>
>><br>
>>>>> length = struct.unpack_from('!I', test_bytes)[0]<br>
>>>>> string = struct.unpack_from('{}s'.format(length), test_bytes, 4)[0]<br>
>><br>
>> Which looks more readable without consulting the docs? ;)<br>
>><br>
>> Building anything on top of the struct library like this would lead to<br>
>> worse-looking code for minimal gains in efficiency. To quote Jamie<br>
>> Zawinski, it is like building a bookshelf out of mashed potatoes as it<br>
>> stands.<br>
>><br>
>> If we had an extension similar to netstruct:<br>
>><br>
>>>>> length, string = struct.unpack('!I$', test_bytes)<br>
>><br>
>> MUCH improved readability, and also less verbose. :)<br>
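Until something like the proposed '$' character exists, the same effect for a single field can be approximated with a small wrapper. This is only a sketch: pack_prefixed and unpack_prefixed are hypothetical helper names, not part of struct or netstruct, and the '!I' default simply mirrors the example above:<br>
<pre>
```python
import struct

def pack_prefixed(payload, prefix_fmt='!I'):
    """Return payload preceded by its length (hypothetical helper)."""
    return struct.pack(prefix_fmt, len(payload)) + payload

def unpack_prefixed(data, offset=0, prefix_fmt='!I'):
    """Return (payload, next_offset) for one length-prefixed field."""
    (length,) = struct.unpack_from(prefix_fmt, data, offset)
    start = offset + struct.calcsize(prefix_fmt)
    end = start + length
    payload = data[start:end]
    if len(payload) != length:
        # Slicing never fails, so a short buffer must be checked by hand.
        raise ValueError('buffer ends before the declared length')
    return payload, end

test_bytes = pack_prefixed(b'hello world!')
print(test_bytes)                   # b'\x00\x00\x00\x0chello world!'
print(unpack_prefixed(test_bytes))  # (b'hello world!', 16)
```
</pre>
Returning the next offset alongside the payload keeps the bookkeeping in one place, though the caller still has to thread it through every call.<br>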
><br>
> I also didn't mention that when you are unpacking iteratively (e.g., you<br>
> have multiple strings), the code becomes a bit more hairy:<br>
><br>
>>>> test_bytes = b'\x00\x05hello\x00\x07goodbye\x00\x04test'<br>
>>>> offset = 0<br>
>>>> while offset < len(test_bytes):<br>
> ... length = struct.unpack_from('!H', test_bytes, offset)[0]<br>
> ... offset += 2<br>
> ... string = struct.unpack_from('{}s'.format(length), test_bytes, offset)[0]<br>
> ... offset += length<br>
><br>
> It actually gets a lot worse when you have to unpack a set of strings in<br>
> a context-sensitive manner. You have to be sure to update the offset<br>
> constantly so you can always unpack strings appropriately. Yuck!<br>
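The offset bookkeeping in the loop above can be hidden inside a generator. This is a sketch on top of the existing struct API, not a proposed interface: iter_prefixed is a hypothetical name, and the '!H' prefix matches the test data above:<br>
<pre>
```python
import struct

def iter_prefixed(data, prefix_fmt='!H'):
    """Yield each length-prefixed string found in data (hypothetical helper)."""
    prefix = struct.Struct(prefix_fmt)
    offset = 0
    while offset != len(data):
        # struct.error is raised here if the prefix itself is truncated.
        (length,) = prefix.unpack_from(data, offset)
        offset += prefix.size
        end = offset + length
        chunk = data[offset:end]
        if len(chunk) != length:
            raise ValueError('buffer ends before the declared length')
        yield chunk
        offset = end

test_bytes = b'\x00\x05hello\x00\x07goodbye\x00\x04test'
print(list(iter_prefixed(test_bytes)))  # [b'hello', b'goodbye', b'test']
```
</pre>
This only covers a flat run of strings with one prefix type; the context-sensitive case still needs hand-written offset handling.<br>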
><br>
> It's worth mentioning that a few years ago, a coworker and I found<br>
> ourselves needing variable length strings in the context of a binary<br>
> protocol (DHCP), and wound up abandoning the struct module entirely<br>
> because it was unsuitable. My co-worker said the same thing I did: "it's<br>
> like building a bookshelf out of mashed potatoes."<br>
><br>
> I do understand it might require a possible major rewrite or major<br>
> changes to the struct module, but in the long run, I think it's worth it<br>
> (especially because the struct module is not all that big in scope). As<br>
> it stands, the struct module simply is not suited for protocols where<br>
> you have variable-length strings, and in my experience, that is the vast<br>
> majority of modern binary protocols on the Internet.<br>
><br>
> --<br>
> Elizabeth<br>
> ______________________________<wbr>_________________<br>
> Python-ideas mailing list<br>
> <a href="mailto:Python-ideas@python.org" target="_blank">Python-ideas@python.org</a><br>
> <a href="https://mail.python.org/mailman/listinfo/python-ideas" rel="noreferrer" target="_blank">https://mail.python.org/mailma<wbr>n/listinfo/python-ideas</a><br>
> Code of Conduct: <a href="http://python.org/psf/codeofconduct/" rel="noreferrer" target="_blank">http://python.org/psf/codeofco<wbr>nduct/</a><br>
</div></div></blockquote></div><br></div>
</blockquote></div></div>
</div></div></blockquote></div><br></div>