[Python-ideas] Ideas for improving the struct module
Daniel Spitz
spitz.dan.l at gmail.com
Wed Jan 18 12:08:02 EST 2017
+1 on the idea of supporting variable-length strings with the length
encoded in the preceding packed element!
Several months ago I was trying to write a parser and writer for
PostgreSQL's COPY ... WITH BINARY format. I started out trying to implement
it in pure Python using the struct module. Because the format contains
variable-length strings encoded in precisely the way you mention, it was
not possible to parse an entire row of data without invoking
Python-level logic. This made the implementation infeasibly slow. I
had to switch to Cython to get it done fast enough (implementation is
here: https://github.com/spitz-dan-l/postgres-binary-parser).
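To illustrate the problem, here is a minimal sketch of parsing one row with the struct module as it stands today. The layout is my reading of the PostgreSQL documentation (an int16 field count, then per field an int32 byte length, with -1 meaning NULL, followed by the data, all in network byte order); parse_row is a hypothetical helper, not code from the repository above.

```python
import struct

# Assumed tuple layout in COPY ... WITH BINARY: int16 field count, then for
# each field an int32 byte length (-1 means NULL) followed by that many bytes.
_FIELD_COUNT = struct.Struct("!h")
_FIELD_LEN = struct.Struct("!i")


def parse_row(buf, offset=0):
    """Return (fields, next_offset) for the tuple starting at offset."""
    (nfields,) = _FIELD_COUNT.unpack_from(buf, offset)
    offset += _FIELD_COUNT.size
    fields = []
    for _ in range(nfields):
        # One struct call per field, only to learn how many bytes follow.
        (length,) = _FIELD_LEN.unpack_from(buf, offset)
        offset += _FIELD_LEN.size
        if length == -1:
            fields.append(None)
        else:
            fields.append(bytes(buf[offset:offset + length]))
            offset += length
    return fields, offset


# A three-field row built by hand: b"abc", NULL, b"hi".
row = struct.pack("!hi3sii2s", 3, 3, b"abc", -1, 2, b"hi")
print(parse_row(row))
```

The Python-level loop runs once per field, which is the part a length-aware format specifier could in principle move into C.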
I believe that with this single change ($, or whatever format specifier one
wishes to use), assuming it were implemented efficiently in C, I could have
avoided Cython and gotten satisfactory performance with
the struct module and Python/NumPy's already-performant bytestring
manipulation facilities.
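For contrast, this is roughly what the struct module can already do when every record has a fixed width. It is only a sketch: the record layout (a uint16 id plus an 8-byte name) is made up for illustration. In CPython the iteration happens inside the C implementation of struct, with no per-record parsing code written in Python.

```python
import struct

# Hypothetical fixed-width record: uint16 id followed by an 8-byte name.
record = struct.Struct("!H8s")
data = record.pack(1, b"alice") + record.pack(2, b"bob")

# iter_unpack walks the whole buffer; "8s" fields come back NUL-padded.
rows = list(record.iter_unpack(data))
print(rows)
```

A variable-length specifier would let rows with embedded strings be handled in a similar single call, which is what I was missing.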
-Dan Spitz
On Wed, Jan 18, 2017 at 5:32 AM Elizabeth Myers <elizabeth at interlinked.me>
wrote:
> Hello,
>
> I've noticed a lot of binary protocols require variable length
> bytestrings (with or without a null terminator), but it is not easy to
> unpack these in Python without first reading the desired length, or
> reading bytes until a null terminator is reached.
>
> I've noticed the netstruct library
> (https://github.com/stendec/netstruct) has a format specifier, $, which
> assumes the previous type to pack/unpack is the string's length. This is
> an interesting idea in and of itself, but doesn't handle the null-terminated
> string case. I know $ is similar to Pascal strings, but sometimes you
> need more than 255 characters :p.
>
> For null-terminated strings, it may be simpler to have a specifier for
> those. I propose 0, but this point can be bikeshedded over endlessly if
> desired ;) (I thought about using n/N but they're already taken :P).
>
> It's worth noting that (maybe one of?) Perl's equivalent to the struct
> module, whose name escapes me atm, has a module which can handle this
> case. I can't remember if it handled variable length or zero-terminated
> though; maybe it did both. Perl is more or less my 10th language. :p
>
> This pain point is an annoyance imo, and a fix along these lines would
> greatly simplify a lot of code. I'd be happy to take a look
> at implementing it if the idea is received sufficiently warmly.
>
> --
> Elizabeth
>
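For the null-terminated case raised in the quoted message, the workaround today looks something like the sketch below. unpack_cstring and the message layout (uint16 id, NUL-terminated name, uint32 value) are hypothetical names chosen for illustration, and buf is assumed to be a bytes or bytearray object.

```python
import struct


def unpack_cstring(buf, offset=0):
    """Return (string_bytes, next_offset) for a NUL-terminated string."""
    end = buf.index(b"\x00", offset)  # raises ValueError if no terminator
    return bytes(buf[offset:end]), end + 1


# Hypothetical message: uint16 id, NUL-terminated name, uint32 value.
msg = struct.pack("!H", 7) + b"spam\x00" + struct.pack("!I", 42)

(ident,) = struct.unpack_from("!H", msg, 0)
name, pos = unpack_cstring(msg, 2)
(value,) = struct.unpack_from("!I", msg, pos)
print(ident, name, value)
```

The string splits the message into two separate struct calls, so a single format string cannot describe the whole layout.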