+1 on the idea of supporting variable-length strings with the length encoded in the preceding packed element!

Several months ago I was trying to write a parser and writer for PostgreSQL's COPY ... WITH BINARY format. I started out trying to implement it in pure Python using the struct module. Because the format contains variable-length strings encoded in precisely the way you mention, it was not possible to parse an entire row of data without dropping into Python-level logic for every field. This made the implementation infeasibly slow. I had to switch to Cython to get it done fast enough (implementation is here: https://github.com/spitz-dan-l/postgres-binary-parser).
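To illustrate the per-field loop described above, here is a minimal sketch in pure Python. It assumes the tuple layout from the PostgreSQL documentation (a 16-bit field count, then for each field a 32-bit length followed by that many bytes, with -1 marking NULL). The name parse_tuple is made up for this example and is not taken from the linked repository.

```python
import struct

def parse_tuple(buf, offset=0):
    """Parse one tuple from a COPY BINARY stream (hypothetical helper).

    Returns (fields, new_offset). Each field is bytes, or None for NULL.
    """
    # 16-bit signed field count, network byte order.
    (nfields,) = struct.unpack_from("!h", buf, offset)
    offset += 2
    fields = []
    for _ in range(nfields):
        # Each length must be read before the next field can be located,
        # so struct cannot consume the whole row in one call.
        (length,) = struct.unpack_from("!i", buf, offset)
        offset += 4
        if length == -1:
            fields.append(None)  # NULL field, no data bytes follow
        else:
            fields.append(bytes(buf[offset:offset + length]))
            offset += length
    return fields, offset

# Build a three-field row by hand: b"hello", NULL, b"ok".
row = (
    struct.pack("!h", 3)
    + struct.pack("!i", 5) + b"hello"
    + struct.pack("!i", -1)
    + struct.pack("!i", 2) + b"ok"
)
fields, end = parse_tuple(row)
```

The loop body runs once per field per row in the interpreter, which is where the time goes on large tables.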

I believe that with this single change ($, or whatever format specifier one wishes to use), assuming it were implemented efficiently in C, I could have avoided using Cython and gotten a satisfactory level of performance with the struct module and Python/NumPy's already-performant bytestring manipulation facilities.

-Dan Spitz

On Wed, Jan 18, 2017 at 5:32 AM Elizabeth Myers <elizabeth@interlinked.me> wrote:

Hello,

I've noticed a lot of binary protocols require variable length
bytestrings (with or without a null terminator), but it is not easy to
unpack these in Python without first reading the desired length, or
reading bytes until a null terminator is reached.

I've noticed the netstruct library
(https://github.com/stendec/netstruct) has a format specifier, $, which
assumes the previous type to pack/unpack is the string's length. This is
an interesting idea in and of itself, but doesn't handle the null-terminated
string case. I know $ is similar to Pascal strings, but sometimes you
need more than 255 characters :p.
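
For reference, this is roughly what reading a length-prefixed bytestring looks like with the struct module as it stands today. It is only a sketch: unpack_prefixed is a hypothetical helper, and the 2-byte big-endian length prefix is an assumption chosen to show a string longer than the 255 bytes a Pascal string allows.

```python
import struct

def unpack_prefixed(buf, offset=0, length_fmt="!H"):
    """Read a length prefix, then that many bytes (hypothetical helper).

    Returns (value, new_offset).
    """
    # Step 1: unpack only the length field.
    (length,) = struct.unpack_from(length_fmt, buf, offset)
    start = offset + struct.calcsize(length_fmt)
    end = start + length
    if end > len(buf):
        raise ValueError("buffer too short for declared length")
    # Step 2: slice the payload out by hand.
    return buf[start:end], end

packed = struct.pack("!H", 300) + b"x" * 300 + b"tail"
value, end = unpack_prefixed(packed)
```

A specifier like $ would fold the two steps into a single format string.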

For null-terminated strings, it may be simpler to have a specifier for
those. I propose 0, but this point can be bikeshedded over endlessly if
desired ;) (I thought about using n/N but they're already taken :P).
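
The manual equivalent for the null-terminated case might look like the sketch below. unpack_cstring is a hypothetical name, and it assumes the terminator is a single zero byte.

```python
def unpack_cstring(buf, offset=0):
    """Read bytes up to the next zero byte (hypothetical helper).

    Returns (value, new_offset), where new_offset skips the terminator.
    bytes.index raises ValueError if no terminator is found.
    """
    end = buf.index(b"\x00", offset)
    return buf[offset:end], end + 1

data = b"foo\x00bar\x00"
first, pos = unpack_cstring(data)
second, pos2 = unpack_cstring(data, pos)
```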

It's worth noting that (maybe one of?) Perl's equivalent to the struct
module, whose name escapes me atm, has a module which can handle this
case. I can't remember if it handled variable length or zero-terminated
though; maybe it did both. Perl is more or less my 10th language. :p

This pain point is an annoyance imo, and addressing it with something
like the above would greatly simplify a lot of code. I'd be happy to take
a look at implementing it if the idea is received sufficiently warmly.

--
Elizabeth