[Python-Dev] proposed struct module format code addition

Tim Peters tim.peters at gmail.com
Sun Oct 3 20:14:35 CEST 2004

[Josiah Carlson]
> ...
>    The 'p' format specifier is also not a standard C type, and yet it
> is included in struct, specifically because it is useful.

I never used it <wink>.
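For readers who haven't met it, 'p' packs a "Pascal string". That is one leading count byte, then the data, padded with NULs to the declared field width. Below is a minimal illustration in present-day Python 3, which uses bytes objects where the 2004 code used str:

```python
import struct

# '5p' reserves 5 bytes in total: one length byte plus up to 4 data bytes,
# padded with NULs.
packed = struct.pack('5p', b'abc')
print(packed)      # b'\x03abc\x00'

# Unpacking reads the length byte and returns only that many data bytes.
(text,) = struct.unpack('5p', packed)
print(text)        # b'abc'
```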

> Argument:
>    You can already do the same thing with:
>     pickle.encode_long(long_int)
>     pickle.decode_long(packed_long)

That isn't an argument I've seen before, although I pointed out these
pickle functions in the tracker item.  The argument there is that
pickle demonstrates actual long<->bytes use cases in the core,
implemented painfully in Python, and that the struct patch wouldn't
make them easier.  The argument was not that you can already use the
pickle functions, it was that the pickle functions shouldn't need to
exist -- they're isolated hacks that address only one part of the
whole problem, and even that part is addressed painfully now.  The
workalike implementation in cPickle.c was actually easier than the
Python implementation in pickle.py, because the former gets to use the
flexible C API functions I added for "this kind of thing".
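The pickle helpers use a little-endian two's-complement byte format with as few bytes as possible, and they encode 0 as an empty string. The sketch below shows how small that job becomes once a flexible long<->bytes primitive exists. It is a re-creation in present-day Python 3 built on `int.to_bytes` and `int.from_bytes`, which were added in Python 3.2, long after this thread. The function names are hypothetical, and this is not pickle's own code.

```python
def pickle_style_encode(x: int) -> bytes:
    """Little-endian two's complement, minimal width; 0 -> b''."""
    if x == 0:
        return b''
    # Bits for the magnitude (use ~x for negatives) plus one sign bit.
    bits = (x if x >= 0 else ~x).bit_length() + 1
    nbytes = (bits + 7) // 8
    return x.to_bytes(nbytes, 'little', signed=True)


def pickle_style_decode(data: bytes) -> int:
    """Inverse of pickle_style_encode; b'' decodes to 0."""
    return int.from_bytes(data, 'little', signed=True)


print(pickle_style_encode(255))    # b'\xff\x00'  (extra byte for the sign)
print(pickle_style_encode(-128))   # b'\x80'
```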

> and some likely soon-to-be included additions to the binascii module.

The pickle functions *may* argue in favor of binascii functions, if
those were actually specified.  I'm not sure Raymond has ever spelled
out what their signatures would be, so I don't know.  If suitable
functions are added to binascii, then that (as Guido said) raises the
bar for adding similar functionality to struct too, but does not (as
Guido also said) preclude adding similar functionality to struct.

> My Response:
>    That is not the same.  Nontrivial problems require multiple passes
> over your data with multiple calls.  A simple:
>     struct.unpack('H3G3G', st)

My problem with that use case is that I've never had it, and have
never seen an app that had it.  Since I've been around for decades,
that triggers a suspicion that it's not a common need.
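The exact semantics of the proposed code aren't spelled out here, so this example rests on two assumptions. It assumes that `nG` means an unsigned integer stored in n bytes, and that the record is packed little-endian with no padding. Under those assumptions, the sketch shows in present-day Python 3 the "multiple calls" version that `struct.unpack('H3G3G', st)` would replace. `unpack_h3g3g` is a hypothetical helper.

```python
import struct

def unpack_h3g3g(st: bytes):
    """Emulate the proposed struct.unpack('H3G3G', st).

    ASSUMPTION: '3G' is a 3-byte unsigned integer; little-endian, no padding.
    """
    if len(st) != 8:
        raise ValueError("expected 8 bytes: 2 + 3 + 3")
    (h,) = struct.unpack_from('<H', st, 0)   # 2-byte unsigned short
    a = int.from_bytes(st[2:5], 'little')    # first 3-byte field
    b = int.from_bytes(st[5:8], 'little')    # second 3-byte field
    return h, a, b
```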


> Argument:
>    The struct module has a steep learning curve already, and this new
> format specifier doesn't help it.

That's another argument I haven't seen before, but it bears a
hallucinatory resemblance to one I made.  People have wondered how to
convert between long and bytestring in Python for years, and prior to
this iteration, they have always asked whether there's "a function" to
do it.  Like all the use cases I ever had, they have one long to
convert, or one bytestring, at a time.  "Ah, see the functions in
binascii" would be a direct answer to their question, and one that
wouldn't require that they climb any part of struct's learning curve. 
IOW, if it *could* be done without struct, I'm sure that would make
life easier for most people who ever asked about it.  For people who
already know struct, the marginal learning burden of adding another
format code is clearly minor.

>    I can't see how a new format specifier would necessarily make the
> learning curve any more difficult,

Neither can I, for people who already know struct.

> if it was even difficult in the first place.

It is difficult.  There are many format codes, they don't all work the
same way, and there are distinctions among:

- native, standard, and forced endianness
- native and standard (== no) alignment
- native and standard sizes for some types

Newbie confusion about how to use struct is common on c.l.py, and is
especially acute among those who don't know C (just *try* to read the
struct docs from the POV of someone who hasn't used C).
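The byte-order, alignment and size distinctions listed above are easy to observe directly. In this Python 3 sketch, `layout_report` is a hypothetical helper. The native ('@') results depend on the platform, so only the standard-mode values are fixed.

```python
import struct

def layout_report():
    """Show how the same format codes change size/layout across modes."""
    return {
        # '<' and '>' force byte order and use standard sizes, no alignment.
        'little_H': struct.pack('<H', 1),
        'big_H': struct.pack('>H', 1),
        'std_bi': struct.calcsize('<bi'),         # 1 + 4, no padding
        # '=' is native byte order but standard sizes and no alignment.
        'native_order_bi': struct.calcsize('=bi'),
        # '@' (the default) uses native sizes *and* native alignment, so the
        # int is usually padded to a 4-byte boundary: platform-dependent.
        'native_bi': struct.calcsize('@bi'),
        # Standard 'l' is always 4 bytes; native 'l' is the C long (4 or 8).
        'std_l': struct.calcsize('<l'),
        'native_l': struct.calcsize('@l'),
    }

for key, value in layout_report().items():
    print(key, value)
```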

> ...
> If you believe this functionality is useful, or even if you think that I
> am full of it, tell us: http://python.org/sf/1023290

I certainly would like to see more people say they'd *use* the g and G
codes in struct even if "one shot" functions in binascii were available.

I'd also like to see a specific design for binascii functions.  I
don't think "simple" would be an accurate description of those, if
they're flexible enough to handle the common use cases I know about. 
They'd be more like

    long2bytes(n, length=None, unsigned=False, msdfirst=False)

where, by default (length is None), long2bytes(n) is the same as
pickle.encode_long(n), except that long2bytes(0) would produce '\0'
instead of an empty string.  Specifying length <= 0 is a ValueError. 
length > 0 would be akin to the C API

    _PyLong_AsByteArray(n, buf, length, not msdfirst, not unsigned)

ValueError if n < 0 and unsigned=True.  OverflowError if length > 0
and n's magnitude is too large to fit in length bytes.
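Below is one possible implementation of the `long2bytes` spec above, written in present-day Python 3 on top of `int.to_bytes`. `long2bytes` is the hypothetical name from the design above, not an existing binascii function. The spec doesn't say how `length=None` should combine with `unsigned=True` or `msdfirst=True`. The sketch assumes the minimal byte count in the requested byte order.

```python
def long2bytes(n, length=None, unsigned=False, msdfirst=False):
    """Sketch of the proposed binascii-style long2bytes (hypothetical API)."""
    if unsigned and n < 0:
        raise ValueError("negative value with unsigned=True")
    if length is None:
        if unsigned:
            # ASSUMPTION: fewest bytes holding the magnitude, at least one.
            length = max(1, (n.bit_length() + 7) // 8)
        else:
            # Magnitude bits plus a sign bit: matches pickle's encoding,
            # except that 0 becomes b'\x00' rather than b''.
            bits = (n if n >= 0 else ~n).bit_length() + 1
            length = (bits + 7) // 8
    elif length <= 0:
        raise ValueError("length must be positive")
    order = 'big' if msdfirst else 'little'
    # int.to_bytes raises OverflowError if n doesn't fit in `length` bytes.
    return n.to_bytes(length, order, signed=not unsigned)
```

For a fixed length, this lines up with the `_PyLong_AsByteArray` call above. `msdfirst` picks the byte order, and `unsigned` turns off two's complement.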

    bytes2long(bytes, unsigned=False, msdfirst=False)

would be akin to the C API

    _PyLong_FromByteArray(bytes, len(bytes), not msdfirst, not unsigned)

except that bytes=="" would raise ValueError instead of returning 0.
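Here is a matching sketch of the proposed `bytes2long` in present-day Python 3. It uses `int.from_bytes` where the C version would call `_PyLong_FromByteArray`. The name and signature come from the hypothetical design above, not from an existing binascii function. The argument is renamed `data` so it doesn't shadow Python 3's builtin `bytes`.

```python
def bytes2long(data, unsigned=False, msdfirst=False):
    """Sketch of the proposed bytes2long (hypothetical API)."""
    if not data:
        # Per the proposal: an empty string is an error, not 0.
        raise ValueError("empty byte string")
    order = 'big' if msdfirst else 'little'
    return int.from_bytes(data, order, signed=not unsigned)


print(bytes2long(b'\xff'))                  # -1
print(bytes2long(b'\xff', unsigned=True))   # 255
```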
