[Python-Dev] Struct schizophrenia

Tim Peters tim.one@home.com
Sat, 9 Jun 2001 21:10:53 -0400


I'm adding "long long" integral types to struct (in native mode, "long long"
or __int64 on platforms that have them; in standard mode, 64 bits).

This is proving harder than it should be, because the code that's already
there is schizophrenic across mode and type boundaries (native vs. standard,
int vs. long), so it fails as a base to build on (it raises more questions
than it answers).  Like:

>>> x = 256
>>> struct.pack("b", x)  # complains about magnitude in native mode
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
struct.error: byte format requires -128<=number<=127

>>> struct.pack("=b", x) # but doesn't with native order + std align
'\x00'

>>> struct.pack("<b", x) # or when little-endian (ditto big-endian)
'\x00'

>>> struct.pack("<b", 256L)  # too-long *longs* also OK in little-endian
'\x00'

>>> struct.pack("<b", 256256256256L) # but not if they're "too too-long"
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
OverflowError: long int too large to convert
>>>

Much the same is true of other small int sizes:  you can't predict what will
happen without trying it; and once you get to C ints ("i"), no range-checking
is performed even in native mode.
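
One way to see the inconsistency at a glance is to try the same value under
every prefix and record what comes back.  This is only a rough sketch; the
helper name `survey` is made up for illustration, and what it reports depends
entirely on the interpreter version you run it under:

```python
import struct

def survey(code, value):
    """Pack `value` with typecode `code` under each byte-order prefix.

    Returns a dict mapping the full format string to either the packed
    bytes or the name of the exception class that was raised.
    """
    results = {}
    for prefix in ("", "=", "<", ">"):   # native, native order + std align, LE, BE
        fmt = prefix + code
        try:
            results[fmt] = struct.pack(fmt, value)
        except (struct.error, OverflowError) as exc:
            results[fmt] = exc.__class__.__name__
    return results

if __name__ == "__main__":
    for fmt, outcome in sorted(survey("b", 256).items()):
        print(repr(fmt), "->", repr(outcome))
```

If all four rows don't agree, that's the problem described above.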

Surely this can't stand, but what do people *want*?  My preference is to
raise the same "byte format requires -128<=number<=127" exception in all
these cases; OTOH, the code structure fights that, working with Python longs
is clumsy in C, and there are other "undocumented features" here that may or
may not be accidents:

>>> struct.pack("B", 234.3)
'\xea'
>>>

That is, did we *intend* to accept floats packed via integer typecodes?
Feature or bug?
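
For concreteness, the "same exception in all these cases" behavior could be
sketched in pure Python along these lines.  This is not how the C code is
structured; `checked_pack` and the `_RANGES` table are hypothetical names, the
message wording is only modeled on the existing "byte format" message, and the
sketch deliberately takes no position on the float question:

```python
import struct

# Hypothetical bounds table for the small standard-size integer codes.
_RANGES = {
    "b": (-128, 127),
    "B": (0, 255),
    "h": (-32768, 32767),
    "H": (0, 65535),
}

def checked_pack(code, value, prefix=""):
    """Range-check `value` the same way regardless of prefix, then pack it."""
    lo, hi = _RANGES[code]
    if not lo <= value <= hi:
        # Illustrative wording only, not the module's actual message.
        raise struct.error("'%s' format requires %d<=number<=%d" % (code, lo, hi))
    return struct.pack(prefix + code, value)

if __name__ == "__main__":
    for prefix in ("", "=", "<", ">"):
        try:
            checked_pack("b", 256, prefix)
        except struct.error as exc:
            print(repr(prefix + "b"), "->", exc)
```

The point of the sketch is just that the check happens before the mode-specific
packing code is reached, so native, "=", "<" and ">" can't disagree.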

In the other (unpack) direction, the docs say for 'I' (unsigned int):

    The "I" conversion code will convert to a Python long if the
    C int is the same size as a C long, which is typical on most
    modern systems. If a C int is smaller than a C long, an Python
    integer will be created instead.

That's in a footnote.  In another part, they say:

    For the "I" and "L" format characters, the return value is a
    Python long integer.

The footnote doesn't describe what the code does -- but is the footnote's
behavior what was intended (somebody went to a fair bit of work to write all
the stuff <wink>)?
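
The condition the footnote talks about (C int vs. C long size) can at least
be checked from Python with struct.calcsize.  A small sketch, where the
function name `describe_unsigned_int` is invented for illustration and the
reported type will vary by platform and interpreter version:

```python
import struct

def describe_unsigned_int(value=65535):
    """Report native C int/long sizes and what unpacking "I" hands back."""
    int_size = struct.calcsize("i")    # native sizeof(int)
    long_size = struct.calcsize("l")   # native sizeof(long)
    unpacked = struct.unpack("I", struct.pack("I", value))[0]
    return int_size, long_size, unpacked, type(unpacked).__name__

if __name__ == "__main__":
    int_size, long_size, unpacked, type_name = describe_unsigned_int()
    print("sizeof(int)  =", int_size)
    print("sizeof(long) =", long_size)
    print("unpack('I')  ->", unpacked, "of type", type_name)
```

Comparing the reported type on a box where int and long differ in size against
one where they don't would show which of the two doc passages the code follows.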