[Tutor] Struct headspinner

Thu Oct 13 00:23:57 CEST 2005

On Wed, 12 Oct 2005, Liam Clarke wrote:

> Erm, can someone please aid me? I'm using Windows XP, haven't tested
> this code on Linux yet, but, well watch this...
>
> '<' indicates little-endian, @ indicates native. i is an integer, q is a long.
>
> >>> struct.calcsize('<3i')
> 12
> >>> struct.calcsize('@3i')
> 12
> >>> struct.calcsize('<3iq')
> 20
> >>> struct.calcsize('@3iq')
> 24
> >>> struct.calcsize('@4iq')
> 24
>
> Is this a feature I don't understand? Is a long preceded by 3 integers
> really 12 bytes long? Surely Microsoft wouldn't do that?

Hi Liam,

What you're seeing really has little to do with Microsoft: it has to do
with the "alignment" of data structures against your computer's hardware
and the underlying C compiler for your system.

Most computer architectures will try to make sure that integers and words
in memory are always "aligned" on some predefined boundary.  It's easier
on the decoder hardware if it knows exactly where those primitive data
types can start.

This is mentioned in the comment in the 'struct' documentation:

"""By default, C numbers are represented in the machine's native format
and byte order, and properly aligned by skipping pad bytes if necessary
(according to the rules used by the C compiler)."""
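To tie that back to your numbers, here is a small sketch (written for a
current Python 3; the FORMAT and sizes names are just mine) that asks for the
size of your '3iq' format under each prefix.  'q' is a C "long long", which
takes 8 bytes.  Only native mode, '@', applies the C compiler's alignment
rules; the other prefixes use standard sizes with no padding at all.

```python
import struct

# Liam's format: three ints followed by a 'q' (a C "long long").
FORMAT = "3iq"

# '@' is native mode (the default): native sizes plus alignment padding.
# '=', '<', '>' and '!' use standard sizes and never insert padding.
sizes = {prefix: struct.calcsize(prefix + FORMAT) for prefix in "@=<>!"}

for prefix, size in sizes.items():
    print(repr(prefix), size)

# Assuming the native int and long long are 4 and 8 bytes (true on the
# usual desktop platforms), any difference here is alignment padding.
padding = sizes["@"] - sizes["<"]
print("padding bytes in native mode:", padding)
```

On your machine the native size came out as 24: 12 bytes of integers, 4 pad
bytes so that the 'q' starts on a multiple of 8, then the 8 bytes of the 'q'
itself.  The exact native figure depends on the platform and compiler.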

If the bytes that represent an integer are misaligned ("out of frame"),
then the hardware might be slower about decoding the value in the good
case, and might not be able to decode it at all in the bad case.  *grin*

'struct' will introduce "padding" bytes to make sure things are framed up
nicely.  And we can see this padding in action: a single integer takes up
four bytes:

######
>>> struct.calcsize("i")
4
######

but if we precede that with a single-byte character:

######
>>> struct.calcsize("ci")
8
######

rather than see that this takes five bytes, we see that it takes eight!
The 'struct' module has transparently added three "padding" bytes to make
sure the integer aligns in a way that's compatible with the underlying
computer hardware.
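If you'd like to see the pad bytes themselves, here is a rough sketch that
packs a character and an integer and then slices the result apart.  It
assumes Python 3, where the 'c' code takes a bytes object of length 1; the
packed, int_size and pad names are only for illustration.  As far as I know,
CPython fills the padding with zero bytes.

```python
import struct

# Python 3 wants a bytes object of length 1 for the 'c' code.
packed = struct.pack("ci", b"x", 1)

int_size = struct.calcsize("i")
pad = len(packed) - 1 - int_size   # bytes between the char and the int

print(len(packed), "bytes total,", pad, "of them padding")
print(packed[:1])                  # the character
print(packed[1:1 + pad])           # the pad bytes
print(packed[1 + pad:])            # the integer, in native byte order

# Unpacking skips over the padding again.
print(struct.unpack("ci", packed))
```

The integer's bytes appear in whatever byte order your machine uses, so that
last slice will look different on little-endian and big-endian hardware.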

Order matters.  Here's another example, this time an integer followed by a
character:

######
>>> struct.calcsize("ic")
5
######
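One caveat about that 5: a C compiler would usually round the equivalent
struct up to a multiple of the integer's alignment, so that arrays of it
stay aligned.  'struct' only does that when asked.  The documentation
describes ending the format with a zero-count code for this purpose, and a
small sketch of the trick (the plain and padded names are mine) looks like:

```python
import struct

plain = struct.calcsize("ic")      # no trailing padding
padded = struct.calcsize("ic0i")   # '0i': align the end as if an int followed

print(plain, padded)

# The '0i' consumes no value, so both formats pack the same two arguments;
# only the length of the result changes.
print(len(struct.pack("ic", 1, b"x")), len(struct.pack("ic0i", 1, b"x")))
```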

Nothing follows that character, so no padding is necessary.  But again, if
we put an integer after that, then we see padding in action:

######
>>> struct.calcsize("ici")
12
>>> struct.calcsize("icci")
12
>>> struct.calcsize("iccci")
12
>>> struct.calcsize("icccci")
12
######

All of this is meant to guarantee that the second integer sits at a memory
location that the computer hardware can read efficiently.
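The rule behind those numbers is simple enough to imitate by hand: before
placing each field, round the current position up to a multiple of that
field's alignment.  The sketch below is my own approximation, not how
'struct' is actually implemented.  The alignment() and layout() helpers are
hypothetical, and layout() only understands formats made of single-character
codes with no prefix and no repeat counts.

```python
import struct

def alignment(code):
    # Hypothetical helper: infer a type's alignment by asking 'struct' how
    # far it pushes that type when a single char comes first.
    return struct.calcsize("c" + code) - struct.calcsize(code)

def layout(fmt):
    # Walk a format of single-character codes (no repeat counts, no prefix)
    # and return (offset of each field, total size).
    offsets = []
    position = 0
    for code in fmt:
        align = alignment(code)
        position += -position % align   # round up to a multiple of align
        offsets.append(position)
        position += struct.calcsize(code)
    return offsets, position

# Print our hand-computed total next to what 'struct' reports.
for fmt in ("ci", "ic", "ici", "icci", "iccci", "icccci", "iiiq"):
    offsets, total = layout(fmt)
    print(fmt, offsets, total, struct.calcsize(fmt))
```

The last two columns should agree for every format, and the offsets show
where each field lands.  The final entry, 'iiiq', is Liam's '3iq' written out
longhand.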

Does this make sense?  Please feel free to ask more questions on this;
it's a bit of a low-level topic.