[Python-Dev] an idea for improving struct.unpack api

Thu Jan 6 08:33:39 CET 2005

[Ilya Sandler]
> A problem:
> 
> The current struct.unpack API works well for unpacking C structures where
> everything is usually unpacked at once, but it
> becomes inconvenient when unpacking binary files where things
> often have to be unpacked field by field. Then one has to keep track
> of offsets, slice the strings, call struct.calcsize(), etc...

Yes.  That bites.

> E.g., with the current API, unpacking a record that consists of a
> header followed by a variable number of items would go like this:
> 
>  from struct import unpack, calcsize
> 
>  hdr_fmt="iiii"
>  item_fmt="IIII"
>  item_size=calcsize(item_fmt)
>  hdr_size=calcsize(hdr_fmt)
>  hdr=unpack(hdr_fmt, rec[0:hdr_size]) #rec is the record to unpack
>  offset=hdr_size
>  for i in range(hdr[0]): #assume 1st field of header is a counter
>    item=unpack(item_fmt, rec[offset:offset+item_size])
>    offset+=item_size
> 
> which is quite inconvenient...
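To make the bookkeeping concrete, here is a self-contained Python 3 version of the loop above. The sample record, the `<` (little-endian, standard-size) prefix on both formats, and the `read_items` helper are assumptions added for illustration; unlike the fragment, it also keeps the unpacked items.

```python
import struct

# Assumed sample layout: a 4-int header whose first field is the item count,
# followed by that many 4-uint items. '<' fixes byte order and field sizes.
hdr_fmt = "<iiii"
item_fmt = "<IIII"
rec = (struct.pack(hdr_fmt, 2, 0, 0, 0)
       + struct.pack(item_fmt, 1, 2, 3, 4)
       + struct.pack(item_fmt, 5, 6, 7, 8))


def read_items(rec):
    """Unpack header and items with the current API: manual offsets and slicing."""
    hdr_size = struct.calcsize(hdr_fmt)
    item_size = struct.calcsize(item_fmt)
    hdr = struct.unpack(hdr_fmt, rec[0:hdr_size])
    offset = hdr_size
    items = []
    for _ in range(hdr[0]):  # first header field is the item count
        items.append(struct.unpack(item_fmt, rec[offset:offset + item_size]))
        offset += item_size
    return hdr, items


hdr, items = read_items(rec)
print(hdr, items)  # (2, 0, 0, 0) [(1, 2, 3, 4), (5, 6, 7, 8)]
```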
> 
> 
> A solution:
> 
> We could have an optional offset argument for
> 
> unpack(format, buffer, offset=None)
> 
> The offset argument is an object which contains a single integer field
> that unpack() increments to point to the next unread byte.
> 
> So with the new API the above code could be written as:
> 
>  offset=struct.Offset(0)
>  hdr=unpack("iiii", rec, offset)
>  for i in range(hdr[0]):
>     item=unpack("IIII", rec, offset)
> 
> When an offset argument is provided, unpack() should allow some bytes to
> be left unpacked at the end of the buffer.
> 
> 
> Does this suggestion make sense? Any better ideas?
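A minimal sketch of the proposed semantics, built on `struct.unpack_from()` (available since Python 2.5), which reads at a given offset but does not advance it. `Offset` and this `unpack()` wrapper are hypothetical; the standard library has no `struct.Offset`.

```python
import struct


class Offset:
    """Hypothetical mutable cursor standing in for the proposed struct.Offset."""

    def __init__(self, pos=0):
        self.pos = pos


def unpack(fmt, buffer, offset=None):
    """Hypothetical unpack() with the proposed optional offset argument.

    Without an offset it behaves like struct.unpack (the buffer must be exactly
    calcsize(fmt) bytes). With one, it reads at offset.pos, advances the cursor
    past the consumed bytes, and allows unread bytes at the end of the buffer.
    """
    if offset is None:
        return struct.unpack(fmt, buffer)
    values = struct.unpack_from(fmt, buffer, offset.pos)
    offset.pos += struct.calcsize(fmt)
    return values


# Usage: header, two items, then trailing bytes that are never unpacked.
rec = (struct.pack("<iiii", 2, 0, 0, 0)
       + struct.pack("<IIII", 1, 2, 3, 4)
       + struct.pack("<IIII", 5, 6, 7, 8)
       + b"trailing")
offset = Offset(0)
hdr = unpack("<iiii", rec, offset)
items = [unpack("<IIII", rec, offset) for _ in range(hdr[0])]
```

The cursor has to be a separate mutable object because a plain int passed as an argument cannot be updated in place by the callee.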

Rather than alter struct.unpack(), I suggest making a separate class
that tracks the offset and encapsulates some of the logic that typically
surrounds unpacking:

    r = StructReader(rec)
    hdr = r('iiii')
    for item in r.getgroups('IIII', times=hdr[0]):
       . . .
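One way the suggested `StructReader` might look, as a sketch. The class, its `__call__` and `getgroups()` methods, and the choice to read from an in-memory bytes object rather than a file are assumptions based only on the usage shown above.

```python
import struct


class StructReader:
    """Hypothetical reader that tracks its own offset into a bytes buffer."""

    def __init__(self, data):
        self.data = data
        self.pos = 0

    def __call__(self, fmt):
        """Unpack one group at the current position and advance past it."""
        values = struct.unpack_from(fmt, self.data, self.pos)
        self.pos += struct.calcsize(fmt)
        return values

    def getgroups(self, fmt, times):
        """Yield `times` consecutive groups of `fmt`, advancing after each."""
        for _ in range(times):
            yield self(fmt)


# Usage with an assumed sample record: 4-int header (count first), then items.
rec = (struct.pack("<iiii", 2, 0, 0, 0)
       + struct.pack("<IIII", 1, 2, 3, 4)
       + struct.pack("<IIII", 5, 6, 7, 8))
r = StructReader(rec)
hdr = r("<iiii")
items = list(r.getgroups("<IIII", times=hdr[0]))
```

Because `getgroups()` is a generator here, the reader's position advances only as the caller consumes items; returning a list instead would make the position independent of how far the loop runs.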

It would be especially nice if it handled the more complex case where
the next offset is determined in part by the data being read (see the
example in section 11.3 of the tutorial):

    r = StructReader(open('myfile.zip', 'rb'))
    for i in range(3):                  # show the first 3 file headers
        fields = r.getgroup('LLLHH', offset=14)
        crc32, comp_size, uncomp_size, filenamesize, extra_size = fields
        filename = r.getgroup('c', times=filenamesize)
        extra = r.getgroup('c', times=extra_size)
        r.advance(comp_size)
        print filename, hex(crc32), comp_size, uncomp_size
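A sketch of the ZIP walk using a hypothetical `getgroup(fmt, offset=0, times=1)` that skips `offset` bytes, then unpacks `fmt` `times` times and returns the values as one flat tuple. Because it advances past what it reads, no extra skip is needed before the filename. It uses `'<IIIHH'` rather than `'LLLHH'`, since native `L` is 8 bytes on many 64-bit platforms while these ZIP header fields are 4 bytes. The in-memory archive built by the hypothetical `local_header()` helper stands in for `myfile.zip`.

```python
import struct
import zlib


class StructReader:
    """Hypothetical offset-tracking reader over a bytes buffer."""

    def __init__(self, data):
        self.data = data
        self.pos = 0

    def advance(self, nbytes):
        """Skip nbytes without unpacking them."""
        self.pos += nbytes

    def getgroup(self, fmt, offset=0, times=1):
        """Skip `offset` bytes, then unpack `fmt` `times` times; return one flat tuple."""
        self.advance(offset)
        values = ()
        for _ in range(times):
            values += struct.unpack_from(fmt, self.data, self.pos)
            self.pos += struct.calcsize(fmt)
        return values


def local_header(name, payload):
    """Build a stored (uncompressed) ZIP local file header followed by its data."""
    # signature, version needed, flags, method (0 = stored), mod time, mod date
    prefix = struct.pack("<4sHHHHH", b"PK\x03\x04", 20, 0, 0, 0, 0)  # 14 bytes
    fixed = struct.pack("<IIIHH", zlib.crc32(payload), len(payload),
                        len(payload), len(name), 0)                   # 16 bytes
    return prefix + fixed + name + payload


def list_headers(data, count):
    r = StructReader(data)
    out = []
    for _ in range(count):
        fields = r.getgroup("<IIIHH", offset=14)  # skip signature..mod date
        crc32, comp_size, uncomp_size, filenamesize, extra_size = fields
        filename = b"".join(r.getgroup("c", times=filenamesize))
        r.advance(extra_size)  # skip the extra field
        r.advance(comp_size)   # skip the file data to reach the next header
        out.append((filename, hex(crc32), comp_size, uncomp_size))
    return out


data = local_header(b"a.txt", b"hello") + local_header(b"b.txt", b"zip!")
for entry in list_headers(data, 2):
    print(*entry)
```

Like the tutorial example it is modelled on, this relies on the sizes stored in each local file header. Archives written with a trailing data descriptor (general-purpose flag bit 3) may store zeros there, so a robust reader would take the sizes from the central directory instead.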

If you come up with something, I suggest posting it as an ASPN recipe
and then announcing it on comp.lang.python.  That ought to generate some
good feedback based on other people's real world issues with
struct.unpack().

Raymond Hettinger