[Python-Dev] an idea for improving struct.unpack api

Ilya Sandler ilya at bluefir.net
Sun Jan 9 21:19:42 CET 2005


> (a) A higher-level API can and should be constructed which acts like a
> (binary) stream but has additional methods for reading and writing
> values using struct format codes (or, preferably, somewhat
> higher-level type names, as suggested). Instances of this API should
> be constructable from a stream or from a "buffer" (e.g. a string).


Ok, I think it's getting much bigger than what I was initially aiming for
;-)...

One more comment, though, regarding unpack_at:

> Then the definition would be:
>
> def unpack_at(fmt, buf, pos):
>     size = calcsize(fmt)
>     end = pos + size
>     data = buf[pos:end]
>     if len(data) < size:
>         raise struct.error("not enough data for format")
>     ret = unpack(fmt, data)
>     ret = ret + (end,)
>     return ret

While I see the usefulness of this, I think it's too limited, e.g.
  result=list(unpack_at(fmt, buf, offset))  # a tuple has no pop()
  offset=result.pop()
feels quite unnatural...
So my feeling is that adding this new API is not worth the trouble,
especially if there are plans for anything higher level...
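
To make the awkwardness concrete, here is a minimal runnable sketch of the
quoted unpack_at() and of splitting the offset off its result. The sample
buffer, the "<i" format and the variable names are assumptions chosen only
for illustration; because the result is a tuple, slicing is used here to
separate the values from the offset.

```python
import struct

def unpack_at(fmt, buf, pos):
    # Sketch of the quoted definition: unpack fmt from buf starting at pos
    # and append the end offset as the last item of the returned tuple.
    size = struct.calcsize(fmt)
    end = pos + size
    data = buf[pos:end]
    if len(data) < size:
        raise struct.error("not enough data for format")
    return struct.unpack(fmt, data) + (end,)

# Sample data (an assumption): two little-endian 32-bit integers.
buf = struct.pack("<ii", 7, 9)

result = unpack_at("<i", buf, 0)
# The caller has to split the trailing offset off by hand on every call.
values, offset = result[:-1], result[-1]
second = unpack_at("<i", buf, offset)
```

Every call site has to repeat the "split off the last item" step, which is
the inconvenience being discussed.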

Instead, I would suggest that even a very limited initial
implementation of the StructReader()-like object suggested by Raymond would
be more useful...

import struct

class StructReader: #or maybe call it Unpacker?
    def __init__(self, buf):
        self._buf=buf
        self._offset=0
    def unpack(self, format):
        """unpack at current offset, advance internal offset
        accordingly"""
        size=struct.calcsize(format)
        # read at the current offset first, then advance it
        ret=struct.unpack(format, self._buf[self._offset:self._offset+size])
        self._offset+=size
        return ret
    #or may be just make _offset public??
    def tell(self):
        "return current offset"
        return self._offset
    def seek(self, offset, whence=0):
        "set current offset (whence is ignored in this initial version)"
        self._offset=offset

This solves the original offset tracking problem completely (at least as
far as inconvenience is concerned; improving unpack() performance
would require the struct reader to be written in C), while allowing the
rest to be added later.

E.g. the original "hdr+variable number of data items" code would
look like:

 buf=StructReader(rec)
 hdr=buf.unpack("iiii")
 for i in range(hdr[0]):
    item=buf.unpack( "IIII")
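
For reference, an end-to-end sketch of this usage that can be run as is. The
record layout (a header whose first field is the item count), the "<" byte
order prefix, the sample values and the explicit length check are all
assumptions added for illustration; they are not part of the proposal itself.

```python
import struct

class StructReader:
    """Sketch: wraps a bytes buffer and tracks the current read offset."""

    def __init__(self, buf):
        self._buf = buf
        self._offset = 0

    def unpack(self, fmt):
        # Read at the current offset, then advance past the consumed bytes.
        size = struct.calcsize(fmt)
        data = self._buf[self._offset:self._offset + size]
        if len(data) < size:
            # Assumed behaviour: fail clearly on a truncated buffer.
            raise struct.error("not enough data for format")
        self._offset += size
        return struct.unpack(fmt, data)

    def tell(self):
        return self._offset

# Hypothetical record: a 4-int header (item count first), then the items.
rec = struct.pack("<iiii", 2, 0, 0, 0)
rec += struct.pack("<IIII", 1, 2, 3, 4)
rec += struct.pack("<IIII", 5, 6, 7, 8)

reader = StructReader(rec)
hdr = reader.unpack("<iiii")
items = [reader.unpack("<IIII") for _ in range(hdr[0])]
```

The calling code never touches an offset; the reader object carries it from
one unpack() call to the next.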


Ilya


PS: with unpack_at() this code would look like:

 offset=0
 hdr=list(unpack_at("iiii", rec, offset))  # list, so that pop() works
 offset=hdr.pop()
 for i in range(hdr[0]):
    item=list(unpack_at("IIII", rec, offset))
    offset=item.pop()




On Sat, 8 Jan 2005, Guido van Rossum wrote:

> First, let me say two things:
>
> (a) A higher-level API can and should be constructed which acts like a
> (binary) stream but has additional methods for reading and writing
> values using struct format codes (or, preferably, somewhat
> higher-level type names, as suggested). Instances of this API should
> be constructable from a stream or from a "buffer" (e.g. a string).
>
> (b) -1 on Ilya's idea of having a special object that acts as an
> input-output integer; it is too unpythonic (no matter your objection).
>
> [Paul Moore]
> > OTOH, Nick's idea of returning a tuple with the new offset might make
> > your example shorter without sacrificing readability:
> >
> >     result, newpos = struct.unpack('>l', self.__buf, self.__pos)
> >     self.__pos = newpos # retained "newpos" for readability...
> >     return result
>
> This is okay, except I don't want to overload this on unpack() --
> let's pick a different function name like unpack_at().
>
> > A third possibility - rather than "magically" adding an additional
> > return value because you supply a position, you could have a "where am
> > I?" format symbol (say & by analogy with the C "address of" operator).
> > Then you'd say
> >
> >     result, newpos = struct.unpack('>l&', self.__buf, self.__pos)
> >
> > Please be aware, I don't have a need myself for this feature - my
> > interest is as a potential reader of others' code...
>
> I think that adding more magical format characters is probably not
> doing the readers of this code a service.
>
> I do like the idea of not introducing an extra level of tuple to
> accommodate the position return value but instead make it the last
> item in the tuple when using unpack_at().
>
> Then the definition would be:
>
> def unpack_at(fmt, buf, pos):
>     size = calcsize(fmt)
>     end = pos + size
>     data = buf[pos:end]
>     if len(data) < size:
>         raise struct.error("not enough data for format")
>     # if data is too long that would be a bug in buf[pos:size] and
>     # cause an error below
>     ret = unpack(fmt, data)
>     ret = ret + (end,)
>     return ret
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>

