[Python-Dev] an idea for improving struct.unpack api
Ilya Sandler
ilya at bluefir.net
Sun Jan 9 21:19:42 CET 2005
> (a) A higher-level API can and should be constructed which acts like a
> (binary) stream but has additional methods for reading and writing
> values using struct format codes (or, preferably, somewhat
> higher-level type names, as suggested). Instances of this API should
> be constructable from a stream or from a "buffer" (e.g. a string).
Ok, I think it's getting much bigger than what I was initially aiming for
;-)...
One more comment though regarding unpack_at
> Then the definition would be:
>
> def unpack_at(fmt, buf, pos):
>     size = calcsize(fmt)
>     end = pos + size
>     data = buf[pos:end]
>     if len(data) < size:
>         raise struct.error("not enough data for format")
>     ret = unpack(fmt, data)
>     ret = ret + (end,)
>     return ret
While I see the usefulness of this, I think it's too limited, e.g.
    result = unpack_at(fmt, buf, offset)
    result, offset = result[:-1], result[-1]
feels quite unnatural...
So my feeling is that adding this new API is not worth the trouble.
Especially if there are plans for anything higher level...
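To make the awkwardness concrete, here is a small runnable sketch of the unpack_at() quoted above, applied to hand-packed sample data. The format strings and values are made up for illustration, and unpack_at is only the proposed helper, not an existing function in the struct module.

```python
import struct

def unpack_at(fmt, buf, pos):
    """Proposed helper (not part of the struct module): unpack fmt from
    buf at pos and append the end offset as the last tuple item."""
    size = struct.calcsize(fmt)
    end = pos + size
    data = buf[pos:end]
    if len(data) < size:
        raise struct.error("not enough data for format")
    return struct.unpack(fmt, data) + (end,)

# Illustrative data: two 16-bit values followed by one 32-bit value.
buf = struct.pack("<hh", 1, 2) + struct.pack("<i", 3)

offset = 0
result = unpack_at("<hh", buf, offset)
# The offset has to be split off the value tuple by hand after every call.
values, offset = result[:-1], result[-1]
result = unpack_at("<i", buf, offset)
values2, offset = result[:-1], result[-1]
```

Every call needs that extra line of bookkeeping, which is the inconvenience I am complaining about.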
Instead, I would suggest that even a very limited initial
implementation of the StructReader()-like object suggested by Raymond
would be more useful...
import struct

class StructReader:  # or maybe call it Unpacker?
    def __init__(self, buf):
        self._buf = buf
        self._offset = 0
    def unpack(self, format):
        """unpack at current offset, advance internal offset
        accordingly"""
        size = struct.calcsize(format)
        # read first, then advance, so the data comes from the old offset
        ret = struct.unpack(format, self._buf[self._offset:self._offset + size])
        self._offset += size
        return ret
    # or maybe just make _offset public??
    def tell(self):
        "return current offset"
        return self._offset
    def seek(self, offset, whence=0):
        "set current offset (whence is ignored in this limited version)"
        self._offset = offset
This solves the original offset tracking problem completely (at least as
far as inconvenience is concerned; improving unpack() performance
would require the struct reader to be written in C), while allowing
the rest to be added later.
E.g. the original "hdr + variable number of data items" code would
look like:
    buf = StructReader(rec)
    hdr = buf.unpack("iiii")
    for i in range(hdr[0]):
        item = buf.unpack("IIII")
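For a concrete run of the same loop, here is a self-contained sketch with a made-up record: a four-int header whose first field is the item count, followed by that many four-unsigned-int items. The record contents, the "<" byte-order prefix (used only to make the sizes predictable), the short-read check and the whence handling in seek() (modelled on file.seek) are my assumptions, not part of the proposal itself.

```python
import struct

class StructReader:
    """Minimal sketch: unpack struct formats from a buffer sequentially."""

    def __init__(self, buf):
        self._buf = buf
        self._offset = 0

    def unpack(self, format):
        """Unpack at the current offset, then advance past the data read."""
        size = struct.calcsize(format)
        data = self._buf[self._offset:self._offset + size]
        if len(data) < size:
            raise struct.error("not enough data for format")
        self._offset += size
        return struct.unpack(format, data)

    def tell(self):
        """Return the current offset."""
        return self._offset

    def seek(self, offset, whence=0):
        """Set the offset; whence follows file.seek (0=start, 1=current, 2=end)."""
        base = (0, self._offset, len(self._buf))[whence]
        self._offset = base + offset

# Made-up record: header (count, 0, 0, 0) plus `count` items of four uints.
rec = struct.pack("<iiii", 2, 0, 0, 0)
rec += struct.pack("<IIII", 1, 2, 3, 4) + struct.pack("<IIII", 5, 6, 7, 8)

buf = StructReader(rec)
hdr = buf.unpack("<iiii")
items = [buf.unpack("<IIII") for i in range(hdr[0])]
```

The calling code never touches an offset; tell() and seek() are only needed when it wants to jump around in the record.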
Ilya
PS with unpack_at() this code would look like:
    offset = 0
    hdr = unpack_at("iiii", rec, offset)
    hdr, offset = hdr[:-1], hdr[-1]
    for i in range(hdr[0]):
        item = unpack_at("IIII", rec, offset)
        item, offset = item[:-1], item[-1]
On Sat, 8 Jan 2005, Guido van Rossum wrote:
> First, let me say two things:
>
> (a) A higher-level API can and should be constructed which acts like a
> (binary) stream but has additional methods for reading and writing
> values using struct format codes (or, preferably, somewhat
> higher-level type names, as suggested). Instances of this API should
> be constructable from a stream or from a "buffer" (e.g. a string).
>
> (b) -1 on Ilya's idea of having a special object that acts as an
> input-output integer; it is too unpythonic (no matter your objection).
>
> [Paul Moore]
> > OTOH, Nick's idea of returning a tuple with the new offset might make
> > your example shorter without sacrificing readability:
> >
> > result, newpos = struct.unpack('>l', self.__buf, self.__pos)
> > self.__pos = newpos # retained "newpos" for readability...
> > return result
>
> This is okay, except I don't want to overload this on unpack() --
> let's pick a different function name like unpack_at().
>
> > A third possibility - rather than "magically" adding an additional
> > return value because you supply a position, you could have a "where am
> > I?" format symbol (say & by analogy with the C "address of" operator).
> > Then you'd say
> >
> > result, newpos = struct.unpack('>l&', self.__buf, self.__pos)
> >
> > Please be aware, I don't have a need myself for this feature - my
> > interest is as a potential reader of others' code...
>
> I think that adding more magical format characters is probably not
> doing the readers of this code a service.
>
> I do like the idea of not introducing an extra level of tuple to
> accommodate the position return value but instead make it the last
> item in the tuple when using unpack_at().
>
> Then the definition would be:
>
> def unpack_at(fmt, buf, pos):
>     size = calcsize(fmt)
>     end = pos + size
>     data = buf[pos:end]
>     if len(data) < size:
>         raise struct.error("not enough data for format")
>     # if data is too long that would be a bug in buf[pos:size] and
>     # cause an error below
>     ret = unpack(fmt, data)
>     ret = ret + (end,)
>     return ret
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>