struct: type registration?
noway at sorry.com
Thu Jun 1 14:35:16 CEST 2006
Giovanni Bajo wrote:
> You need struct.unpack() to parse this data, and you need custom
> packers/unpackers to avoid post-processing the output of unpack() just
> because it only knows about basic Python types. In binary structs, there
> happen to be *types* which do not map 1:1 to Python types, nor are they
> just basic C types (like the ones struct supports). Using a custom
> formatter is a way to better represent these types (instead of
> mapping them to the "most similar" type and then post-processing it).
> In my example, "S" is a basic type meaning "a 0-terminated 20-byte
> string", and expressing it in the struct format with the single
> letter "S" is more meaningful in my code than using "20s" and then
> post-processing the resulting string each and every time this happens.
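To make the post-processing mentioned in the quote concrete, here is a minimal sketch in current Python 3 of what "20s" plus manual cleanup looks like for a 0-terminated 20-byte string. The helper name unpack_fixed_cstring and the sample record are made up for illustration; they are not part of struct.

```python
import struct

def unpack_fixed_cstring(data, offset=0, size=20):
    """Read a fixed-size field and cut it at the first NUL byte.

    Hypothetical helper: this is the step a custom "S" format letter
    would hide from the caller.
    """
    (raw,) = struct.unpack_from("%ds" % size, data, offset)
    # Keep only the bytes before the first terminator, if there is one.
    return raw.split(b"\x00", 1)[0]

# Sample record: "hello" padded with NUL bytes to 20 bytes.
record = b"hello".ljust(20, b"\x00")
name = unpack_fixed_cstring(record)
```

Every field of this kind needs the same call after unpack(), which is the repetition a registered "S" type would remove.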
Another compelling example is the SSH protocol:
Go to section 4, "Data Type Representations Used in the SSH Protocols", and it
describes the data types used by the SSH protocol. In a perfect world, I would
write some custom packers/unpackers for those types which struct does not
handle already (like the "mpint" format), so that I could use struct to parse
and compose SSH messages. What I ended up doing was writing a new module
sshstruct.py from scratch, which duplicates struct's work, just because I
couldn't extend struct. Some examples:
client.py: cookie, server_algorithms, guess, reserverd =
client.py: prompts = sshstruct.unpack("sssu" + "sB"*num_prompts,
connection.py: pkt = sshstruct.pack("busB", SSH_MSG_CHANNEL_REQUEST,
self.recipient_number, type, reply) + custom
kex.py: self.P, self.G = sshstruct.unpack("mm",pkt[1:])
Notice for instance how "s" is an SSH string and unpacks directly to a Python
string, and "m" is an SSH mpint (arbitrary-precision integer) but unpacks
directly into a Python long. With struct.unpack() alone this would not have
been possible; it would have required a lot of post-processing.
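For reference, this is roughly what decoding those two types by hand looks like. It is a sketch in current Python 3, not the actual sshstruct code, and it assumes the encodings as I understand them from the SSH data type definitions: a string is a uint32 length followed by that many bytes, and an mpint is such a string holding a big-endian two's complement integer. The function names are hypothetical.

```python
import struct

def unpack_ssh_string(data, offset=0):
    """Return (bytes, next_offset) for a uint32-length-prefixed string."""
    (length,) = struct.unpack_from(">I", data, offset)
    start = offset + 4
    end = start + length
    if end > len(data):
        raise ValueError("truncated SSH string")
    return data[start:end], end

def unpack_ssh_mpint(data, offset=0):
    """Return (int, next_offset) for an mpint.

    Assumes the payload is a big-endian two's complement integer;
    an empty payload decodes to 0.
    """
    raw, end = unpack_ssh_string(data, offset)
    return int.from_bytes(raw, "big", signed=True), end
```

With plain struct, each field needs one of these calls plus manual offset bookkeeping, instead of a single format string such as "mm".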
Actually, another thing that struct should support to cover the SSH protocol
(and many other binary protocols) is the ability to parse strings whose size is
not known in advance (variable-length data types). For instance, type
"string" in the SSH protocol is a string prefixed with its size as a uint32, so
its actual size depends on each instance. For this reason, my sshstruct did
not have an equivalent of struct.calcsize(). I guess that if there were a way to
extend struct, it would have to cover variable-size data types (and calcsize()
would return -1 or raise an exception).
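A rough sketch of how such an extension point could look, in current Python 3. Everything here is hypothetical: register(), unpack() and calcsize() are illustrative names, not an existing struct API, and the meanings given to "b", "u", "B" and "s" (byte, uint32, boolean, length-prefixed string) are my assumptions for the example. Each type is registered with an unpacker and an optional fixed size; calcsize() raises for formats that contain a variable-size type.

```python
import struct

# Format letter -> (unpacker, fixed size in bytes or None if variable).
_CODECS = {}

def register(code, unpacker, size=None):
    """Register a type; unpacker(data, offset) returns (value, next_offset)."""
    _CODECS[code] = (unpacker, size)

def _fixed(fmt):
    """Build an (unpacker, size) pair from a plain struct format."""
    size = struct.calcsize(fmt)
    def unpacker(data, offset):
        (value,) = struct.unpack_from(fmt, data, offset)
        return value, offset + size
    return unpacker, size

def _length_prefixed(data, offset):
    """Variable-size type: uint32 length followed by that many bytes."""
    (length,) = struct.unpack_from(">I", data, offset)
    end = offset + 4 + length
    if end > len(data):
        raise ValueError("truncated string")
    return data[offset + 4:end], end

register("b", *_fixed(">B"))    # assumed: single byte
register("u", *_fixed(">I"))    # assumed: uint32
register("B", *_fixed(">?"))    # assumed: boolean
register("s", _length_prefixed) # no fixed size

def unpack(fmt, data):
    """Unpack one value per format letter, left to right."""
    offset = 0
    values = []
    for code in fmt:
        unpacker, _ = _CODECS[code]
        value, offset = unpacker(data, offset)
        values.append(value)
    return tuple(values)

def calcsize(fmt):
    """Total size of a format; raises if any type is variable-size."""
    total = 0
    for code in fmt:
        _, size = _CODECS[code]
        if size is None:
            raise ValueError("variable-size type %r in format %r" % (code, fmt))
        total += size
    return total
```

Raising an exception rather than returning -1 keeps a caller from accidentally using the result as a buffer length.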