struct: type registration?

Thu Jun 1 18:51:16 EDT 2006

On 2/06/2006 4:18 AM, Serge Orlov wrote:
> Giovanni Bajo wrote:
>> John Machin wrote:
>>> I am an idiot, so please be gentle with me: I don't understand why you
>>> are using struct.pack at all:
>> Because I want to be able to parse large chunks of binary data with custom
>> formatting. Did you miss the whole point of my message?
>>
>> struct.unpack("3liiSiiShh", data)
> 
> Did you want to write struct.unpack("Sheesh", data) ? Seriously, the
> main problem of struct is that it uses ad-hoc abbreviations for
> relatively rarely[1] used function calls and that makes it hard to
> read.

Indeed. The first time I saw something like struct.pack('20H', ...) I 
thought it was a FORTRAN format statement :-)
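
For anyone else squinting at those format strings, here is a minimal 
sketch of how the standard codes read: a leading number is a repeat 
count, and each letter names a fixed-size C type. It uses only codes 
that the struct module documents; the variable names are made up for 
illustration.

```python
from struct import calcsize, pack, unpack

# '<' selects little-endian byte order with standard sizes and no padding.
# '20H' is a repeat count (20) applied to 'H' (unsigned 16-bit integer).
values = list(range(20))
packed = pack('<20H', *values)
size_20h = calcsize('<20H')   # 20 items * 2 bytes each

# 'h' is a signed 16-bit integer and 'i' a signed 32-bit integer,
# so '<hi' describes 2 + 4 bytes.
size_hi = calcsize('<hi')
short_val, int_val = unpack('<hi', pack('<hi', -2, 70000))
```

The explicit '<' prefix matters: without it, struct uses native sizes 
and alignment, so the byte counts can differ between platforms.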

> 
> If you want to parse binary data use pyconstruct
> <http://pyconstruct.wikispaces.com/>
> 

Looks promising on the legibility and functionality fronts. Can you make 
any comment on the speed? Reason for asking is that Microsoft Excel 
files have this weird "RK" format for expressing common float values in 
32 bits (refer to http://sc.openoffice.org, under the "Documentation" 
heading). I wrote and support the xlrd module (see 
http://cheeseshop.python.org/pypi/xlrd) for reading those files in 
portable pure Python. Below is a function that would plug straight in as 
an example of Giovanni's custom unpacker functions. Some of the files 
can be very large, and reading them is rather slow.

Cheers,
John

from struct import unpack

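def unpack_RK(rk_str): # arg is 4 bytes, little-endian
    # The two low bits of the first byte are flags:
    #   bit 0 set: the stored value is the real value times 100
    #   bit 1 set: the payload is an integer, not a float
    flags = ord(rk_str[0])
    if flags & 2:
        # There's a SIGNED 30-bit integer in there!
        i, = unpack('<i', rk_str)
        i >>= 2 # div by 4 to drop the 2 flag bits
        if flags & 1:
            return i / 100.0
        return float(i)
    else:
        # It's the most significant 30 bits
        # of an IEEE 754 64-bit FP number;
        # the missing low-order bits are taken as zero
        d, = unpack('<d', '\0\0\0\0' + chr(flags & 252) + rk_str[1:4])
        if flags & 1:
            return d / 100.0
        return d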
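
The function above relies on str being a byte string, so it will not 
run unchanged where bytes and text are separate types. What follows is 
a rough sketch of the same decoding for Python 3, assuming the record 
arrives as a 4-byte bytes object. The name unpack_rk_bytes is 
hypothetical and is not part of xlrd.

```python
from struct import unpack

def unpack_rk_bytes(rk_bytes):
    """Decode a 4-byte little-endian RK value (bytes) into a float."""
    flags = rk_bytes[0]  # indexing bytes yields an int in Python 3
    if flags & 2:
        # Signed 30-bit integer stored in the upper 30 bits.
        i, = unpack('<i', rk_bytes)
        value = float(i >> 2)  # arithmetic shift drops the 2 flag bits
    else:
        # Upper 30 bits of an IEEE 754 double; the low 34 bits are zero.
        padded = b'\0\0\0\0' + bytes([flags & 252]) + rk_bytes[1:4]
        value, = unpack('<d', padded)
    if flags & 1:
        # Bit 0 set: the stored value is the real value times 100.
        return value / 100.0
    return value

int_plain = unpack_rk_bytes(b'\x06\x00\x00\x00')     # integer 1, no scaling
int_scaled = unpack_rk_bytes(b'\xe7\xc0\x00\x00')    # integer 12345, divided by 100
int_negative = unpack_rk_bytes(b'\xee\xff\xff\xff')  # integer -5, no scaling
float_plain = unpack_rk_bytes(b'\x00\x00\xf0\x3f')   # top 4 bytes of 1.0
float_scaled = unpack_rk_bytes(b'\x01\x00\x59\x40')  # top 4 bytes of 100.0, divided by 100
```

The sample byte strings were built by hand from the flag layout the 
function assumes, not taken from a real workbook.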