[Tutor] newb - Problem reading binary files

Sat May 12 13:56:21 CEST 2007

Elizabeth Finn wrote:
> I need to read a file that is in binary format, and then convert some of 
> the values into integer values. These values can range from 1 to 4 bytes.
>  
> First question – is there an easy way to do this? I finally wrote my own 
> little utility to handle multi-byte integers because I couldn’t find a 
> built-in way (except for ord() which works only for single bytes)

Alan suggested the struct module. It doesn't directly support 3-byte 
integers but perhaps you could pad them with a zero byte at the front. 
struct does let you unpack a whole string at once.
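
As a rough sketch of the padding idea, assuming the values are unsigned and stored most-significant byte first (which is what getnum() below computes), and assuming Python 3 bytes objects rather than the strings used elsewhere in this thread. The helper name unpack_uint24_be and the sample record are made up for illustration:

```python
import struct

def unpack_uint24_be(three_bytes):
    """Decode a 3-byte big-endian unsigned integer by padding it to 4 bytes."""
    if len(three_bytes) != 3:
        raise ValueError("expected exactly 3 bytes")
    # ">I" is a big-endian unsigned 32-bit integer; the zero byte goes in front.
    return struct.unpack(">I", b"\x00" + three_bytes)[0]

# struct can also unpack several fields in one call:
# here a 1-byte, a 2-byte and a 4-byte unsigned integer (7 bytes in total).
record = b"\x07\x06\x33\x00\x00\x01\x00"
small, medium, large = struct.unpack(">BHI", record)
```

If the file is little-endian instead, the format strings would start with "<" and the pad byte would go at the end rather than the front.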

The Construct library is a higher-level interface to similar 
functionality. It doesn't seem to support 3-byte integers either but it 
is extensible.
>  
> def getnum(num_str):
>             """
>             Given a string representing a binary number, return the number.
>             If string is more than one byte, calculate the number to return.
>             Assume that each byte is signed magnitude
>             """                   
>             x = len(num_str)
>             ans = 0
>             for i in range( x ):
>                         nextstr = num_str[i:i+1]
>                         ans = ans * 256
>                         ans = ans + ord(nextstr)
>             return ans

Your loop could be written more simply as:
for nextstr in num_str:
   ans = ans * 256
   ans = ans + ord(nextstr)
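
For anyone running this under Python 3, here is one way the whole function might look. This is an adaptation, not the original code: it assumes the input is a bytes object, where iterating already yields integers, so ord() is no longer needed.

```python
def getnum(num_bytes):
    """Return the unsigned integer encoded big-endian in num_bytes."""
    ans = 0
    for byte in num_bytes:  # each item is already an int in the range 0..255
        ans = ans * 256 + byte
    return ans

# Python 3 also has a built-in that does the same job for any length.
same = int.from_bytes(b"\x06\x33", "big")
```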

> This “brute force” method usually works, but - now here is the other 
> question -sometimes the code does not pick up two full bytes when it is 
> supposed to. I open the file and read a block that I want into a string:
>            
>             f = open(fname, 'rb')
>             f.seek(offset, 0)
>             block = f.read(2000)
>  
> Then for each number I pull the bytes from the string, then call 
> getnum() to calculate the number.
>  
>             test = block[0:1]             # 1 byte
>             test = block[1:4]             # 3 bytes
>             test = block[4:6]             # 2 bytes
>             test = block[20:12]             # 2 bytes
>             test = block[1996:2000]       #4 bytes
>  
> This method usually works, except that for some of the 2-byte numbers I 
> get only the first byte and first half of the second byte – for 
> instance: '\x06\x33' comes out as '\x063'. This is very confusing 
> especially because one 2-byte substring – “00 01” comes out as expected, 
> but “06 52” becomes “065”. Any ideas?

It seems to work for me. I wonder if you are confused about the input 
you are giving it? Using your definition of getnum(), I get these results:
In [31]: getnum('\x06\x33')
Out[31]: 1587
In [33]: 6*256 + 0x33
Out[33]: 1587

In [34]: getnum('\x06\x52')
Out[34]: 1618
In [35]: 6*256 + 0x52
Out[35]: 1618

So it seems to be doing the right thing. Can you put
   print repr(test)
   print getnum(test)
into your test program and show us the results?
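
One guess at what is happening, not confirmed by anything in the original report: repr() only uses a \x escape for bytes that are not printable ASCII. Byte 0x33 is the character '3' and 0x52 is 'R', so a two-byte string can look shorter than it is when printed. A small Python 3 sketch (the sample values are taken from the question; the variable names are illustrative):

```python
samples = [b"\x06\x33", b"\x06\x52", b"\x00\x01"]

for raw in samples:
    # Show how the value is displayed, its bytes in hex, and its real length.
    print(repr(raw), raw.hex(), len(raw))

shown = [repr(raw) for raw in samples]
```

All three values are two bytes long, but only the last one is displayed with two \x escapes, which would match "00 01" looking as expected while the others appear truncated.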

Kent