[Tutor] newb - Problem reading binary files

Kent Johnson kent37 at tds.net
Sat May 12 13:56:21 CEST 2007


Elizabeth Finn wrote:
> I need to read a file that is in binary format, and then convert some of 
> the values into integer values. These values can range from 1 to 4 bytes.
>  
> First question – is there an easy way to do this? I finally wrote my own 
> little utility to handle multi-byte integers because I couldn’t find a 
> built-in way (except for ord() which works only for single bytes)

Alan suggested the struct module. It doesn't directly support 3-byte 
integers but perhaps you could pad them with a zero byte at the front. 
struct does let you unpack a whole string at once.
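
A minimal sketch of the padding idea, assuming the values are unsigned and big-endian (most significant byte first), which is what getnum() below computes. The helper name unpack_uint24_be and the sample bytes in record are made up for illustration; struct.unpack and the '>B', '>H', '>I' format codes are standard.

```python
import struct

def unpack_uint24_be(three_bytes):
    """Return the unsigned big-endian integer stored in exactly 3 bytes."""
    if len(three_bytes) != 3:
        raise ValueError("expected exactly 3 bytes")
    # Pad with a zero byte at the front so struct can treat the value
    # as a 4-byte unsigned int ('>I' = big-endian, unsigned, 4 bytes).
    return struct.unpack('>I', b'\x00' + three_bytes)[0]

# Hypothetical sample record: a 1-byte, a 2-byte and a 4-byte field,
# unpacked in a single call. '>' also turns off alignment padding.
record = b'\x07\x06\x33\x00\x00\x01\x00'
one, two, four = struct.unpack('>BHI', record)

print(unpack_uint24_be(b'\x00\x06\x33'))
print(one, two, four)
```

If the file turns out to be little-endian, the format prefix would be '<' and the zero byte would go at the end instead of the front.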

The Construct library is a higher-level interface to similar 
functionality. It doesn't seem to support 3-byte integers either but it 
is extensible.
>  
> def getnum(num_str):
>     """
>     Given a string representing a binary number, return the number.
>     If string is more than one byte, calculate the number to return.
>     Assume that each byte is signed magnitude
>     """
>     x = len(num_str)
>     ans = 0
>     for i in range(x):
>         nextstr = num_str[i:i+1]
>         ans = ans * 256
>         ans = ans + ord(nextstr)
>     return ans

Your loop could be written more simply as
for nextstr in num_str:
   ans = ans * 256
   ans = ans + ord(nextstr)
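
For reference, here is the whole function with the simpler loop, as a sketch rather than a drop-in replacement. One change is my own assumption: it iterates over bytearray(num_str) instead of calling ord() on each character, so the same code runs on Python 3 (where iterating over bytes already yields integers) as well as Python 2. Note that the loop treats every byte as unsigned.

```python
def getnum(num_str):
    """Interpret num_str as an unsigned big-endian integer."""
    ans = 0
    # bytearray yields each byte as an int in the range 0..255.
    for byte in bytearray(num_str):
        ans = ans * 256 + byte
    return ans

print(getnum(b'\x06\x33'))  # 6*256 + 0x33 = 1587
print(getnum(b'\x06\x52'))  # 6*256 + 0x52 = 1618
```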

> This “brute force” method usually works, but - now here is the other 
> question - sometimes the code does not pick up two full bytes when it is 
> supposed to. I open the file and read a block that I want into a string:
>  
>     f = open(fname, 'rb')
>     f.seek(offset, 0)
>     block = f.read(2000)
>  
> Then for each number I pull the bytes from the string, then call 
> getnum() to calculate the number.
>  
>             test = block[0:1]             # 1 byte
>             test = block[1:4]             # 3 bytes
>             test = block[4:6]             # 2 bytes
>             test = block[20:12]             # 2 bytes
>             test = block[1996:2000]       #4 bytes
>  
> This method usually works, except that for some of the 2-byte numbers I 
> get only the first byte and first half of the second byte – for 
> instance: '\x06\x33' comes out as '\x063'. This is very confusing 
> especially because one 2-byte substring – “00 01” comes out as expected, 
> but “06 52” becomes “065”. Any ideas?

It seems to work for me. I wonder if you are confused about the input 
you are giving it? Using your definition of getnum(), I get these results:
In [31]: getnum('\x06\x33')
Out[31]: 1587
In [33]: 6*256 + 0x33
Out[33]: 1587

In [34]: getnum('\x06\x52')
Out[34]: 1618
In [35]: 6*256 + 0x52
Out[35]: 1618

So it seems to be doing the right thing. Can you put
   print repr(test)
   print getnum(test)
into your test program and show us the results?
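
One thing worth checking, offered as a guess rather than a diagnosis: repr() prints any byte that happens to be a printable ASCII character as that character instead of a \x escape. Since 0x33 is the code for '3', the two-byte string '\x06\x33' is displayed as '\x063' even though it still holds two bytes. The sketch below uses bytes literals so that it also runs on Python 3 (where the output gets a leading b); the name other is made up for illustration.

```python
test = b'\x06\x33'
print(repr(test))        # the second byte is shown as the character '3'
print(len(test))         # still two bytes long
print(test[1:2] == b'3')

other = b'\x06\x52'
print(repr(other))       # 0x52 is the code for 'R'
print(len(other))
```

If len(test) is 2, the slice picked up both bytes and only the display is misleading.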

Kent

