[Tutor] newb - Problem reading binary files
kent37 at tds.net
Sat May 12 13:56:21 CEST 2007
Elizabeth Finn wrote:
> I need to read a file that is in binary format, and then convert some of
> the values into integer values. These values can range from 1 to 4 bytes.
> First question – is there an easy way to do this? I finally wrote my own
> little utility to handle multi-byte integers because I couldn’t find a
> built-in way (except for ord() which works only for single bytes)
Alan suggested the struct module. It doesn't directly support 3-byte
integers but perhaps you could pad them with a zero byte at the front.
struct does let you unpack a whole string at once.
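As a rough sketch of the padding idea, assuming the values are big-endian and unsigned (which matches what getnum() computes) and written with Python 3 bytes literals. The helper name unpack_uint24_be and the sample record are made up for illustration:

```python
import struct

def unpack_uint24_be(three_bytes):
    """Decode a 3-byte big-endian unsigned integer by padding it to 4 bytes."""
    if len(three_bytes) != 3:
        raise ValueError("expected exactly 3 bytes")
    # ">I" is a big-endian unsigned 32-bit integer; the zero byte goes in front.
    (value,) = struct.unpack(">I", b"\x00" + three_bytes)
    return value

# struct can also unpack several fields in one call.
# B = 1-byte unsigned, H = 2-byte unsigned, I = 4-byte unsigned.
record = b"\x07\x06\x33\x00\x00\x01\x00"  # hypothetical 7-byte record
small, medium, large = struct.unpack(">BHI", record)

print(unpack_uint24_be(b"\x00\x06\x33"))
print(small, medium, large)
```

The ">" prefix also turns off alignment padding, so the format consumes exactly 1 + 2 + 4 bytes.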
The Construct library is a higher-level interface to similar
functionality. It doesn't seem to support 3-byte integers either but it
> def getnum(num_str):
>     """
>     Given a string representing a binary number, return the number.
>     If string is more than one byte, calculate the number to return.
>     Assume that each byte is signed magnitude
>     """
>     x = len(num_str)
>     ans = 0
>     for i in range( x ):
>         nextstr = num_str[i:i+1]
>         ans = ans * 256
>         ans = ans + ord(nextstr)
>     return ans
Your loop could be written more simply as
    for nextstr in num_str:
        ans = ans * 256
        ans = ans + ord(nextstr)
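For comparison, here is a sketch of the same loop in Python 3 terms, where iterating over a bytes object yields integers directly, so ord() is not needed. It assumes big-endian, unsigned values. The built-in int.from_bytes() should give the same answer and is used here only as a cross-check:

```python
def getnum(num_bytes):
    """Return the big-endian unsigned integer encoded in num_bytes."""
    ans = 0
    for byte in num_bytes:  # each item is already an int in Python 3
        ans = ans * 256 + byte
    return ans

sample = b"\x01\x02\x03"  # made-up 3-byte value
print(getnum(sample))
print(getnum(sample) == int.from_bytes(sample, "big"))
```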
> This “brute force” method usually works, but - now here is the other
> question - sometimes the code does not pick up two full bytes when it is
> supposed to. I open the file and read a block that I want into a string:
> f=open(fname, 'rb')
> f.seek(offset, 0)
> block = f.read(2000)
> Then for each number I pull the bytes from the string, then call
> getnum() to calculate the number.
> test = block[0:1] # 1 byte
> test = block[1:4] # 3 bytes
> test = block[4:6] # 2 bytes
> test = block[20:12] # 2 bytes
> test = block[1996:2000] #4 bytes
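A side note on the slices quoted above: block[20:12] is commented as 2 bytes, but a slice whose start is past its stop is empty. Whether that is a typo in the message or in the actual program is not clear from the thread. A small check with made-up data (Python 3 bytes) shows the difference:

```python
block = bytes(range(32))  # made-up 32-byte sample, byte i has value i

print(len(block[20:12]))  # start is past stop, so the slice is empty
print(len(block[10:12]))  # a 2-byte slice
print(len(block[20:22]))  # another 2-byte slice
```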
> This method usually works, except that for some of the 2-byte numbers I
> get only the first byte and first half of the second byte – for
> instance: '\x06\x33' comes out as '\x063'. This is very confusing
> especially because one 2-byte substring – “00 01” comes out as expected,
> but “06 52” becomes “065”. Any ideas?
It seems to work for me. I wonder if you are confused about the input
you are giving it? Using your definition of getnum(), I get these results:
In : getnum('\x06\x33')
Out: 1587
In : 6*256 + 0x33
Out: 1587
In : getnum('\x06\x52')
Out: 1618
In : 6*256 + 0x52
Out: 1618
So it seems to be doing the right thing. Can you put
into your test program and show us the results?
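One possible source of this kind of confusion, offered as a guess rather than a confirmed diagnosis: when Python displays a string, bytes that happen to be printable ASCII are shown as characters rather than as \x escapes. The byte 0x33 is the character '3', so a 2-byte value 06 33 is displayed as '\x063' even though both bytes are present. A sketch in Python 3, with made-up sample data, that prints the repr, the length, and a hex dump of each slice:

```python
block = b"\x00\x01\x06\x33\x06\x52"  # made-up sample data

for start in (0, 2, 4):
    test = block[start:start + 2]
    # repr() shows printable bytes as characters; hex() shows every byte.
    print(repr(test), len(test), test.hex())
```

If the length is 2 and the hex dump shows both bytes, the slice is complete and only the display is misleading.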