# [Tutor] newb - Problem reading binary files

Kent Johnson kent37 at tds.net
Sat May 12 13:56:21 CEST 2007

```
Elizabeth Finn wrote:
> I need to read a file that is in binary format, and then convert some of
> the values into integer values. These values can range from 1 to 4 bytes.
>
> First question – is there an easy way to do this? I finally wrote my own
> little utility to handle multi-byte integers because I couldn’t find a
> built-in way (except for ord() which works only for single bytes)

Alan suggested the struct module. It doesn't directly support 3-byte
integers, but for unsigned big-endian values you could pad them with a
zero byte at the front. struct does let you unpack a whole string at once.

The Construct library is a higher-level interface to similar
functionality. It doesn't seem to support 3-byte integers either but it
is extensible.
>
> def getnum(num_str):
>             """
>             Given a string representing a binary number, return the number.
>             If string is more than one byte, calculate the number to return.
>             Assume that each byte is signed magnitude
>             """
>             x = len(num_str)
>             ans = 0
>             for i in range( x ):
>                         nextstr = num_str[i:i+1]
>                         ans = ans * 256
>                         ans = ans + ord(nextstr)
>             return ans

Your loop could be written more simply as
    for nextstr in num_str:
        ans = ans * 256
        ans = ans + ord(nextstr)

> This “brute force” method usually works, but - now here is the other
> question - sometimes the code does not pick up two full bytes when it is
> supposed to. I open the file and read a block that I want into a string:
>
>             f = open(fname, 'rb')
>             f.seek(offset, 0)
>
> Then for each number I pull the bytes from the string, then call
> getnum() to calculate the number.
>
>             test = block[0:1]             # 1 byte
>             test = block[1:4]             # 3 bytes
>             test = block[4:6]             # 2 bytes
>             test = block[20:12]             # 2 bytes
>             test = block[1996:2000]       #4 bytes
>
> This method usually works, except that for some of the 2-byte numbers I
> get only the first byte and first half of the second byte – for
> instance: 'x06\x33’ comes out as ‘x063’. This is very confusing
> especially because one 2-byte substring – “00 01” comes out as expected,
> but “06 52” becomes “065”. Any ideas?

It seems to work for me. I wonder if you are confused about the input
you are giving it? Using your definition of getnum(), I get these results:
In [31]: getnum('\x06\x33')
Out[31]: 1587
In [33]: 6*256 + 0x33
Out[33]: 1587

In [34]: getnum('\x06\x52')
Out[34]: 1618
In [35]: 6*256 + 0x52
Out[35]: 1618

So it seems to be doing the right thing. Can you put
    print repr(test)
    print getnum(test)
into your test program and show us the results?

Kent
```
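
The suggestions in the message above can be sketched in runnable form. This is a minimal illustration, not code from the thread: it is written for Python 3 (the thread dates from 2007 and uses Python 2 strings), it assumes the values are unsigned and big-endian as the `getnum()` arithmetic implies, and `getnum_struct` is a hypothetical helper name. The last line shows one possible source of the confusion, which the thread itself does not confirm: `repr()` prints any byte that is a printable ASCII character as that character, so the two bytes `06 33` are displayed as `\x063`.

```python
import struct


def getnum(data: bytes) -> int:
    """Big-endian unsigned integer, using the same arithmetic as the loop in the thread."""
    ans = 0
    for byte in data:  # iterating over bytes yields ints in Python 3, so no ord() is needed
        ans = ans * 256 + byte
    return ans


def getnum_struct(data: bytes) -> int:
    """Hypothetical helper: left-pad to 4 bytes with zero bytes, then unpack with struct.

    Assumes unsigned, big-endian input of 1 to 4 bytes.
    """
    if not 1 <= len(data) <= 4:
        raise ValueError("expected 1 to 4 bytes")
    padded = data.rjust(4, b"\x00")        # zero bytes go at the front
    return struct.unpack(">I", padded)[0]  # ">I" is a big-endian unsigned 32-bit integer


two_bytes = b"\x06\x33"
three_bytes = b"\x01\x02\x03"

print(getnum(two_bytes))                   # 1587, matching 6*256 + 0x33
print(getnum_struct(three_bytes))          # 66051, i.e. 0x010203
print(int.from_bytes(three_bytes, "big"))  # built-in equivalent in Python 3
print(repr(two_bytes))                     # b'\x063': byte 0x33 is the character '3'
```

If the displayed `\x063` is what looked like a truncated second byte, checking `len(test)` alongside `repr(test)` would show that both bytes are still present.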