[Tutor] reading binary files

eShopping etrade.griffiths at dsl.pipex.com
Wed Feb 4 07:58:32 CET 2009


Bob

I am trying to read UNFORMATTED files.  The files also occur as 
formatted files and the format string I provided is the string used 
to write the formatted version.  I can read the formatted version 
OK.  I (naively) assumed that the same format string was used for 
both files, the only difference being whether the FORTRAN WRITE 
statement indicated unformatted or formatted.

Best regards

Alun Griffiths

At 21:41 03/02/2009, bob gailer wrote:
>First question: are you trying to work with the file written 
>UNFORMATTED? If so read on.
>
>If you are working with a file formatted (1X, 1X, A8, 1X, 1X, I6, 
>1X, 1X, A1) then we have a completely different issue to deal with. 
>Do not read on, instead let us know.
>
>eShopping wrote:
>
>>>>Data format:
>>>>
>>>>TIME      1  F  0.0
>>>>DISTANCE 10  F  0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0
>>>>
>>>>F=float, D=double, L=logical, S=string etc
>>>>
>>>>
>>>>The first part of the file should contain a string (eg "TIME"),
>>>>an integer (1) and another string (eg "F") so I tried using
>>>>
>>>>import struct
>>>>in_file = open(file_name+".dat","rb")
>>>>data = in_file.read()
>>>>items = struct.unpack('sds', data)
>>>>
>>>>Now I get the error
>>>>
>>>>error: unpack requires a string argument of length 17
>>>>
>>>>which has left me completely baffled!
>>>
>>>Did you open the file with mode 'b'? If not change that.
>>>
>>>You are passing the entire file to unpack when you should be 
>>>giving it only the first "line". That's why it is complaining 
>>>about the length. We need to figure out the lengths of the lines.
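
The length in the error message comes from the format string, not from the file. A minimal Python 3 sketch, assuming nothing about the real file layout: struct.calcsize reports how many bytes a format consumes, and unpack must be handed exactly that many. Note that in struct notation a bare s is a single byte and d is an 8-byte double, not an integer. The sample bytes below are invented for illustration.

```python
import struct

fmt = "sds"  # the format from the original attempt

# Native mode inserts alignment padding before the double, so the size
# is platform dependent (17 bytes on the machine that raised the error).
native_size = struct.calcsize(fmt)

# Standard mode ("=" prefix) has no padding: 1 + 8 + 1 bytes.
standard_size = struct.calcsize("=" + fmt)

# Invented sample buffer: longer than one record, like a whole file.
data = struct.pack("=" + fmt, b"T", 0.0, b"F") + b"trailing bytes"

# Passing the whole buffer fails; slicing to the exact size works.
try:
    struct.unpack("=" + fmt, data)
    whole_buffer_ok = True
except struct.error:
    whole_buffer_ok = False

items = struct.unpack("=" + fmt, data[:standard_size])
print(native_size, standard_size, whole_buffer_ok, items)
```

So the complaint about length 17 says only that 'sds' describes 17 bytes on that machine while the whole file was passed in.
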
>>>
>>>Consider the first "line"
>>>
>>>TIME      1  F  0.0
>>>
>>>There were (I assume)  4 FORTRAN variables written here: character 
>>>integer character float. Without knowing the lengths of the 
>>>character variables we are at a loss as to what the struct format 
>>>should be. Do you know their lengths? Is the last float or double?
>>>
>>>Try this: print data[:40]. You should see something like:
>>>
>>>TIME...\x01\x00\x00\x00...F...\x00\x00\x00\x00...DISTANCE...\n\x00\x00\x00
>>>
>>>where ... means 0 or more intervening stuff. It might be that the 
>>>\x01 and the \n are in other places, as we also have to deal with 
>>>"byte order" issues.
>>>
>>>Please do this and report back your results. And also the FORTRAN 
>>>variable types if you have access to them.
>>
>>Apologies if this is getting a bit messy but the files are at a 
>>remote location and I forgot to bring copies home.  I don't have 
>>access to the original FORTRAN program so I tried to emulate 
>>reading the data using the Python script below.  AFAIK the FORTRAN 
>>format line for the header is  (1X, 1X, A8, 1X, 1X, I6, 1X, 1X, 
>>A1).  If the data following is a float it is written using n(1X, 
>>F6.2) where n is the number of records picked up from the preceding header.
>>
>># test program to read binary data
>>
>>import struct
>>
>># create dummy data
>>
>>data = []
>>for i in range(0,10):
>>     data.append(float(i))
>>
>># write data to binary file
>>
>>b_file = open("test.bin","wb")
>>
>>b_file.write("  %8s  %6d  %1s\n" % ("DISTANCE", len(data), "F"))
>>for x in data:
>>     b_file.write(" %6.2f" % x)
>
>You are still confusing text vs binary. The above writes text 
>regardless of the file mode. If the FORTRAN file was written 
>UNFORMATTED then you are NOT emulating that with the above program. 
>The character data is read back in just fine, since there is no 
>translation involved in the writing nor in the reading. The integer 
>len(data) is being written as its text (character) representation 
>(translating binary to text) but being read back in without 
>translation. Also all the floating point data is going out as text.
>
>The file looks like (where b = blank and \n = newline) (roughly how 
>it would look in notepad):
>
>bbDISTANCEbbbbbb10bbF\nbbb0.00bbb1.00bbb2.00
>
>If you analyze this with 2s8s2si2s1s you will see 2s matches bb, 8s 
>matches DISTANCE, 2s matches bb, i matches bbbb 
>(\x20\x20\x20\x20), 2s matches 10 and 1s matches b. The i tells 
>unpack to shove those 4 bytes unaltered into a Python integer, 
>resulting in 538976288. You can verify that:
>
> >>> struct.unpack('i', '    ')
>(538976288,)
>
>Please either assure me you understand or are prepared for a more in 
>depth tutorial.
>>b_file.close()
>>
>># read back data from file
>>
>>c_file = open("test.bin","rb")
>>
>>data = c_file.read()
>>start, stop = 0, struct.calcsize("2s8s2si2s1s")
>>
>>items = struct.unpack("2s8s2si2s1s",data[start:stop])
>>print items
>>print data[:40]
>>
>>I'm pretty sure that when I tried this at the other PC there were a 
>>bunch of \x00\x00 characters in the file but they don't appear in 
>>NotePad  ... anyway, I thought the Python above would unpack the 
>>data but items appears as
>>
>>('  ', 'DISTANCE', '  ', 538976288, '10', ' ')
>>
>>which seems to contain an extra item (538976288)
>>
>>Alun Griffiths
>
>
>--
>Bob Gailer
>Chapel Hill NC
>919-636-4239
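
For comparison, a sketch of what an UNFORMATTED header record might look like and how it could be read in Python 3. Everything about the layout here is an assumption, since the real files have not been inspected: each record is wrapped in 4-byte length markers (common for sequential unformatted output, but compiler dependent), the byte order is little-endian, and the header holds a CHARACTER*8 name, a 4-byte INTEGER and a CHARACTER*1 type code, followed by a record of 4-byte REALs. The helper names write_record and read_record are made up for the example.

```python
import io
import struct

def write_record(stream, payload):
    """Write payload wrapped in the assumed 4-byte length markers."""
    marker = struct.pack("<i", len(payload))
    stream.write(marker + payload + marker)

def read_record(stream):
    """Read one record and check that both length markers agree."""
    (length,) = struct.unpack("<i", stream.read(4))
    payload = stream.read(length)
    (trailer,) = struct.unpack("<i", stream.read(4))
    if trailer != length:
        raise ValueError("record markers disagree; layout assumption is wrong")
    return payload

# Build a stand-in for the binary file in memory.
values = [float(i) for i in range(10)]
stream = io.BytesIO()
write_record(stream, struct.pack("<8si1s", b"DISTANCE", len(values), b"F"))
write_record(stream, struct.pack("<%df" % len(values), *values))
stream.seek(0)

# Read the header, then use its count to decode the data record.
name, count, kind = struct.unpack("<8si1s", read_record(stream))
data = struct.unpack("<%df" % count, read_record(stream))
print(name, count, kind, data)
```

If the first four bytes of a real file do not decode to a plausible record length, trying ">" (big-endian) in place of "<" is a reasonable next step.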


