[Tutor] reading binary files
eShopping
etrade.griffiths at dsl.pipex.com
Wed Feb 4 07:58:32 CET 2009
Bob
I am trying to read UNFORMATTED files. The files also occur as
formatted files and the format string I provided is the string used
to write the formatted version. I can read the formatted version
OK. I (naively) assumed that the same format string was used for
both files, the only differences being whether the FORTRAN WRITE
statement indicated unformatted or formatted.
Best regards
Alun Griffiths
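
For readers following the thread: an UNFORMATTED write does not use the format string at all; the values go to disk in their binary form. With many compilers, each record of an unformatted sequential file is also framed by a length marker before and after the data. The sketch below assumes 4-byte little-endian markers, which is an assumption about the compiler and platform, not something confirmed in this thread. The names read_unformatted_records and frame, the field layout and the sample bytes are all hypothetical and only there for illustration.

```python
import struct

def read_unformatted_records(raw, marker_format="<i"):
    """Split raw bytes into records framed by length markers.

    Assumes each record is stored as <length><payload><length>, with the
    marker layout given by marker_format (an assumption, not a given).
    """
    marker_size = struct.calcsize(marker_format)
    records = []
    pos = 0
    while pos < len(raw):
        (length,) = struct.unpack_from(marker_format, raw, pos)
        start = pos + marker_size
        end = start + length
        (trailer,) = struct.unpack_from(marker_format, raw, end)
        if trailer != length:
            raise ValueError("record markers disagree at offset %d" % pos)
        records.append(raw[start:end])
        pos = end + marker_size
    return records

def frame(payload, marker_format="<i"):
    """Wrap a payload in leading and trailing length markers (test helper)."""
    marker = struct.pack(marker_format, len(payload))
    return marker + payload + marker

# Synthetic stand-in for a real file: a header record and a data record.
header = struct.pack("<8si1s", b"TIME".ljust(8), 1, b"F")
values = struct.pack("<1f", 0.0)
raw = frame(header) + frame(values)

records = read_unformatted_records(raw)
name, count, type_code = struct.unpack("<8si1s", records[0])
(time_value,) = struct.unpack("<%df" % count, records[1])
```

If the first four bytes of a real file do not decode to a plausible record length, trying ">i" (big-endian) for the markers is a reasonable next step.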
At 21:41 03/02/2009, bob gailer wrote:
>First question: are you trying to work with the file written
>UNFORMATTED? If so read on.
>
>If you are working with a file formatted (1X, 1X, A8, 1X, 1X, I6,
>1X, 1X, A1) then we have a completely different issue to deal with.
>Do not read on, instead let us know.
>
>eShopping wrote:
>
>>>>Data format:
>>>>
>>>>TIME 1 F 0.0
>>>>DISTANCE 10 F 0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0
>>>>
>>>>F=float, D=double, L=logical, S=string etc
>>>>
>>>>
>>>>The first part of the file should contain a string (eg "TIME"),
>>>>an integer (1) and another string (eg "F") so I tried using
>>>>
>>>>import struct
>>>>in_file = open(file_name+".dat","rb")
>>>>data = in_file.read()
>>>>items = struct.unpack('sds', data)
>>>>
>>>>Now I get the error
>>>>
>>>>error: unpack requires a string argument of length 17
>>>>
>>>>which has left me completely baffled!
>>>
>>>Did you open the file with mode 'b'? If not change that.
>>>
>>>You are passing the entire file to unpack when you should be
>>>giving it only the first "line". That's why it is complaining
>>>about the length. We need to figure out the lengths of the lines.
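
To illustrate the length complaint: struct.unpack insists that the data be exactly as long as the format describes, and struct.calcsize reports that length. With native alignment, 'sds' is one byte, padding up to the double's alignment, an 8-byte double and one more byte, which comes to 17 on platforms that align doubles to 8 bytes. The sketch below uses a hypothetical header layout (8-character name, 4-byte little-endian integer, 1-character type code); the real field widths are not known at this point in the thread.

```python
import struct

# Hypothetical layout, for illustration only. "<" means little-endian
# with no padding between fields.
HEADER_FORMAT = "<8si1s"
header_size = struct.calcsize(HEADER_FORMAT)  # 8 + 4 + 1 bytes

# Stand-in for the contents of a real file: one header plus more bytes.
data = struct.pack(HEADER_FORMAT, b"TIME".ljust(8), 1, b"F") + b"rest of file"

# Passing the whole file fails because the length does not match.
try:
    struct.unpack(HEADER_FORMAT, data)
    whole_file_ok = True
except struct.error:
    whole_file_ok = False

# Passing exactly header_size bytes works.
name, count, type_code = struct.unpack(HEADER_FORMAT, data[:header_size])
```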
>>>
>>>Consider the first "line"
>>>
>>>TIME 1 F 0.0
>>>
>>>There were (I assume) 4 FORTRAN variables written here: character
>>>integer character float. Without knowing the lengths of the
>>>character variables we are at a loss as to what the struct format
>>>should be. Do you know their lengths? Is the last float or double?
>>>
>>>Try this:
>>>
>>>print data[:40]
>>>
>>>You should see something like:
>>>
>>>TIME...\x01\x00\x00\x00...F...\x00\x00\x00\x00...DISTANCE...\n\x00\x00\x00
>>>
>>>where ... means 0 or more intervening stuff. It might be that the
>>>\x01 and the \n are in other places, as we also have to deal with
>>>"byte order" issues.
>>>
>>>Please do this and report back your results. And also the FORTRAN
>>>variable types if you have access to them.
>>
>>Apologies if this is getting a bit messy but the files are at a
>>remote location and I forgot to bring copies home. I don't have
>>access to the original FORTRAN program so I tried to emulate
>>reading the data using the Python script below. AFAIK the FORTRAN
>>format line for the header is (1X, 1X, A8, 1X, 1X, I6, 1X, 1X,
>>A1). If the data following is a float it is written using n(1X,
>>F6.2) where n is the number of records picked up from the preceding header.
>>
>># test program to read binary data
>>
>>import struct
>>
>># create dummy data
>>
>>data = []
>>for i in range(0,10):
>> data.append(float(i))
>>
>># write data to binary file
>>
>>b_file = open("test.bin","wb")
>>
>>b_file.write("  %8s  %6d  %1s\n" % ("DISTANCE", len(data), "F"))
>>for x in data:
>> b_file.write(" %6.2f" % x)
>
>You are still confusing text vs binary. The above writes text
>regardless of the file mode. If the FORTRAN file was written
>UNFORMATTED then you are NOT emulating that with the above program.
>The character data is read back in just fine, since there is no
>translation involved in the writing nor in the reading. The integer
>len(data) is being written as its text (character) representation
>(translating binary to text) but being read back in without
>translation. Also all the floating point data is going out as text.
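
The difference between the two representations can be seen side by side. This is a small sketch, assuming a 4-byte little-endian integer for the binary case (the actual size and byte order in the FORTRAN file are not known here):

```python
import struct

count = 10

# Text representation: what "%6d" produces, six ASCII characters.
as_text = ("%6d" % count).encode("ascii")

# Binary representation: four bytes; 10 happens to be the newline byte.
as_binary = struct.pack("<i", count)

# Each form has to be read back with the matching method.
from_text = int(as_text.decode("ascii"))
(from_binary,) = struct.unpack("<i", as_binary)
```

Here as_text is six characters ('    10') while as_binary is the four bytes '\n\x00\x00\x00', which is why the same struct format cannot read both files.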
>
>The file looks like (where b = blank) (how it would look in notepad):
>
>bbDISTANCEbbbbbb10bFbbb0.00bbb1.00bbb2.00
>
>If you analyze this with 2s8s2si2s1s you will see 2s matches bb, 8s
>matches DISTANCE, 2s matches bb, i matches bbbb (\x20\x20\x20\x20).
>The i tells unpack to shove those 4 bytes unaltered into a Python
>integer, resulting in 538976288. You can verify that:
>
> >>> struct.unpack('i', '    ')
>(538976288,)
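
The same check, together with one way to read the text header instead, is sketched below. The column positions follow the header format quoted in this thread (1X, 1X, A8, 1X, 1X, I6, 1X, 1X, A1); the variable names are only illustrative.

```python
import struct

# Four ASCII blanks are the bytes 0x20 0x20 0x20 0x20.
(value,) = struct.unpack("<i", b"    ")
print(value)       # 538976288
print(hex(value))  # 0x20202020

# A text header built the same way as in the test script above.
header = ("  %8s  %6d  %1s" % ("DISTANCE", 10, "F")).encode("ascii")

# Slice the fixed-width columns and convert the text, rather than
# reinterpreting the bytes with struct.
name = header[2:10].decode("ascii")
count = int(header[12:18].decode("ascii"))
type_code = header[20:21].decode("ascii")
```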
>
>Please either assure me you understand or tell me you are prepared
>for a more in-depth tutorial.
>>b_file.close()
>>
>># read back data from file
>>
>>c_file = open("test.bin","rb")
>>
>>data = c_file.read()
>>start, stop = 0, struct.calcsize("2s8s2si2s1s")
>>
>>items = struct.unpack("2s8s2si2s1s",data[start:stop])
>>print items
>>print data[:40]
>>
>>I'm pretty sure that when I tried this at the other PC there were a
>>bunch of \x00\x00 characters in the file but they don't appear in
>>NotePad ... anyway, I thought the Python above would unpack the
>>data but items appears as
>>
>>('  ', 'DISTANCE', '  ', 538976288, '10', ' ')
>>
>>which seems to contain an extra item (538976288)
>>
>>Alun Griffiths
>
>
>--
>Bob Gailer
>Chapel Hill NC
>919-636-4239