[Tutor] Reading binary files #2
eShopping
etrade.griffiths at dsl.pipex.com
Mon Feb 9 20:40:43 CET 2009
Hi Bob
some replies below. One thing I noticed with the "full" file was
that I ran into problems when the number of records was 10500, and
the file read got misaligned. Presumably 10500 is still within the
range of int?
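As a quick check of that last point, here is a minimal sketch (Python 3 syntax) that packs 10500 as the 4-byte big-endian integer used elsewhere in this thread. It only rules out integer overflow; it does not explain the misalignment itself.

```python
import struct

# '>i' is a 4-byte signed big-endian integer
packed = struct.pack('>i', 10500)
(value,) = struct.unpack('>i', packed)

# Largest value a signed integer of that size can hold
int_size = struct.calcsize('>i')
int_max = 2 ** (8 * int_size - 1) - 1

print(value, int_size, int_max)
```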
Best regards
Alun
At 17:49 09/02/2009, bob gailer wrote:
>etrade.griffiths at dsl.pipex.com wrote:
>>Hi
>>
>>following last week's discussion with Bob Gailer about reading
>>unformatted FORTRAN files, I have attached an example of the file
>>in ASCII format and the equivalent unformatted version.
>
>Thank you. It is good to have real data to work with.
>
>>Below is some code that works OK until it gets to a data item that
>>has no additional associated data, then seems to have got 4 bytes
>>ahead of itself.
>
>Thank you. It is good to have real code to work with.
>
>>I thought I had trapped this but it appears not. I think the issue
>>is associated with "newline" characters or the unformatted equivalent.
>>
>
>I think not, but we will see.
>
>I fail to see where the problem is. The data printed below seems to
>agree with the files you sent. What am I missing?
When I run the program it exits in the middle but should run through
to the end. The output to the console was
236 ('\x00\x00\x00\x10', 'DATABEGI', 0, 'MESS',
'\x00\x00\x00\x10\x00\x00\x00\x10')
264 ('TIME', ' \x00\x00\x00\x01', 1380270412, '\x00\x00\x00\x10',
'\x00\x00\x00\x04\x00\x00\x00\x00')
Here "TIME" is in vals[0] when it should be in vals[1] and so on. I
found the problem earlier today and I re-wrote the main loop as
follows (before I saw your helpful coding style comments):
while stop < nrec:
    # extract data structure
    start, stop = stop, stop + struct.calcsize('4s8si4s4s')
    vals = struct.unpack('>4s8si4s4s', data[start:stop])
    items.extend(vals[1:4])
    print stop, vals
    # define format of subsequent data
    nval = int(vals[2])
    if vals[3] == 'INTE':
        fmt_string = '>i'
    elif vals[3] == 'CHAR':
        fmt_string = '>8s'
    elif vals[3] == 'LOGI':
        fmt_string = '>i'
    elif vals[3] == 'REAL':
        fmt_string = '>f'
    elif vals[3] == 'DOUB':
        fmt_string = '>d'
    elif vals[3] == 'MESS':
        fmt_string = '>%ds' % nval
    else:
        print "Unknown data type ... exiting"
        print items[-40:]
        sys.exit(0)
    # leading spaces
    if nval > 0:
        start, stop = stop, stop + struct.calcsize('4s')
        vals = struct.unpack('4s', data[start:stop])
    # extract data
    for i in range(0,nval):
        start, stop = stop, stop + struct.calcsize(fmt_string)
        vals = struct.unpack(fmt_string, data[start:stop])
        items.extend(vals)
    # trailing spaces
    if nval > 0:
        start, stop = stop, stop + struct.calcsize('4s')
        vals = struct.unpack('4s', data[start:stop])
Now I get this output
232 ('\x00\x00\x00\x10', 'DATABEGI', 0, 'MESS', '\x00\x00\x00\x10')
256 ('\x00\x00\x00\x10', 'TIME ', 1, 'REAL', '\x00\x00\x00\x10')
and the script runs to the end
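For reference, the layout this loop assumes can be sketched as a small self-contained reader. This is a Python 3 illustration, not the script from this thread: the names read_items, HEADER, MARKER and FORMATS are made up, and the sample buffer is synthetic, built to look like the two headers printed above. The MESS record shown has a count of 0, so the sketch does not try to read MESS data.

```python
import struct

HEADER = '>4s8si4s4s'   # marker, keyword, count, type, marker (24 bytes)
MARKER = '>4s'          # 4-byte marker before and after each data block
FORMATS = {'INTE': '>i', 'CHAR': '>8s', 'LOGI': '>i', 'REAL': '>f', 'DOUB': '>d'}


def read_items(data):
    """Return a list of (keyword, type, values) tuples from a byte string."""
    items = []
    stop = 0
    while stop < len(data):
        start, stop = stop, stop + struct.calcsize(HEADER)
        _, name, nval, typ, _ = struct.unpack(HEADER, data[start:stop])
        typ = typ.decode('ascii')
        values = []
        if nval > 0:
            stop += struct.calcsize(MARKER)      # skip the leading marker
            fmt = FORMATS[typ]
            for _ in range(nval):
                start, stop = stop, stop + struct.calcsize(fmt)
                values.extend(struct.unpack(fmt, data[start:stop]))
            stop += struct.calcsize(MARKER)      # skip the trailing marker
        items.append((name.decode('ascii').strip(), typ, values))
    return items


# Synthetic two-record buffer (made up for illustration, not the real file)
mark16 = struct.pack('>i', 16)
mark4 = struct.pack('>i', 4)
sample = (
    mark16 + struct.pack('>8si4s', b'DATABEGI', 0, b'MESS') + mark16
    + mark16 + struct.pack('>8si4s', b'TIME    ', 1, b'REAL') + mark16
    + mark4 + struct.pack('>f', 1.5) + mark4
)
result = read_items(sample)
print(result)
```

The point of the sketch is that the two extra 4-byte markers are only skipped when the count is greater than zero, which is what keeps a zero-count item such as DATABEGI from pushing the reader 4 bytes ahead.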
>FWIW a few observations re coding style and techniques.
>
>1) put the formats in a dictionary before the while loop:
>formats = {'INTE': '>i', 'CHAR': '>8s', 'LOGI': '>i', 'REAL': '>f',
>'DOUB': '>d', 'MESS': '>d'}
>
>2) retrieve the format in the while loop from the dictionary:
>format = formats[vals[3]]
Neat!!
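A minimal sketch of how the dictionary can replace the if/elif chain while still catching unknown type keywords. The helper name lookup_format is hypothetical, and MESS is left out here because its format is discussed further down.

```python
formats = {'INTE': '>i', 'CHAR': '>8s', 'LOGI': '>i', 'REAL': '>f', 'DOUB': '>d'}


def lookup_format(typ):
    """Return the struct format for a type keyword, or raise ValueError."""
    fmt = formats.get(typ)
    if fmt is None:
        raise ValueError('Unknown data type: %r' % typ)
    return fmt


print(lookup_format('REAL'))
```

Raising an exception instead of calling sys.exit() is a design choice of the sketch; the caller can still decide to print the last items and exit.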
>3) condense the 3 infile lines:
>data = open("test.bin","rb").read()
I still don't quite trust myself to "chain" functions together, but I
guess that's lack of practice
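For comparison, a sketch of an unchained variant that reads the whole file and closes it explicitly. The helper name read_binary is made up; the chained one-liner gives the same data but leaves closing the file to the interpreter.

```python
def read_binary(path):
    """Read the whole file as bytes and close it afterwards."""
    with open(path, 'rb') as infile:
        return infile.read()
```

Usage would be along the lines of data = read_binary("test.bin").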
>4) nrec is a misleading name (to me it means # of records), nbytes
>would be better.
Agreed
>5) Be consistent with the format between calcsize and unpack:
>struct.calcsize('>4s8si4s8s')
>
>6) Use meaningful variable names instead of val for the unpacked data:
>blank, name, length, typ = struct.unpack ... etc
Will do
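Points 5 and 6 can be sketched together: keep one format string for both calcsize and unpack, and unpack straight into named variables. This is Python 3, the header bytes are made up to look like the DATABEGI line printed earlier, and the names lead and trail for the two 4-byte markers are my own.

```python
import struct

HEADER_FMT = '>4s8si4s4s'   # one format string, used for size and unpack
header_size = struct.calcsize(HEADER_FMT)

# Made-up header bytes shaped like the DATABEGI line
raw = struct.pack(HEADER_FMT, b'\x00\x00\x00\x10', b'DATABEGI', 0, b'MESS',
                  b'\x00\x00\x00\x10')

lead, name, length, typ, trail = struct.unpack(HEADER_FMT, raw[:header_size])

# Why the prefix matters: without '>' struct uses native alignment and may
# insert padding, so sizes can differ from the '>' version for some formats.
packed_size = struct.calcsize('>ci')   # 1 + 4 bytes, no padding
native_size = struct.calcsize('ci')    # platform dependent, often 8

print(header_size, name, length, typ, packed_size, native_size)
```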
>7) The format for MESS should be '>d' rather than '>%dd' % nval.
>When nval is 0 the for loop will make 0 cycles.
Wasn't sure about that one. "MESS" implies string but I wasn't sure
what to do about a zero-length string
>8) You don't have a format for DATA (BEGI); therefore the prior
>format (for CHAR) is being applied. The formats are the same so it
>does not matter but could be confusing later.
DATABEGI should be a keyword to indicate the start of the "proper"
data which has format MESS (i.e. string). You did make me look again
at the MESS format and it should be '>%ds' % nval and not '>%dd' % nval.
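On the zero-length string question, a small Python 3 sketch suggests that struct copes with a count of 0: the format '>0s' has size 0 and unpacks to a single empty string. The helper name mess_format is made up for the example.

```python
import struct


def mess_format(nval):
    """Format for a string of nval bytes, e.g. '>5s'."""
    return '>%ds' % nval


empty_size = struct.calcsize(mess_format(0))
(empty,) = struct.unpack(mess_format(0), b'')
(text,) = struct.unpack(mess_format(5), b'hello')

print(empty_size, empty, text)
```

Note that with this format a single unpack call reads all nval bytes as one string, so it would not need to be repeated nval times.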