Read header and data from a binary file [LONG]

Ishwor ishwor.gurung at
Wed Sep 23 16:10:29 CEST 2009


Note: I've worked with struct, but that was a while ago, so I might be a
bit rusty. Also, this sounds a bit like homework. If it is, please do
it yourself (or at least try), as otherwise you'd never learn the
knowledge behind it for real-world scenarios :-)

Having said that, I've included an example below in addition to my inline replies.

> import struct
> name = "myaudio.dat"
> f = open(name,'rb')
> [ ... ]

That line is not explicitly needed afaik, but you can keep it if you want to.

> chain = "< 4s 4s I 4s I 20s I I i 4s I 67s s 4s I"
> s = f.read(4*1+4*1+4*1+4*1+4*1+20*1+
>            4*1+4*1+4*1+4*1+4*1+67*1+1+4*1+4*1)

which is 136 bytes.
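
Rather than adding the field widths up by hand, the struct module can report the size of a format string. A minimal sketch, using the format string quoted above (the name `header_size` is just for illustration):

```python
import struct

# Format string from the quoted code; '<' selects little-endian byte order
# with standard sizes and no alignment padding.
chain = "< 4s 4s I 4s I 20s I I i 4s I 67s s 4s I"

# calcsize() returns the number of bytes the format describes.
header_size = struct.calcsize(chain)
print(header_size)
```

The result can then be passed to f.read() directly, which keeps the read length and the format string in sync.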

> a = struct.unpack(chain, s)

Yep. That unpacks the 136 bytes of `s' into the tuple `a', little-endian,
according to chain.
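
To see what unpack does on a small scale, here is a sketch with a made-up two-field layout (the format `<4sI` and the sample values are for illustration only, not your real header). It also shows that unpack insists on a buffer of exactly the size the format describes:

```python
import struct

fmt = "<4sI"                              # hypothetical layout: 4-byte tag + unsigned 32-bit int
packed = struct.pack(fmt, b"HEDR", 32)    # 8 bytes of test data
fields = struct.unpack(fmt, packed)       # one tuple item per format code

# A buffer that is one byte too long is rejected with struct.error.
try:
    struct.unpack(fmt, packed + b"\x00")
    length_checked = False
except struct.error:
    length_checked = True
```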

> header = {'identifier' : a[0],
>           'cid'        : a[1],
>           'clength'    : a[2],
>           'hident'     : a[3],
>           'hcid32'     : a[4],
>           'hdate'      : a[5],
>           'sampling'   : a[6],
>           'length_B'   : a[7],
>           'max_cA'     : a[8],
>           'max_cA1'    : a[9],
>           'identNOTE'  : a[10],
>           'c2len'      : a[11],}
> It produces:
> {'length_B': 150001, 'sampling': 50000, 'max_cA1': 'NOTE', 'hident': 'HEDR',
> 'c2len': "Normal Sustained Vowel 'A', Voice and Speech Lab., MEEI, Boston,
> MA", 'hdate': 'Jul 13 11:57:41 1994', 'identNOTE': 68, 'max_cA': -44076,
> 'cid': 'DS16', 'hcid32': 32, 'identifier': 'FORM', 'clength': 300126}
> So far when I run f.tell()
> 136L

tell() gives you the current position in the file. You have read 136
bytes so far, so tell() reports 136 as the current offset into the
binary file.
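
A small sketch of how the position moves, using an in-memory io.BytesIO object as a stand-in for your real file (the 200 zero bytes are dummy data):

```python
import io

f = io.BytesIO(bytes(200))   # stand-in for open(name, 'rb'); 200 dummy bytes

start = f.tell()             # position before any read
f.read(136)                  # reading advances the position by 136
after_header = f.tell()

f.seek(8, io.SEEK_CUR)       # skip 8 more bytes relative to the current position
after_skip = f.tell()
```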

> The audio data length is 300126, now I need a clue to build an array with
> the audio data (The Chunk SDA_), would it possible with struct?, any help ?

clength above is 300126. Maybe you can use that to get Data? :-)

About SDA_'s format: does it mean the data runs from offset 8 through to EOF?

If it starts 8 bytes after the header, then what is stored between
lengthOf(header) and lengthOf(header)+8?
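
One guess, going only by the output quoted above: 'HEDR' is followed by 32 and 'NOTE' by 68, which looks like a 4-byte id followed by a 4-byte little-endian length. If SDA_ follows the same pattern (an assumption on my part, check your spec), those 8 bytes would be the chunk id and its length. A sketch with a hypothetical helper `read_chunk_header` and made-up test data:

```python
import io
import struct

def read_chunk_header(f):
    """Read a 4-byte id and a 4-byte little-endian length (assumed layout)."""
    raw = f.read(8)
    if len(raw) < 8:          # not enough bytes left for another header
        return None
    chunk_id, length = struct.unpack("<4sI", raw)
    return chunk_id, length

# Made-up chunk: id 'SDA_', length 6, then 6 payload bytes.
demo = io.BytesIO(struct.pack("<4sI", b"SDA_", 6) + b"\x01\x02\x03\x04\x05\x06")
chunk_id, length = read_chunk_header(demo)
payload = demo.read(length)   # read exactly as many bytes as the header announces
```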

In any case, as I understand it, to get all the values from offset 8
onwards (called `Data' in your protocol spec), you can do:

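reading_after_136_file_pos_to_eof = f.read()  # continue from 136L above
clength = header['clength']
clen_fs = '<%ds' % clength  # I assume here that the data is plain characters
# start at index 8; unpack needs exactly clength bytes, so slice to that length
x = struct.unpack(clen_fs, reading_after_136_file_pos_to_eof[8:8 + clength])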

Now `x' will hold the unpacked value of
reading_after_136_file_pos_to_eof starting from the 8th byte: a
one-element tuple whose string is 300126 characters long (1 byte each,
so 300126 bytes), assuming each char is 1 byte, as per the struct
module's definition.
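
If what you want is an array of numbers rather than one long string, and if the samples turn out to be 16-bit signed little-endian integers (that is purely my assumption, your spec has the final word), the same approach works with the `h' format code. A sketch on made-up data:

```python
import struct

# Made-up stand-in for the raw audio bytes: three 16-bit signed samples.
raw = struct.pack("<3h", -100, 0, 1200)

# Two bytes per sample (assumed); drop any odd trailing byte before unpacking.
n_samples = len(raw) // 2
samples = struct.unpack("<%dh" % n_samples, raw[:n_samples * 2])
```

Here `samples' is a tuple of ints, one per sample; list(samples) gives you a list if you need to modify it.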

[ ... ]

Ishwor Gurung
