Fast reading and unpacking of binary data (struct module)

Wed Jul 22 06:34:16 EDT 2009

Daniel Platz <mail.to.daniel.platz <at> googlemail.com> writes:

> 
> Hi,
> 
> I have a Python newbie question about reading data from a binary file.
> I have a huge binary file from an external program.

Please translate "huge" into MB. Also, how much free real memory do you have?

> I want to read
> and process the data in this file in a reasonable time. It turns out
> that the reading of the data itself and the processing do not need
> most of the time. However, when using the read(bytes) method Python
> returns a string representing the binary information in hex.

"In hex" is not part of the string's nature. If you look at it through
hexadecimal/decimal/octal/binary spectacles, you'll see respectively
hex/decimal/octal/binary.
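
A small sketch of what I mean, assuming Python 3 (where a binary read()
returns a bytes object) and two made-up sample bytes. The \x escapes are only
how repr() displays the data; the object itself is just two bytes:

```python
import struct

# Two made-up bytes, standing in for what f.read(2) might return from a
# file opened in 'rb' mode.
raw = b"\x01\xff"

# The object holds two bytes; repr() merely shows non-printable bytes
# as \x escapes.
length = len(raw)      # number of bytes actually stored
values = list(raw)     # the same bytes viewed as decimal integers
as_hex = raw.hex()     # the same bytes viewed as a hex string

# The same two bytes interpreted as one signed short. Little-endian ('<')
# is an assumption here; use '>' if the file is big-endian.
(short_value,) = struct.unpack("<h", raw)
```

Same data each time; only the spectacles change.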

> This
> string I have to "cast/translate" into a number (in my case a signed
> short). For this I am using the method struct.unpack from the struct
> module. This unpacking part of the program takes by far the most time.
> Is there a way to speed this up or to do the unpacking more
> cleverly than with the struct module?

Are you reading the file (a) two bytes at a time (b) several KB at a time (c)
the whole file at once?

Are you unpacking it ... [same options as above but the two answers may differ]?
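
If the answer to either question is (a), the per-call overhead is probably what
hurts. A minimal sketch of option (b) for both reading and unpacking follows.
The function name read_shorts, the chunk size, and the little-endian byte order
are my assumptions, not anything from your post, and an in-memory buffer stands
in for your file:

```python
import io
import struct

def read_shorts(f, chunk_items=4096, byte_order="<"):
    """Yield signed shorts from binary file object f, one unpack call per chunk."""
    item_size = struct.calcsize(byte_order + "h")  # 2 bytes per short
    while True:
        chunk = f.read(chunk_items * item_size)
        n = len(chunk) // item_size
        if n == 0:
            break
        # A repeat count in the format string unpacks n shorts in one call;
        # a trailing odd byte at end of file is ignored.
        fmt = "%s%dh" % (byte_order, n)
        yield from struct.unpack(fmt, chunk[:n * item_size])

# Stand-in for a real file opened with open(path, 'rb').
sample = io.BytesIO(struct.pack("<5h", 1, -2, 300, -32768, 32767))
shorts = list(read_shorts(sample, chunk_items=2))
```

The tiny chunk_items=2 is only there to exercise the loop; for a real file a
few thousand items per chunk is a more sensible starting point to measure from.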

What are you planning to do with this large list/array? The array option that
Gabriel mentioned should be fastest to load, and take up least memory, but
working with the elements will require making int objects out of the shorts (and
the reverse if you are updating the array).
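
A rough sketch of the array approach, assuming Python 3, a little-endian file,
and a platform where typecode 'h' is two bytes (true on the usual ones). The
sample data is made up and an in-memory buffer stands in for the file:

```python
import array
import io
import struct
import sys

# Stand-in for a real file opened with open(path, 'rb').
sample = io.BytesIO(struct.pack("<4h", 10, -20, 30, -40))

# Load all the shorts in one go; no per-item unpack calls.
shorts = array.array("h")
shorts.frombytes(sample.read())

# The array uses the machine's native byte order, so swap if the machine
# is not little-endian like the (assumed) file.
if sys.byteorder != "little":
    shorts.byteswap()

# Each element access creates an int object from the stored short ...
total = sum(shorts)
# ... and each assignment converts an int back into a short.
shorts[0] = 11
```

With a real file object, shorts.fromfile(f, n) reads n items directly and saves
the intermediate bytes object.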

Is numpy (http://numpy.scipy.org/) a possibility?

Cheers,

John