numpy.memmap advice?

sturlamolden sturlamolden at
Thu Feb 19 01:23:32 CET 2009

On 18 Feb, 00:08, Lionel < at> wrote:

> 1) What is "recarray"?

An ndarray of what C programmers know as a "struct", in which each
field is accessible by its name.

That is,

struct rgba {
  unsigned char r;
  unsigned char g;
  unsigned char b;
  unsigned char a;
};

struct rgba arr[480][640];

is similar to:

import numpy as np
rgba = np.dtype({'names': list('rgba'), 'formats': [np.uint8]*4})
arr = np.zeros((480, 640), dtype=rgba)

Now you can access the r, g, b and a fields directly using arr['r'],
arr['g'], arr['b'], and arr['a'].
Internally the data will be represented compactly as with the C code
above. If you want to view the data as a 480 x 640 array of 32-bit
integers instead, it is as simple as arr.view(dtype=np.uint32).
Formatted binary data can of course be read from files using
np.fromfile with the specified dtype, and written to files by passing
a recarray as a buffer to file.write. You can thus see NumPy's
recarrays as a more powerful alternative to Python's struct module.
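
To make the points above concrete, here is a minimal sketch that builds the
rgba array, reads a field, reinterprets the records as 32-bit integers, and
round-trips the data through a binary file. The file name pixels.raw and the
temporary directory are assumptions made for illustration only, and
arr.tofile is used for writing as one possible alternative to file.write.

```python
import os
import tempfile

import numpy as np

# Structured dtype with four unsigned 8-bit fields, laid out like the C struct.
rgba = np.dtype({'names': list('rgba'), 'formats': [np.uint8] * 4})

arr = np.zeros((480, 640), dtype=rgba)
arr['r'] = 255   # set the red field of every record
arr['a'] = 128   # set the alpha field of every record

# Reinterpret each 4-byte record as one 32-bit integer; no data is copied.
packed = arr.view(np.uint32)

# Write the raw records to a file (hypothetical name) and read them back.
path = os.path.join(tempfile.mkdtemp(), 'pixels.raw')
arr.tofile(path)
loaded = np.fromfile(path, dtype=rgba).reshape(480, 640)

# Viewing the same memory as np.recarray adds attribute-style field access.
rec = arr.view(np.recarray)
red = rec.r
```

The integer values in packed depend on the byte order of the machine, so the
sketch does not rely on them.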

> I don't really see in the documentation how portions are loaded, however.

Prior to Python 2.6, the mmap object (which numpy.memmap uses
internally) did not take an offset parameter. Once NumPy is ported to
a newer version of Python, this will be fixed. You should then be able
to memory map an ndarray starting from a certain offset in the file.
To make this work now, you must e.g. backport mmap from Python 2.6 and
use that with NumPy. Not difficult, but nobody has bothered to do it
(as far as I know).
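
For readers on a current NumPy release, numpy.memmap does accept offset and
shape arguments, so a portion of a file can be mapped directly. The sketch
below is only an illustration under assumed conditions: the 16-byte header,
the file name image.bin and the choice of rows 100 to 109 are all
hypothetical.

```python
import os
import tempfile

import numpy as np

rgba = np.dtype({'names': list('rgba'), 'formats': [np.uint8] * 4})

# Hypothetical layout: a 16-byte header followed by 480 x 640 rgba records.
header_size = 16
path = os.path.join(tempfile.mkdtemp(), 'image.bin')

pixels = np.zeros((480, 640), dtype=rgba)
pixels['g'][100, 200] = 7   # mark one record so it can be found again
with open(path, 'wb') as f:
    f.write(b'\0' * header_size)
    f.write(pixels.tobytes())

# Map only rows 100..109: skip the header plus the first 100 rows.
rows = 10
start = header_size + 100 * 640 * rgba.itemsize
mm = np.memmap(path, dtype=rgba, mode='r', offset=start, shape=(rows, 640))

# Row 0 of the mapped window corresponds to row 100 of the full image.
value = int(mm['g'][0, 200])
```

Only the mapped window is addressable through mm; the operating system pages
the data in as it is touched.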

Sturla Molden
