numpy.memmap advice?

Carl Banks pavlovevidence at
Thu Feb 19 03:13:25 CET 2009

On Feb 18, 4:23 pm, sturlamolden <sturlamol... at> wrote:
> On 18 Feb, 00:08, Lionel < at> wrote:
> > 1) What is "recarray"?
> An ndarray of what C programmers know as a "struct", in which each
> field is accessible by its name.
> That is,
> struct rgba{
>   unsigned char r;
>   unsigned char g;
>   unsigned char b;
>   unsigned char a;
> };
> struct rgba arr[480][640];
> is similar to:
> import numpy as np
> rgba = np.dtype({'names': list('rgba'), 'formats': [np.uint8]*4})
> arr = np.zeros((480, 640), dtype=rgba)
> Now you can access the r, g, b and a fields directly using arr['r'],
> arr['g'], arr['b'], and arr['a'].
> Internally the data will be represented compactly as with the C code
> above. If you want to view the data as a 480 x 640 array of 32-bit
> integers instead, it is as simple as arr.view(dtype=np.uint32).
> Formatted binary data can of course be read from files using
> np.fromfile with the specified dtype, and written to files by passing
> a recarray as buffer to file.write. You can thus see NumPy's
> recarrays as a more powerful alternative to Python's struct module.
> > I don't really see in the documentation how portions are loaded, however.
> Prior to Python 2.6, the mmap object (which numpy.memmap uses
> internally) does not take an offset parameter. But when NumPy is
> ported to newer versions of Python, this will be fixed. You should then
> be able to memory-map an ndarray starting from a given offset. To make
> this work now, you must, e.g., backport mmap from Python 2.6 and use
> that with NumPy. Not difficult, but nobody has bothered to do it (as
> far as I know).
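
A runnable version of the structured-dtype example quoted above, as a minimal sketch assuming Python 3 and a current NumPy. It uses a small 2 x 3 image instead of 480 x 640, writes the raw bytes with ndarray.tofile (in place of file.write) and reads them back with np.fromfile; the file name is made up. Strictly speaking, np.zeros with a structured dtype returns a plain structured ndarray; np.recarray is a subclass that additionally exposes the fields as attributes, and a view of that type is shown at the end.

```python
import os
import tempfile

import numpy as np

# Structured dtype mirroring the C struct: four unsigned bytes per pixel.
rgba = np.dtype({'names': list('rgba'), 'formats': [np.uint8] * 4})

# Small 2 x 3 "image" (instead of 480 x 640) to keep the example quick.
arr = np.zeros((2, 3), dtype=rgba)
arr['r'] = 255        # fill the red channel of every pixel
arr['a'][0, 0] = 128  # set the alpha of one pixel

# Reinterpret the same memory as 32-bit integers (a view, no copy).
as_u32 = arr.view(np.uint32)

# Round-trip the raw bytes through a file using the same dtype.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'pixels.bin')  # illustrative file name
    arr.tofile(path)
    loaded = np.fromfile(path, dtype=rgba).reshape(arr.shape)

# np.recarray view: fields become attributes as well as keys.
rec = arr.view(np.recarray)
red_corner = rec.r[0, 0]
```

Note that the integer values seen through `as_u32` depend on the machine's byte order, while the byte layout of `arr` itself (r, g, b, a in sequence) does not.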

You can use an offset with numpy.memmap today; it'll mmap the whole
file, but start the array data at the given offset.
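
A minimal sketch of numpy.memmap with an offset, assuming Python 3 and a current NumPy. The file layout (a 16-byte header followed by ten float64 values) and the file name are made up for illustration. An offset of 16 bytes is not page-aligned, and numpy.memmap accepts it. (Current NumPy releases no longer map the whole file here: they round the offset down to mmap.ALLOCATIONGRANULARITY internally and pass that aligned value to mmap.)

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'data.bin')  # illustrative file name

    # Made-up layout: a 16-byte header followed by ten float64 values.
    header = b'HDR' + bytes(13)
    with open(path, 'wb') as f:
        f.write(header)
        f.write(np.arange(10, dtype=np.float64).tobytes())

    # Map read-only; the array data starts right after the header.
    mm = np.memmap(path, dtype=np.float64, mode='r',
                   offset=len(header), shape=(10,))
    first_three = np.array(mm[:3])  # copy into an ordinary ndarray
    total = float(mm.sum())
    del mm  # release the mapping before the directory is removed
```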

The offset parameter of mmap itself would be useful to map small
portions of gigabyte-sized files, and maybe numpy.memmap can take
advantage of that if the user passes an offset parameter. One thing
you can't do with mmap's offset, but you can do with numpy.memmap, is
to set it to an arbitrary value, since it has to be a multiple of
mmap.ALLOCATIONGRANULARITY (the page size on Unix, typically 4 KB;
64 KB on Windows).
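
To illustrate the alignment constraint with the standard mmap module, here is a sketch that maps only a small window of a file, assuming Python 3 and a current NumPy. `map_window` is a hypothetical helper, not part of NumPy or mmap: it rounds the requested byte offset down to a multiple of mmap.ALLOCATIONGRANULARITY, maps just enough bytes from there, and lets np.frombuffer skip the leading slack through its own `offset` argument. The test file and its name are made up.

```python
import mmap
import os
import tempfile

import numpy as np


def map_window(path, offset, count, dtype=np.float64):
    """Hypothetical helper: map `count` items of `dtype` starting at byte `offset`.

    mmap's own offset must be a multiple of mmap.ALLOCATIONGRANULARITY, so the
    mapping starts at the nearest aligned position at or below `offset`, and
    np.frombuffer skips the extra leading bytes.
    """
    dtype = np.dtype(dtype)
    start = offset - offset % mmap.ALLOCATIONGRANULARITY  # aligned mmap offset
    slack = offset - start                                 # bytes to skip
    length = slack + count * dtype.itemsize                # bytes actually mapped
    with open(path, 'rb') as f:
        # The mmap object keeps its own handle, so closing f is safe.
        mm = mmap.mmap(f.fileno(), length, access=mmap.ACCESS_READ, offset=start)
    # Read-only array backed by the mapping; it keeps `mm` alive.
    return np.frombuffer(mm, dtype=dtype, count=count, offset=slack)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'big.bin')  # illustrative file name
    np.arange(100_000, dtype=np.float64).tofile(path)  # ~800 KB test file

    # Byte offset of element 70_000; not a multiple of the granularity.
    window = map_window(path, offset=70_000 * 8, count=5)
    values = window.copy()
    del window  # drop the mapping before the directory is removed
```

Only a few kilobytes around the requested items are mapped, which is the benefit of mmap's offset for very large files; the price is the manual alignment bookkeeping that numpy.memmap otherwise hides.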

Carl Banks
