Large data arrays?

Thu Apr 23 15:30:04 EDT 2009

Ole Streicher <ole-usenet-spam at gmx.net> wrote:
>  Hi Nick,
> 
>  Nick Craig-Wood <nick at craig-wood.com> writes:
> > mmaps come out of your application's memory space, so out of that 3 GB
> > limit.  You don't need that much RAM of course but it does use up
> > address space.
> 
>  Hmm. So I have no chance to use >= 2 of these arrays simultaneously?
> 
> > Sorry don't know very much about numpy, but it occurs to me that you
> > could have two copies of your mmapped array, one the transpose of the
> > other which would then speed up the two access patterns enormously.
> 
>  That would be a solution, but it takes twice the amount of address
>  space (which seems already to be the limiting factor). In my case (1.6
>  GB per array), I could not even use one array.

You don't need them mapped at the same time so you could get away with
just one copy mapped.

Also, you can map the array in parts, so that only the part currently
mapped counts against your address space.
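
A minimal sketch of mapping in parts, assuming the array is stored as a
raw row-major file of float64 values. The helper name map_rows and the
file name demo.dat are made up for illustration; numpy.memmap and its
offset/shape arguments are the real API.

```python
import os
import tempfile

import numpy as np


def map_rows(path, n_cols, row_start, row_stop, dtype=np.float64):
    """Map only rows [row_start, row_stop) of a row-major 2-D array file.

    Hypothetical helper: it assumes the file holds nothing but the raw
    array data, so the byte offset of a row is row * n_cols * itemsize.
    """
    itemsize = np.dtype(dtype).itemsize
    offset = row_start * n_cols * itemsize  # byte offset of the first row wanted
    return np.memmap(path, dtype=dtype, mode="r",
                     offset=offset, shape=(row_stop - row_start, n_cols))


# Small demonstration file (8 x 4) standing in for the 1.6 GB array.
n_rows, n_cols = 8, 4
path = os.path.join(tempfile.mkdtemp(), "demo.dat")
full = np.memmap(path, dtype=np.float64, mode="w+", shape=(n_rows, n_cols))
full[:] = np.arange(n_rows * n_cols).reshape(n_rows, n_cols)
full.flush()
del full  # drop the full mapping

# Map just rows 2 and 3; only this window takes up address space.
window = map_rows(path, n_cols, 2, 4)
print(window)
```

Row-wise windows like this are cheap because each one is a contiguous
byte range in the file. A window of columns is not contiguous, so it
would need one mapping per row block or a different layout on disk.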

>  Also, I would need to fill two large files at program start: one for
>  each orientation (row-wise or column-wise). Depending on the input
>  data (which are also either row-wise or column-wise), the filling of
>  the array with opposite direction would take a lot of time because of
>  the inefficiencies.
> 
>  For that, using both directions would probably not be a good
>  solution. What I found is the "Morton layout" which uses a kind of
>  fractal interleaving and does not sound that complicated.

It sounds cool!

>  But I have no idea on how to turn it into a "numpy" style: can I
>  just extend from numpy.ndarray (or numpy.memmap), and which
>  functions/methods then need to be overridden? The best would of
>  course be that someone has already done this, so that I could use it
>  without falling into all the pitfalls that occur when one
>  implements a very generic algorithm.

I'd start by writing a function which takes (x, y) in array
co-ordinates and transforms them into the index z of that element in
the Morton layout.

Then instead of accessing array[x][y] you access
morton_array[f(x,y)].  That doesn't require any subclassing and is
relatively easy to implement. I'd try that and see if it works first!
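
Something along these lines might do for f(x, y). It is only a sketch:
it assumes both co-ordinates fit in 16 bits and that the array is padded
to a square whose side is a power of two, so that the Morton indices
fill the flat array without gaps. The names part1by1 and morton_index
are my own, not anything from numpy.

```python
import numpy as np


def part1by1(n):
    """Spread the low 16 bits of n so that they land on the even bit positions."""
    n &= 0x0000FFFF
    n = (n | (n << 8)) & 0x00FF00FF
    n = (n | (n << 4)) & 0x0F0F0F0F
    n = (n | (n << 2)) & 0x33333333
    n = (n | (n << 1)) & 0x55555555
    return n


def morton_index(x, y):
    """Interleave the bits of x (even positions) and y (odd positions)."""
    return part1by1(x) | (part1by1(y) << 1)


# Store a 4 x 4 array in a flat buffer using the Morton layout.
side = 4  # assumed to be a power of two
morton_array = np.zeros(side * side)
for x in range(side):
    for y in range(side):
        morton_array[morton_index(x, y)] = 10 * x + y

# Element (2, 3) is read back through the same index function.
print(morton_array[morton_index(2, 3)])
```

Calling a Python function per element is slow, so for real use the bit
interleaving would want to be done on whole numpy integer arrays at
once, but this is enough to check whether the access pattern helps.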

Alternatively you could install a 64-bit OS on your machine and use my
two-copy scheme!

-- 
Nick Craig-Wood <nick at craig-wood.com> -- http://www.craig-wood.com/nick