Large data arrays?
nick at craig-wood.com
Thu Apr 23 13:30:04 CEST 2009
Ole Streicher <ole-usenet-spam at gmx.net> wrote:
> for my application, I need to use quite large data arrays
> (100,000 x 4000 values) with floating point numbers where I need fast
> row-wise and column-wise access (main case: return a column with the sum
> over a number of selected rows, and vice versa).
> I would use the numpy array for that, but they seem to be
> memory-resident. So, one of these arrays would use about 1.6 GB of
> memory, which is far too much. So I was thinking about a memory-mapped
> file for that. As far as I understand, there is one in numpy.
> For this, I have two questions:
> 1. Are "numpy.memmap" arrays unlimited in size (i.e. limited only
> by the maximal file size)? And do they count against the system's
> memory limit (~3 GB for 32-bit systems)?
mmaps come out of your application's memory space, so out of that 3 GB
limit. You don't need that much RAM, of course, but it does use up
address space.
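A minimal sketch of the point above: a numpy.memmap is backed by a file, so the full array lives on disk while only the pages you touch occupy RAM, but the whole mapping still consumes address space. The dimensions and file path here are stand-ins for the 100,000 x 4000 array in the question.

```python
import os
import tempfile
import numpy as np

# Small stand-in dimensions for the 100,000 x 4000 array in question.
rows, cols = 1000, 400
path = os.path.join(tempfile.mkdtemp(), "data.dat")

# mode="w+" creates the backing file at its full size immediately.
a = np.memmap(path, dtype=np.float64, mode="w+", shape=(rows, cols))
a[:] = 1.0   # touched pages are written through to the file
a.flush()

# The file holds the whole array even though little RAM was needed:
# 1000 * 400 * 8 bytes = 3,200,000 bytes.
print(os.path.getsize(path))
```

Scaled up to 100,000 x 4000 float64 values, the same mapping is the ~3.2 GB file the original poster estimated, which is why it cannot fit in a 32-bit address space all at once.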
> 2. Since I need row-wise as well as column-wise access, a simple usage
> of a big array as memory mapped file will probably lead to a very poor
> performance, since one of them would need to read values scattered
> around the whole file. Are there any "plug and play" solutions for
> that? If not: what would be the best way to solve this problem?
> Probably, one needs to use something like the "Morton layout" for the
> data. Would one then build a subclass of memmap (or ndarray?) that
> implements this specific layout? How would one do that? (Sorry, I am
> still a beginner with respect to python).
Sorry, I don't know very much about numpy, but it occurs to me that you
could have two copies of your mmapped array, one the transpose of the
other which would then speed up the two access patterns enormously.
You needn't mmap the two arrays (files) at the same time either.
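The two-copies idea above can be sketched as follows. This is only an illustration with tiny stand-in dimensions and hypothetical file names: one memmap stores the data row-major (fast row reads), a second stores its transpose (so a column of the original is a contiguous row on disk). The cost is doubled disk usage plus keeping the copies in sync.

```python
import os
import tempfile
import numpy as np

rows, cols = 1000, 400   # stand-ins for 100,000 x 4000
d = tempfile.mkdtemp()

# Row-major copy: each row of the original array is contiguous on disk.
by_row = np.memmap(os.path.join(d, "rows.dat"), dtype=np.float64,
                   mode="w+", shape=(rows, cols))
by_row[:] = np.arange(rows * cols, dtype=np.float64).reshape(rows, cols)
by_row.flush()

# Transposed copy: each *column* of the original is contiguous on disk.
by_col = np.memmap(os.path.join(d, "cols.dat"), dtype=np.float64,
                   mode="w+", shape=(cols, rows))
by_col[:] = by_row.T
by_col.flush()

# Main use case from the question: sum one column over selected rows.
# Reading from the transposed copy touches one contiguous region.
sel = [3, 7, 42]
j = 5
col_sum = by_col[j, sel].sum()

# Same result as the scattered reads from the row-major copy.
assert col_sum == by_row[sel, j].sum()
```

As the reply notes, the two files need not be mapped at the same time: map `rows.dat` for row-wise work, `cols.dat` for column-wise work, and build the transposed file once up front.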
Nick Craig-Wood <nick at craig-wood.com> -- http://www.craig-wood.com/nick