Robert Kern <robert.kern <at> gmail.com> writes:
On Mon, Feb 28, 2011 at 18:50, Sturla Molden <sturla <at> molden.no> wrote:
On 01.03.2011 01:15, Robert Kern wrote:
You can have each of those processes memory-map the whole file and just operate on their own slices. Your operating system's virtual memory manager should handle all of the details for you.
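A minimal sketch of that suggestion, assuming a small `.npy` file in a temporary directory. The helper name `fill_slice`, the file name, and the slice bounds are made up for illustration. The calls run one after another here; in real use each call would run in its own process, and each would still map the whole file while writing only its own range.

```python
import os
import tempfile

import numpy as np


def fill_slice(path, start, stop, value):
    """Map the whole file, but write only the range [start, stop)."""
    m = np.lib.format.open_memmap(path, mode="r+")
    m[start:stop] = value
    m.flush()  # push the modified pages back to the file
    del m      # drop the mapping


# Preallocate the output file once (hypothetical name and size).
path = os.path.join(tempfile.mkdtemp(), "output.npy")
n = 1000
out = np.lib.format.open_memmap(path, mode="w+", dtype=np.int8, shape=(n,))
del out

# Non-overlapping ranges, one per (would-be) worker process.
bounds = [(0, 250), (250, 500), (500, 750), (750, 1000)]
for worker_id, (start, stop) in enumerate(bounds, start=1):
    fill_slice(path, start, stop, worker_id)

result = np.load(path, mmap_mode="r")
```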
Wow, I didn't know that. So as long as the ranges touched by each process do not overlap, I'll be safe? If I modify only a few discontiguous chunks in a range, will the virtual memory manager decide whether it is most efficient to write just the chunks or the entire range back to disk?
Mapping large files from the start will not always work on 32-bit systems. That is why mmap.mmap takes an offset argument now (Python 2.7 and 3.1).
Making a view of an np.memmap with slices is useful on 64-bit systems but not on 32-bit systems.
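For illustration, here is a small sketch of the `offset` argument of `mmap.mmap`, mapping only one window of a file instead of the whole thing. The file name and sizes are invented for the example. Note that `mmap.mmap` requires the offset to be a multiple of `mmap.ALLOCATIONGRANULARITY`.

```python
import mmap
import os
import tempfile

# Hypothetical scratch file, four allocation granules long, filled with zeros.
path = os.path.join(tempfile.mkdtemp(), "big.dat")
with open(path, "wb") as f:
    f.write(bytes(4 * mmap.ALLOCATIONGRANULARITY))

# Map only the third granule; the rest of the file takes up no address space.
offset = 2 * mmap.ALLOCATIONGRANULARITY
with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), length=mmap.ALLOCATIONGRANULARITY,
                   access=mmap.ACCESS_WRITE, offset=offset) as mm:
        mm[0:5] = b"hello"  # index 0 of the map is byte `offset` of the file
        mm.flush()

# Read the bytes back through ordinary file I/O to confirm where they landed.
with open(path, "rb") as f:
    f.seek(offset)
    tail = f.read(5)
```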
I'm talking about the OP's stated use case where he generates the file via memory-mapping the whole thing on the same machine. The whole file does fit into the address space in his use case.
I'd like to see a real use case where this does not hold. I suspect that this is not the API we would want for such use cases.
Use case: Generate "large" output for "many" parameter scenarios.
1. Preallocate "enormous" output file on disk.
2. Each process fills in part of the output.
3. Analyze, aggregate results, perhaps save to HDF or database, in a sliding-window fashion using a memory-mapped array. The aggregated results fit in memory, even though the raw output doesn't.
My real work has been done on a 64-bit cluster running 64-bit Python, but I'd like to have the option of post-processing on my laptop's 32-bit Python (either spending a few hours copying the file to my laptop first, or mounting the remote disk using e.g. ExpanDrive). Maybe that is impossible with 32-bit Python: at least I cannot allocate that big a file on my laptop.
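Step 3 above might be sketched roughly as follows, assuming a headerless raw binary file rather than a `.npy` file, so that the data starts at byte 0. The function name `windowed_sum`, the file name, and the window size are hypothetical. Each window is mapped separately through the `offset` argument of `np.memmap`, so only one window needs address space at a time.

```python
import os
import tempfile

import numpy as np


def windowed_sum(path, dtype, total, window):
    """Sum `total` items of a raw binary file, mapping `window` items at a time."""
    dtype = np.dtype(dtype)
    acc = 0
    for first in range(0, total, window):
        count = min(window, total - first)  # the last window may be shorter
        chunk = np.memmap(path, dtype=dtype, mode="r",
                          offset=first * dtype.itemsize, shape=(count,))
        acc += int(chunk.sum(dtype=np.int64))  # the aggregate stays small
        del chunk  # release this window before mapping the next one
    return acc


# Small stand-in for the "enormous" output file.
path = os.path.join(tempfile.mkdtemp(), "raw_output.dat")
total = 10000
np.arange(total, dtype=np.int16).tofile(path)

grand_total = windowed_sum(path, np.int16, total, window=3000)
```

The same loop could collect any aggregate that fits in memory (per-window means, histograms, and so on) instead of a single sum.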
m = np.lib.format.open_memmap("c:/temp/temp.npy", "w+", dtype=np.int8, shape=2**33)
Traceback (most recent call last):
  File "<ipython console>", line 1, in <module>
  File "C:\Python26\lib\site-packages\numpy\lib\format.py", line 563, in open_memmap
    mode=mode, offset=offset)
  File "C:\Python26\lib\site-packages\numpy\core\memmap.py", line 221, in __new__
    mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)
OverflowError: cannot fit 'long' into an index-sized integer
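The arithmetic behind that error can be sketched as below, assuming the usual 32-bit limit of 2**31 - 1 for an index-sized integer (the value of `sys.maxsize` on a 32-bit Python build). The variable names are illustrative only.

```python
import sys

requested = 2**33            # bytes asked for in the open_memmap call above
limit_32bit = 2**31 - 1      # largest index-sized integer on a 32-bit build

fits_on_32bit = requested <= limit_32bit   # the length alone is too large
fits_here = requested <= sys.maxsize       # depends on the running interpreter

# Number of windows needed if each mapping is kept to 256 MiB (arbitrary choice).
window = 2**28
n_windows = requested // window
```

Even where the length fits into an index-sized integer, the mapping still has to fit into the process's free address space, which on a 32-bit system is considerably smaller than the limit above.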