Memory mapped files in scipy core
I would appreciate understanding typical use cases for memory-mapped files. I am not sure I understand why certain choices were made for numarray's memmap and memmap slice classes. They seem like a lot of "extra" stuff and I'm not sure what all that stuff is for. Rather than just copy these over, I would like to understand what people typically want to do with memory-mapped files to see if scipy core doesn't already provide that.

For example, right now I can open a file, use mmap to obtain a memory map object, and then pass that object into frombuffer in scipy_core to get an ndarray whose memory maps a file on disk. This ndarray can then be sliced, indexed, and manipulated, all the while referring to the file on disk (well, technically, I suppose the memory-mapped object would need to be flushed to synchronize).

Now, I could see wanting to make the process of opening the file, getting the mmap object, and setting its buffer to the array object a little easier. Thus, a simple memmap class would be a useful construct; I could even see it inheriting from the ndarray directly and adding a few methods. I guess I just don't see why one would care about a memory-mapped slice object when the mmaparray sub-class would be perfectly useful.

On a related, but orthogonal note: My understanding is that using memory-mapped files for *very* large files will require modification to the mmap module in Python, something I think we should push. One part of that process would be to add the C-struct array interface to the mmap module and the buffer object; perhaps this is how we get the array interface into Python quickly.
Then, if we could make a base-type mmap that did not use the buffer interface or the sequence interface (similar to the bigndarray in scipy_core), and therefore bypassed the problems with Python in those areas, the current mmap object could inherit from the base class and provide current functionality while still exposing the array interface for access to >2GB files on 64-bit systems.

Who would like to take up the ball for modifying mmap in Python in this fashion?

-Travis
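The workflow described above (open a file, map it, hand the mapping to frombuffer) and the idea of a simple ndarray subclass can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses present-day NumPy's `numpy.frombuffer` as a stand-in for the frombuffer in scipy_core, and the class name `MappedArray` and its `sync` method are hypothetical, not names from the thread.

```python
import mmap
import os
import tempfile

import numpy as np


class MappedArray(np.ndarray):
    """Hypothetical ndarray subclass backed by a memory-mapped file."""

    def __new__(cls, path, dtype=np.float64):
        with open(path, "r+b") as f:
            # Length 0 maps the whole file; mmap keeps its own handle,
            # so the file object can be closed right away.
            mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE)
        # frombuffer wraps the mapping without copying it.
        obj = np.frombuffer(mm, dtype=dtype).view(cls)
        obj._mmap = mm
        return obj

    def __array_finalize__(self, obj):
        # Slices and views inherit the reference to the mapping.
        self._mmap = getattr(obj, "_mmap", None)

    def sync(self):
        """Flush modified pages back to the file on disk."""
        if self._mmap is not None:
            self._mmap.flush()


# Demo file (an assumption for this sketch): eight float64 zeros.
path = os.path.join(tempfile.mkdtemp(), "demo.dat")
np.zeros(8, dtype=np.float64).tofile(path)

arr = MappedArray(path)
arr[2:5] = [1.0, 2.0, 3.0]  # writes land in the mapped memory
tail = arr[4:]              # a slice is still a MappedArray on the same file
tail[1] = 9.0
arr.sync()

on_disk = np.fromfile(path, dtype=np.float64)
```

Because ordinary slicing already yields views onto the same mapping, no separate slice class is needed in this sketch. NumPy's `numpy.memmap` is a fuller take on the same idea.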
Travis Oliphant wrote:
There are a few extra capabilities which are supported in numarray's memmap:

1. slice insertion
2. slice deletion
3. memmap-based array resizing

Each of these things implicitly changes the layout of the underlying file. Whether or not these features get used or justify the complexity is another matter. Because of 32-bit address space exhaustion and swap file issues, memory mapping was a disappointment at STScI, so I'm doubtful we used these features ourselves.

Regards,
Todd
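To make concrete why slice insertion changes the layout of the underlying file, here is a rough sketch of inserting elements into a flat binary file through a mapping. It is not numarray's implementation: the helper name `insert_into_file` is hypothetical, the file is assumed to hold a one-dimensional array of a single dtype, and the code relies on present-day NumPy and the standard `mmap` module.

```python
import mmap
import os
import tempfile

import numpy as np


def insert_into_file(path, index, values, dtype=np.float64):
    """Insert `values` before element `index` of a flat binary file.

    Hypothetical helper for illustration only.
    """
    values = np.asarray(values, dtype=dtype)
    itemsize = values.dtype.itemsize
    old_n = os.path.getsize(path) // itemsize
    new_n = old_n + values.size

    with open(path, "r+b") as f:
        # Grow the file first: a mapping cannot extend past the end of the file.
        f.truncate(new_n * itemsize)
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE)
        arr = np.frombuffer(mm, dtype=dtype)
        # Shift the old tail toward the end of the file; the explicit copy
        # keeps the overlapping source and destination from interfering.
        arr[index + values.size:] = arr[index:old_n].copy()
        arr[index:index + values.size] = values
        mm.flush()
        # Drop the array before closing, since it holds the mapping's buffer.
        del arr
        mm.close()


# Demo file (an assumption for this sketch): the values 0..4 as float64.
path = os.path.join(tempfile.mkdtemp(), "insert_demo.dat")
np.arange(5, dtype=np.float64).tofile(path)

insert_into_file(path, 2, [10.0, 11.0])
result = np.fromfile(path, dtype=np.float64)
```

Every element after the insertion point moves to a new file offset, and the file itself grows, which is the layout change described above. Deletion and resizing would follow the same pattern in reverse.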
participants: Todd Miller, Travis Oliphant