[Numpy-discussion] Adding `offset` argument to np.lib.format.open_memmap and np.load

Tue Mar 1 20:10:55 EST 2011

Den 01.03.2011 14:20, skrev Jon Olav Vik:
> Use case: Generate "large" output for "many" parameter scenarios.
> 1. Preallocate "enormous" output file on disk.
>

That's not a use case, because this will in general require 64-bit, for 
which the offset parameter does not matter.

> Maybe that is impossible with 32-bit Python: at least I cannot allocate that
> big a file on my laptop.

32-bit Windows will give you 2 GB virtual memory available in user 
space. The remaining 2 GB is reserved for device drivers etc. I don't 
know about Linux, but it is approximately the same. Note that I am not 
talking about physical memory but virtual address space. I am not 
talking about RAM.

When you memory map a file, you use up some of this virtual address 
space. That is the key.

Because the 32-bit address space is so small by today's standards, we 
often cannot afford to spend large portions of it on a memory mapping. 
That is where the "offset" helps. Instead of memory mapping the whole 
file, we just work with a small window of it. But unlike a NumPy 
subarray view, this slicing is done by the kernel of the operating 
system, so only the window takes up virtual address space.
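
A minimal sketch of such a window, assuming a raw binary file and 
np.memmap (which already accepts a byte offset) rather than the .npy 
functions discussed in this thread. The helper name map_window and the 
file data.bin are made up for illustration.

```python
import os
import tempfile

import numpy as np


def map_window(path, dtype, start, count):
    """Map only `count` elements, beginning at element index `start`."""
    itemsize = np.dtype(dtype).itemsize
    # `offset` is given in bytes; only this window is mapped, so only
    # this window consumes virtual address space.
    return np.memmap(path, dtype=dtype, mode="r",
                     offset=start * itemsize, shape=(count,))


# Small stand-in for a "huge" file: 1000 doubles written as raw bytes.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "data.bin")
np.arange(1000, dtype=np.float64).tofile(path)

window = map_window(path, np.float64, start=500, count=100)
first, last = float(window[0]), float(window[-1])  # 500.0 and 599.0
del window  # drop the reference so the mapping can be released
```

By contrast, mapping the whole file and then slicing the array would 
still reserve address space for the entire file.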

On 64-bit we have so much virtual memory that it does not matter. How 
much is system dependent. On recent AMD64 processors it is 256 TB, but I 
think Windows 64 "only" gives us 16 of those. Even so, this is still 
approximately 25 times the size of the hard disk on my computer. That 
is, with 64-bit Python I can memory map everything on my computer, and 
it would hardly be noticed in the virtual address space. That is why an 
offset is not needed.
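
The numbers above can be checked with a little arithmetic. This is only 
a sketch: the 48-bit virtual address width is my assumption about what 
lies behind the 256 TB figure, and the even 2 GB / 2 GB split is the 
default 32-bit Windows layout described earlier.

```python
GiB = 2**30
TiB = 2**40

full_32 = 2**32          # whole 32-bit address space, in bytes
user_32 = full_32 // 2   # user-space half (assumed default split)
full_48 = 2**48          # assumed 48-bit virtual addresses on AMD64

print(full_32 // GiB)    # 4
print(user_32 // GiB)    # 2
print(full_48 // TiB)    # 256
```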

A typical use case for "offset" is a 32-bit database server memory 
mapping a small window of a huge database. On 64-bit the offset could be 
ignored, and the whole database mapped to memory -- one of the reasons 
64-bit database servers perform better.
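
As a rough illustration of the 32-bit pattern, here is a sketch that 
walks a file one window at a time, so that only one window is mapped at 
any moment. It uses np.memmap on a raw binary file; windowed_sum, the 
window size and the file name are hypothetical, and a real database 
server would of course do something more elaborate.

```python
import os
import tempfile

import numpy as np


def windowed_sum(path, dtype, window_elems):
    """Sum all elements of a raw binary file, one mapped window at a time."""
    dtype = np.dtype(dtype)
    total_elems = os.path.getsize(path) // dtype.itemsize
    total = 0.0
    for start in range(0, total_elems, window_elems):
        count = min(window_elems, total_elems - start)
        # Map only the current window; `offset` is in bytes.
        window = np.memmap(path, dtype=dtype, mode="r",
                           offset=start * dtype.itemsize, shape=(count,))
        total += float(window.sum())
        del window  # release this window before mapping the next one
    return total


# Small stand-in for a huge file: 1000 doubles, read 300 at a time.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "table.bin")
np.arange(1000, dtype=np.float64).tofile(path)

result = windowed_sum(path, np.float64, window_elems=300)
```

On 64-bit the loop is unnecessary: a single np.memmap of the whole file 
would do.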

Sturla