[Numpy-discussion] Adding `offset` argument to np.lib.format.open_memmap and np.load
Sturla Molden
sturla at molden.no
Tue Mar 1 20:10:55 EST 2011
On 01.03.2011 14:20, Jon Olav Vik wrote:
> Use case: Generate "large" output for "many" parameter scenarios.
> 1. Preallocate "enormous" output file on disk.
>
That's not a use case for "offset", because a file that large will in
general require 64-bit Python, for which the offset parameter does not
matter.
> Maybe that is impossible with 32-bit Python: at least I cannot allocate that
> big a file on my laptop.
32-bit Windows will give you 2 GB of virtual address space in user
space. The remaining 2 GB is reserved for device drivers etc. I don't
know about Linux, but it is approximately the same. Note that I am
talking about virtual address space, not physical memory (RAM).
When you memory map a file, you use up some of this virtual address
space. That is the key.
Because the 32-bit address space is so small by today's standards, we
often cannot afford to spend a large portion of it on one memory
mapping. That is where the "offset" helps. Instead of memory mapping the
whole file, we just work with a small window of it. But unlike a NumPy
subarray view, this slicing is done by the kernel of the operating
system, so the part of the file outside the window uses no address
space.
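A minimal sketch of such a window, using the "offset" argument that
np.memmap already has for raw binary files. The helper name map_window
and the demo file are made up for illustration; this is not the
proposed open_memmap change itself.

```python
import os
import tempfile

import numpy as np


def map_window(path, dtype, start, count):
    """Map only `count` items of a raw binary file, starting at item `start`.

    Hypothetical helper: only the requested window is mapped, so only
    that window consumes virtual address space.
    """
    itemsize = np.dtype(dtype).itemsize
    return np.memmap(path, dtype=dtype, mode="r+",
                     offset=start * itemsize, shape=(count,))


# Small demo file standing in for the "enormous" one.
path = os.path.join(tempfile.mkdtemp(), "demo.dat")
np.arange(1000, dtype=np.float64).tofile(path)

# Work on items 200..249 only; the rest of the file is never mapped.
window = map_window(path, np.float64, start=200, count=50)
window *= 2
window.flush()
del window  # drop the reference so the mapping can be released
```

Looping over successive values of start lets a 32-bit process walk
through a file that is far larger than its address space.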
On 64-bit we have so much virtual memory that it does not matter. How
much is system dependent. On recent AMD64 processors it is 256 TB, but I
think Windows 64 "only" gives us 16 of those. Even so, this is still
approximately 25 times the size of the hard disk on my computer. That
is, with 64-bit Python I can memory map everything on my computer, and
it would hardly be noticed in the virtual address space. That is why an
offset is not needed.
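The arithmetic behind this, as a rough sketch. The 2 GB and 48-bit
figures are the ones discussed above; the file and window sizes are
hypothetical and chosen only for illustration.

```python
import sys

GIB = 2**30
TIB = 2**40

user_space_32bit = 2 * GIB   # user-space share on 32-bit Windows
amd64_virtual = 2**48        # 48-bit virtual addresses

# True on a 64-bit Python build, False on a 32-bit one.
is_64bit_python = sys.maxsize > 2**32

# Hypothetical sizes, for illustration only.
file_size = 10 * GIB
window_size = 256 * 2**20    # 256 MiB per mapped window

# Number of windows needed to cover the file (ceiling division).
windows_needed = -(-file_size // window_size)

print(amd64_virtual // TIB)          # 256
print(file_size > user_space_32bit)  # True: cannot be mapped whole on 32-bit
print(windows_needed)                # 40
```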
A typical use case for "offset" is a 32-bit database server memory
mapping a small window of a huge database. On 64-bit the offset could be
ignored, and the whole database mapped to memory -- one of the reasons
64-bit database servers perform better.
Sturla