Not enough storage for memmap on 32 bit Win XP for accumulated file size above approx. 1 GB
OS: Win XP SP3, 32 bit
Python: 2.5.4
Numpy: 1.3.0

I am having some major problems converting a 750 MB recarray into an 850 MB recarray. To save RAM I would like to use a read-only and a writeable memmap for the two recarrays during the conversion. So I do something like:

import os
from stat import ST_SIZE
import numpy as np
...
records = os.stat(toconvert_path)[ST_SIZE] / toconvert_dtype.itemsize
toconvert = np.memmap(toconvert_path, dtype=toconvert_dtype,
                      mode="r").view(np.recarray)
result = np.memmap(result_path, dtype=result_dtype, mode="w+",
                   shape=(records,))

The code manages to create the toconvert memmap (750 MB), but when trying to create the second memmap object I get:

  File "C:\Python25\Lib\site-packages\numpy\core\memmap.py", line 226, in __new__
    mm = mmap.mmap(fid.fileno(), bytes, access=acc)
WindowsError: [Error 8] Not enough storage is available to process this command

By tracing before and after, I can see the file size is zero before calling mmap.mmap and has the expected 850 MB size after the WindowsError has been thrown somewhere inside mmap.mmap. There is 26 GB of free disk space, so the error message seems wrong.

If I comment out the creation of the first memmap, I can successfully create the result memmap, so the error seems to be related to the accumulated size of all mmap.mmaps created. I have other cases with somewhat smaller files to convert where the conversion is OK. It seems like I begin to get these problems when the accumulated size of the memmaps exceeds 1 GB.

I am surprised by this, as
http://docs.scipy.org/doc/numpy/reference/arrays.classes.html#memory-mapped-...
mentions there are upper bounds on the size when using versions of Python before 2.5.
From this I had the impression that there was no size limit as long as you were using version >= 2.5 (as I am).

Is it due to the 32 bit OS I am using? Is there anything I can do to resolve the problem?

Best wishes,
Kim
On Thu, Jul 23, 2009 at 5:36 AM, Kim Hansen wrote:
Is it due to the 32 bit OS I am using?
It could be. IIRC, 32 bit Windows gives user programs 2 GB of addressable memory, so your files need to fit in that space even if the data is on disk. You aren't using that much memory, but you are close, and it could be that other programs make up the difference. Maybe you can monitor the memory to get a better idea of the usage.

Chuck
2009/7/23 Charles R Harris
Hi Chuck,

If I use the Windows task manager to see how much memory is used by the Python application when running the memmap test, it says:

Before loading first memmap: 8.588 MB
After loading first memmap: 8.596 MB

i.e., only an additional 8 kB for having the 750 MB recarray available. Maybe I am measuring memory usage wrong?

Kim
On Thu, Jul 23, 2009 at 7:48 AM, Kim Hansen wrote:
Maybe I am measuring memory usage wrong?
Hmm, I don't know what you should be looking at in XP. Memmapped files are sort of like virtual memory and exist in the address space even if they aren't in physical memory. When you address an element that isn't in physical memory there is a page fault and the OS reads in the needed page from disk. If you read through the file, physical memory will probably fill up, because the OS will try to keep as many pages in physical memory as possible in case they are referenced again. But I am not sure how Windows does its memory accounting or how it is displayed; someone here more familiar with Windows may be able to tell you what to look for. Or you could try running on a 64 bit system if there is one available.

Chuck
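A minimal sketch of one way to look at address space rather than physical memory on 32 bit Windows: walk the process address space with VirtualQuery via ctypes and total the non-free regions (the structure layout and the 0x7FFF0000 ceiling assume a 32 bit process with the default 2 GB user space):

import ctypes

class MEMORY_BASIC_INFORMATION(ctypes.Structure):
    # 32 bit layout of the Win32 structure filled in by VirtualQuery
    _fields_ = [("BaseAddress", ctypes.c_ulong),
                ("AllocationBase", ctypes.c_ulong),
                ("AllocationProtect", ctypes.c_ulong),
                ("RegionSize", ctypes.c_ulong),
                ("State", ctypes.c_ulong),
                ("Protect", ctypes.c_ulong),
                ("Type", ctypes.c_ulong)]

MEM_FREE = 0x10000
mbi = MEMORY_BASIC_INFORMATION()
addr = 0
used = 0
while addr < 0x7FFF0000:  # top of the default 2 GB user address space
    if not ctypes.windll.kernel32.VirtualQuery(
            ctypes.c_void_p(addr), ctypes.byref(mbi), ctypes.sizeof(mbi)):
        break
    if mbi.State != MEM_FREE:  # reserved or committed; memmaps count here
        used += mbi.RegionSize
    addr += mbi.RegionSize
print "Address space in use: %d MB" % (used / 1024 ** 2)

Memmapped files show up in this total even when the Task Manager "Mem Usage" column barely moves, which would explain the 8 kB reading above.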
2009/7/23 Charles R Harris
Yes, it is indeed my general experience with memmaps that as you start to access them, there are bursts of high memory usage, and somehow, as you indicate, there must be some allocation of address space happening, which then hits a hard wall.

I tried to write a small test script, which gradually creates more and more Python mmap.mmaps (here in chunks of 100 MB, but the size per mmap does not matter):

import itertools
import mmap
import os

files = []
mmaps = []
file_names = []
mmap_cap = 0
bytes_per_mmap = 100 * 1024 ** 2
try:
    for i in itertools.count(1):
        file_name = "d:/%d.tst" % i
        file_names.append(file_name)
        f = open(file_name, "w+b")
        files.append(f)
        mm = mmap.mmap(f.fileno(), bytes_per_mmap)
        mmaps.append(mm)
        mmap_cap += bytes_per_mmap
        print "Created %d writeable mmaps containing %d MB" % (i, mmap_cap / (1024 ** 2))
# Clean up
finally:
    print "Removing mmaps..."
    for mm, f, file_name in zip(mmaps, files, file_names):
        mm.close()
        f.close()
        os.remove(file_name)
    print "Done..."

Here is the output:

Created 1 writeable mmaps containing 100 MB
Created 2 writeable mmaps containing 200 MB
Created 3 writeable mmaps containing 300 MB
Created 4 writeable mmaps containing 400 MB
Created 5 writeable mmaps containing 500 MB
Created 6 writeable mmaps containing 600 MB
Created 7 writeable mmaps containing 700 MB
Created 8 writeable mmaps containing 800 MB
Created 9 writeable mmaps containing 900 MB
Created 10 writeable mmaps containing 1000 MB
Created 11 writeable mmaps containing 1100 MB
Created 12 writeable mmaps containing 1200 MB
Created 13 writeable mmaps containing 1300 MB
Created 14 writeable mmaps containing 1400 MB
Created 15 writeable mmaps containing 1500 MB
Created 16 writeable mmaps containing 1600 MB
Created 17 writeable mmaps containing 1700 MB
Created 18 writeable mmaps containing 1800 MB
Removing mmaps...
Done...
Traceback (most recent call last):
  File "C:\svn-sandbox\research\scipy\scipy\src\com\terma\kha\mmaptest.py", line 16, in <module>
    mm = mmap.mmap(f.fileno(), bytes_per_mmap)
WindowsError: [Error 8] Not enough storage is available to process this command

although there is 26 GB of free storage on the drive.

Such a < 2 GB limit is not mentioned in the documentation for Python 2.5.4 - at least not in the mmap documentation, so I am surprised this is the case. I think I will make a post about it on python.org.

Unfortunately, I do not have a 64 bit system on which I can test this.

Cheers,
Kim
Hello!
I have access to both a 32 bit and a 64 bit Linux machine.

I had to change your code (appended) because I got an error about not being able to create a mmap larger than the file. Here are the results...

On the 32 bit machine:

lciti@xps2:~$ python /tmp/ppp.py
Created 1 writeable mmaps containing 100 MB
Created 2 writeable mmaps containing 200 MB
Created 3 writeable mmaps containing 300 MB
Created 4 writeable mmaps containing 400 MB
Created 5 writeable mmaps containing 500 MB
[......]
Created 24 writeable mmaps containing 2400 MB
Created 25 writeable mmaps containing 2500 MB
Created 26 writeable mmaps containing 2600 MB
Created 27 writeable mmaps containing 2700 MB
Created 28 writeable mmaps containing 2800 MB
Created 29 writeable mmaps containing 2900 MB
Created 30 writeable mmaps containing 3000 MB
Removing mmaps...
Done...
Traceback (most recent call last):
  File "/tmp/ppp.py", line 19, in <module>
    mm = mmap.mmap(f.fileno(), 0)
mmap.error: [Errno 12] Cannot allocate memory

On the 64 bit machine I can create 510 mmaps, both with bytes_per_mmap at 100 MiB and at 1 GiB:

Created 1 writeable mmaps containing 1000 MB
Created 2 writeable mmaps containing 2000 MB
Created 3 writeable mmaps containing 3000 MB
Created 4 writeable mmaps containing 4000 MB
Created 5 writeable mmaps containing 5000 MB
Created 6 writeable mmaps containing 6000 MB
[......]
Created 501 writeable mmaps containing 501000 MB
Created 502 writeable mmaps containing 502000 MB
Created 503 writeable mmaps containing 503000 MB
Created 504 writeable mmaps containing 504000 MB
Created 505 writeable mmaps containing 505000 MB
Created 506 writeable mmaps containing 506000 MB
Created 507 writeable mmaps containing 507000 MB
Created 508 writeable mmaps containing 508000 MB
Created 509 writeable mmaps containing 509000 MB
Created 510 writeable mmaps containing 510000 MB
Removing mmaps...
Done...
Traceback (most recent call last):
  File "/tmp/ppp.py", line 19, in <module>
    mm = mmap.mmap(f.fileno(), 0)
mmap.error: [Errno 24] Too many open files

I do not even have 510 GiB free on the disk, but I think that is because the ext3 filesystem allows sparse files.

I think this shows that the maximum mapped space cannot be more than the maximum address space, which is 2**64 for 64 bit machines, 2 GiB for 32 bit Windows and 3 GiB for 32 bit Linux.

Under Windows XP, you can try to increase it from 2 GiB to 3 GiB using the /3GB switch in boot.ini.

Best,
Luca

### CODE ###

import itertools
import mmap
import os

files = []
mmaps = []
file_names = []
mmap_cap = 0
bytes_per_mmap = 100 * 1024 ** 2
try:
    for i in itertools.count(1):
        file_name = "/home/lciti/%d.tst" % i
        file_names.append(file_name)
        f = open(file_name, "w+b")
        files.append(f)
        # Grow the file first: Linux refuses to mmap beyond the end of the file.
        f.seek(bytes_per_mmap)
        f.write('a')
        f.seek(0)
        mm = mmap.mmap(f.fileno(), 0)
        mmaps.append(mm)
        mmap_cap += bytes_per_mmap
        print "Created %d writeable mmaps containing %d MB" % (i, mmap_cap / (1024 ** 2))
# Clean up
finally:
    print "Removing mmaps..."
    for mm, f, file_name in zip(mmaps, files, file_names):
        mm.close()
        f.close()
        os.remove(file_name)
    print "Done..."
2009/7/24 Citi, Luca
Hi Luca, thanks for trying. Your test clearly shows that 32 bits imposes a severe limitation.

I tried adding the /3GB switch to boot.ini as you suggested:

multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Microsoft Windows XP Professional" /noexecute=optin /fastdetect /3GB

and rebooted the system. Unfortunately that did not change anything for me. I still hit a hard deck around 1.9 GB. Strange.

Best wishes,
Kim
Kim Hansen wrote:
I tried adding the /3GB switch to boot.ini as you suggested: multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Microsoft Windows XP Professional" /noexecute=optin /fastdetect /3GB and rebooted the system.
Unfortunately that did not change anything for me. I still hit a hard deck around 1.9 GB. Strange.
The 3GB thing only works for applications specifically compiled for it:

http://blogs.msdn.com/oldnewthing/archive/2004/08/12/213468.aspx

I somewhat doubt Python is built with this, but you could check the Python sources to be sure.

cheers,

David
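Whether an executable was linked with that flag can be checked directly: IMAGE_FILE_LARGE_ADDRESS_AWARE is bit 0x0020 of the Characteristics field in the PE/COFF header. A minimal sketch (the python.exe path is only illustrative):

import struct

def is_large_address_aware(exe_path):
    # e_lfanew at offset 0x3C points to the "PE\0\0" signature;
    # the COFF Characteristics field sits 22 bytes after that.
    f = open(exe_path, "rb")
    try:
        f.seek(0x3C)
        pe_offset = struct.unpack("<I", f.read(4))[0]
        f.seek(pe_offset + 22)
        characteristics = struct.unpack("<H", f.read(2))[0]
    finally:
        f.close()
    return bool(characteristics & 0x0020)  # IMAGE_FILE_LARGE_ADDRESS_AWARE

print is_large_address_aware(r"C:\Python25\python.exe")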
Ahh, that explains it. Thank you for that enlightening link. Anyway, would it not be worth mentioning in the memmap documentation that there is this 32 bit limitation, or is it so straightforwardly obvious (it was not for me) that this is the case?

The reason it isn't obvious to me is that I can read and manipulate files > 200 GB in Python with no problems (yes, I process files that large), so I thought: why should it not be capable of handling quite large memmaps as well...

Cheers,
Kim
Kim Hansen wrote:
Ahh, that explains it. Thank you for that enlightening link. Anyway, would it not be worth mentioning in the memmap documentation that there is this 32 bit limitation, or is it so straightforwardly obvious (it was not for me) that this is the case?
Well, the question has popped up a few times already, so I guess it is not so obvious :) A 32 bit architecture fundamentally means that a pointer is 32 bits, so you can only address 2^32 different memory locations. The 2 GB instead of 4 GB is a consequence of how the Windows and Linux kernels work. You can mmap a file which is bigger than 4 GB (as you can allocate more than 4 GB, at least in theory, on a 32 bit system), but you cannot 'see' more than 4 GB at the same time because the pointer is too small.

Raymond Chen gives an example on Windows:

http://blogs.msdn.com/oldnewthing/archive/2004/08/10/211890.aspx

I don't know if it is possible to do so in Python, though.
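For what it is worth, Python 2.6's mmap.mmap does accept an offset argument, which allows exactly this kind of windowing: map a small view, work on it, close it, and map the next window. A minimal sketch, assuming Python >= 2.6 and an illustrative file name (the offset must be a multiple of mmap.ALLOCATIONGRANULARITY):

import mmap

window = 64 * 1024 ** 2  # map 64 MB of the file at a time
offset = 5 * 1024 ** 3   # start 5 GB into the file
offset -= offset % mmap.ALLOCATIONGRANULARITY

f = open("big_capture.dat", "rb")
mm = mmap.mmap(f.fileno(), window, access=mmap.ACCESS_READ, offset=offset)
head = mm[:16]           # only "window" bytes of address space are consumed
mm.close()
f.close()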
The reason it isn't obvious to me is that I can read and manipulate files > 200 GB in Python with no problems (yes, I process files that large), so I thought: why should it not be capable of handling quite large memmaps as well...
Handling large files is no problem on 32 bits: it is just a matter of API (and kernel/fs support). You move the file location using a 64 bit integer, and so on. Handling more than 4 GB of memory at the same time is much more difficult: to address more than 4 GB, you would need a segmented architecture in your memory handling (with a first address for a segment, and a second address for the location within that segment).

cheers,

David
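As a two-line illustration of the file-offset point (file name and sizes made up): seek() takes a 64 bit position even in 32 bit Python, so any spot in a 200 GB file is reachable without mapping anything.

f = open("d:/big_capture.dat", "rb")
f.seek(200 * 1024 ** 3 - 16)  # jump to 16 bytes before the 200 GB mark
tail = f.read(16)
f.close()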
2009/7/24 David Cournapeau
OK, I understand what you are saying. However, in my application it would really be nice to have the ability to "typecast" recarrays with an accumulated size in excess of 2 GB onto files, such that I could have the convenient slicing notation available for accessing the data.

From my (admittedly ignorant) point of view, it seems like an implementation detail that there is a problem with some intermediate memory address space.

My typical use case would be to access and process the large file-mapped, read-only recarray in chunks of up to 1,000,000 records of 100 bytes each, or for instance to pick every 1000th element of a specific field. Those are data structures which I can easily have in RAM while working on them.

I think it would be cool to have an alternative (possibly read-only) memmap implementation (filearray?), which is not just a wrapper around mmap.mmap (with its 32 bit address space limitation), but which (simply?) operates directly on the files with seek and read. I think that could be very useful (well, for me at least). In my specific case, I will probably now proceed and make some poor man's wrapping convenience methods implementing just the specific features I need, as I do not have the insight to subclass an ndarray myself and override the needed methods. In that manner I can go to > 2 GB still with low memory usage, but it will not be pretty.

Kim
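Such a poor man's wrapper can stay quite small. A minimal sketch of the seek-and-read idea, assuming a flat file of fixed-size records (the function name is made up; np.fromfile does the actual reading, so only the requested chunk ever occupies memory):

import numpy as np

def read_records(path, dtype, start, count, step=1):
    # Plain seek() + fromfile(): no mmap, so no address space is used
    # beyond the chunk actually read into RAM.
    dtype = np.dtype(dtype)
    f = open(path, "rb")
    try:
        f.seek(start * dtype.itemsize)
        chunk = np.fromfile(f, dtype=dtype, count=count)
    finally:
        f.close()
    return chunk[::step].view(np.recarray)

# e.g. every 1000th record of a 1,000,000-record window:
# recs = read_records(toconvert_path, toconvert_dtype, 0, 1000000, step=1000)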
Kim Hansen wrote:
From my (admittedly ignorant) point of view, it seems like an implementation detail that there is a problem with some intermediate memory address space.
Yes, it is an implementation detail, but so is 32 vs 64 bits :)
My typical use case would be to access and process the large file-mapped, read-only recarray in chunks of up to 1,000,000 records of 100 bytes each, or for instance to pick every 1000th element of a specific field. Those are data structures which I can easily have in RAM while working on them.
I think it would be cool to have an alternative (possibly read-only) memmap implementation (filearray?), which is not just a wrapper around mmap.mmap (with its 32 bit address space limitation), but which (simply?) operates directly on the files with seek and read. I think that could be very useful (well, for me at least). In my specific case, I will probably now proceed and make some poor man's wrapping convenience methods implementing just the specific features I need, as I do not have the insight to subclass an ndarray myself and override the needed methods. In that manner I can go to > 2 GB still with low memory usage, but it will not be pretty.
I think it would be quite complicated. One fundamental "limitation" of numpy is that an array views a contiguous chunk of memory. You can't have one numpy array which is the union of two memory blocks with a hole in between, so if you slice every 1000 items, the underlying memory of the array still needs to 'view' the whole thing. I think it is not possible to support what you want with one numpy array.

I think the simple solution really is to go 64 bits; that's exactly the kind of thing it is used for. If your machine is relatively recent, it supports 64 bit addressing.

cheers,

David
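The slicing point is easy to see directly: a[::1000] is a strided view into the same buffer, not a copy, so the whole buffer has to stay addressable.

import numpy as np

a = np.zeros(10 ** 6, dtype=np.uint8)
b = a[::1000]      # a view, not a copy
print b.base is a  # True: b still references the whole 1 MB buffer
print b.strides    # (1000,): elements sit 1000 bytes apart in memory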
I think it would be quite complicated. One fundamental "limitation" of numpy is that an array views a contiguous chunk of memory. You can't have one numpy array which is the union of two memory blocks with a hole in between, so if you slice every 1000 items, the underlying memory of the array still needs to 'view' the whole thing. I think it is not possible to support what you want with one numpy array.
Yes, I see the problem in getting the same kind of reuse of objects using simple indexing. For my specific case, I will just allocate a new array containing a copy of every 100th element and return this array. It will basically give me the same result, as the original recarray is for read-only purposes anyway. This will be very simple to implement for the specific cases I have.
I think the simple solution really is to go 64 bits; that's exactly the kind of thing it is used for. If your machine is relatively recent, it supports 64 bit addressing.
The machine is new and shiny, with loads of processing power and many TB of HDD storage. I am, however, bound to the 32 bit Win XP OS, as there are some other custom-made, third-party and very expensive applications running on that machine (which generate the large files I analyze), and they can only run on 32 bits, oh well...

Cheers,
Kim
Kim Hansen wrote:
The machine is new and shiny, with loads of processing power and many TB of HDD storage. I am, however, bound to the 32 bit Win XP OS, as there are some other custom-made, third-party and very expensive applications running on that machine (which generate the large files I analyze), and they can only run on 32 bits, oh well...
64 bit Windows can run 32 bit applications - very few applications are 64 bit to this day (for example, even the most recent Visual Studio (2008) does not run in 64 bits AFAIK). But scipy does not work on 64 bit Windows ATM - although numpy does without problems if you build it yourself.

David
Is PyTables an option for you?
--
Sebastian Haase
2009/7/27 Sebastian Haase
Is PyTables an option for you?
That may indeed be something for me! I had heard the name before, but I never realized exactly what it was. However, I have just seen their first tutorial video, and it seems like a very, very useful and easy-to-use package which could meet my needs. Thanks for telling me! I will install it and start toying around with it right away.

Cheers,
Kim
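For reference, a rough sketch of that kind of out-of-core access with the PyTables 2.x-era API (field names and sizes are made up, and the exact signatures should be checked against the PyTables docs; later releases renamed openFile/createTable):

import numpy as np
import tables

# Records are described by a numpy dtype; the rows live on disk, not in RAM.
dt = np.dtype([("timestamp", np.float64), ("payload", "S92")])

h5 = tables.openFile("records.h5", mode="w")
table = h5.createTable(h5.root, "records", dt)
table.append(np.zeros(1000, dtype=dt))
table.flush()

# Every 1000th value of a single field, read straight from disk:
ts = table.read(start=0, stop=table.nrows, step=1000, field="timestamp")
h5.close()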
On Mon, Jul 27, 2009 at 11:37 AM, Kim Hansen wrote:
The machine is new and shiny, with loads of processing power and many TB of HDD storage. I am, however, bound to the 32 bit Win XP OS, as there are some other custom-made, third-party and very expensive applications running on that machine (which generate the large files I analyze), and they can only run on 32 bits, oh well...
You could think about using some kind of virtualisation - this is exactly the sort of situation where I find it really useful. You can run a 64 bit host OS, then have 32 bit XP as a 'guest' in VMware or VirtualBox or some other virtualisation software. With recent CPUs there is very little performance penalty (running 32 bit on a 64 bit host), and it can be very convenient (it is easy to map network drives between guest and host, which perform very well since the 'network' is virtual).

Cheers,
Robin
That is actually a very good idea; I had never thought of that myself. I will give it some consideration once I am sure I have exhausted the options which do not require installing a new OS on the machine. I am somewhat reluctant concerning the risks - like, will the 32 bit drivers the proprietary 32 bit program uses still work on a 64 bit host OS, etc.? Also because I haven't tried using VMware etc. - I've heard it should be quite easy, though...

Cheers,
Kim
Kim Hansen wrote:
Yes, I see the problem in getting the same kind of reuse of objects using simple indexing. For my specific case, I will just allocate a new array containing a copy of every 100th element and return this array. It will basically give me the same result, as the original recarray is for read-only purposes anyway. This will be very simple to implement for the specific cases I have.
It does sound like PyTables may do just what you want, but if not: you may be able to get this simple use case handled by writing your own simple memory-mapped array implementation, perhaps as a subclass of ndarray, or perhaps from scratch. Python's "duck typing" could allow you to simply plop your implementation in where you need it. This does require that you really do only need a few simple features...

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959  voice
7600 Sand Point Way NE   (206) 526-6329  fax
Seattle, WA 98115        (206) 526-6317  main reception

Chris.Barker@noaa.gov
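A minimal sketch of what such a duck-typed stand-in could look like (read-only, positive steps only, no negative-index handling; all names here are illustrative):

import numpy as np

class FileRecArray(object):
    """Read-only, file-backed record array using seek/read, no mmap."""

    def __init__(self, path, dtype):
        self.dtype = np.dtype(dtype)
        self._f = open(path, "rb")
        self._f.seek(0, 2)  # derive the record count from the file size
        self._n = self._f.tell() // self.dtype.itemsize

    def __len__(self):
        return self._n

    def __getitem__(self, idx):
        if isinstance(idx, slice):
            start, stop, step = idx.indices(self._n)
            self._f.seek(start * self.dtype.itemsize)
            data = np.fromfile(self._f, dtype=self.dtype,
                               count=max(0, stop - start))
            return data[::step].view(np.recarray)
        self._f.seek(idx * self.dtype.itemsize)
        return np.fromfile(self._f, dtype=self.dtype, count=1)[0]

Only the records actually read are ever resident, so the 2 GB address space ceiling never comes into play.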
participants (7)

- Charles R Harris
- Christopher Barker
- Citi, Luca
- David Cournapeau
- Kim Hansen
- Robin
- Sebastian Haase