I'm using the latest version of Sage (3.4.2) which is python 2.5 and numpy something or other (I will do more digging presently)
I'm able to map large files and access all the elements unless I'm using slices
so, for example:
fp = np.memmap("/mnt/hdd/data/mmap/numpy1e10.mmap", dtype='float64', mode='r+', shape=(10000000000,))
which is 1e10 doubles if you don't wanna count the zeros
gives full access to a 75 GB memory image
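That figure checks out, for what it's worth; a quick back-of-the-envelope sketch:

size = 10**10
print size * 8             # 80,000,000,000 bytes of float64 data
print size * 8 / 2.0**30   # about 74.5 GiB, i.e. the ~75 GB image above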
But when I do:
fp[:] = 1.0
np.sum(fp)
I get 1410065408.0 as the result
Interestingly, I can do:
fp[9999999999] = 3.0
and get the proper result stored and can read it back.
So, it appears to me that slicing is limited to 32 bit values
Trying to push it a bit, I tried making my own slice
myslice = slice(1410065408, 9999999999)
and using it like fp[myslice]=1.0
but it returns immediately having changed nothing. The slice creation "appears" to work in that I can get the values back out and all... but inside numpy it seems to get thrown out.
My guess is that internally the python slice in 2.5 is 32 bit even on my 64 bit version of python / numpy.
The good news is that it looks like the hard stuff (i.e. very large mmap'ed files) works... but slicing is, for some reason, limited to 32 bits.
Am I missing something?
-glenn
On Wed, May 13, 2009 at 10:50 PM, Glenn Tarbox, PhD <glenn@tarbox.org> wrote:
I'm using the latest version of Sage (3.4.2) which is python 2.5 and numpy something or other (I will do more digging presently)
I'm able to map large files and access all the elements unless I'm using slices
so, for example:
fp = np.memmap("/mnt/hdd/data/mmap/numpy1e10.mmap", dtype='float64', mode='r+', shape=(10000000000,))
which is 1e10 doubles if you don't wanna count the zeros
gives full access to a 75 GB memory image
But when I do:
fp[:] = 1.0
np.sum(fp)
I get 1410065408.0 as the result
As doubles, that is more than 2**33 bytes, so I expect there is something else going on. How much physical memory/swap memory do you have? This could also be a python problem since python does the memmap.
Chuck
On Wed, May 13, 2009 at 11:04 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Wed, May 13, 2009 at 10:50 PM, Glenn Tarbox, PhD <glenn@tarbox.org> wrote:
I'm using the latest version of Sage (3.4.2) which is python 2.5 and numpy something or other (I will do more digging presently)
I'm able to map large files and access all the elements unless I'm using slices
so, for example:
fp = np.memmap("/mnt/hdd/data/mmap/numpy1e10.mmap", dtype='float64', mode='r+', shape=(10000000000,))
which is 1e10 doubles if you don't wanna count the zeros
gives full access to a 75 GB memory image
But when I do:
fp[:] = 1.0
np.sum(fp)
I get 1410065408.0 as the result
As doubles, that is more than 2**33 bytes, so I expect there is something else going on. How much physical memory/swap memory do you have? This could also be a python problem since python does the memmap.
I've been working on some other things lately and that number seemed related to 2^32... now that I look more closely, I don't know where that number comes from.
To your question, I have 32GB of RAM and virtually nothing else running... Top tells me I'm getting between 96% and 98% for this process which seems about right.
Here's the thing. When I create the mmap file, I get the right number of bytes. I can, from what I can tell, update individual values within the array (I'm gonna bang on it a bit more with some other scripts)
It's only when using slicing that things get strange (he says, having not really done a more thorough test)
Of course, I was assuming this is a 32 bit thing... but you're right... where did that result come from???
The other clue here is that when I create my own slice (as described above) it returns instantly... numpy doesn't throw an error but it doesn't do anything with the slice either.
Since I'm IO bound anyways, maybe I'll just write a loop and see if I can't set all the values. The machine could use a little exercise anyways.
-glenn
Chuck
On Wed, May 13, 2009 at 11:22 PM, Glenn Tarbox, PhD <glenn@tarbox.org> wrote:
On Wed, May 13, 2009 at 11:04 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Wed, May 13, 2009 at 10:50 PM, Glenn Tarbox, PhD <glenn@tarbox.org> wrote:
I'm using the latest version of Sage (3.4.2) which is python 2.5 and numpy something or other (I will do more digging presently)
I'm able to map large files and access all the elements unless I'm using slices
so, for example:
fp = np.memmap("/mnt/hdd/data/mmap/numpy1e10.mmap", dtype='float64', mode='r+', shape=(10000000000,))
which is 1e10 doubles if you don't wanna count the zeros
gives full access to a 75 GB memory image
But when I do:
fp[:] = 1.0
np.sum(fp)
I get 1410065408.0 as the result
As doubles, that is more than 2**33 bytes, so I expect there is something else going on. How much physical memory/swap memory do you have? This could also be a python problem since python does the memmap.
I've been working on some other things lately and that number seemed related to 2^32... now that I look more closely, I don't know where that number comes from.
To your question, I have 32GB of RAM and virtually nothing else running... Top tells me I'm getting between 96% and 98% for this process which seems about right.
Here's the thing. When I create the mmap file, I get the right number of bytes. I can, from what I can tell, update individual values within the array (I'm gonna bang on it a bit more with some other scripts)
It's only when using slicing that things get strange (he says, having not really done a more thorough test)
Of course, I was assuming this is a 32 bit thing... but you're right... where did that result come from???
The other clue here is that when I create my own slice (as described above) it returns instantly... numpy doesn't throw an error but it doesn't do anything with the slice either.
Since I'm IO bound anyways, maybe I'll just write a loop and see if I can't set all the values. The machine could use a little exercise anyways.
I ran the following test:
import numpy as np
size = 10000000000
fp = np.memmap("/mnt/hdd/data/mmap/numpy1e10.mmap", dtype='float64', mode='r+', shape=(size,))
for i in xrange(size):
    fp[i] = 1.0
time np.sum(fp)
10000000000.0
Time: CPU 188.36 s, Wall: 884.33 s
So, everything seems to be working and it kinda makes sense. The sum should be IO-bound, which it is. I didn't time the loop, but it took a while (maybe 30 minutes) and it was compute-bound.
To make sure, I exited the program and ran everything but the initialization loop.
import numpy as np
size = 10000000000
fp = np.memmap("/mnt/hdd/data/mmap/numpy1e10.mmap", dtype='float64', mode='r+', shape=(size,))
time np.sum(fp)
10000000000.0
Time: CPU 180.02 s, Wall: 854.72 s
I was a little surprised that the second sum didn't take longer than the first, since almost half of the mmap'ed data should have been resident for the sum performed immediately after initialization. But since that first sum needed to start at the beginning of the array and only had the second half in memory, it makes sense.
So, it "appears" as though the mmap works but there's something strange with slices going on.
-glenn
Chuck
-- Glenn H. Tarbox, PhD || 206-274-6919 http://www.tarbox.org
On Thu, May 14, 2009 at 1:43 AM, Gael Varoquaux <gael.varoquaux@normalesup.org> wrote:
On Thu, May 14, 2009 at 01:31:45AM -0700, Glenn Tarbox, PhD wrote:
I've been working on some other things lately and that number seemed related to 2^32... now that I look more closely, I don't know where that number comes from.
Is your OS 64bit?
Yes, Ubuntu 9.04 x86_64
Linux hq2 2.6.28-11-server #42-Ubuntu SMP Fri Apr 17 02:45:36 UTC 2009 x86_64 GNU/Linux
-glenn
Gaël
On Thu, May 14, 2009 at 02:13:23AM -0700, Glenn Tarbox, PhD wrote:
On Thu, May 14, 2009 at 1:43 AM, Gael Varoquaux <gael.varoquaux@normalesup.org> wrote:
On Thu, May 14, 2009 at 01:31:45AM -0700, Glenn Tarbox, PhD wrote:
> I've been working on some other things lately and that number seemed related to 2^32... now that I look more closely, I don't know where that number comes from.
Is your OS 64bit?
Yes, Ubuntu 9.04 x86_64
Hum, I am wondering: could it be that Sage has not been compiled in 64bits? That number '32' seems to me to point toward a 32bit pointer issue (I may be wrong).
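One quick way to check that from inside the Python that Sage runs, as a sketch:

import platform, struct, sys
print platform.architecture()[0]   # '64bit' for a 64-bit interpreter build
print struct.calcsize('P') * 8     # pointer size in bits
print sys.maxint                   # 2**63 - 1 on 64-bit Python 2.x, 2**31 - 1 on 32-bit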
Gaël
On Thu, May 14, 2009 at 2:16 AM, Gael Varoquaux <gael.varoquaux@normalesup.org> wrote:
On Thu, May 14, 2009 at 02:13:23AM -0700, Glenn Tarbox, PhD wrote:
On Thu, May 14, 2009 at 1:43 AM, Gael Varoquaux <gael.varoquaux@normalesup.org> wrote:
On Thu, May 14, 2009 at 01:31:45AM -0700, Glenn Tarbox, PhD wrote:
> I've been working on some other things lately and that number seemed related to 2^32... now that I look more closely, I don't know where that number comes from.
Is your OS 64bit?
Yes, Ubuntu 9.04 x86_64
Hum, I am wondering: could it be that Sage has not been compiled in 64bits? That number '32' seems to me to point toward a 32bit pointer issue (I may be wrong).
The other tests I posted indicate everything else is working... For example, np.sum(fp) runs over the full set of 1e10 doubles and seems to work fine.
Also, while my first thought was about 2^32, Chuck Harris's reply kinda put that to bed. Where 1410065408.0 comes from may involve e or PI (at least that's how we reverse-engineered answers when I was in college :-)
-glenn
Gaël
On Thu, May 14, 2009 at 07:40:58AM -0700, Glenn Tarbox, PhD wrote:
Hum, I am wondering: could it be that Sage has not been compiled in 64bits? That number '32' seems to me to point toward a 32bit pointer issue (I may be wrong).
The other tests I posted indicate everything else is working... For example, np.sum(fp) runs over the full set of 1e10 doubles and seems to work fine.
Correct. I had missed that.
Gaël
Today at Sage Days we tried slices on a few large arrays (no mmap) and found that slicing breaks on arrays somewhere between 2.0e9 and 2.5e9 elements. The failure mode is the same, no error thrown, basically nothing happens
This was on one of the big sage machines. I don't know the specific OS / CPU but it was definitely 64 bit and lots of available memory etc.
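For what it's worth, that threshold sits right where a signed 32-bit index would put it, and the earlier sum result lines up the same way; just arithmetic:

print 2**31           # 2147483648, which falls between 2.0e9 and 2.5e9
print 10**10 % 2**32  # 1410065408, the sum reported for the 1e10-element memmap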
-glenn
On Thu, May 14, 2009 at 7:54 AM, Gael Varoquaux <gael.varoquaux@normalesup.org> wrote:
On Thu, May 14, 2009 at 07:40:58AM -0700, Glenn Tarbox, PhD wrote:
Hum, I am wondering: could it be that Sage has not been compiled in 64bits? That number '32' seems to me to point toward a 32bit pointer issue (I may be wrong).
The other tests I posted indicate everything else is working... For example, np.sum(fp) runs over the full set of 1e10 doubles and seems to work fine.
Correct. I had missed that.
Gaël
Hi,
Sat, 16 May 2009 22:24:34 -0700, Glenn Tarbox, PhD wrote:
Today at Sage Days we tried slices on a few large arrays (no mmap) and found that slicing breaks on arrays somewhere between 2.0e9 and 2.5e9 elements. The failure mode is the same, no error thrown, basically nothing happens
This was on one of the big sage machines. I don't know the specific OS / CPU but it was definitely 64 bit and lots of available memory etc.
Could you file a bug ticket in the Numpy Trac,
http://projects.scipy.org/numpy
so that there's a better chance that this doesn't get forgotten.
Thanks,
Hi Glenn,
On Sat, May 16, 2009 at 11:24 PM, Glenn Tarbox, PhD <glenn@tarbox.org> wrote:
Today at Sage Days we tried slices on a few large arrays (no mmap) and found that slicing breaks on arrays somewhere between 2.0e9 and 2.5e9 elements. The failure mode is the same, no error thrown, basically nothing happens
This was on one of the big sage machines. I don't know the specific OS / CPU but it was definitely 64 bit and lots of available memory etc.
Can you try slicing with an explicit upper bound? Something like a[:n] = 1, where n is the size of the array.
Chuck
On Sun, May 17, 2009 at 8:51 AM, Charles R Harris <charlesr.harris@gmail.com> wrote:
Hi Glenn,
On Sat, May 16, 2009 at 11:24 PM, Glenn Tarbox, PhD <glenn@tarbox.org> wrote:
Today at Sage Days we tried slices on a few large arrays (no mmap) and found that slicing breaks on arrays somewhere between 2.0e9 and 2.5e9 elements. The failure mode is the same, no error thrown, basically nothing happens
This was on one of the big sage machines. I don't know the specific OS / CPU but it was definitely 64 bit and lots of available memory etc.
Can you try slicing with an explicit upper bound? Something like a[:n] = 1, where n is the size of the array.
And maybe some things like a[n:n+1] = 1, which should only set a single element and might save some time ;)
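A minimal script along those lines might look like this (a sketch; the array size and dtype are placeholders, and the second test is read here as a one-element slice at the very end of the array):

import numpy as np

n = 3 * 10**9                     # past the ~2.1e9 threshold; needs ~24 GB of RAM as float64
a = np.zeros(n, dtype='float64')
a[:n] = 1.0                       # slice with an explicit upper bound
print a[0], a[n // 2], a[n - 1]   # all three should read back 1.0
a[n - 1:n] = 2.0                  # one-element slice near the end
print a[n - 1]                    # should read back 2.0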
Chuck