Hi guys,
Britton and I worked on some improvements to hop today. Specifically, we changed the means of accessing attributes from *copying* to *in-place* access. This should cut down on memory use substantially, but I believe there are still places where it could be improved. In particular, I think a consolidation of the tags & iHop attributes could improve things.
I've placed the patch at paste #97, and you additionally need:
http://yt.enzotools.org/files/hop_numpy.h
If y'all get a chance to look at it and see whether it cuts down on memory usage at all, that'd be awesome. We're currently running a bunch of tests. If it works out, we're moving to this over the old method.
-Matt
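For readers who have not seen the patch, here is a minimal sketch of the difference between copying attributes and reading them in place. It is written in Python with only the standard library, not in the C of hop itself, and every name in it (`copy_positions`, `InPlacePositions`, `pos`) is hypothetical. The `pos(i, d)` accessor only loosely mirrors the `NP_POS(kd, m, d)` macro mentioned later in the thread; it is not the contents of hop_numpy.h.

```python
from array import array


def copy_positions(x, y, z):
    """Copying approach: build a second, per-particle list of [x, y, z]."""
    return [[x[i], y[i], z[i]] for i in range(len(x))]


class InPlacePositions:
    """In-place approach: hold views of the caller's arrays and index into them."""

    def __init__(self, x, y, z):
        # memoryview shares the underlying buffer; no particle data is duplicated.
        self._axes = (memoryview(x), memoryview(y), memoryview(z))

    def pos(self, i, d):
        """Return coordinate d (0, 1 or 2) of particle i."""
        return self._axes[d][i]


x = array("d", [0.1, 0.2, 0.3])
y = array("d", [0.4, 0.5, 0.6])
z = array("d", [0.7, 0.8, 0.9])

copied = copy_positions(x, y, z)
in_place = InPlacePositions(x, y, z)

x[1] = 0.25  # change the original data after both structures exist

print(copied[1][0])        # the copy still holds the old value: 0.2
print(in_place.pos(1, 0))  # the in-place accessor sees the new value: 0.25
```

The copy doubles the storage for positions, while the in-place accessor adds only a few view objects, which is the kind of saving the patch is aiming for.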
Guys,
any kind of memory improvements are always welcome! Let me know if there's anything I can do to help out once you're finished testing.
Also, while you're working on HOP, did Brian ever get you the code to calculate the expected maximum halo size for the padding?
_______________________________________________________
sskory@physics.ucsd.edu           o__  Stephen Skory
http://physics.ucsd.edu/~sskory/ _.>/ _Graduate Student
________________________________(_)_\(_)_______________
Hi Stephen,
It should be ready for testing; it gives identical results over here. Give a go with the two items, and let us know if it works for you? And any memory improvement you might notice?
-Matt
(In reply to Stephen Skory <stephenskory@yahoo.com>, Wed, Apr 15, 2009 at 3:08 PM.)
_______________________________________________ Yt-dev mailing list Yt-dev@lists.spacepope.org http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org
Matt,
> It should be ready for testing; it gives identical results over here. Give a go with the two items, and let us know if it works for you? And any memory improvement you might notice?
Do you think it's worth trying to make this work with python/yt/hop?
http://www.nics.tennessee.edu/craypat
_______________________________________________________
sskory@physics.ucsd.edu           o__  Stephen Skory
http://physics.ucsd.edu/~sskory/ _.>/ _Graduate Student
________________________________(_)_\(_)_______________
Hi Stephen,
Looks tricky. If you want to have a go at it, feel free, but I think I'm going to stay out -- I suspect we would not get the returns we'd like from it, and that it would require a lot of effort to compile successfully. That being said, if you're motivated, give it a go.
I was thinking much simpler checks, to see how much main RAM is being used by each process, for instance.
-Matt
(In reply to Stephen Skory <stephenskory@yahoo.com>, Thu, Apr 16, 2009 at 8:45 AM.)
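A simple per-process check of the kind Matt mentions could look like the sketch below. It uses only Python's standard `resource` module, which is available on Unix-like systems; the helper name `peak_rss_mb` and the list that stands in for the halo-finding step are assumptions made for illustration, not part of yt. In a parallel run, each process would call the helper itself and report its own number.

```python
import resource
import sys


def peak_rss_mb():
    """Return this process's peak resident set size in MB (best effort)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024.0 * 1024.0 if sys.platform == "darwin" else 1024.0
    return peak / divisor


before = peak_rss_mb()
data = [float(i) for i in range(500000)]  # stand-in for the halo-finding step
after = peak_rss_mb()

print("peak RSS before: %.1f MB, after: %.1f MB" % (before, after))
```

Because this reports the peak rather than the current usage, it suits a before-and-after comparison of two builds run on the same dataset more than it suits tracking usage within a single run.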
Matt,
> Looks tricky.
Now that I think about it some more, you're probably right.
> I was thinking much simpler checks, to see how much main RAM is being used by each process, for instance.
I'll run parallel HOP on L7 on Ranger before and after the patch and see what I find.
_______________________________________________________
sskory@physics.ucsd.edu           o__  Stephen Skory
http://physics.ucsd.edu/~sskory/ _.>/ _Graduate Student
________________________________(_)_\(_)_______________
Hi,
I applied the patch and downloaded the .h file, and hop is segfaulting on Ranger:
    Copying arrays for 262144 particles
    Calling hop... 262144 1.600e+02
    nSmooth = 65
    kd->nActive = 262144
    Building Tree...
    Segmentation fault
I've removed any .yt files just to be sure. There were no errors with building hop with 'python setup.py install' in yt:
http://paste.enzotools.org/show/98/
Looking at hop_hop.c, it looks like it's crashing in kdBuildTree(kd) in hop_kd.c, and the only new part there is
    - c[i].fSplit = kd->p[m].r[d];
    + c[i].fSplit = NP_POS(kd, m, d);
which refers to the new stuff in hop_numpy.h. Are there changes to setup.py that weren't in the diff? Say, having to do with _NUMPY_HOP_H?
_______________________________________________________
sskory@physics.ucsd.edu           o__  Stephen Skory
http://physics.ucsd.edu/~sskory/ _.>/ _Graduate Student
________________________________(_)_\(_)_______________
I have also applied this patch on Ranger, but I am still waiting for the hop job to run. Hopefully, soon I can add another data point to this.
(In reply to Stephen Skory <stephenskory@yahoo.com>, Thu, Apr 16, 2009 at 1:53 PM.)
Britton,
> I have also applied this patch on Ranger, but I am still waiting for the hop job to run. Hopefully, soon I can add another data point to this.
If you have a small enough dataset (I'm using 64^3), it crashes in serial for me. You can run that on the login node without being too obnoxious.
_______________________________________________________
sskory@physics.ucsd.edu           o__  Stephen Skory
http://physics.ucsd.edu/~sskory/ _.>/ _Graduate Student
________________________________(_)_\(_)_______________
Can you put that dataset somewhere in webspace that we can grab? I'm running with RD0005-mine on my laptop in serial and it works just peachy...
(In reply to Stephen Skory <stephenskory@yahoo.com>, Thu, Apr 16, 2009 at 1:01 PM.)
It's a 1024^3 dataset. If you can make one of your datasets available, I can try it just as you have.
(In reply to Stephen Skory <stephenskory@yahoo.com>, Thu, Apr 16, 2009 at 2:01 PM.)
Here's the data set:
http://stephenskory.s3.amazonaws.com/research-s3/DD0065.tar.gz
and the script:
    from yt.mods import *
    import yt.lagos.hop.SS_HopOutput as ss
    pf = EnzoStaticOutput("DD0065")
    hop = ss.HaloFinder(pf, padding=9.864189563e-02)
    hop.write_out(filename="p2-hop.out")
_______________________________________________________
sskory@physics.ucsd.edu           o__  Stephen Skory
http://physics.ucsd.edu/~sskory/ _.>/ _Graduate Student
________________________________(_)_\(_)_______________
Happened to me, too.
    allocating 262144 particles.
    Copying arrays for 262144 particles
    Calling hop... 262144 1.600e+02
    nSmooth = 65
    kd->nActive = 262144
    Building Tree...
    Segmentation fault
The large job is now running, too. We'll see what happens there.
(In reply to Stephen Skory <stephenskory@yahoo.com>, Thu, Apr 16, 2009 at 2:09 PM.)
> The large job is now running, too. We'll see what happens there.
I suspect you'll get the same thing; my parallel runs died too. Good luck!
_______________________________________________________
sskory@physics.ucsd.edu           o__  Stephen Skory
http://physics.ucsd.edu/~sskory/ _.>/ _Graduate Student
________________________________(_)_\(_)_______________
Hi guys,
Looks like it could be an issue with architecture. I'll track it down (but maybe not for an hour or so) and get back to you. Thanks very much!
-Matt
(In reply to Stephen Skory <stephenskory@yahoo.com>, Thu, Apr 16, 2009 at 1:18 PM.)
I just ran Stephen's data in parallel and it failed in the same place. The large dataset failed as well, also in the same place.
(In reply to Matthew Turk <matthewturk@gmail.com>, Thu, Apr 16, 2009 at 2:18 PM.)
Hey guys,
I can replicate the error with DD0065 on my laptop. I'll figure it out and fix it... Not sure why it's not working with DD0065, but is with RD0005. All the dtypes seem the same.
-Matt
(In reply to Britton Smith <brittonsmith@gmail.com>, Thu, Apr 16, 2009 at 1:28 PM.)
> I can replicate the error with DD0065 on my laptop. I'll figure it out and fix it... Not sure why it's not working with DD0065, but is with RD0005. All the dtypes seem the same.
As a follow-up, yt-hop works fine on this dataset on Ranger with r1252.
_______________________________________________________
sskory@physics.ucsd.edu           o__  Stephen Skory
http://physics.ucsd.edu/~sskory/ _.>/ _Graduate Student
________________________________(_)_\(_)_______________
Guys,
What seems to be happening is a floating point comparison issue. It must just be something with the way the particles are aligned in the new dataset compared to the old. Changing the type of 'fm' on line 59 of hop_kd.c from 'float' to 'npy_float64' fixed it for me. With 'fm' as a 'float', the loop never converged, so it ran off the end and segfaulted when it could no longer find data points inside the process's memory. There may be other places where this needs to happen, and I will look for them, but if you could test with this change that'd be awesome.
Let me know if that fixes it? And if the memory improves?
-Matt
(In reply to Stephen Skory <stephenskory@yahoo.com>, Thu, Apr 16, 2009 at 1:55 PM.)
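The failure Matt describes can be illustrated with a small sketch. It is written in Python and is not the code in hop_kd.c: the helper names `to_float32` and `scan_past_smaller` and the three sample coordinates are assumptions chosen to show one way a 32-bit pivot compared against 64-bit data can keep a scan from ever stopping. The thread does not say which way the rounding went in the real dataset.

```python
import struct


def to_float32(value):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", value))[0]


def scan_past_smaller(values, pivot):
    """Advance while values[i] < pivot, as a partition scan would.

    Returns the stopping index. A result of len(values) means no element
    stopped the scan; unguarded C code would read past the end of the array.
    """
    i = 0
    while i < len(values) and values[i] < pivot:
        i += 1
    return i


positions = [0.05, 0.07, 0.1]        # 64-bit coordinates; the pivot is the last one
pivot64 = positions[2]               # pivot kept at full precision
pivot32 = to_float32(positions[2])   # pivot stored in a 32-bit float

print(pivot32 > pivot64)                       # True: rounding moved the pivot up
print(scan_past_smaller(positions, pivot64))   # 2: stops at the pivot element
print(scan_past_smaller(positions, pivot32))   # 3: runs off the end of the data
```

With the pivot held at the same precision as the data, the scan always stops at the pivot element itself, which is consistent with the fix of declaring 'fm' as 'npy_float64'.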
I just tried it in serial on Stephen's data and it worked. I'll queue it up for the larger data. I intend to try a few different processor configurations to get some information on the RAM usage.
Britton
(In reply to Matthew Turk <matthewturk@gmail.com>, Thu, Apr 16, 2009 at 5:19 PM.)
Hi guys,
So I think the consensus here is that it doesn't fix the memory issues, but it does not give bad results. I'm going to commit, but we can roll back if necessary.
-Matt
(In reply to Britton Smith <brittonsmith@gmail.com>, Thu, Apr 16, 2009 at 4:32 PM.)
participants (3)
- Britton Smith
- Matthew Turk
- Stephen Skory