
Is running Hop in parallel currently broken? I get way fewer halos in my catalog when I run Hop with 8 processors than with 1. Am I doing something wrong? This is my script:

    $ cat find_halos.py
    from yt.mods import *
    pf = load('RD0004/RedshiftOutput0004')
    halo_list = HaloFinder(pf, padding=0.02)
    halo_list.write_out('RD0004/RedshiftOutput0004_Hop.out')

When I run 'python find_halos.py' I get 3716 halos, but when I run it with 'mpiexec -np 8 python find_halos.py --parallel' I only get 244. This is independent of the value of padding used. Note that the parallel catalog is a subset of the serial one. The problem occurs within HaloFinder, not during the write_out stage, since len(halo_list) = 242 on all 8 processors.

Mike

--
*********************************************************************
*                                                                   *
* Dr. Michael Kuhlen              Theoretical Astrophysics Center   *
* email: mqk@astro.berkeley.edu   UC Berkeley                       *
* cell phone: (831) 588-1468      B-116 Hearst Field Annex # 3411   *
* skype username: mikekuhlen      Berkeley, CA 94720                *
*                                                                   *
*********************************************************************
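
The observation that the parallel catalog is a subset of the serial one can be checked by matching halos between the two catalogs by position. The following is only a minimal sketch: it assumes each catalog has already been reduced to a list of (x, y, z) centers in code units, it ignores periodic boundaries, and the function name `match_catalogs`, the tolerance, and the sample centers are all made up for illustration (they are not part of yt and not Mike's data).

```python
import math

def match_catalogs(serial, parallel, tol=1e-6):
    """Split `parallel` halo centers into those found in `serial` and those not.

    Both arguments are lists of (x, y, z) tuples. Two halos are treated as the
    same object when their centers lie within `tol` of each other.
    """
    matched, unmatched = [], []
    for p in parallel:
        if any(math.dist(p, s) <= tol for s in serial):
            matched.append(p)
        else:
            unmatched.append(p)
    return matched, unmatched

# Made-up centers standing in for the two catalogs.
serial_centers = [(0.10, 0.20, 0.30), (0.50, 0.50, 0.50), (0.90, 0.10, 0.40)]
parallel_centers = [(0.50, 0.50, 0.50)]

matched, unmatched = match_catalogs(serial_centers, parallel_centers)
is_subset = not unmatched                      # every parallel halo has a serial partner
missing = len(serial_centers) - len(matched)   # serial halos absent from the parallel run
print("parallel is a subset of serial:", is_subset)
print("halos missing from the parallel run:", missing)
```

The brute-force comparison is quadratic in the number of halos, which is fine for catalogs of a few thousand entries like the ones discussed here.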

Hi Mike,

> Is running Hop in parallel currently broken? I get way fewer halos in my catalog when I run Hop with 8 processors than with 1. Am I doing something wrong?

Why do you always got to start trouble? I'm getting this too. I'll dig into this.

--
Stephen Skory
s@skory.us
http://stephenskory.com/
510.621.3687 (google voice)

Ha! Trouble is good, amirite? Thanks for looking into it. It's not that big of a deal for me to run it serially right now, so don't sweat it if it turns out to be complicated to triage or fix.

Mike

(In reply to Stephen Skory <s@skory.us>, Fri, Feb 24, 2012 at 2:23 PM.)
_______________________________________________
yt-users mailing list
yt-users@lists.spacepope.org
http://lists.spacepope.org/listinfo.cgi/yt-users-spacepope.org

Hi Mike,

Perhaps I missed something from the previous discussion. I've been locked away working on proposals all week. Out of curiosity, have you tried using ParallelHop, which is Stephen's specially designed parallel Hop? It is entirely distinct from simply running hop in parallel. Stephen, please correct me if I'm wrong.

You should be able to use that simply by replacing HaloFinder with parallelHF in your script.

Britton

(In reply to Stephen Skory <s@skory.us>, Fri, Feb 24, 2012 at 5:23 PM.)

No, I haven't yet. From the documentation ParallelHop seems to require installing Forthon, and I was lazy. ;) I'll give that a shot. Thanks for the suggestion.

Mike

(In reply to Britton Smith <brittonsmith@gmail.com>, Fri, Feb 24, 2012 at 2:31 PM.)

Sorry to be the bearer of bad news again, but ParallelHF failed for me.

The installation of Forthon, etc. went smoothly. Only took 5 minutes. Note that in addition to replacing HaloFinder with parallelHF you also have to add "from yt.analysis_modules.halo_finding.api import *" to the top of the script.

The error I encounter seems to be related to the large grid count that was also giving me problems yesterday. Here's the traceback.

    File "test.py", line 6, in <module>
      halo_list = parallelHF(pf)
    File "/home/mqk/local/lib/python2.7/site-packages/yt-2.4dev-py2.7-linux-x86_64.egg/yt/analysis_modules/halo_finding/halo_objects.py", line 2047, in __init__
      total_mass = self.comm.mpi_allreduce((self._data_source["ParticleMassMsun"].astype('float64')).sum(),
    File "/home/mqk/local/lib/python2.7/site-packages/yt-2.4dev-py2.7-linux-x86_64.egg/yt/data_objects/data_containers.py", line 321, in __getitem__
      self.get_data(key)
    File "/home/mqk/local/lib/python2.7/site-packages/yt-2.4dev-py2.7-linux-x86_64.egg/yt/data_objects/data_containers.py", line 2423, in get_data
      self.particles.get_data(field)
    File "/home/mqk/local/lib/python2.7/site-packages/yt-2.4dev-py2.7-linux-x86_64.egg/yt/data_objects/particle_io.py", line 99, in get_data
      count=len(grid_list), dtype='float64'))
    ValueError: iterator too short

Again it's the total_mass calculation that is causing the problem. Note that this yt install includes the changes that Matt brought in yesterday to fix the memory consumption.

When I specify total_mass manually, everything works and I get the correct number of halos in the final catalog. The number of halos and their properties are not exactly identical to serial Hop, but it's close enough. I got a speed-up of about a factor of 3 when running parallelHF on 8 cores vs. HaloFinder on 1 core.

Mike

(In reply to Michael Kuhlen <mqk@astro.berkeley.edu>, Fri, Feb 24, 2012 at 2:37 PM.)
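
The total_mass line in the traceback is a global sum: each process sums the particle masses it holds, and a sum-allreduce hands every process the same grand total. As a rough illustration of that pattern (and of what a precomputed value supplied by hand has to equal), here is a sketch that emulates the ranks with list slices. There is no MPI or yt in it; `split_among_ranks`, `emulated_allreduce_sum`, and the masses are hypothetical stand-ins, not yt functions or real data.

```python
import math

def split_among_ranks(values, n_ranks):
    """Deal `values` out into `n_ranks` contiguous, nearly equal slices."""
    base, extra = divmod(len(values), n_ranks)
    slices, start = [], 0
    for rank in range(n_ranks):
        stop = start + base + (1 if rank < extra else 0)
        slices.append(values[start:stop])
        start = stop
    return slices

def emulated_allreduce_sum(local_values):
    """Stand-in for an MPI sum-allreduce: every rank ends up with the total."""
    total = math.fsum(local_values)
    return [total] * len(local_values)

# Made-up particle masses (arbitrary units), standing in for ParticleMassMsun.
masses = [1.0e9, 2.5e9, 4.0e8, 7.5e9, 3.2e9, 9.0e8, 1.1e9, 6.0e8, 2.0e9, 5.0e9]

per_rank = split_among_ranks(masses, n_ranks=8)
local_sums = [math.fsum(chunk) for chunk in per_rank]   # one partial sum per rank
totals = emulated_allreduce_sum(local_sums)             # same value on every rank
total_mass = totals[0]
print("total_mass =", total_mass)
```

Because the result is just the sum over all particles, it can be computed once, serially or in chunks, and reused across runs of the same dataset.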

I was able to reproduce the missing-halo problem for HOP in parallel compared to serial, but I wasn't able to reproduce the parallelHF problem. I'm running this on a 64 cube test dataset, so it's not as big as the 512 cube. I tried both my old YT and the version after Matt's pull request was merged, so the parallel HOP bug existed before the memory update.

From G.S.

(In reply to Michael Kuhlen <mqk@astro.berkeley.edu>, Fri, Feb 24, 2012 at 3:36 PM.)

Hi Mike,

I think I've fixed the original HOP problem you had. It's in my yt branch here:

https://bitbucket.org/sskory/yt/overview

Could you try HOP (HaloFinder) with the new changes? Let me know, and if it works for you, it will go into the main branch.

With regard to your new problem, I tried the same swap of methods of calculating "total_mass" that we did in the HaloFinder, but I think there's something screwy going on with .quantities, and I haven't been able to track it down yet.

With regard to speed, in general HaloFinder will be faster than parallelHF on any dataset that HaloFinder can handle. But HaloFinder is limited by the size of the dataset and by the cosmology of the dataset (small cosmologies mean that a large padding is needed when running in parallel, and this will defeat the point of parallelism).

--
Stephen Skory
s@skory.us
http://stephenskory.com/
510.621.3687 (google voice)
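
To make the role of padding concrete, here is a toy sketch of a domain-decomposed group finder. It is a 1D friends-of-friends linker, not HOP, and it is not a reconstruction of the bug fixed above (which Mike reported as independent of padding). All names, the particle positions, and the parameter values are invented for illustration. Each slab of the unit interval sees its own particles plus a padding strip on either side; with too little padding, a group straddling a slab boundary is cut into pieces that fall below the membership threshold and drops out of the catalog.

```python
def find_groups(positions, linking_length, min_members):
    """1D friends-of-friends: chain sorted neighbours closer than linking_length."""
    groups, current = [], []
    for x in sorted(positions):
        if current and x - current[-1] > linking_length:
            if len(current) >= min_members:
                groups.append(current)
            current = []
        current.append(x)
    if len(current) >= min_members:
        groups.append(current)
    return groups

def find_groups_decomposed(positions, n_domains, padding, linking_length, min_members):
    """Run find_groups per slab of [0, 1), each slab also seeing `padding` past its edges."""
    kept = []
    width = 1.0 / n_domains
    for i in range(n_domains):
        lo, hi = i * width, (i + 1) * width
        visible = [x for x in positions if lo - padding <= x < hi + padding]
        for group in find_groups(visible, linking_length, min_members):
            # A slab keeps a group only if the group's leftmost member lies in the
            # slab itself, so a group seen by two padded slabs is counted once.
            if lo <= group[0] < hi:
                kept.append(group)
    return kept

# Three made-up clumps; the middle one straddles the slab boundary at x = 0.5.
particles = [0.10, 0.105, 0.11, 0.115, 0.12,
             0.49, 0.495, 0.505, 0.51,
             0.80, 0.805, 0.81]
params = dict(linking_length=0.012, min_members=3)

n_serial = len(find_groups(particles, **params))
n_no_padding = len(find_groups_decomposed(particles, n_domains=2, padding=0.0, **params))
n_padded = len(find_groups_decomposed(particles, n_domains=2, padding=0.02, **params))
print("serial:", n_serial, "no padding:", n_no_padding, "padding 0.02:", n_padded)
```

The padding has to be at least as wide as the largest group that can cross a boundary. When halos are large relative to the box, the padded slabs overlap so much that each process handles most of the volume, which is the loss of parallel benefit described above.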

That fixed it for me; I now get identical haloes running HOP in serial and in parallel on the 64 cube test dataset.

From G.S.

(In reply to Stephen Skory <s@skory.us>, Fri, Feb 24, 2012 at 4:20 PM.)

Same here. Thanks for the quick fix.

Mike

(In reply to Geoffrey So <gsiisg@gmail.com>, Fri, Feb 24, 2012 at 4:31 PM.)

Hi Mike,

This just leaves the parallelHF too-short-iterator, right? That error in this context means that an exception was thrown in the iterator. I wasn't able to reproduce it on your dataset ... if you change it to be expanded rather than an iterator, it should show where and what the exception is.

-Matt

(In reply to Michael Kuhlen <mqk@astro.berkeley.edu>, Fri, Feb 24, 2012 at 8:10 PM.)
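
Matt's debugging suggestion can be illustrated without yt or NumPy. In the sketch below, `fill_from_iterator` is a hypothetical consumer written only for this example (it is not NumPy's code, and current NumPy versions may behave differently); it reports any early end of the iterator as "iterator too short", which hides an error raised inside the generator. Expanding the generator into a list first lets the original exception surface. The grid names and the failure condition are made up.

```python
def fill_from_iterator(iterable, count):
    """Hypothetical consumer that hides errors raised inside the iterator.

    If the iterator stops early for any reason, the caller only ever sees
    ValueError("iterator too short").
    """
    out = []
    it = iter(iterable)
    for _ in range(count):
        try:
            out.append(next(it))
        except Exception:  # StopIteration or any error raised by the generator
            raise ValueError("iterator too short") from None
    return out

def grid_values(grids):
    """Generator standing in for a per-grid read; one grid is deliberately bad."""
    for name, value in grids:
        if value is None:
            raise RuntimeError("no particle data in grid %s" % name)
        yield float(value)

grids = [("g0", 1.0), ("g1", None), ("g2", 3.0)]

# Consumed lazily: the real cause is replaced by the generic message.
try:
    fill_from_iterator(grid_values(grids), count=len(grids))
except ValueError as err:
    masked_message = str(err)

# Expanded into a list first: the original exception propagates unchanged.
try:
    expanded = list(grid_values(grids))
except RuntimeError as err:
    real_message = str(err)

print("as an iterator:", masked_message)
print("expanded first:", real_message)
```

Once the underlying error is known, the lazy form can be restored, since it avoids building the intermediate list.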

> This just leaves the parallelHF too-short-iterator, right?

Weird, I no longer get the iterator error. Must have been some kind of fluke. Sorry for the noise.

Cheers,
Mike

(In reply to Matthew Turk <matthewturk@gmail.com>, Fri, Feb 24, 2012 at 5:17 PM.)

Britton,

> Out of curiosity, have you tried using ParallelHop, which is Stephen's specially designed parallel Hop. It is entirely distinct from simply running hop in parallel. Stephen, please correct me if I'm wrong.

You're correct, but Mike and I shouldn't be seeing the behavior we're seeing. Thanks for giving that tip.

--
Stephen Skory
s@skory.us
http://stephenskory.com/
510.621.3687 (google voice)

participants (5)
- Britton Smith
- Geoffrey So
- Matthew Turk
- Michael Kuhlen
- Stephen Skory