Hey everyone,

I am trying to use Parallel HOP in yt to analyze Enzo data. I installed mpi4py and Forthon and then ran "python setup.py install". I then try to find halos with this script on 2 nodes with 16 processors each (32 total):

from yt.mods import *
from yt.analysis_modules.halo_finding.api import *

i = 5
filename = 'RD%04d/RedshiftOutput%04d' % (i, i)
pf = load(filename)
halos = parallelHF(pf)
dumpn = 'RD%04d/MergerHalos' % i
halos.dump(dumpn)

The output is rather long since all 32 processors write to it. The full output is here: http://paste.yt-project.org/show/2761/

However, here are some highlights:

$ mpirun -np 32 python findhalo.py --parallel
Reported: 2 (out of 2) daemons - 32 (out of 32) procs
yt : [INFO ] 2012-10-04 22:54:51,855 Global parallel computation enabled: 1 / 32
yt : [INFO ] 2012-10-04 22:54:51,855 Global parallel computation enabled: 21 / 32
....
yt : [INFO ] 2012-10-04 22:54:51,858 Global parallel computation enabled: 10 / 32
--------------------------------------------------------------------------
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process. Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption. The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.

The process that invoked fork was:

  Local host:          mu0002.localdomain (PID 9624)
  MPI_COMM_WORLD rank: 3

If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
--------------------------------------------------------------------------
P000 yt : [INFO ] 2012-10-04 22:54:55,571 Parameters: current_time = 89.9505268216
P000 yt : [INFO ] 2012-10-04 22:54:55,571 Parameters: domain_dimensions = [1024 1024 1024]
P000 yt : [INFO ] 2012-10-04 22:54:55,572 Parameters: domain_left_edge = [ 0.  0.  0.]
P000 yt : [INFO ] 2012-10-04 22:54:55,572 Parameters: domain_right_edge = [ 1.  1.  1.]
P000 yt : [INFO ] 2012-10-04 22:54:55,573 Parameters: cosmological_simulation = 1
P000 yt : [INFO ] 2012-10-04 22:54:55,573 Parameters: current_redshift = 5.99999153008
P000 yt : [INFO ] 2012-10-04 22:54:55,573 Parameters: omega_lambda = 0.724
...
P000 yt : [INFO ] 2012-10-04 23:04:33,681 Getting particle_index using ParticleIO
P001 yt : [INFO ] 2012-10-04 23:05:09,222 Getting particle_index using ParticleIO
Traceback (most recent call last):
  File "findhalo.py", line 7, in <module>
    halos = parallelHF(pf)
  File "/usr/projects/magnetic/jsmidt/yt-x86_64/lib/python2.7/site-packages/yt-2.5dev-py2.7-linux-x86_64.egg/yt/analysis_modules/halo_finding/halo_objects.py", line 2268, in __init__
    premerge=premerge, tree=self.tree)
  File "/usr/projects/magnetic/jsmidt/yt-x86_64/lib/python2.7/site-packages/yt-2.5dev-py2.7-linux-x86_64.egg/yt/analysis_modules/halo_finding/halo_objects.py", line 1639, in __init__
    HaloList.__init__(self, data_source, dm_only)
  File "/usr/projects/magnetic/jsmidt/yt-x86_64/lib/python2.7/site-packages/yt-2.5dev-py2.7-linux-x86_64.egg/yt/analysis_modules/halo_finding/halo_objects.py", line 1067, in __init__
    self._run_finder()
  File "/usr/projects/magnetic/jsmidt/yt-x86_64/lib/python2.7/site-packages/yt-2.5dev-py2.7-linux-x86_64.egg/yt/analysis_modules/halo_finding/halo_objects.py", line 1648, in _run_finder
    if np.unique(self.particle_fields["particle_index"]).size != \
  File "/usr/projects/magnetic/jsmidt/yt-x86_64/lib/python2.7/site-packages/numpy/lib/arraysetops.py", line 193, in unique
    return ar[flag]
MemoryError
mpirun: killing job...
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 6295 on node mu0001.localdomain
exited on signal 0 (Unknown signal 0).
--------------------------------------------------------------------------
32 total processes killed (some possibly by mpirun during cleanup)

Anyway, if anyone recognizes this or has any advice, it would be appreciated. Thanks.

--
------------------------------------------------------------------------
Joseph Smidt <josephsmidt@gmail.com>
Theoretical Division
P.O. Box 1663, Mail Stop B283
Los Alamos, NM 87545
Office: 505-665-9752
Fax: 505-667-1931
Hi Joseph,

The last line, MemoryError, makes me suspect the machine is running out of memory. I'm just guessing, so it might not be the case at all. Can you tell us a little bit about the memory available on your machine (GB per core) and the number of particles in your simulation?

In my past experience with Parallel HOP, a safe guideline has been to have 1 MB of RAM per 5000 particles. yt has since been optimized further, so that number should be smaller now, but it is a safe place to start if you're having trouble. If you have 1 particle per cell, then 1024**3 / 5000 / 32 ~ 6711 MB, so you'll need about 7 GB per core when using 32 cores. If your machine has 4 GB per core, you might want to try 64 cores for the job.

Hope this helps.

From G.S.
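For reference, here is a rough back-of-the-envelope sketch of that estimate. It just applies the 1 MB per 5000 particles rule of thumb above and assumes one dark-matter particle per cell of a 1024^3 root grid, so treat the numbers as a guideline rather than an exact requirement:

# Rough Parallel HOP memory estimate (rule of thumb, not an exact requirement).
n_particles = 1024**3         # assumes roughly 1 particle per cell for a 1024^3 run
mb_per_particle = 1.0 / 5000  # guideline: ~1 MB of RAM per 5000 particles

total_mb = n_particles * mb_per_particle
for n_cores in (32, 64, 128):
    per_core_gb = total_mb / n_cores / 1024.0
    print("%4d cores: ~%.1f GB per core" % (n_cores, per_core_gb))

# 32 cores -> ~6.6 GB/core, 64 cores -> ~3.3 GB/core, 128 cores -> ~1.6 GB/core

If you do go to 64 cores, the invocation stays the same, just with more ranks, e.g. something like: mpirun -np 64 python findhalo.py --parallel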