Hi guys,

For some reason r1161 is having trouble parallel projecting for me on Ranger. I checked out r1160 and the same script ran to completion. r1161 runs for about two minutes, and gives nothing.

stderr gives:

setenv MY_NSLOTS 32
cd /scratch/00649/tg457850/RD0036
ibrun /share/home/00649/tg457850/yt/bin/mpi4py ./slices.py
Timeout during client startup. Killing remote processes...DONE

stdout gives:

TACC: Done.
TACC: Starting up job 487036
TACC: Setting up parallel environment for MVAPICH ssh-based mpirun.
TACC: Setup complete. Running job script.
TACC: starting parallel tasks...
Signal 15 received.
Signal 15 received.
Signal 15 received.
Signal 15 received.
Signal 15 received.
Signal 15 received.
Signal 15 received.
TACC: MPI job exited with code: 1
TACC: Shutting down parallel environment.
TACC: Shutdown complete. Exiting.
TACC: Cleaning up after job: 487036
TACC: Done.

_______________________________________________________
sskory@physics.ucsd.edu              o__  Stephen Skory
http://physics.ucsd.edu/~sskory/    _.>/ _Graduate Student
________________________________(_)_\(_)_______________
Can you maybe help us out a bit? Where does it hang?

On Feb 4, 2009, at 11:59 AM, Stephen Skory <stephenskory@yahoo.com> wrote:
Hi guys,

For some reason r1161 is having trouble parallel projecting for me on Ranger. I checked out r1160 and the same script ran to completion. r1161 runs for about two minutes, and gives nothing.

_______________________________________________
Yt-dev mailing list
Yt-dev@lists.spacepope.org
http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org
----- Original Message ----
From: Matthew Turk <matthewturk@gmail.com>
To: "yt-dev@lists.spacepope.org" <yt-dev@lists.spacepope.org>
Sent: Wednesday, February 4, 2009 12:02:46 PM
Subject: Re: [Yt-dev] r1161 parallel projection problem
Can you maybe help us out a bit? Where does it hang?
What I got is what I sent previously. Seriously! That's all the output I have. Maybe if I turn on logging? Suggestions?
I'd say take a look at the diff; obviously that's what broke it, right? Pepper around that area with print statements. Also, yes, turn on logging.

Furthermore, which field were you projecting? As I recall (I am away from my desk), r1161 was a fix for derived fields. Is it doing data reading and hanging? Is it preloading and hanging?

I think at the bare minimum, we need to be able to replicate the projection to test it. The field would go a long way toward that. What happens if you project Ones? This is also a good debugging technique, since it doesn't hit the disk at all. :)

-Matt

On Feb 4, 2009, at 12:08 PM, Stephen Skory <stephenskory@yahoo.com> wrote:
What I got is what I sent previously. Seriously! That's all the output I have. Maybe if I turn on logging? Suggestions?
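Matt's suggestion to pepper the suspect area with print statements works best in a parallel job if each statement is rank-tagged and flushed immediately, since buffered output is lost when the batch system kills the tasks with SIGTERM (the "Signal 15" in the log above). A minimal sketch of that technique; the `get_rank` fallback, the `MY_RANK` variable, and the checkpoint labels are illustrative, not part of yt or the original script:

```python
import os
import sys
import time

def get_rank():
    # In a real MPI job the rank comes from mpi4py; fall back to a
    # hypothetical MY_RANK environment variable so the sketch also
    # runs serially, outside of mpirun/ibrun.
    try:
        from mpi4py import MPI
        return MPI.COMM_WORLD.Get_rank()
    except ImportError:
        return int(os.environ.get("MY_RANK", "0"))

def checkpoint(label):
    # Tag each message with the rank and wall time, and flush right
    # away so the last checkpoint survives even if the job is killed.
    print("[rank %d] %.3f reached: %s" % (get_rank(), time.time(), label),
          flush=True)

checkpoint("before data read")
# ... suspect code (e.g. the projection call) would go here ...
checkpoint("after data read")
```

Comparing the last checkpoint printed by each rank then narrows the hang down to the region between two labels.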
r1162 seems to have fixed the issue.
Okay, that's super weird, because it absolutely *shouldn't* have. Were you running profiles?

On Wed, Feb 4, 2009 at 1:30 PM, Stephen Skory <stephenskory@yahoo.com> wrote:
r1162 seems to have fixed the issue.
Oh! Were you running the halo profiler? THAT was likely where the problem was -- I introduced a bug in parallel profiles (now fixed) that was causing a crash, and then it never got to the projection stage.

On Wed, Feb 4, 2009 at 1:32 PM, Matthew Turk <matthewturk@gmail.com> wrote:
Okay, that's super weird, because it absolutely *shouldn't* have. Were you running profiles?
You know what, let me do a quick test of r1160, r1161 and r1162 and see if I get the same as before... I wasn't running the halo profiler.
Frick. Sometimes I hate computers. Now all the versions work fine. Perhaps something got corrupted, and when I installed r1160 it overwrote what was wrong. So, sorry about the inbox filler!
No worries. Glad it got sorted out!

On Wed, Feb 4, 2009 at 2:31 PM, Stephen Skory <stephenskory@yahoo.com> wrote:
Frick. Sometimes I hate computers. Now all the versions work fine. Perhaps something got corrupted, and when I installed r1160 it overwrote what was wrong. So, sorry about the inbox filler!
participants (2)
- Matthew Turk
- Stephen Skory