Re: [yt-users] error when running on cluster nodes

Tsz Ka,

Tsz Ka Li wrote:
I used the same python (~/yt-ppc64/bin/python) as I did on the command line. I think I tried python2.6 but I got the same error. (Actually, how are they different?) The machine I am using is the Turing cluster in UI (http://www.cse.illinois.edu/turing/), which is an Apple Xserve cluster using G5 processors. You can find some information about it on the website. Sorry, I am not that familiar with cluster issues. I suppose the nodes have a full installation, though I don't know where to check. I am using the default compilers on Turing, which are gcc for C/C++ and xlf for Fortran.

I'm going to chime in here. I noticed that on their FAQ page:

http://www.cse.illinois.edu/turing/faq.html

There's something about path problems on compute nodes. Your error could be due to a library problem (i.e. DYLD_LIBRARY_PATH). Have you tried submitting your job with a "#PBS -V" that keeps your environment for the job?

Matt may have some other bright ideas...

Stephen Skory
stephenskory@yahoo.com
http://stephenskory.com/
510.621.3687 (google voice)
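One way to check whether the job really sees the same environment as the login shell is to print the relevant variables in both places and compare the two listings. The following is only a minimal sketch: the function name `environment_report` and the list of watched variables are illustrative choices, not part of yt or PBS, and LD_LIBRARY_PATH is included next to DYLD_LIBRARY_PATH because nothing said so far establishes which dynamic loader the nodes use.

```python
import os
import platform
import sys

# Variables that commonly influence which interpreter and which shared
# libraries are picked up. This list is an assumption made for illustration.
WATCHED = ("PATH", "PYTHONPATH", "PYTHONHOME", "LD_LIBRARY_PATH", "DYLD_LIBRARY_PATH")


def environment_report(environ=None):
    """Return a list of text lines describing the host and watched variables."""
    environ = os.environ if environ is None else environ
    lines = [
        "host: %s" % platform.node(),
        "system: %s %s" % (platform.system(), platform.machine()),
        "python: %s" % sys.executable,
    ]
    for name in WATCHED:
        lines.append("%s=%s" % (name, environ.get(name, "<unset>")))
    return lines


if __name__ == "__main__":
    print("\n".join(environment_report()))
```

Running the script once on the head node and once inside a job (with and without -V) and diffing the two outputs would show whether the environment is actually being carried over.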

Hi Stephen,

I did try to include the -V flag in interactive mode (i.e. qsub -I -V), and I saw no difference in the error outcome.

Thanks,
Tsz Ka

(In reply to Stephen Skory's message of 2/17/2011, 11:19 AM.)

Hi Tsz Ka,

I have been thinking about it, and I am afraid I don't have any bright ideas. The only thing I can think of is that we may be able to get more information about the dynamic loading problem by having you run:

otool -L /turing/home/tszkali2/yt-ppc64/lib/python2.6/site-packages/matplotlib/_path.so

I would again caution though that if you attempted to create a statically linked library, this likely resulted in major breakages. The static linking process described on the wiki is only really designed for the Cray Compute Node Linux distribution, and it has a fundamentally different mode of linking than Darwin/OSX.

Could you send the output of that to us, run on both the compute nodes and on the head node? I presume loading yt still runs on the head node?

Thanks,
Matt

(In reply to Tsz Ka Li's message of Thu, Feb 17, 2011, 12:38 PM.)
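Another way to get more detail on a loading problem is to attempt the import directly and keep the full traceback, since the loader's own message usually names the file or symbol it could not resolve. This is a minimal sketch written for a current Python 3 (the thread itself uses python2.6); `try_import` is a hypothetical helper, and `matplotlib._path` is simply the module whose path appears in the otool command above.

```python
import importlib
import traceback


def try_import(module_name):
    """Import a module by name; return None on success or the traceback text on failure."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        # A failed load of a compiled extension is reported as ImportError,
        # with the dynamic loader's message included in the text.
        return traceback.format_exc()
    return None


if __name__ == "__main__":
    problem = try_import("matplotlib._path")
    print(problem or "matplotlib._path imported without error")
```

Running it with the same interpreter on the head node and on a compute node gives two messages that can be compared side by side.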

Hi Matt,

I am afraid otool is not installed on the Turing cluster. (I got "command not found".) Do you know of any substitute for that tool?

Actually, I did not follow the whole Cray installation instructions, but just those in the section "Running on a compute node". So basically I just copied the yt directory to the scratch disk and used the python there. Maybe this does not make sense to you. Anyway, sorry for the confusion.

Thanks,
Tsz Ka

(In reply to Matthew Turk's message of 2/18/2011, 9:26 AM.)

Hi Tsz Ka,

For me, otool is just in /usr/bin -- so it sounds like it may not be there. There is a chance that 'ldd' will also work for this purpose.

It sounds to me like this could be an issue of the DYLD_LIBRARY_PATH being confused, as well as possibly being unable to load various libraries. I'm not sure that I'm able to debug this from a distance -- the combination of the ppc64 cluster (which I am unfamiliar with) and the particulars of the local disk are making things a bit difficult. Any chance you could run the analysis on the head node? How big is your dataset? It may not benefit too much from parallelism.

-Matt

(In reply to Tsz Ka Li's message of Fri, Feb 18, 2011, 3:39 PM.)
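The "otool if it exists, otherwise ldd" fallback can be scripted so the same file runs unchanged on either kind of system. This is a rough sketch for a current Python 3 (`shutil.which` and `subprocess.run` are not available in python2.6); the helper names `pick_dependency_tool` and `list_dependencies` are made up for illustration, and the path in the usage line is the one quoted earlier in the thread.

```python
import shutil
import subprocess


def pick_dependency_tool(which=shutil.which):
    """Return the argv prefix of the first dependency lister found on PATH, else None."""
    for argv in (["otool", "-L"], ["ldd"]):
        if which(argv[0]):
            return argv
    return None


def list_dependencies(shared_object):
    """Run the available tool on a shared object and return its combined output."""
    tool = pick_dependency_tool()
    if tool is None:
        return "neither otool nor ldd was found on PATH"
    result = subprocess.run(
        tool + [shared_object],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout


if __name__ == "__main__":
    print(list_dependencies(
        "/turing/home/tszkali2/yt-ppc64/lib/python2.6/site-packages/matplotlib/_path.so"
    ))
```

The `which` parameter exists only so the selection logic can be exercised without either tool being installed.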

Hi Matt,

I understand the difficulty. The problem is that I cannot run programs for very long on the head node. Right now I am trying to build a merger tree for a 256^3 dark-matter-only run. The data size depends on the number of output dumps I use. Perhaps I can reduce the data size to make the analysis more efficient.

I am not sure if these are the ldd outputs you might want. You may ignore them if you do not find them useful.

on compute node:

[tszkali2@tur2-26 ~]$ ldd /turing/home/tszkali2/yt-ppc64/lib/python2.6/site-packages/matplotlib/_path.so
    linux-vdso32.so.1 => (0x00100000)
    libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x6fe54000)
    libm.so.6 => /lib/libm.so.6 (0x6fd63000)
    libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x6fd28000)
    libpthread.so.0 => /lib/libpthread.so.0 (0x6fce5000)
    libc.so.6 => /lib/libc.so.6 (0x6fb21000)
    /lib/ld.so.1 (0x20510000)

on head node:

[tszkali2@turing-3 tszkali2]$ ldd /turing/home/tszkali2/yt-ppc64/lib/python2.6/site-packages/matplotlib/_path.so
    linux-vdso32.so.1 => (0x00100000)
    libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x6fe54000)
    libm.so.6 => /lib/libm.so.6 (0x6fd63000)
    libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x6fd2b000)
    libpthread.so.0 => /lib/libpthread.so.0 (0x6fce8000)
    libc.so.6 => /lib/libc.so.6 (0x6fb24000)
    /lib/ld.so.1 (0x2033b000)

I am grateful for your input anyway.

Thanks,
Tsz Ka

(In reply to Matthew Turk's message of 2/18/2011, 2:49 PM.)
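The two ldd listings can be compared mechanically by reducing each to a mapping from library name to resolved path and ignoring the load addresses in parentheses. The sketch below does that with the output pasted above; `resolved_paths` and `differences` are illustrative helper names, not part of any tool mentioned in the thread. As an aside, linux-vdso32.so.1 and /lib/ld.so.1 are entries a Linux dynamic loader reports, so on these nodes LD_LIBRARY_PATH is probably the variable to look at rather than DYLD_LIBRARY_PATH; that is an inference from the output, not something confirmed in the thread.

```python
# ldd output copied from the message above, one entry per line.
COMPUTE_NODE = """\
linux-vdso32.so.1 => (0x00100000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x6fe54000)
libm.so.6 => /lib/libm.so.6 (0x6fd63000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x6fd28000)
libpthread.so.0 => /lib/libpthread.so.0 (0x6fce5000)
libc.so.6 => /lib/libc.so.6 (0x6fb21000)
/lib/ld.so.1 (0x20510000)
"""

HEAD_NODE = """\
linux-vdso32.so.1 => (0x00100000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x6fe54000)
libm.so.6 => /lib/libm.so.6 (0x6fd63000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x6fd2b000)
libpthread.so.0 => /lib/libpthread.so.0 (0x6fce8000)
libc.so.6 => /lib/libc.so.6 (0x6fb24000)
/lib/ld.so.1 (0x2033b000)
"""


def resolved_paths(ldd_output):
    """Map each library name to the path ldd resolved it to, ignoring addresses."""
    resolved = {}
    for line in ldd_output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Drop the trailing "(0x...)" load address.
        entry = line.rsplit("(", 1)[0].strip()
        name, arrow, path = entry.partition("=>")
        name = name.strip()
        # Entries without "=>" (the loader itself) are already full paths.
        resolved[name] = path.strip() if arrow else name
    return resolved


def differences(first, second):
    """List (name, path_in_first, path_in_second) for entries that do not match."""
    names = sorted(set(first) | set(second))
    return [(n, first.get(n), second.get(n)) for n in names if first.get(n) != second.get(n)]


if __name__ == "__main__":
    diff = differences(resolved_paths(COMPUTE_NODE), resolved_paths(HEAD_NODE))
    print(diff or "same resolved paths on both nodes")
```

For these two listings the comparison comes back empty: every library resolves to the same path on both nodes and only the load addresses differ. That would suggest the direct dependencies of this one module are found in the same places on both nodes; it says nothing about other modules or about the copy of yt on the scratch disk.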
yt-users mailing list yt-users@lists.spacepope.org http://lists.spacepope.org/listinfo.cgi/yt-users-spacepope.org
participants (3)
- Matthew Turk
- Stephen Skory
- Tsz Ka Li