Re: Rockstar on Blue Waters
Britton,
Thank you very much, this seems to work, although not without a quirk - there may be a bug in the yt_astro_analysis installer.
I am using the version of yt that you so kindly created for me a while ago - the one that fixes the HOP parallel scaling bug; it also includes some of my modifications to the artio frontend. When I installed yt_astro_analysis per the online instructions, the installer kicked out my version of yt and activated another one (not sure if it was the system-wide one or if it installed its own). I had to juggle installing and uninstalling yt and yt_astro_analysis until I figured out the order that worked: I installed yt_astro_analysis first and then reinstalled my custom yt on top of it. Then the rockstar error disappeared.
n
On 1/25/19 1:14 PM, Britton Smith wrote:
Hi Nick,
I think you're running into a bug related to running rockstar with Python3. This has been fixed as of the last major yt release and the release of the yt_astro_analysis package. I just tried a small rockstar test on Blue Waters using yt_astro_analysis and it worked for me. If you want to go that route, I recommend following the install instructions here: https://yt-astro-analysis.readthedocs.io/en/latest/ and using the --user flag when you install with pip.
Britton
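After a pip install with --user like the one suggested above, it can be worth checking which copy of each package Python will actually import, since a user-level install can end up shadowing an existing one. The sketch below uses only the standard library and does not import the packages themselves. The helper name `describe_package` is made up for illustration; `yt` and `yt_astro_analysis` are simply the two packages discussed in this thread.

```python
import importlib.util
from importlib import metadata


def describe_package(name):
    """Return (location, version) for a top-level package, or (None, None).

    location is the file Python would load for `import name`;
    version comes from the installed distribution metadata, if any.
    """
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        spec = None
    if spec is None:
        return None, None
    try:
        version = metadata.version(name)
    except metadata.PackageNotFoundError:
        # Importable, but no distribution metadata under this name.
        version = None
    return spec.origin, version


if __name__ == "__main__":
    for pkg in ("yt", "yt_astro_analysis"):
        location, version = describe_package(pkg)
        print(pkg, location, version)
```

Running this with the same python3 that the batch job uses should show whether the custom yt build or some other copy is the one being picked up.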
On Wed, Jan 23, 2019 at 8:50 AM Nick Gnedin <ngnedin@gmail.com> wrote:
Folks,
I am trying to run Rockstar on Blue Waters, and I am getting an error even for a very simple setup: 1 writer and 1 reader running on a single node on a small test run. The same job runs just fine on a workstation.
The script is simple:
---------------
import yt, yt.funcs
import glob, sys, os.path

yt.enable_parallelism()
from yt.analysis_modules.halo_finding.rockstar.api import RockstarHaloFinder

root = "../B20"
n = 1

dirs = ("D",)  # trailing comma makes this a tuple rather than a plain string

for dir in dirs:
    sets = glob.glob(root+"/"+dir+"/rei20c1_a0.*/*.art")
    sets.sort()
    ts = yt.DatasetSeries(sets)
    rh = RockstarHaloFinder(ts, num_readers=n, num_writers=n, outbase=root+"/"+dir+"/rs", dm_only=True, particle_type='N-BODY')
    rh.run()
---------------
and the batch script is trivial:
--------
#!/bin/csh
#PBS -l nodes=1:ppn=32:xe
#PBS -l walltime=30:00,vmem=64gb
#PBS -q debug
#PBS -N rshfc.1
#PBS -e $PBS_JOBID.err
#PBS -o $PBS_JOBID.out
#PBS -m be

cd $PBS_O_WORKDIR

module load bwpy
module load bwpy-mpi

setenv APRUN_XFER_LIMITS 1
limit stacksize unlimited

aprun -n 3 -N 3 python3 rs-hfc.py >& std.rshfc1
--------
The error I get is not very informative:
[Error] Couldn't open ^A:0! (Err: Name or service not known)
and the ^A string varies in separate runs, c.f.
[Error] Couldn't open 344ei20:0! (Err: Name or service not known)
which may happen if a null or corrupted pointer is printed with a %s specification, but that's all I can guess. Is there a way to obtain more information and localize the offending code segment?
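A side note on the `aprun -n 3` line in the batch script above: as far as I understand yt's Rockstar interface, it expects one MPI task for the server plus one per reader and one per writer, so 3 tasks is consistent with num_readers=1 and num_writers=1. A minimal sketch of that bookkeeping follows; the helper name `rockstar_mpi_tasks` is hypothetical and not part of yt, and the readers + writers + 1 rule is stated here as an assumption.

```python
def rockstar_mpi_tasks(num_readers, num_writers):
    """Return the MPI task count assumed here: readers + writers + 1 server."""
    if num_readers < 1 or num_writers < 1:
        raise ValueError("need at least one reader and one writer")
    return num_readers + num_writers + 1


n = 1
# With one reader and one writer this matches "aprun -n 3" in the batch script.
print(rockstar_mpi_tasks(n, n))
```

If the task count passed to aprun and the reader/writer counts in the script drift apart, that mismatch is worth ruling out before looking for deeper causes.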
_______________________________________________
yt-users mailing list -- yt-users@python.org
To unsubscribe send an email to yt-users-leave@python.org
Hi Nick,
Ok, that's good to know. Eventually all of the halo finder functionality will be removed entirely from yt and only exist in yt_astro_analysis, which will hopefully make this a little less complicated. In addition, if you'd ever like to issue a pull request with your ARTIO modifications, I'd be happy to help facilitate that.
Britton
On Tue, Jan 29, 2019 at 12:07 PM Nick Gnedin <ngnedin@gmail.com> wrote: