[yt-dev] missing objects using parallel_objects()

23 Mar 2012

Hi everyone,

I am trying to improve the efficiency of my analysis script, which
calculates attributes of haloes. After watching the workshop video on yt
parallelism, I was motivated to give parallel_objects() a try. I am
basically trying to calculate, then output, some properties of each halo
found by parallel HOP. It turns out that even if I just output the (DM
particle) mass of each halo, I am missing one or more haloes. It doesn't
matter whether I run this in serial or in parallel: I end up missing the
same number of haloes if I use parallel_objects() like:

haloes = LoadHaloes(pf, HaloListname)

for sto, halo in parallel_objects(haloes, num_procs, storage = my_storage):

to iterate over the haloes, and the problem goes away if I just switch to:

for halo in haloes:

I noticed this when I tried it on an 800 cube dataset with around 50k
haloes and got only 4k haloes back. I then tried to narrow things down
and ruled out the way I am calculating the attributes: even if I just
output the mass from halo.total_mass(), which is basically read in from
the .h5 file, I still end up missing haloes when using parallel_objects().
For a 128 cube dataset with 85 haloes, I end up missing 3 and get 82 back,
and for a 64 cube dataset with 22 haloes, I get back 21 haloes.
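
A quick way to pin down which haloes go missing, rather than only how many, is to record the halo IDs each loop visits and diff them against the full list. The snippet below is a minimal sketch of that check with no yt dependency. Both missing_items and lossy_iter are hypothetical names made up for illustration, and lossy_iter is only a stand-in that deliberately skips items. It is not how parallel_objects() behaves.

```python
def missing_items(expected, visited):
    """Return the items of `expected` that never appear in `visited`, in order."""
    seen = set(visited)
    return [item for item in expected if item not in seen]


def lossy_iter(objects, drop_every=4):
    """Hypothetical stand-in for an iterator that loses items.

    This is NOT yt's parallel_objects(); it simply skips every
    `drop_every`-th object so the check below has something to find.
    """
    for i, obj in enumerate(objects):
        if (i + 1) % drop_every != 0:
            yield obj


# Stand-in halo IDs, sized like the 64 cube run with 22 haloes.
halo_ids = list(range(22))

# IDs visited by a plain loop versus by the lossy stand-in iterator.
serial_ids = [h for h in halo_ids]
parallel_ids = list(lossy_iter(halo_ids))

print("missing from serial loop:  ", missing_items(halo_ids, serial_ids))
print("missing from parallel loop:", missing_items(halo_ids, parallel_ids))
```

In the real script, the visited lists would be the halo IDs collected inside each of the two loops, so the diff names the exact haloes that parallel_objects() did not return.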

Has anyone else encountered this behavior or can confirm it?

From
G.S.

Geoffrey So