Hi Yingchao,

I don't think covering grid construction is parallelized at the moment - in your script, each processor creates its own full copy of the covering grid. I haven't experimented with this myself, though, so it's possible I'm misreading the code or missing something happening elsewhere that makes this operation parallel.

One way around this with the current version of yt would be to handle the parallelism yourself - i.e. have each processor construct a subvolume of the covering grid, then combine the pieces yourself using e.g. mpi4py.
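For example, one could split the grid into slabs along one axis and have each rank build only its slab. Here is a rough numpy-only sketch of the bookkeeping involved (the domain values and nprocs are illustrative; in a real script each rank would pass its own sub_le and sub_dims to ds.covering_grid and the resulting arrays would be gathered with mpi4py):

```python
import numpy as np

# Illustrative domain parameters mimicking the original script.
left_edge = np.array([0.0, 0.0, 0.0])
right_edge = np.array([1.0, 1.0, 1.0])
dims = np.array([64, 64, 64])   # domain_dimensions * 2**max_level
nprocs = 4                      # number of MPI ranks, e.g. comm.Get_size()

# Cell width at the target refinement level.
dds = (right_edge - left_edge) / dims

# Split along z: each rank would call ds.covering_grid(level, sub_le, sub_dims)
# for its own slab; the slabs would then be gathered (e.g. comm.Gather).
z_splits = np.array_split(np.arange(dims[2]), nprocs)
slabs = []
for rank, zcells in enumerate(z_splits):
    sub_le = left_edge.copy()
    sub_le[2] += zcells[0] * dds[2]   # shift the left edge to this slab
    sub_dims = dims.copy()
    sub_dims[2] = len(zcells)         # this slab's thickness in cells
    slabs.append((sub_le, sub_dims))

# Sanity check: the slabs tile the full grid with no gaps or overlap.
assert sum(int(s[1][2]) for s in slabs) == dims[2]
```

This keeps each rank's memory footprint and I/O to roughly 1/nprocs of the full grid, at the cost of a gather step at the end.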

I think it might also be possible to make yt construct the covering grid in a parallel-aware fashion. The relevant code is here:

https://github.com/yt-project/yt/blob/master/yt/data_objects/construction_data_containers.py#L686

One would need to make the loop over I/O chunks a parallel loop and then add a reduction step at the end.
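Schematically, that change amounts to a round-robin split of the chunk loop plus a sum reduction. A minimal sketch of the pattern, with plain numpy arrays standing in for yt's internals (chunks, serial_fill, and parallel_fill are invented names, and the comment marks where an mpi4py Allreduce would go):

```python
import numpy as np

# Stand-ins for yt's io chunks: each chunk contributes the cells it
# owns to the covering-grid buffer, zeros elsewhere.
chunks = [np.eye(4) * (i + 1) for i in range(8)]

def serial_fill(chunks):
    """The current behavior: one process visits every chunk."""
    dest = np.zeros((4, 4))
    for chunk in chunks:
        dest += chunk
    return dest

def parallel_fill(chunks, rank, size):
    """Each rank visits every size-th chunk, then ranks are reduced."""
    dest = np.zeros((4, 4))
    for chunk in chunks[rank::size]:
        dest += chunk
    # comm.Allreduce(MPI.IN_PLACE, dest, op=MPI.SUM) would go here
    return dest

# Simulate 4 ranks and verify the reduction matches the serial result.
size = 4
combined = sum(parallel_fill(chunks, r, size) for r in range(size))
assert np.array_equal(combined, serial_fill(chunks))
```

Since each cell of the destination buffer is written by exactly one chunk, a sum reduction over ranks reproduces the serial result exactly.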

If you don't feel like taking on this task yourself, please feel free to open an issue about it on GitHub so that we don't lose track of the feature request.

-Nathan

On Tue, Sep 19, 2017 at 7:53 PM, Yingchao Lu <yingchao.lu@gmail.com> wrote:
Dear yt users,

I am trying to read some AMR data and convert them into 3D array. The test code:
######################### BEGIN ######################### 
import yt
from time import time
yt.enable_parallelism()

ds = yt.load("flash_hdf5_plt_cnt_0000")
tstart = time()
cg = ds.covering_grid(level=ds.max_level, left_edge=ds.domain_left_edge, dims=ds.domain_dimensions*2**ds.max_level)
cg['dens']

if yt.is_root(): print "It takes {0}s".format(time()-tstart)
######################### END ######################### 

I tried running it in serial and in parallel in an interactive session on Stampede:
######################### BEGIN ######################### 
[yclu@ test]$ ls
flash_hdf5_plt_cnt_0000  test.py
[yclu@ test]$ python test.py
It takes 34.0571820736s
[yclu@ test]$ export OMP_NUM_THREADS=68
[yclu@ test]$ python test.py           
It takes 33.1969199181s
[yclu@ test]$ export OMP_NUM_THREADS=1 
[yclu@ test]$ mpirun -np 68 python test.py
It takes 58.0391800404s
######################### END ######################### 

The time does not seem to be reduced by parallelism, and the multi-process run seems to have huge communication overhead. Is there a way to increase the speed with parallelism?

Thanks,
Yingchao


_______________________________________________
yt-users mailing list
yt-users@lists.spacepope.org
http://lists.spacepope.org/listinfo.cgi/yt-users-spacepope.org