Britton,
Thank you, I will try Rockstar. I also do not promise anything, but I am willing to take a look at those MPI calls.
n
On 03/13/2018 12:44 PM, Britton Smith wrote:
Hi Nick,
Thanks for your report. Your timing data confirms my suspicion about which part of the code isn't scaling. The rejoining of the halo list after the halo finder is run makes heavy use of MPI broadcast calls. Reworking this shouldn't be too difficult, just a question of someone finding the time. If anyone is interested in trying to fix this, I can direct them to the places that need the attention.
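For anyone picking this up, here is a rough sketch of one possible direction. It is not yt's actual code. It assumes the cost comes from issuing many small broadcasts while rejoining the halo list, and replaces them with a single allgather. The names `rejoin_halos` and `SingleRankComm` and the halo dictionaries are hypothetical; the only real API assumed is the lowercase `allgather` method that mpi4py communicators provide.

```python
class SingleRankComm:
    """Hypothetical stand-in communicator so the sketch runs without MPI.

    It mimics the one method used below: allgather returns a list holding
    the object contributed by each rank (here, a single rank).
    """

    def allgather(self, obj):
        return [obj]


def rejoin_halos(comm, local_halos):
    """Combine per-rank halo lists using one collective call.

    Assumption: each rank holds a list of small halo records. Every rank
    contributes its list once and receives the lists of all ranks.
    """
    per_rank = comm.allgather(local_halos)
    # Flatten the list of per-rank lists into one list.
    merged = [halo for rank_list in per_rank for halo in rank_list]
    # Sort by mass so every rank ends up with the same ordering.
    merged.sort(key=lambda halo: halo["mass"], reverse=True)
    return merged


# Made-up halo records, for illustration only.
local = [{"id": 0, "mass": 2.0e12}, {"id": 1, "mass": 5.0e13}]
halos = rejoin_halos(SingleRankComm(), local)
```

Under mpi4py, `MPI.COMM_WORLD` could be passed as `comm` in place of the stand-in. Whether this matches the structure of the real rejoin step would need checking against the yt source.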
Nick, in the meantime, you might try the Rockstar halo finder (either the one built into yt or the standalone version), which scales quite well. The output from both Rockstar versions is loadable with yt.
Britton
On Tue, Mar 13, 2018 at 8:46 AM, Nick Gnedin <ngnedin@gmail.com> wrote:
This is just a notice to the developer.
I have run the HOP halo finder for a large ART simulation (1024^3 particles) on BlueWaters with the following code:
import yt
from yt.analysis_modules.halo_analysis.api import HaloCatalog

yt.enable_parallelism()

path = "/mnt/c/scratch/sciteam/ngnedin/PERM/B40/D"
aexps = [ "0.1280", "0.1203", "0.1115", "0.1002", "0.0907" ]

for aexp in aexps:
    d = yt.load(path+"/rei40_a"+aexp+"/rei40_a"+aexp+".art")
    hc = HaloCatalog(data_ds=d, finder_method='hop',
                     output_dir=path+"/a="+aexp+"/hop",
                     finder_kwargs={"dm_only":False,"ptype":"N-BODY"})
    hc.create()
Because of memory constraints, I have to run it on at least 4 MPI ranks, and I noticed that the yt implementation of HOP does not scale: a 4-rank job takes 14.5 hours and an 8-rank one takes 15.75 hours. Surely, halo finding for a billion particles should scale better than that.
Here is some timing info (I can provide a full log if you care):
4-rank run:
P000 yt : [INFO ] 2018-03-11 17:31:25,531 Parameters: ...
P000 yt : [INFO ] 2018-03-11 18:36:49,781 Initializing HOP [1h]
P002 yt : [INFO ] 2018-03-11 22:14:15,486 Parsing outputs [3.5h]
P000 yt : [INFO ] 2018-03-12 08:06:38,231 Saving halo ... [10h]
8-rank run:
P000 yt : [INFO ] 2018-03-10 21:03:27,226 Parameters: ...
P000 yt : [INFO ] 2018-03-10 21:43:52,543 Initializing HOP [0.75h]
P005 yt : [INFO ] 2018-03-10 23:52:10,389 Parsing outputs [2h]
P000 yt : [INFO ] 2018-03-11 12:43:46,645 Saving halo ... [12.5h *]
* - does not scale at all.
n
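The bracketed durations in the two logs can be recomputed from the timestamps. Below is a minimal sketch using only the Python standard library. The timestamps are copied from the log lines in the message; the stage labels are shorthand for the interval ending at each log message and are not yt terminology.

```python
from datetime import datetime

# Timestamp layout used in the yt log lines, e.g. "2018-03-11 17:31:25,531".
FMT = "%Y-%m-%d %H:%M:%S,%f"

# Shorthand labels (assumed) for the interval that ends at each message.
STAGES = ["until 'Initializing HOP'", "until 'Parsing outputs'", "until 'Saving halo'"]

# Timestamps copied from the 4-rank and 8-rank logs.
RUNS = {
    4: ["2018-03-11 17:31:25,531", "2018-03-11 18:36:49,781",
        "2018-03-11 22:14:15,486", "2018-03-12 08:06:38,231"],
    8: ["2018-03-10 21:03:27,226", "2018-03-10 21:43:52,543",
        "2018-03-10 23:52:10,389", "2018-03-11 12:43:46,645"],
}


def stage_hours(stamps):
    """Return the elapsed hours between consecutive timestamps."""
    times = [datetime.strptime(s, FMT) for s in stamps]
    return [(b - a).total_seconds() / 3600.0 for a, b in zip(times, times[1:])]


hours = {ranks: stage_hours(stamps) for ranks, stamps in RUNS.items()}

for ranks, durations in hours.items():
    print(f"{ranks}-rank run, total {sum(durations):.2f} h")
    for label, h in zip(STAGES, durations):
        print(f"  {label}: {h:.2f} h")
```

Comparing `hours[4]` with `hours[8]` stage by stage shows that the first two intervals shrink with more ranks while the last one grows, which is the step marked with an asterisk.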