yt-3.0 parallel_capable

Hi all, I don't recall when I first noticed this, but I'm finding that the parallel_root_only decorator doesn't seem to be working. A simple test is to run a script that only loads a dataset. The print_key_parameters function is wrapped in the parallel_root_only decorator, so its output should only be printed by the root processor, but when I run that script in parallel it is printed by all processors. I am still looking into this, but so far I have found that printing the value of parallel_capable inside this decorator always gives False, even when running in parallel. Can someone confirm this? If so, does anyone know what's going on? Britton

Hi Britton, What happens when you call yt.enable_parallelism() at the top of the script? There were some adjustments to the way parallelism works related to YTEP-0019. See this PR: Do you have a short test script I can try running over here to confirm the issue? -Nathan On Tue, Apr 22, 2014 at 12:20 PM, Britton Smith <>wrote:

Hi Nathan, Adding that call did not change the result. To be clear, this is what I ran: What is strange is that the logger was clearly set up for parallel, as it correctly prints the different processor numbers. enable_parallelism shows that parallel_capable is True, so somewhere along the line this is getting changed. Britton On Tue, Apr 22, 2014 at 8:23 PM, Nathan Goldbaum <>wrote:

Hi again, I don't know how much this helps, but if I change line 43 of from parallel_capable = False to parallel_capable = True, things work as they are supposed to in parallel. Could there be an import somewhere that is resetting the value of parallel_capable? Britton On Tue, Apr 22, 2014 at 8:28 PM, Britton Smith <>wrote:

Hi Britton, It looks like parallel_capable on line 43 of *wants* to be a global mutable variable that is used elsewhere in yt, but it is not. Perhaps at one point it was imported into yt.funcs? I think you could fix this by explicitly importing parallel_capable from into Hope that helps, Nathan On Tue, Apr 22, 2014 at 12:42 PM, Britton Smith <>wrote:
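A note on the "global mutable variable" point: in Python, `from module import name` copies the current binding, so a later rebinding inside the defining module is not seen through the imported name. The sketch below illustrates this with a throwaway module; `par_demo` and its contents are hypothetical stand-ins, not yt's actual module layout.

```python
import sys
import types

# Hypothetical stand-in for the module that defines the flag (not yt's real module).
par = types.ModuleType("par_demo")
exec(
    "parallel_capable = False\n"
    "def enable_parallelism():\n"
    "    global parallel_capable\n"
    "    parallel_capable = True\n",
    par.__dict__,
)
sys.modules["par_demo"] = par

# Copies the binding as it is right now (False) into this namespace.
from par_demo import parallel_capable
import par_demo

# Rebinds the name inside par_demo only.
par_demo.enable_parallelism()

stale_copy = parallel_capable            # the earlier copy is unchanged
live_value = par_demo.parallel_capable   # attribute lookup sees the new value
print(stale_copy, live_value)            # prints: False True
```

Reading the flag through the module attribute at the point of use is one way to always see the current value.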

I think I've figured out what's going on here, but I don't have much experience with decorators in Python so I'm not 100% sure. My guess is that parallel_root_only(func) is evaluated as soon as the @parallel_root_only decorator is encountered in each module as it's imported. At this point enable_parallelism hasn't been called yet, so parallel_capable is still False (its initially assigned value) and the original function gets returned to all the processors (instead of None to everything except root). This would explain why Britton got the expected parallel behavior when he changed the initial assignment of parallel_capable to True (although this might create problems if it isn't actually being run in parallel). I tried adding a call to enable_parallelism inside parallel_root_only right before the "if parallel_capable" and this makes it behave as expected on both MPI and single-processor runs. Unfortunately it means enable_parallelism gets called multiple times on each processor, which again probably isn't ideal behavior. I think the best way to solve this is to make sure enable_parallelism gets called before other imports are done, but I'm not sure how feasible this is. Someone who knows more about how decorators work can probably make a better suggestion. - Josh On Tue, Apr 22, 2014 at 2:14 PM, Britton Smith <>wrote:
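The import-time behavior Josh describes can be reproduced in a few lines. This is a minimal sketch, not yt's code: `root_only_import_time`, `print_key_parameters_demo`, and the hard-coded `rank` are hypothetical, and the flag is flipped by a toy `enable_parallelism`.

```python
parallel_capable = False  # initial value, as in the thread
rank = 1                  # assumption: pretend this process is NOT the root


def enable_parallelism():
    """Toy stand-in that flips the module-level flag."""
    global parallel_capable
    parallel_capable = True


def root_only_import_time(func):
    # The flag is inspected once, when the decorator is applied.
    if not parallel_capable:
        return func  # the undecorated function is handed back unchanged

    def wrapper(*args, **kwargs):
        if rank == 0:
            return func(*args, **kwargs)
        return None

    return wrapper


@root_only_import_time  # evaluated here, while parallel_capable is still False
def print_key_parameters_demo():
    return "printed"


enable_parallelism()  # too late: the decision above was already made
result = print_key_parameters_demo()
print(result)  # prints: printed (even though rank != 0)
```

Because the decorator body runs when the `def` statement is executed, calling `enable_parallelism` afterwards cannot change which function was bound to the name.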

Yup, I came to a similar conclusion: I agree that being careful with imports is probably the best solution. On Tue, Apr 22, 2014 at 3:04 PM, Josh Moloney <>wrote:

Hi all, Sorry, I was away nearly all day Monday and Tuesday, and I'm just now getting back online. I think Josh is right; the decorators are designed to swap out functions at import time, not at execution time. This worked just fine when we always made sure parallelism was set up at import time and never after. But now that things work slightly differently, so does the decorator stuff. We went with import-time changes because they can reduce branching logic and function call overhead, which can be important in tight loops. These are the functions that are affected: __check_directory, _export_obj, _export_ply, parallel_root_only, print_key_parameters, print_stats, _save_light_cone_solution, _save_light_cone_stack, save_profiles, _upload_to_sketchfab, _write_filtered_halo_list, write_fits_file, write_fits_image, write_fits, write_h5_file, _write_halo_map, _write_halo_mask, write_hdf5, _write_light_ray, _write_light_ray_solution, write_out_arrays, write_out_correlation, write_png, _write_seed_file, write_simput_file, write_spectrum. I do not believe any of these should be significantly adversely affected by the inclusion of a conditional that is evaluated at execution time. I'll issue a PR that does this. -Matt On Tue, Apr 22, 2014 at 5:06 PM, Nathan Goldbaum <> wrote:
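For illustration, a decorator with a conditional evaluated at execution time might look roughly like the sketch below. It is an assumption about the general shape of such a fix, not the contents of the actual PR; `root_only_runtime`, `print_stats_demo`, and the hard-coded `rank` are hypothetical.

```python
import functools

parallel_capable = False  # initial value, as in the thread
rank = 1                  # assumption: pretend this process is NOT the root


def enable_parallelism():
    """Toy stand-in that flips the module-level flag."""
    global parallel_capable
    parallel_capable = True


def root_only_runtime(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # The flag is read on every call, so later changes are honored.
        if parallel_capable and rank != 0:
            return None
        return func(*args, **kwargs)

    return wrapper


@root_only_runtime
def print_stats_demo():
    return "printed"


before = print_stats_demo()  # serial mode: the function runs
enable_parallelism()
after = print_stats_demo()   # parallel mode on a non-root rank: skipped
print(before, after)         # prints: printed None
```

The cost is one extra branch and one extra function call per invocation, which is the overhead the import-time design was trying to avoid in tight loops.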

Hi all, Sorry for the quick followup, but it actually looks like the situation is more complex, as we have a few decorators, not just @parallel_root_only. These decorators will also be affected: parallel_simple_proxy => only called by parallel_blocking_call => called almost exclusively in the halo and two point functions, but not in time-sensitive loops I think, so I will add runtime checking. parallel_splitter => not used, and I will remove this. parallel_passthrough already has a runtime check. -Matt On Wed, Apr 23, 2014 at 7:13 AM, Matthew Turk <> wrote:

participants (4)
Britton Smith
Josh Moloney
Matthew Turk
Nathan Goldbaum