Hi,

This is definitely something that we know needs improving. We have plans for a significant overhaul of the field system, and one of the major goals of that overhaul is to reduce the cost of the field detection step when loading a dataset. Currently the field system generates the derived field graph in a somewhat baroque fashion, relying on Python exception handling around chained calls to functions that operate on numpy arrays. This is much less efficient than it would be if we encoded the derived field dependency graph symbolically and relied on the graph itself to generate the derived field list from the set of available on-disk fields.
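To make the contrast concrete, here is a purely illustrative toy sketch (plain Python and numpy, not yt's actual internals) of the two strategies: trial evaluation with exception handling versus walking a declared dependency graph.

    import numpy as np

    # Toy illustration only -- NOT yt's real code.  Exception-driven detection:
    # call each derived-field function on dummy data and treat a KeyError as
    # "a dependency isn't available (yet)".
    def detect_by_trial_evaluation(derived_fields, on_disk_fields):
        available = {name: np.ones(8) for name in on_disk_fields}
        progress = True
        while progress:
            progress = False
            for name, func in derived_fields.items():
                if name in available:
                    continue
                try:
                    available[name] = func(available)  # may raise KeyError
                    progress = True
                except KeyError:
                    pass  # retry on a later pass, or give up
        return sorted(available)

    # The same information encoded symbolically: each derived field simply
    # declares its dependencies, so no arrays ever need to be touched.
    def detect_from_dependency_graph(dependencies, on_disk_fields):
        available = set(on_disk_fields)
        progress = True
        while progress:
            progress = False
            for name, deps in dependencies.items():
                if name not in available and set(deps) <= available:
                    available.add(name)
                    progress = True
        return sorted(available)

    on_disk = ["density", "velocity_x"]
    derived = {
        "momentum_x": lambda d: d["density"] * d["velocity_x"],
        "kinetic":    lambda d: 0.5 * d["momentum_x"] * d["velocity_x"],
    }
    deps = {"momentum_x": ["density", "velocity_x"],
            "kinetic":    ["momentum_x", "velocity_x"]}

    # Both print ['density', 'kinetic', 'momentum_x', 'velocity_x'], but the
    # second never evaluates any field functions.
    print(detect_by_trial_evaluation(derived, on_disk))
    print(detect_from_dependency_graph(deps, on_disk))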

This work is ongoing and unfortunately is not ready to be used yet. As you noted, field detection is not parallelized, so I don't think there's much to be done architecturally to speed up your workflow right now. Hopefully in a year or so we'll release a version of yt with a field detection system so much faster that you won't even notice it isn't parallelized!

That doesn't help you right now, of course. To be honest, I don't normally hear from users with workflows where the major overhead is the field detection step. We definitely notice it when developing yt (we estimate about half the time in the unit tests is spent doing field detection over and over on different test datasets), which is why we're so gung ho about making things faster. If you could share more details about what your derived fields look like, either by sharing your code or, even better, by making a reduced minimal example that demonstrates the slowdown you're hitting, one of us might be able to suggest a way to speed up field detection for your derived fields based on something happening in your script. It might also let us spot some low-hanging fruit for optimization in the field system as it currently exists in yt, if you happen to be hitting an easy-to-fix scaling issue we're not aware of yet.
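For reference, something along the lines of the sketch below would be plenty as a minimal example; the field names, units, and dataset path are placeholders, so please substitute your own definitions:

    import time
    import numpy as np
    import yt

    # Placeholder derived fields that chain off one another, standing in for
    # the real definitions in your module.
    def _corrected_vel_x(field, data):
        return data["gas", "velocity_x"] - data.ds.quan(1.0, "km/s")

    def _corrected_speed(field, data):
        return np.abs(data["gas", "Corrected_vel_x"])

    yt.add_field(("gas", "Corrected_vel_x"), function=_corrected_vel_x,
                 units="cm/s", sampling_type="cell")
    yt.add_field(("gas", "Corrected_speed"), function=_corrected_speed,
                 units="cm/s", sampling_type="cell")

    ds = yt.load("output_00001")  # placeholder dataset path

    t0 = time.time()
    n_fields = len(ds.derived_field_list)  # triggers the field detection step
    print("detected %d fields in %.2f seconds" % (n_fields, time.time() - t0))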

-Nathan



On Tue, Jul 31, 2018 at 5:43 AM, Rajika Kuruwita <rajika.kuruwita@anu.edu.au> wrote:
Over my years of using yt I have created many derived fields that are dependent on other derived fields, and I have various scripts that use them. So I have compiled all the field definitions and the yt.add_field() lines into one script, which is now a module. One problem I have encountered is that the derivation of these fields doesn't seem to have been parallelised, as made evident by the fact that the time for ds.derived_field_list to run is independent of the number of processors available, even with yt.enable_parallelism(). Is this something that is planned to be implemented in the future?
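For illustration, a stripped-down sketch of this kind of setup (the field names and dataset path here are placeholders, not my real definitions) looks roughly like:

    # my_fields.py -- all derived-field definitions and yt.add_field() calls
    # collected in one module (placeholder example)
    import yt

    def _corrected_val_x(field, data):
        return data["gas", "velocity_x"] - data.ds.quan(1.0, "km/s")

    yt.add_field(("gas", "Corrected_val_x"), function=_corrected_val_x,
                 units="cm/s", sampling_type="cell")

and then in each analysis script:

    import yt
    yt.enable_parallelism()
    import my_fields  # registers all the derived fields on import

    ds = yt.load("output_00001")       # placeholder path
    print(len(ds.derived_field_list))  # takes the same time on any number of cores
    dd = ds.all_data()
    print(dd["gas", "Corrected_val_x"])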

This problem is further aggravated by the fact that, after loading a file, attempting to obtain one of the fields (e.g. dd['Corrected_val_x']) seems to actually force the calculation of every possible field added to yt.

Has anyone determined a faster way of loading multiple derived fields?
_______________________________________________
yt-users mailing list -- yt-users@python.org
To unsubscribe send an email to yt-users-leave@python.org