Hi,
This is definitely something that we know needs improving. We have plans
for a significant overhaul of the field system and one of the major goals
of the overhaul is to reduce the cost of the field detection step when
loading a dataset. Currently the field system generates the derived field
graph in a somewhat baroque fashion, relying on Python exception handling
on chained calls to functions that operate on numpy arrays. This process is
not as efficient as if we somehow encoded the derived field dependency
graph symbolically and relied on the graph itself to generate the derived
field list given a set of available on-disk fields.
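To make the graph idea a bit more concrete, here is a toy sketch in plain Python. It is not how yt works today and not necessarily how the overhauled field system will work. The DEPENDENCIES table, the field names, and the derivable_fields() helper are all made up for illustration. The point is only that once dependencies are declared as data, the derived field list follows from set operations, with no need to call any field function or catch any exception.

```python
# Toy sketch only: not yt's implementation. The field names and the
# DEPENDENCIES table are hypothetical and exist purely for illustration.

# Each derived field maps to the set of fields it needs as input.
DEPENDENCIES = {
    "velocity_magnitude": {"velocity_x", "velocity_y", "velocity_z"},
    "kinetic_energy": {"density", "velocity_magnitude"},
    "sound_speed": {"pressure", "density"},
    "mach_number": {"velocity_magnitude", "sound_speed"},
}


def derivable_fields(on_disk, dependencies):
    """Return the sorted derived fields computable from the on-disk fields."""
    available = set(on_disk)
    changed = True
    # Keep sweeping until a full pass adds nothing new (a fixed point).
    while changed:
        changed = False
        for field, deps in dependencies.items():
            # A field becomes available once all of its inputs are available.
            if field not in available and deps <= available:
                available.add(field)
                changed = True
    return sorted(available - set(on_disk))


on_disk = ["density", "velocity_x", "velocity_y", "velocity_z"]
derived = derivable_fields(on_disk, DEPENDENCIES)
print(derived)
```

In this toy example sound_speed and mach_number are left out because pressure is not on disk; adding "pressure" to on_disk makes both of them derivable.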
This work is ongoing and unfortunately is not ready to be used yet. As you
noted, field detection is not parallelized, so I don't think there's much to
be done architecturally to speed up your workflow right now. Hopefully in a
year or so we'll be releasing a version of yt that has a much faster field
detection system such that you won't notice that it's not parallelized
simply because it's so much quicker!
That doesn't help you right now of course. To be honest I don't normally
hear from users with workflows where the major overhead is the field
detection step. We definitely notice when developing yt (we estimate about
half the time in the unit tests is spent doing field detection over and
over on different test datasets), which is why we're so gung ho on making
things faster. If you could share more details about what your derived
fields look like, either by sharing your code or, even better, by making a
reduced minimal example that demonstrates the slowdown you're hitting, one
of us might be able to suggest a way to speed up field detection based on
something happening in your script. It might also let us spot some
low-hanging fruit for optimization in the field system as it currently
exists in yt, if you happen to be hitting an easy-to-fix scaling issue
we're not aware of yet.
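If it helps when putting together a reduced example, something along these lines would show where the time is going. This is only a sketch using the standard library profiler. The profile_call() helper is a name I made up, and the module name and dataset path in the comments are placeholders for your own.

```python
import cProfile
import io
import pstats


def profile_call(func, top=15):
    """Run func() under cProfile; return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


# Intended use with yt (not run here; the module name and path are
# placeholders for your own field definitions and dataset):
#
#   import yt
#   import my_derived_fields  # hypothetical module with your yt.add_field() calls
#   ds = yt.load("path/to/your/dataset")
#   _, report = profile_call(lambda: ds.derived_field_list)
#   print(report)
```

Sending the top few lines of that report along with the field definitions would make it much easier to tell which fields dominate the detection time.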
-Nathan
On Tue, Jul 31, 2018 at 5:43 AM, Rajika Kuruwita wrote:

Over my years of using yt I have created many derived fields that are
dependent on other derived fields and have various scripts that use them.
So I have compiled all the definitions of fields and the yt.add_field()
lines into one script which is now a module. One problem I have encountered
is that the derivation of these fields does not seem to have been
parallelised, as made evident by the fact that the time for
ds.derived_field_list to run is independent of the number of processors
available, even with yt.enable_parallelism(). Is this something that is
planned to be implemented in the future?

This problem is further aggravated by the fact that loading a file
and attempting to obtain one of the fields (e.g. dd['Corrected_val_x'])
seems to actually force the calculation of every possible field added to
yt. Has anyone determined a faster way of loading multiple derived fields?