Performance when using derived fields
![](https://secure.gravatar.com/avatar/36a9167966029d2b4f7375e03e42f06b.jpg?s=120&d=mm&r=g)
Hi everyone. Is it normal for slices to take substantially longer to be created if they're made from a derived field rather than an intrinsic one?

Specifically, I'm having an issue creating an axis-aligned slice using the divergence of the velocity field. It's taking around 6.5 minutes just to make the slice, whereas if I use temperature or density, it takes around 10 seconds or so for the same dataset.

I also notice that the amount of time it takes is not dependent on the number of processors I'm using. I've used 1, 12, and 24 processors, with identical results, even though I'm calling enable_parallelism(), and I can see that all the processes are running.

I read in the docs that slice operations aren't generally done in parallel, but in this case it seems that maybe it would be beneficial. A similar operation in VisIt completes much faster, so I'm wondering if I've misconfigured something, or if there is something I can do to speed things up.

I'd appreciate any thoughts anyone has on the subject.

Thanks,

Dan
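For concreteness, a minimal sketch of the kind of script being described (the dataset path is a placeholder; enable_parallelism, SlicePlot, and the velocity_divergence field are standard yt 3.x names):

```python
import yt

# Parallelism is requested here, though as discussed below, axis-aligned
# slices are not decomposed across MPI ranks.
yt.enable_parallelism()

# Placeholder path; substitute an actual simulation output.
ds = yt.load("my_dataset_0100")

# Fast case: an intrinsic field read directly from disk.
slc = yt.SlicePlot(ds, "z", ("gas", "density"))
slc.save("density_slice.png")

# Slow case: a derived field that requires ghost zones.
slc = yt.SlicePlot(ds, "z", ("gas", "velocity_divergence"))
slc.save("velocity_divergence_slice.png")
```

Run under MPI (for example, `mpirun -np 12 python script.py`) to exercise the enable_parallelism() path.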
![](https://secure.gravatar.com/avatar/7857f26c1ef2e9bdbfa843f9087710f7.jpg?s=120&d=mm&r=g)
Velocity divergence is particularly expensive because it needs ghost zones, which must be filled in using cascading interpolation. Depending on the AMR structure in your data, this might be quite expensive. That said, 6.5 minutes for a slice is pretty bad.

Is there any chance you can profile the execution of your script? If you run your script like so:

$ python -m cProfile myscript.py

you should get a summary printed to stdout of where the interpreter is spending most of its time. You can also visualize this in more depth if you dump the profile data to a file and then use pyprof2html or snakeviz to load and display the data, e.g.:

$ python -m cProfile -o out.cprof myscript.py
$ pyprof2html out.cprof

cProfile comes with Python, but pyprof2html and snakeviz must be installed separately using pip.

It would be great if you could file an issue about this. Even better if you can reproduce it using one of the test datasets on yt-project.org/data and attach a script that we can run ourselves.
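As a supplement to the commands above, the dumped profile can also be inspected directly with the standard-library pstats module; a small sketch using the same output file name:

```python
import pstats

# Load the profile written by `python -m cProfile -o out.cprof myscript.py`
# and show the 20 entries with the largest cumulative time.
stats = pstats.Stats("out.cprof")
stats.sort_stats("cumulative").print_stats(20)
```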
![](https://secure.gravatar.com/avatar/36a9167966029d2b4f7375e03e42f06b.jpg?s=120&d=mm&r=g)
Thanks, Nathan. I'll work on profiling it on Friday and get back to you.

Dan
![](https://secure.gravatar.com/avatar/36a9167966029d2b4f7375e03e42f06b.jpg?s=120&d=mm&r=g)
Hi Nathan,

I've attached the HTML output of pyprof2html. I did it for two cases: a large file on a fast disk, and a somewhat smaller file on a slower disk. In both cases I only used one processor. The output is not very meaningful to me, since I'm not familiar with the code and I'm not much of a Python programmer.

I'll work on doing this with one of the yt datasets and let you know how it goes.

Thanks for your help.

Dan
![](https://secure.gravatar.com/avatar/b279d746f54e2cd6062e8279a767c4bc.jpg?s=120&d=mm&r=g)
Hi Dan,

Looking this over, I think the problem is definitely that the ghost zone generation is currently quite stupid in 3.0. We're revving up for a 3.1 release, but I think for 3.2, or even within the 3.1 cycle, we can aim to have this fixed; I've implemented some simple caching that should speed it up, but it has some issues at present that keep it from being fully functional and effective.

One issue that we've (well, I say "we" but I'm the most guilty) had lately is keeping track of things and ensuring that blockers and important new features are addressed within a reasonable time. This has come up repeatedly in the user survey that's currently out. You've been in IRC lately, and I wonder if you could help us keep track of this by reminding me once in a while if it slips off the radar? The bug tracker doesn't seem to be the best place for this, and the emphasis that comes from people mentioning and communicating an issue can really help to prioritize things.

-Matt
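For background on why ghost zones enter into this at all: the divergence is a stencil operation, so evaluating it at the edge of a grid patch needs a layer of cells filled in from neighbouring (possibly coarser) grids. A toy NumPy illustration of that dependence, not yt's actual implementation:

```python
import numpy as np

# One AMR grid patch of x-velocity, padded with a single ghost layer on
# each side. In yt the ghost values are filled by interpolation from
# neighbouring and coarser grids, which is the expensive step above.
nx = ny = nz = 8
vx = np.random.rand(nx + 2, ny + 2, nz + 2)
dx = 1.0 / nx

# Centered difference d(vx)/dx for the real (interior) cells. Cells on
# the patch boundary need their +x / -x neighbours, which only exist
# because of the ghost layer.
dvx_dx = (vx[2:, 1:-1, 1:-1] - vx[:-2, 1:-1, 1:-1]) / (2.0 * dx)
print(dvx_dx.shape)  # (8, 8, 8): one value per real cell
```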
![](https://secure.gravatar.com/avatar/b279d746f54e2cd6062e8279a767c4bc.jpg?s=120&d=mm&r=g)
Hi Dan (and Nathan),

I take that back; I was looking at the wrong file. I/O is a very small portion of the time -- a lot of it is spent in units-related activities. I'm going to dig in to see where the time is spent, but I think there is likely some location where a lot of teeny tiny unit operations are happening, over and over and over, and that's killing the performance. It looks like it's in sanitize_units_add and sanitize_units_mul, which get called in data_containers.py (line 549) during selector creation.

Following that further down, I believe the issue is inside the RegionSelector, which gets generated a ton of times, and which is doing all unitful operations even after converting the values to the units it knows they should be in. I think this might be fixable by using the .d attributes inside the RegionSelector __init__.

Dan, if you get an example of this working with the sample datasets on the website, that would be very useful to work on.

Kacper, what do you think about adding speed tests to the CI system?

-Matt
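To illustrate the kind of change being suggested (a toy sketch, not the actual RegionSelector code): stripping the unit wrapper once via .d turns the repeated bounds checks into plain NumPy operations.

```python
import numpy as np
import yt

left_edge = yt.YTArray(np.zeros(3), "code_length")
right_edge = yt.YTArray(np.ones(3), "code_length")
point = yt.YTArray(np.full(3, 0.5), "code_length")

# Unitful version: each comparison goes through unit sanitization.
# That is cheap once, but adds up when a selector is rebuilt once per
# grid, thousands of times, as in the profile discussed above.
def contains_unitful(p):
    return bool(np.all(p >= left_edge)) and bool(np.all(p <= right_edge))

# Stripped version: .d exposes the underlying ndarray (already in the
# units the selector expects), so the hot path is plain NumPy.
le_d, re_d = left_edge.d, right_edge.d
def contains_raw(p_d):
    return bool(np.all(p_d >= le_d)) and bool(np.all(p_d <= re_d))

print(contains_unitful(point), contains_raw(point.d))
```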
![](https://secure.gravatar.com/avatar/bbf1912c02dbb920e27ef0b9de0ba2ce.jpg?s=120&d=mm&r=g)
It's already there; see for example [1]. There's one "but": we would need a pretty decent box to make that kind of test meaningful. I bet that 99.99% of the fluctuation in [1] is caused by build box load rather than code changes.

Cheers,
Kacper

[1] http://tests.yt-project.org/job/py2.7-yt-3.0_answer.enzo.moving7/lastComplet...
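One way to damp the load-related noise mentioned here is to take the best of several runs on a freshly loaded dataset; a small sketch (the dataset path is a placeholder, and this is not an existing CI job):

```python
import time
import yt

def best_slice_time(dataset_path, field, repeats=3):
    """Best-of-N wall-clock time for an axis-aligned slice of `field`.

    Taking the minimum of several runs reduces, but does not remove,
    sensitivity to background load on a shared build box.
    """
    best = float("inf")
    for _ in range(repeats):
        ds = yt.load(dataset_path)  # reload so each run starts cold
        start = time.time()
        slc = ds.slice("z", ds.domain_center[2])
        slc[field]  # force the field (and any ghost zones) to be generated
        best = min(best, time.time() - start)
    return best

if __name__ == "__main__":
    print(best_slice_time("my_dataset_0100", ("gas", "velocity_divergence")))
```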
![](https://secure.gravatar.com/avatar/7857f26c1ef2e9bdbfa843f9087710f7.jpg?s=120&d=mm&r=g)
I've reproduced similar timings using a FLASH dataset that I think is smaller than the one you're working with. Follow the issue I've created to track this here:

https://bitbucket.org/yt_analysis/yt/issue/944/slices-of-fields-that-need-gh...

Thanks for the report, Dan! Hopefully, now that we have a test script that shows similar timings, we can work on making this faster.
![](https://secure.gravatar.com/avatar/36a9167966029d2b4f7375e03e42f06b.jpg?s=120&d=mm&r=g)
Thanks, everyone, for your help.

Nathan, I'm glad you were able to reproduce the behavior. I'm following the issue you created to stay updated.

Matt, I'll be sure to give periodic reminders if I don't hear anything for a while. This functionality would be extremely helpful to me, but it's not really feasible at the current speeds, so I have a strong incentive to keep on top of it.

Thanks again,

Dan
![](https://secure.gravatar.com/avatar/b279d746f54e2cd6062e8279a767c4bc.jpg?s=120&d=mm&r=g)
Hi Dan,

I've opened a PR with some optimizations:

https://bitbucket.org/yt_analysis/yt/pull-request/1322/first-attempt-at-spee...

-Matt
![](https://secure.gravatar.com/avatar/36a9167966029d2b4f7375e03e42f06b.jpg?s=120&d=mm&r=g)
Thanks, Matt. I'll give that a try and see how much of a speedup I get with my large datasets.

Dan
Participants (4)

- Daniel Fenn
- Kacper Kowalik
- Matthew Turk
- Nathan Goldbaum