Hi all,

I've been playing around with some TimeSeries examples for the cookbook, and I have noticed a couple of things that seem frankly annoying. I just wanted to ping the list and see if there was a reason for them.

1) If I run more than one task in a script, it has to load all of the pfs again. Is this by design?

2) What gets returned from a parameter lookup or a quantities computation has one dimension more than desired. For example, getting the simulation time of each pf in the TimeSeries yields:

    times = ts.params.current_time

where times is a list of lists, each sublist with one member, with

    na.array(times).shape == (number of pfs, 1)

The same sort of thing happens when returning, say, the angular momentum vector, which will have a shape of (number of pfs, 1, 3).

It seems to me that what should be returned is a NumPy array with the dimensionality reduced by one, i.e. the lists "in the middle" should be eliminated.

Best,
John Z
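To make the shape issue concrete, here is a small NumPy-only sketch. It does not call yt: the nested lists are hand-built stand-ins for what `ts.params.current_time` and an angular momentum quantity are reported to return, the values are made up, and `np` is used where the message uses yt's `na` alias.

```python
import numpy as np

# Hand-built stand-ins (not real yt output): one single-member sublist per pf.
times = [[0.0], [0.5], [1.0]]
L_vecs = [[np.array([1.0, 0.0, 0.0])],
          [np.array([0.0, 1.0, 0.0])],
          [np.array([0.0, 0.0, 1.0])]]

print(np.array(times).shape)    # (3, 1)
print(np.array(L_vecs).shape)   # (3, 1, 3)

# Dropping the "middle" list gives the shapes one would expect.
flat_times = np.array([t[0] for t in times])
flat_L = np.array([v[0] for v in L_vecs])
print(flat_times.shape)         # (3,)
print(flat_L.shape)             # (3, 3)
```

Until the return values change, a list comprehension like the one above is probably the least surprising workaround on the caller's side.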
Hi John,

The first thing is by design. The reason is that when you keep all the pfs open, you keep all their hierarchies around, which ends up taking a lot of memory. I was getting out-of-memory errors all the time before we made this change.

On the second thing, I think that behavior is probably not right, because each item in the list ends up being a list itself. Even in the case where the parameter is multidimensional, what you get back is a list of lists, where each of those lists contains the array. Unless there's a reason I'm not seeing, that should probably be changed to just a list of the actual items.

Britton

(In reply to John ZuHone <jzuhone@gmail.com>, Tue, Jul 10, 2012 at 8:23 PM)
_______________________________________________ yt-dev mailing list yt-dev@lists.spacepope.org http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org
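Given the memory constraint Britton describes, one way to avoid reloading every pf for each task is to apply all tasks to a pf while it is open and then release it. The sketch below only illustrates that idea under stated assumptions: `run_tasks_single_pass`, `fake_load`, and the dict-based "pfs" are hypothetical names invented here, not part of yt.

```python
def run_tasks_single_pass(filenames, load, tasks):
    """Apply every task to each dataset while it is open, then let it go."""
    results = {name: [] for name in tasks}
    for fn in filenames:
        pf = load(fn)                 # each file is loaded exactly once
        for name, task in tasks.items():
            results[name].append(task(pf))
        del pf                        # drop the reference before the next load
    return results


# Toy stand-ins so the sketch runs without yt (hypothetical, for illustration).
load_calls = []

def fake_load(fn):
    load_calls.append(fn)
    return {"filename": fn, "current_time": 0.1 * len(load_calls)}

tasks = {
    "time": lambda pf: pf["current_time"],
    "name": lambda pf: pf["filename"],
}
results = run_tasks_single_pass(["DD0000", "DD0001"], fake_load, tasks)
print(results["name"])   # ['DD0000', 'DD0001']
print(len(load_calls))   # 2
```

Only one dataset is referenced at a time, so memory use stays bounded while each file is still read once.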
Hi Britton,

> The first thing is by design. The reason is that when you keep all the pfs open, you keep all their hierarchies around which ends up taking a lot of memory. I was getting out of memory errors all the time before we made this change.

That's what I assumed the reason was, but I wanted to check.

> On the second thing, I think that behavior is probably not right, because each item in the list ends up being a list itself. Even in the case where the parameter is multidimensional, what you get back is a list of list, where each of those list contains the array. Unless there's a reason I'm not seeing, that should probably be changed to just a list of the actual items.

I've dived into the code on this a little bit, and it looks like it has something to do with the way piter is arranging things, since what gets appended onto store.result is an item that is not a list, but what comes out at the end is a list of lists.

Best,
John
Hi John,

The place to look is the eval function for TimeSeriesData. On line 125 of yt/data_objects/time_series.py, store.result is initialized to a list, and all return values are appended to that list on line 141. This looks to me to be handling tasks called with piter that have multiple return values. If you add something like:

    if len(store.result) == 1:
        store.result = store.result[0]

just outside of the "for task in tasks" loop, it produces the behavior that we discussed.

However, now that I understand why this was done, I'm not so sure it should be changed. Having tasks dispatched with piter send back their return values in a list is a desired feature, I think, so that generality should be preserved. Perhaps there is a solution that would only affect getting parameters through params, but I think we should let Matt chime in on this, since he is the most knowledgeable about this area of the code.

Britton

(In reply to John ZuHone <jzuhone@gmail.com>, Wed, Jul 11, 2012 at 11:00 AM)
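The trade-off Britton raises can be seen in a toy version of the loop he describes. This is a sketch only, not the code in time_series.py: `eval_sketch` and its `unwrap_single` flag are hypothetical, and plain floats and lambdas stand in for pfs and tasks.

```python
def eval_sketch(items, tasks, unwrap_single=False):
    """Toy version of the loop described above; not the code in time_series.py."""
    all_results = []
    for item in items:
        result = []                      # one list per item, as described
        for task in tasks:
            result.append(task(item))    # one entry per task
        if unwrap_single and len(result) == 1:
            result = result[0]           # the proposed unwrapping step
        all_results.append(result)
    return all_results


items = [1.0, 2.0, 3.0]
double = lambda x: 2 * x
square = lambda x: x * x

print(eval_sketch(items, [double]))                      # [[2.0], [4.0], [6.0]]
print(eval_sketch(items, [double], unwrap_single=True))  # [2.0, 4.0, 6.0]
print(eval_sketch(items, [double, square], unwrap_single=True))
# [[2.0, 1.0], [4.0, 4.0], [6.0, 9.0]]
```

With the unwrapping step, the shape of the return value depends on how many tasks were passed, which is the loss of generality the message is concerned about.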
Hi Britton,

> The place to look is the eval function for TimeSeriesData. On line 125 of yt/data_objects/time_series.py, store.result is initialized to a list and all return values are appended to that list on line 141. This looks to me to be handling tasks called with piter that have multiple return values. If you add something like:
>
>     if len(store.result) == 1:
>         store.result = store.result[0]
>
> just outside of the "for task in tasks" loop, it produces the behavior that we discussed. However, now that I understand why this was done, I'm not so sure it should be changed. Having tasks dispatched with piter send back their return values in a list is a desired feature I think, and so I think that generality should be preserved. Perhaps there is a solution that could only affect getting parameters through params, but I think we should let Matt chime in on this, since he is the most knowledgeable about this area of the code.

However, this also seems to affect things not in params in an undesirable way. For example:

    from yt.mods import *
    all_files = glob.glob("*/*.hierarchy")
    all_files.sort()
    ts = TimeSeries.from_filenames(all_files)
    sphere = ts.sphere("max", (1.0, "pc"))
    L_vecs = sphere.quantities["AngularMomentumVector"]()

L_vecs gets returned as something like:

    [[array([l1,l2,l3])], [array([l4,l5,l6])], ... [array([ln,lm,lo])]]

Where once again you have lists of one object, namely in this case the NumPy arrays which are the angular momentum vector. So, generically speaking you are always getting an extra list in there you don't need, it seems.

John
Hi all,

Sorry for being a bit out of touch the last couple days.

So this is in place for exactly the reasons Britton outlines -- it's because in theory, the time series data is meant to have a flexible set of return values. For instance, while John notes that the *default* behavior is to cycle through the list multiple times, you can also call "eval" on TimeSeries to make it operate much more in sequence on a series of operators.

In practice, I think that this method is not ... as necessarily useful as the more generic .piter() method. Initially I thought, well, let's make it so that you can swap out a pf for a ts and still expect to get similar or identical return values. In retrospect, this is overly clever. Reading the source, I can't imagine anyone dealing with an actual AnalysisTask, or anything like that, when the much more convenient .piter() is available, with the storage keyword.

The reason that the list is of additional dimensionality is to ensure that during the par_combine_object, the lists are concatenated correctly and mapped back in the right order to the original items.

So, I guess what I've talked myself into proposing is that we strip off a lot of the overly clever stuff, and reduce TimeSeries back to being just a convenient way of addressing multiple objects, ditching the AnalysisTask stuff and retaining .piter(). Once we have that, we can also start adding on more interesting things, like inter-timestep correlations and whatnot.

Thoughts?

-Matt

(In reply to John ZuHone <jzuhone@gmail.com>, Wed, Jul 11, 2012 at 8:48 AM)
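Matt's remark about concatenation during par_combine_object can be illustrated with plain NumPy. The sketch below is an assumption about why wrapping helps, not a description of yt's internals: it only shows that `+` on bare arrays adds them elementwise, while `+` on lists that wrap the arrays keeps each result as a separate entry.

```python
import numpy as np

# Pretend two processes each handled one pf and produced a 3-vector.
rank0 = np.array([1.0, 2.0, 3.0])
rank1 = np.array([4.0, 5.0, 6.0])

# Bare arrays: "+" adds elementwise, so the two results collapse into one.
summed = rank0 + rank1
print(summed.shape)              # (3,)

# Wrapped in lists: "+" concatenates, so each result stays a separate entry.
combined = [rank0] + [rank1]
print(len(combined))             # 2
print(np.array(combined).shape)  # (2, 3)
```

If the combine step works along these lines, the extra list is what keeps per-pf results distinct, and any unwrapping would have to happen after the results are merged.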
Hi everyone,

I think Matt is right on this. The piter function is powerful and simple. It gives you full control over parallel functionality and also just hands users each pf, which makes writing new analysis much easier. It may be counterproductive to continue to maintain the AnalysisTask functionality.

Britton

(In reply to Matthew Turk <matthewturk@gmail.com>, Wed, Jul 11, 2012 at 8:14 PM)
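The piter-with-storage pattern that Matt and Britton favor can be sketched without yt. Everything below is a serial, simplified stand-in written for illustration: `Store` and `piter_like` are hypothetical names, the "pfs" are plain dicts, and the real .piter() handles parallel dispatch that this toy does not attempt.

```python
class Store:
    """Minimal holder for one iteration's result (hypothetical)."""
    def __init__(self):
        self.result = None


def piter_like(items, storage):
    """Serial stand-in for a piter-style iterator: yields (store, item) pairs
    and files each store's result into `storage` under the item's index."""
    for i, item in enumerate(items):
        store = Store()
        yield store, item
        storage[i] = store.result     # runs when the loop asks for the next item


# Toy "pfs": plain dicts instead of real datasets.
pfs = [{"current_time": 0.0}, {"current_time": 0.5}, {"current_time": 1.0}]

my_storage = {}
for sto, pf in piter_like(pfs, my_storage):
    sto.result = pf["current_time"]   # any per-pf analysis could go here

times = [my_storage[i] for i in sorted(my_storage)]
print(times)   # [0.0, 0.5, 1.0]
```

Because the caller decides what goes into `sto.result`, the returned structure has exactly the nesting the caller chose, which sidesteps the extra-list surprise discussed earlier in the thread.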
participants (3)
- Britton Smith
- John ZuHone
- Matthew Turk