Hi Matt,
I just pushed a change to my fork that changes the way the data_size property of our ARTIOChunk (same as old DomainSubset) is exposed. This catches cases when .data_size is read before a fill is called. You reordered calls somewhere in the main code that was causing this to fail for SlicePlot, but now I am running into it for ProjectionPlot.
from yt.mods import *
pf = load("tests/artdat/sizmbhloz-clref04SNth-rs9_a0.9011.art") ProjectionPlot(pf, "x", "Density", width = (100,'kpc') ).save()
leads to yt : [ERROR ] 2013-03-14 10:39:05,002 ARTIOChunk.data_size called before fill
I've tracked it back to this line in YTSelectionContainer._chunked_read old_size, self.size = self.size, chunk.data_size
Here is the traceback: File "./testprojection.py", line 4, in <module> ProjectionPlot(pf, "x", "Density", width = (100,'kpc') ).save() File "/Users/drudd/Programs/yt-x86_64/src/yt-hg/yt/visualization/plot_window.py", line 1336, in __init__ center=center, field_parameters = field_parameters) File "/Users/drudd/Programs/yt-x86_64/src/yt-hg/yt/data_objects/construction_data_containers.py", line 243, in __init__ self.get_data(field) File "/Users/drudd/Programs/yt-x86_64/src/yt-hg/yt/data_objects/construction_data_containers.py", line 283, in get_data for chunk in self.data_source.chunks(None, "io"): File "/Users/drudd/Programs/yt-x86_64/src/yt-hg/yt/data_objects/data_containers.py", line 448, in chunks with self._chunked_read(chunk): File "/Users/drudd/Programs/yt-x86_64/lib/python2.7/contextlib.py", line 17, in __enter__ return self.gen.next() File "/Users/drudd/Programs/yt-x86_64/src/yt-hg/yt/data_objects/data_containers.py", line 525, in _chunked_read import traceback; traceback.print_stack()
It looks like the problem is the attempt to save state in _chunked_read while _chunk iterates to call .get_data. Since size is just a wrapper around the chunk.data_size, would it be better to have something like this:
_size = None @property def size(self) : if self._size is None : if not self._current_chunk is None : self._size = self._current_chunk.data_size return self._size
and def _chunked_read(self, chunk): # There are several items that need to be swapped out # field_data, size, shape old_field_data, self.field_data = self.field_data, YTFieldData() old_chunk, self._current_chunk = self._current_chunk, chunk self._size = None old_locked, self._locked = self._locked, False yield self.field_data = old_field_data self._current_chunk = old_chunk self._size = None self._locked = old_locked
That may just push the problem further down the call stack, but at least means this code won't ask for .get_size until needed.
Douglas Rudd Scientific Computing Consultant Research Computing Center drudd@uchicago.edu
Hi Doug,
On Thu, Mar 14, 2013 at 11:55 AM, Douglas Harvey Rudd drudd@uchicago.edu wrote:
Hi Matt,
I just pushed a change to my fork that changes the way the data_size property of our ARTIOChunk (same as old DomainSubset) is exposed. This catches cases when .data_size is read before a fill is called. You reordered calls somewhere in the main code that was causing this to fail for SlicePlot, but now I am running into it for ProjectionPlot.
Yup, you're right. Sorry about that. I want to take the opportunity also to thank you for your work on this -- as I've said before, opaque data object support is going to be a huge win for us, and I appreciate you sorting through all the myopic, "Matt didn't think this would be a problem" sections. :)
from yt.mods import *
pf = load("tests/artdat/sizmbhloz-clref04SNth-rs9_a0.9011.art") ProjectionPlot(pf, "x", "Density", width = (100,'kpc') ).save()
leads to yt : [ERROR ] 2013-03-14 10:39:05,002 ARTIOChunk.data_size called before fill
I've tracked it back to this line in YTSelectionContainer._chunked_read old_size, self.size = self.size, chunk.data_size
Here is the traceback: File "./testprojection.py", line 4, in <module> ProjectionPlot(pf, "x", "Density", width = (100,'kpc') ).save() File "/Users/drudd/Programs/yt-x86_64/src/yt-hg/yt/visualization/plot_window.py", line 1336, in __init__ center=center, field_parameters = field_parameters) File "/Users/drudd/Programs/yt-x86_64/src/yt-hg/yt/data_objects/construction_data_containers.py", line 243, in __init__ self.get_data(field) File "/Users/drudd/Programs/yt-x86_64/src/yt-hg/yt/data_objects/construction_data_containers.py", line 283, in get_data for chunk in self.data_source.chunks(None, "io"): File "/Users/drudd/Programs/yt-x86_64/src/yt-hg/yt/data_objects/data_containers.py", line 448, in chunks with self._chunked_read(chunk): File "/Users/drudd/Programs/yt-x86_64/lib/python2.7/contextlib.py", line 17, in __enter__ return self.gen.next() File "/Users/drudd/Programs/yt-x86_64/src/yt-hg/yt/data_objects/data_containers.py", line 525, in _chunked_read import traceback; traceback.print_stack()
It looks like the problem is the attempt to save state in _chunked_read while _chunk iterates to call .get_data. Since size is just a wrapper around the chunk.data_size, would it be better to have something like this:
_size = None @property def size(self) : if self._size is None : if not self._current_chunk is None : self._size = self._current_chunk.data_size return self._size
Yes, something like this, but is there going to be a case where _current_chunk is None? Or will this always be called after _identify_base_chunk? Incidentally, I am now wondering if we should just get rid of reliance on .size in the first place. It's not clear to me that there is a case when we absolutely need it, except *inside* the IO handlers, and those should all be able to guess or not rely on the size anyway. Right?
and def _chunked_read(self, chunk): # There are several items that need to be swapped out # field_data, size, shape old_field_data, self.field_data = self.field_data, YTFieldData() old_chunk, self._current_chunk = self._current_chunk, chunk self._size = None old_locked, self._locked = self._locked, False yield self.field_data = old_field_data self._current_chunk = old_chunk self._size = None self._locked = old_locked
That may just push the problem further down the call stack, but at least means this code won't ask for .get_size until needed.
I actually don't see the diff here, what changed in this function?
Douglas Rudd Scientific Computing Consultant Research Computing Center drudd@uchicago.edu
yt-dev mailing list yt-dev@lists.spacepope.org http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org
Yup, you're right. Sorry about that. I want to take the opportunity also to thank you for your work on this -- as I've said before, opaque data object support is going to be a huge win for us, and I appreciate you sorting through all the myopic, "Matt didn't think this would be a problem" sections. :)
Of course, changing code in this way will always uncover implicit assumptions, and these are very difficult to pick up in unit tests (since most of those tests will reflect the same assumptions).
Yes, something like this, but is there going to be a case where _current_chunk is None? Or will this always be called after _identify_base_chunk? Incidentally, I am now wondering if we should just get rid of reliance on .size in the first place. It's not clear to me that there is a case when we absolutely need it, except *inside* the IO handlers, and those should all be able to guess or not rely on the size anyway. Right?
I don't know, but I also didn't know what to do in that situation, so I let it fall back to returning an undefined quantity (_size is None). We can identify a base chunk, but then it will simply fail due to the size being queried before the read.
def _chunked_read(self, chunk): # There are several items that need to be swapped out # field_data, size, shape old_field_data, self.field_data = self.field_data, YTFieldData() old_chunk, self._current_chunk = self._current_chunk, chunk self._size = None old_locked, self._locked = self._locked, False yield self.field_data = old_field_data self._current_chunk = old_chunk self._size = None self._locked = old_locked
That may just push the problem further down the call stack, but at least means this code won't ask for .get_size until needed.
I actually don't see the diff here, what changed in this function?
I removed the code that copied the old size and restored it after the yield. Instead, the _size is always invalidated any time _current_chunk is changed, and then will be repopulated as needed. I'm assuming here that size is trivially computed from the chunks.
Doug
Hi Doug,
On Thu, Mar 14, 2013 at 2:16 PM, Douglas Harvey Rudd drudd@uchicago.edu wrote:
Yup, you're right. Sorry about that. I want to take the opportunity also to thank you for your work on this -- as I've said before, opaque data object support is going to be a huge win for us, and I appreciate you sorting through all the myopic, "Matt didn't think this would be a problem" sections. :)
Of course, changing code in this way will always uncover implicit assumptions, and these are very difficult to pick up in unit tests (since most of those tests will reflect the same assumptions).
Yes, something like this, but is there going to be a case where _current_chunk is None? Or will this always be called after _identify_base_chunk? Incidentally, I am now wondering if we should just get rid of reliance on .size in the first place. It's not clear to me that there is a case when we absolutely need it, except *inside* the IO handlers, and those should all be able to guess or not rely on the size anyway. Right?
I don't know, but I also didn't know what to do in that situation, so I let it fall back to returning an undefined quantity (_size is None). We can identify a base chunk, but then it will simply fail due to the size being queried before the read.
That sounds good to me.
def _chunked_read(self, chunk): # There are several items that need to be swapped out # field_data, size, shape old_field_data, self.field_data = self.field_data, YTFieldData() old_chunk, self._current_chunk = self._current_chunk, chunk self._size = None old_locked, self._locked = self._locked, False yield self.field_data = old_field_data self._current_chunk = old_chunk self._size = None self._locked = old_locked
That may just push the problem further down the call stack, but at least means this code won't ask for .get_size until needed.
I actually don't see the diff here, what changed in this function?
I removed the code that copied the old size and restored it after the yield. Instead, the _size is always invalidated any time _current_chunk is changed, and then will be repopulated as needed. I'm assuming here that size is trivially computed from the chunks.
Ah, great. I agree with this.
Doug _______________________________________________ yt-dev mailing list yt-dev@lists.spacepope.org http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org