Hi Adam,

I don't work with datasets that large, but I know others have in the past.

It's hard to say exactly what's going wrong here without seeing how you construct the cut_region.  Would you mind sharing your script?

One likely explanation for the behavior you're seeing is that cut_regions are defined in terms of other fields, so those fields have to be loaded off disk before the data you are interested in can be selected. If the cut_region is built on the full domain, that means reading the cut fields for the whole dataset. If you're not already doing so, defining the cut_region in terms of a geometric object that covers only a small fraction of the domain will probably be more memory efficient.
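
To make the geometric-object suggestion concrete, here is a rough sketch rather than a tested recipe for your data. It assumes the yt 3 style API (ds.region() and the cut_region() method on data objects); the function names, the plotfile name "plt00000", the box fractions, and the density threshold are all placeholders I made up for illustration.

```python
def central_box_cut(ds, conditionals, lo=0.4, hi=0.6):
    """Build a cut_region on a sub-box of the domain instead of the full domain.

    ds           -- a dataset as returned by yt.load()
    conditionals -- list of cut strings, e.g. ["obj['density'] > 1.0e6"]
    lo, hi       -- box extent as fractions of the domain width (placeholders)
    """
    left_edge = ds.domain_left_edge + lo * ds.domain_width
    right_edge = ds.domain_left_edge + hi * ds.domain_width
    center = 0.5 * (left_edge + right_edge)
    # ds.region() takes the box center first, then the two corners.
    box = ds.region(center, left_edge, right_edge)
    # Cutting the box, not the whole domain, limits what has to be read.
    return box.cut_region(conditionals)


def main(path="plt00000"):  # hypothetical plotfile name
    import yt  # imported here so the helper above has no import-time dependency

    ds = yt.load(path)
    # Hypothetical threshold; pick a field and value that suit your run.
    cut = central_box_cut(ds, ["obj['density'] > 1.0e6"])
    # Accessing a field is what triggers the actual read from disk.
    print(cut["density"].size)
```

Calling main() with your own plotfile path runs the whole thing. The idea is only that the fields used in the cut get read inside the box, so the smaller the box, the less has to sit in memory.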

Failing that, if you have a cluster available, you should be able to split the memory load over multiple compute nodes by running yt in parallel.
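
For the parallel route, a minimal sketch might look like the following. It assumes mpi4py is installed and uses yt.enable_parallelism() together with a derived quantity, which yt can distribute across MPI tasks; the function names, the plotfile name "plt00000", and the choice of the density field are placeholders for illustration.

```python
def field_extrema(ds, field="density"):
    """Return (min, max) of a field over the whole domain."""
    # Derived quantities are computed chunk by chunk; in a parallel run
    # the chunks are spread over the MPI tasks.
    return ds.all_data().quantities.extrema(field)


def main(path="plt00000"):  # hypothetical plotfile name
    import yt

    yt.enable_parallelism()  # needs mpi4py; call it before doing any work
    ds = yt.load(path)
    lo, hi = field_extrema(ds)
    if yt.is_root():  # print from one task only
        print(lo, hi)
```

You would put a call to main() at the bottom of a script and launch it with something like "mpirun -np 16 python my_script.py". How much this reduces the memory needed per node will depend on the operation you run.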

Hope that helps. Please let us know if you have further questions.


On Wed, Nov 26, 2014 at 4:49 PM, Adam Jacobs <adam.jacobs@stonybrook.edu> wrote:
Is it unreasonable for me to expect yt on a workstation with 4x3GHz cores and ~20 GB of RAM to be able to handle a ~100 GB dataset?  I'm trying to select out a subset of the dataset using cut_region(), but still seem to run into hanging or running out of RAM.  For example, when I try to do a write_out() on the cut region yt sucks up all 20 GB of RAM and I have to kill it.  Is there a preferred method for loading in a subset of data?  I don't need near the full 100 GB of data.

I'm using yt dev and looking at BoxLib/Maestro data.

Adam Jacobs
Department of Physics and Astronomy, PhD Candidate
Stony Brook University


yt-users mailing list