Handling large files
Is it unreasonable for me to expect yt on a workstation with 4x3GHz cores and ~20 GB of RAM to be able to handle a ~100 GB dataset? I'm trying to select out a subset of the dataset using cut_region(), but I still seem to run into hanging or running out of RAM. For example, when I try to do a write_out() on the cut region, yt sucks up all 20 GB of RAM and I have to kill it. Is there a preferred method for loading in a subset of the data? I don't need anywhere near the full 100 GB.
I'm using yt dev and looking at BoxLib/Maestro data.
--
Adam Jacobs
Department of Physics and Astronomy, PhD Candidate
Stony Brook University
http://astro.sunysb.edu/amjacobs/
Hi Adam,
I don't work with datasets that large, but I know others have in the past.
It's hard to say exactly what's going wrong here without seeing how you
construct the cut_region. Would you mind sharing your script?
One explanation for the behavior you're seeing comes from the fact that
cut_regions are defined in terms of other fields, and those fields will
need to be loaded off disk before the data you are interested in can be
selected. If you're not already doing so, defining the cut_region in terms
of a geometric object that covers a small fraction of the domain will
probably be more memory efficient.
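A minimal sketch of the geometric approach Nathan describes, wrapped in a function so it can be read without yt installed. The dataset path, the sphere radius, and the field name and threshold ("temperature" > 1e8) are placeholders, not anything from Adam's actual script:

```python
# Sketch: restrict the cut_region to a small geometric subvolume so yt
# only reads the grids intersecting that subvolume, rather than loading
# the filter field for the entire domain.  Needs yt and a real dataset
# path to actually run.
def select_hot_subvolume(dataset_path):
    import yt  # imported inside the function so the sketch loads without yt

    ds = yt.load(dataset_path)
    # A sphere covering a small fraction of the domain; only grids that
    # intersect it are read off disk.
    sp = ds.sphere(ds.domain_center, (0.05, "code_length"))
    # The cut_region now filters data already limited to the sphere.
    return sp.cut_region(['obj["temperature"] > 1e8'])
```

The key point is the order of operations: the geometric object bounds the I/O first, and the field-based cut is applied only to what survives.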
Failing that, if you have a cluster available, you should be able to split
the memory load over multiple compute nodes by running yt in parallel.
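A sketch of the parallel route, again with placeholder path and field names; a script like this would be launched under MPI (e.g. `mpirun -np 8 python script.py`) and requires mpi4py:

```python
# Sketch: split the memory load across MPI ranks using yt's built-in
# parallelism.  Not run here; needs yt, mpi4py, and a dataset.
def parallel_extrema(dataset_path):
    import yt

    # Must be called before loading any data so yt decomposes work
    # across the MPI ranks.
    yt.enable_parallelism()
    ds = yt.load(dataset_path)
    ad = ds.all_data()
    # Derived-quantity operations like this are distributed over ranks,
    # so no single node has to hold the full dataset.
    return ad.quantities.extrema("density")
```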
Hope that helps. Please let us know if you have further questions.
-Nathan
On Wed, Nov 26, 2014 at 4:49 PM, Adam Jacobs wrote:
_______________________________________________
yt-users mailing list
yt-users@lists.spacepope.org
http://lists.spacepope.org/listinfo.cgi/yt-users-spacepope.org
Thanks Nathan. That helps! I didn't realize, though in hindsight it seems obvious, that yt would still need to load all of the field data off disk to select based on that field. I'll play around with geometry cuts, and if that doesn't work I'll send an example of the relevant sections of the script.
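The geometry-cut route Adam plans to try could also feed directly into the write_out() step from his original question. A sketch, assuming yt's YTDataContainer.write_out; the box edges, field name, threshold, and output filename are all placeholders:

```python
# Sketch: write out only the cells selected by a geometric cut, keeping
# memory bounded by the size of the box rather than the full dataset.
# Needs yt and a real dataset path to actually run.
def write_subset(dataset_path, outname="subset.txt"):
    import yt

    ds = yt.load(dataset_path)
    # A small box in code units instead of the whole domain.
    box = ds.box([0.4, 0.4, 0.4], [0.6, 0.6, 0.6])
    # Field-based cut applied only to data inside the box.
    dense = box.cut_region(['obj["density"] > 1e-2'])
    # Dump the selected cells' field values to a text file.
    dense.write_out(outname, fields=["density"])
```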
On Wed, Nov 26, 2014 at 8:33 PM, Nathan Goldbaum wrote:
participants (2)
- Adam Jacobs
- Nathan Goldbaum