Handling large files
Is it unreasonable for me to expect yt on a workstation with 4x3GHz cores and ~20 GB of RAM to be able to handle a ~100 GB dataset? I'm trying to select out a subset of the dataset using cut_region(), but I still seem to run into hanging or running out of RAM. For example, when I try to do a write_out() on the cut region, yt sucks up all 20 GB of RAM and I have to kill it. Is there a preferred method for loading in a subset of the data? I don't need anywhere near the full 100 GB.
I'm using yt dev and looking at BoxLib/Maestro data.
--
Adam Jacobs
Department of Physics and Astronomy, PhD Candidate
Stony Brook University
http://astro.sunysb.edu/amjacobs/
Hi Adam,
I don't work with datasets that large, but I know others have in the past.
It's hard to say exactly what's going wrong here without seeing how you
construct the cut_region. Would you mind sharing your script?
One explanation for the behavior you're seeing comes from the fact that
cut_regions are defined in terms of other fields, and those fields will
need to be loaded off disk before the data you are interested in can be
selected. If you're not already doing so, defining the cut_region in terms
of a geometric object that covers a small fraction of the domain will
probably be more memory efficient.
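A minimal sketch of the geometric approach Nathan describes, wrapped in a function so it can be read without yt installed. The dataset path, the sphere radius, and the field name and threshold ("temperature" > 1e8) are placeholders, not anything from Adam's actual script:

```python
# Sketch: restrict the cut_region to a small geometric subvolume so yt
# only reads the grids intersecting that subvolume, rather than loading
# the filter field for the entire domain.  Needs yt and a real dataset
# path to actually run.
def select_hot_subvolume(dataset_path):
    import yt  # imported inside the function so the sketch loads without yt

    ds = yt.load(dataset_path)
    # A sphere covering a small fraction of the domain; only grids that
    # intersect it are read off disk.
    sp = ds.sphere(ds.domain_center, (0.05, "code_length"))
    # The cut_region now filters data already limited to the sphere.
    return sp.cut_region(['obj["temperature"] > 1e8'])
```

The key point is the order of operations: the geometric object bounds the I/O first, and the field-based cut is applied only to what survives.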
Failing that, if you have a cluster available, you should be able to split
the memory load over multiple compute nodes by running yt in parallel.
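A sketch of the parallel route, again with placeholder path and field names; a script like this would be launched under MPI (e.g. `mpirun -np 8 python script.py`) and requires mpi4py:

```python
# Sketch: split the memory load across MPI ranks using yt's built-in
# parallelism.  Not run here; needs yt, mpi4py, and a dataset.
def parallel_extrema(dataset_path):
    import yt

    # Must be called before loading any data so yt decomposes work
    # across the MPI ranks.
    yt.enable_parallelism()
    ds = yt.load(dataset_path)
    ad = ds.all_data()
    # Derived-quantity operations like this are distributed over ranks,
    # so no single node has to hold the full dataset.
    return ad.quantities.extrema("density")
```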
Hope that helps. Please let us know if you have further questions.
-Nathan
On Wed, Nov 26, 2014 at 4:49 PM, Adam Jacobs wrote:
_______________________________________________
yt-users mailing list
yt-users@lists.spacepope.org
http://lists.spacepope.org/listinfo.cgi/yt-users-spacepope.org
Thanks Nathan. That helps! I didn't realize, though in hindsight it seems obvious, that yt would still need to load all of the field data off disk to select based on that field. I'll play around with geometry cuts, and if that doesn't work I'll send an example of the relevant sections of the script.
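The geometry-cut route Adam plans to try could also feed directly into the write_out() step from his original question. A sketch, assuming yt's YTDataContainer.write_out; the box edges, field name, threshold, and output filename are all placeholders:

```python
# Sketch: write out only the cells selected by a geometric cut, keeping
# memory bounded by the size of the box rather than the full dataset.
# Needs yt and a real dataset path to actually run.
def write_subset(dataset_path, outname="subset.txt"):
    import yt

    ds = yt.load(dataset_path)
    # A small box in code units instead of the whole domain.
    box = ds.box([0.4, 0.4, 0.4], [0.6, 0.6, 0.6])
    # Field-based cut applied only to data inside the box.
    dense = box.cut_region(['obj["density"] > 1e-2'])
    # Dump the selected cells' field values to a text file.
    dense.write_out(outname, fields=["density"])
```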
On Wed, Nov 26, 2014 at 8:33 PM, Nathan Goldbaum wrote:
participants (2)
- Adam Jacobs
- Nathan Goldbaum