Memory and high-level covering grids

Hi all,

I've been running into this problem for a while now, but I've reached a point where I can't avoid it any longer.

I have a 6 level AMR simulation (run with Orion, but that's probably irrelevant to my problem) that has a level 0 size of (256)^3. I would like to extract a covering grid for a small subset at level 4 (for the full volume, that equates to a (4096)^3 grid). The chunk that I want has an equivalent size at level 4 of 2048x2048x1536.

Basic setup: I'm running on a node with 16 cores and 256 GB of memory. I have tried both yt version 3.4 and 3.5 (updated this morning).

I've tried

    level = 4
    ds = yt.load(filename)
    cg = ds.covering_grid(level, ds.domain_left_edge, ds.domain_dimensions * ds.refine_by**level)
    result = cg["density"][1024:3072, 1434:3482, 512:2048]

This results in the error

    File "/home/u14/rsmullen/anaconda3/lib/python3.5/site-packages/yt/data_objects/data_containers.py", line 282, in __getitem__
        self.get_data(f)
    File "/home/u14/rsmullen/anaconda3/lib/python3.5/site-packages/yt/data_objects/construction_data_containers.py", line 641, in get_data
        if len(fill) > 0: self._fill_fields(fill)
    File "/home/u14/rsmullen/anaconda3/lib/python3.5/site-packages/yt/data_objects/construction_data_containers.py", line 690, in _fill_fields
        for field in fields]
    File "/home/u14/rsmullen/anaconda3/lib/python3.5/site-packages/yt/data_objects/construction_data_containers.py", line 690, in <listcomp>
        for field in fields]
    MemoryError

Looking at the memory buildup during execution, it appears that yt is loading densities for the entire volume to then return my chunk to me. I have more than enough memory on the machine (the job exits at ~70GB usage), so I think this is a Python limit on array allocations.

I have also tried creating a box region in the area that I want, but covering grids cannot be run on regions. Thus, I tried saving my region as a dataset, but it reloads with incorrect limits and sizes:

    box = ds.box([xmin, ymin, zmin], [xmax, ymax, zmax])
    box.save_as_dataset(filename='box', fields=(('gas', 'density')))
    ds2 = yt.load('box.h5')
    print(ds2.domain_left_edge, ds2.domain_dimensions, ds2.refine_by)
    cg = ds2.covering_grid(level, ds2.domain_left_edge, ds2.domain_dimensions * ds2.refine_by**level)

This gives

    [ -7.72500000e+18  -7.72500000e+18  -7.72500000e+18] code_length [2 2 2] 2

instead of

    [-1.93125e+18  -1.15875e+18  -2.896875e+18] [128 128 96] 2

(my expectation).

So, to summarize my questions:

1. Is there an easy way for me to directly get my chunk at level 4 resolution with yt as it stands now?
2. Otherwise, is there an easy Python fix to allow the array memory allocation to be larger?
3. If none of the above, how can I save my region with the correct properties so I can reload my chunk as a smaller dataset?

Thanks so much for your help!

Rachel Smullen
Graduate Student
Steward Observatory
University of Arizona

Hi Rachel,

Thanks for writing!

On Thu, Mar 15, 2018 at 2:11 PM, Rachel Smullen <rsmullen@email.arizona.edu> wrote:
Hi all,
I've been running into this problem for a while now, but I've reached a point where I can't avoid it any longer.
I have a 6 level AMR simulation (run with Orion, but that's probably irrelevant to my problem) that has a level 0 size of (256)^3. I would like to extract a covering grid for a small subset at level 4 (for the full volume, that equates to a (4096)^3 grid). The chunk that I want has an equivalent size at level 4 of 2048x2048x1536.
It's not completely irrelevant that you're using Orion -- different codes read data in differently, so peak memory usage will vary depending on the frontend. The minimum RAM for the chunk itself will be 2048*2048*1536 cells * 8 bytes (64-bit floats) per field, which is about 52 GB.
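In case it's useful, here is that back-of-the-envelope estimate written out (just the cell count times the size of a float64):

    import numpy as np

    ncells = 2048 * 2048 * 1536                      # cells in the requested level-4 chunk
    nbytes = ncells * np.dtype("float64").itemsize   # 8 bytes per cell
    print("%.1f GB per field" % (nbytes / 1e9))      # ~51.5 GB for a single field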
Basic setup: I'm running on a node with 16 cores and 256 GB of memory. I have tried both yt version 3.4 and 3.5 (updated this morning).
I've tried
    level = 4
    ds = yt.load(filename)
    cg = ds.covering_grid(level, ds.domain_left_edge, ds.domain_dimensions * ds.refine_by**level)
    result = cg["density"][1024:3072, 1434:3482, 512:2048]
This results in the error
File "/home/u14/rsmullen/anaconda3/lib/python3.5/site-packages/yt/data_objects/data_containers.py", line 282, in __getitem__ self.get_data(f) File "/home/u14/rsmullen/anaconda3/lib/python3.5/site-packages/yt/data_objects/construction_data_containers.py", line 641, in get_data if len(fill) > 0: self._fill_fields(fill) File "/home/u14/rsmullen/anaconda3/lib/python3.5/site-packages/yt/data_objects/construction_data_containers.py", line 690, in _fill_fields for field in fields] File "/home/u14/rsmullen/anaconda3/lib/python3.5/site-packages/yt/data_objects/construction_data_containers.py", line 690, in <listcomp> for field in fields] MemoryError
Looking at the memory buildup during execution, it appears that yt is loading densities for the entire volume to then return my chunk to me. I have more than enough memory on the machine (the job exits at ~70GB usage), so I think this is a Python limit on array allocations.
Yup, that's definitely what it's doing -- it's loading up the entire run at level 4. I think you might be able to reduce the peak memory consumption by changing the covering grid instantiation to be just for the area under consideration:

    dx = ds.domain_width / (ds.domain_dimensions * ds.refine_by**level)
    left_edge = ds.domain_left_edge + dx * [1024, 1434, 512]
    cg = ds.covering_grid(level, left_edge, [2048, 2048, 1536])

This will restrict it to *just* that area.
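Once that works, you can sanity-check that the grid really covers only your subvolume with something like this (untested, and assuming I've remembered the attribute names correctly):

    result = cg["density"]
    print(result.shape)                  # should be (2048, 2048, 1536)
    print(cg.left_edge, cg.right_edge)   # should bracket just your chunk, not the full domain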
I have also tried creating a box region in the area that I want, but covering grids cannot be run on regions. Thus, I tried saving my region as a dataset, but it reloads with incorrect limits and sizes:
    box = ds.box([xmin, ymin, zmin], [xmax, ymax, zmax])
    box.save_as_dataset(filename='box', fields=(('gas', 'density')))
    ds2 = yt.load('box.h5')
    print(ds2.domain_left_edge, ds2.domain_dimensions, ds2.refine_by)
    cg = ds2.covering_grid(level, ds2.domain_left_edge, ds2.domain_dimensions * ds2.refine_by**level)
This gives

    [ -7.72500000e+18  -7.72500000e+18  -7.72500000e+18] code_length [2 2 2] 2

instead of

    [-1.93125e+18  -1.15875e+18  -2.896875e+18] [128 128 96] 2

(my expectation).
So, to summarize my questions:

1. Is there an easy way for me to directly get my chunk at level 4 resolution with yt as it stands now?
I *think* the method above will do it, but there's a caveat: it might still run out of memory. I think it may still generate a couple of fields that deal with coordinates, and (depending on the size of the grid blocks in your output) they might be big. If that's the case, an alternate method you could explore would be to generate it in sweeps of, say, 512. It might look something like this, although I may have messed up some specifics here:

    import h5py

    f = h5py.File("output.h5", "w")
    f.create_dataset("/output", shape=(2048, 2048, 1536), dtype="f8")
    # level-4 cell width; the offsets below are the starting indices of your chunk
    dx = ds.domain_width / (ds.domain_dimensions * ds.refine_by**level)
    for i in range(4):
        for j in range(4):
            for k in range(3):
                print(i, j, k)
                left_edge = ds.domain_left_edge + dx * [1024 + i * 512,
                                                        1434 + j * 512,
                                                        512 + k * 512]
                cg = ds.covering_grid(level, left_edge, [512, 512, 512])
                f["/output"][i * 512 : (i + 1) * 512,
                             j * 512 : (j + 1) * 512,
                             k * 512 : (k + 1) * 512] = cg["density"]
    f.close()

I hope that helps, and let us know if it does/doesn't!
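One extra (untested) sanity check if you go the sweep route: after the loop finishes, read the file back and make sure everything landed where it should:

    import h5py

    with h5py.File("output.h5", "r") as f:
        print(f["/output"].shape)      # should be (2048, 2048, 1536)
        print(f["/output"][0, 0, 0])   # spot-check a corner value against cg["density"]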
2. Otherwise, is there an easy Python fix to allow the array memory allocation to be larger?

3. If none of the above, how can I save my region with the correct properties so I can reload my chunk as a smaller dataset?
Thanks so much for your help!
Rachel Smullen
Graduate Student
Steward Observatory
University of Arizona
participants (2)
- Matthew Turk
- Rachel Smullen