I believe a similar issue exists in the AREPO frontend (so if it affects both, perhaps the frontend itself is not the problem?). Sorry I don't have more details; someone in my research group was complaining of a memory leak when loading AREPO datasets one after another.

From: Clément Robert via yt-users <yt-users@python.org>
Sent: Tuesday, October 11, 2022 9:51 AM
To: Discussion of the yt analysis package <yt-users@python.org>
Cc: Anne Noer Kolborg <akolborg@ucsc.edu>; Nicholas 💍🐻❤️ <nicholas@swiatecki.com>; Clément Robert <clement.robert@protonmail.com>
Subject: [EXTERNAL] [yt-users] Re: Potential memory leak in Ramses frontend
As a quick pointer, I’d like to promote Memray, which is still relatively new and is an excellent tool to trace and visualise memory usage in Python applications in detail.


On 11 Oct 2022, at 05:48, Anne Noer Kolborg via yt-users <yt-users@python.org> wrote:

Dear yt community, 

I have encountered an out-of-memory problem (potential memory leak) when using yt to load and process several Ramses outputs in a row. Each simulation output is approximately 3 GB, but the RAM usage easily exceeds 16 GB in ~30 mins of single-threaded computation time. When allocated more RAM, the computation continues for longer but ultimately always runs out of memory before the end of the data processing.

I have created a minimal working example that triggers this problem. It is uploaded here: http://paste.yt-project.org/show/404/ 
Log of the memory usage over time while running that script is here: http://paste.yt-project.org/show/406/ 

As you can see, the total memory usage simply grows with each round of the loop. I suspect that the memory allocated to the dataset does not get cleared.
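To check this kind of per-round growth from inside Python, something like the following could be used. It is only a minimal sketch using the standard-library tracemalloc module: `traced_growth`, `leaky_step` and `clean_step` are hypothetical names invented for illustration, and the two step functions are stand-ins for "load one output and process it", not yt code. Note that tracemalloc only sees allocations made through Python's allocators, so memory allocated directly by compiled extensions may not show up here.

```python
import gc
import tracemalloc


def traced_growth(step, n_rounds):
    """Call step(i) n_rounds times; return traced memory (bytes) after each round."""
    tracemalloc.start()
    try:
        usage = []
        for i in range(n_rounds):
            step(i)
            gc.collect()  # drop unreachable reference cycles before measuring
            current, _peak = tracemalloc.get_traced_memory()
            usage.append(current)
        return usage
    finally:
        tracemalloc.stop()


# Hypothetical stand-ins for one round of the loop (load a dataset, process it).
_kept = []


def leaky_step(i):
    _kept.append(bytearray(1_000_000))  # deliberately retained, mimicking a leak


def clean_step(i):
    data = bytearray(1_000_000)  # released when the function returns
    return len(data)


leaky = traced_growth(leaky_step, 5)
clean = traced_growth(clean_step, 5)
print("leaky growth (bytes):", leaky[-1] - leaky[0])
print("clean growth (bytes):", clean[-1] - clean[0])
```

If the traced numbers stay flat while the process's resident memory still grows, that would hint that the growth is happening outside Python's allocators.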

Simulation output to test the code is uploaded here: http://use.yt/upload/1ace6072 

The del and clear_data() statements in lines 18 and 19 do not affect the memory usage when uncommented.
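A possible reason why del has no visible effect is that something else still holds a reference to the object, since del only removes one name. A minimal sketch of how that could be tested with the standard-library weakref and gc modules follows. `FakeDataset`, `is_freed_after_del` and `cache` are hypothetical names for illustration, not part of yt, and the sketch assumes the real object supports weak references (ordinary Python classes do by default).

```python
import gc
import weakref


class FakeDataset:
    """Hypothetical stand-in for a loaded dataset; not a yt class."""

    def __init__(self):
        self.payload = bytearray(1_000_000)


def is_freed_after_del(make_obj, keep_alive=None):
    """Create an object, del the local name, and report whether it was freed."""
    obj = make_obj()
    ref = weakref.ref(obj)  # a weak reference does not keep obj alive
    if keep_alive is not None:
        keep_alive.append(obj)  # simulate a hidden cache holding a reference
    del obj
    gc.collect()  # also collect reference cycles
    return ref() is None


cache = []
freed_plain = is_freed_after_del(FakeDataset)
freed_cached = is_freed_after_del(FakeDataset, keep_alive=cache)

# When an object survives, gc.get_referrers lists the containers holding it.
holders = gc.get_referrers(cache[0])
print(freed_plain, freed_cached, [type(h).__name__ for h in holders])
```

Applied to a real dataset object, the list of referrers could point at whatever is keeping it alive between rounds of the loop.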

I tested the same bit of code using the “HiresIsolatedGalaxy” Enzo dataset from the sample datasets (https://yt-project.org/data/). It is similar in size to the outputs I am trying to process and runs without issues. Therefore, I suspect it might be the Ramses frontend causing the memory leak.

Any advice or pointers on how to debug this further, alternative ways to handle this or suggestions for how to use yt more efficiently are most welcome! 

Best regards, 

Version lists: 
yt version 4.1.dev, changeset = 39949fb6bfba (cloned from GitHub about a week ago). The stable version 4.0.1 produces the same issue.
Python version 3.8.12 
Gcc 4.8.5 
CentOS Linux release 7.9.2009 // Linux 3.10.0 (x86-64 architecture)