Brian and Matt,
FYI, I have had similar issues with a 1024^3 (unigrid) dataset. I was eventually able to get parallel HOP working, but it is a memory hog and not particularly fast. It required about the same number of cores, and the same method of running on Ranger, that Brian describes.
On Aug 19, 2010, at 3:39 PM, Brian O'Shea wrote:
As you know (since we discussed it off-list), I'm the reason this was mentioned to you. I had some pretty horrible problems with the various incarnations of HOP in yt being excruciatingly slow and consuming huge amounts of memory on a 1024^3 unigrid dataset, to the point where my grad student and I ended up just using P-GroupFinder, the standalone halo finder that comes with week-of-code enzo. Note that when I say "excruciatingly slow" and "consuming huge amounts of memory", I mean that when we used 256 nodes on Ranger with 2 cores/node (so 512 cores total) for the 1024^3 dataset, it still ran Ranger out of memory or, alternatively, didn't finish in 24 hours. Various permutations of cores per node, total nodes, and wall clock time all resulted in either seg faults or the code running out the wall clock, to the tune of us wasting half a million CPU hours trying to do halo-finding via yt for this dataset. That's not cool. P-GroupFinder, in comparison, generated the halo catalog for the same dataset in about 10 minutes on 256 processors. The difference in performance is striking, to say the least.
We also had serious problems with the projections taking significantly more time and memory than one would expect based on my old standalone tools, but this is already being dealt with. Slices seemed to work just fine, and other things like PDFs seem to work fine as well.
One reason that I mentioned this to Mike Norman (presumably he is the person who mentioned the yt thing to you) is that when we were at the TeraGrid conference a couple of weeks ago, the subject of inline data analysis came up in relation to our planned Blue Waters unigrid and AMR runs. I expressed reservations about whether the current version of yt would be an effective solution at the scales we need (a 4096^3 unigrid run, and roughly 1024^3 refine-everywhere AMR runs), based on my recent experiences with the code. While I am on the yt-dev mailing list, you know that I'm not actively developing yt (and would maybe be considered a novice user, at best), so I could simply be 100% wrong in my concerns. Maybe we could run some performance tests? I have a 1024^3 unigrid dataset that seems to be yt's White Whale...