Hi Stephen,

>As you know (since we discussed it off-list), I'm the reason for this being
>mentioned to you.  I had some pretty horrible problems with the various
>incarnations of HOP in yt being excruciatingly slow and consuming huge amounts
>of memory for a 1024^3 unigrid dataset, to the point where my grad student and I
>ended up just using P-GroupFinder, the standalone halo finder that comes with
>week-of-code enzo.  Note that when I say "excruciatingly slow" and "consuming
>huge amounts of memory", I mean that when we used 256 nodes on Ranger, with 2
>cores/node (so 512 cores total) for the 1024^3 dataset, it still ran Ranger out
>of memory, or, alternately, didn't finish in 24 hours.

A few notes in response:

- Recently I ran a 2048^3 dataset on 264 cores; it took about 2 hours and
averaged about 8.5 GB per task, with a peak of 10 GB on the largest task. Your
job is 1/8 the size and should have run, and I don't know why it didn't.
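
As a rough sanity check on the 1/8 figure, here is a small sketch that scales the 2048^3 numbers above down to a 1024^3 run on 512 cores. It assumes total memory scales linearly with particle count, which is an assumption and not something measured; the function name is purely illustrative and not part of yt.

```python
def scale_memory(n_side_ref, tasks_ref, gb_per_task_ref, n_side_new, tasks_new):
    """Estimate memory for a new run from a reference run.

    Assumes total memory scales linearly with particle count
    (an assumption for illustration, not a measurement).
    """
    total_ref_gb = tasks_ref * gb_per_task_ref      # total memory of reference run
    ratio = (n_side_new / n_side_ref) ** 3          # particle-count ratio
    total_new_gb = total_ref_gb * ratio             # scaled total memory
    return ratio, total_new_gb, total_new_gb / tasks_new


# Reference: 2048^3 on 264 tasks at ~8.5 GB/task; target: 1024^3 on 512 tasks.
ratio, total_gb, per_task_gb = scale_memory(2048, 264, 8.5, 1024, 512)
print(f"size ratio:      {ratio}")
print(f"estimated total: {total_gb:.1f} GB")
print(f"per task:        {per_task_gb:.2f} GB")
```

Under that assumption the 1024^3 job would need roughly 280 GB in total, or about half a GB per task on 512 tasks, which is why it should have fit.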

On Ranger, Kraken, or another machine? Regardless, that's far, far less time than it took us to NOT find halos on our dataset. I'd be happy to point you towards this dataset, if you'd like (I may have already done this in an off-list email), so you can try it yourself. I'd be VERY curious to see whether you run into problems similar to ours on Ranger and/or Kraken with our 1024^3 dataset.

- If I wasn't trying to graduate I would have had more time to assist when your
student (Brian) asked me for help. I'm sorry so much of your time was wasted.

It's more human time than computer time, at this point - we spent a big chunk of the summer simply trying to find the halos in a box, which was meant to be step 1 of the project. Very frustrating for a new grad student.

- My tool, as a public tool, is no good unless other people can use it too.
Clearly I need to do some work on that.

- It *does* use much more memory than it needs to, you are right. I know where
the problems are, and whoo-boy they are there, but they are not easy to fix.

- Speed could be better, but some of this has to do with how HOP itself works.
For example, it needs to run the kD tree twice, unlike FOF, which needs to run
it only once. The final group-building step is a "global" operation, so that's
slow as well. On 128^3 particles, (normal) HOP takes about 75 seconds, and FOF
about 25. The C HOP and FOF in yt both use the same kD tree and the same data
I/O methods, so that's a fair ratio of the increased workload.
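
For what it's worth, a back-of-the-envelope scaling of those 128^3 timings up to 1024^3 might look like the sketch below. It assumes the cost grows as N log N (typical for kD-tree work) and that the job parallelizes ideally across cores; both are assumptions for illustration only, and the "global" group-building step in particular will not scale that cleanly. The helper name is hypothetical.

```python
import math


def scaled_runtime(t_ref_s, n_side_ref, n_side_new, cores=1):
    """Scale a reference runtime to a new particle count.

    Assumes O(N log N) cost and ideal parallel speedup; both are
    simplifying assumptions, not measured behaviour of HOP or FOF.
    """
    n_ref = n_side_ref ** 3
    n_new = n_side_new ** 3
    factor = (n_new * math.log(n_new)) / (n_ref * math.log(n_ref))
    return t_ref_s * factor / cores


hop_s, fof_s = 75.0, 25.0                 # 128^3 timings quoted above
print(f"HOP/FOF ratio: {hop_s / fof_s}")  # ratio of the two quoted timings

for name, t_ref in (("HOP", hop_s), ("FOF", fof_s)):
    est = scaled_runtime(t_ref, 128, 1024, cores=512)
    print(f"{name}: ~{est:.0f} s at 1024^3 on 512 cores (idealized)")
```

An estimate like this comes out at minutes rather than hours, so a run that does not finish in 24 hours is far off what simple scaling would suggest.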

This is interesting, and puzzling. We have a 256^3 version of the simulation that I was talking about earlier, and saw numbers comparable to those you mention above. Scaled up to the much larger calculation, however, it took way longer than a back-of-the-envelope estimate would suggest. Again, I really do think that, once you finish your thesis, it'd potentially be very useful for you to take a look at our dataset. It may simply be that our very small box is pathological in some way compared to the simulations you've been testing on.