Running Rockstar Halo Finder to create a merger tree

Hi all,
I'm quite confused on a number of points related to running the rockstar halo finder, so I hope it's alright that I put all these questions into this one email!
1. I can't seem to run the rockstar halo finder at all without getting this error followed by a segmentation fault and crash.
[Warning] Network IO Failure (PID XXXXXX): Connection reset by peer
[Network] Packet receive retry count at: 1
It sort of seems like this issue (http://lists.spacepope.org/htdig.cgi/yt-dev-spacepope.org/2012-November/002681.html), but I couldn't really figure out what the resolution was from the thread. I'm attempting to run this on Kraken, and it doesn't matter if I use a single compute node or multiple; I get the same error. (I hope this isn't the InfiniBand issue the docs warned about. I couldn't figure out if that is how Kraken is connected, and I got an error that the suggested flag doesn't exist, so I didn't press the issue.)
2. Whenever I finally do get the halo finder to work, I need the results to be in a form that the merger tree can use. It seems as though the MergerTree needs the results in the same form as the other halo finders give, so would getting the halo list and then dumping it as usual be the appropriate strategy? I.e.:
rh.run()
halo_list = rh.halo_list()
halo_list.dump('MergerHalos')
2.5. The docs sort of give mixed messages on whether or not I could just be calling MergerTree with the argument halo_finder_function = RockstarHaloFinder. At this point I've pretty thoroughly convinced myself that I can't, but it would be nice if that was clarified. (Just a thoroughly overwhelmed new user's perspective!)
3. I'm a little confused as to whether or not I have to use a TimeSeriesData object rather than the usual single time output when instantiating the halo finder. Under "Rockstar Halo Finding" it uses TimeSeriesData, unlike the rest of the examples, but under the subheading "Output Analysis" it just uses pf. The "Output Analysis" example also doesn't call the run() method, which leads me to believe something else entirely is going on, but it's not quite clear.
Thanks! -Hilary

Hi Hilary,
On 10/29/2013 12:30 PM, Hilary Egan wrote:
I'm quite confused on a number of points related to running the rockstar halo finder, so I hope it's alright that I put all these questions into this one email!
It's no problem at all to include all of your questions in a single email. It's probably better this way!
- I can't seem to run the rockstar halo finder at all without getting this error followed by a segmentation fault and crash.
[Warning] Network IO Failure (PID XXXXXX): Connection reset by peer
[Network] Packet receive retry count at: 1
It sort of seems like this issue (http://lists.spacepope.org/htdig.cgi/yt-dev-spacepope.org/2012-November/002681.html), but I couldn't really figure out what the resolution was from the thread. I'm attempting to run this on Kraken, and it doesn't matter if I use a single compute node or multiple; I get the same error. (I hope this isn't the InfiniBand issue the docs warned about. I couldn't figure out if that is how Kraken is connected, and I got an error that the suggested flag doesn't exist, so I didn't press the issue.)
I haven't seen that error before, but I still have to specify *not* to run on InfiniBand when running Rockstar on a single node. With OpenMPI, you would use "mpirun -n 32 --mca btl ^openib ...". I haven't done this on Kraken with their aprun, but hopefully it's easily accomplished!
- Whenever I finally do get the halo finder to work, I need the results to be in a form that the merger tree can use. It seems as though the MergerTree needs the results in the same form as the other halo finders give, so would getting the halo list and then dumping it as usual be the appropriate strategy? I.e.:
rh.run()
halo_list = rh.halo_list()
halo_list.dump('MergerHalos')
2.5. The docs sort of give mixed messages on whether or not I could just be calling MergerTree with the argument halo_finder_function = RockstarHaloFinder. At this point I've pretty thoroughly convinced myself that I can't, but it would be nice if that was clarified. (Just a thoroughly overwhelmed new user's perspective!)
I'm not sure whether you can use yt's merger tree code with the Rockstar halos. I haven't tried.
However, I've used Consistent Trees (https://code.google.com/p/consistent-trees/), which is also written by Peter Behroozi, with Rockstar's halo lists. I've chosen this route because the algorithm seems to be more physically robust in constructing parent/child relationships and boundness. All of the instructions are in the README of the code, and it's pretty straightforward and fast to run (probably 5-10 minutes for a 512^3 simulation with 60 outputs).
I also have a visualization script for the Consistent Trees output.
https://bitbucket.org/jwise77/rockstar-dot
Starting from the Consistent Trees output, you can use the provided script, halo_trees_to_catalog.pl (instructions also in the README), to convert the tree output into halo lists.
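Once halo lists exist as text files, a few lines of Python are enough to load them for a quick sanity check. The sketch below assumes a '#'-commented header whose first line names the columns, followed by whitespace-separated numeric rows; that layout, the column names (id, mvir, x, y, z), and the sample values are all assumptions made for illustration, so check the README for the real column order. `read_halo_table` is a hypothetical helper, not part of Rockstar, Consistent Trees, or yt.

```python
import io

def read_halo_table(stream):
    """Parse a '#'-commented, whitespace-delimited table into a list of dicts."""
    columns = None
    rows = []
    for line in stream:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Treat the first comment line as the column header; skip the rest.
            if columns is None:
                columns = line.lstrip("#").split()
            continue
        if columns is None:
            raise ValueError("data row found before any '#' header line")
        values = [float(v) for v in line.split()]
        rows.append(dict(zip(columns, values)))
    return rows

# Synthetic sample: made-up column names and values, not real Rockstar output.
sample = io.StringIO(
    "#id mvir x y z\n"
    "#made-up values for illustration only\n"
    "0 1.5e12 0.10 0.20 0.30\n"
    "1 3.0e11 0.55 0.40 0.90\n"
)
halos = read_halo_table(sample)
most_massive = max(halos, key=lambda h: h["mvir"])
print(int(most_massive["id"]), most_massive["mvir"])
```

For a real file, pass an open file handle instead of the `io.StringIO` sample.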
- I'm a little confused as to whether or not I have to use a TimeSeriesData object rather than the usual single time output when instantiating the halo finder. Under "Rockstar Halo Finding" it uses TimeSeriesData, unlike the rest of the examples, but under the subheading "Output Analysis" it just uses pf. The "Output Analysis" example also doesn't call the run() method, which leads me to believe something else entirely is going on, but it's not quite clear.
This actually came up recently. It's best to supply a TimeSeriesData object. Here's the link to the email for more details.
http://lists.spacepope.org/pipermail/yt-users-spacepope.org/2013-August/003845.html
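A time series is an ordered collection of outputs, so it can help to check that the list of dataset paths is in output order before handing it to yt. The sketch below is plain Python and does not call yt; `sort_outputs` is a hypothetical helper, and the DD0002/DD0002 style names are assumed only as an example of a typical output layout.

```python
import re

def sort_outputs(paths):
    """Order dataset paths by the last integer that appears in each path."""
    def output_number(path):
        numbers = re.findall(r"\d+", path)
        if not numbers:
            raise ValueError(f"no output number in {path!r}")
        return int(numbers[-1])
    return sorted(paths, key=output_number)

# Hypothetical output names, listed out of order on purpose.
paths = ["DD0010/DD0010", "DD0002/DD0002", "DD0100/DD0100"]
print(sort_outputs(paths))
```

Sorting on the integer rather than on the string also handles names that are not zero-padded.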
Cheers, John

Hi all,
I'm still having trouble getting Rockstar to run on my dataset. I've moved my data to a different supercomputer that uses the mpirun command rather than Kraken's aprun so that I could make sure there aren't any InfiniBand issues, but I'm still seeing similar errors. I've also determined that I can run the halo finder just fine on a smaller test dataset, which leads me to believe that it's some sort of memory issue, but I can't quite figure out how I would go about fixing it. I've tried playing with the number of readers and the number of nodes I'm running on, to no avail. For reference, the dataset is a 1024^3 unigrid Enzo run. If anyone has any suggestions, I'd love to hear them!
Thanks! Hilary
Script: http://paste.yt-project.org/show/4025/
Error message: http://paste.yt-project.org/show/4024/
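A rough estimate of the particle memory gives a feel for whether a 1024^3 run could plausibly run out of memory. The sketch below assumes one particle per cell (1024^3 particles) and 32 bytes per particle (an 8-byte ID plus six 4-byte floats for position and velocity). Both numbers are assumptions rather than measurements from this dataset, and the estimate ignores any extra copies, grid data, or working space. `particle_memory_gib` is a hypothetical helper.

```python
def particle_memory_gib(n_per_side, bytes_per_particle=32, n_tasks=1):
    """Estimate particle memory in GiB, optionally split evenly over n_tasks."""
    n_particles = n_per_side ** 3
    total_bytes = n_particles * bytes_per_particle
    return total_bytes / 2**30 / n_tasks

total = particle_memory_gib(1024)
per_task = particle_memory_gib(1024, n_tasks=48)
print(f"total: {total:.1f} GiB, per task over 48 tasks: {per_task:.2f} GiB")
```

An even split across tasks is an idealization, so the per-task figure is a lower bound and not a prediction of actual usage.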
On Wed, Oct 30, 2013 at 8:50 AM, John Wise jwise@physics.gatech.edu wrote:
--
John Wise
Assistant Professor of Physics
Center for Relativistic Astrophysics, Georgia Tech
http://cosmo.gatech.edu
_______________________________________________
yt-users mailing list
yt-users@lists.spacepope.org
http://lists.spacepope.org/listinfo.cgi/yt-users-spacepope.org

Are you tied to using rockstar? I know yt works well with HOP, PHOP, and FOF without many problems...
On Mon, Nov 4, 2013 at 3:50 PM, Hilary Egan hilaryye@gmail.com wrote:

Cameron -- My experience has been that HOP/FOF are slow on datasets this large, and PHOP uses a large amount of memory. Anyway, regardless of that, we should figure out what is going on with Rockstar, given its use in the community.
Hilary -- Just to be clear, you are sending the infiniband-disabling flags to this last run, correct?
On Mon, Nov 4, 2013 at 4:36 PM, Cameron Hummels chummels@gmail.com wrote:
--
Cameron Hummels
Postdoctoral Researcher
Steward Observatory
University of Arizona
http://chummels.org

Sam: Yep, I am definitely sending the InfiniBand-disabling flags and running it as follows:
mpirun -np 48 --mca btl ^openib python halos.py --parallel
Cameron: I had originally switched to Rockstar from HOP because I couldn't get HOP to run on Kraken due to (very clear) out-of-memory issues. I suppose that since I've switched supercomputers anyway, I might as well play around with HOP again.
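The mpirun line quoted in this message can also be assembled in a small script, which makes it easy to toggle the InfiniBand-disabling flags between test runs. This is only a sketch: `build_mpirun_command` is a hypothetical helper, not part of yt or Open MPI, and it just reproduces the arguments from the command above.

```python
import shlex

def build_mpirun_command(n_tasks, script, disable_infiniband=True):
    """Return the argument list for launching a script under mpirun."""
    cmd = ["mpirun", "-np", str(n_tasks)]
    if disable_infiniband:
        # "--mca btl ^openib" excludes the openib transport, as in the command above.
        cmd += ["--mca", "btl", "^openib"]
    cmd += ["python", script, "--parallel"]
    return cmd

cmd = build_mpirun_command(48, "halos.py")
# shlex.quote protects characters such as "^" if the line is pasted into a shell.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Passing the list directly to `subprocess.run(cmd)` avoids shell quoting altogether.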
On Mon, Nov 4, 2013 at 5:57 PM, Sam Skillman samskillman@gmail.com wrote:

I suggest using parallel HOP since it is more scalable than HOP on large datasets. With HOP you can get into trouble where the padding takes up too much overhead.
From G.S.
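The padding overhead can be illustrated with a little arithmetic. The sketch below assumes a unit box split into equal cubic subdomains, each padded on every side by a fixed length in box units; the padding value of 0.02 is an illustrative assumption, and the way yt actually decomposes and pads the domain may differ. `padded_volume_ratio` is a hypothetical helper.

```python
def padded_volume_ratio(n_per_side, padding):
    """Ratio of padded to unpadded volume for one cubic subdomain of a unit box."""
    side = 1.0 / n_per_side
    return ((side + 2.0 * padding) / side) ** 3

for n in (1, 2, 4, 8):
    # n**3 subdomains in total; the ratio grows as the subdomains shrink.
    print(n**3, "subdomains:", round(padded_volume_ratio(n, 0.02), 2))
```

Because the padding length stays fixed while the subdomains shrink, the fraction of duplicated volume grows with the number of subdomains.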
On Mon, Nov 4, 2013 at 5:37 PM, Hilary Egan hilaryye@gmail.com wrote:
Sam: Yep, I am definitely sending the infiniband-disabling flags and running it as follows
mpirun -np 48 --mca btl ^openib python halos.py --parallel
Cameron: I had originally switched to rockstar from HOP because I couldn't get it to run on kraken due to (very clear) out of memory issues. I suppose that since I've switched supercomputers anyway, I might as well play around with HOP again.
On Mon, Nov 4, 2013 at 5:57 PM, Sam Skillman samskillman@gmail.comwrote:
Cameron -- My experience has been that HOP/FOF are slow on datasets this large, and PHOP uses a large amount of memory. Anyways, regardless of that we should figure out what is going on with rockstar given its use in the community.
Hilary -- Just to be clear, you are sending the infiniband-disabling flags to this last run, correct?
On Mon, Nov 4, 2013 at 4:36 PM, Cameron Hummels chummels@gmail.comwrote:
Are you tied to using rockstar? I know yt works well with HOP, PHOP, and FOF without many problems...
On Mon, Nov 4, 2013 at 3:50 PM, Hilary Egan hilaryye@gmail.com wrote:
Hi all,
I'm still having trouble getting rockstar to run on my dataset. I've moved my data to a different super computer that uses the mpirun command rather than kraken's aprun so that I could make sure there aren't any infiniband issues, but Im still seeing similar issues. I've also determined I can run the halo finder just fine on a smaller test dataset, which is leading me to believe that its some sort of memory issue, but I can't quite figure out how I would go about fixing it. I've tried playing with the number of readers and the number of nodes I'm running on, to no avail. For reference, the dataset is a 1024^3 unigrid enzo run. If anyone has any suggestions, I'd love to hear them!
Thanks! Hilary
Script: http://paste.yt-project.org/show/4025/
Error message: http://paste.yt-project.org/show/4024/
On Wed, Oct 30, 2013 at 8:50 AM, John Wise jwise@physics.gatech.eduwrote:
Hi Hilary,
On 10/29/2013 12:30 PM, Hilary Egan wrote:
I'm quite confused on a number of points related to running the
rockstar halo finder, so I hope its alright that I put all these questions into this one email!
It's no problem at all to include all of your questions in a single email. It's probably better this way!
- I can't seem to run the rockstar halo finder at all without getting
this error followed by a segmentation fault and crash.
[Warning] Network IO Failure (PID XXXXXX): Connection reset by peer [Network] Packet receive retry count at: 1
It sort of seems like this issue (http://lists.spacepope.org/htdig.cgi/yt-dev-spacepope. org/2012-November/002681.html) but I couldn't really figure out what the resolution was from the thread. Im attempting to run this on kraken and it doesn't matter if I use a single compute node or multiple, I get the same error. (I hope this isn't the infiniband issue the docs warned about, I couldn't figure out if that is how kraken is connected and I got an error that the suggested flag doesn't exist so I didn't press the issue.)
I haven't seen that error before, but I still have to specific to *not* run on infiniband when running Rockstar on a single node. With OpenMPI, you would use "mpirun -n 32 --mca btl ^openib ...", but I haven't done this on kraken with their aprun but hopefully it's easily accompolished!
- Whenever I finally do get the halo finder to work, I need the
results to be in a form that the merger tree can use. It seems as though the MergerTree needs the results in the same form as the other halo finders give, so would getting the halo list and then dumping it as usual be the appropriate strategy? Ie:
rh.run() halo_list = rh.halo_list()
halo_list.dump('MergerHalos')
2.5. The docs sort of give mixed messages on whether or not I could just be calling MergerTree with the argument halo_finder_function = RockstarHaloFinder. At this point I've pretty thoroughly convinced myself that I can't, but it would be nice if that was clarified. (Just a thoroughly overwhelmed new user's perspective!)
I'm not sure whether you can use yt's merger tree code with the Rockstar halos. I haven't tried.
However, I've used Consistent Trees
https://code.google.com/p/consistent-trees/
with Rockstar's halo lists, which is also written by Peter Behroozi. I've chosen this route because the algorithm seems to be more physically robust in constructing parent/child relationships and boundness. All of the instructions are in the README of the code, and it's pretty straightforward and fast (probably 5-10 minutes for a 512^3 simulation with 60 outputs) to run.
I also have a visualization script for Consistent Trees' output.
https://bitbucket.org/jwise77/rockstar-dot
From Consistent Trees' output, you can use the provided script, halo_trees_to_catalog.pl (instructions also in the README), to convert the tree output into halo lists.
- I'm a little confused as to whether or not I have to use a TimeSeriesData object rather than the usual single time output when instantiating the halo finder. Under "Rockstar Halo Finding" it uses TimeSeriesData, unlike the rest of the examples, but under the subheading "Output Analysis" it just uses pf. The "Output Analysis" example also doesn't call the run() method, which leads me to believe something else entirely is going on, but it's not quite clear.
This actually came up recently. It's best to supply a TimeSeriesData object. Here's the link to the email for more details.
http://lists.spacepope.org/pipermail/yt-users-spacepope.org/2013-August/003845.html
Cheers, John
-- John Wise Assistant Professor of Physics Center for Relativistic Astrophysics, Georgia Tech http://cosmo.gatech.edu _______________________________________________ yt-users mailing list yt-users@lists.spacepope.org http://lists.spacepope.org/listinfo.cgi/yt-users-spacepope.org
-- Cameron Hummels Postdoctoral Researcher Steward Observatory University of Arizona http://chummels.org

Hi Hilary,
I think Rockstar is your best bet. Unfortunately there have been some changes in the yt/Rockstar interface that may be causing issues.
We've improved these considerably in the yt-3.0 branch, and in fact I'd recommend you give that a shot. If you are able, try using yt_analysis/yt-3.0 and the Rockstar branch that can be found here: http://bitbucket.org/MatthewTurk/rockstar . This includes a number of forward-facing changes that we're hoping to roll out as we revamp halos in the 3.x series.
-Matt
On Mon, Nov 4, 2013 at 8:37 PM, Hilary Egan hilaryye@gmail.com wrote:
Sam: Yep, I am definitely sending the infiniband-disabling flags and running it as follows:
mpirun -np 48 --mca btl ^openib python halos.py --parallel
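A minimal Python sketch of how the launch line above could be assembled programmatically, so the infiniband-disabling flags are not dropped between runs. The helper name build_mpirun_command and its parameters are hypothetical; only the flags and the script name come from the command quoted in this thread.

```python
import shlex


def build_mpirun_command(n_procs, script, disable_infiniband=True):
    """Return the argument list for launching a parallel yt script with OpenMPI.

    Hypothetical helper: it only reproduces the flags quoted in this thread.
    """
    cmd = ["mpirun", "-np", str(n_procs)]
    if disable_infiniband:
        # "^openib" asks OpenMPI to exclude the openib transport.
        cmd += ["--mca", "btl", "^openib"]
    cmd += ["python", script, "--parallel"]
    return cmd


cmd = build_mpirun_command(48, "halos.py")
# shlex.quote adds quotes around any argument containing characters
# outside its safe set, so the printed line can be pasted into a shell.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Passing the list form to subprocess.run avoids shell quoting questions entirely; the printed string is only for copying into a job script.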
Cameron: I had originally switched to rockstar from HOP because I couldn't get HOP to run on kraken due to (very clear) out-of-memory issues. I suppose that since I've switched supercomputers anyway, I might as well play around with HOP again.
On Mon, Nov 4, 2013 at 5:57 PM, Sam Skillman samskillman@gmail.com wrote:
Cameron -- My experience has been that HOP/FOF are slow on datasets this large, and PHOP uses a large amount of memory. Anyways, regardless of that we should figure out what is going on with rockstar given its use in the community.
Hilary -- Just to be clear, you are sending the infiniband-disabling flags to this last run, correct?
On Mon, Nov 4, 2013 at 4:36 PM, Cameron Hummels chummels@gmail.com wrote:
Are you tied to using rockstar? I know yt works well with HOP, PHOP, and FOF without many problems...
On Mon, Nov 4, 2013 at 3:50 PM, Hilary Egan hilaryye@gmail.com wrote:
Hi all,
I'm still having trouble getting rockstar to run on my dataset. I've moved my data to a different supercomputer that uses the mpirun command rather than kraken's aprun so that I could make sure there aren't any infiniband issues, but I'm still seeing similar issues. I've also determined I can run the halo finder just fine on a smaller test dataset, which is leading me to believe that it's some sort of memory issue, but I can't quite figure out how I would go about fixing it. I've tried playing with the number of readers and the number of nodes I'm running on, to no avail. For reference, the dataset is a 1024^3 unigrid enzo run. If anyone has any suggestions, I'd love to hear them!
Thanks! Hilary
Script: http://paste.yt-project.org/show/4025/
Error message: http://paste.yt-project.org/show/4024/
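For a rough sense of scale behind the memory suspicion, here is a back-of-the-envelope estimate in Python. It rests on two assumptions that are not stated in the thread: that the 1024^3 unigrid run carries one particle per cell, and that each particle costs 32 bytes in memory (a 64-bit ID plus six 32-bit floats for position and velocity). The 48 processes are the count used in the mpirun command elsewhere in this thread. The real footprint would be larger once halo-finding work buffers are included.

```python
# Assumptions (not from the thread): one particle per cell, and 32 bytes
# per particle (one 64-bit ID plus six 32-bit floats).
N_PER_SIDE = 1024
BYTES_PER_PARTICLE = 8 + 6 * 4
N_PROCESSES = 48  # process count used in the thread's mpirun command


def particle_memory_gib(n_per_side, bytes_per_particle):
    """Raw particle storage in GiB for an n_per_side^3 particle load."""
    return n_per_side ** 3 * bytes_per_particle / 2 ** 30


total_gib = particle_memory_gib(N_PER_SIDE, BYTES_PER_PARTICLE)
per_process_gib = total_gib / N_PROCESSES
print(f"total: {total_gib:.1f} GiB, per process: {per_process_gib:.2f} GiB")
# prints: total: 32.0 GiB, per process: 0.67 GiB
```

If the per-process figure times the number of processes placed on one node approaches that node's memory, spreading the same process count over more nodes is one thing to try.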
On Wed, Oct 30, 2013 at 8:50 AM, John Wise jwise@physics.gatech.edu wrote:
Hi Hilary,
On 10/29/2013 12:30 PM, Hilary Egan wrote:
I'm quite confused on a number of points related to running the rockstar halo finder, so I hope it's alright that I put all these questions into this one email!
It's no problem at all to include all of your questions in a single email. It's probably better this way!
- I can't seem to run the rockstar halo finder at all without getting this error followed by a segmentation fault and crash.
[Warning] Network IO Failure (PID XXXXXX): Connection reset by peer [Network] Packet receive retry count at: 1
It sort of seems like this issue (http://lists.spacepope.org/htdig.cgi/yt-dev-spacepope.org/2012-November/0026...) but I couldn't really figure out what the resolution was from the thread. I'm attempting to run this on kraken and it doesn't matter if I use a single compute node or multiple, I get the same error. (I hope this isn't the infiniband issue the docs warned about; I couldn't figure out if that is how kraken is connected, and I got an error that the suggested flag doesn't exist, so I didn't press the issue.)
I haven't seen that error before, but I still have to specify *not* to run on infiniband when running Rockstar on a single node. With OpenMPI, you would use "mpirun -n 32 --mca btl ^openib ...". I haven't done this on kraken with their aprun, but hopefully it's easily accomplished!
- Whenever I finally do get the halo finder to work, I need the results to be in a form that the merger tree can use. It seems as though the MergerTree needs the results in the same form as the other halo finders give, so would getting the halo list and then dumping it as usual be the appropriate strategy? I.e.:
rh.run()
halo_list = rh.halo_list()
halo_list.dump('MergerHalos')
2.5. The docs sort of give mixed messages on whether or not I could just be calling MergerTree with the argument halo_finder_function = RockstarHaloFinder. At this point I've pretty thoroughly convinced myself that I can't, but it would be nice if that was clarified. (Just a thoroughly overwhelmed new user's perspective!)
I'm not sure whether you can use yt's merger tree code with the Rockstar halos. I haven't tried.
However, I've used Consistent Trees
https://code.google.com/p/consistent-trees/
with Rockstar's halo lists. Consistent Trees was also written by Peter Behroozi. I've chosen this route because the algorithm seems to be more physically robust in constructing parent/child relationships and boundness. All of the instructions are in the README of the code, and it's pretty straightforward and fast to run (probably 5-10 minutes for a 512^3 simulation with 60 outputs).
I also have a visualization script for Consistent Trees' output.
https://bitbucket.org/jwise77/rockstar-dot
From Consistent Trees' output, you can use the provided script, halo_trees_to_catalog.pl (instructions also in the README), to convert the tree output into halo lists.
- I'm a little confused as to whether or not I have to use a TimeSeriesData object rather than the usual single time output when instantiating the halo finder. Under "Rockstar Halo Finding" it uses TimeSeriesData, unlike the rest of the examples, but under the subheading "Output Analysis" it just uses pf. The "Output Analysis" example also doesn't call the run() method, which leads me to believe something else entirely is going on, but it's not quite clear.
This actually came up recently. It's best to supply a TimeSeriesData object. Here's the link to the email for more details.
http://lists.spacepope.org/pipermail/yt-users-spacepope.org/2013-August/0038...
Cheers, John
-- John Wise Assistant Professor of Physics Center for Relativistic Astrophysics, Georgia Tech http://cosmo.gatech.edu
-- Cameron Hummels Postdoctoral Researcher Steward Observatory University of Arizona http://chummels.org

On Tue, Oct 29, 2013 at 12:30 PM, Hilary Egan hilaryye@gmail.com wrote:
Hi all,
I'm quite confused on a number of points related to running the rockstar halo finder, so I hope it's alright that I put all these questions into this one email!
- I can't seem to run the rockstar halo finder at all without getting this error followed by a segmentation fault and crash.
[Warning] Network IO Failure (PID XXXXXX): Connection reset by peer [Network] Packet receive retry count at: 1
I know this is an old thread so this all might be futile.
I've had this before and it can be caused by a number of network-related issues. Primarily, it may be that the default ports that the clients use to communicate with the server are blocked, taken, or invalid for some reason. If everything is running on the same machine, you may be able to try setting
PARALLEL_IO_SERVER_INTERFACE = lo
This will force everything to use the local loopback address (127.0.0.1). Often, just waiting a few minutes for other instances to die solves the problem. You could also add 'killall rockstar' to your submission script in case zombie rockstar processes are still running and causing server issues.
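One way to test the "port is taken" hypothesis before resubmitting is to try binding the port yourself. The sketch below uses only Python's standard socket module; the helper name can_bind is hypothetical, and which port numbers Rockstar actually uses on a given system is not stated in this thread, so substitute whatever port your server reports.

```python
import socket


def can_bind(port, host="127.0.0.1"):
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use" or "permission denied".
            return False
        return True


# Demonstration: hold an OS-assigned port open, then probe it.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as holder:
    holder.bind(("127.0.0.1", 0))  # port 0 asks the OS for any free port
    holder.listen(1)
    busy_port = holder.getsockname()[1]
    print(busy_port, "bindable while held?", can_bind(busy_port))
```

A False result for the port in question on an otherwise idle node would be consistent with a leftover process still holding it, which is the case the 'killall rockstar' suggestion addresses.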
I believe kraken uses SLURM, in which case the following submission script (using srun) might be helpful: e.g., start 128 instances and change FORK_PROCESSORS_PER_MACHINE to 1 in your cfg file.
You'll have to check the hdf5 module and change a few other things, but here is a template.
#!/bin/bash
#SBATCH -n 128
#SBATCH -o job.o%j
#SBATCH -e job.e%j
#SBATCH -t 5000
#SBATCH -p queue_name
#SBATCH --mem=32gb
#SBATCH -J rockstarjob --exclusive

module load -S centos6/hdf5-1.8.11_gcc-4.8.0

rsdir=/path/to/rockstar/code/
exe=/path/to/rockstar/executable
cd $rsdir
outdir=/path/to/output/directory/

$exe -c $rsdir/cfgs/config.cfg &
#uncomment below and comment above for restarts.
#$exe -c $outdir/restart.cfg &
cd $outdir
perl -e 'sleep 1 while (!(-e "auto-rockstar.cfg"))'

srun -n 128 $exe -c auto-rockstar.cfg
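The perl one-liner in the template waits forever if the server never writes auto-rockstar.cfg, which can silently burn the whole allocation. Below is a sketch of the same wait in Python with a timeout added. The function name wait_for_file and its parameters are hypothetical, and the demonstration uses a timer thread as a stand-in for the Rockstar server rather than the real program.

```python
import os
import tempfile
import threading
import time


def wait_for_file(path, timeout=600.0, poll_interval=1.0):
    """Block until path exists. Return True if it appeared, False on timeout."""
    deadline = time.monotonic() + timeout
    while not os.path.exists(path):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
    return True


def touch(path):
    """Create an empty file at path."""
    with open(path, "w"):
        pass


# Demonstration with a stand-in for the server: a timer thread creates
# the config file shortly after we start waiting for it.
with tempfile.TemporaryDirectory() as outdir:
    cfg = os.path.join(outdir, "auto-rockstar.cfg")
    threading.Timer(0.2, touch, args=(cfg,)).start()
    print("found:", wait_for_file(cfg, timeout=5.0, poll_interval=0.05))
```

In a job script, a False return would be the cue to exit with an error instead of launching the srun clients against a server that never came up.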
You might have already moved on by now. Hope this helps if not.
Brendan
It sort of seems like this issue (http://lists.spacepope.org/htdig.cgi/yt-dev-spacepope.org/2012-November/0026...) but I couldn't really figure out what the resolution was from the thread. I'm attempting to run this on kraken and it doesn't matter if I use a single compute node or multiple, I get the same error. (I hope this isn't the infiniband issue the docs warned about; I couldn't figure out if that is how kraken is connected, and I got an error that the suggested flag doesn't exist, so I didn't press the issue.)
- Whenever I finally do get the halo finder to work, I need the results to be in a form that the merger tree can use. It seems as though the MergerTree needs the results in the same form as the other halo finders give, so would getting the halo list and then dumping it as usual be the appropriate strategy? I.e.:
rh.run()
halo_list = rh.halo_list()
halo_list.dump('MergerHalos')
2.5. The docs sort of give mixed messages on whether or not I could just be calling MergerTree with the argument halo_finder_function = RockstarHaloFinder. At this point I've pretty thoroughly convinced myself that I can't, but it would be nice if that was clarified. (Just a thoroughly overwhelmed new user's perspective!)
- I'm a little confused as to whether or not I have to use a TimeSeriesData object rather than the usual single time output when instantiating the halo finder. Under "Rockstar Halo Finding" it uses TimeSeriesData, unlike the rest of the examples, but under the subheading "Output Analysis" it just uses pf. The "Output Analysis" example also doesn't call the run() method, which leads me to believe something else entirely is going on, but it's not quite clear.
Thanks! -Hilary
participants (7): Brendan Griffen, Cameron Hummels, Geoffrey So, Hilary Egan, John Wise, Matthew Turk, Sam Skillman