Loading a large gadget snapshot

Hello! I have a gadget snapshot file with 1024^3 particles. When I tried to load it with yt, yt fails by saying that it cannot identify the file type. I did some digging and found that the way yt validates a gadget snapshot is by reading the number of particles from the header and then comparing that number to the number derived from the size of the position block. This size is 1024^3 * 3 * 4 (the number of particles times 3 positions times 4 bytes per position). However, as I just discovered, this number is larger than the max value of an int in C, and so gadget writes a value of 0 due to the overflow. As such, I need to use gadget's unsigned long long int for the padding, but this is 8 bytes, not 4. I was just wondering if yt had some ability to detect the need to use something larger than an int? If not, I can add it. Thanks! -Jared
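
The arithmetic behind the overflow can be checked in a few lines. This is only a sketch: the little-endian byte order and the variable names are assumptions for illustration, not something taken from gadget or yt. It shows that the block size is exactly 3 * 2^32, which is why a 4-byte marker ends up holding 0 rather than some other wrapped value.

```python
import struct

n_particles = 1024 ** 3
bytes_per_particle = 3 * 4          # x, y, z stored as 4-byte floats
block_bytes = n_particles * bytes_per_particle

# A 4-byte unsigned marker keeps only the value modulo 2**32.
wrapped = block_bytes % 2 ** 32

# Byte order "<" (little-endian) is an assumption made for this example.
marker_4 = struct.pack("<I", wrapped)       # what fits in a 4-byte marker
marker_8 = struct.pack("<Q", block_bytes)   # an 8-byte marker holds the full size

print(block_bytes, wrapped, marker_4, marker_8)
```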

Hi Jared,
I know you can make the particle IDs work by specifying the dtype, but my reading of the _validate_header code is the same as yours. I think it would be a pretty straightforward change to conditionally read the 'I' field of the struct as a 64-bit value, but it's not there now.
-Matt
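
One way the conditional read could look is sketched below. This is not yt's _validate_header code: the function name read_record_marker, its arguments, and the little-endian default are all hypothetical. It tries a 4-byte marker first and falls back to an 8-byte one when the value does not match the expected block size.

```python
import io
import struct


def read_record_marker(f, expected_size, endian="<"):
    """Return (value, width_in_bytes) of the marker at the current position.

    Hypothetical helper: tries a 4-byte unsigned int first, then an
    8-byte unsigned long long, and accepts whichever equals expected_size.
    """
    start = f.tell()
    for fmt, width in (("I", 4), ("Q", 8)):
        f.seek(start)
        raw = f.read(width)
        if len(raw) != width:
            break  # not enough bytes left for this marker width
        (value,) = struct.unpack(endian + fmt, raw)
        if value == expected_size:
            return value, width
    f.seek(start)  # leave the file position unchanged on failure
    raise ValueError("record marker does not match the expected block size")


# In-memory stand-ins for the start of a position block.
big = io.BytesIO(struct.pack("<Q", 1024 ** 3 * 3 * 4))
small = io.BytesIO(struct.pack("<I", 256) + b"\x00" * 4)

big_result = read_record_marker(big, 1024 ** 3 * 3 * 4)
small_result = read_record_marker(small, 256)
print(big_result, small_result)
```

The width that comes back matters beyond validation, since every later block offset shifts when the markers are 8 bytes instead of 4.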
yt-users mailing list -- yt-users@python.org To unsubscribe send an email to yt-users-leave@python.org

Hi Jared,
Just a comment that for these really big particle simulations the way yt currently supports particle data doesn't scale terribly well, so you might run into either issues with operations taking a very long time or using too much RAM.
You may find it useful to only load in data in a subset of the simulation domain to restrict the number of particles yt is dealing with at any one time. This can be controlled at a coarse level via the "bounding_box" keyword argument to the load() function, in particular by supplying a bounding box that only covers a portion of the domain. It might also help with RAM usage to supply n_ref=16 instead of the default (32). There's more detail about what these parameters do in the docs:
http://yt-project.org/docs/3.4.1/examining/loading_data.html#gadget-data
If you'd like you could also try the "sph-viz" branch on my fork of yt at https://github.com/ngoldbaum/yt. This branch contains an implementation of the next-generation support for particle data in yt that is still under development. This version will scale much better for a dataset as big as yours.
Best,
Nathan
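
A minimal sketch of loading only part of the domain is below. The helper names sub_box and load_subvolume, the file name, and the box coordinates are made up for illustration. The bounding_box and n_ref keywords are the ones named in the message above for the yt 3.x series, and the box is assumed to be given in the snapshot's own length units.

```python
def sub_box(center, half_width):
    """Return [[xmin, xmax], [ymin, ymax], [zmin, zmax]] around center."""
    return [[c - half_width, c + half_width] for c in center]


def load_subvolume(path, bbox, n_ref):
    """Load only the particles inside bbox (needs yt and a real snapshot)."""
    import yt  # imported lazily so sub_box can be used without yt installed

    return yt.load(path, bounding_box=bbox, n_ref=n_ref)


# A cube of side 20 centred on (50, 50, 50); the values are illustrative only.
bbox = sub_box(center=(50.0, 50.0, 50.0), half_width=10.0)
print(bbox)

# Hypothetical call; "snapshot_000" is a placeholder file name.
# ds = load_subvolume("snapshot_000", bbox, n_ref=32)
```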

quick note -- set n_ref=64 or 128 (or bigger than 32) to reduce RAM footprint (instead of 16).
-d

Yes, apologies, I got that backwards. A larger n_ref means there need to be more particles per octree leaf zone to trigger refinement on that zone, so you end up with fewer octree leaf nodes overall and you need less memory.
-Nathan
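
The effect of n_ref on the number of leaves can be seen with a toy octree. This is a sketch only: it is not yt's octree implementation, and the function name count_leaves, the uniform random points, and the depth cap are all assumptions. A cell is split into eight children whenever it holds more than n_ref points.

```python
import random


def count_leaves(points, n_ref, lo=(0.0, 0.0, 0.0), size=1.0, max_depth=12):
    """Count leaf cells of a toy octree that splits cells with > n_ref points."""
    if len(points) <= n_ref or max_depth == 0:
        return 1
    half = size / 2.0
    # Sort each point into one of the eight child cells.
    children = {}
    for p in points:
        key = tuple(int(p[i] >= lo[i] + half) for i in range(3))
        children.setdefault(key, []).append(p)
    total = 0
    for ix in (0, 1):
        for iy in (0, 1):
            for iz in (0, 1):
                child_lo = (lo[0] + ix * half, lo[1] + iy * half, lo[2] + iz * half)
                total += count_leaves(
                    children.get((ix, iy, iz), []), n_ref, child_lo, half, max_depth - 1
                )
    return total


random.seed(0)
pts = [(random.random(), random.random(), random.random()) for _ in range(20000)]
leaves = {n_ref: count_leaves(pts, n_ref) for n_ref in (16, 32, 64, 128)}
print(leaves)
```

Any cell that splits at a large n_ref also splits at a smaller one, so the leaf count can only shrink as n_ref grows, which is the memory saving described above.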

This just occurred to me: if my block paddings are unsigned long long int (which are 8 bytes) in order to hold a large enough number instead of the standard int (which is 4 bytes), I'm assuming this will screw up how yt loads the data after getting past file validation?

You'll need to specify a custom header specification. There's some discussion about this in the docs:
http://yt-project.org/docs/dev/examining/loading_data.html#header-specificat...
By the way, if you want to bypass the header validation entirely, you could do:
from yt.frontends.gadget.data_structures import GadgetDataset
# "path" is the path to your snapshot file
ds = GadgetDataset(path)
-Nathan
participants (4): Desika Narayanan, Jared Coughlin, Matthew Turk, Nathan Goldbaum