Loading a large gadget snapshot
Hello! I have a Gadget snapshot file with 1024^3 particles. When I tried to load it with yt, yt failed, saying that it cannot identify the file type.

I did some digging and found that yt validates a Gadget snapshot by reading the number of particles from the header and comparing that number to the one derived from the size of the position block. This size is 1024^3 * 3 * 4 bytes (the number of particles times 3 positions times 4 bytes per position). However, as I just discovered, this number is larger than the maximum value of a 4-byte int in C, so Gadget writes a value of 0 due to the overflow. As such, I need to use Gadget's unsigned long long int for the padding, but this is 8 bytes, not 4.

Does yt have some way to detect the need for something larger than an int? If not, I can add it.

Thanks!
-Jared
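For readers following along, the arithmetic in the message above can be checked directly. This is only a sketch of the numbers involved, assuming the block-size marker is stored in a 4-byte integer; the variable names are illustrative and not taken from Gadget or yt.

```python
# Size in bytes of the position block for 1024^3 particles.
n_particles = 1024**3
block_bytes = n_particles * 3 * 4  # 3 coordinates, 4 bytes per coordinate

INT32_MAX = 2**31 - 1   # largest value of a signed 4-byte int
UINT32_MOD = 2**32      # a 4-byte marker keeps only the value modulo 2^32

# What survives when the size is squeezed into a 4-byte marker.
wrapped = block_bytes % UINT32_MOD

print(block_bytes)               # 12884901888
print(block_bytes > INT32_MAX)   # True
print(wrapped)                   # 0
```

The block size is exactly 3 * 2^32 bytes, which is why the truncated marker comes out as 0 rather than some other wrapped value.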
Hi Jared,
I know you can make the particle IDs work by specifying the dtype, but
my reading of the _validate_header code is the same as yours. I think
it would be a pretty straightforward change to conditionally read the
'I' in the struct as a 64-bit variable instead, but it's not there
now.
-Matt
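One way the conditional read suggested above might look is sketched below. This is not yt's _validate_header; the function name detect_marker_format and the fake_snapshot helper are hypothetical. It assumes a little-endian, format-1 style file whose first record is the 256-byte header, wrapped in a leading and a trailing size marker of the same width.

```python
import io
import struct

HEADER_BYTES = 256  # assumed size of the header block in a format-1 snapshot


def detect_marker_format(f):
    """Return the struct format of the record markers ('<I' or '<Q'), or None.

    A width is accepted only if both the leading and the trailing marker
    around the header decode to HEADER_BYTES.
    """
    for fmt in ("<I", "<Q"):  # 4-byte, then 8-byte unsigned markers
        width = struct.calcsize(fmt)
        f.seek(0)
        lead = f.read(width)
        f.seek(width + HEADER_BYTES)
        trail = f.read(width)
        if len(lead) < width or len(trail) < width:
            continue  # file too short for this marker width
        lead_value = struct.unpack(fmt, lead)[0]
        trail_value = struct.unpack(fmt, trail)[0]
        if lead_value == HEADER_BYTES and trail_value == HEADER_BYTES:
            return fmt
    return None


def fake_snapshot(fmt):
    """Build an in-memory header record with the given marker format (test data only)."""
    marker = struct.pack(fmt, HEADER_BYTES)
    return io.BytesIO(marker + bytes(HEADER_BYTES) + marker)


print(detect_marker_format(fake_snapshot("<I")))
print(detect_marker_format(fake_snapshot("<Q")))
```

Checking the trailing marker matters here: on a little-endian file, the first four bytes of an 8-byte marker holding 256 also decode to 256, so the leading marker alone cannot tell the two widths apart.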
_______________________________________________ yt-users mailing list -- yt-users@python.org To unsubscribe send an email to yt-users-leave@python.org
Hi Jared,
Just a comment that for these really big particle simulations the way yt
currently supports particle data doesn't scale terribly well, so you might
run into either issues with operations taking a very long time or using too
much RAM.
You may find it useful to only load in data in a subset of the simulation
domain to restrict the number of particles yt is dealing with at any one
time. This can be controlled at a coarse level via the "bounding_box"
keyword argument to the load() function, in particular by supplying a
bounding box that only covers a portion of the domain. It might also help
with RAM usage to supply n_ref=16 instead of the default (32). There's more
detail about what these parameters do in the docs:
http://yt-project.org/docs/3.4.1/examining/loading_data.html#gadget-data
If you'd like you could also try the "sph-viz" branch on my fork of yt at
https://github.com/ngoldbaum/yt. This branch contains an implementation of
the next-generation support for particle data in yt that is still under
development. This version will scale much better for a dataset as big as
yours.
Best,
Nathan
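A rough sketch of the subset loading described in the message above, assuming a yt 3.x install where load() accepts the bounding_box and n_ref keyword arguments for Gadget data. The helper names sub_box and load_subvolume, the box size, and the fractions are all illustrative; the bounding box is assumed to be in the snapshot's own length units. The n_ref value of 64 follows the correction later in this thread that a larger n_ref, not a smaller one, reduces memory use.

```python
def sub_box(box_size, lo_frac, hi_frac):
    """Bounding box covering [lo_frac, hi_frac] of the domain along each axis."""
    return [[lo_frac * box_size, hi_frac * box_size] for _ in range(3)]


def load_subvolume(path, box_size, lo_frac=0.0, hi_frac=0.5, n_ref=64):
    """Load only part of the domain (hypothetical wrapper around yt.load)."""
    import yt  # imported here so sub_box can be used without yt installed

    bbox = sub_box(box_size, lo_frac, hi_frac)
    return yt.load(path, bounding_box=bbox, n_ref=n_ref)


# One octant of a box that is 100 length units on a side.
print(sub_box(100.0, 0.0, 0.5))
```

Calling load_subvolume with the snapshot filename and the simulation box size would then hand yt one octant of the domain at a time.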
quick note -- set n_ref=64 or 128 (or bigger than 32) to reduce RAM
footprint (instead of 16).
-d
Yes, apologies, I got that backwards. A larger n_ref means there need to be more particles per octree leaf zone to trigger refinement on that zone, so you end up with fewer octree leaf nodes overall and you need less memory.
-Nathan
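The relationship between n_ref and the number of leaf nodes can be illustrated with a toy octree. This is only a sketch of the refinement rule described above (a cell is split when it holds more than n_ref particles), not yt's octree implementation; count_leaves and its max_depth guard are made up for the example.

```python
import random


def count_leaves(points, n_ref, lo=(0.0, 0.0, 0.0), size=1.0, depth=0, max_depth=10):
    """Count leaf cells of a toy octree that splits any cell holding more than n_ref points."""
    if len(points) <= n_ref or depth >= max_depth:
        return 1  # this cell is a leaf
    half = size / 2.0
    # Sort the points into the 8 child cells by comparing against the cell midpoint.
    children = {}
    for p in points:
        key = tuple(int(p[i] >= lo[i] + half) for i in range(3))
        children.setdefault(key, []).append(p)
    leaves = 0
    for ix in (0, 1):
        for iy in (0, 1):
            for iz in (0, 1):
                child_lo = (lo[0] + ix * half, lo[1] + iy * half, lo[2] + iz * half)
                child_points = children.get((ix, iy, iz), [])
                leaves += count_leaves(child_points, n_ref, child_lo, half, depth + 1, max_depth)
    return leaves


rng = random.Random(0)  # fixed seed so the run is repeatable
points = [(rng.random(), rng.random(), rng.random()) for _ in range(2000)]
leaf_counts = {n_ref: count_leaves(points, n_ref) for n_ref in (16, 64, 128)}
print(leaf_counts)
```

For a fixed set of particles, raising n_ref can only remove splits, so the leaf count never increases as n_ref grows.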
This just occurred to me: if my block paddings are unsigned long long int
(which are 8 bytes) in order to hold a large enough number instead of the
standard int (which is 4 bytes), I'm assuming this will screw up how yt
loads the data after getting past file validation?
You'll need to specify a custom header specification. There's some
discussion about this in the docs:
http://yt-project.org/docs/dev/examining/loading_data.html#header-specificat...
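By the way, if you want to bypass the header validation entirely, you could
instantiate the dataset class directly instead of going through load():
from yt.frontends.gadget.data_structures import GadgetDataset
# path is the filename of your snapshot
ds = GadgetDataset(path)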
-Nathan
participants (4)
- Desika Narayanan
- Jared Coughlin
- Matthew Turk
- Nathan Goldbaum