Mailman 3 Loading a large gadget snapshot - yt-users

Loading a large gadget snapshot

Jared Coughlin

21 Feb 2018

6:46 a.m.

Hello!

I have a Gadget snapshot file with 1024^3 particles. When I try to load it with yt, yt fails, saying that it cannot identify the file type. I did some digging and found that yt validates a Gadget snapshot by reading the number of particles from the header and comparing that number to the one derived from the size of the position block. That size is 1024^3 * 3 * 4 bytes (the number of particles times 3 positions times 4 bytes per position). However, as I just discovered, this number is larger than the maximum value of an int in C, so Gadget writes a value of 0 because of the overflow. As such, I need to use Gadget's unsigned long long int for the padding, but that is 8 bytes, not 4.

Does yt have some way to detect the need for something larger than an int? If not, I can add it.

Thanks!
-Jared
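The arithmetic behind the overflow can be checked in a few lines. This is a minimal sketch, assuming the block size is truncated to its low 32 bits when stored in a 4-byte marker and that the file is little-endian; the variable names are illustrative and not taken from Gadget or yt.

```python
import struct

n_particles = 1024 ** 3
block_bytes = n_particles * 3 * 4    # 3 coordinates, 4 bytes each

UINT32_MAX = 2 ** 32 - 1

print(block_bytes)                   # 12884901888
print(block_bytes > UINT32_MAX)      # True: does not fit in 4 bytes

# Assumption: truncation keeps only the low 32 bits of the value.
wrapped = block_bytes % 2 ** 32
print(wrapped)                       # 0, because 12884901888 == 3 * 2**32

# A 4-byte marker can only hold the wrapped value; an 8-byte one holds it all.
marker32 = struct.pack("<I", wrapped)
marker64 = struct.pack("<Q", block_bytes)
print(len(marker32), len(marker64))  # 4 8
```

The size happens to be an exact multiple of 2^32, which is why the stored value is exactly 0 rather than some other small number.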

Matthew Turk

21 Feb

6:52 a.m.

Hi Jared,

I know you can make the particle IDs work by specifying the dtype, but my reading of the _validate_header code is the same as yours. I think it would be a pretty straightforward change to conditionally turn the 'I' struct read into a 64-bit read, but that isn't there now.

-Matt
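One way such a conditional read could look is sketched below. This is not yt's _validate_header; detect_marker_width is a hypothetical helper, and it assumes a little-endian file whose first block is the standard 256-byte Gadget header, bracketed by a leading and a trailing record marker of the same width.

```python
import io
import struct

HEADER_BYTES = 256  # payload size of the standard Gadget header block


def detect_marker_width(f, endian="<"):
    """Return 4 or 8 (the record-marker width in bytes), or None if unknown."""
    for width, code in ((4, "I"), (8, "Q")):
        f.seek(0)
        lead = f.read(width)
        f.seek(width + HEADER_BYTES)
        trail = f.read(width)
        if len(lead) < width or len(trail) < width:
            continue
        fmt = endian + code
        lead_value = struct.unpack(fmt, lead)[0]
        trail_value = struct.unpack(fmt, trail)[0]
        # Both markers must agree with the header size; checking the trailing
        # one avoids mistaking the low half of an 8-byte marker for a 4-byte one.
        if lead_value == HEADER_BYTES and trail_value == HEADER_BYTES:
            return width
    return None


def fake_header_block(width):
    """Build an in-memory header block with markers of the given width."""
    marker = struct.pack("<I" if width == 4 else "<Q", HEADER_BYTES)
    return io.BytesIO(marker + bytes(HEADER_BYTES) + marker)


print(detect_marker_width(fake_header_block(4)))  # 4
print(detect_marker_width(fake_header_block(8)))  # 8
```

Checking the header block rather than the position block sidesteps the overflow, since 256 fits in a marker of either width.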

_______________________________________________ yt-users mailing list -- yt-users@python.org To unsubscribe send an email to yt-users-leave@python.org


Nathan Goldbaum

7:07 a.m.

Hi Jared,

Just a comment: for these really big particle simulations, the way yt currently supports particle data doesn't scale terribly well, so you might run into operations taking a very long time or using too much RAM.

You may find it useful to load only a subset of the simulation domain, to restrict the number of particles yt deals with at any one time. This can be controlled at a coarse level via the "bounding_box" keyword argument to the load() function, in particular by supplying a bounding box that covers only a portion of the domain. It might also help with RAM usage to supply n_ref=16 instead of the default (32). There's more detail about what these parameters do in the docs:

http://yt-project.org/docs/3.4.1/examining/loading_data.html#gadget-data

If you'd like, you could also try the "sph-viz" branch on my fork of yt at https://github.com/ngoldbaum/yt. This branch contains an implementation of the next-generation support for particle data in yt, which is still under development. That version will scale much better for a dataset as big as yours.

Best,
Nathan
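The bounding-box suggestion can be sketched as follows. The helper names, the box size of 100 code units, and the assumption that the domain starts at 0 along each axis are all hypothetical; the [[xmin, xmax], [ymin, ymax], [zmin, zmax]] layout follows the yt documentation linked above and is worth checking against the installed yt version.

```python
def sub_bounding_box(box_size, lo_frac, hi_frac):
    """Bounding box covering the same fractional range along x, y and z."""
    if not 0.0 <= lo_frac < hi_frac <= 1.0:
        raise ValueError("need 0 <= lo_frac < hi_frac <= 1")
    return [[lo_frac * box_size, hi_frac * box_size] for _ in range(3)]


def load_subvolume(path, box_size, lo_frac=0.0, hi_frac=0.25):
    """Load only the particles inside a corner sub-cube of the domain."""
    import yt  # imported lazily so the helper above works without yt installed

    bbox = sub_bounding_box(box_size, lo_frac, hi_frac)
    return yt.load(path, bounding_box=bbox)


# Assumed box size of 100 code units; a real snapshot's header gives the value.
bbox = sub_bounding_box(100.0, 0.0, 0.25)
print(bbox)  # [[0.0, 25.0], [0.0, 25.0], [0.0, 25.0]]
```

A quarter of the box along each axis covers 1/64 of the volume, so for a roughly uniform particle distribution it cuts the particle count by about the same factor.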


Desika Narayanan

7:23 a.m.

Quick note: to reduce the RAM footprint, set n_ref=64 or 128 (that is, bigger than 32) instead of 16.

-d


Nathan Goldbaum

7:27 a.m.

Yes, apologies, I got that backwards. A larger n_ref means there need to be more particles per octree leaf zone to trigger refinement of that zone, so you end up with fewer octree leaf nodes overall and you need less memory.

-Nathan
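The effect of n_ref on the number of leaves can be seen with a toy octree. This is only an illustration of the refinement rule described above, not yt's octree code: count_leaves is a hypothetical function, the particles are uniform random points in a unit box, and a cell is split whenever it holds more than n_ref particles.

```python
import random


def count_leaves(points, n_ref, lo=(0.0, 0.0, 0.0), size=1.0, depth=0, max_depth=10):
    """Count the leaf cells of a toy octree that splits cells holding > n_ref points."""
    if len(points) <= n_ref or depth >= max_depth:
        return 1
    half = size / 2.0
    children = {}
    for p in points:
        # Octant index: 0 or 1 along each axis, depending on the cell midpoint.
        key = tuple(int(p[i] >= lo[i] + half) for i in range(3))
        children.setdefault(key, []).append(p)
    leaves = 8 - len(children)  # octants with no particles are leaves as well
    for key, pts in children.items():
        child_lo = tuple(lo[i] + key[i] * half for i in range(3))
        leaves += count_leaves(pts, n_ref, child_lo, half, depth + 1, max_depth)
    return leaves


rng = random.Random(0)
particles = [(rng.random(), rng.random(), rng.random()) for _ in range(20000)]

for n_ref in (16, 64, 128):
    print(n_ref, count_leaves(particles, n_ref))
```

Any cell that splits at a large n_ref also splits at a smaller one, so the tree for a larger n_ref is always a pruned version of the tree for a smaller n_ref and can never have more leaves.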


Jared Coughlin

22 Feb

1:39 a.m.

This just occurred to me: if my block paddings are unsigned long long int (8 bytes), in order to hold a large enough number, instead of the standard int (4 bytes), I'm assuming this will screw up how yt loads the data after getting past file validation?
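To see why the marker width matters after validation, it helps to compute where each block's payload starts. The sketch below assumes every block is laid out as leading marker, payload, trailing marker, with the 256-byte header first and the position block second; block_payload_offsets is a hypothetical helper, not part of yt.

```python
HEADER_BYTES = 256


def block_payload_offsets(payload_sizes, marker_width):
    """Byte offset at which each block's payload begins."""
    offsets = []
    pos = 0
    for size in payload_sizes:
        pos += marker_width         # skip the leading marker
        offsets.append(pos)
        pos += size + marker_width  # skip the payload and the trailing marker
    return offsets


sizes = [HEADER_BYTES, 1024 ** 3 * 3 * 4]  # header block, position block

print(block_payload_offsets(sizes, 4))  # [4, 268]
print(block_payload_offsets(sizes, 8))  # [8, 280]
```

A reader that assumes 4-byte markers would therefore start reading positions 12 bytes too early in a file written with 8-byte markers, and the error grows by 8 bytes with every further block.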


Nathan Goldbaum

1:45 a.m.

You'll need to specify a custom header specification. There's some discussion about this in the docs:

http://yt-project.org/docs/dev/examining/loading_data.html#header-specificat...

By the way, if you want to bypass the header validation entirely, you could do:

    from yt.frontends.gadget.data_structures import GadgetDataset
    ds = GadgetDataset(path)

-Nathan
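A header specification is essentially a list of fields with a count and a type for each. The sketch below is an assumption-laden illustration: the (name, count, struct type code) tuple form is modeled on how the yt docs describe header specifications, the field names mirror the standard Gadget-2 header layout, and spec_format and spec_size are hypothetical helpers. It only checks that a spec accounts for all 256 header bytes and does not call yt.

```python
import struct

# Assumed layout of the standard 256-byte Gadget-2 header (little-endian).
header_spec = (
    ("Npart", 6, "i"),
    ("Massarr", 6, "d"),
    ("Time", 1, "d"),
    ("Redshift", 1, "d"),
    ("FlagSfr", 1, "i"),
    ("FlagFeedback", 1, "i"),
    ("Nall", 6, "i"),
    ("FlagCooling", 1, "i"),
    ("NumFiles", 1, "i"),
    ("BoxSize", 1, "d"),
    ("Omega0", 1, "d"),
    ("OmegaLambda", 1, "d"),
    ("HubbleParam", 1, "d"),
    ("FlagAge", 1, "i"),
    ("FlagMetals", 1, "i"),
    ("NallHW", 6, "i"),
    ("unused", 16, "i"),
)


def spec_format(spec, endian="<"):
    """Turn a (name, count, code) spec into a struct format string."""
    return endian + "".join("%d%s" % (count, code) for _, count, code in spec)


def spec_size(spec, endian="<"):
    """Number of bytes the spec describes, with no alignment padding."""
    return struct.calcsize(spec_format(spec, endian))


print(spec_size(header_spec))  # 256
```

A custom spec whose size differs from the header block's actual payload size is a quick sign that a field count or type code is off.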

