Parallelization & the ART frontend
Hi guys,

I've been working hard on the ART frontend. Lately I'm to the point where I'm playing around with more complex datasets that are taking much longer to project, so I'd really like to start using the parallelization engines. I've tried Sam's workshop parallelization demos, and they all work. But launching with the ART frontend (http://paste.yt-project.org/show/2152/) spawns many independent processes which evidently are not actually splitting the projection job, but are still taking up lots of processors.

My MPI installation works:

yt : [INFO ] 2012-02-07 18:12:28,207 Global parallel computation enabled: 0 / 8
yt : [INFO ] 2012-02-07 18:12:28,207 Global parallel computation enabled: 2 / 8
yt : [INFO ] 2012-02-07 18:12:28,208 Global parallel computation enabled: 1 / 8
yt : [INFO ] 2012-02-07 18:12:28,208 Global parallel computation enabled: 6 / 8
yt : [INFO ] 2012-02-07 18:12:28,208 Global parallel computation enabled: 3 / 8
yt : [INFO ] 2012-02-07 18:12:28,208 Global parallel computation enabled: 4 / 8
yt : [INFO ] 2012-02-07 18:12:28,208 Global parallel computation enabled: 5 / 8
yt : [INFO ] 2012-02-07 18:12:28,209 Global parallel computation enabled: 7 / 8

But the script is just run 8 times, not any faster. What am I missing here?

Many thanks!
chris
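The kind of driver being launched here is essentially a load plus a density projection. A minimal yt 2.x-style sketch of such a script (the dataset name and field capitalization are placeholders, not the exact contents of the paste) would be:

    # Launched as: mpirun -np 8 python project_art.py --parallel
    # The --parallel flag is what switches yt into its MPI-aware mode.
    from yt.mods import *

    pf = load("my_art_output.d")                     # placeholder ART output name
    pc = PlotCollection(pf, center=[0.5, 0.5, 0.5])  # avoid a find-max pass by giving a center
    pc.add_projection("Density", 0)                  # project Density along the x axis
    pc.save("art_projection")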
Hi Chris,
My guess is that parallelism is not enabled for the ART frontend simply as a matter of how the IO is conducted. To make it really work in parallel, the IO needs to be split up so that when process 1 reads a given grid patch, the rest of the processors don't also need to read all the data for that grid patch.

Can you lower your loglevel (by setting loglevel = 1 in ~/.yt/config or by passing --config yt.loglevel=1 on the command line) and report back with what it says during a projection job there?

-Matt
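For reference, the configuration-file version of that change is just an INI-style entry (assuming the yt 2.x ~/.yt/config format):

    [yt]
    loglevel = 1

The --config yt.loglevel=1 form mentioned above does the same thing for a single run without editing the file.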
Hi Matt,

I've got the log output here: http://paste.yt-project.org/show/2153/ with the serial version here: http://paste.yt-project.org/show/2154/ .

The most interesting tidbit is below, where it looks like core 0 projects levels 0-5 and core 1 projects level 6 (which takes up something like 99% of the projection time).

chris

P001 yt : [DEBUG ] 2012-02-08 11:39:53,403 Going to obtain []
P001 yt : [DEBUG ] 2012-02-08 11:39:53,406 Preloading ['density'] from 0 grids
P001 yt : [DEBUG ] 2012-02-08 11:39:53,406 End of projecting level level 0, memory usage 3.545e-01
P001 yt : [DEBUG ] 2012-02-08 11:39:53,406 Preloading ['density'] from 0 grids
P001 yt : [DEBUG ] 2012-02-08 11:39:53,406 End of projecting level level 1, memory usage 3.545e-01
P001 yt : [DEBUG ] 2012-02-08 11:39:53,407 Preloading ['density'] from 0 grids
P001 yt : [DEBUG ] 2012-02-08 11:39:53,407 End of projecting level level 2, memory usage 3.545e-01
P001 yt : [DEBUG ] 2012-02-08 11:39:53,408 Preloading ['density'] from 0 grids
P001 yt : [DEBUG ] 2012-02-08 11:39:53,408 End of projecting level level 3, memory usage 3.545e-01
P001 yt : [DEBUG ] 2012-02-08 11:39:53,408 Preloading ['density'] from 0 grids
P001 yt : [DEBUG ] 2012-02-08 11:39:53,408 End of projecting level level 4, memory usage 3.545e-01
P001 yt : [DEBUG ] 2012-02-08 11:39:53,409 Preloading ['density'] from 0 grids
P001 yt : [DEBUG ] 2012-02-08 11:39:53,409 End of projecting level level 5, memory usage 3.545e-01
P001 yt : [DEBUG ] 2012-02-08 11:39:53,409 Preloading ['density'] from 6 grids
P001 yt : [INFO ] 2012-02-08 11:39:53,410 Starting 'Projecting level 6 / 6 '
P000 yt : [INFO ] 2012-02-08 11:39:54,057 Finishing 'Projecting level 0 / 6 '
P000 yt : [DEBUG ] 2012-02-08 11:39:54,057 End of projecting level level 0, memory usage 4.482e-01
P000 yt : [DEBUG ] 2012-02-08 11:39:54,057 Preloading ['density'] from 1 grids
P000 yt : [INFO ] 2012-02-08 11:39:54,058 Starting 'Projecting level 1 / 6 '
P000 yt : [INFO ] 2012-02-08 11:39:54,070 Finishing 'Projecting level 1 / 6 '
P000 yt : [DEBUG ] 2012-02-08 11:39:54,070 End of projecting level level 1, memory usage 4.482e-01
P000 yt : [DEBUG ] 2012-02-08 11:39:54,070 Preloading ['density'] from 1 grids
P000 yt : [INFO ] 2012-02-08 11:39:54,071 Starting 'Projecting level 2 / 6 '
P000 yt : [INFO ] 2012-02-08 11:39:54,130 Finishing 'Projecting level 2 / 6 '
P000 yt : [DEBUG ] 2012-02-08 11:39:54,130 End of projecting level level 2, memory usage 4.482e-01
P000 yt : [DEBUG ] 2012-02-08 11:39:54,130 Preloading ['density'] from 1 grids
P000 yt : [INFO ] 2012-02-08 11:39:54,131 Starting 'Projecting level 3 / 6 '
P000 yt : [INFO ] 2012-02-08 11:39:54,783 Finishing 'Projecting level 3 / 6 '
P000 yt : [DEBUG ] 2012-02-08 11:39:54,784 End of projecting level level 3, memory usage 4.482e-01
P000 yt : [DEBUG ] 2012-02-08 11:39:54,784 Preloading ['density'] from 1 grids
P000 yt : [INFO ] 2012-02-08 11:39:54,784 Starting 'Projecting level 4 / 6 '
P000 yt : [INFO ] 2012-02-08 11:39:59,389 Finishing 'Projecting level 4 / 6 '
P000 yt : [DEBUG ] 2012-02-08 11:39:59,389 End of projecting level level 4, memory usage 5.918e-01
P000 yt : [DEBUG ] 2012-02-08 11:39:59,389 Preloading ['density'] from 1 grids
P000 yt : [INFO ] 2012-02-08 11:39:59,389 Starting 'Projecting level 5 / 6 '
P000 yt : [INFO ] 2012-02-08 11:40:17,735 Finishing 'Projecting level 5 / 6 '
P000 yt : [DEBUG ] 2012-02-08 11:40:17,735 End of projecting level level 5, memory usage 1.569e+00
P000 yt : [DEBUG ] 2012-02-08 11:40:17,735 Preloading ['density'] from 0 grids
P000 yt : [DEBUG ] 2012-02-08 11:40:17,736 End of projecting level level 6, memory usage 1.569e+00
P001 yt : [INFO ] 2012-02-08 11:41:31,681 Finishing 'Projecting level 6 / 6 '
P001 yt : [DEBUG ] 2012-02-08 11:41:31,681 End of projecting level level 6, memory usage 2.113e+00
P000 yt : [DEBUG ] 2012-02-08 11:41:33,807 Opening MPI Barrier on 0
P000 yt : [INFO ] 2012-02-08 11:41:34,502 Projection completed
P001 yt : [DEBUG ] 2012-02-08 11:41:34,502 Opening MPI Barrier on 1
P001 yt : [INFO ] 2012-02-08 11:41:34,502 Projection completed
P001 yt : [DEBUG ] 2012-02-08 11:41:34,579 Opening MPI Barrier on 1
Hi Chris,

Yeah, that's weird. My guess is that load balancing is going haywire for some reason, likely due to overlap versus quadtree proj. Can you tell me what type of object pf.h.proj is? i.e., what's the output of "print pf.h.proj"? And then, what's pf.refine_by?

-Matt
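Both checks are a couple of lines in an interactive session (the dataset name below is a placeholder):

    from yt.mods import *
    pf = load("my_art_output.d")   # placeholder for the actual ART file
    print(pf.h.proj)               # which projection class is in use
    print(pf.refine_by)            # refinement factor between levels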
Hi Matt,

pf.h.proj is of type <class 'yt.data_objects.hierarchy.AMRQuadTreeProj'> and refine_by is 2.

Does this help? I'm not sure what you mean by overlaps - doesn't the RAMSES grid patching mechanism produce non-overlapping grids from the octs? Is the quadtree proj checking for overlapping grids?

chris
Hi Chris,

This does help. My suspicion is that the load balancing is giving all the lower-level grids to one processor and all the upper-level grids to another. If you have time tomorrow, let's work through this together on IRC; I'm free 1-2 PM EST and 3-5 PM EST. I think it should be a pretty straightforward fix.

It would also be cool to see your new method of accumulating octs into patches.

-Matt
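A quick back-of-the-envelope shows why that kind of split is so lopsided. Only the grids-per-rank counts are taken from the debug log above; the per-grid cell counts are invented for illustration:

    # Rank 0 handled one grid on each of levels 0-5; rank 1 got the six level-6 grids.
    grids_per_level = [1, 1, 1, 1, 1, 1, 6]
    cells_per_grid = [8 ** L for L in range(7)]   # assumed ~8x growth per level
    work = [g * c for g, c in zip(grids_per_level, cells_per_grid)]
    print(sum(work[:6]), work[6])                 # rank 0's total vs rank 1's level-6 total

With these invented numbers, rank 1 ends up with roughly 40x the work of rank 0, in line with level 6 dominating the runtime in the log above.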
Hi Matt,

Sounds good! See ya at 1pm EST.

I've played around with lots of gridding mechanisms now, with lots of fiddling of parameters. I now don't think growing octs along a sparse Hilbert curve (what I've been playing around with) is particularly more efficient than splitting clumps on a coarse curve (the default). It's also tough to diagnose how to speed stuff up; an efficient hierarchy with many small grids (800+ grids on a level) is easy on memory but takes ~100x longer to project. It's hard to guess how stuff scales, with ncells, the number of grids on a level, and memory efficiency all being independent variables.

In the end, all I've done is make a few very, very small changes to some of the grid patch recursive splitting code, which has had a dramatic effect (~5x speedup) for my simulations, but I don't think it will be too helpful outside of that.

chris
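To make the trade-off concrete, here is a generic recursive-bisection sketch in plain numpy (this is not the actual yt regridding code, and the efficiency/min_size parameters are invented): each patch is split along its longest axis until its fill efficiency passes a threshold, so a high threshold produces many small, dense grids and a low threshold produces a few large, sparse ones.

    import numpy as np

    def split_patches(mask, efficiency=0.2, min_size=4):
        # Cover the True cells of a 3D boolean mask with axis-aligned patches.
        filled = mask.sum()
        if filled == 0:
            return []
        if filled / float(mask.size) >= efficiency or max(mask.shape) <= min_size:
            return [tuple(slice(0, n) for n in mask.shape)]
        axis = int(np.argmax(mask.shape))             # bisect the longest axis
        half = mask.shape[axis] // 2
        lo, hi = [slice(None)] * 3, [slice(None)] * 3
        lo[axis] = slice(0, half)
        hi[axis] = slice(half, mask.shape[axis])
        patches = []
        for sub, offset in ((tuple(lo), 0), (tuple(hi), half)):
            for p in split_patches(mask[sub], efficiency, min_size):
                p = list(p)
                p[axis] = slice(p[axis].start + offset, p[axis].stop + offset)
                patches.append(tuple(p))
        return patches

    mask = np.zeros((32, 32, 32), dtype=bool)
    mask[4:12, 4:12, 4:12] = True                     # two refined clumps
    mask[20:30, 20:30, 20:30] = True
    print(len(split_patches(mask, efficiency=0.05)))  # a few large, sparse patches
    print(len(split_patches(mask, efficiency=0.5)))   # many small, dense patches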
Hi Chris,
The 100x performance hit you're seeing is, I think, 100% dependent on the IO. I bet if you ran with cProfile you'd see that almost all of the time is spent reading data from disk (which is largely redundant in the ART frontend, if I remember correctly) and then throwing that data away.

There are several steps to improve this. The most far-reaching is to rethink the entire way we do geometric selection. We could then construct extremely coarse bounding boxes and do a progressive step over the data to identify what is available where. This is something I have been working on recently; it's quite a ways from usability (a few months), but preliminary tests even with patch-based AMR suggest a speedup of between 2-5x in many geometry-selection routines. I suspect this number will come down as it becomes more fully featured, but it should be a gigantic improvement for oct-based codes. The trouble with those codes, like RAMSES and ART, is keeping in memory the minimum set of information that still allows data loading from disk. This is why in the past we've used patches, but with the new geometry work we should be able to handle the full set of octs (until the data gets extremely large, but that should hold us off until we can write a distributed geometry).

The nearer-term step is simply to divvy up the IO in advance by assigning some kind of load-balancing ID to subsets of the hierarchy of octs/patches. This is what we do with RAMSES, although I should note that RAMSES (unlike the cevART data format, IIRC) is actually split up into multiple files, and that splitting is used to partition regridded octs. If we do this, then we can, for instance, load balance level-by-level rather than grid-by-grid, which will help with the speed problems you're seeing.

Anyway, see you at 1.

-Matt
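A concrete way to run the cProfile check suggested above, using only the standard library (the script and output file names are hypothetical):

    # python -m cProfile -o art_proj.prof project_art.py
    import pstats
    stats = pstats.Stats("art_proj.prof")
    stats.sort_stats("cumulative").print_stats(20)   # read/IO routines should dominate if the guess is right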
Participants (2): Christopher Moody, Matthew Turk