Hi Matt,
I think the oct size reduction is great. This means a lot for NMSU-ART, where carrying around the oct_tree is 900MB; it should fall to ~200MB afterwards. Getting rid of the mask will mean rewriting some of the ART frontend code, but that doesn't sound like much of a chore for me to do. And of course, the mask was memory-intensive anyway, so replacing it with a more dynamic selection is better for the long term.

NMSU-ART isn't a strict octree, but instead starts with a root grid of octs. I have some cython code that explicitly relies on the matching between the code index and the local oct index -- it's not clear to me whether that will need updating here. Regardless, we're still early in yt-3.0 and I don't mind breaking things right now.



On Mon, May 6, 2013 at 11:00 AM, Matthew Turk <matthewturk@gmail.com> wrote:
Hi all,

I've shared some of this with ChrisM and SamL, but I wanted to bring
it up here.  This largely pertains to the codes RAMSES and NMSU-ART
that are currently supported, as well as the support for particle
Octrees.  ARTIO is somewhat orthogonal at this time, as it doesn't
store the octree in memory.

Currently, Octs have:

ind (64-bits)
local_ind (64-bits)
domain (64-bits)
pos[3] (64-bits each)
level (64-bits)
pointer-to-sd (64-bits)
children, 8 pointers (64-bits each)
parent (64-bits)

This comes out to 136 bytes.  Much of this is because we have two
different methods of traversing the Octs, both in allocation-order and
in depth-first traversal order, and also because when I wrote this
(seems like yesterday) I was trying to get something done without
worrying too much about memory optimization.  However, both of these
things are becoming much more important: first, we need to think
through the memory implications of a large number of leaf nodes, and
second, we need to verify that the order of traversal is not important
-- or at least, is guaranteed to be identical.
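For concreteness, the current layout listed above can be rendered as a
ctypes struct.  This is illustrative only -- the real struct lives in
Cython, and the pointer fields are reduced to void pointers here -- but
the arithmetic works out the same:

```python
# Illustrative ctypes rendering of the current Oct layout; the real
# struct lives in Cython.  Sizes assume a 64-bit platform, where every
# field below is 8 bytes.
import ctypes

class Oct(ctypes.Structure):
    _fields_ = [
        ("ind", ctypes.c_int64),            # allocation-order index
        ("local_ind", ctypes.c_int64),      # traversal-order index
        ("domain", ctypes.c_int64),
        ("pos", ctypes.c_int64 * 3),
        ("level", ctypes.c_int64),
        ("sd", ctypes.c_void_p),            # pointer-to-sd
        ("children", ctypes.c_void_p * 8),  # 8 child pointers
        ("parent", ctypes.c_void_p),
    ]

print(ctypes.sizeof(Oct))  # 136 on a 64-bit platform
```

Seventeen 8-byte fields, hence the 136 bytes per oct.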

Last week I spent some time rewriting the traversal code to attempt to
get rid of much of the self-awareness of the octs, such that *outside*
of a traversal, they are somewhat meaningless.  This lets me cut it
down to:

domain index (which we may be able to eliminate in the future)
code-specific index
allocatable pointer to the 8 children (linearly)
pointer-to-sd object

This should be 32 bytes for leaf nodes, expanding for non-leaf
nodes, although we may be able to cut that down as well (at the
expense of slight over-allocation in instances when not all cells
must be simultaneously refined).  I've implemented this with a
traversal object, such that it will track the level state, the oct
state, and so on and so forth.  This works for all of the tests I have
been able to create for it, but it results in a slight performance
decrease.  (I believe in the future we can optimize this.)
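In the same illustrative ctypes style, the slimmed-down layout looks
like this (the field names are my guesses at what the list above
describes, not the actual names in the code):

```python
# Illustrative sketch of the reduced Oct layout; field names are
# guesses, and sizes assume a 64-bit platform.
import ctypes

class LeanOct(ctypes.Structure):
    _fields_ = [
        ("domain_ind", ctypes.c_int64),  # domain index; may go away later
        ("file_ind", ctypes.c_int64),    # code-specific index
        ("children", ctypes.c_void_p),   # NULL for leaves; points at one
                                         # linear block of 8 when refined
        ("sd", ctypes.c_void_p),         # pointer-to-sd object
    ]

print(ctypes.sizeof(LeanOct))  # 32 on a 64-bit platform
```

Four 8-byte fields, hence 32 bytes per leaf; a refined oct additionally
owns one linearly allocated block of eight children.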

Anyway, here's where I'm going with all of this.  I'd like to extend
the Octree base class, but in a way that leaves several items up to
individual implementations.  Furthermore, I'd like to start a brief
discussion of how we might plan for this becoming distributed memory,
which was impossible before but which I think will be possible by
removing state from the Octs.

= New Octree Base Class Methods =

I'd like to propose these methods become part of the base octree class:

 * next_root(int domain_id, int ind[3])
 * next_child(int domain_id)

RAMSESOctreeContainer implements the first of these, and a version of
the second (which contains routines that are no longer necessary).
The distinction here is that the first of these routines will be able
-- in the future, with no changes to the API -- to add the root-level
oct to a sparse list of octs that belong to a given domain; this will,
I hope, enable sparse octrees.  Currently it adds them to a root_mesh
attribute that lives on all the octrees.

By using this, we'll be able to move more of the addition of new octs
into the base Octree class, which should hide all of the logic for
adding new octs from the individual implementations.

[+-][01] on moving these explicitly to the base class and mandating
their presence?

Because we'll then have a single method of traversing the octree,
icoords, fcoords and ires can all be moved to the base class as well,
which is the ultimate reason for these changes to the Octree anyway.
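To make the proposal a bit more concrete, here is a toy Python sketch
of how the two methods might hang together on the base class.  The
dict-backed root_mesh, the extra child-index argument on next_child,
and the internal traversal state are all my assumptions, not the
actual implementation:

```python
class OctreeContainerSketch:
    """Toy sketch of the proposed base-class API (not the real yt class)."""

    def __init__(self):
        self.root_mesh = {}   # (i, j, k) -> oct record; could become a
                              # sparse per-domain list with no API change
        self._current = None  # traversal state lives here, not on the octs

    def next_root(self, domain_id, ind):
        # Fetch-or-create the root-level oct at root-grid position ind.
        key = tuple(ind)
        if key not in self.root_mesh:
            self.root_mesh[key] = {"domain": domain_id, "children": None}
        self._current = self.root_mesh[key]
        return self._current

    def next_child(self, domain_id, i):
        # Fetch-or-create child i of the most recently visited oct.
        cur = self._current
        if cur["children"] is None:
            cur["children"] = [None] * 8   # allocate all 8 linearly
        if cur["children"][i] is None:
            cur["children"][i] = {"domain": domain_id, "children": None}
        self._current = cur["children"][i]
        return self._current


tree = OctreeContainerSketch()
root = tree.next_root(1, (0, 0, 0))
child = tree.next_child(1, 3)
```

The point of the sketch is that all of the add-an-oct logic sits on the
base class; a frontend only decides when to call next_root and
next_child.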

= Sparse Octrees =

It is my hope that we can stop allocating the "mask" variable that
gets passed around and is (noct,8) in shape, and instead (if a mask is
needed at all) only look at domain-by-domain information.  For RAMSES
and Gadget this will be a big improvement, but much less so for
NMSU-ART and Tipsy, which are currently not set up for efficient
multi-domain iteration.  This will also mean that spatial chunking
will be more efficient, and I think we'll be able to push mask
selection deeper into the octree object itself.
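As a back-of-the-envelope illustration of why this matters (the oct
and domain counts below are made up), compare a global (noct, 8)
boolean mask with allocating for one domain at a time:

```python
# Made-up numbers: a global (noct, 8) boolean mask versus a
# per-domain mask, with numpy booleans taking 1 byte each.
import numpy as np

noct, ndomains = 1_000_000, 64
global_mask = np.zeros((noct, 8), dtype=np.bool_)
domain_mask = np.zeros((noct // ndomains, 8), dtype=np.bool_)

print(global_mask.nbytes)  # 8,000,000 bytes
print(domain_mask.nbytes)  # 125,000 bytes
```

With evenly split domains, the peak mask allocation drops by a factor
of ndomains, which is exactly the domain-by-domain iteration that
RAMSES and Gadget are set up for.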

However, before we can start looking at distributed octrees with true
sparse oct filling across nodes, we'll need to "pin" the parallelism.
This would mean disabling the ability to add on new communicators, or
possibly splitting workgroups up with an IO handler that communicates
across MPI to the nodes that contain the sparse octree -- if we
instantiate a sparse octree on N processors, we can't then move to
subgroups, for instance.  So we'll need a way to pin it.  But I think
that with the changes here, removing global state and using a
consistent mechanism for traversing the octree, we'll eventually be
able to do so.

yt-dev mailing list