Hi Matt,

I'm in favor of exploring this. I'd be happy to see the reduction in complexity in the I/O and frontends alone, but the performance benefits are definitely needed. I'm happy to volunteer largish particle datasets for this effort.

Britton

On Wed, Jan 8, 2020 at 8:31 PM Matthew Turk <matthewturk@gmail.com> wrote:
Hi folks,

Over the last little while, I've been looking at this issue and having
some thoughts:

https://github.com/yt-project/yt/issues/2383

Back when yt-3.0 was first set up, it was baked in that the arrays
were all pre-allocated before being read.  (An important note about
this is that it *pre-dates* the demeshening.)  What this means is
that typically the process of reading data from any type of frontend
goes something like this:

 * _identify_base_chunk -> figure out how big the whole thing is
 * either read, or subdivide into chunks
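
In sketch form, the pattern looks something like this (illustrative
Python, not yt's actual API -- count_items stands in for whatever
sizes a chunk):

    import numpy as np

    def read_field(chunks, field, count_items):
        # Phase 1: size the result before reading anything into it.
        # count_items is a hypothetical helper returning the number
        # of selected elements a chunk will contribute.
        n_total = sum(count_items(chunk) for chunk in chunks)

        # Phase 2: preallocate the destination once, then fill it
        # chunk by chunk.
        out = np.empty(n_total, dtype="float64")
        offset = 0
        for chunk in chunks:
            data = chunk.read(field)  # the actual IO happens here
            out[offset:offset + data.size] = data
            offset += data.size
        return out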

When a chunk is read -- which includes the chunk style "all" that
reads all the data in a given data object in a single go -- the
destination buffer is preallocated.  For grid objects, this can be
done without reading any data off of disk.  The process is still
expensive, but we cache the most-recently-used grid mask every time we
count a given grid.
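
Schematically, the counting side looks something like this -- the
select_cells call and the one-element cache are stand-ins for the
real selector machinery:

    # The mask depends only on the grid's geometry and the selector,
    # so no data needs to come off disk to compute it.  Caching the
    # most recent mask means counting a grid and then filling from it
    # computes the mask just once.
    _cached = {"grid_id": None, "mask": None}

    def grid_mask(grid, selector):
        if _cached["grid_id"] != grid.id:
            _cached["mask"] = selector.select_cells(grid)  # boolean
            _cached["grid_id"] = grid.id
        return _cached["mask"]

    def count_grid_cells(grid, selector):
        return int(grid_mask(grid, selector).sum())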

For particles, however, the case is different.  Determining
selections of particles requires IO (our bitmap indices can only
provide an upper bound), so calling _identify_base_chunk, if it needs
to size the thing, will read data from disk.  But by the time we had
implemented a nice indexing scheme for particles, we had already
wedded ourselves to pre-allocation, and so counting got implemented
this way too -- a design decision carried over out of its original
context.
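
Schematically (again, illustrative names rather than the real API):

    def count_particles(chunk, selector, read_positions):
        # The bitmap index has already narrowed things down to
        # candidate chunks, but only as an upper bound: deciding
        # which particles actually fall inside the selection means
        # reading their positions off disk.
        positions = read_positions(chunk)  # IO, just to get a count
        inside = selector.select_points(positions)  # boolean mask
        return int(inside.sum())

And once the count is in hand, the same data typically has to be read
a second time to actually fill the preallocated buffer.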

Now, you might ask yourself: why do we do that anyway?  Why do we
need to know how big something is up front?  Well, it's because our
old method (pre-3.0) went along these lines:

 * read each grid, one at a time (remember, we only had grid support
back then), and mask it
 * at the end, concatenate all the arrays together

For reasonably sized data, this isn't so bad.  The biggest problem is
that of fragmentation and copying -- we're making lots of little-ish
arrays in step 1, and in step 2, we make a single big array and copy
each one in, then de-allocate.  This was the most painful when we read
a gigantic data object in all at once.  We want to avoid the issue of
reading some 1024^3 dataset and then having a moment when the memory
doubles.
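
Roughly this pattern, in other words (illustrative):

    import numpy as np

    def read_field_old(grids, field, selector):
        pieces = []
        for grid in grids:
            data = grid.read(field)  # one little-ish array per grid
            pieces.append(data[selector.mask(grid)])
        # One big allocation plus a copy of every piece: for a moment
        # both the fragments and the final array are alive, which is
        # where the memory doubling comes from.
        return np.concatenate(pieces)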

But, in contrast to before, now almost all of our operations are
chunked (what we used to call "lazy") and so don't need to be able to
read a gigantic array for most things.  Mind you, if you do
ds.r[:]["something"] it'll still read it in, but that is much, much
less common than it was before.
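
For example, a simple reduction consumes one chunk at a time and
never needs the whole field -- or its total size -- up front.
Something along these lines, with ds an already-loaded dataset:

    # Chunked ("lazy") consumption: each chunk is read, used, and
    # discarded before the next one comes in.
    dd = ds.all_data()
    total_mass = 0.0
    for chunk in dd.chunks([], "io"):
        total_mass += chunk["gas", "cell_mass"].sum()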

So what I have explored a bit is this: what happens if, for particle
datasets, we get rid of the notion that we need to know how big
something is (or rather, *exactly* how big) before we do any IO?
And, it turns out, it does make a difference -- a non-negligible one,
in fact, in the cases where the cost of reallocating and copying is
not that big.
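
One way to do that -- a sketch, not necessarily what a final
implementation would look like -- is to grow the destination
geometrically and trim at the end, so the occasional
reallocate-and-copy is amortized across the whole read:

    import numpy as np

    def read_field_no_precount(chunks, field):
        out = np.empty(1024, dtype="float64")  # guess, no count pass
        n = 0
        for chunk in chunks:
            data = chunk.read(field)
            while n + data.size > out.size:
                out = np.resize(out, out.size * 2)  # realloc + copy
            out[n:n + data.size] = data
            n += data.size
        return out[:n].copy()  # trim the over-allocation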

(There are some numbers on the issue I linked above.)

So that's a pretty long, rambly dive into things, but here's what I
wanted to bring up: I'd like to explore not pre-allocating memory or
pre-counting items in the chunking and IO systems.  But it's
exploratory, and I don't know if it will pan out.  So I'd like to
invite anyone who is interested to try it out with me, and if you
have objections, now would be a perfect time to raise them.

I'll post this email here as well:
https://github.com/yt-project/yt/issues/2412, so if you're
interested, subscribe to that issue.

If this turns out to be a worthwhile change, I will propose these
steps, all of which would be in the yt-4.0 branch:

 1) Turn off pre-counting one frontend at a time and examine
performance and memory use in a few different cases
 2) Once it has been disabled in all frontends, remove support for it
from the base chunking system itself, or at least make it optional
and consolidate references to it in the codebase
 3) Update YTEP-0001 to reflect this new state of affairs

yt's performance needs to be improved, and this could be a good first
step toward finding ways to do so that don't require a ton of surgery
and would overall *reduce* the complexity of yt.

-Matt