Hi Matt,
I'm for exploring this. I'd be happy to see the reduction in complexity in
the I/O and frontends alone, but the performance benefits are definitely
needed. I'm happy to volunteer largish particle datasets to this effort.
Britton
On Wed, Jan 8, 2020 at 8:31 PM Matthew Turk wrote:
Hi folks,
Over the last little while, I've been looking at this issue and having some thoughts:
https://github.com/yt-project/yt/issues/2383
Back when yt-3.0 was first set up, the design baked in the assumption that arrays were all pre-allocated before being read. (An important note: this *pre-dates* the demeshening.) What this means is that the process of reading data from any type of frontend typically goes something like this:
* _identify_base_chunk -> figure out how big the whole thing is
* either read, or subdivide into chunks
When a chunk is read -- which includes the chunk style "all" that reads all the data in a given data object in a single go -- the destination buffer is preallocated. For grid objects, this can be done without reading any data off of disk. The process is still expensive, but we cache the most-recently-used grid mask every time we count a given grid.
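To make the preallocation pattern concrete, here is a toy sketch (not yt's actual API; the chunk representation and function name are invented for illustration) of filling a single destination buffer when every chunk's selected size is known up front:

```python
import numpy as np

# Hypothetical illustration: for grid data the selection masks tell us the
# final size before any data is read, so the destination buffer can be
# allocated exactly once and filled chunk by chunk.
def read_preallocated(chunks, dtype="float64"):
    # Each chunk knows how many selected cells it contributes.
    total = sum(c["count"] for c in chunks)
    out = np.empty(total, dtype=dtype)
    offset = 0
    for c in chunks:
        vals = c["data"][:c["count"]]  # stand-in for a masked disk read
        out[offset:offset + len(vals)] = vals
        offset += len(vals)
    return out

chunks = [{"count": 3, "data": np.arange(5.0)},
          {"count": 2, "data": np.arange(10.0, 15.0)}]
result = read_preallocated(chunks)  # [0., 1., 2., 10., 11.]
```

The appeal is that memory use never exceeds the final array's size; the catch, as below, is that "total" has to be knowable before the reads happen.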
For particles, however, the case is different. Determining selections of particles requires IO (our bitmap indices can only provide an upper bound) and so calling identify_base_chunk, if it needs to size the thing, will read data from disk. But, by the time we had implemented a nice indexing scheme for particles, we had wedded ourselves to this, and so it got implemented this way. It was one unrelated design decision that was applied out of context.
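A toy sketch of why the particle case forces IO (this is not yt's real bitmap-index implementation; the coarse index here is just a per-file bounding box, invented for illustration): the coarse index can only say a file *might* contain selected particles, so the exact count requires reading positions off disk.

```python
import numpy as np

rng = np.random.default_rng(0)
positions = rng.random((1000, 3))  # stand-in for one file's particles on disk

# Selection region: the box [0, 0.5)^3.
region_lo, region_hi = 0.0, 0.5

# Coarse index: only each file's bounding box is stored.  Overlap with the
# region marks the file as a candidate -- an upper bound, not a count.
bbox_lo, bbox_hi = positions.min(axis=0), positions.max(axis=0)
overlaps = np.all((bbox_lo < region_hi) & (bbox_hi >= region_lo))
upper_bound = len(positions) if overlaps else 0

# Exact count: requires reading every position -- this is the IO that
# identify_base_chunk triggers when it needs exact sizes for particles.
exact = int(np.all((positions >= region_lo) & (positions < region_hi),
                   axis=1).sum())
```

Here the index says "up to 1000 particles", while the true count (roughly an eighth of them, for this region) is only knowable after the read.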
Now, you might ask yourself, why do we do that anyway? Why know how big something is? Well, it's because our old method (pre-3.0) was along these lines:
* read each grid, one at a time (remember we only had grid support) and then mask them
* at the end, concatenate the arrays all together
For reasonably sized data, this isn't so bad. The biggest problem is that of fragmentation and copying -- we're making lots of little-ish arrays in step 1, and in step 2, we make a single big array and copy each one in, then de-allocate. This was the most painful when we read a gigantic data object in all at once. We want to avoid the issue of reading some 1024^3 dataset and then having a moment when the memory doubles.
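The doubling moment is visible in a minimal sketch of that old pattern (illustrative only, not the pre-3.0 code): `np.concatenate` allocates a fresh full-size array and copies each piece in, so the fragments and the result briefly coexist.

```python
import numpy as np

# Sketch of the pre-3.0 pattern: read each grid into its own small array,
# then concatenate at the end.
def read_fragmented(grids):
    pieces = []
    for g in grids:
        pieces.append(np.asarray(g, dtype="float64"))  # many small allocations
    # np.concatenate allocates the full-size output and copies every piece
    # in; for that moment, peak memory is roughly double the final size.
    return np.concatenate(pieces)

grids = [np.arange(4), np.arange(4, 7), np.arange(7, 10)]
full = read_fragmented(grids)
```

For a 1024^3 dataset read in one go, that transient doubling is exactly the failure mode described above.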
But, in contrast to before, now almost all of our operations are chunked (what we used to call "lazy") and so don't need to be able to read a gigantic array for most things. Mind you, if you do ds.r[:]["something"] it'll still read it in, but that is much, much less common than it was before.
So what I have explored a bit is what happens if, for particle datasets, we get rid of the notion that we need to know how big something is (or rather, *exactly* how big) before we do any IO. And, it turns out, it does make a difference -- a non-negligible one, in fact, for times when the cost of reallocating and copying is not that big.
(There are some numbers on the issue I linked above.)
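The trade-off can be sketched as two hypothetical strategies (the function names and file representation here are invented, not yt's API): pre-counting costs a whole extra IO pass for particles, while skipping the count costs one concatenate at the end.

```python
import numpy as np

def two_pass(files, select):
    # Pre-count strategy: touch every file once just to size the output
    # (for particles this is real IO), then a second pass to fill the
    # preallocated buffer.
    n = sum(select(f).sum() for f in files)           # IO pass #1: count
    out = np.empty(n)
    i = 0
    for f in files:                                    # IO pass #2: fill
        vals = f[select(f)]
        out[i:i + len(vals)] = vals
        i += len(vals)
    return out

def one_pass(files, select):
    # No pre-count: read once, collect pieces, concatenate at the end.
    pieces = [f[select(f)] for f in files]             # single IO pass
    return np.concatenate(pieces) if pieces else np.empty(0)

files = [np.linspace(0, 1, 100) for _ in range(4)]
select = lambda f: f < 0.25
a, b = two_pass(files, select), one_pass(files, select)
```

Both produce identical arrays; the one-pass version pays a copy at the end but halves the number of trips through the data, which is where the numbers on the issue come from.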
So that's a pretty long, rambly dive into things, but here's the thing I wanted to bring up: I'd like to explore not pre-allocating memory or pre-counting items in the chunking and IO systems. But it's exploratory, and I don't know if it will pan out. So I'd like to invite anyone who is interested to try it out with me, and if you have objections, now would be a perfect time to raise them.
I'll post this email here as well: https://github.com/yt-project/yt/issues/2412 so if you're interested, subscribe to that issue.
If this turns out to be a worthwhile change, I will propose these steps, all of which would be in the yt-4.0 branch:
1) Turn it off one frontend at a time and examine performance and memory use in a few different cases
2) Once all frontends have been disabled, disable support for it in the base chunking system itself, or at least make it optional and consolidate references to it in the codebase
3) Update YTEP-0001 to reflect this new state of affairs
yt's performance needs to be improved, and this could be a good first step toward finding ways to do so that don't require a ton of surgery and would overall *reduce* the complexity of yt.
-Matt

_______________________________________________
yt-dev mailing list -- yt-dev@python.org
To unsubscribe send an email to yt-dev-leave@python.org