Hi Chris and yt-dev,

Thank you for writing all of this up! I think the dask-yt work is really exciting, and it's awesome that you've thought so much about it. I love the `thedaskening` branch name, though I'm clearly not a neutral observer on that. :-)

You're asking a lot of good questions here, and I think it might be worth scheduling a *Dask-specific* yt team meeting to plan things out before you write the YTEP. Getting a big community head start on planning should mean fewer iterations on the YTEP. It should also help all of us plan this work intentionally as a community (and make sure that you're not developing this alone). What do you all think?

Madicken

On Sat, Feb 27, 2021 at 7:38 AM Chris Havlin <chris.havlin@gmail.com> wrote:
Hello!

I'm writing to follow up on last week's development meeting discussion of a Dask feature branch. I wanted to summarize the plan, add some notes, and make sure everyone has a chance to give initial feedback before we create the feature branch.

Generally, the plan is to:
1. create a feature branch named `dask`
2a. submit dask-specific PRs to `dask` feature branch
2b. submit general PRs that come up in dask development to `main`
3. merge `main` into the `dask` branch at least weekly (a more frequent cadence is welcome)

Using a feature branch allows PR review throughout development, avoiding a massive review when the `dask` branch is finally merged back into `main`. It also makes it easier for multiple developers to work on Dask support.

Some extra notes and clarification:

Dask-specific vs general PRs:
Some changes needed for Dask support may be better as PRs to `main`. These changes should be non-breaking and should not rely on Dask itself. For example, PR 2934 (https://github.com/yt-project/yt/pull/2934) added pickle support to some selection objects to help with Dask development, but because it is generally applicable it was submitted to `main`. If it's not clear whether a PR is general enough, submit it to the `dask` branch, and reviewers can suggest re-targeting it to `main`.
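Some context on why pickling matters here: Dask's distributed scheduler serializes task arguments (with pickle or cloudpickle) to send them to workers, so any yt object passed into a task has to survive a pickle round trip. A common pattern for objects that hold runtime-only state is to drop that state in `__getstate__` and rebuild it in `__setstate__`. The sketch below uses a hypothetical `SphereSelection` class. It is not yt's actual selector API and not necessarily what PR 2934 does.

```python
import pickle
import threading


class SphereSelection:
    """Hypothetical stand-in for a yt selection object (not yt's API)."""

    def __init__(self, center, radius):
        self.center = tuple(center)
        self.radius = radius
        # Runtime-only state that pickle cannot handle (locks are unpicklable)
        self._lock = threading.Lock()

    def __getstate__(self):
        # Keep only the data needed to rebuild the selection on a worker
        state = self.__dict__.copy()
        del state["_lock"]
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        # Recreate the runtime-only state after unpickling
        self._lock = threading.Lock()

    def selects(self, point):
        # True if the point lies inside the sphere
        dist2 = sum((p - c) ** 2 for p, c in zip(point, self.center))
        return dist2 <= self.radius ** 2


# Round trip, as would happen when sending the object to a Dask worker
sel = pickle.loads(pickle.dumps(SphereSelection((0.5, 0.5, 0.5), 0.1)))
```

Without the two dunder methods, `pickle.dumps` would raise a `TypeError` on the lock.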

Feature branch name:
I'm using `dask` everywhere here… but it could be more exciting. `thedaskening`, perhaps (credit to Madicken for this name!)? Feedback on the feature branch name is encouraged :)

Dependencies:
Current development makes Dask a hard dependency, but a full Dask install is not required. At present, the minimal install requires the `array`, `distributed`, and `delayed` dependency sets, e.g.,

python -m pip install "dask[array,delayed,distributed]"

This is not far from a full Dask install, so it may be simpler to just require `dask[complete]`. The complete install adds Dask's dataframe, bag, and diagnostics features, whose extra dependencies are pandas, fsspec, toolz, and bokeh (see https://github.com/dask/dask/blob/master/setup.py). Of these, bokeh may be extraneous enough that we should require only the minimal install: bokeh is used only for Dask's interactive browser dashboard for monitoring Client/cluster activity.
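If bokeh ends up optional, yt could check for it at runtime instead of listing it as a hard requirement. Here is a minimal sketch using only the standard library's `importlib.util.find_spec`. The `missing_modules` helper and the module groupings are illustrative assumptions, not existing yt code.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names in `names` that cannot be found."""
    # find_spec returns None for a top-level module that is not installed
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical grouping: packages behind dask[array,delayed,distributed]
REQUIRED = ["dask", "numpy", "distributed"]
# bokeh only powers Dask's browser dashboard, so treat it as optional
OPTIONAL = ["bokeh"]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("missing required packages:", ", ".join(missing))
    if missing_modules(OPTIONAL):
        print("bokeh not found; the Dask dashboard will be unavailable")
```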

YTEP:
I'm planning to start drafting a YTEP after the feature branch is created, some development has happened, and we have some broader input.

Short-Term Development Targets:
A couple of short-term work directions I've been planning include:
1. a daskified particle reader, currently in my fork of yt here: https://github.com/chrishavlin/yt/tree/dask_init_particle
2. Daskification of derived quantities and "simpler" chunked operations
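As a rough illustration of what daskifying a "simpler" chunked operation could look like: a weighted average, for example, reduces to per-chunk partial sums followed by a single combine step. That pattern maps directly onto `dask.delayed`. The `chunks` list below is a hypothetical stand-in for yt's chunked IO. This is a sketch of the pattern, not how yt's derived quantities are implemented.

```python
import numpy as np
from dask import delayed


def chunk_partial_sums(values, weights):
    # Per-chunk work: the two partial sums a weighted average needs
    return np.sum(values * weights), np.sum(weights)


def combine(partials):
    # Reduction step: add up the per-chunk sums, then divide once
    numerator = sum(p[0] for p in partials)
    denominator = sum(p[1] for p in partials)
    return numerator / denominator


def weighted_average(chunks):
    """chunks: iterable of (values, weights) array pairs (hypothetical)."""
    partials = [delayed(chunk_partial_sums)(v, w) for v, w in chunks]
    # Dask computes every delayed partial in the list before calling combine
    return delayed(combine)(partials)


chunks = [
    (np.array([1.0, 2.0]), np.array([1.0, 1.0])),
    (np.array([4.0]), np.array([2.0])),
]
result = weighted_average(chunks).compute()
print(result)  # 2.75
```

Calling `.compute()` runs the per-chunk tasks (on a distributed Client if one is active) and then the reduction.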

Please reply with any comments you may have! I'm excited about getting feedback and moving this work forward!

Cheers,

Chris
_______________________________________________
yt-dev mailing list -- yt-dev@python.org
https://mail.python.org/mailman3/lists/yt-dev.python.org/