Hi Chris and yt-dev,
Thank you for writing all of this up! I think the dask-yt work is really exciting and it's awesome you've thought so much about it. I love the thedaskening branch name, though I'm clearly not a neutral observer in that. :-)
You're asking a lot of good questions here, and I think it might be worth scheduling a dask *specific* yt team meeting to plan things out before you do the YTEP. I think this will help make sure the YTEP has less iteration because we'll get a big community jump on planning the work, and I think it will help all of us plan this work intentionally as a community (and make sure that you're not developing this alone).
What do you all think?
Madicken
On Sat, Feb 27, 2021 at 7:38 AM Chris Havlin <chris.havlin@gmail.com> wrote:
Hello!
I'm writing to follow up on the discussion of a Dask feature branch at last week's development meeting. I want to summarize the plan, add some notes, and make sure everyone has a chance to give initial feedback before we create the feature branch.
Generally, the plan is to:
1. create a feature branch named `dask`
2a. submit Dask-specific PRs to the `dask` feature branch
2b. submit general PRs that come up in Dask development to `main`
3. merge `main` into the `dask` branch weekly (at a minimum; a more frequent cadence is welcome)
The use of a feature branch will allow PR review throughout development, avoiding a massive review when finally merging the `dask` branch back into `main`. It also allows multiple developers to work on Dask development more easily.
Some extra notes and clarification:
Dask-specific vs general PRs: Some changes needed for Dask support may be better as PRs to `main`. These changes should be non-breaking and should not rely on Dask itself. For example, PR 2934 (https://github.com/yt-project/yt/pull/2934) added pickle support to some selection objects to help with Dask development, but it has general applicability so it was submitted to `main`. If it's not clear whether a PR is general enough, it should be submitted to the `dask` branch, and reviewers can suggest re-targeting it to `main`.
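To illustrate why pickle support matters here: Dask serializes objects to send them to workers, so anything passed into a task has to survive a pickle round trip. The following is only a minimal sketch of that idea. `ToySphereSelector` is a hypothetical stand-in and is not yt's selector implementation or the approach taken in PR 2934.

```python
import pickle
import threading


class ToySphereSelector:
    """Hypothetical selector: keeps points inside a sphere."""

    def __init__(self, center, radius):
        self.center = tuple(center)
        self.radius = float(radius)
        # A lock cannot be pickled, so default pickling of this object fails.
        self._lock = threading.Lock()

    def __getstate__(self):
        # Serialize only the plain data that defines the selection.
        return {"center": self.center, "radius": self.radius}

    def __setstate__(self, state):
        # Rebuild the object, recreating the unpicklable member.
        self.center = state["center"]
        self.radius = state["radius"]
        self._lock = threading.Lock()

    def select(self, point):
        dist2 = sum((p - c) ** 2 for p, c in zip(point, self.center))
        return dist2 <= self.radius ** 2


original = ToySphereSelector((0.5, 0.5, 0.5), 0.25)
restored = pickle.loads(pickle.dumps(original))
print(restored.center, restored.radius)
```

A change like this does not depend on Dask at all, which is the kind of PR that fits `main` rather than the feature branch.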
Feature branch name: I'm using `dask` everywhere here… but it could be more exciting. `thedaskening` perhaps (credit to Madicken for this name!)? Feedback on feature branch name is encouraged :)
Dependencies: Current development makes Dask a hard dependency, but a full Dask install is not required. At present the minimal install requires the `array`, `distributed` and `delayed` dependency sets. e.g.,
python -m pip install "dask[array,delayed,distributed]"
This is not far from a full Dask install, so it may be simpler to just require `dask[complete]`. The complete install adds the dask dataframe, dask bag and dask diagnostic features. The extra dependencies that these subsets include are pandas, fsspec, toolz and bokeh (see https://github.com/dask/dask/blob/master/setup.py). Of those, perhaps bokeh is extraneous enough that we should only require the minimal install (bokeh is only used for Dask's interactive browser-dashboard for monitoring Client/cluster activity).
YTEP: I'm planning to start drafting a YTEP once a feature branch is started, some development has proceeded, and we have some broader input.
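As a rough way to see which of these pieces are present in a given environment, one could simply try importing them. This is a sketch only: the helper name `module_available` and the choice of modules to probe are assumptions for illustration, not part of yt or Dask.

```python
import importlib

# Modules matching the minimal install discussed above, plus bokeh,
# which only the complete install would bring in.
MINIMAL_MODULES = ("dask.array", "dask.delayed", "distributed")
OPTIONAL_MODULES = ("bokeh",)


def module_available(name):
    """Return True if the module imports cleanly, False otherwise."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


status = {name: module_available(name) for name in MINIMAL_MODULES + OPTIONAL_MODULES}
for name, ok in status.items():
    print(f"{name}: {'available' if ok else 'missing'}")
```

Importing the module, rather than only locating it, also catches the case where a Dask subpackage exists but one of its own dependencies (for example numpy for `dask.array`) is missing.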
Short Term Development Targets: A couple of short term work directions I've been planning include:
1. a daskified particle reader, currently in my fork of yt here: https://github.com/chrishavlin/yt/tree/dask_init_particle
2. Daskification of derived quantities and "simpler" chunked operations
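For readers less familiar with Dask, a chunked derived quantity can be pictured as a per-chunk reduction followed by a combine step. The sketch below uses `dask.delayed` on plain Python lists. The data and the function names `chunk_extrema` and `combine_extrema` are hypothetical, and this is not how yt's derived quantities or the linked branch are implemented.

```python
import dask

# Hypothetical stand-in for chunked field data: one inner list per chunk.
chunks = [[1.0, 4.0, 2.5], [0.5, 3.0], [7.0, 6.0, 5.5]]


@dask.delayed
def chunk_extrema(values):
    # Per-chunk reduction: the (min, max) of a single chunk.
    return min(values), max(values)


@dask.delayed
def combine_extrema(partials):
    # Final reduction across the per-chunk results.
    mins, maxs = zip(*partials)
    return min(mins), max(maxs)


# Build the task graph lazily; nothing is computed until .compute().
partials = [chunk_extrema(chunk) for chunk in chunks]
result = combine_extrema(partials).compute(scheduler="synchronous")
print(result)
```

The synchronous scheduler is used here only to keep the example deterministic and free of a cluster. The same graph could be run on a distributed `Client` without changing the task definitions.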
Please reply with any comments you may have! I'm excited about getting feedback and moving this work forward!
Cheers,
Chris
_______________________________________________
yt-dev mailing list -- yt-dev@python.org
To unsubscribe send an email to yt-dev-leave@python.org
https://mail.python.org/mailman3/lists/yt-dev.python.org/
Member address: madicken.munk@gmail.com