Hello!
I'm writing to follow up on the discussion of a Dask feature branch at last week's development meeting. I want to summarize the plan, add some notes, and make sure everyone has a chance to give initial feedback before we create the branch.
Generally, the plan is to:
1. create a feature branch named `dask`
2a. submit dask-specific PRs to the `dask` feature branch
2b. submit general PRs that come up in dask development to `main`
3. merge `main` into the `dask` branch at least weekly (a more frequent cadence is welcome)
The use of a feature branch will allow PR review throughout development, avoiding a massive review when finally merging the `dask` branch back into `main`. It also allows multiple developers to work on Dask development more easily.
Some extra notes and clarification:
Dask-specific vs general PRs:
Some changes made to add dask support may be better as PRs to `main`. These changes should be non-breaking and should not rely on Dask itself. For example, PR 2934 (https://github.com/yt-project/yt/pull/2934) added pickle support to some selection objects to help with Dask development, but it has general applicability so it was submitted to `main`. If it's not clear whether a PR is general enough, it should be submitted to the `dask` branch, and reviewers can suggest re-targeting it to `main`.
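To illustrate why pickle support is a general-purpose change: Dask's distributed scheduler has to serialize objects before sending them to workers, but a pickle round trip needs nothing from Dask. Below is a minimal sketch using only the standard library. The `SphereSelection` class and its `_cache` attribute are hypothetical stand-ins invented for this example; they are not yt's selection objects and do not reflect how PR 2934 is implemented.

```python
import pickle


class SphereSelection:
    """Hypothetical stand-in for a selection object (not a yt class)."""

    def __init__(self, center, radius):
        self.center = tuple(center)
        self.radius = float(radius)
        self._cache = {}  # transient state that should not be serialized

    def __getstate__(self):
        # Drop the transient cache so only the defining parameters are pickled.
        state = self.__dict__.copy()
        state.pop("_cache")
        return state

    def __setstate__(self, state):
        # Restore the parameters and start with a fresh, empty cache.
        self.__dict__.update(state)
        self._cache = {}

    def contains(self, point):
        d2 = sum((p - c) ** 2 for p, c in zip(point, self.center))
        return d2 <= self.radius ** 2


original = SphereSelection((0.5, 0.5, 0.5), 0.25)
restored = pickle.loads(pickle.dumps(original))
print(restored.center, restored.radius)
```

Nothing here imports Dask, which is the kind of property that makes a change a candidate for `main` rather than the feature branch.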
Feature branch name:
I'm using `dask` everywhere here… but it could be more exciting. `thedaskening` perhaps (credit to Madicken for this name!)? Feedback on feature branch name is encouraged :)
Dependencies:
Current development makes Dask a hard dependency, but a full Dask install is not required. At present the minimal install requires the `array`, `distributed` and `delayed` dependency sets, for example:
python -m pip install "dask[array,delayed,distributed]"
This is not far from a full Dask install, so it may be simpler to just require `dask[complete]`. The complete install adds the dask dataframe, dask bag and dask diagnostic features. The extra dependencies that these subsets include are pandas, fsspec, toolz and bokeh (see https://github.com/dask/dask/blob/master/setup.py). Of those, perhaps bokeh is extraneous enough that we should only require the minimal install (bokeh is only used for Dask's interactive browser dashboard for monitoring Client/cluster activity).
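For anyone who wants to check whether an existing environment already satisfies the minimal install, here is a small sketch. It assumes the three dependency sets correspond to the importable modules `dask.array`, `dask.delayed` and `distributed`; the helper name `missing_modules` is made up for illustration and is not part of yt or Dask.

```python
import importlib

# Assumption: these importable modules correspond to the
# "array", "delayed" and "distributed" dependency sets.
REQUIRED_MODULES = ("dask.array", "dask.delayed", "distributed")


def missing_modules(names=REQUIRED_MODULES):
    """Return the subset of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # Covers a missing package as well as a missing dependency of it.
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("minimal Dask install looks complete")
```

Importing the modules (rather than only checking package metadata) also catches the case where `dask` is present but a dependency of one of its submodules is not.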
YTEP:
I'm planning to start drafting a YTEP once the feature branch has been created, some development has proceeded, and we have some broader input.
Short Term Development Targets:
A couple of short term work directions I've been planning include:
1. a daskified particle reader, currently in my fork of yt here: https://github.com/chrishavlin/yt/tree/dask_init_particle
2. daskification of derived quantities and "simpler" chunked operations
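To give a rough idea of what the second item could look like, here is a sketch of a chunked reduction: a per-chunk step followed by a combining step, run either serially or through `dask.delayed`. The function names and the use of plain Python lists as "chunks" are assumptions made for illustration; this is not yt's derived quantity machinery and not code from the branch.

```python
def chunk_extrema(values):
    """Per-chunk step: the (min, max) of one chunk of field values."""
    return (min(values), max(values))


def combine_extrema(partials):
    """Reduction step: merge per-chunk (min, max) pairs into one pair."""
    mins, maxs = zip(*partials)
    return (min(mins), max(maxs))


def extrema_serial(chunks):
    """Reference version: process the chunks one after another."""
    return combine_extrema([chunk_extrema(c) for c in chunks])


def extrema_dask(chunks):
    """Same computation as a lazy task graph (requires dask to be installed)."""
    from dask import delayed

    # One task per chunk; nothing runs until .compute() is called.
    partials = [delayed(chunk_extrema)(c) for c in chunks]
    return delayed(combine_extrema)(partials).compute()


# Hypothetical stand-in data: three chunks of field values.
chunks = [[1.0, 5.0], [-2.0, 3.0], [4.0]]
print(extrema_serial(chunks))
```

The per-chunk tasks are independent of one another, so a scheduler is free to run them in parallel; only the final combine step needs all of the partial results.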
Please reply with any comments you may have! I'm excited about getting feedback and moving this work forward!
Cheers,
Chris
_______________________________________________
yt-dev mailing list -- yt-dev@python.org