Hello!

I'm writing to follow up on the discussion of a Dask feature branch at last week's development meeting. I wanted to summarize the plan for the feature branch, add some notes, and make sure everyone has a chance to give initial feedback before we create it.

Generally, the plan is to:

1. create a feature branch named `dask`
2a. submit dask-specific PRs to the `dask` feature branch
2b. submit general PRs that come up in dask development to `main`
3. merge `main` into the `dask` branch weekly (at a minimum; a more frequent cadence is welcome)

The use of a feature branch will allow PR review throughout development, avoiding a massive review when finally merging the `dask` branch back into `main`. It also allows multiple developers to work on Dask development more easily.

Some extra notes and clarification:

Dask-specific vs general PRs: Some changes made to add Dask support may be better as PRs to `main`. These changes should be non-breaking and should not rely on Dask itself. For example, PR 2934 (https://github.com/yt-project/yt/pull/2934) added pickle support to some selection objects to help with Dask development, but it has general applicability so it was submitted to `main`. If it's not clear whether a PR is general enough, it should be submitted to the `dask` branch, and reviewers can suggest re-targeting it to `main`.

Feature branch name: I'm using `dask` everywhere here… but it could be more exciting. `thedaskening` perhaps (credit to Madicken for this name!)? Feedback on the feature branch name is encouraged :)

Dependencies: Current development makes Dask a hard dependency, but a full Dask install is not required. At present the minimal install requires the `array`, `distributed` and `delayed` dependency sets, e.g.,

python -m pip install "dask[array,delayed,distributed]"

This is not far from a full Dask install, so it may be simpler to just require `dask[complete]`. The complete install adds the dask dataframe, dask bag and dask diagnostic features. The extra dependencies that these subsets include are pandas, fsspec, toolz and bokeh (see https://github.com/dask/dask/blob/master/setup.py). Of those, perhaps bokeh is extraneous enough that we should only require the minimal install (bokeh is only used for Dask's interactive browser dashboard for monitoring Client/cluster activity).

YTEP: I'm planning to start drafting a YTEP after a feature branch is started, some development has proceeded, and we have some broader input.

Short Term Development Targets: A couple of short-term work directions I've been planning include:

1. a daskified particle reader, currently in my fork of yt here: https://github.com/chrishavlin/yt/tree/dask_init_particle
2. daskification of derived quantities and "simpler" chunked operations

Please reply with any comments you may have! I'm excited about getting feedback and moving this work forward!

Cheers,
Chris
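As a rough illustration of the dependency question above, here is a minimal sketch (not part of yt or Dask) that reports which of the modules named in this message can be found in the current environment. The split into `MINIMAL` and `COMPLETE_EXTRAS` follows the description in this email and is an assumption, not a copy of Dask's setup.py; the names `missing_modules` and `dependency_report` are hypothetical.

```python
from importlib.util import find_spec

# Top-level modules expected from dask[array,delayed,distributed].
# Assumption: this grouping is taken from the email, not from Dask's setup.py.
MINIMAL = ("dask", "distributed")

# Extra modules the email lists as coming with dask[complete].
COMPLETE_EXTRAS = ("pandas", "fsspec", "toolz", "bokeh")


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


def dependency_report():
    """Summarize which modules from each group are missing."""
    return {
        "minimal": missing_modules(MINIMAL),
        "complete_extras": missing_modules(COMPLETE_EXTRAS),
    }


if __name__ == "__main__":
    for group, missing in dependency_report().items():
        status = "all found" if not missing else "missing: " + ", ".join(missing)
        print(f"{group}: {status}")
```

Only top-level module names are checked, so the sketch runs whether or not Dask is installed; it does not verify versions or that the extras were installed through pip.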
Hi Chris and yt-dev,

Thank you for writing all of this up! I think the dask-yt work is really exciting and it's awesome you've thought so much about it. I love the `thedaskening` branch name, though I'm clearly not a neutral observer in that. :-)

You're asking a lot of good questions here, and I think it might be worth scheduling a dask *specific* yt team meeting to plan things out before you do the YTEP. I think this will help make sure the YTEP has less iteration because we'll get a big community jump on planning the work, and I think it will help all of us plan this work intentionally as a community (and make sure that you're not developing this alone). What do you all think?

Madicken

On Sat, Feb 27, 2021 at 7:38 AM Chris Havlin <chris.havlin@gmail.com> wrote:
_______________________________________________
yt-dev mailing list -- yt-dev@python.org
To unsubscribe send an email to yt-dev-leave@python.org
https://mail.python.org/mailman3/lists/yt-dev.python.org/
Member address: madicken.munk@gmail.com
About the branch name, I thought of “for_the_night_is_DASK_and_full_of_YTErrors”, but I yield to Madicken’s practical creativity. In all seriousness, “daskening” sounds really cool and is in line with previous large/sweeping changes that took the form of YTEPs too (demeshening, blackening), so it’s perfect IMO.

I also like the idea of a targeted dev meeting to prep the YTEP, if you think it can help reduce the number of iterations.

Thanks Chris for doing this so thoughtfully!

Cheers,
Clément

On Mon, Mar 1, 2021 at 07:00, Madicken Munk <madicken.munk@gmail.com> wrote:
Hi folks,

Since this seems mostly uncontroversial, I'm going to go ahead and create a branch in the main repo for this; if we end up with objections, we'll reconsider that. :)

-Matt

On Tue, Mar 2, 2021 at 7:01 AM Chris Havlin <chris.havlin@gmail.com> wrote:
Yes! A dask specific yt meeting would be great!
what_is_DASKED_may_never_die,
Chris
participants (4)
- Chris Havlin
- Clément Robert
- Madicken Munk
- Matthew Turk