We're in the process of implementing PDEPs, equivalent to Python's PEPs and NumPy's NEPs, but for pandas. This should help build the roadmap, make discussions more efficient, obtain more structured feedback from the community, and add visibility to agreed future plans for pandas. The initial implementation (workflow) is a bit simpler than PEP or NEP, but we'll iterate in the future as convenient. You can see the PR for PDEP-1 with the purpose, scope and guidelines here: https://github.com/pandas-dev/pandas/pull/47444 Feedback is very welcome.
Thanks for starting this proposal, Marc!
I have already been doing this in an ad-hoc way, e.g. with the Copy/View proposal (writing an actual proposal document), so I am very much in favor of formalizing this a bit more.
Personally, I would prefer that we use a more dedicated home for this instead of the existing pandas repo (e.g. a separate repo in the pandas-dev org). The main pandas repo nowadays has such a high volume of issue and PR comments that it becomes difficult to follow this or notice specific issues. While there are certainly ways to deal with this (e.g. consistently using a specific label and title, ensuring we always notify the mailing list as well, ...), IMO a separate repo would make it more accessible to follow those discussions and keep an overview of them.
(There are examples of both in other projects: for example, scikit-learn has a separate repo, while NumPy uses the main repo, I think.)
Joris
On Tue, 21 Jun 2022 at 09:46, Marc Garcia <garcia.marc@gmail.com> wrote:
Pandas-dev mailing list Pandas-dev@python.org https://mail.python.org/mailman/listinfo/pandas-dev
+1 in using a separate repo (under pandas-dev) for this
On Jun 24, 2022, at 5:05 PM, Joris Van den Bossche <jorisvandenbossche@gmail.com> wrote:
Thanks for the feedback. I understand your point about using a different repo, but I see several advantages in the current approach, so it may be worth discussing the exact pain points a bit further, to see if a separate repo is really the best solution.
Let me know if I'm missing something, but I see three different ways in which we'll be interacting with PDEPs:
a) Via their rendered version. Not sure if you checked it, but the current rendered page from the PDEP PR (attached) is equivalent to the home of the scikit-learn SLEP proposals [1]. The main difference is that with the current approach we have it integrated with the website, which I personally think is an advantage.
b) Via the list of PDEP PRs to review. To see only PDEP PRs in the main pandas repo, this is just a label filter [2]. To me personally that is quicker than having to go to another repo, but there is no big difference between one and the other.
c) Notifications. I guess this is the main thing. I think one concern is that notifications from PDEPs get lost among the rest of the repo notifications. I assume you're using your email client filters, and if the notifications come from another repo, you can change the rules easily. I guess the solution here would be to use something like PDEP in the title and use that as a rule. Or we can try to find something more reliable, if that's the main concern.
Personally, I don't see the advantages of having the proposals in a separate repo as very significant. And by keeping things the way they're implemented in the PR, I do see some advantages:
- No need to maintain a separate repo, CI workflow, jobs to publish the build, Sphinx (or equivalent) project... Nothing too complex, but why implement and maintain all that if our website is already prepared to handle it? In particular, with Sphinx it is not as easy as with our website to fetch the open PRs and render them.
- Integrated UX of the PDEPs in our website. I think this gives them more visibility, and a better user experience than having to jump from one website to another.
- One of my concerns is that, being in a separate repo, we forget about them. We're used to checking PRs in the pandas repo, and we'll keep coming back to PRs about PDEPs until they're merged if they are in the main repo, but it feels like in a separate repo it is easier to forget them when there is no recent activity and there are no notifications.
It would be good to know if I'm missing any of your concerns. If not, I'd say we can start with what's already implemented, which is almost ready to be merged, and if in the future you still think we can do better by using a separate repo, you can implement it, we have a discussion about it, and we move PDEPs to a separate repo if that makes sense. What do you think?
Cheers, Marc
1. https://scikit-learn-enhancement-proposals.readthedocs.io/en/latest/
2. https://github.com/pandas-dev/pandas/pulls?q=is%3Aopen+is%3Apr+label%3APDEP
On Sat, Jun 25, 2022 at 7:05 AM Jeff Reback <jeffreback@gmail.com> wrote:
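For anyone who wants to try the two main-repo mechanisms mentioned in the message above (the label filter from point b and a title-based rule from point c), here is a minimal Python sketch. It only assumes that PDEP PRs carry a `PDEP` label and have "PDEP" in their title, as discussed in the thread; the function names and the sample subject lines are made up for illustration.

```python
from urllib.parse import urlencode

REPO_PULLS_URL = "https://github.com/pandas-dev/pandas/pulls"


def pdep_search_url(label: str = "PDEP") -> str:
    """Build the GitHub search URL listing open PRs that carry `label`."""
    query = f"is:open is:pr label:{label}"
    # urlencode escapes ":" as %3A and spaces as "+", as in link 2 above.
    return f"{REPO_PULLS_URL}?{urlencode({'q': query})}"


def is_pdep_notification(subject: str, marker: str = "PDEP") -> bool:
    """Title-based rule: the marker appears in the subject (case-insensitive)."""
    return marker.lower() in subject.lower()


if __name__ == "__main__":
    print(pdep_search_url())
    # Hypothetical subject lines, only to exercise the rule.
    print(is_pdep_notification("[pandas-dev/pandas] PDEP-N: example title"))
    print(is_pdep_notification("[pandas-dev/pandas] BUG: example title"))
```

A plain substring rule like this is easy to set up in most mail clients, but it depends on every PDEP PR following the title convention, which is the reliability concern raised in point c.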
For me, notifications are the big thing. Having the emails come from a separate repo would make following things much easier for those who can’t keep up with the main repo.
Tom
On Jun 25, 2022, at 12:04 PM, Marc Garcia <garcia.marc@gmail.com> wrote:
I find Marc's arguments regarding the general simplicity of the PDEP flow (publishing to the website and integration with the main repo) a strong reason to keep these in the main repo. Since there is a dependency between PDEP development and development in the pandas-dev repo, separating them may lead to workflow challenges similar to those with the MacPython/pandas-wheels repo, for example (where integrating cibuildwheel into the main repo <https://github.com/pandas-dev/pandas/issues/44027> is considered a benefit due to tighter integration).
I agree PDEP visibility from notifications is important, but notification priority and channels can differ from person to person. For example, I just manage my GitHub notifications in GitHub, not email.
On Sat, Jun 25, 2022 at 10:50 AM Tom Augspurger <tom.w.augspurger@gmail.com> wrote:
-- Matthew Roeschke
Thanks Marc for the detailed answer. In general, I personally think that the added complexity is not that big, and we can still have a nice publishing workflow to the website with a separate repo (some more detailed responses inline below).
For someone who wants to follow the PDEPs (and I hope with this new PDEP process we can engage more people in the pandas community), but doesn't have the time to follow all of pandas (e.g. a maintainer of a dependent package), my hunch is that a separate repo is a more accessible way to do this. You can indeed list all related PRs based on a label filter, but you still need to know this (we can of course document that on the roadmap page) and it's not an automatic notification. And for email notifications you can indeed set up an email filter (although I don't think you have a good option if using GitHub notifications?).
For someone like myself, if we end up using the main repo, I can for sure set up those filters; that is not a problem. But in general I think that is not a very accessible way to have people follow those discussions. Having it as a separate repo provides a clear home and gives you all the tools that GitHub has to manage and customize the notifications however you want (e.g. watch one repo and not the other).
Sidenote: I do (or did) this for other projects, such as NumPy or Python. I don't follow either of their issue trackers, but I do (somewhat) follow NEP or PEP discussions, and both give me a way to do that without having to follow their main issue trackers.
The last point that you raise about "forgetting about a separate repo" is certainly a valid concern. It's true that the other separate repos that we have (had) were not a success, so we don't have a good track record on this front. But I do think it is a matter of habit (and documentation/communication! We never really publicized any of the other repos, nor actively used them at any point), and if we ensure we have steady activity in such a separate repo for a while, I think that will grow naturally.
On Sat, 25 Jun 2022 at 21:19, Matthew Roeschke <emailformattr@gmail.com> wrote:
> I find Marc's arguments regarding general simplicity of PDEP flow (publishing to website & integration to the main repo) a strong argument to keep these in the main repo.
> Since there is a dependency between PDEP development and the pandas-dev repo development, having them separated may lead to similar workflow challenges with the MacPython/pandas-wheels repo for example (where cibuildwheel being integrated into the main repo <https://github.com/pandas-dev/pandas/issues/44027> is considered a benefit due to tighter integration).
I think an important difference here is that building wheels is defined in the pandas repo (packaging setup) and often needs fixes in pandas, so there it indeed makes the workflow much easier to have that in the same repo. For PDEPs that is much less of an issue.
> I agree PDEP visibility from notifications is important, but notification priority and channels can differ person-to-person. For example, I just manage my GitHub notifications in GitHub, not email.
I don't think there is fundamentally a difference between both. Also, if I were using GitHub notifications, seeing a specific subset of issues in those would be challenging (while with email I could at least set up some automatic filters). (But I don't know GitHub notifications well, so I might be wrong.)
> On Jun 25, 2022, at 12:04 PM, Marc Garcia <garcia.marc@gmail.com> wrote:
> a) Via their rendered version. Not sure if you checked it, but the current rendered page from the PDEP PR (attached) is equivalent to the home of the scikit-learn SLEP proposals [1]. The main difference is that with the current approach we have it integrated with the website, which I personally think it's an advantage.
I am assuming that also with a separate repo we will have an identical web page (which will be very useful!).
> - No need to maintain a separate repo, CI workflow, jobs to publish the build, sphinx (or equivalent) project... Nothing too complex, but why having to implement and maintain all that if our website is already prepared to handle it. And in particular, with Sphinx is not as easy as with our website to fetch the open PRs and render them.
> - Integrated UX of the PDEPs into our website. I think this gives it more visibility, and a better user experience than having to jump from one website to another.
I think it should certainly be possible to keep the website UX as you implemented it with a separate repo as well. There are for sure multiple options, but one (maybe the simplest) would be to keep the publishing in the main repo as you have now (since the website publishing lives there): for example, the separate repo could additionally be cloned in the website workflow, and then that content is available as well (requiring a small change to the path in the script). The PDEP repo itself could then have only very limited CI?
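The idea above of cloning a separate PDEP repo in the website workflow and making its content available to the existing build could look roughly like the sketch below. It is only an illustration under stated assumptions: the function name, the directory layout, and the use of Markdown files are hypothetical, and the clone step itself is assumed to have been done earlier by the workflow.

```python
import shutil
from pathlib import Path


def sync_pdeps(pdep_checkout: Path, website_dir: Path) -> list[Path]:
    """Copy every Markdown file from a checkout of the (hypothetical)
    separate PDEP repo into the website source tree."""
    website_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    # Only top-level *.md files are copied; sorted for a stable order.
    for source in sorted(pdep_checkout.glob("*.md")):
        target = website_dir / source.name
        shutil.copy2(source, target)
        copied.append(target)
    return copied


if __name__ == "__main__":
    # Hypothetical paths: the workflow would clone the PDEP repo into
    # "pdeps-checkout" before running this step.
    for path in sync_pdeps(Path("pdeps-checkout"), Path("web/pandas/pdeps")):
        print(f"copied {path}")
```

With a step like this, the publishing logic stays in the main repo, and the PDEP repo itself would need little or no CI of its own.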
Ok, correct me if I'm wrong, but from what you say the options to consider are:
1) Keep everything as is (in the main pandas repo), and maybe improve notifications (send emails to a list where people can subscribe, RSS feed, Telegram messages...)
2) Use a new repo for PRs, and use the main pandas website to display them
2.a) On the website build, fetch the PDEP docs from the other repo
2.b) On the PDEP repo CI, push the PDEP docs to the main pandas repo
3) Use a new repo for the PDEP PRs, and have a separate website for PDEPs
Do these options sound reasonable as the ones to discuss? Or am I missing something?
My preference is 1, as I think it's the simplest, and adding notifications that allow following PDEPs separately from all pandas activity doesn't seem complex. I'm also fine with 2.a if more people have a strong opinion about keeping PDEP discussions/PRs in a separate repo. I personally don't see advantages in 2.b and 3.
On Mon, Jul 18, 2022, 18:34 Joris Van den Bossche <jorisvandenbossche@gmail.com> wrote:
Thanks Marc for the detailed answer. In general, I personally think that the added complexity is not that big, and we can still have a nice publishing workflow to the website with a separate repo (some more detailed responses inline below).
For someone who wants to follow the PDEPs (and I hope with this new PDEP process we can engage more people in the pandas community), but doesn't have the time to follow all of pandas (e.g. a maintainer of a dependent package), my hunch is that a separate repo is a more accessible way to do this. You can indeed list all related PRs based on a label filter, but you still need to know this (we can of course document that on the roadmap page) and it's not an automatic notification. And for email notifications you can indeed set up an email filter (although I don't think you have a good option if using GitHub notifications?).
For someone like myself, if we end up using the main repo, I can for sure set up those filters, that is not a problem. But in general I think that is not a very accessible way to have people follow those discussions. Having it as a separate repo provides a clear home and gives you all the tools that GitHub has to manage and customize the notifications however you want (e.g. watch one repo and not the other).
Sidenote: I do (or did) this for other projects, such as numpy or python. I don't follow either of their issue trackers, but I do (somewhat) follow NEP or PEP discussions, and both give me a way to do that without having to follow their main issue trackers.
The last point that you raise about "forgetting about a separate repo" is certainly a valid concern. It's true that the other separate repos that we have (had) were not a success, so we don't have a good track record on this front. But I do think it is a matter of habit (and documentation/communication! we never really publicized any of the other repos, nor actively used them at any point), and if we ensure we have steady activity in such a separate repo for a while, I think that will grow naturally.
On Sat, 25 Jun 2022 at 21:19, Matthew Roeschke <emailformattr@gmail.com> wrote:
I find Marc's arguments regarding general simplicity of PDEP flow (publishing to website & integration to the main repo) a strong argument to keep these in the main repo.
Since there is a dependency between PDEP development and the pandas-dev repo development, having them separated may lead to workflow challenges similar to those with the MacPython/pandas-wheels repo, for example (where cibuildwheel being integrated into the main repo <https://github.com/pandas-dev/pandas/issues/44027> is considered a benefit due to tighter integration).
I think an important difference here is that building wheels is defined in the pandas repo (packaging setup) and often needs fixes in pandas, and so here it indeed makes that workflow much easier to have that in the same repo. For PDEPs that is much less of an issue.
I agree PDEP visibility from notifications is important, but notification priority and channels can differ person-to-person. For example, I just manage my GitHub notifications in GitHub, not email.
I don't think there is fundamentally a difference between both. Also, if I was using GitHub notifications, seeing a specific subset of issues in those is challenging (while when using email I could at least set up some automatic filters). (but I don't know GitHub notifications well, so I might be wrong)
On Sat, Jun 25, 2022 at 10:50 AM Tom Augspurger < tom.w.augspurger@gmail.com> wrote:
For me, notifications are the big thing. Having the emails come from a separate repo would make following things much easier for those who can’t keep up with the main repo.
Tom
On Jun 25, 2022, at 12:04 PM, Marc Garcia <garcia.marc@gmail.com> wrote:
Thanks for the feedback. I understand your point about using a different repo, but I see several advantages in the current approach, so it may be worth discussing a bit further what the exact pain points are, to see if a separate repo is really the best solution.
Let me know if I miss something, but I see three different ways in which we'll be interacting with PDEPs:
a) Via their rendered version. Not sure if you checked it, but the current rendered page from the PDEP PR (attached) is equivalent to the home of the scikit-learn SLEP proposals [1]. The main difference is that with the current approach we have it integrated with the website, which I personally think is an advantage.
I am assuming that also with a separate repo we will have an identical web page (which will be very useful!).
b) Via the list of PDEP PRs to review. In this case, to see only PDEP PRs, if we use the main pandas repo, this is just a label filter [2]. To me personally that is quicker than having to go to another repo, but there is no big difference between one and the other.
c) Notifications. I guess this is the main thing. I think one concern is that notifications from PDEPs get lost in the rest of the repo notifications. I assume you're using your email client filters, and if the notifications come from another repo, you can change the rules easily. I guess the solution here would be to use something like PDEP in the title and use that as a rule. Or we can try to find something more reliable, if that's the main concern.
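To make the "PDEP in the title" rule concrete, here is a minimal sketch of such a filter in Python. It assumes a convention that nobody has agreed on yet, namely that every PDEP pull request carries "PDEP" (optionally followed by a number) in its title. The function name and both sample subjects are made up for illustration; only the PR number 47444 comes from this thread.

```python
import re

# Assumed convention (not decided in this thread): PDEP pull requests carry
# "PDEP" or "PDEP-<number>" somewhere in the title.
PDEP_PATTERN = re.compile(r"\bPDEP(-\d+)?\b", re.IGNORECASE)


def is_pdep_notification(subject: str) -> bool:
    """Return True if a notification subject looks like it concerns a PDEP."""
    return PDEP_PATTERN.search(subject) is not None


# Illustrative subjects only; the titles and the second PR number are invented.
subjects = [
    "[pandas-dev/pandas] PDEP-1: Purpose and guidelines (PR #47444)",
    "[pandas-dev/pandas] BUG: fix groupby regression (PR #12345)",
]
pdep_only = [s for s in subjects if is_pdep_notification(s)]
```

A rule like this only works as long as every PDEP PR follows the title convention, which is the reliability concern raised above.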
Personally, I don't see the advantages of having the proposals in a separate repo as very significant. And by keeping things the way they're implemented in the PR, I do see some advantages:
- No need to maintain a separate repo, CI workflow, jobs to publish the build, Sphinx (or equivalent) project... Nothing too complex, but why implement and maintain all that if our website is already prepared to handle it? In particular, with Sphinx it is not as easy as with our website to fetch the open PRs and render them.
- Integrated UX of the PDEPs into our website. I think this gives it more visibility, and a better user experience than having to jump from one website to another.
I think it should certainly be possible to keep the website UX as you implemented with a separate repo as well. There are for sure multiple options, but one (maybe the simplest) option would be to keep the publishing in the main repo as you have now (since the website publishing lives there): for example the separate repo could additionally be cloned in the website workflow, and then that content is available as well (requiring a small change to the path in the script).
The PDEP repo itself could further have only very limited CI?
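As a rough illustration of the "clone the separate repo in the website workflow" idea, the sketch below does a shallow clone before the site is built. It is not the actual pandas publishing script. The repo URL, the checkout directory, and both function names are placeholders invented for this example, and it assumes `git` is available on the build machine.

```python
import subprocess
from pathlib import Path

# Placeholders for illustration only: no such repo or path is named in this thread.
PDEP_REPO_URL = "https://example.org/pandas-dev/pdeps.git"
PDEP_CHECKOUT_DIR = Path("build/pdeps")


def clone_command(repo_url: str, dest: Path) -> list[str]:
    """Build a shallow `git clone` command for the proposals repo."""
    return ["git", "clone", "--depth", "1", repo_url, str(dest)]


def fetch_pdeps(repo_url: str = PDEP_REPO_URL, dest: Path = PDEP_CHECKOUT_DIR) -> Path:
    """Clone the proposals repo (if not already present) and return its path."""
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        # check=True makes the website build fail loudly if the clone fails.
        subprocess.run(clone_command(repo_url, dest), check=True)
    return dest
```

The website script would then read the proposal files from the returned directory instead of a path inside the main repo, which is the "change the path in the script a bit" part.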
- One of my concerns is that, being in a separate repo, we forget about them. We're used to checking PRs in the pandas repo, and we'll keep coming back to PRs about PDEPs until they're merged if they are in the main repo, but it feels like in a separate repo it is easier to forget them when there is no recent activity and notifications.
It would be good to know if I missed any of your concerns. If I didn't, I'd say we can start with what's already implemented, which is almost ready to get merged, and if in the future you still think we can do better by using a separate repo, you can implement it, we have a discussion about it, and we move PDEPs to a separate repo if that makes sense. What do you think?
Cheers, Marc
1. https://scikit-learn-enhancement-proposals.readthedocs.io/en/latest/
2. https://github.com/pandas-dev/pandas/pulls?q=is%3Aopen+is%3Apr+label%3APDEP
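For reference, the label-filter link [2] is just a URL-encoded GitHub search query. A small sketch of how such a link can be built with the Python standard library follows. The helper name `pr_search_url` is made up for this example; the repo and query come from the link itself.

```python
from urllib.parse import urlencode


def pr_search_url(repo: str, query: str) -> str:
    """Build a GitHub pull-request search URL for `repo` from a search query."""
    # urlencode escapes ":" as %3A and spaces as "+", matching link [2].
    return f"https://github.com/{repo}/pulls?{urlencode({'q': query})}"


url = pr_search_url("pandas-dev/pandas", "is:open is:pr label:PDEP")
```

Changing the `label:` term is enough to point the same helper at a different label.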
On Sat, Jun 25, 2022 at 7:05 AM Jeff Reback <jeffreback@gmail.com> wrote:
+1 in using a separate repo (under pandas-dev) for this
On Jun 24, 2022, at 5:05 PM, Joris Van den Bossche < jorisvandenbossche@gmail.com> wrote:
Thanks for starting this proposal, Marc!
I have already been doing this in some ad-hoc way with eg the Copy/View proposal (writing an actual proposal document), so I am very much in favor of formalizing this a bit more.
Personally, I would prefer that we use a more dedicated home for this instead of using the existing pandas repo (e.g. a separate repo in the pandas-dev org). The main pandas repo has nowadays such a high volume in issue and PR comments, that it becomes difficult to follow this or notice specific issues. While there are certainly ways to deal with this (e.g. consistently using a specific label and title, ensuring we always notify the mailing list as well, ...), IMO it would make it more accessible to follow and have an overview of those discussions in e.g. a separate repo.
(there are examples of both in other projects, for example scikit-learn has a separate repo, while numpy uses the main repo I think)
Joris
Op di 21 jun. 2022 09:46 schreef Marc Garcia <garcia.marc@gmail.com>:
We're in the process of implementing PDEPs, equivalent to Python's PEPs and NumPy's NEPs, but for pandas. This should help build the roadmap, make discussions more efficient, obtain more structured feedback from the community, and add visibility to agreed future plans for pandas.
The initial implementation (workflow) is a bit simpler than PEP or NEP, but we'll iterate in the future as convenient.
You can see the PR for PDEP-1 with the purpose, scope and guidelines here: https://github.com/pandas-dev/pandas/pull/47444
Feedback is very welcome.
On Wed, 20 Jul 2022 at 16:26, Marc Garcia <garcia.marc@gmail.com> wrote:
Ok, correct me if I'm wrong, but from what you say the options to consider are:
1) Keep everything as is (in the main pandas repo), and maybe improve notifications (send emails to a list where people can subscribe, rss feed, telegram messages...)
2) Use a new repo for PRs, and use the main pandas website to display them 2.a) On the website build, fetch the PDEP docs from the other repo 2.b) On the PDEP repo CI, push the PDEP docs to the main pandas repo
3) Use a new repo for the PDEP PRs, and have a separate website for PDEPs
Do these options sound reasonable as the ones to discuss? Or am I missing something?
My preference is 1, as I think it's the simplest, and adding notifications that allow following PDEPs separate from all pandas activity doesn't seem complex.
I'm also fine with 2.a if more people have a strong opinion about keeping PDEP discussions/PRs in a separate repo. I personally don't see advantages in 2.b and 3.
Thanks, that's indeed a good summary of the options. I also think 2.a is the easiest of the alternatives, so I would indeed only consider 1 and 2.a.
My preference is to go with 2 (separate repo), for the reasons mentioned before. I think Tom also mentioned this as his preference, and Jeff being OK with it, while Marc/Matthew prefer the main repo. But it would be good to hear from others as well whether they have a (strong) preference.
If we decide to do that, I am happy to do a PR to update the publishing workflow to handle a separate repo.
Joris
On Fri, 5 Aug 2022 at 16:29, Joris Van den Bossche < jorisvandenbossche@gmail.com> wrote:
On Wed, 20 Jul 2022 at 16:26, Marc Garcia <garcia.marc@gmail.com> wrote:
Ok, correct me if I'm wrong, but from what you say the options to consider are:
1) Keep everything as is (in the main pandas repo), and maybe improve notifications (send emails to a list where people can subscribe, rss feed, telegram messages...)
2) Use a new repo for PRs, and use the main pandas website to display them 2.a) On the website build, fetch the PDEP docs from the other repo 2.b) On the PDEP repo CI, push the PDEP docs to the main pandas repo
3) Use a new repo for the PDEP PRs, and have a separate website for PDEPs
Do these options sound reasonable as the ones to discuss? Or am I missing something?
My preference is 1, as I think it's the simplest, and adding notifications that allow following PDEPs separate from all pandas activity doesn't seem complex.
I'm also fine with 2.a if more people have a strong opinion about keeping PDEP discussions/PRs in a separate repo. I personally don't see advantages in 2.b and 3.
Thanks, that's indeed a good summary of the options. I also think 2.a is the easiest of the alternatives, so I would indeed only consider 1 and 2.a.
My preference is to go with 2 (separate repo), for the reasons mentioned before. I think Tom also mentioned this as his preference, and Jeff being OK with it, while Marc/Matthew prefer the main repo. But it would be good to hear from others as well whether they have a (strong) preference.
Maybe another quick poll with just those 2 options? It seemed to get a swift resolution on the PDEP name. If there is no clear majority, then we would need further discussion. If there is a majority, then there may be only a few pain points to resolve.
If we decide to do that, I am happy to do a PR to update the publishing workflow to handle a separate repo.
Joris
On Mon, Jul 18, 2022, 18:34 Joris Van den Bossche < jorisvandenbossche@gmail.com> wrote:
Thanks Marc for the detailed answer. In general, I personally think that the added complexity is not that big, and we can still have a nice publishing workflow to the website with a separate repo (some more detailed responses inline below).
For someone who wants to follow the PDEPs (and I hope with this new PDEP process we can engage more people in the pandas community), but doesn't have the time follow all of pandas (eg a maintainer of a dependent package, ..), my hunch is that a separate repo is a more accessible way to do this. You can indeed list all related PRs based on a label filter, but you still need to know this (we can of course document that on the roadmap page) and it's not an automatic notification. And for email notifications you can indeed set up an email filter (although I don't think you have a good option if using github notifications?).
For someone as myself, if we end up using the main repo, I can for sure set up those filters, that is not a problem. But in general I think that is not a very accessible way to have people follow those discussions. Having it as a separate repo provides a clear home and gives you all the tools that github has to manage and customize the notifications however you want (eg watch one repo and not the other).
Sidenote: I do (or did) this for other projects, such as numpy or python. I don't follow either of their issue trackers, but I do (somewhat) follow NEP or PEP discussions, and both give me a way to do that without having to follow their main issue trackers.
The last point that you raise about "forgetting about a separate repo" is certainly a valid concern. It's true that the other separate repos that we have (had) were no success, so we don't have a good track record on this front. But I do think it is a matter of habit (and documentation/communication! we never really publicized any of the other repos, neither actively used them at any point), and if ensure we have a steady activity in such a separate repo for a while, I think that will grow naturally.
On Sat, 25 Jun 2022 at 21:19, Matthew Roeschke <emailformattr@gmail.com> wrote:
I find Marc's arguments regarding general simplicity of PDEP flow (publishing to website & integration to the main repo) a strong argument to keep these in the main repo.
Since there is a dependency between PDEP development and the pandas-dev repo development, having them separated may lead to similar workflow challenges with the MacPython/pandas-wheels repo for example (where ciwheelbuild being integrated into the main repo <https://github.com/pandas-dev/pandas/issues/44027> is considered a benefit due to tighter integration).
I think an important difference here is that building wheels is defined in the pandas repo (packaging setup) and often needs fixes in pandas, and so here it indeed makes that workflow much easier to have that in the same repo. For PDEPs that is much less of an issue.
I agree PDEP visibility from notifications is important, but notification priority and channels can differ person-to-person. For example, I just manage my GIthub notifications in Github, not email.
I don't think there is fundamentally a difference between both. Also if
I was using github notifications, seeing a specific subset of issues in those is challenging (while when using email I could at least set up some automatic filters). (but I don't know github notifications well, so I might be wrong)
On Sat, Jun 25, 2022 at 10:50 AM Tom Augspurger < tom.w.augspurger@gmail.com> wrote:
For me, notifications are the big thing. Having the emails come from a separate repo would make following things much easier for those who can’t keep up with the main repo.
Tom
On Jun 25, 2022, at 12:04 PM, Marc Garcia <garcia.marc@gmail.com> wrote:
Thanks for the feedback. I understand your point about using a different repo, but I see several advantages on the current approach, so maybe worth discussing a bit further what are the exact pain points, to see if a separate repo is really the best solution.
Let me know if I miss something, but I see three different ways in which we'll be interacting with PDEPs:
a) Via their rendered version. Not sure if you checked it, but the current rendered page from the PDEP PR (attached) is equivalent to the home of the scikit-learn SLEP proposals [1]. The main difference is that with the current approach we have it integrated with the website, which I personally think it's an advantage.
I am assuming that also with a separate repo we will have an identical web page (which wil be very useful!).
b) Via the list of PDEP PRs to review. In this case, to see only PDEP PRs, if we use the main pandas repo, this is just a label filter [2]. To me personally quicker than having to go to another repo, but no big difference about one or the other.
c) Notifications. I guess this is the main thing. I think one concern is that notifications from PDEPs get lost in the rest of the repo notifications. I assume you're using your email client filters, and if the notifications come from another repo, you can change the rules easily. I guess the solution here would be to use something like PDEP in the title and use that as a rule. Or we can try to find something more reliable, if that's the main concern.
Personally, I don't see the advantages of having the proposals in a separate repo very significant. And by keeping things the way they're implemented in the PR, I do see some advantages: - No need to maintain a separate repo, CI workflow, jobs to publish the build, sphinx (or equivalent) project... Nothing too complex, by why having to implement and maintain all that if our website is already prepared to handle it. And in particular, with Sphinx is not as easy as with out website to fetch the open PRs and render them. - Integrated UX of the PDEPs into our website. I think this gives it more visibility, and a better using experience than having to jump from one website to another.
I think it should certainly be possible to keep the website UX as you
implemented with a separate repo as well. There are for sure multiple options, but one (maybe simplest) option would be to keep the publishing in the main repo as you have now (since the website publishing lives there): for example the separate repo could additionally be cloned in the website workflow, and then that content is available as well (requiring to change the path in the script a bit).
The PDEP repo itself could further have only very limited CI?
- One of my concerns is that being in a separate repo we forget about
them. We're used to check PRs in the pandas repo, and we'll keep coming back to PRs about PDEPs until they're merged if they are in the main repo, but feels like being in a separate repo is easier to forget them when there is no recent activity and notifications.
It would be good to know if I miss any of your concerns. If I didn't, I'd say we can start with what's already implemented, which is almost ready to get merged, and if in the future you still think we can do better by using a separate repo, you can implement it, we have a discussion about it, and we move PDEPs to a separate repo if that makes sense. What do you think?
Cheers, Marc
1. https://scikit-learn-enhancement-proposals.readthedocs.io/en/latest/ 2. https://github.com/pandas-dev/pandas/pulls?q=is%3Aopen+is%3Apr+label%3APDEP
On Sat, Jun 25, 2022 at 7:05 AM Jeff Reback <jeffreback@gmail.com> wrote:
+1 in using a separate repo (under pandas-dev) for this
On Jun 24, 2022, at 5:05 PM, Joris Van den Bossche < jorisvandenbossche@gmail.com> wrote:
Thanks for starting this proposal, Marc!
I have already been doing this in some ad-hoc way with eg the Copy/View proposal (writing an actual proposal document), so I am very much in favor of formalizing this a bit more.
Personally, I would prefer that we use a more dedicated home for this instead of using the existing pandas repo (e.g. a separate repo in the pandas-dev org). The main pandas repo has nowadays such a high volume in issue and PR comments, that it becomes difficult to follow this or notice specific issues. While there are certainly ways to deal with this (e.g. consistently using a specific label and title, ensuring we always notify the mailing list as well, ...), IMO it would make it more accessible to follow and have an overview of those discussions in e.g. a separate repo.
(there are examples of both in other projects, for example scikit-learn has a separate repo, while bumpy uses the main repo I think)
Joris
On Tue, 21 Jun 2022 at 09:46, Marc Garcia <garcia.marc@gmail.com> wrote:
> We're in the process of implementing PDEPs, equivalent to Python's PEPs and NumPy's NEPs, but for pandas. This should help build the roadmap, make discussions more efficient, obtain more structured feedback from the community, and add visibility to agreed future plans for pandas.
>
> The initial implementation (workflow) is a bit simpler than PEP or NEP, but we'll iterate in the future as convenient.
>
> You can see the PR for PDEP-1 with the purpose, scope and guidelines here: https://github.com/pandas-dev/pandas/pull/47444
>
> Feedback is very welcome.
_______________________________________________
Pandas-dev mailing list
Pandas-dev@python.org
https://mail.python.org/mailman/listinfo/pandas-dev
[image: Screenshot at 2022-06-25 22-20-50.png]
Hi all,

In the last meeting on governance, we were discussing the current workflow around PDEPs, including "When and how to notify about new PDEPs" (or, how to ensure that people are aware of new PDEPs and ongoing discussions). In that context, it came up again that it could help to have those discussions in a separate repo (for people that cannot easily handle the large stream of notifications in the main repo).

So I would like to bring forward this proposal once more. Thoughts about this? (now we have a bit of experience with it)

If we decide to do this, I am happy to look at the necessary changes to still include the PDEP's text in the website build in the main repo.

Best,
Joris

On Fri, 5 Aug 2022 at 17:41, Simon Hawkins <simonjayhawkins@gmail.com> wrote:
On Fri, 5 Aug 2022 at 16:29, Joris Van den Bossche < jorisvandenbossche@gmail.com> wrote:
On Wed, 20 Jul 2022 at 16:26, Marc Garcia <garcia.marc@gmail.com> wrote:
OK, correct me if I'm wrong, but from what you say the options to consider are:
1) Keep everything as is (in the main pandas repo), and maybe improve notifications (send emails to a list where people can subscribe, rss feed, telegram messages...)
2) Use a new repo for PRs, and use the main pandas website to display them
   2.a) On the website build, fetch the PDEP docs from the other repo
   2.b) On the PDEP repo CI, push the PDEP docs to the main pandas repo
3) Use a new repo for the PDEP PRs, and have a separate website for PDEPs
Do these options sound reasonable as the ones to discuss? Or am I missing something?
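For illustration only, option 2.a could look roughly like the sketch below: during the website build, the PDEP documents are copied from a checkout of the separate repo into the website source tree. The function name, the directory layout, and the assumption that PDEPs are stored as Markdown files are all hypothetical; nothing here is taken from the actual pandas website build.

```python
import shutil
from pathlib import Path


def collect_pdeps(checkout_dir, website_pdeps_dir):
    """Copy PDEP documents from a checkout of the separate repo into the website tree.

    Both directories are hypothetical; this assumes PDEPs are Markdown
    files sitting at the top level of the checkout.
    """
    source = Path(checkout_dir)
    target = Path(website_pdeps_dir)
    target.mkdir(parents=True, exist_ok=True)

    copied = []
    for doc in sorted(source.glob("*.md")):
        # Keep the original file name so links on the website stay stable.
        shutil.copy2(doc, target / doc.name)
        copied.append(doc.name)
    return copied
```

With something like this, the website workflow would only need one extra step to clone the PDEP repo before calling the function, and the rest of the publishing could stay in the main repo.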
My preference is 1, as I think it's the simplest, and adding notifications that allow following PDEPs separate from all pandas activity doesn't seem complex.
I'm also fine with 2.a if more people have a strong opinion about keeping PDEP discussions/PRs in a separate repo. I personally don't see advantages in 2.b and 3.
Thanks, that's indeed a good summary of the options. I also think 2.a is the easiest of the alternatives, so I would indeed only consider 1 and 2.a.
My preference is to go with 2 (separate repo), for the reasons mentioned before. I think Tom also mentioned this as his preference, and Jeff being OK with it, while Marc/Matthew prefer the main repo. But it would be good to hear from others as well whether they have a (strong) preference.
Maybe run another quick poll with just those two options; that seemed to get a swift resolution on the PDEP name.
If there is no clear majority, then we would need further discussion.
If there is a majority, then there may be only a few pain points to resolve.
If we decide to do that, I am happy to do a PR to update the publishing workflow to handle a separate repo.
Joris
On Mon, Jul 18, 2022, 18:34 Joris Van den Bossche < jorisvandenbossche@gmail.com> wrote:
Thanks Marc for the detailed answer. In general, I personally think that the added complexity is not that big, and we can still have a nice publishing workflow to the website with a separate repo (some more detailed responses inline below).
For someone who wants to follow the PDEPs (and I hope with this new PDEP process we can engage more people in the pandas community), but doesn't have the time to follow all of pandas (e.g. a maintainer of a dependent package), my hunch is that a separate repo is a more accessible way to do this. You can indeed list all related PRs based on a label filter, but you still need to know this (we can of course document that on the roadmap page) and it's not an automatic notification. And for email notifications you can indeed set up an email filter (although I don't think you have a good option if using GitHub notifications?).
For someone like myself, if we end up using the main repo, I can for sure set up those filters, that is not a problem. But in general I think that is not a very accessible way to have people follow those discussions. Having it as a separate repo provides a clear home and gives you all the tools that GitHub has to manage and customize the notifications however you want (e.g. watch one repo and not the other).
Sidenote: I do (or did) this for other projects, such as numpy or python. I don't follow either of their issue trackers, but I do (somewhat) follow NEP or PEP discussions, and both give me a way to do that without having to follow their main issue trackers.
The last point that you raise about "forgetting about a separate repo" is certainly a valid concern. It's true that the other separate repos that we have (had) were no success, so we don't have a good track record on this front. But I do think it is a matter of habit (and documentation/communication! we never really publicized any of the other repos, nor actively used them at any point), and if we ensure we have steady activity in such a separate repo for a while, I think that will grow naturally.
On Sat, 25 Jun 2022 at 21:19, Matthew Roeschke <emailformattr@gmail.com> wrote:
I find Marc's arguments regarding general simplicity of PDEP flow (publishing to website & integration to the main repo) a strong argument to keep these in the main repo.
Since there is a dependency between PDEP development and the pandas-dev repo development, having them separated may lead to workflow challenges similar to those with the MacPython/pandas-wheels repo, for example (where cibuildwheel being integrated into the main repo <https://github.com/pandas-dev/pandas/issues/44027> is considered a benefit due to tighter integration).
I think an important difference here is that building wheels is defined in the pandas repo (packaging setup) and often needs fixes in pandas, and so here it indeed makes that workflow much easier to have that in the same repo. For PDEPs that is much less of an issue.
I agree PDEP visibility from notifications is important, but notification priority and channels can differ person-to-person. For example, I just manage my GitHub notifications in GitHub, not email.
I don't think there is fundamentally a difference between both. Also, if I were using GitHub notifications, seeing a specific subset of issues in those would be challenging (whereas with email I could at least set up some automatic filters). (but I don't know GitHub notifications well, so I might be wrong)
On Sat, Jun 25, 2022 at 10:50 AM Tom Augspurger < tom.w.augspurger@gmail.com> wrote:
For me, notifications are the big thing. Having the emails come from a separate repo would make following things much easier for those who can’t keep up with the main repo.
Tom
On Jun 25, 2022, at 12:04 PM, Marc Garcia <garcia.marc@gmail.com> wrote:
Thanks for the feedback. I understand your point about using a different repo, but I see several advantages to the current approach, so it may be worth discussing a bit further what the exact pain points are, to see if a separate repo is really the best solution.
Let me know if I miss something, but I see three different ways in which we'll be interacting with PDEPs:
a) Via their rendered version. Not sure if you checked it, but the current rendered page from the PDEP PR (attached) is equivalent to the home of the scikit-learn SLEP proposals [1]. The main difference is that with the current approach we have it integrated with the website, which I personally think is an advantage.
I am assuming that also with a separate repo we will have an identical web page (which will be very useful!).
b) Via the list of PDEP PRs to review. In this case, to see only PDEP PRs, if we use the main pandas repo, this is just a label filter [2]. To me personally that is quicker than having to go to another repo, but there is no big difference either way.
c) Notifications. I guess this is the main thing. I think one concern is that notifications from PDEPs get lost in the rest of the repo notifications. I assume you're using your email client filters, and if the notifications come from another repo, you can change the rules easily. I guess the solution here would be to use something like PDEP in the title and use that as a rule. Or we can try to find something more reliable, if that's the main concern.
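As a rough illustration of points b) and c), the sketch below builds the label-filter URL from [2] and checks whether a notification subject mentions PDEP, which is the kind of rule an email filter could apply. The function names and the example subject line are made up for illustration; only the repo URL and the `PDEP` label come from this thread.

```python
import re
from urllib.parse import urlencode

REPO_PULLS = "https://github.com/pandas-dev/pandas/pulls"


def pdep_filter_url(label="PDEP"):
    """Return the GitHub URL listing open PRs that carry the given label."""
    query = f"is:open is:pr label:{label}"
    return f"{REPO_PULLS}?{urlencode({'q': query})}"


def is_pdep_notification(subject):
    """Return True if a notification subject mentions PDEP as a whole word."""
    return re.search(r"\bPDEP\b", subject, flags=re.IGNORECASE) is not None


print(pdep_filter_url())
# Hypothetical subject line, only to show the title-based rule.
print(is_pdep_notification("[pandas-dev/pandas] PDEP-1: Purpose and guidelines"))
```

A title-based rule like this only works if every PDEP PR consistently has PDEP in its title, which is the reliability concern raised in c).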
Personally, I don't see the advantages of having the proposals in a separate repo as very significant. And by keeping things the way they're implemented in the PR, I do see some advantages:
- No need to maintain a separate repo, CI workflow, jobs to publish the build, sphinx (or equivalent) project... Nothing too complex, but why have to implement and maintain all that if our website is already prepared to handle it? And in particular, with Sphinx it is not as easy as with our website to fetch the open PRs and render them.
- Integrated UX of the PDEPs into our website. I think this gives it more visibility, and a better user experience than having to jump from one website to another.
I think it should certainly be possible to keep the website UX as you implemented with a separate repo as well. There are for sure multiple options, but one (maybe simplest) option would be to keep the publishing in the main repo as you have now (since the website publishing lives there): for example the separate repo could additionally be cloned in the website workflow, and then that content is available as well (which would require changing the path in the script a bit).
The PDEP repo itself could further have only very limited CI?
- One of my concerns is that being in a separate repo we forget about them. We're used to checking PRs in the pandas repo, and if PDEP PRs are in the main repo we'll keep coming back to them until they're merged, but it feels like in a separate repo it is easier to forget them when there is no recent activity or notifications.
It would be good to know if I missed any of your concerns. If I didn't, I'd say we can start with what's already implemented, which is almost ready to be merged. If in the future you still think we can do better by using a separate repo, you can implement it, we can have a discussion about it, and we move PDEPs to a separate repo if that makes sense. What do you think?
Cheers, Marc
1. https://scikit-learn-enhancement-proposals.readthedocs.io/en/latest/
2. https://github.com/pandas-dev/pandas/pulls?q=is%3Aopen+is%3Apr+label%3APDEP
participants (6)
- Jeff Reback
- Joris Van den Bossche
- Marc Garcia
- Matthew Roeschke
- Simon Hawkins
- Tom Augspurger