Split docscrape out from numpydoc
Hi all,

A while ago I proposed splitting docscrape out from the numpydoc repo: https://github.com/numpy/numpydoc/issues/619.

Why? SciPy has a copy of the docscrape source which is used to generate some docstrings for the public API. Copying source code like this is never great, given that the two copies can fall out of sync. However, vendoring the entire numpydoc repo in SciPy or adding numpydoc as a runtime dependency both seem off the table, since either would make the situation worse rather than better.

A better solution seems to be to have docscrape be a standalone project, which SciPy can vendor more easily, and which numpydoc can either depend on or vendor. From a modularity perspective at least this seems ideal: there are use-cases where you want docscrape, but not numpydoc, available at runtime. Joren suggested that I float this idea on the mailing list, given that he just submitted a patch to the copy in SciPy without realising that it was vendored code 😄.

What I don't know is whether this would negatively impact the maintenance burden. I assume there will be a bit of upfront cost in restructuring the repos, and a little more if we decide to distribute docscrape as a standalone project, but my hope is that this wouldn't cause any problems long-term.

An alternative solution would be to extract the docscrape source from the numpydoc repo in a vendoring script in SciPy. While okay, that still leaves us having to track commits by hand instead of using e.g. a git submodule, and isn't robust to upstream changes of directory structure.

Feedback appreciated! Eric Larson responded on the PR asking whether SciPy can introduce a dependency on numpydoc, but that is all so far.

Cheers,
Lucas
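To make the vendoring-script alternative concrete, here is a minimal sketch of what such a script might look like. It is not SciPy's actual tooling: the function name `vendor_docscrape`, the `VENDORED_FROM.txt` marker file, and the assumed upstream location `numpydoc/docscrape.py` are all illustrative assumptions. It shows the two weaknesses mentioned above: the commit is recorded by hand, and the copy fails as soon as upstream moves the file.

```python
import shutil
from pathlib import Path

# Assumed location of docscrape inside a numpydoc checkout (an assumption for
# this sketch). If upstream restructures its directories, this path goes stale.
UPSTREAM_RELATIVE_PATH = Path("numpydoc") / "docscrape.py"


def vendor_docscrape(upstream_checkout: Path, vendor_dir: Path, commit: str) -> Path:
    """Copy docscrape.py from a local numpydoc checkout into vendor_dir."""
    source = upstream_checkout / UPSTREAM_RELATIVE_PATH
    if not source.is_file():
        raise FileNotFoundError(
            f"{source} not found; has the upstream directory structure changed?"
        )
    vendor_dir.mkdir(parents=True, exist_ok=True)
    target = vendor_dir / "docscrape.py"
    shutil.copyfile(source, target)
    # The commit is supplied and recorded by hand; nothing here verifies it.
    (vendor_dir / "VENDORED_FROM.txt").write_text(f"numpydoc@{commit}\n")
    return target
```

A git submodule would track the upstream commit automatically, whereas here the `commit` argument is only as accurate as whoever runs the script.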
Hi Lucas,

Indeed, I think this is a good idea. SciPy is not the only project that depends (or would have liked to depend) on the NumpyDocString and docscrape functionality without pulling in Sphinx (or any other dependency).

One solution is to refactor numpydoc to reduce the dependency footprint, i.e. make Sphinx a soft dependency. That work is mostly complete in [numpy/numpydoc#651](https://github.com/numpy/numpydoc/pull/651). There are a few final integration tests (see the checkboxes in the top post of the PR) that I'd like to run to build confidence that the solution works for everyone. That should include replacing the vendored code in SciPy.

Does this solution work for SciPy?

On Mon, Apr 20, 2026 at 5:06 AM Lucas Colley via NumPy-Discussion <numpy-discussion@python.org> wrote:
> A while ago I proposed splitting docscrape out from the numpydoc repo: https://github.com/numpy/numpydoc/issues/619.
> [...]
> _______________________________________________
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-leave@python.org
> https://mail.python.org/mailman3//lists/numpy-discussion.python.org
> Member address: rossbar@berkeley.edu
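As an illustration of what "soft dependency" means in the reply above, here is a minimal sketch of the general pattern. It is not the code from the linked PR: `parse_summary` and `build_html` are hypothetical stand-ins for a parsing layer and a Sphinx-facing layer. The optional package is imported only inside the function that needs it, so the parsing path works without it.

```python
def parse_summary(docstring: str) -> str:
    """Stand-in for the parsing layer: needs only the standard library."""
    for line in docstring.strip().splitlines():
        if line.strip():
            return line.strip()
    return ""


def build_html(docstring: str) -> str:
    """Stand-in for the Sphinx-facing layer (hypothetical)."""
    try:
        # Deferred import: only this code path requires Sphinx.
        import sphinx  # noqa: F401
    except ImportError as exc:
        raise ImportError(
            "Sphinx is required for HTML output; install it separately."
        ) from exc
    return f"<p>{parse_summary(docstring)}</p>"
```

With this layout, a downstream library that only calls `parse_summary` never triggers the Sphinx import.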
> That sounds great! I’m still not sure if SciPy will want to add a runtime dependency on numpydoc, since we only have `numpy` at the minute… but this is at least a step in the right direction!

IIUC you'll be adding a dependency either way, whether it's `numpydoc` or some vendored library that encapsulates just `get_doc_object`/`NumpyDocString` (I'd vote for `numpydoc-parser` as the name for this potential sub-library).

> Might it still make sense to split docscrape out into a separate repo after your work is done, or do you see those as two orthogonal efforts?

No, I don't think they're necessarily orthogonal. Nearly all of the complexity associated with numpydoc stems from the Sphinx/validation interfaces. The bits actually dedicated to docstring parsing are much more straightforward. Splitting this into a separate library slightly complicates the numpydoc maintenance/release process, but given how infrequently the parsing code actually changes, I wouldn't expect the pinning of `numpydoc-parser` within the `numpydoc` Sphinx extension to be a prohibitive burden. I'm not against it, but nor am I involved in the numpydoc release process, so other numpydoc maintainers' opinions are much more valuable here!

From a downstream library standpoint, the practical differences between a dependency-less numpydoc and a separate numpydoc library should be minimal; however, if that's a blocker for SciPy then it's a stronger motivation for splitting things up IMV!

~Ross

On Mon, Apr 20, 2026 at 12:52 PM Lucas Colley <lucas.colley8@gmail.com> wrote:
> Hi Ross,
>
> That sounds great! I’m still not sure if SciPy will want to add a runtime dependency on numpydoc, since we only have `numpy` at the minute… but this is at least a step in the right direction!
>
> Might it still make sense to split docscrape out into a separate repo after your work is done, or do you see those as two orthogonal efforts?
>
> Cheers, Lucas
>
> On 20 Apr 2026, at 17:16, Ross Barnowski <rossbar15@gmail.com> wrote:
>> Hi Lucas,
>> [...]
Hi Ross,

On 21 Apr 2026, at 16:44, Ross Barnowski <rossbar15@gmail.com> wrote:
>> That sounds great! I’m still not sure if SciPy will want to add a runtime dependency on numpydoc, since we only have `numpy` at the minute… but this is at least a step in the right direction!
>
> IIUC you'll be adding a dependency either way, whether it's `numpydoc` or some vendored library that encapsulates just `get_doc_object`/`NumpyDocString` (I'd vote for `numpydoc-parser` as the name for this potential sub-library).

There is a pretty big difference between a dependency in the sense of https://packaging.python.org/en/latest/guides/writing-pyproject-toml/#depend... and a vendored dependency, at least for SciPy. With a vendored dependency, we have complete control over the (single) version, and importantly users can install a different version in their environment alongside SciPy without issue. With a proper dependency, we would be compelled to support a large range of numpydoc versions, to avoid making it impossible to install SciPy alongside other packages which overzealously pin the version of numpydoc. That has the potential to create extra maintenance churn for little benefit. It would also be one extra point of variability to complicate the release process, which in turn is probably significant from a security perspective.

Anyway, `numpydoc-parser` sounds good to me! Let’s summarise that idea in the linked issue, possibly after any input from the numpydoc release team?

Cheers,
Lucas
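The point about version ranges can be illustrated with a toy model. This is only a sketch: it is not how pip or any real resolver works, and every version number and package constraint below is hypothetical. Each package declares an inclusive range of acceptable numpydoc versions, and an environment is installable only if the ranges overlap.

```python
from typing import Optional, Tuple

Version = Tuple[int, int]
Range = Tuple[Version, Version]  # inclusive (lowest, highest) supported version


def overlap(a: Range, b: Range) -> Optional[Range]:
    """Return the versions acceptable to both ranges, or None if there are none."""
    low = max(a[0], b[0])
    high = min(a[1], b[1])
    return (low, high) if low <= high else None


# Hypothetical constraints, for illustration only.
narrow_scipy: Range = ((1, 9), (1, 9))   # SciPy supporting a single version
wide_scipy: Range = ((1, 5), (1, 9))     # SciPy supporting a large range
other_package: Range = ((1, 6), (1, 6))  # a package that pins exactly

print(overlap(narrow_scipy, other_package))  # None: the two cannot be co-installed
print(overlap(wide_scipy, other_package))    # ((1, 6), (1, 6)): installable
```

A vendored copy never appears in this calculation at all, which is why it leaves users free to install any version alongside.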
participants (2)
- Lucas Colley
- Ross Barnowski