Proposal: split numpy.distutils into its own package

Hey all,

[copied from https://github.com/numpy/numpy/issues/17620, as requested by the feature request guidelines]

Cross-compiling scipy and other projects that depend on numpy's distutils is a huge pain right now, because to do it [in addition to lots of other details that you have to get right] you have to have both a native and a cross-compiled version of numpy installed. It seems pretty unreasonable that I need a native version of numpy installed to compile scipy.

One might ask, why is this needed? Well, scipy's setup.py uses numpy's distutils fork for the fortran support (I think). Because numpy.distutils is a subpackage of numpy, if you're working with a cross-compiled version of numpy, it eventually tries to import some .so that wasn't compiled for your host and everything dies -- thus you have to have a native numpy installed to allow numpy.distutils to import correctly.

As far as I can tell, numpy's distutils fork is pure python and doesn't actually use anything else in numpy itself, and so is completely separable from numpy. If it were its own top-level package that scipy et al could use, then cross-compiling would be significantly less tricky.

The mechanics of this would work like so:

* contents of numpy.distutils would be moved to 'numpy_distutils' package
* importing numpy.distutils would raise a deprecation warning and redirect to numpy_distutils, probably using import hook magic
* scipy and other packages would now utilize numpy_distutils instead of numpy.distutils at build time

Initially, I considered proposing creating a separate pypi package for numpy_distutils, but I expect that would be more trouble than it's worth. One could also propose creating a new PEP 517 build backend, or move to cmake, or other huge changes, but those also seem more trouble than they're worth.

Thanks for your consideration.

Dustin
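The redirect step in the list above could be sketched roughly as follows. This is only an illustration, not the code from the issue or any PR: the module names `numpy_demo` and `numpy_distutils_demo` and the helper `redirect` are hypothetical stand-ins created in memory, so that the real numpy is never touched. It aliases the old dotted name to the new top-level module through `sys.modules` and emits a `DeprecationWarning`; a real stub might instead use a custom import hook.

```python
import importlib
import sys
import types
import warnings

# Hypothetical stand-in for the proposed top-level package. A real package
# would live on disk and hold the code moved out of numpy.distutils.
new_pkg = types.ModuleType("numpy_distutils_demo")
new_pkg.marker = "moved code lives here"
sys.modules["numpy_distutils_demo"] = new_pkg

# Hypothetical stand-in for the parent package that keeps the old name alive.
old_parent = types.ModuleType("numpy_demo")
old_parent.__path__ = []  # an (empty) __path__ marks the module as a package
sys.modules["numpy_demo"] = old_parent


def redirect(old_name, new_name):
    """Alias old_name to the module new_name and warn about the old name."""
    target = importlib.import_module(new_name)
    warnings.warn(
        f"{old_name} is deprecated, use {new_name} instead",
        DeprecationWarning,
        stacklevel=2,
    )
    # Later imports of old_name now resolve to the new module object.
    sys.modules[old_name] = target
    parent_name, _, child = old_name.rpartition(".")
    if parent_name:
        # Keep attribute access on the parent package working as well.
        setattr(sys.modules[parent_name], child, target)
    return target


# Record the warning so the demonstration stays quiet and can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    redirect("numpy_demo.distutils", "numpy_distutils_demo")

legacy = importlib.import_module("numpy_demo.distutils")
```

After this runs, `legacy` and `new_pkg` are the same module object, so code that still imports the old name sees the relocated package. Whether a plain alias like this is enough for every submodule import is an open question that the sketch does not answer.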

Separating it is definitely a good idea. The only thing that I think would be better would be if the key features that are not in setuptools could be added there so that NumPy distutils could eventually be retired.

Kevin

On Sat, Oct 24, 2020, 08:12 Dustin Spicuzza <dustin@virtualroadside.com> wrote:

On Sat, 2020-10-24 at 03:11 -0400, Dustin Spicuzza wrote:
Factoring out numpy.distutils from numpy alone will not enable compiling scipy without numpy being installed. It probably can help, though, and might make sense also in view of the incoming deprecation of Python distutils (https://www.python.org/dev/peps/pep-0632/).

Extension modules, including f2py, need numpy headers and probably also their platform-specific configuration. There are also some assumptions about data type sizes and NumPy versions at build-time being compatible with the ones at runtime, which factoring out distutils won't address.

IIUC, cross-compilation is not actually supported, so it is surprising that it can be made to work.

Pauli
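To make the data type size concern above concrete, here is a small sketch of how the interpreter running the build can disagree with the machine the code will run on. The `TARGET_SIZES` table is a made-up description of a 32-bit target, and `native_type_sizes` and `mismatches` are hypothetical helpers; none of this reflects how numpy.distutils actually probes type sizes.

```python
import struct


def native_type_sizes():
    """Sizes in bytes of a few C types as seen by the interpreter running this."""
    codes = (
        ("short", "h"),
        ("int", "i"),
        ("long", "l"),
        ("long long", "q"),
        ("pointer", "P"),
    )
    return {name: struct.calcsize(code) for name, code in codes}


# Hypothetical description of a 32-bit cross-compilation target (assumption).
TARGET_SIZES = {"short": 2, "int": 4, "long": 4, "long long": 8, "pointer": 4}


def mismatches(build, target):
    """Return {type: (build_size, target_size)} for every size that differs."""
    return {
        name: (build.get(name), size)
        for name, size in target.items()
        if build.get(name) != size
    }


# On a typical 64-bit build machine this reports at least the pointer size.
differences = mismatches(native_type_sizes(), TARGET_SIZES)
```

Any entry in `differences` is a value that a build script would get wrong if it measured the build machine and assumed the result also held for the target.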

On 10/24/20 2:59 PM, Dustin Spicuzza wrote:
I took a first stab at it, and... surprise, surprise, there were a few more warts than I had originally expected in my initial survey. The biggest unexpected result is that numpy.f2py would also need to be a top-level package.

I did get the refactor cross-compiled and started on scipy, but there are a few more issues that will have to be resolved on the scipy side.

I posted a detailed set of notes on the issue (#17620) and made a draft PR with my initial results (#17632) if you want to get a sense for how invasive this is (or isn't, depending on your point of view).

Dustin

On 10/25/20 10:46 AM, Dustin Spicuzza wrote:
Is there a way to do this without modifying SciPy? That would reassure me that this change will not break other people's workflow. It is hard to believe that only SciPy uses numpy.distutils.

If the changes break backward compatibility, they need to be done like any other deprecation: warn for 4 releases (two years) before actually breaking workflows.

Matti

NumPy could take an explicit runtime dependency on numpy-distutils so that the code would technically be in a different repo but would always be available through NumPy. Eventually this could be removed as a runtime dependency.

Kevin

On Sun, Oct 25, 2020, 09:23 Matti Picus <matti.picus@gmail.com> wrote:

On Sun, Oct 25, 2020 at 1:20 PM Kevin Sheppard <kevin.k.sheppard@gmail.com> wrote:
I put some more thoughts in https://github.com/numpy/numpy/issues/17620. We cannot remove numpy.distutils, so that separate package may be needed for cross-compilation, but we don't need to use it in NumPy itself.

The goal is to modify SciPy here, so it can be cross-compiled.

> That would reassure me that this change will not break other people's workflow. It is hard to believe that only SciPy uses numpy.distutils.

That is indeed not the case, numpy.distutils is widely used. The `numpy.distutils` namespace must remain accessible imho.

Cheers,
Ralf

> If the changes break

On 10/25/20 5:23 AM, Matti Picus wrote:
Sorry for not being clear, when I was discussing modifications to scipy I was referring to the specific use case of cross-compilation. The goal is that existing native builds would not break backwards compatibility. To that end, there's a package redirection stub in my PR for both numpy.distutils and numpy.f2py.

Just tried a native build using my current PR branch and at the moment scipy doesn't work. However, it's a size mismatch during compilation as opposed to an ImportError, so I probably just missed a subtlety when I moved things. But I would definitely expect that the finalized version of this set of changes will not break existing users.

Dustin

participants (5)
- Dustin Spicuzza
- Kevin Sheppard
- Matti Picus
- Pauli Virtanen
- Ralf Gommers