[Numpy-discussion] New NEP: merging multiarray and umath

Nathaniel Smith njs at pobox.com
Thu Mar 8 03:25:00 EST 2018


Hi all,

Well, this is something that we've discussed for a while and I think
generally has consensus already, but I figured I'd write it down
anyway to make sure.

There's a rendered version here:
https://github.com/njsmith/numpy/blob/nep-0015-merge-multiarray-umath/doc/neps/nep-0015-merge-multiarray-umath.rst

-----

============================
Merging multiarray and umath
============================

:Author: Nathaniel J. Smith <njs at pobox.com>
:Status: Draft
:Type: Standards Track
:Created: 2018-02-22


Abstract
--------

Let's merge ``numpy.core.multiarray`` and ``numpy.core.umath`` into a
single extension module, and deprecate ``np.set_numeric_ops``.


Background
----------

Currently, numpy's core C code is split between two separate extension
modules.

``numpy.core.multiarray`` is built from
``numpy/core/src/multiarray/*.c``, and contains the core array
functionality (in particular, the ``ndarray`` object).

``numpy.core.umath`` is built from ``numpy/core/src/umath/*.c``, and
contains the ufunc machinery.

These two modules each expose their own separate C API, accessed via
``import_multiarray()`` and ``import_umath()`` respectively. The idea
is that they're supposed to be independent modules, with
``multiarray`` as a lower-level layer with ``umath`` built on top. In
practice this has turned out to be problematic.

First, the layering isn't perfect: when you write ``ndarray +
ndarray``, this invokes ``ndarray.__add__``, which then calls the
ufunc ``np.add``. This means that ``ndarray`` needs to know about
ufuncs – so instead of a clean layering, we have a circular
dependency. To solve this, ``multiarray`` exports a somewhat
terrifying function called ``set_numeric_ops``. The bootstrap
procedure each time you ``import numpy`` is:

1. ``multiarray`` and its ``ndarray`` object are loaded, but
   arithmetic operations on ndarrays are broken.

2. ``umath`` is loaded.

3. ``set_numeric_ops`` is used to monkeypatch all the methods like
   ``ndarray.__add__`` with objects from ``umath``.

In addition, ``set_numeric_ops`` is exposed as a public API,
``np.set_numeric_ops``.

Furthermore, even when this layering does work, it ends up distorting
the shape of our public ABI. In recent years, the most common reason
for adding new functions to ``multiarray``\'s "public" ABI is not that
they really need to be public or that we expect other projects to use
them, but rather just that we need to call them from ``umath``. This
is extremely unfortunate, because it makes our public ABI
unnecessarily large, and since we can never remove things from it,
this creates an ongoing maintenance burden. The way C works, you can
have internal API that's visible to everything inside the same
extension module, or you can have a public API that everyone can use;
you can't have an API that's visible to multiple extension modules
inside numpy, but not to external users.

We've also increasingly been putting utility code into
``numpy/core/src/private/``, which now contains a bunch of files which
are ``#include``\d twice, once into ``multiarray`` and once into
``umath``. This is pretty gross, and is purely a workaround for these
being separate C extensions.


Proposed changes
----------------

This NEP proposes three changes:

1. We should start building ``numpy/core/src/multiarray/*.c`` and
   ``numpy/core/src/umath/*.c`` together into a single extension
   module.

2. Instead of ``set_numeric_ops``, we should use some new, private API
   to set up ``ndarray.__add__`` and friends.

3. We should deprecate, and eventually remove, ``np.set_numeric_ops``.


Non-proposed changes
--------------------

We don't necessarily propose to throw away the distinction between
multiarray/ and umath/ in terms of our source code organization:
internal organization is useful! We just want to build them together
into a single extension module. Of course, this does open the door for
potential future refactorings, which we can then evaluate based on
their merits as they come up.

This NEP also doesn't propose that we break the public C ABI. We should
continue to provide ``import_multiarray()`` and ``import_umath()``
functions – it's just that now both ABIs will ultimately be loaded
from the same C library. Due to how ``import_multiarray()`` and
``import_umath()`` are written, we'll also still need to have modules
called ``numpy.core.multiarray`` and ``numpy.core.umath``, and they'll
need to continue to export ``_ARRAY_API`` and ``_UFUNC_API`` objects –
but we can make one or both of these modules be tiny shims that simply
re-export the magic API object from wherever it's actually defined.
(See ``numpy/core/code_generators/generate_{numpy,ufunc}_api.py`` for
details of how these imports work.)
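
As a rough illustration of the "tiny shim" idea, the sketch below builds
module objects in pure Python. Everything in it is an assumption made
for the example: the merged module name ``_merged_core`` is invented,
and plain ``object()`` instances stand in for the opaque API objects
that the real modules export.

```python
import types

# Hypothetical merged extension module; the name is invented here.
merged = types.ModuleType("_merged_core")
merged._ARRAY_API = object()  # stand-in for the opaque array API object
merged._UFUNC_API = object()  # stand-in for the opaque ufunc API object


def make_shim(name, api_attr):
    """Build a module that only re-exports one API object from `merged`."""
    shim = types.ModuleType(name)
    setattr(shim, api_attr, getattr(merged, api_attr))
    return shim


# The two legacy module names keep existing and keep exporting their
# API objects, but both now point into the same merged module.
multiarray = make_shim("multiarray", "_ARRAY_API")
umath = make_shim("umath", "_UFUNC_API")
```

Code that looks up ``_ARRAY_API`` on the old module name gets the very
same object the merged module defines, so existing lookups keep working.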


Backward compatibility
----------------------

The only compatibility break is the deprecation of ``np.set_numeric_ops``.


Alternatives
------------

n/a


Discussion
----------

TBD


Copyright
---------

This document has been placed in the public domain.


-- 
Nathaniel J. Smith -- https://vorpus.org

