[Numpy-discussion] reorganizing numpy internal extensions (was: Re: Should we drop support for "one file" compilation mode?)

Tue Oct 6 15:04:59 EDT 2015

On Tue, Oct 6, 2015 at 11:52 AM, David Cournapeau <cournape at gmail.com> wrote:
>
>
> On Tue, Oct 6, 2015 at 7:30 PM, Nathaniel Smith <njs at pobox.com> wrote:
>>
>> [splitting this off into a new thread]
>>
>> On Tue, Oct 6, 2015 at 3:00 AM, David Cournapeau <cournape at gmail.com>
>> wrote:
>> [...]
>> > I also agree the current situation is not sustainable -- as we discussed
>> > privately before, cythonizing numpy.core is made considerably more
>> > complicated by this. I myself have quite a few issues w/ cythonizing the
>> > other parts of umath. I would also like to support static linking better
>> > than we do now (do we know some static link users we can contact to
>> > validate our approach?)
>> >
>> > Currently, what we have in numpy core is the following:
>> >
>> > numpy.core.multiarray -> compilation units in numpy/core/src/multiarray/
>> > + statically link npymath
>> > numpy.core.umath -> compilation units in numpy/core/src/umath +
>> > statically link npymath/npysort + some shenanigans to use things in
>> > numpy.core.multiarray
>>
>> There are also shenanigans in the other direction - supposedly umath
>> is layered "above" multiarray, but in practice there are circular
>> dependencies (see e.g. np.set_numeric_ops).
>
> Indeed, I am not arguing against merging umath and multiarray.

Oh, okay :-).

>> > I would suggest a more layered approach, to enable both the 'normal'
>> > build and the static build without polluting the public namespace too
>> > much. This is the approach followed by most large libraries (e.g. MKL),
>> > and it is fairly flexible.
>> >
>> > Concretely, we could start by putting the more common functionality (aka
>> > the 'core' library) into its own static library. The API would be
>> > considered private to numpy (no stability guaranteed outside numpy), and
>> > every exported symbol from that library would be decorated appropriately
>> > to avoid potential clashes (e.g. '_npy_internal_').
>>
>> I don't see why we need this multi-layered complexity, though.
>
>
> For several reasons:
>
>  - when you want to cythonize either extension, it is much easier to
> separate it into Cython for the CPython API and C for the rest.

I don't think this will help much, because I think we'll want to have
multiple cython files, and that we'll probably move individual
functions between being implemented in C and Cython (including utility
functions). So that means we need to solve the problem of mixing C and
Cython files inside a single library.

If you look at Stefan's PR:
  https://github.com/numpy/numpy/pull/6408
it does solve most of these problems. It would help if Cython added a
few tweaks to officially support compiling multiple modules into one
.so, and I'm not sure whether the current code quite handles
initialization of the submodule correctly, but it's actually
surprisingly easy to make work.

(Obviously we won't want to go overboard here -- but the point of
removing the technical constraints is that then it frees us to pick
whatever arrangement makes the most sense, instead of deciding based
on what makes the build system and linker easiest.)

>  - if numpy.core.multiarray.so is built as cython-based .o + a 'large' C
> static library, it should become much simpler to support static linking.

I don't see this at all, so I must be missing something? Either way
you build a bunch of .o files, and then you have to either combine
them into a shared library or combine them into a static library. Why
does pre-combining some of them into a static library make this
easier?
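
To make the question concrete, here is a rough sketch of the two final
steps as command lines built in Python. The object and output file names
are hypothetical, and the commands assume a Unix-like toolchain with `ar`
and a `cc` driver; this is not what numpy's build actually runs. Both
routes start from the same list of .o files:

```python
# Hypothetical object files, standing in for compiled C and Cython sources.
# (For the shared-library route they would need to be compiled with -fPIC.)
OBJECTS = ["multiarray.o", "umath.o", "helpers.o"]


def static_library_cmd(objects, output="libnpycore.a"):
    """Command that archives the objects into a static library."""
    return ["ar", "rcs", output, *objects]


def shared_library_cmd(objects, output="_core.so", cc="cc"):
    """Command that links the same objects into a shared library."""
    return [cc, "-shared", "-o", output, *objects]


static_cmd = static_library_cmd(OBJECTS)
shared_cmd = shared_library_cmd(OBJECTS)

print(" ".join(static_cmd))
print(" ".join(shared_cmd))
```

Either command consumes the full set of objects directly, which is why an
intermediate static library does not obviously simplify anything.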

>  - maybe that's just personal, but I find the whole multiarray + umath
> well beyond manageable in terms of intertwined complexity. You may argue it
> is not that big, and we all have different preferences in terms of
> organization, but if I look at the binary size of multiarray + umath, it is
> considerably larger than the median size of the .so files I have in my
> /usr/lib.

The binary size isn't a good measure here -- most of that is the
bazillions of copies of slightly tweaked loops that we auto-generate,
which take up a lot of space but don't add much intertwined
complexity. (Though now that I think about it, my LOC estimate was
probably a bit low because cloc is probably ignoring those
autogeneration template files.)
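
As a toy illustration of why generated loops inflate size without adding
much complexity, here is a minimal sketch that expands one loop template
over several element types. The template, the type list, and the
`expand_loops` helper are all made up for the example, and it uses
Python's `string.Template` rather than numpy's own template format:

```python
from string import Template

# Hypothetical loop template; numpy's real templates use their own syntax.
LOOP_TEMPLATE = Template("""\
static void ${name}_add(const ${ctype} *a, const ${ctype} *b, ${ctype} *out, long n)
{
    for (long i = 0; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}
""")

# Illustrative subset of element types.
TYPES = {"byte": "signed char", "int": "int", "float": "float", "double": "double"}


def expand_loops(template, types):
    """Return one C function per element type, all from a single template."""
    return [template.substitute(name=name, ctype=ctype)
            for name, ctype in types.items()]


generated = expand_loops(LOOP_TEMPLATE, TYPES)
template_lines = LOOP_TEMPLATE.template.count("\n")
generated_lines = sum(src.count("\n") for src in generated)
print(f"{template_lines} template lines -> {generated_lines} generated lines")
```

The generated code grows linearly with the number of types, while the
logic someone has to read and maintain stays at the size of the one
template.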

We definitely could do a better job with our internal APIs -- I just
think that'll be easiest if everything is in the same directory so
there are minimal obstacles to rearranging and refactoring things.

Anyway, it sounds like we agree that the next step is to merge
multiarray and umath, so possibly we should worry about doing that and
then see what makes sense from there :-).

-n

-- 
Nathaniel J. Smith -- http://vorpus.org