[Numpy-discussion] reorganizing numpy internal extensions (was: Re: Should we drop support for "one file" compilation mode?)
Charles R Harris
charlesr.harris at gmail.com
Tue Oct 6 15:19:41 EDT 2015
On Tue, Oct 6, 2015 at 1:04 PM, Nathaniel Smith <njs at pobox.com> wrote:
> On Tue, Oct 6, 2015 at 11:52 AM, David Cournapeau <cournape at gmail.com>
> wrote:
> >
> >
> > On Tue, Oct 6, 2015 at 7:30 PM, Nathaniel Smith <njs at pobox.com> wrote:
> >>
> >> [splitting this off into a new thread]
> >>
> >> On Tue, Oct 6, 2015 at 3:00 AM, David Cournapeau <cournape at gmail.com>
> >> wrote:
> >> [...]
> >> > I also agree the current situation is not sustainable -- as we
> >> > discussed privately before, cythonizing numpy.core is made quite
> >> > more complicated by this. I have myself quite a few issues w/
> >> > cythonizing the other parts of umath. I would also like to support
> >> > the static link better than we do now (do we know some static link
> >> > users we can contact to validate our approach?)
> >> >
> >> > Currently, what we have in numpy core is the following:
> >> >
> >> > numpy.core.multiarray -> compilation units in
> >> > numpy/core/src/multiarray/ + statically link npymath
> >> > numpy.core.umath -> compilation units in numpy/core/src/umath +
> >> > statically link npymath/npysort + some shenanigans to use things
> >> > in numpy.core.multiarray
> >>
> >> There are also shenanigans in the other direction - supposedly umath
> >> is layered "above" multiarray, but in practice there are circular
> >> dependencies (see e.g. np.set_numeric_ops).
> >
> > Indeed, I am not arguing about merging umath and multiarray.
>
> Oh, okay :-).
>
> >> > I would suggest to have a more layered approach, to enable both
> >> > 'normal' build and static build, without polluting the public
> >> > namespace too much. This is an approach followed by most large
> >> > libraries (e.g. MKL), and is fairly flexible.
> >> >
> >> > Concretely, we could start by putting more common functionalities
> >> > (aka the 'core' library) into its own static library. The API would
> >> > be considered private to numpy (no stability guaranteed outside
> >> > numpy), and every exported symbol from that library would be
> >> > decorated appropriately to avoid potential clashes (e.g.
> >> > '_npy_internal_').
> >>
> >> I don't see why we need this multi-layered complexity, though.
> >
> >
> > For several reasons:
> >
> > - when you want to cythonize either extension, it is much easier to
> > separate it as cython for CPython API, C for the rest.
>
> I don't think this will help much, because I think we'll want to have
> multiple cython files, and that we'll probably move individual
> functions between being implemented in C and Cython (including utility
> functions). So that means we need to solve the problem of mixing C and
> Cython files inside a single library.
>
> If you look at Stefan's PR:
> https://github.com/numpy/numpy/pull/6408
> it does solve most of these problems. It would help if Cython added a
> few tweaks to officially support compiling multiple modules into one
> .so, and I'm not sure whether the current code quite handles
> initialization of the submodule correctly, but it's actually
> surprisingly easy to make work.
>
> (Obviously we won't want to go overboard here -- but the point of
> removing the technical constraints is that then it frees us to pick
> whatever arrangement makes the most sense, instead of deciding based
> on what makes the build system and linker easiest.)
>
> > - if numpy.core.multiarray.so is built as cython-based .o + a 'large' C
> > static library, it should become much simpler to support static link.
>
> I don't see this at all, so I must be missing something? Either way
> you build a bunch of .o files, and then you have to either combine
> them into a shared library or combine them into a static library. Why
> does pre-combining some of them into a static library make this
> easier?
>
> > - maybe that's just personal, but I find the whole multiarray + umath
> > quite beyond manageable in terms of intertwined complexity. You may
> > argue it is not that big, and we all have different preferences in
> > terms of organization, but if I look at the binary size of
> > multiarray + umath, it is quite larger than the median size of the
> > .so I have in my /usr/lib.
>
> The binary size isn't a good measure here -- most of that is the
> bazillions of copies of slightly tweaked loops that we auto-generate,
> which take up a lot of space but don't add much intertwined
> complexity. (Though now that I think about it, my LOC estimate was
> probably a bit low because cloc is probably ignoring those
> autogeneration template files.)
>
> We definitely could do a better job with our internal APIs -- I just
> think that'll be easiest if everything is in the same directory so
> there are minimal obstacles to rearranging and refactoring things.
>
> Anyway, it sounds like we agree that the next step is to merge
> multiarray and umath, so possibly we should worry about doing that and
> then see what makes sense from there :-).
>
>
What about removing the single file build? That seems somewhat orthogonal
to this discussion. Would someone explain to me the advantages of the
single file build for static linking, apart from possibly doing a better
job of hiding symbols? If symbols are the problem, is there not a solution
we could implement?
Chuck
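
As a rough illustration of the symbol question, here is a minimal C sketch of
the three levels of visibility a multi-file build can use. It is not NumPy's
actual code: the macro NPY_SKETCH_HIDDEN and every function name are
hypothetical, and only the '_npy_internal_' prefix comes from the thread. The
sketch assumes a GCC or Clang toolchain for the visibility attribute. Note
that hidden visibility only removes a symbol from a shared object's export
table; in a static archive the name is still seen by the linker, which is why
a prefix is kept as well.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro (an assumption, not a NumPy macro): hides a symbol
 * from the dynamic symbol table when building a shared object. */
#if defined(__GNUC__) || defined(__clang__)
#define NPY_SKETCH_HIDDEN __attribute__((visibility("hidden")))
#else
#define NPY_SKETCH_HIDDEN
#endif

/* Level 1: file-local helper. 'static' gives internal linkage, so the
 * name never leaves this compilation unit. A single-file build gets this
 * for every helper, since everything lives in one unit. */
static double clip_unit(double x)
{
    if (x < 0.0) return 0.0;
    if (x > 1.0) return 1.0;
    return x;
}

/* Level 2: shared between compilation units of the same library but not
 * meant for outside use. Hidden visibility covers the shared-object case;
 * the prefix reduces the chance of clashes under static linking.
 * (Strictly, file-scope names starting with an underscore are reserved in
 * C, so a real project might pick a different prefix.) */
NPY_SKETCH_HIDDEN double _npy_internal_sum(const double *v, size_t n)
{
    double total = 0.0;
    size_t i;
    assert(v != NULL || n == 0);
    for (i = 0; i < n; i++) {
        total += v[i];
    }
    return total;
}

/* Level 3: the intended public entry point (hypothetical name). Returns
 * the mean of v clipped to [0, 1], or 0.0 for an empty input. */
double npy_sketch_clipped_mean(const double *v, size_t n)
{
    if (n == 0) {
        return 0.0;
    }
    return clip_unit(_npy_internal_sum(v, n) / (double)n);
}
```

With this layout only npy_sketch_clipped_mean would be exported from a shared
build, while a static build relies on the prefix to keep the internal helper
out of the way of application symbols.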