On Tue, Sep 1, 2015 at 8:16 AM, Nathaniel Smith <njs@pobox.com> wrote:
On Sun, Aug 30, 2015 at 2:44 PM, David Cournapeau <cournape@gmail.com> wrote:
> Hi there,
>
> Reading Nathaniel's summary from the numpy dev meeting, it looks like there is
> a consensus on using cython in numpy for the Python-C interfaces.
>
> This has been on my radar for a long time: that was one of my rationales for
> splitting multiarray into multiple "independent" .c files half a decade ago.
> I took the opportunity of EuroScipy sprints to look back into this, but
> before looking more into it, I'd like to make sure I am not going astray:
>
> 1. The transition has to be gradual

Yes, definitely.

> 2. The obvious way I can think of allowing cython in multiarray is modifying
> multiarray such that cython "owns" the PyMODINIT_FUNC and the module
> PyModuleDef table.

That seems like a plausible place to start.
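
For concreteness, the piece being handed over is roughly the usual
hand-written boilerplate below (an illustrative sketch in its Python 3
form with made-up names, not the actual multiarray source):

    #include <Python.h>

    /* existing hand-written methods can stay in C for now */
    static PyObject *
    example_method(PyObject *self, PyObject *args)
    {
        Py_RETURN_NONE;
    }

    static PyMethodDef module_methods[] = {
        {"example_method", example_method, METH_VARARGS, NULL},
        {NULL, NULL, 0, NULL}
    };

    static struct PyModuleDef moduledef = {
        PyModuleDef_HEAD_INIT, "multiarray", NULL, -1, module_methods,
        NULL, NULL, NULL, NULL
    };

    /* under the proposal, this entry point (and the table above) would
       be generated by Cython, with the C methods re-exported from the
       .pyx side */
    PyMODINIT_FUNC
    PyInit_multiarray(void)
    {
        return PyModule_Create(&moduledef);
    }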

In the longer run, I think we'll need to figure out a strategy to have
source code divided over multiple .pyx files (for the same reason we
want multiple .c files -- it'll just be impossible to work with
otherwise). And this will be difficult for annoying technical reasons,
since we definitely do *not* want to increase the API surface exposed
by multiarray.so, so we will need to compile these multiple .pyx and
.c files into a single module, and have them talk to each other via
internal interfaces. But Cython is currently very insistent that every
.pyx file should be its own extension module, and the interface
between different files should be via public APIs.

I spent some time poking at this, and I think it's possible but will
take a few kluges at least initially. IIRC the tricky points I noticed
are:

- For everything except the top-level .pyx file, we'd need to call the
generated module initialization functions "by hand", and have a bit of
utility code to let us access the symbol tables for the resulting
modules

- We'd need some preprocessor hack (or something?) to prevent the
non-main module initialization functions from being exposed at the .so
level (e.g. a 'cdef extern from "foo.h"' where 'foo.h' re-#defines
PyMODINIT_FUNC to remove the visibility declaration); a sketch of this
and the previous point follows this list

- By default 'cdef' functions are name-mangled, which is annoying if
you want to be able to do direct C calls between different .pyx and .c
files. You can fix this by adding a 'public' declaration to your cdef
function. But 'public' also adds dllexport stuff which would need to
be hacked out as per above.
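
To make the first two points concrete, here's roughly the kind of thing
I have in mind (an untested sketch; the header, module, and attribute
names below are all made up):

    /* npy_hide_initfunc.h -- pulled in (e.g. via 'cdef extern from') by
       every non-main .pyx, so the init function Cython generates for it
       does not get exported from multiarray.so (GCC/clang shown; the
       Windows dllexport case would need the same treatment) */
    #include <Python.h>
    #undef PyMODINIT_FUNC
    #if defined(__GNUC__)
    #define PyMODINIT_FUNC __attribute__((visibility("hidden"))) PyObject*
    #else
    #define PyMODINIT_FUNC PyObject*
    #endif

and then something like this in the top-level module, assuming
single-phase initialization and a made-up internal module name:

    /* forward declaration of the initializer Cython generates for the
       (hypothetical) _calculation.pyx */
    PyObject *PyInit__calculation(void);

    /* run the generated initializer "by hand" and pull the sub-module's
       internal C API out of a capsule it stores on itself */
    static int
    load_calculation_module(void)
    {
        PyObject *submod, *capsule;
        void **api;

        submod = PyInit__calculation();
        if (submod == NULL) {
            return -1;
        }
        capsule = PyObject_GetAttrString(submod, "_c_api");
        if (capsule == NULL) {
            Py_DECREF(submod);
            return -1;
        }
        api = (void **)PyCapsule_GetPointer(capsule, NULL);
        Py_DECREF(capsule);
        if (api == NULL) {
            Py_DECREF(submod);
            return -1;
        }
        /* stash 'submod' and 'api' somewhere module-internal ... */
        return 0;
    }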

I think the best strategy for this is to do whatever horrible things
are necessary to get an initial version working (on a branch, of
course), and then once that's done assess what changes we want to ask
the cython folks for to let us eliminate the gross parts.

Agreed.

Regarding multiple cython .pyx files and symbol pollution, I think it
would be fine to have an internal API with the required prefix (say
`_npy_cpy_`) in a core library, and to control the exported symbols at
the .so level. This is how many large libraries work in practice (e.g.
MKL), and it is a model well understood by library users.
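
Roughly what I have in mind (just a sketch, every name below is
invented):

    /* npy_cpy_internal.h -- cross-file API internal to multiarray.so.
       Nothing here is part of the public numpy C API; hidden visibility
       (and/or a linker export map) keeps it from leaking out of the .so */
    #ifndef NPY_CPY_INTERNAL_H
    #define NPY_CPY_INTERNAL_H

    #include <Python.h>

    #if defined(__GNUC__)
    #define NPY_CPY_INTERNAL __attribute__((visibility("hidden")))
    #else
    #define NPY_CPY_INTERNAL
    #endif

    /* example internal entry points, callable from both the .c and the
       .pyx files compiled into the same .so */
    NPY_CPY_INTERNAL PyObject *_npy_cpy_clip(PyObject *arr, PyObject *min,
                                             PyObject *max);
    NPY_CPY_INTERNAL int _npy_cpy_fill(PyObject *arr, PyObject *value);

    #endif

Building the whole extension with -fvisibility=hidden (or using an
explicit linker export map) would additionally make hidden the default,
so only whatever we explicitly mark for export stays visible from
outside.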

I will start the cythonization process without worrying about any of
that, though: one large .pyx file, with everything built together into
a single .so. That will avoid having to fight both cython and distutils
at the same time :)

David

(Insisting on compiling everything into the same .so will probably
also help at some point in avoiding Cython-Related Binary Size Blowup
Syndrome (CRBSBS), because the masses of boilerplate could in
principle be shared between the different files. I think some modern
linkers are even clever enough to eliminate this kind of duplicate
code automatically, since C++ suffers from a similar problem.)

> 3. We start using cython for the parts that are mostly menial refcount work.
> Things like functions in calculation.c are obvious candidates.
>
> Step 2 should not be disruptive, and does not look like a lot of work: there
> are < 60 methods in the table, and most of them should be fairly
> straightforward to cythonize. At worst, we could just keep them as is
> outside cython and just "export" them in cython.
>
> Does that sound like an acceptable plan?
>
> If so, I will start working on a PR to work on 2.

Makes sense to me!
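
(For anyone reading along who hasn't looked at calculation.c: the
candidates really are thin argument-parsing wrappers of roughly this
shape -- a made-up, simplified example, not the actual numpy source:

    static PyObject *
    array_clip_like(PyArrayObject *self, PyObject *args, PyObject *kwds)
    {
        PyObject *min = NULL, *max = NULL;
        static char *kwlist[] = {"min", "max", NULL};

        if (!PyArg_ParseTupleAndKeywords(args, kwds, "|OO:clip", kwlist,
                                         &min, &max)) {
            return NULL;
        }
        return PyArray_Clip(self, min, max, NULL);
    }

so moving them into Cython is mostly a matter of letting it do the
argument handling and refcounting for us.)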

-n

--
Nathaniel J. Smith -- http://vorpus.org