
On Wed, 2015-08-26 at 00:05 -0700, Nathaniel Smith wrote:
On Tue, Aug 25, 2015 at 5:53 PM, David Cournapeau <cournape@gmail.com> wrote:
Thanks for the good summary, Nathaniel.
Regarding the dtype machinery, I agree casting is the hardest part. Unless the code has changed dramatically, this was the main reason why you could not make most of the dtypes separate from the numpy codebase (I tried to move the datetime dtype out of multiarray into a separate C extension some years ago). Being able to separate the dtypes from the multiarray module would be an obvious way to drive the internal API change.
For practical reasons I don't imagine we'll ever want to actually move the core dtypes out of multiarray -- if nothing else, they will always remain a little bit special, e.g. np.array([1.0, 2.0]) will just "know" that it should use the float64 dtype. But yeah, in general a good heuristic would be that -- aside from a few limited cases like that -- we want to make built-in dtypes and user-defined dtypes use the same APIs.
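To make that asymmetry concrete, a minimal sketch (standard numpy behaviour; "my_dtype" in the last comment is just a hypothetical placeholder for a dtype registered through the user-dtype C API, not an existing name):

import numpy as np

# Built-in dtypes are special-cased by type inference: a plain list of
# Python floats silently becomes float64.
a = np.array([1.0, 2.0])
print(a.dtype)          # float64

# Python ints are likewise mapped to a default integer dtype
# (which one is platform-dependent).
b = np.array([1, 2, 3])
print(b.dtype)          # e.g. int64 on most 64-bit Linux/macOS builds

# A user-defined dtype is never picked up by this inference; it always has
# to be spelled out explicitly, e.g. np.array(data, dtype=my_dtype).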
Well, casting is conceptually the hardest part. Marrying it to the rest of numpy is probably just as hard ;).

At the risk of not having thought this through enough, maybe some points about the general discussion. I think I would like some more clarity about what we want and especially *need* [1]. From SciPy, there were two things I particularly remember:

1. the dtype/scalar issue
2. making an interface to make interaction with array-likes more sane (this I think can go quite far, and we are already going part of the way)

The dtypes/scalars seem a particularly dark corner of numpy, and if it is feasible for us to replace it with something new, then I would be willing to accept some breaks for it (admittedly, given protest, I would back down from that and another solution would be needed). The point for me is: I currently think a dtype/scalar cleanup could get numpy a long way, especially from the point of view of downstream packages. Of course it would be harder to do in numpy than in something new, but it should also be of much more immediate use. (A small illustration of that corner follows after the footnotes.)

Maybe I am going a bit too far with this right now, but I could imagine that if we cannot clean up the dtypes/scalars, numpy may indeed be doomed, or at least a brick slowing down a lot of other people. And if it is not possible to do this without a numpy 2, then likely that is the way to go. But I am not convinced we should aim to fix all the other stuff at the same time; I am afraid it would just accumulate and grow over everyone's heads.

In other words, if we can muster the resources, I would like to see this problem attacked within numpy. If this proves impossible, a new dtype abstraction may well be reason for a numpy 2, or be used by DyND or similar? But I do believe we should not give up on numpy here from the start; at least I do not see a compelling reason to do so. Giving up on numpy seems like the last way out, and much of the difference in opinion seems to me to be about whether we think this will clearly happen, or has already happened (or maybe whether it is too costly to do in numpy).

Cleaning it up would open doors to many things. Note that I think it would make the numpy source much less scary, because I think it is the one big piece of code that is maybe not clearly a separate chunk [2]. After making it sane, I would argue that numpy becomes much more maintainable and extensible -- from my current view, probably enough so for a long time. Also, I think it would give us an abstraction to make different/new projects work together better, and if done well enough, some grand new project set to replace numpy could reuse it.

Of course it is entirely possible that more things need to be changed in numpy and that some of them would be just as hard or even harder to do. But if we can identify this as the "one big thing that gets us 90%", then I refuse to give up hope of doing it in numpy just yet.

- Sebastian

[1] Travis has said quite a lot about it, but it is not yet clear to me what is a priority/real pain point. Take "datashape" for example. By now I think that the datashape is likely a good idea to make structured arrays nicer, since it moves the "structured" part into the array object and not the dtype, which makes sense to me. However, I am not convinced that the datashape is something that would make numpy a compelling amount better. In fact, I could imagine that for many things it would make it unnecessarily more complicated for users.
[2] Take indexing: I like to think I did not break that much when redoing it (except on purpose, which I hope did not create much trouble). In some sense, indexing was simple to redo because it does not overlap directly with anything else. If we get dtypes/scalars more separated, I think we would be at a point where such a redo is possible for pretty much any part of numpy.
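As a minimal illustration of the dtype/scalar corner mentioned above (nothing here is new behaviour, just standard numpy):

import numpy as np

# A 0-d array and an array scalar of the same dtype are different kinds of
# objects that mostly, but not always, behave the same.
a = np.array(1.0)           # 0-d ndarray, dtype float64
s = np.float64(1.0)         # array scalar, instance of np.float64

print(type(a))              # <class 'numpy.ndarray'>
print(type(s))              # <class 'numpy.float64'>

# The scalar *type* and the dtype are related but distinct objects.
print(np.dtype(np.float64) == a.dtype)   # True
print(isinstance(s, np.generic))         # True: scalars have their own hierarchy

# Indexing an array hands back a scalar, not a 0-d array.
print(type(np.arange(3.0)[0]))           # <class 'numpy.float64'>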
Regarding the use of cython in numpy, was there any discussion about the compilation/size cost of using cython, and about talking to the cython team to improve this? Or was that considered acceptable with current cython for numpy? I am convinced that cleanly separating the low-level parts from the Python C API plumbing would be the single most important thing one could do to make the codebase more amenable.
It's still more of a blue-sky idea than that... the discussion was more at the level of "is this something that is even worth trying to make work and seeing where the problems are?"
The big immediate problem, before we even get to code-size issues, would be that we would need to be able to compile a mix of .pyx files and .c files into a single .so, while Cython-generated code currently makes some strong assumptions about how each .pyx file will live in its own .so. From playing around with it, I suspect the first version of making this work will be klugey indeed. But yeah, the thing to do would be for someone to dig in, make the kluges, and then decide how to clean them up once you know where they are.
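A rough sketch of what "mix .pyx and .c into one .so" would look like at the build level -- the file names here are made up, and the real obstacle is exactly the module-init assumption described above:

# Hypothetical build sketch: compile one Cython layer plus existing C
# sources into a single extension module. File names are invented.
from setuptools import setup, Extension
from Cython.Build import cythonize

ext = Extension(
    "numpy.core.multiarray",                      # the single target .so
    sources=[
        "numpy/core/src/multiarray/hilevel.pyx",  # hypothetical Cython layer
        "numpy/core/src/multiarray/lowlevel.c",   # existing C internals
    ],
)

# cythonize() translates the .pyx and passes the .c files through, but the
# generated code assumes the .pyx owns the module init (PyInit_*), so the
# existing C-side module setup would have to be folded into it -- which is
# where the kluges start.
setup(ext_modules=cythonize([ext]))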
-n