[pypy-dev] Questions on the pypy+numpy project

Mon Oct 17 21:40:14 CEST 2011

On Mon, Oct 17, 2011 at 7:20 PM, David Cournapeau <cournape at gmail.com> wrote:
> On Mon, Oct 17, 2011 at 2:22 PM, Michael Foord <fuzzyman at gmail.com> wrote:
>
>>
>> Travis' post seems to suggest that it is the responsibility of the *pypy*
>> dev team to do the work necessary to integrate the numpy refactor (initially
>> sponsored by Microsoft). That refactoring (smaller numpy core) seems like a
>> great way forward for numpy - particularly if *it* wants to play well with
>> multiple implementations, but it is unreasonable to expect the pypy team to
>> pick that up!
>
> I am pretty sure Travis did not intend to suggest that (I did not
> understand that from his wordings, but maybe that's because we had
> discussion in person on that topic several times already).
>
> There are a lot of reasons to do that refactor that has nothing to do
> with pypy, so the idea is more: let's talk about what pypy would need
> to make this refactor beneficial for pypy *as well*. I (and other)
> have advocated using more cython inside numpy and scipy. We could
> share resources to do that.

I think Alex's question was whether the refactoring is going to be
merged upstream or not (and what the plan is).

I don't think you understand our point. Reusing the current numpy
implementation would not give us much *even* if it were all Cython with
no C API. The point is that we can do cool stuff with the JIT. *Right
now* an operation chain like this:

a, b, c = [numpy.arange(100) for i in range(3)]
a + b - c

becomes

...
i = 0
while i < 100:
  res[i] = a[i] + b[i] - c[i]
  i += 1

without allocating intermediates. In the near future we plan to
implement this using SSE so it becomes even faster. The same applies to
all kinds of operations that we implemented in RPython - ufuncs,
casts etc. All of them get unrolled into a single loop right now, and
they can be nicely vectorized in the near future. Having numpy still
implement stuff in C doesn't buy us much - we wouldn't be able to
do all the cool stuff we're doing now and we wouldn't get all the
speedups. That's why we don't reuse the current numpy, not because
it uses the C API.
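
To make the difference concrete, here is a rough pure-Python sketch of the two evaluation strategies. It is only an illustration, not what the JIT actually emits: the function names eager_add_sub and fused_add_sub are made up for this example, and plain lists stand in for arrays.

```python
def eager_add_sub(a, b, c):
    """Evaluate a + b - c one operation at a time (hypothetical helper)."""
    # The first operation materializes a full temporary array for a + b.
    tmp = [x + y for x, y in zip(a, b)]
    # The second operation walks the temporary again to subtract c.
    return [t - z for t, z in zip(tmp, c)]


def fused_add_sub(a, b, c):
    """Evaluate a + b - c in a single pass (hypothetical helper)."""
    res = [0] * len(a)
    i = 0
    while i < len(a):
        # The whole expression is computed per element, no temporary array.
        res[i] = a[i] + b[i] - c[i]
        i += 1
    return res


# Plain lists standing in for numpy.arange(100).
a, b, c = [list(range(100)) for _ in range(3)]
assert eager_add_sub(a, b, c) == fused_add_sub(a, b, c)
```

Both functions return the same values; the fused version just does it in one loop over the data, and that is the loop shape a vectorizing backend could later work on.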

Now the scenario is slightly different with FFT and other more complex
algorithms. We want to call existing C code with array pointers so we
don't have to reimplement it.
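
As a small illustration of what "calling C code with array pointers" means, the sketch below hands the raw buffer addresses of two arrays to a C routine through ctypes. It uses memmove only because ctypes exposes it directly; an FFT library would be called the same way in principle, and nothing here is meant to describe how the pypy implementation does it. The helper name copy_with_c is made up for this example.

```python
import array
import ctypes


def copy_with_c(src):
    """Copy an array of doubles by passing raw pointers to C memmove.

    Hypothetical helper: the C side only ever sees two addresses and a
    byte count, never a Python object.
    """
    # Allocate a zero-filled destination buffer of the same size.
    dst = array.array("d", bytes(len(src) * src.itemsize))
    # buffer_info() returns (address, number_of_items) for the raw storage.
    src_addr, src_len = src.buffer_info()
    dst_addr, _ = dst.buffer_info()
    # The existing C routine does the actual work on the raw memory.
    ctypes.memmove(dst_addr, src_addr, src_len * src.itemsize)
    return dst


data = array.array("d", [1.0, 2.5, -3.0])
assert copy_with_c(data).tolist() == [1.0, 2.5, -3.0]
```

The point is that the existing C code stays untouched and only needs a pointer to contiguous data.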

Now tell me - how does moving pieces of scipy or numpy to Cython give us anything?

>
>> It seems odd to argue that extending numpy to pypy will be a net negative
>> for the community! Sure there are some difficulties involved, just as there
>> are difficulties with having multiple implementations in the first place,
>> but the benefits are much greater.
>
> The net negative would be the community split, with numpy losing some
> resources taken by numpy on pypy. This seems like a plausible
> situation.

So, you're saying that giving people the ability to run numpy code
faster if they refrain from using scipy and matplotlib (for now) is
what produces the community split? How so? My interpretation is that
we want to give people powerful tools that can be used to achieve
things not possible before - like implementing something in Python
instead of using Cython. I can imagine how someone might not get value
from that, but how does that decrease the value?

>
> Without a C numpy API, you can't have scipy or matplotlib, no
> scikit-learns, etc... But you could hide most of it behind cython,
> which has momentum in the scientific community. Then a realistic
> approach becomes:
>  - makes the cython+pypy backend a reality
>  - ideally make cython to wrap fortran a reality
>  - convert as much as possible from python C API to cython
>
> People of all level can participate. The first point in particular
> could help pypy besides the scipy community. And that's a plan where
> both parties would benefit from each other.

I think our priority right now is to provide a working numpy. The next
point is to make it use SSE. Does that fit somehow with your plan?

Cheers,
fijal