[pypy-dev] numpypy array expressions

Tue Aug 28 12:44:04 CEST 2012

Hey Armin,

On 28 August 2012 11:19, Armin Rigo <arigo at tunes.org> wrote:
> Hi Mark,
>
> On Mon, Aug 27, 2012 at 3:09 PM, mark florisson
> <markflorisson88 at gmail.com> wrote:
>> For this year's Summer of Code, and for my master's dissertation, we
>> created a project to compile array expressions efficiently (Dag was my
>> mentor, CCed), which can be found here:
>> https://github.com/markflorisson88/minivect , the thesis is under
>> subdirectory 'thesis'.
>
> Thanks for mentioning it.  It is a rather different approach than the
> one we are playing with in PyPy.  It looks like a project that is
> independent of CPython-vs-PyPy: doing some computation on ASTs
> and producing some compiled form, possibly with llvm.  This works
> independently of the underlying interpreter, but makes some unchecked
> assumptions e.g. on the exact Python types involved in the expression.

Indeed, it is a low-level compiler; things like type checking (other
than numeric promotion) should be performed by the front-end (in this
case a lazily evaluating numpy).

> By contrast, the approach in PyPy is at a different level, based on
> JITting by automatic transformation of the interpreter (like we do to
> run Python).  The idea is to automatically transform the part of
> numpypy that interprets numpy expressions, to get a JIT out of it; and
> then the result is that at run-time the evaluation of numpy
> expressions is JITted.  E.g. there is no AST anywhere in our approach.
> It naturally integrates the JITted code produced from Python with
> the JITted code produced from the numpypy parts.

Thanks for the clarification, I thought that was the case.

> Someone can feel free to try to plug your approach into PyPy --- I am
> not saying no, but I am saying that it would be unlikely that you
> could reuse any of the infrastructure that we already have.

Understandable. The way it would work for a lazily evaluating numpy, or
for projects like Theano, is that they convert their expression graph
to minivect at runtime, after performing all sanity checks and
optimizations, and minivect then generates native code. If the code
generation part is already figured out on your side, then there may
not be much sense in reusing the project.
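
To make that flow a bit more concrete, here is a minimal sketch in
plain Python. It is not minivect's API: Leaf, BinOp and compile_expr
are hypothetical names made up for illustration, and generating Python
source and running it through exec() only stands in for real native
code generation. The front-end builds the graph lazily, checks it, and
then hands it to a back-end that emits one fused loop.

```python
class Expr:
    """Node in a lazily built array expression graph (hypothetical)."""
    def __add__(self, other):
        return BinOp("+", self, other)

    def __mul__(self, other):
        return BinOp("*", self, other)


class Leaf(Expr):
    """An array operand; nothing is computed when it is combined."""
    def __init__(self, data):
        self.data = list(data)


class BinOp(Expr):
    """An elementwise binary operation on two sub-expressions."""
    def __init__(self, op, lhs, rhs):
        self.op, self.lhs, self.rhs = op, lhs, rhs


def leaves(expr, out=None):
    """Collect the array operands of the graph, left to right."""
    out = [] if out is None else out
    if isinstance(expr, Leaf):
        out.append(expr)
    else:
        leaves(expr.lhs, out)
        leaves(expr.rhs, out)
    return out


def check(expr):
    """Front-end sanity check: all operands must have the same length."""
    lengths = {len(leaf.data) for leaf in leaves(expr)}
    if len(lengths) != 1:
        raise ValueError("operand lengths differ: %s" % sorted(lengths))
    return lengths.pop()


def emit(expr, names):
    """Back-end: turn the graph into the body of a single fused loop."""
    if isinstance(expr, Leaf):
        return "%s[i]" % names[id(expr)]
    return "(%s %s %s)" % (emit(expr.lhs, names), expr.op,
                           emit(expr.rhs, names))


def compile_expr(expr):
    """Check the graph, then generate and compile one kernel for it."""
    n = check(expr)
    names, operands = {}, []
    for leaf in leaves(expr):
        if id(leaf) not in names:  # an operand may appear more than once
            names[id(leaf)] = "a%d" % len(names)
            operands.append(leaf.data)
    src = "def kernel(%s):\n    return [%s for i in range(%d)]\n" % (
        ", ".join(names.values()), emit(expr, names), n)
    namespace = {}
    exec(src, namespace)  # stand-in for native code generation
    return namespace["kernel"], operands


a = Leaf([1.0, 2.0, 3.0])
b = Leaf([4.0, 5.0, 6.0])
expr = a + b * a                      # builds a graph, computes nothing
kernel, operands = compile_expr(expr)
result = kernel(*operands)            # [5.0, 12.0, 21.0]
```

The point of the split is that the checks happen once, in the
front-end, so the generated kernel can assume its inputs are valid and
evaluate the whole expression in one pass without temporaries.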

> Moreover, if it does not contain a lot of tests, none of your code is
> of any use to us --- this point is the way we work, but also the only
> sane way to write anything in RPython, because otherwise the
> 45-minute compilation cycle would kill us.

Indeed, most of the tests are actually system tests that are part of
Cython:
https://github.com/markflorisson88/cython/tree/_array_expressions/tests/array_expressions
Writing a more comprehensive test suite as part of minivect is on
the todo list.

> As for reusing some ideas, maybe, but it's still a long way off for
> PyPy.  Our goal is to first be as compatible as possible, and
> completely transparent to the user, whatever his numpy source code
> does.
>

That should obviously have priority :) In any case, if it at some
point does reach such a stage and you're shooting for more
performance, you know where to find us.

Rewriting all optimizations may be some work, but then, compared to
implementing numpy it'll be a piece of cake :)

> See you soon,
>
> Armin.