[Numpy-discussion] Looking for people interested in helping with Python compiler to LLVM

Sun Mar 11 10:52:40 EDT 2012

11.03.2012 15:12, xavier.gnata at gmail.com wrote:
[clip]
> If this description is correct, Numba is an additional pass once the
> cpython bytecode has been produced by cpython.
> Is it correct??
> Is python bytecode a good intermediate representation to perform numpy
> related optimization?
> 
> One current big issue with numpy is that C=A+B+D produces temporaries.
> numexpr addresses this issue and it would be great to get the same 
> result by default in numpy.
> numexpr also optimizes polynomials using Horner's method. It is hard to 
> do that at bytecode level, isn't it?

My impression is that dealing with Python's bytecode is not necessarily
significantly harder than dealing with the AST.

Your example reads

  1           0 LOAD_NAME                0 (A)
              3 LOAD_NAME                1 (B)
              6 BINARY_ADD
              7 LOAD_NAME                2 (D)
             10 BINARY_ADD
             11 STORE_NAME               3 (C)
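
A listing like the one above can be reproduced with the standard `dis`
module. This is only a small sketch: the offsets and some opcode names
depend on the CPython version (newer versions, for instance, emit a
generic BINARY_OP instead of BINARY_ADD), and the file name "<example>"
is just a placeholder.

```python
import dis

# Compile the statement at module level, as in the listing above.
code = compile("C = A + B + D", "<example>", "exec")

# Print the human-readable disassembly.
dis.dis(code)

# The same information is available programmatically.
opnames = [ins.opname for ins in dis.get_instructions(code)]
```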

For instance, interpreting the bytecode (e.g. loop body) once with dummy
objects lets you know what the final compound expression is.
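
As a rough illustration of the dummy-object idea (not how Numba or
numexpr actually do it), the statement can simply be executed with
placeholder operands that record the operations applied to them. The
names `Sym`, `_unwrap` and `trace` are made up for this sketch, and only
addition and multiplication are handled.

```python
class Sym:
    """Dummy operand that records the expression it takes part in."""

    def __init__(self, expr):
        self.expr = expr

    def __add__(self, other):
        return Sym(("add", self.expr, _unwrap(other)))

    def __radd__(self, other):
        return Sym(("add", _unwrap(other), self.expr))

    def __mul__(self, other):
        return Sym(("mul", self.expr, _unwrap(other)))

    def __rmul__(self, other):
        return Sym(("mul", _unwrap(other), self.expr))


def _unwrap(value):
    # Plain Python values (e.g. numeric constants) are stored as-is.
    return value.expr if isinstance(value, Sym) else value


def trace(source, names):
    """Run `source` once with dummy objects bound to `names`."""
    env = {name: Sym(name) for name in names}
    exec(compile(source, "<trace>", "exec"), env)
    return env


env = trace("C = A + B + D", ["A", "B", "D"])
print(env["C"].expr)  # ('add', ('add', 'A', 'B'), 'D')
```

The recorded tree is the whole compound expression, so in principle it
could be handed to something that evaluates it in one pass without
temporaries.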

> Unladen swallow wanted to replace the full cpython implementation by a 
> jit compiler built using LLVM... but unladen swallow is dead.

To get speed gains, you need to optimize not only the bytecode
interpreter side, but also the object space --- Python classes, strings
and all that. Keeping in mind Python's dynamism, there are potential
side effects everywhere. I guess this is what sunk the swallow.

Just speeding up effectively statically typed code dealing with arrays
and basic types, on the other hand, sounds much easier.

The PyPy guys have a much more ambitious approach, and are getting nice
results. Also with arrays --- as I understand, the fact that they want
to be able to do this sort of optimization is the main reason why they
want to reimplement the core parts of Numpy in RPython.

The second issue is that, unfortunately, their emulation of CPython's
C-API currently seems to have quite large overheads. Porting Numpy on
top of that is possible: I managed to get basic things (apart from
string/unicode arrays) to work, but things took very large speed hits
(of the order of ~100x for things like `arange(10000).sum()`). This
pushes the array size at which Numpy gains a speed advantage a bit too
high. The reason is probably that Numpy makes heavy internal use of
PyObjects, so the cost of passing objects through the emulation layer
accumulates.

-- 
Pauli Virtanen