[Python-ideas] New __reference__ hook

Sturla Molden sturla at molden.no
Wed Dec 5 21:09:49 CET 2012

On 5 Dec 2012, at 19:51, Masklinn <masklinn at masklinn.net> wrote:

> Why? z could just be a "lazy value" at this point, basically a manual
> building of thunks, only reifying them when necessary (whenever that
> is). It's not like numpy *has* to create three temporary arrays, just
> that it *does*.

It has to, because it does not know when to flush an expression. Strangely enough, this accounts for most of the speed difference between Python/NumPy and e.g. Fortran 95. A Fortran 95 compiler can compile an array expression as a single loop. NumPy cannot, because the binary operators do not tell it when an expression is "finalized". That is why the numexpr JIT compiler evaluates Python expressions passed as strings, and needs to include a parser and whatnot. Today most numerical code is memory bound, not compute bound, as CPUs are immensely faster than RAM. So what keeps numerical/scientific code written in Python slower than C or Fortran today is mostly the creation of temporary array objects (i.e. memory access), not the computations per se. If we could get rid of temporary arrays, Python code could possibly achieve 80% of Fortran 95 speed. For scientists, that would mean we don't need to write any more Fortran or C.
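To make the temporary-array point concrete, here is a small sketch in plain Python. Lists stand in for NumPy arrays (an assumption, so the example runs without NumPy), and the function names are hypothetical. Evaluating `a*b + c` one binary operator at a time materialises `a*b` as a full-size intermediate before the addition, whereas a fused loop of the kind a Fortran 95 compiler can generate computes each element in one pass with no intermediate storage.

```python
# Lists stand in for NumPy arrays so the sketch runs without NumPy.

def eager_muladd(a, b, c):
    """Evaluate a*b + c one operator at a time, as NumPy does today."""
    tmp = [x * y for x, y in zip(a, b)]     # full-size temporary holding a*b
    return [t + z for t, z in zip(tmp, c)]  # second pass reads tmp back in

def fused_muladd(a, b, c):
    """Evaluate a*b + c in a single loop with no intermediate list."""
    return [x * y + z for x, y, z in zip(a, b, c)]

a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]
c = [0.5, 0.5, 0.5]
print(eager_muladd(a, b, c))  # [4.5, 10.5, 18.5]
print(fused_muladd(a, b, c))  # [4.5, 10.5, 18.5]
```

Both give the same values. The difference is that the eager version writes and then rereads a full-size temporary, which is the memory traffic described above. In pure Python the interpreter overhead dominates, so this illustrates the access pattern rather than the speedup.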

But perhaps it is possible to do this with AST magic? I don't know. Nor do I know if __bind__ is the best way to do this. Perhaps not. But I do know that automatically detecting when to "flush a compound expression with (NumPy?) arrays" would be the holy grail for scientific computing with Python. A binary operator x+y would just return a symbolic representation of the expression, and when the full expression needs to be flushed we could e.g. ask OpenCL or LLVM to generate the code on the fly. It would turn numerical computing into something similar to dynamic HTML. And we know how good Python is at generating structured text on the fly.
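A minimal sketch of the deferred-evaluation idea, under several stated assumptions. Plain Python lists stand in for arrays. Python source generated as a string and compiled with `exec` stands in for OpenCL or LLVM. An explicit `evaluate()` call stands in for the "expression is finished" signal that Python does not currently provide, which is exactly the gap a new hook would have to fill. The classes `Expr`, `Var` and `BinOp` are hypothetical. Each `+` or `*` returns a node instead of computing anything, and `evaluate()` turns the whole tree into the text of one loop and runs it.

```python
class Expr:
    """Base class: arithmetic operators build a tree instead of computing."""

    def __add__(self, other):
        return BinOp("+", self, other)

    def __mul__(self, other):
        return BinOp("*", self, other)

    def source(self):
        """Generate the text of a single fused loop for the whole tree."""
        arrays = {}
        body = self.emit(arrays)  # e.g. "((a[i] * b[i]) + c[i])"
        params = ", ".join(arrays)
        first = next(iter(arrays))  # assumes all operands have equal length
        src = (
            f"def kernel({params}):\n"
            f"    n = len({first})\n"
            f"    return [{body} for i in range(n)]\n"
        )
        return src, arrays

    def evaluate(self):
        """'Flush' the expression: compile the generated loop and run it once."""
        src, arrays = self.source()
        namespace = {}
        exec(src, namespace)  # stand-in for handing the text to a real compiler
        return namespace["kernel"](**arrays)


class Var(Expr):
    """A named leaf holding concrete data."""

    def __init__(self, name, data):
        self.name, self.data = name, data

    def emit(self, arrays):
        arrays[self.name] = self.data
        return f"{self.name}[i]"


class BinOp(Expr):
    """An unevaluated binary operation; nothing is computed here."""

    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

    def emit(self, arrays):
        return f"({self.left.emit(arrays)} {self.op} {self.right.emit(arrays)})"


a = Var("a", [1.0, 2.0, 3.0])
b = Var("b", [4.0, 5.0, 6.0])
c = Var("c", [0.5, 0.5, 0.5])

z = a * b + c          # builds a tree; no lists are created yet
print(z.source()[0])   # the generated single-loop kernel, as text
print(z.evaluate())    # [4.5, 10.5, 18.5]
```

Only one output list is allocated however many operators the expression contains; intermediate values live inside the loop body. A real implementation would pass the generated kernel to a compiler such as OpenCL or LLVM rather than `exec`, and would also need scalar operands, broadcasting and shape checks, all omitted here.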

