[Cython] gsoc: array expressions
markflorisson88 at gmail.com
Mon May 28 14:49:36 CEST 2012
On 25 May 2012 21:53, Frédéric Bastien <nouiz at nouiz.org> wrote:
> Sorry for the delay, I had a schedule change.
> Thanks for adding me. Should I subscribe to cython-dev? How much
> daily email is there? I couldn't find this in the archives. Feel free
> to add me in CC again when you think it is appropriate.
There is usually not so much traffic on cython-dev, unless something
comes up that is debated to the death :)
> I'll reply here to all the emails at the same time. Do you prefer
> that I reply to each email individually if this happens again? I'll
> try to reply faster next time.
No worries, either way works fine, don't worry too much about protocol
(the only thing to note is that we do bottom posting).
> - About pickling Theano: we currently can't pickle Theano functions.
> It could be made to work in some cases, but not in all, as there are
> hardware-dependent optimizations in the Theano function. Currently it
> is mostly CPU vs GPU operations. So if we stay on the CPU, we could
> do some pickling, but we should make sure that the C code compiled
> into Python modules is still there when we unpickle, or recompile it.
> - I think it makes sense to build a Theano graph from the Cython AST,
> optimize it, and rebuild a Cython AST from the optimized graph. This
> would allow using Theano's optimizations.
Ok, the important thing is that the graph can be pickled, it should be
pretty straightforward to generate code to build the function again
from the loaded graph.
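To illustrate the idea (a hypothetical toy graph format, not Theano's): if the expression graph is a plain picklable data structure, regenerating an evaluator from the unpickled graph is straightforward.

```python
# Sketch with an assumed tuple-based graph format:
# ('add', ('mul', 'x', 'y'), 'x')  represents  x*y + x
import pickle

graph = ('add', ('mul', 'x', 'y'), 'x')

def evaluate(node, env):
    """Walk the graph and evaluate it against a variable environment."""
    if isinstance(node, str):            # a variable leaf
        return env[node]
    op, *args = node
    vals = [evaluate(a, env) for a in args]
    return vals[0] + vals[1] if op == 'add' else vals[0] * vals[1]

restored = pickle.loads(pickle.dumps(graph))   # round-trip the graph
print(evaluate(restored, {'x': 2, 'y': 3}))    # 2*3 + 2 = 8
```

The same round-trip would work for a code generator that walks the restored graph instead of interpreting it.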
> - It would also make sense to do the code generation in Theano and
> reuse it in Cython. But this would make the Theano dependency much
> stronger. I'm not sure you want this.
> - Another point not raised: Theano needs to know at compile time the
> dtype, the number of dimensions, and which dimensions are
> broadcastable for each variable. I think that the last one could
> cause problems, but if you use specialization for the dtype, the same
> can be done for the broadcastability of the dimensions.
Hm, that would lead to an explosion of combinations. I think
we could specialize only on no broadcasting at all (except for
operands with lesser dimensionality).
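To put a number on that explosion (an illustrative sketch, function name is my own): each dimension is either broadcastable or not, so per-dimension specialization gives 2**ndim variants per operand, multiplying across operands.

```python
# Sketch: enumerate the per-dimension broadcastability flag
# combinations one operand would need if each were specialized.
from itertools import product

def broadcast_specializations(ndim):
    """All (broadcastable?) flag tuples for an ndim-dimensional operand."""
    return list(product((False, True), repeat=ndim))

# Already 8 variants for a single 3D operand; a binary operation on
# two such operands would need 8 * 8 = 64 specializations.
print(len(broadcast_specializations(3)))  # 8
```

Specializing only on the "no broadcasting" fast path, as suggested above, keeps this at one variant and handles the rest at runtime.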
> - The compyte (GPU nd-array) project does collapsing of dimensions.
> This is an important optimization on the GPU, as doing the index
> computation in parallel is costlier. I think on the CPU we could
> probably collapse just the inner dimensions to make it faster.
> - Theano doesn't generate intrinsics or assembly; we assume that
> g++ will generate vectorized operations for simple loops. Recent
> versions of gcc/g++ do this.
Right, the aim is definitely to specialize for contiguous arrays,
where you collapse everything. Specializing statically for anything
more would be unfeasible, and better handled by a runtime compiler I
think. For the C backend, I'll start by generating simple C loops and
see if the compilers vectorize that already.
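The collapsing idea can be sketched as follows (hypothetical helper, not minivect code): adjacent dimensions that are contiguous in memory merge into one, so a fully C-contiguous operand reduces to a single flat loop that compilers vectorize easily.

```python
# Sketch: merge trailing dimensions whose strides make them contiguous
# with their inner neighbour, shrinking the loop nest.
def collapse_contiguous(shape, strides):
    """Collapse adjacent dimensions that are contiguous in memory."""
    shape, strides = list(shape), list(strides)
    i = len(shape) - 1
    while i > 0 and strides[i - 1] == strides[i] * shape[i]:
        shape[i - 1] *= shape[i]       # merge dim i into dim i-1
        strides[i - 1] = strides[i]    # merged dim steps by the inner stride
        del shape[i], strides[i]
        i -= 1
    return shape, strides

# A C-contiguous (4, 3, 2) float64 array (strides 48, 16, 8 bytes)
# collapses to a single dimension of 24 elements:
print(collapse_contiguous((4, 3, 2), (48, 16, 8)))  # ([24], [8])
```

A non-contiguous outer dimension (e.g. a sliced array) would stop the merging early, leaving a small loop nest around the flat inner loop.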
> - Our generated code for element-wise operations pays some attention
> to the memory access pattern. We swap dimensions to iterate over the
> dimensions with the smallest strides. But we don't go further.
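That heuristic amounts to ordering the loop axes by stride (a minimal sketch, function name is my own): the smallest stride goes innermost for better cache locality.

```python
# Sketch: choose a loop order, outermost to innermost, by decreasing
# absolute stride, so the innermost loop walks contiguous memory.
def loop_order(strides):
    """Return axis indices ordered outer-to-inner by decreasing stride."""
    return sorted(range(len(strides)), key=lambda i: -abs(strides[i]))

# A Fortran-ordered 2D array with byte strides (8, 800): axis 0 is
# contiguous, so it should become the innermost loop.
print(loop_order((8, 800)))  # [1, 0]
```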
> - What do you mean by CSE? Constant optimization?
Yes, common subexpression elimination and also hoisting of unchanging
expressions outside the loop.
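A hand-written illustration of both optimizations (not minivect output): the common subexpression `x * y` is computed once, and since it doesn't depend on the loop variable, that single computation is also hoisted out of the loop.

```python
# Before: x*y is a common subexpression, recomputed twice per iteration.
def naive(a, x, y):
    out = []
    for v in a:
        out.append(v * (x * y) + (x * y))
    return out

# After: CSE gives one temporary, and hoisting moves it before the loop.
def optimized(a, x, y):
    t = x * y                 # computed once, outside the loop
    out = []
    for v in a:
        out.append(v * t + t)
    return out

print(optimized([1, 2], 3, 4) == naive([1, 2], 3, 4))  # True
```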
I started a new project, https://github.com/markflorisson88/minivect ,
which currently features a simple C code generator. The specializer
and astbuilder do most of the work of creating the right AST, so the
code generator only has to implement code generation functions for
simple expressions. Depending on how it progresses I will look at
incorporating Theano's optimizations into it and having Theano use it
as a C backend for compatible expressions.