[Cython] gsoc: array expressions

Frédéric Bastien nouiz at nouiz.org
Wed May 30 17:27:27 CEST 2012

On Mon, May 28, 2012 at 8:49 AM, mark florisson
<markflorisson88 at gmail.com> wrote:
> On 25 May 2012 21:53, Frédéric Bastien <nouiz at nouiz.org> wrote:
>> - About pickling Theano: we currently can't pickle a Theano function.
>> It could be made to work in some cases, but not in all cases, as
>> there are hardware-dependent optimizations in the Theano function.
>> Currently this is mostly CPU vs GPU operations. So if we stay on the
>> CPU, we could do some pickling, but we should make sure that the C
>> code compiled into Python modules is still there when we unpickle,
>> or recompile it.
>> - I think it makes sense to make a Theano graph from the Cython AST,
>> optimize it, and rebuild a Cython AST from the optimized graph. This
>> would allow using Theano's optimizations.
> Ok, the important thing is that the graph can be pickled, it should be
> pretty straightforward to generate code to build the function again
> from the loaded graph.

We can pickle an uncompiled graph, so there is no problem here.
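
The pickle-then-rebuild idea can be sketched in plain Python. This is
only an illustration and does not use Theano's API: the nested-tuple
graph format, the OPS table and compile_graph are all hypothetical
names invented for the example.

```python
import operator
import pickle

# Hypothetical op table for this sketch; not part of Theano or Cython.
OPS = {"add": operator.add, "mul": operator.mul}


def compile_graph(graph):
    """Turn a nested-tuple expression graph into a callable.

    A graph is ("var", name), ("const", value) or (op_name, left, right).
    The returned callable takes a dict mapping variable names to values.
    """
    kind = graph[0]
    if kind == "var":
        name = graph[1]
        return lambda env: env[name]
    if kind == "const":
        value = graph[1]
        return lambda env: value
    fn = OPS[kind]
    args = [compile_graph(sub) for sub in graph[1:]]
    return lambda env: fn(*(arg(env) for arg in args))


# The uncompiled graph for x * y + 1 is plain data, so it pickles.
graph = ("add", ("mul", ("var", "x"), ("var", "y")), ("const", 1))
restored = pickle.loads(pickle.dumps(graph))

# The function is rebuilt from the loaded graph, not unpickled.
f = compile_graph(restored)
result = f({"x": 2, "y": 3})
```

Only the graph crosses the pickle boundary; anything hardware specific
would be redone when the function is rebuilt on the loading machine.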

>> - It also makes sense to do the code generation in Theano and reuse it
>> in Cython. But this would make the Theano dependency much stronger.
>> I'm not sure you want this.
>> - Another point not raised: Theano needs to know at compile time the
>> dtype, the number of dimensions, and which dimensions are
>> broadcastable for each variable. I think the last one could cause
>> problems, but if you specialize on the dtype, the same can be done
>> for the broadcastability of each dimension.
> Hm, that would lead to kind of an explosion of combinations. I think
> we could specialize only on no broadcasting at all (except for
> operands with lesser dimensionality).

I expect that normal user scripts won't use all the combinations :)
So I wouldn't worry about it at first. If there is a need, we could
parametrise Theano ops (especially the Elemwise op) so that when a
dimension is marked as not broadcastable, it also works when it is
broadcast. In the case of Elemwise, it is probably just the
error-checking code that would need to change.

>> - The compyte (GPU nd array) project does collapsing of dimensions.
>> This is an important optimization on the GPU, as doing the index
>> computation in parallel is costlier. I think on the CPU we could
>> probably collapse just the inner dimensions to make it faster.
>> - Theano doesn't generate intrinsics or assembly, but we assume that
>> g++ will generate vectorized operations for simple loops. Recent
>> versions of gcc/g++ do this.
> Right, the aim is definitely to specialize for contiguous arrays,
> where you collapse everything. Specializing statically for anything
> more would be unfeasible, and better handled by a runtime compiler I
> think. For the C backend, I'll start by generating simple C loops and
> see if the compilers vectorize that already.

I was under the impression you were doing run-time code generation; I
mixed up the ongoing projects. But collapsing the inner dimensions
could still be useful: if you don't explicitly write out all the loops,
you will call a function or loop over the number of dimensions, and
collapsing reduces the number of iterations of that loop. If the inner
dimension is small (e.g. a matrix of shape (10000, 3)) this can be
useful. But that is less important than the default contiguous case.
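
As an illustration of collapsing, here is a minimal sketch for a single
array, with strides counted in elements rather than bytes. The function
name collapse_dims is made up for the example; a real implementation
would have to check the condition for every operand of the expression
at once.

```python
def collapse_dims(shape, strides):
    """Merge adjacent dimensions that can be walked with a single loop.

    Dimension i can be merged with dimension i + 1 when
    strides[i] == shape[i + 1] * strides[i + 1], i.e. stepping once
    along i is the same as walking all of i + 1.
    """
    if not shape:
        return (), ()
    out_shape = [shape[0]]
    out_strides = [strides[0]]
    for n, s in zip(shape[1:], strides[1:]):
        if out_strides[-1] == n * s:
            # Fold this dimension into the previous one.
            out_shape[-1] *= n
            out_strides[-1] = s
        else:
            out_shape.append(n)
            out_strides.append(s)
    return tuple(out_shape), tuple(out_strides)


# A C-contiguous (10000, 3) matrix collapses to one loop of 30000.
contiguous = collapse_dims((10000, 3), (3, 1))

# Here only the two inner dimensions are contiguous with each other,
# so the outer dimension stays separate.
partial = collapse_dims((4, 5, 6), (60, 6, 1))
```

For the (10000, 3) case the inner loop of length 3 disappears, which is
the situation described above.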

>> - Our generated code for element-wise operations takes a little care
>> of the memory access pattern. We swap dimensions to iterate on the
>> dimensions with the smallest strides. But we don't go further.
>> - What do you mean by CSE? Constant optimization?
> Yes, common subexpression elimination and also hoisting of unchanging
> expressions outside the loop.
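
For reference, common subexpression elimination on an expression graph
can be sketched as below. This is a generic illustration on a made-up
nested-tuple graph format, not the way Theano or minivect implement it;
merge_graph is a hypothetical name.

```python
def merge_graph(graph, table=None):
    """Return (root, table) where equal subgraphs share one node.

    A graph is ("var", name), ("const", value) or (op_name, *inputs).
    table maps every distinct node to its single shared instance, so
    a code generator would emit each entry only once.
    """
    if table is None:
        table = {}
    if graph[0] in ("var", "const"):
        node = graph
    else:
        inputs = tuple(merge_graph(sub, table)[0] for sub in graph[1:])
        node = (graph[0],) + inputs
    # Reuse the node already recorded for a structurally equal subgraph.
    node = table.setdefault(node, node)
    return node, table


# (x * y) + (x * y): the product appears twice in the source graph.
product = ("mul", ("var", "x"), ("var", "y"))
expr = ("add", product, ("mul", ("var", "x"), ("var", "y")))

root, table = merge_graph(expr)
unique_nodes = len(table)
```

After merging, both inputs of the addition are the same node, so the
product would be computed once.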

Theano does CSE in the merge optimization. As for lifting expressions
out of loops, we do it for Theano Scan (our loop), but those are not
normal loops. It is much better to use tensor expressions than scan if

> I started a new project, https://github.com/markflorisson88/minivect ,
> which currently features a simple C code generator. The specializer
> and astbuilder do most of the work of creating the right AST, so the
> code generator only has to implement code generation functions for
> simple expressions. Depending on how it progresses I will look at
> incorporating Theano's optimizations into it and having Theano use it
> as a C backend for compatible expressions.

Great, tell me when you think it is a good time for me to look at it.
Does it mimic Cython internals? If so, is there documentation about
them that I could look at?

