[Cython] gsoc: array expressions
markflorisson88 at gmail.com
Mon May 28 14:54:33 CEST 2012
On 28 May 2012 13:52, mark florisson <markflorisson88 at gmail.com> wrote:
> On 28 May 2012 13:49, mark florisson <markflorisson88 at gmail.com> wrote:
>> On 25 May 2012 21:53, Frédéric Bastien <nouiz at nouiz.org> wrote:
>>> Sorry for the delay, I had a schedule change.
>>> Thanks for adding me. Should I subscribe to cython-dev? How much email
>>> is there daily? I didn't find this in the archives. Feel free to add
>>> me in CC again when you think it is appropriate.
>> There is usually not so much traffic on cython-dev, unless something
>> comes up that is debated to the death :)
>>> I'll reply to all the emails here at once. Would you prefer that I
>>> reply to each email individually if this happens again? I'll try to
>>> reply faster next time.
>> No worries, either way works fine; don't worry too much about protocol
>> (the only thing to note is that we do bottom posting).
>>> - About pickling Theano: we currently can't pickle Theano functions.
>>> It could be made to work in some cases, but not in all, as there are
>>> hardware-dependent optimizations in a Theano function (currently
>>> mostly CPU vs. GPU operations). So if we stay on the CPU we could do
>>> some pickling, but we would have to make sure that the Python modules
>>> compiled from the C code are still there when we unpickle, or
>>> recompile them.
>>> - I think it makes sense to build a Theano graph from the Cython AST,
>>> optimize it, and regenerate a Cython AST from the optimized graph.
>>> This would allow using Theano's optimizations.
>> Ok, the important thing is that the graph can be pickled; it should
>> then be pretty straightforward to generate code that builds the
>> function again from the loaded graph.
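A minimal sketch of the idea above, assuming a toy expression graph made of nested tuples (the node kinds `var`, `const`, `add`, `mul`, etc. and the helper names `to_source`, `collect_vars`, `build_function` are hypothetical, not Theano's or Cython's API). Only the graph is pickled; the callable is regenerated from it after loading, so nothing hardware-specific needs to be serialized.

```python
import pickle

# Toy expression graph (hypothetical node kinds): nested tuples pickle as-is.
# ("var", name), ("const", value), or (op, lhs, rhs) with op in OPS.
OPS = {"add": "+", "sub": "-", "mul": "*", "div": "/"}

graph = ("add", ("mul", ("var", "x"), ("var", "y")), ("const", 2.0))

def to_source(node):
    """Render a graph node as a Python expression string."""
    kind = node[0]
    if kind == "var":
        return node[1]
    if kind == "const":
        return repr(node[1])
    return "(%s %s %s)" % (to_source(node[1]), OPS[kind], to_source(node[2]))

def collect_vars(node, out=None):
    """Return the variable names used by the graph, in first-use order."""
    if out is None:
        out = []
    if node[0] == "var":
        if node[1] not in out:
            out.append(node[1])
    elif node[0] != "const":
        collect_vars(node[1], out)
        collect_vars(node[2], out)
    return out

def build_function(node):
    """Regenerate a callable from a (possibly unpickled) graph."""
    args = ", ".join(collect_vars(node))
    src = "def f(%s):\n    return %s\n" % (args, to_source(node))
    namespace = {}
    exec(src, namespace)
    return namespace["f"]

# Serialize only the graph; rebuild the function after loading it.
payload = pickle.dumps(graph)
f = build_function(pickle.loads(payload))
print(f(3.0, 4.0))  # 14.0
```

A real backend would emit and compile C instead of Python source, but the split is the same: the pickled artifact is the graph, and code generation is rerun on load.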
>>> - It also makes sense to do the code generation in Theano and reuse
>>> it in Cython. But this would make the Theano dependency much
>>> stronger. I'm not sure you want this.
>>> - Another point not yet raised: Theano needs to know at compile time
>>> the dtype, the number of dimensions and which dimensions are
>>> broadcastable for each variable. I think the last one could cause
>>> problems, but if you specialize on the dtype, the same can be done
>>> for the broadcastability of each dimension.
>> Hm, that would lead to kind of an explosion of combinations. I think
>> we could specialize only on no broadcasting at all (except for
>> operands with lesser dimensionality).
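Specializing only on "no broadcasting" keeps the number of variants at two instead of one per broadcast pattern. A minimal sketch of the runtime dispatch, assuming NumPy-style right-aligned broadcasting; the specialization names `"no_broadcast"` and `"generic"` and both helper functions are hypothetical. An operand with fewer dimensions still qualifies for the fast path as long as its dimensions match the trailing dimensions of the result exactly, i.e. no size-1 dimension has to be stretched.

```python
def broadcast_shape(shapes):
    """NumPy-style broadcast of a list of shapes (right-aligned)."""
    ndim = max(len(s) for s in shapes)
    padded = [(1,) * (ndim - len(s)) + tuple(s) for s in shapes]
    out = []
    for dims in zip(*padded):
        sizes = {d for d in dims if d != 1}
        if len(sizes) > 1:
            raise ValueError("shapes %r are not broadcastable" % (shapes,))
        out.append(sizes.pop() if sizes else 1)
    return tuple(out)

def choose_specialization(shapes):
    """Pick one of two hypothetical specializations.

    'no_broadcast': every operand equals the trailing dims of the result,
    so only missing leading dimensions are broadcast.
    'generic': some size-1 dimension must be stretched.
    """
    out = broadcast_shape(shapes)
    for s in shapes:
        if tuple(s) != out[len(out) - len(s):]:
            return "generic"
    return "no_broadcast"

print(choose_specialization([(4, 5), (5,)]))    # no_broadcast
print(choose_specialization([(4, 5), (1, 5)]))  # generic
```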
>>> - The compyte (GPU nd-array) project collapses dimensions. This is
>>> an important optimization on the GPU, as doing the index computation
>>> in parallel is costlier. I think on the CPU we could probably
>>> collapse just the inner dimensions to make it faster.
>>> - Theano doesn't generate intrinsics or assembly; we assume that g++
>>> will generate vectorized operations for simple loops. Recent versions
>>> of gcc/g++ do this.
>> Right, the aim is definitely to specialize for contiguous arrays,
>> where you collapse everything. Specializing statically for anything
>> more would be unfeasible, and better handled by a runtime compiler I
>> think. For the C backend, I'll start by generating simple C loops and
>> see if the compilers vectorize that already.
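Dimension collapsing, as discussed above, merges adjacent dimensions whenever the outer stride equals the inner stride times the inner extent, in every operand at once. A minimal sketch, assuming strides are given in elements (bytes would work the same if used consistently) and that size-1 dimensions get no special treatment; `collapse_dims` is a hypothetical helper. For fully contiguous operands everything collapses to a single flat dimension, which is the simple C loop a compiler can auto-vectorize.

```python
def collapse_dims(shape, strides_list):
    """Merge adjacent dimensions that are contiguous in every operand.

    shape: common iteration shape; strides_list: one stride tuple per
    operand. Returns the collapsed shape and per-operand strides.
    """
    if not shape:
        return (), [() for _ in strides_list]
    new_shape = [shape[0]]
    new_strides = [[s[0]] for s in strides_list]
    for i in range(1, len(shape)):
        # Outer (already merged) stride must equal inner stride * inner size.
        mergeable = all(
            st[-1] == s[i] * shape[i] for st, s in zip(new_strides, strides_list)
        )
        if mergeable:
            new_shape[-1] *= shape[i]
            for st, s in zip(new_strides, strides_list):
                st[-1] = s[i]
        else:
            new_shape.append(shape[i])
            for st, s in zip(new_strides, strides_list):
                st.append(s[i])
    return tuple(new_shape), [tuple(st) for st in new_strides]

# C-contiguous (2, 3, 4): collapses to one flat loop of 24 elements.
print(collapse_dims((2, 3, 4), [(12, 4, 1)]))  # ((24,), [(1,)])
# Every other row of a (4, 3, 4) array: only the inner two dims merge.
print(collapse_dims((2, 3, 4), [(24, 4, 1)]))  # ((2, 12), [(24, 1)])
```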
>>> - Our generated code for element-wise operations pays some attention
>>> to the memory access pattern: we swap dimensions so that we iterate
>>> over the dimensions with the smallest strides. But we don't go further.
>>> - What do you mean by CSE? Constant optimization?
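The dimension swapping described in the first point above can be sketched as choosing a loop order so that the innermost loop walks the smallest strides. This is a minimal sketch, not Theano's code: ranking each dimension by the sum of absolute strides over all operands is an assumed heuristic, and `loop_order` / `visit_offsets` are hypothetical helpers. For a Fortran-ordered array the reordered loops touch memory strictly sequentially.

```python
from itertools import product

def loop_order(strides_list):
    """Return dimension indices ordered outermost to innermost.

    Assumed heuristic: rank each dimension by the sum of absolute strides
    over all operands; the largest total goes outermost. Ties keep their
    original order because Python's sort is stable.
    """
    ndim = len(strides_list[0])
    cost = [sum(abs(s[d]) for s in strides_list) for d in range(ndim)]
    return sorted(range(ndim), key=lambda d: cost[d], reverse=True)

def visit_offsets(shape, strides, order):
    """Yield element offsets, looping over dims in `order` (outermost first)."""
    ranges = [range(shape[d]) for d in order]
    for idx in product(*ranges):
        yield sum(i * strides[d] for i, d in zip(idx, order))

# Fortran-ordered (2, 3, 4) array, strides in elements.
shape, strides = (2, 3, 4), (1, 2, 6)
order = loop_order([strides])
print(order)  # [2, 1, 0]
print(list(visit_offsets(shape, strides, order))[:6])  # [0, 1, 2, 3, 4, 5]
```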
>> Yes, common subexpression elimination and also hoisting of unchanging
>> expressions outside the loop.
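Both optimizations can be illustrated together on a toy expression tree. A minimal sketch, assuming hypothetical leaf kinds: `("elem", name)` stands for `name[i]` and varies with the loop index, while `("scalar", name)` and `("const", value)` do not. Identical subtrees share one temporary via a dictionary keyed on the subtree (CSE), and temporaries that depend on no loop-varying leaf are placed in a preheader that runs once before the loop (hoisting). This is one simple way to do it, not Cython's implementation.

```python
OPS = {"add": "+", "sub": "-", "mul": "*", "div": "/"}

def schedule(expr):
    """Lower an expression tree into loop-invariant and per-iteration temps.

    Returns (preheader, body, result): lists of (temp, rhs) assignments and
    the name holding the final value.
    """
    temps = {}       # subtree -> (temp name, varies with i)  (CSE table)
    preheader = []   # computed once, before the loop
    body = []        # computed on every iteration

    def visit(node):
        kind = node[0]
        if kind == "elem":
            return "%s[i]" % node[1], True
        if kind == "scalar":
            return node[1], False
        if kind == "const":
            return repr(node[1]), False
        if node in temps:  # common subexpression: reuse the temporary
            return temps[node]
        lhs, lvar = visit(node[1])
        rhs, rvar = visit(node[2])
        name = "t%d" % len(temps)
        varies = lvar or rvar
        target = body if varies else preheader  # hoist invariant work
        target.append((name, "%s %s %s" % (lhs, OPS[kind], rhs)))
        temps[node] = (name, varies)
        return name, varies

    result, _ = visit(expr)
    return preheader, body, result

# out[i] = (a * b) * x[i] + (a * b) * y[i]
ab = ("mul", ("scalar", "a"), ("scalar", "b"))
expr = ("add", ("mul", ab, ("elem", "x")), ("mul", ab, ("elem", "y")))
preheader, body, result = schedule(expr)
print(preheader)  # [('t0', 'a * b')]
print(body)       # [('t1', 't0 * x[i]'), ('t2', 't0 * y[i]'), ('t3', 't1 + t2')]
```

Here `a * b` is computed once instead of twice per iteration; a code generator would emit the preheader assignments before the loop header and the body assignments inside it.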
>> I started a new project, https://github.com/markflorisson88/minivect ,
>> which currently features a simple C code generator. The specializer
>> and astbuilder do most of the work of creating the right AST, so the
>> code generator only has to implement code generation functions for
>> simple expressions. Depending on how it progresses I will look at
>> incorporating Theano's optimizations into it and having Theano use it
>> as a C backend for compatible expressions.
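To illustrate the division of labour described above, here is a minimal sketch of a code generator that only knows how to render simple expressions, wrapped in a loop that some earlier stage is assumed to have specialized for collapsed, contiguous operands. All names here (`CCodeGen`, `visit_elem`, `emit_contiguous_kernel`) and the tuple-based AST are hypothetical and are not minivect's actual API.

```python
class CCodeGen:
    """Toy C emitter: one visit_* method per node kind (hypothetical)."""

    ops = {"add": "+", "sub": "-", "mul": "*", "div": "/"}

    def visit(self, node):
        kind = node[0]
        if kind in self.ops:
            return self.visit_binop(node)
        return getattr(self, "visit_" + kind)(node)

    def visit_elem(self, node):
        # Operands are assumed contiguous, so one flat index suffices.
        return "%s[i]" % node[1]

    def visit_const(self, node):
        return repr(float(node[1]))

    def visit_binop(self, node):
        return "(%s %s %s)" % (
            self.visit(node[1]), self.ops[node[0]], self.visit(node[2]))

def emit_contiguous_kernel(name, out, expr, args, dtype="double"):
    """Wrap the rendered expression in a single flat C loop."""
    params = (["%s *%s" % (dtype, out)]
              + ["const %s *%s" % (dtype, a) for a in args]
              + ["size_t n"])
    return (
        "void %s(%s)\n{\n"
        "    for (size_t i = 0; i < n; i++)\n"
        "        %s[i] = %s;\n"
        "}\n" % (name, ", ".join(params), out, CCodeGen().visit(expr))
    )

expr = ("add", ("mul", ("elem", "a"), ("elem", "b")), ("const", 2))
print(emit_contiguous_kernel("kernel", "out", expr, ["a", "b"]))
```

The generated function is a plain loop of the kind gcc can usually auto-vectorize at higher optimization levels; whether it actually does so is worth checking per compiler, as suggested above.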
> I forgot to mention, it's still pretty basic, but it works for simple
> arithmetic expressions with non-overlapping (shifted) memory from
> Cython: https://github.com/markflorisson88/cython/commit/2c316abdbc1228597bbdf480f737a59213ee9532#L4R1
So basically, this project is to be used as a git submodule in Cython,
and to be shipped directly in the source distribution. Is there any
objection to that?