[Cython] gsoc: array expressions

Robert Bradshaw robertwb at gmail.com
Tue May 29 23:26:35 CEST 2012


On Mon, May 28, 2012 at 5:54 AM, mark florisson
<markflorisson88 at gmail.com> wrote:
> On 28 May 2012 13:52, mark florisson <markflorisson88 at gmail.com> wrote:
>> On 28 May 2012 13:49, mark florisson <markflorisson88 at gmail.com> wrote:
>>> On 25 May 2012 21:53, Frédéric Bastien <nouiz at nouiz.org> wrote:
>>>> Hi,
>>>>
>>>>
>>>> Sorry for the delay, I had some schedule change.
>>>>
>>>> thanks for adding me. Should I subscribe to cython-dev? How much email
>>>> daily there is? I didn't found this on the archives. Fell free to add
>>>> me in CC again when you think it is appropriate.
>>>
>>> There is usually not so much traffic on cython-dev, unless something
>>> comes up that is debated to the death :)
>>>
>>>> I'll reply here to all email at the same time. Do you prefer that I
>>>> reply to each email individually if this happen again? I'll try to
>>>> reply faster next time.
>>>
>>> No worries, either way works fine, don't worry too much about protocol
>>> (the only thing to note is that we do bottom posting).
>>>
>>>> - About pickling theano, we currently can't pick Theano function. It
>>>> could be made to work in some cases, but not for all cases as there is
>>>> hardware dependent optimization in the Theano function. Currently it
>>>> is mostly CPU vs GPU operation. So if we stay on the CPU, we could do
>>>> some pickling, but we should make sure that the compiled c code into
>>>> python module are still there when we unpickle or recompile them.
>>>>
>>>> - I think it make sense to make a theano graph from cython ast,
>>>> optimize and redo a cython ast from the optimized graph. This would
>>>> allow using Theano optimizations.
>>>
>>> Ok, the important thing is that the graph can be pickled, it should be
>>> pretty straightforward to generate code to build the function again
>>> from the loaded graph.
>>>
>>>> - It also make sense to do the code generation in Theano and reuse it
>>>> in Cython. But this would make the Theano dependency much stronger.
>>>> I'm not sure you want this.
>>>>
>>>>
>>>> - Another point not raised, theano need to know at compile time is the
>>>> dtype, number of dimensions and witch dimensions are broadcastable for
>>>> each variable. I think that the last one could cause problem, but if
>>>> you use specialization for the dtype, the same can be done for the
>>>> broadcsatability of a dimensions.
>>>
>>> Hm, that would lead to kind of an explosion of combinations. I think
>>> we could specialize only on no broadcasting at all (except for
>>> operands with lesser dimensionality).
>>>
>>>> - The compyte(gpu nd array) project do collapsing of dimensions. This
>>>> is an important optimization on the GPU as doing the index computation
>>>> in parallel is costlier. I think on the CPU we could probably do
>>>> collapsing just of the inner dimensions to make it faster.
>>>>
>>>> - Theano don't generate intrinsect or assembly, but we suppose that
>>>> g++ will generate vectorized operation for simple loop. Recent version
>>>> of gcc/g++ do this.
>>>
>>> Right, the aim is definitely to specialize for contiguous arrays,
>>> where you collapse everything. Specializing statically for anything
>>> more would be unfeasible, and better handled by a runtime compiler I
>>> think. For the C backend, I'll start by generating simple C loops and
>>> see if the compilers vectorize that already.
>>>
>>>> - Our generated code for element-wise operation take care a little
>>>> about the memory access pattern. We swap dimensions to iterate on the
>>>> dimensions with the smallest strides. But we don't go further.
>>>>
>>>> - What do you mean by CSE? Constant  optimization?
>>>
>>> Yes, common subexpression elimination and also hoisting of unchanging
>>> expressions outside the loop.
>>>
>>>> Fred
>>>
>>> I started a new project, https://github.com/markflorisson88/minivect ,
>>> which currently features a simple C code generator. The specializer
>>> and astbuilder do most of the work of creating the right AST, so the
>>> code generator only has to implement code generation functions for
>>> simple expressions. Depending on how it progresses I will look at
>>> incorporating Theano's optimizations into it and having Theano use it
>>> as a C backend for compatible expressions.
>>
>> I forgot to mention, it's still pretty basic, but it works for simple
>> arithmetic expressions with non-overlapping (shifted) memory from
>> Cython: https://github.com/markflorisson88/cython/commit/2c316abdbc1228597bbdf480f737a59213ee9532#L4R1
>
> So basically, this project is to be used as a git submodule in Cython,
> and to be shipped directly in the source distribution. Is there any
> objection to that?

I'm not sure this is the best long-term solution (the alternative
would be making it part of Cython or adding a dependency) but I think
that's fine for now. I'm assuming there that the end user doesn't
explicitly reference it, right? It's just an optimization if present.

- Robert


More information about the cython-devel mailing list