[Cython] gsoc: array expressions

mark florisson markflorisson88 at gmail.com
Wed May 30 00:22:01 CEST 2012


On 29 May 2012 22:26, Robert Bradshaw <robertwb at gmail.com> wrote:
> On Mon, May 28, 2012 at 5:54 AM, mark florisson
> <markflorisson88 at gmail.com> wrote:
>> On 28 May 2012 13:52, mark florisson <markflorisson88 at gmail.com> wrote:
>>> On 28 May 2012 13:49, mark florisson <markflorisson88 at gmail.com> wrote:
>>>> On 25 May 2012 21:53, Frédéric Bastien <nouiz at nouiz.org> wrote:
>>>>> Hi,
>>>>>
>>>>>
>>>>> Sorry for the delay, I had some schedule change.
>>>>>
>>>>> thanks for adding me. Should I subscribe to cython-dev? How much email
>>>>> daily there is? I didn't found this on the archives. Fell free to add
>>>>> me in CC again when you think it is appropriate.
>>>>
>>>> There is usually not so much traffic on cython-dev, unless something
>>>> comes up that is debated to the death :)
>>>>
>>>>> I'll reply here to all email at the same time. Do you prefer that I
>>>>> reply to each email individually if this happen again? I'll try to
>>>>> reply faster next time.
>>>>
>>>> No worries, either way works fine, don't worry too much about protocol
>>>> (the only thing to note is that we do bottom posting).
>>>>
>>>>> - About pickling theano, we currently can't pick Theano function. It
>>>>> could be made to work in some cases, but not for all cases as there is
>>>>> hardware dependent optimization in the Theano function. Currently it
>>>>> is mostly CPU vs GPU operation. So if we stay on the CPU, we could do
>>>>> some pickling, but we should make sure that the compiled c code into
>>>>> python module are still there when we unpickle or recompile them.
>>>>>
>>>>> - I think it make sense to make a theano graph from cython ast,
>>>>> optimize and redo a cython ast from the optimized graph. This would
>>>>> allow using Theano optimizations.
>>>>
>>>> Ok, the important thing is that the graph can be pickled, it should be
>>>> pretty straightforward to generate code to build the function again
>>>> from the loaded graph.
>>>>
>>>>> - It also make sense to do the code generation in Theano and reuse it
>>>>> in Cython. But this would make the Theano dependency much stronger.
>>>>> I'm not sure you want this.
>>>>>
>>>>>
>>>>> - Another point not raised, theano need to know at compile time is the
>>>>> dtype, number of dimensions and witch dimensions are broadcastable for
>>>>> each variable. I think that the last one could cause problem, but if
>>>>> you use specialization for the dtype, the same can be done for the
>>>>> broadcsatability of a dimensions.
>>>>
>>>> Hm, that would lead to kind of an explosion of combinations. I think
>>>> we could specialize only on no broadcasting at all (except for
>>>> operands with lesser dimensionality).
>>>>
>>>>> - The compyte(gpu nd array) project do collapsing of dimensions. This
>>>>> is an important optimization on the GPU as doing the index computation
>>>>> in parallel is costlier. I think on the CPU we could probably do
>>>>> collapsing just of the inner dimensions to make it faster.
>>>>>
>>>>> - Theano don't generate intrinsect or assembly, but we suppose that
>>>>> g++ will generate vectorized operation for simple loop. Recent version
>>>>> of gcc/g++ do this.
>>>>
>>>> Right, the aim is definitely to specialize for contiguous arrays,
>>>> where you collapse everything. Specializing statically for anything
>>>> more would be unfeasible, and better handled by a runtime compiler I
>>>> think. For the C backend, I'll start by generating simple C loops and
>>>> see if the compilers vectorize that already.
>>>>
>>>>> - Our generated code for element-wise operation take care a little
>>>>> about the memory access pattern. We swap dimensions to iterate on the
>>>>> dimensions with the smallest strides. But we don't go further.
>>>>>
>>>>> - What do you mean by CSE? Constant  optimization?
>>>>
>>>> Yes, common subexpression elimination and also hoisting of unchanging
>>>> expressions outside the loop.
>>>>
>>>>> Fred
>>>>
>>>> I started a new project, https://github.com/markflorisson88/minivect ,
>>>> which currently features a simple C code generator. The specializer
>>>> and astbuilder do most of the work of creating the right AST, so the
>>>> code generator only has to implement code generation functions for
>>>> simple expressions. Depending on how it progresses I will look at
>>>> incorporating Theano's optimizations into it and having Theano use it
>>>> as a C backend for compatible expressions.
>>>
>>> I forgot to mention, it's still pretty basic, but it works for simple
>>> arithmetic expressions with non-overlapping (shifted) memory from
>>> Cython: https://github.com/markflorisson88/cython/commit/2c316abdbc1228597bbdf480f737a59213ee9532#L4R1
>>
>> So basically, this project is to be used as a git submodule in Cython,
>> and to be shipped directly in the source distribution. Is there any
>> objection to that?
>
> I'm not sure this is the best long-term solution (the alternative
> would be making it part of Cython or adding a dependency) but I think
> that's fine for now. I'm assuming there that the end user doesn't
> explicitly reference it, right? It's just an optimization if present.
>
> - Robert
> _______________________________________________
> cython-devel mailing list
> cython-devel at python.org
> http://mail.python.org/mailman/listinfo/cython-devel

The only gotcha is that when you checkout Cython from github you will
need to update the submodule. It's not really an optimization, it's an
implementation, that is the Cython AST needs to be mapped onto the new
AST, and then it generates the C code from that. I'm currently working
on interweaving this AST with Cython's AST, to support operations on
objects and complex numbers, as well as provide Cython semantics for
division and such (complex numbers are working now).

If it's all working, it might be possible to create an LLVM backend
with reasonably ease as well for our vector expressions, to provide
optimal just-in-time specializations.


More information about the cython-devel mailing list