[Numpy-discussion] NumPy to CPU+GPU compiler, looking for tests

Frédéric Bastien nouiz at nouiz.org
Mon Oct 29 11:49:31 EDT 2012


That is possible.

The gpu nd array project I mentioned above works on the CPU and the GPU
with OpenCL and with CUDA. But there is a lot of NumPy functionality that
we have not ported yet.

We have started integrating it into Theano. This means the GPU code from
Theano will be ported to this project, so more code will be
available later.

If some people are willing to help or collaborate, don't hesitate. I think
we should collaborate much more on GPU/OpenCL code than we do now.
That was part of the goal of gpu nd array. The PyCUDA and
PyOpenCL authors also collaborate with us.

Fred

On Mon, Oct 29, 2012 at 11:26 AM, Henry Gomersall <heng at cantab.net> wrote:
> On Mon, 2012-10-29 at 11:11 -0400, Frédéric Bastien wrote:
>> > Assuming of course all the relevant backends are up to scratch.
>> >
>> > Is there a fundamental reason why targeting a CPU through OpenCL is
>> > worse than doing it exclusively through C or C++?
>>
>> First, OpenCL does not allow us to do pointer arithmetic. So when
>> taking a slice of an ndarray, we can't just move the pointer. So we
>> need to change the object structure.
>>
>> I didn't do any speed analysis of this, but I think that using
>> OpenCL would have a bigger overhead. So it is only useful for
>> "big" ndarrays. I don't have any size in mind either. I don't know, but if
>> we could access the OpenCL data directly from C/C++, we could bypass
>> this for small arrays if we want. But maybe this is not possible!
>
> My understanding is that when running OpenCL on CPU, one can simply map
> memory from a host pointer using CL_MEM_USE_HOST_PTR during buffer
> creation. On a CPU, this will result in no copies being made.
>
> The overhead is clearly an issue, and was the subject of my question. I
> wouldn't be surprised to find that the speedup associated with the free
> multithreading that comes with OpenCL on CPU, along with the vector data
> types mapping nicely to SSE etc, would make OpenCL on CPU faster on any
> reasonably sized array.
>
> It strikes me that if there is a neat way in which numpy objects can be
> represented by coherent versions in both main memory and device memory,
> then OpenCL could be used when it makes sense (either on CPU or GPU),
> and the CPU natively when _it_ makes sense.
>
> Henry
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
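
To illustrate the pointer arithmetic point quoted above: if the buffer
is an opaque handle, a slice cannot be a moved pointer, so the array
object has to carry an explicit offset next to the handle. The sketch
below is only an illustration of that idea in plain Python, not the
actual gpu nd array layout. The class name BufferView and its fields are
hypothetical, and a Python list stands in for the device buffer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BufferView:
    """Hypothetical 1-D view: opaque buffer handle plus offset/length/stride."""
    buffer: list      # stand-in for an opaque device buffer handle
    offset: int = 0   # index of the first element, counted in elements
    length: int = 0   # number of elements visible through this view
    stride: int = 1   # step between consecutive elements, in elements

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Slicing never touches the buffer handle; only the
            # bookkeeping (offset, length, stride) changes.
            start, stop, step = key.indices(self.length)
            new_length = len(range(start, stop, step))
            return BufferView(self.buffer,
                              self.offset + start * self.stride,
                              new_length,
                              self.stride * step)
        if not -self.length <= key < self.length:
            raise IndexError("index out of range")
        key %= self.length
        return self.buffer[self.offset + key * self.stride]


data = list(range(10))
base = BufferView(data, 0, len(data), 1)
view = base[2:8:2]   # same buffer, offset 2, length 3, stride 2
print([view[i] for i in range(view.length)])  # prints [2, 4, 6]
```

A kernel would then receive the buffer together with the offset and
stride as extra arguments, instead of a pre-shifted pointer.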

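On using OpenCL only when it makes sense: one simple way to picture it
is a size-based dispatch, where small arrays stay on the native CPU path
and large ones go through OpenCL. This is a rough sketch under
assumptions. The function name choose_backend and the threshold value
are hypothetical, nobody in this thread measured a crossover size, and a
real value would have to come from benchmarks on the target hardware.

```python
# Placeholder only: NOT a measured value, it must be tuned by benchmarking.
HYPOTHETICAL_THRESHOLD = 100_000


def choose_backend(n_elements, opencl_available,
                   threshold=HYPOTHETICAL_THRESHOLD):
    """Return "opencl" for large arrays when OpenCL is usable, else "native"."""
    if opencl_available and n_elements >= threshold:
        return "opencl"
    # Small arrays (or no OpenCL runtime): avoid the launch overhead.
    return "native"


for size in (1_000, 1_000_000):
    print(size, choose_backend(size, opencl_available=True))
```

This only helps if both paths can see the same data without a copy,
which is the open question in the quoted discussion.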

