Non-local means denoising

Sun Jan 26 02:22:11 EST 2014

On Sun, 26 Jan 2014 01:01:49 +0100
Stéfan van der Walt <stefan at sun.ac.za> wrote:

> On Sun, 26 Jan 2014 10:52:46 +1100, Juan Nunez-Iglesias wrote:
> > I do remember these, but it's a worthwhile conversation to bring up every
> > so often as tools mature and more people contribute. To be honest I know
> > next to nothing about CUDA and GPU programming, I just see the success
> > stories fly by. =)
> 
> Those are the only stories people like to tell, but they forget to mention
> that GPU implementations often need to be tweaked for each platform--and
> tweaks can be fairly extensive, including buffer and pool sizes, algorithm
> configurations etc.  It is not the kind of complication I would like to deal
> with--but, like you say, it is good to evaluate this from time to time.

Hi Stefan,

While I can confirm that the best performance can only be obtained with
optimized kernels, our OpenCL implementation of SIFT is able to choose the
right implementation at runtime, depending on the architecture and on the
compute capabilities of the selected device...
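
As a rough illustration of that kind of runtime dispatch (a minimal sketch,
not the actual SIFT code): the DeviceInfo class, the thresholds and the
variant names below are all hypothetical. In practice the values would come
from an OpenCL device query (CL_DEVICE_TYPE, CL_DEVICE_MAX_WORK_GROUP_SIZE,
CL_DEVICE_LOCAL_MEM_SIZE).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceInfo:
    """Hypothetical summary of the properties reported by a device query."""
    device_type: str          # "GPU" or "CPU"
    max_work_group_size: int  # largest work-group the device accepts
    local_mem_size: int       # local (shared) memory per work-group, in bytes


def select_kernel_variant(dev: DeviceInfo) -> str:
    """Return the name of the kernel variant to build for this device.

    The thresholds are illustrative assumptions, not measured values.
    """
    if dev.device_type == "CPU":
        # CPU drivers often only support small work-groups.
        return "cpu_small_workgroup"
    if dev.max_work_group_size >= 256 and dev.local_mem_size >= 32 * 1024:
        # Enough local memory to stage tiles of the image in it.
        return "gpu_local_memory"
    # Fallback that makes no assumption about local memory.
    return "gpu_generic"


if __name__ == "__main__":
    # Made-up device descriptions, for demonstration only.
    for dev in (
        DeviceInfo("GPU", 1024, 48 * 1024),
        DeviceInfo("GPU", 128, 16 * 1024),
        DeviceInfo("CPU", 1, 32 * 1024),
    ):
        print(dev, "->", select_kernel_variant(dev))
```

The point is only that the choice is made once, from properties the driver
reports, so the user does not have to tune anything by hand.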

Moreover, devices are becoming smarter and smarter (since Fermi, there
is a memory cache!)

Cheers,
-- 
Jerome Kieffer <google at terre-adelie.org>