[SciPy-Dev] GSoC 2018: Cythonizing
Lars G.
lagru at mailbox.org
Sat Mar 3 02:38:11 EST 2018
On 01.03.2018 16:40, Todd wrote:
> The first issue listed in the roadmap, convolution, is a much more
> complicated issue than that description makes out. There are a few
> issues, some with some overlap behind-the-scenes:
>
> 1. As discussed, there are a bunch of different implementations that
> use different algorithms, each working better in different scenarios.
> Ideally there would be one "master" function that picks the best
> algorithm for a given set of parameters. This depends on the number
> of dimensions to be convolved over, the size of the first signal, and
> the size of the second signal. Changing any one of these can change
> which implementation is optimal, or even usable. So, for vectors, a
> different algorithm is best when one vector is short, when both
> vectors are long but one is much longer, and when both are long and
> of similar length.
> 2. We don't have the best algorithms implemented for all of these
> scenarios. For example, the "both vectors are long but one is much
> longer" scenario is best served by the overlap-add algorithm, which
> scipy doesn't have. Similarly, an FFT-based correlation equivalent to
> fftconvolve isn't implemented, nor are 2-D and n-D versions of FFT
> convolution and correlation, etc.
> 3. The implementations only work over the number of dimensions they
> apply to: the 1-D implementations can only take vectors, the 2-D
> implementations can only take 2-D arrays, etc. There is no way to,
> say, apply a filter along the second dimension of a 3-D signal. In
> order to implement the "master" function, at least one implementation
> (and ideally all of them) should be applicable across additional
> dimensions.
>
> And there is overlap between these. For example I mention the
> overlap-add method in point 2, but that would most likely be implemented
> in part by applying across dimensions as mentioned in point 3.
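For reference, the overlap-add method mentioned above is straightforward to sketch: split the long signal into blocks, convolve each block with the short filter via the FFT, and add the overlapping tails. The function name and block size here are illustrative, not a proposed API.

```python
import numpy as np

def overlap_add(x, h, block=1024):
    """Convolve a long signal x with a short filter h via overlap-add
    (a minimal sketch; block size is not tuned)."""
    m = len(h)
    nfft = block + m - 1          # long enough to avoid circular wrap-around
    H = np.fft.rfft(h, nfft)      # filter spectrum, computed once
    y = np.zeros(len(x) + m - 1)
    for start in range(0, len(x), block):
        seg = x[start:start + block]
        # Linear convolution of one block with the filter.
        y_seg = np.fft.irfft(np.fft.rfft(seg, nfft) * H, nfft)
        # Each block's result overlaps the next by m - 1 samples.
        y[start:start + len(seg) + m - 1] += y_seg[:len(seg) + m - 1]
    return y
```

Because the filter spectrum is reused for every block, this is much cheaper than one giant FFT when `len(x) >> len(h)`.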
>
> A lot of these issues apply elsewhere in scipy.signal. For example the
> stft/spectrogram uses a slow, naive implementation. A lot of the
> functions don't support applying across multidimensional arrays (for
> example to create a filter bank).
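As a baseline for point 3, a 1-D convolution can already be mapped over one axis of an n-D array with `numpy.apply_along_axis`, though this loops in Python and is exactly the kind of slow fallback a proper n-D-aware implementation would replace (the array shapes below are arbitrary examples):

```python
import numpy as np

x = np.random.rand(4, 5, 128)        # e.g. a batch of 4x5 signals, 128 samples each
h = np.array([0.25, 0.5, 0.25])      # short smoothing filter

# Apply a 1-D convolution along the last axis of the 3-D array.
# apply_along_axis calls np.convolve once per 1-D slice (a Python-level
# loop), so this is convenient but slow for large batches.
y = np.apply_along_axis(np.convolve, 2, x, h, mode="same")
```

With `mode="same"` the output has the same shape as the input, which is what a filter-bank-style API would typically want.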
So you're saying this could be a possible GSoC project? Because this
does sound the most interesting to me so far.
To make sure I understand this correctly:
- I would work with the two modules `signal` and `ndimage`, as well as
NumPy (`numpy.convolve`)?
- I would unify, redesign, and extend the parts of the API that deal
with convolution, with the goal of covering the most common use cases
and minimizing overlap?
- Is somebody willing to mentor this?
- Required knowledge would include the different algorithms used to
implement convolution, as well as optimization, Python, Cython, C, ...?
- How would you judge the size and difficulty of this task?
Thank you all for the feedback so far. :)
Best regards,
Lars