Hi Christoph,

Thank you for your detailed reply! 
 
Why are you interpolating such a huge system?  What are you trying to
achieve?  What is the width of the interpolation “bumps” that you are
using and what is the size of your output array?  (Or in other words:
how are you calling the interpolator?)

We want to examine quantities on scales much larger than the lattice scale, and average over lattice scale features. See attached plot of the density in our simulation of transverse magnetic focussing. The left is the raw density data and the right is the interpolated data with the relwidth parameter set to 0.05. Evidently, the focussing peaks really stand out in the interpolated plot. I'm using the interpolator as follows:

density  = sum(density_operator(mode) for mode in modes)
density_interpolated, boundaries  = kwant.plotter.interpolate_density(sys, density, relwidth=0.05)
 

> Is there any way to speed up the interpolation routines? I see that
> the present implementation in plotter.py involves a for loop (in
> python) over each site/hopping.

Are you using streamplot or just interpolating currents/densities?
Kwant’s streamplot routine piggy backs on matplotlib’s streamplot and in
a typical use case (e.g. the attached script) matplotlib’s streamplot
will take half of the time that Kwant’s interpolation routine takes.  So
even if the interpolator produced its results instantly, current
plotting would become only three times faster in such applications.

Just interpolating currents and densities and then plotting at a later stage.
 

The interpolator could be optimized further in several ways - as always
it’s a compromise between development effort and speed.  But the
application that it was created for (i.e. figures for publications)
tends to involve systems of size 100^2 or so and then its speed is quite
satisfactory (see attached example).

Creating figures from much larger systems only makes sense when the
interpolation is smoothing things out a lot (i.e. the “bumps” are much
larger than a hopping), otherwise there would be too much detail in the
figure. 

I think you mean something like the plot I attached?

In this regime the current implementation will be inefficient
because its vectorization (see further below) will be over small
subarrays. 

> Perhaps a vectorized implementation may help?

Actually, it is already vectorized - otherwise it would be much slower!
The main loop inside _interpolate_field() goes indeed sequentially over
all hoppings/densities, but for each such hopping/density the output
field in its vicinity is updated using vectorized numpy operations.

A straightforward brute-force way to speed up the interpolation would be
to split up the work of that loop over multiple cores - each would have
its own output array, and in the end they would be all summed together.
That should be quite easy to do [1].

A complementary approach would be to rewrite the inner loop in a fast
compiled language.  This would help especially in the case where the
interpolation bump does not cover many points of the output array and as
such the existing vectorization is not very good (the involved subarrays
are small).

Do you think this can be achieved using Cython?
 

Yet another approach would be to create a special implementation for the
common case of a rectangular lattice that is commensurate with the
output array of the interpolator.

We're indeed using a rectangular lattice!
 

All three approaches could be combined.  To go further, someone who is
really motivated could even implement the interpolation algorithm on
GPUs...

That would be a bit much. I think a more optimized CPU implementation will already solve the problem.
 

If you or someone else would like to work on this I would be happy to
guide you.


Thanks! I can try Cythonizing or vectorizing the loop. 

More generally, what do you think about simply using a filtering method (say gaussian filter) in scipy.ndimage (https://docs.scipy.org/doc/scipy/reference/reference/generated/scipy.ndimage.gaussian_filter.html#scipy.ndimage.gaussian_filter) or one of the many in scipy.signal (https://docs.scipy.org/doc/scipy/reference/reference/signal.html)? Are you using a special kind of filter?

Cheers,
Mani