[Numpy-discussion] Correlation filter

josef.pktd at gmail.com josef.pktd at gmail.com
Fri Nov 20 14:17:09 EST 2009


On Fri, Nov 20, 2009 at 1:51 PM, Keith Goodman <kwgoodman at gmail.com> wrote:
> On Fri, Nov 20, 2009 at 8:53 AM,  <josef.pktd at gmail.com> wrote:
>> scipy.signal.correlate  would be fast, but it will not be easy to
>> subtract the correct moving mean. Subtracting a standard moving mean
>> would subtract different values for each observation in the window.
>>
>> One possibility would be to look at a moving regression and just take
>> the estimated slope parameter. John D'Errico
>> http://www.mathworks.com/matlabcentral/fileexchange/16997-movingslope
>> (BSD licensed)
>> uses a very nice trick with pinv to get the filter to calculate a
>> moving slope coefficient. I read the source but didn't try it out and
>> didn't try to figure out how the pinv trick exactly works.
>> If this can be adapted to your case, then this would be the fastest I
>> can think of (pinv and scipy.signal.correlate would do everything in
>> C, or maybe one (500) loop might still be necessary)
>>
>> For just getting a ranking on a linear relationship, there might be
>> other tricks possible, local linear regression, ... (?), but I never
>> tried. Also, I think with one time loop, you can do all cross section
>> regressions at the same time, without tricks.
>
> Those sound like good ideas.
>
> I was able to get rid of the for loop (and cut the time in half) by
> using lfilter from scipy:
>
> def corr3(x, y):
>    x = x - x.mean()
>    x /= x.std()
>    nx = x.size
>    one = np.ones(nx)
>    xy = lfilter(x, 1, y)
>    sy = lfilter(one, 1, y)
>    sy2 = lfilter(one, 1, y*y)
>    d = xy / np.sqrt(nx * sy2 - sy * sy)
>    return d

Is this correct? xy uses the original y and not a demeaned y.

I think your use of lfilter only uses features that are also available
with correlate, and signal correlate can handle ndim arrays.
I use correlate in my moving moment calculations
http://bazaar.launchpad.net/~scipystats/statsmodels/trunk/annotate/head%3A/scikits/statsmodels/sandbox/tsa/movstat.py#L207

but I think I only tested 1d.

Josef


>
> (Somehow I managed to flip the sign of the correlation in corr3.)
>
> I can trim a little more time by doing some of the operations in place
> but then the code becomes hard to read. This solution will work well
> for me since I can move much of the calculation outside of the
> function and reuse it across many calls (y does not change often
> compared to x).
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>



More information about the NumPy-Discussion mailing list