reimplementation of lfilter

Looking at this atrocious piece of C: http://svn.scipy.org/svn/scipy/trunk/scipy/signal/lfilter.c.src I got some inspiration for reusing some of my other C code to reimplement scipy.signal.lfilter. I basically just had to write this small C function + a Cython wrapper: static void lfilter_1d(double *restrict b, double *restrict a, double *x, double *y, double *zi, npy_intp K, npy_intp N) { double Z[K]; double *restrict pz = zi, *restrict z = Z; register npy_intp n, k; for (n=0; n<N; n++) { register double yn, xn = x[n]; yn = y[n] = b[0] * xn + pz[0]; for (k=0; k<K-1; k++) z[k] = b[k+1] * xn + pz[k+1] - a[k+1]*yn; z[K-1] = b[K]*xn - a[K]*yn; double *tmp = pz; pz = z; z = tmp; } if (pz != zi) memcpy(zi, pz, K*sizeof(double)); } Relative speed compared to scipy.signal.lfilter: One processor: 1D array, 1000000 floats: 150% 2D array, 1000 x 1000 floats, axis=0: 80% 2D array, 1000 x 1000 floats, axis=1: 150% It is faster than lfilter except for 2D with axis=0. For 2D with axis=1 it is probably faster than lfilter due to copy-in copy-out optimization. For arrays with more than one dimension, multithreading helps as well: Four processors: 1D array, 1000000 floats: 150% 2D array, 1000 x 1000 floats, axis=0: 160% 2D array, 1000 x 1000 floats, axis=1: 250% Multithreading obviously does not help for filtering single vector, as there is no work to share. Some other improvements over signal.lfilter, apart from readable Cython code: * GIL is released. The C code is not littered with Python C API calls that prevents this. * Filtering can be done in reverse, i.e. no need for fliplr or flipud for filtering backwards. * Arrays can be filtered in-place. * Cleaned up docstring. Still it only work as a "proof of concept" with double precision floats. Adding support for other dtypes would be easy, just replace double with any of: float float _Complex double _Complex long double long double _Complex Regards, Sturla Molden # Copyright (C) 2009 Sturla Molden import numpy as np cimport numpy as np cdef extern int _linear_filter( object a, object b, object x, object y, object z, int axis, int direction) def lfilter(object B, object A, object X, object zi=None, object axis = None, int direction=1, object inplace=False): """ Filter data along one-dimension with an IIR or FIR filter. Description =========== Filter a data sequence, x, using a digital filter. This works for many fundamental data types (including Object type). The filter is a direct form II transposed implementation of the standard difference equation (see "Algorithm"). Inputs: ======= b : vector-like The numerator coefficient vector in a 1-D sequence. a : vector-like The denominator coefficient vector in a 1-D sequence. If a[0] is not 1, then both a and b are normalized by a[0]. x : array-like An N-dimensional input array. axis : int or None The axis of the input data array along which to apply the linear filter. The filter is applied to each subarray along this axis. If axis is None, the filter is applied to x flattened. Default: axis=None. zi : array-like Initial conditions for the filter delays. It is a vector (or array of vectors for an N-dimensional input) of length max(len(a),len(b)). If zi=None or is not given then initial rest is assumed. If axis is None, zi should be a vector of length K = max(M,N), where M=len(b)-1 and N=len(a)-1. If an axis is not None, zi should either be a vector of length K, or an array with same shape as x, except for zi.shape[axis] which should be K. If zi is a vector, the same initial conditions are applied to each vector along the specified axis. Default: zi=None (initial rest). See signal.lfiltic for more information. direction : int If negative, the filter is applied in reverse direction along the axis. Default: direction=1 (filter forward). inplace : bool If True, allow x or zi to be modified inplace if possible. Default: False. Outputs: (y, {zf}) ======= y : ndarray The output of the digital filter. zf : ndarray If zi is None, this is not returned, otherwise, zf holds the final filter delay values. Algorithm: ========== The filter function is implemented as a direct II transposed structure. This means that the filter implements a[0]*y[n] = b[0]*x[n] + b[1]*x[n-1] + ... + b[nb]*x[n-nb] - a[1]*y[n-1] - ... - a[na]*y[n-na] using the following difference equations: y[m] = b[0]*x[m] + z[0,m-1] z[0,m] = b[1]*x[m] + z[1,m-1] - a[1]*y[m] ... z[n-3,m] = b[n-2]*x[m] + z[n-2,m-1] - a[n-2]*y[m] z[n-2,m] = b[n-1]*x[m] - a[n-1]*y[m] where m is the output sample number and n=max(len(a),len(b)) is the model order. The rational transfer function describing this filter in the z-transform domain is -1 -nb b[0] + b[1]z + ... + b[nb] z Y(z) = ---------------------------------- X(z) -1 -na a[0] + a[1]z + ... + a[na] z """ cdef np.ndarray b, a, x, y, z cdef np.npy_intp i, ierr cdef double tmp cdef object K, zshape, iaxis # get filter coeffs b, a A = np.asarray(A) B = np.asarray(B) if (A.ndim != 1) or (B.ndim != 1): raise ValueError, 'a and b must be vectors' K = max(len(A)-1,len(B)-1) b = np.zeros(K+1) a = np.zeros(K+1) a[:A.shape[0]] = A[:] b[:B.shape[0]] = B[:] # normalize by a[0] if a[0] != 1.0: tmp = a[0] for i in range(a.shape[0]): a[i] /= tmp b[i] /= tmp # set up input signal X = np.asarray(X, dtype=np.float64) if axis is None: X = X.ravel() elif type(axis) != int: raise ValueError, 'axis must be None or an integer >= 0' elif axis < 0: raise ValueError, 'axis must be None or an integer >= 0' x = X # set up output signal if inplace: y = X else: y = np.zeros(X.shape, dtype=np.float64) # set up filter delay buffer iaxis = 0 if (axis is None) else axis if zi is None: # zi not specified, assume initial rest zshape = list(X.shape) zshape[iaxis] = K z = np.zeros(zshape, dtype=np.float64) else: zi = np.asarray(zi, dtype=np.float64) # x and zi are both vectors if (x.ndim == 1) and (zi.ndim == 1): if (zi.shape[0] != K): raise ValueError, 'length of zi must be K' if inplace: z = zi else: z = zi.copy() # x is nd array, zi is vector # broadcast zi (cannot modify inplace) elif (x.ndim > 1) and (zi.ndim == 1): if (zi.shape[0] != K): raise ValueError, 'length of zi must be K' zshape = list(X.shape) zshape[iaxis] = K z = np.zeros(zshape, dtype=np.float64) zshape = [1] * X.ndim zshape[iaxis] = K z = z + zi.reshape(zshape) # x and zi are both nd arrays else: zshape = list(X.shape) zshape[iaxis] = K zshape = tuple(zshape) if (zi.shape != zshape): raise ValueError, ('bad shape of zi: expected %r, got %r' % (zshape,zi.shape)) if inplace: z = zi else: z = zi.copy() # apply the linear filter ierr = _linear_filter(a, b, x, y, z, <int> iaxis, direction) # check for error conditions on return if ierr == -1: raise MemoryError, 'malloc returned NULL' if ierr == -2: raise ValueError, 'shape of x, y, and z did not match... should never happen, debug please.' if ierr == -3: raise ValueError, 'shape of a and b did not match... should never happen, debug please.' # return y or (y, z) depending on zi # return (y if (zi is None) else (y, z)) ## Cython fails on this if (zi is None): return y else: return (y,z) /* Copyright (C) 2009 Sturla Molden */ #include <Python.h> #include <numpy/arrayobject.h> #include <string.h> #include <stdlib.h> typedef PyArrayObject ndarray; /* PyArrayObject is an ugly name */ typedef Py_ssize_t integer; /* Py_ssize_t of int for 64 bit support, npy_intp is typedef'ed incorrectly. */ /* copy data to and from temporary work arrays */ static void copy_in(double *restrict y, double *restrict x, integer n, integer s) { integer i; char *tmp; if (s == sizeof(double)) memcpy(y, x, n*sizeof(double)); else { tmp = (char *)x; for (i=0; i<n; i++) { *y++ = *x; tmp += s; x = (double *)tmp; } } } static void copy_out(double *restrict y, double *restrict x, integer n, integer s) { integer i; char *tmp; if (s == sizeof(double)) memcpy(y, x, n*sizeof(double)); else { tmp = (char *)y; for (i=0; i<n; i++) { *y = *x++; tmp += s; y = (double *)tmp; } } } /* this computes the start adress for every vector along a dimension (axis) of an ndarray */ inline char *datapointer(ndarray *a, integer *indices) { char *data = a->data; int i; integer j; for (i=0; i < a->nd; i++) { j = indices[i]; data += j * a->strides[i]; } return data; } static double ** _get_axis_pointer(integer *indices, int axis, ndarray *a, double **axis_ptr_array, int dim) { /* recursive axis pointer search for 4 dimensions or more */ integer i; double *axis_ptr; if (dim == a->nd) { /* done recursing dimensions, compute address of this axis */ axis_ptr = (double *) datapointer(a, indices); *axis_ptr_array = axis_ptr; return (axis_ptr_array + 1); } else if (dim == axis) { /* use first element only */ indices[dim] = 0; axis_ptr_array = _get_axis_pointer(indices, axis, a, axis_ptr_array, dim+1); return axis_ptr_array; } else { /* iterate range of indices */ integer length = a->dimensions[dim]; for (i=0; i < length; i++) { indices[dim] = i; axis_ptr_array = _get_axis_pointer(indices, axis, a, axis_ptr_array, dim+1); } return axis_ptr_array; } } static double **get_axis_pointers(ndarray *a, int axis, integer *count) { integer indices[a->nd]; double **out, **tmp; integer i, j; j = 1; for (i=0; i < a->nd; i++) { if (i != axis) j *= a->dimensions[i]; } *count = j; out = (double **) malloc(*count * sizeof(double*)); if (out == NULL) { *count = 0; return NULL; } tmp = out; switch (a->nd) { /* for one dimension we just return the data pointer */ case 1: *tmp = (double *)a->data; break; /* specialized for two dimensions */ case 2: switch (axis) { case 0: for (i=0; i<a->dimensions[1]; i++) *tmp++ = (double *)(a->data + i*a->strides[1]); break; case 1: for (i=0; i<a->dimensions[0]; i++) *tmp++ = (double *)(a->data + i*a->strides[0]); break; } break; /* specialized for three dimensions */ case 3: switch (axis) { case 0: for (i=0; i<a->dimensions[1]; i++) for (j=0; j<a->dimensions[2]; j++) *tmp++ = (double *)(a->data + i*a->strides[1] + j*a->strides[2]); break; case 1: for (i=0; i<a->dimensions[0]; i++) for (j=0; j<a->dimensions[2]; j++) *tmp++ = (double *)(a->data + i*a->strides[0] + j*a->strides[2]); break; case 2: for (i=0; i<a->dimensions[0]; i++) for (j=0; j<a->dimensions[1]; j++) *tmp++ = (double *)(a->data + i*a->strides[0] + j*a->strides[1]); } break; /* four or more dimensions: use recursion */ default: for (i=0; i<a->nd; i++) indices[i] = 0; _get_axis_pointer(indices, axis, a, out, 0); } done: return out; } /* applies a linear filter along one dimension */ static void lfilter_1d( double *restrict b, double *restrict a, double *x, double *y, double *zi, integer K, integer N) { double Z[K]; double *restrict pz = zi, *restrict z = Z; register integer n, k; for (n=0; n<N; n++) { register double yn, xn = x[n]; yn = y[n] = b[0] * xn + pz[0]; for (k=0; k<K-1; k++) z[k] = b[k+1] * xn + pz[k+1] - a[k+1]*yn; z[K-1] = b[K]*xn - a[K]*yn; double *tmp = pz; pz = z; z = tmp; } if (pz != zi) memcpy(zi, pz, K*sizeof(double)); } static void rev_lfilter_1d( double *restrict b, double *restrict a, double *restrict x, double *restrict y, double *zi, integer K, integer N) { double Z[K]; double *restrict pz = zi, *restrict z = Z; register integer n, k; for (n=N-1; n >= 0; n--) { register double yn, xn = x[n]; yn = y[n] = b[0] * xn + pz[0]; for (k=0; k<K-1; k++) z[k] = b[k+1]*xn + pz[k+1] - a[k+1]*yn; z[K-1] = b[K]*xn - a[K]*yn; double *tmp = pz; pz = z; z = tmp; } if (pz != zi) memcpy(zi, pz, K*sizeof(double)); } typedef void (*lfilterfun_t)(double *restrict b, double *restrict a, double *restrict x, double *restrict y, double *zi, integer K, integer N); int _linear_filter(ndarray *a, ndarray *b, ndarray *x, ndarray *y, ndarray *z, int axis, int direction) { int retval = 0; double *wx, *wy, *wz, *data_a, *data_b; integer xcount, ycount, zcount; integer xstride, ystride, zstride; double *xdata, *ydata, *zdata; double **x_axis_ptrs = NULL, **y_axis_ptrs = NULL, **z_axis_ptrs = NULL; lfilterfun_t lfilter; integer i, K, N; /* release the GIL */ Py_BEGIN_ALLOW_THREADS /* sanity check on a and b (these should never fail) */ if (a->nd != 1) goto error_ab; if (b->nd != 1) goto error_ab; if (b->dimensions[0] != a->dimensions[0]) goto error_ab; data_a = (double *)a->data; data_b = (double *)b->data; K = a->dimensions[0] - 1; /* sanity ckeck on x, y and z */ if ((x->nd != y->nd) || (y->nd != z->nd)) goto error_xyz; for (int d=0; d < a->nd; d++) { if (d == axis) continue; if ((x->dimensions[d] != y->dimensions[d]) || (y->dimensions[d] != z->dimensions[d])) goto error_xyz; } N = x->dimensions[axis]; /* extract axis pointers */ x_axis_ptrs = get_axis_pointers(x, axis, &xcount); y_axis_ptrs = get_axis_pointers(y, axis, &ycount); z_axis_ptrs = get_axis_pointers(z, axis, &zcount); /* sanity check */ if (!x_axis_ptrs) goto error_malloc; if (!y_axis_ptrs) goto error_malloc; if (!z_axis_ptrs) goto error_malloc; if ((xcount != ycount)||(ycount != zcount)) goto error_xyz; /* all ok, we can start ... */ lfilter = direction < 0 ? rev_lfilter_1d : lfilter_1d; xstride = x->strides[axis]; ystride = y->strides[axis]; zstride = z->strides[axis]; #pragma omp parallel \ firstprivate(lfilter, N, K, xcount, x_axis_ptrs, y_axis_ptrs, z_axis_ptrs, \ xstride, ystride, zstride, data_b, data_a) \ private(i, xdata, ydata, zdata, wx, wy, wz) \ shared(retval) \ default(none) { if ((xstride==sizeof(double)) && (ystride==sizeof(double)) && (xstride==sizeof(double))) { #pragma omp for schedule(static) for (i=0; i < xcount; i++) { xdata = x_axis_ptrs[i]; ydata = y_axis_ptrs[i]; zdata = z_axis_ptrs[i]; /* strictly speaking we could be aliasing xdata and ydata here, but it should not matter. */ lfilter(data_b, data_a, xdata, ydata, zdata, K, N); } } else { /* axis is strided, use copy-in copy-out */ wx = (double *)malloc((2*N+K)*sizeof(double)); if (!wx) { #pragma omp critical retval = -1; /* error_malloc */ } #pragma omp barrier #pragma omp flush(retval) if (retval < 0) /* malloc failed in some thread */ goto filtering_complete; wy = wx + N; wz = wy + N; #pragma omp for schedule(static) for (i=0; i < xcount; i++) { /* get pointe to the first element of the axis */ xdata = x_axis_ptrs[i]; ydata = y_axis_ptrs[i]; zdata = z_axis_ptrs[i]; /* copy to local contiguous buffers */ copy_in(wx, xdata, N, xstride); copy_in(wz, zdata, K, zstride); /* filter */ lfilter(data_b, data_a, wx, wy, wz, K, N); /* copy data back */ copy_out(ydata, wy, N, ystride); copy_out(zdata, wz, K, zstride); } filtering_complete: if (wx) free(wx); } } /* we are done... */ goto done; /* error conditions */ error_ab: retval = -3; goto done; error_xyz: retval = -2; goto done; error_malloc: retval = -1; /* clean up and exit */ done: if (x_axis_ptrs) free(x_axis_ptrs); if (y_axis_ptrs) free(y_axis_ptrs); if (z_axis_ptrs) free(z_axis_ptrs); /* get the GIL back and return */ Py_END_ALLOW_THREADS return retval; }

Looks like a very useful improvement. Your docstring won't render well, and there is already a cleaned up version of the old one here: http://docs.scipy.org/scipy/docs/scipy.signal.signaltools.lfilter/ Could you please integrate those changes with yours? Cheers, Ralf On Tue, Sep 22, 2009 at 2:21 AM, Sturla Molden <sturla@molden.no> wrote:
Looking at this atrocious piece of C:
http://svn.scipy.org/svn/scipy/trunk/scipy/signal/lfilter.c.src
I got some inspiration for reusing some of my other C code to reimplement scipy.signal.lfilter. I basically just had to write this small C function + a Cython wrapper:
static void lfilter_1d(double *restrict b, double *restrict a, double *x, double *y, double *zi, npy_intp K, npy_intp N) { double Z[K]; double *restrict pz = zi, *restrict z = Z; register npy_intp n, k; for (n=0; n<N; n++) { register double yn, xn = x[n]; yn = y[n] = b[0] * xn + pz[0]; for (k=0; k<K-1; k++) z[k] = b[k+1] * xn + pz[k+1] - a[k+1]*yn; z[K-1] = b[K]*xn - a[K]*yn; double *tmp = pz; pz = z; z = tmp; } if (pz != zi) memcpy(zi, pz, K*sizeof(double)); }
Relative speed compared to scipy.signal.lfilter:
One processor: 1D array, 1000000 floats: 150% 2D array, 1000 x 1000 floats, axis=0: 80% 2D array, 1000 x 1000 floats, axis=1: 150%
It is faster than lfilter except for 2D with axis=0. For 2D with axis=1 it is probably faster than lfilter due to copy-in copy-out optimization. For arrays with more than one dimension, multithreading helps as well:
Four processors: 1D array, 1000000 floats: 150% 2D array, 1000 x 1000 floats, axis=0: 160% 2D array, 1000 x 1000 floats, axis=1: 250%
Multithreading obviously does not help for filtering single vector, as there is no work to share.
Some other improvements over signal.lfilter, apart from readable Cython code:
* GIL is released. The C code is not littered with Python C API calls that prevents this.
* Filtering can be done in reverse, i.e. no need for fliplr or flipud for filtering backwards.
* Arrays can be filtered in-place.
* Cleaned up docstring.
Still it only work as a "proof of concept" with double precision floats. Adding support for other dtypes would be easy, just replace double with any of:
float float _Complex double _Complex long double long double _Complex
Regards, Sturla Molden
# Copyright (C) 2009 Sturla Molden
import numpy as np cimport numpy as np
cdef extern int _linear_filter( object a, object b, object x, object y, object z, int axis, int direction)
def lfilter(object B, object A, object X, object zi=None, object axis = None, int direction=1, object inplace=False):
""" Filter data along one-dimension with an IIR or FIR filter.
Description ===========
Filter a data sequence, x, using a digital filter. This works for many fundamental data types (including Object type). The filter is a direct form II transposed implementation of the standard difference equation (see "Algorithm").
Inputs: =======
b : vector-like The numerator coefficient vector in a 1-D sequence.
a : vector-like The denominator coefficient vector in a 1-D sequence. If a[0] is not 1, then both a and b are normalized by a[0].
x : array-like An N-dimensional input array.
axis : int or None The axis of the input data array along which to apply the linear filter. The filter is applied to each subarray along this axis. If axis is None, the filter is applied to x flattened. Default: axis=None.
zi : array-like Initial conditions for the filter delays. It is a vector (or array of vectors for an N-dimensional input) of length max(len(a),len(b)). If zi=None or is not given then initial rest is assumed. If axis is None, zi should be a vector of length K = max(M,N), where M=len(b)-1 and N=len(a)-1. If an axis is not None, zi should either be a vector of length K, or an array with same shape as x, except for zi.shape[axis] which should be K. If zi is a vector, the same initial conditions are applied to each vector along the specified axis. Default: zi=None (initial rest).
See signal.lfiltic for more information.
direction : int If negative, the filter is applied in reverse direction along the axis. Default: direction=1 (filter forward).
inplace : bool If True, allow x or zi to be modified inplace if possible. Default: False.
Outputs: (y, {zf}) =======
y : ndarray The output of the digital filter.
zf : ndarray If zi is None, this is not returned, otherwise, zf holds the final filter delay values.
Algorithm: ========== The filter function is implemented as a direct II transposed structure. This means that the filter implements
a[0]*y[n] = b[0]*x[n] + b[1]*x[n-1] + ... + b[nb]*x[n-nb] - a[1]*y[n-1] - ... - a[na]*y[n-na]
using the following difference equations:
y[m] = b[0]*x[m] + z[0,m-1] z[0,m] = b[1]*x[m] + z[1,m-1] - a[1]*y[m] ... z[n-3,m] = b[n-2]*x[m] + z[n-2,m-1] - a[n-2]*y[m] z[n-2,m] = b[n-1]*x[m] - a[n-1]*y[m]
where m is the output sample number and n=max(len(a),len(b)) is the model order.
The rational transfer function describing this filter in the z-transform domain is -1 -nb b[0] + b[1]z + ... + b[nb] z Y(z) = ---------------------------------- X(z) -1 -na a[0] + a[1]z + ... + a[na] z
"""
cdef np.ndarray b, a, x, y, z cdef np.npy_intp i, ierr cdef double tmp cdef object K, zshape, iaxis
# get filter coeffs b, a
A = np.asarray(A) B = np.asarray(B)
if (A.ndim != 1) or (B.ndim != 1): raise ValueError, 'a and b must be vectors'
K = max(len(A)-1,len(B)-1)
b = np.zeros(K+1) a = np.zeros(K+1) a[:A.shape[0]] = A[:] b[:B.shape[0]] = B[:]
# normalize by a[0] if a[0] != 1.0: tmp = a[0] for i in range(a.shape[0]): a[i] /= tmp b[i] /= tmp
# set up input signal X = np.asarray(X, dtype=np.float64) if axis is None: X = X.ravel() elif type(axis) != int: raise ValueError, 'axis must be None or an integer >= 0' elif axis < 0: raise ValueError, 'axis must be None or an integer >= 0' x = X
# set up output signal if inplace: y = X else: y = np.zeros(X.shape, dtype=np.float64)
# set up filter delay buffer
iaxis = 0 if (axis is None) else axis
if zi is None:
# zi not specified, assume initial rest
zshape = list(X.shape) zshape[iaxis] = K z = np.zeros(zshape, dtype=np.float64)
else:
zi = np.asarray(zi, dtype=np.float64)
# x and zi are both vectors
if (x.ndim == 1) and (zi.ndim == 1):
if (zi.shape[0] != K): raise ValueError, 'length of zi must be K'
if inplace: z = zi else: z = zi.copy()
# x is nd array, zi is vector # broadcast zi (cannot modify inplace)
elif (x.ndim > 1) and (zi.ndim == 1):
if (zi.shape[0] != K): raise ValueError, 'length of zi must be K'
zshape = list(X.shape) zshape[iaxis] = K z = np.zeros(zshape, dtype=np.float64)
zshape = [1] * X.ndim zshape[iaxis] = K z = z + zi.reshape(zshape)
# x and zi are both nd arrays
else:
zshape = list(X.shape) zshape[iaxis] = K zshape = tuple(zshape)
if (zi.shape != zshape): raise ValueError, ('bad shape of zi: expected %r, got %r' % (zshape,zi.shape))
if inplace: z = zi else: z = zi.copy()
# apply the linear filter
ierr = _linear_filter(a, b, x, y, z, <int> iaxis, direction)
# check for error conditions on return
if ierr == -1: raise MemoryError, 'malloc returned NULL' if ierr == -2: raise ValueError, 'shape of x, y, and z did not match... should never happen, debug please.' if ierr == -3: raise ValueError, 'shape of a and b did not match... should never happen, debug please.'
# return y or (y, z) depending on zi
# return (y if (zi is None) else (y, z)) ## Cython fails on this
if (zi is None): return y else: return (y,z)
/* Copyright (C) 2009 Sturla Molden */
#include <Python.h> #include <numpy/arrayobject.h> #include <string.h> #include <stdlib.h>
typedef PyArrayObject ndarray; /* PyArrayObject is an ugly name */ typedef Py_ssize_t integer; /* Py_ssize_t of int for 64 bit support, npy_intp is typedef'ed incorrectly. */
/* copy data to and from temporary work arrays */
static void copy_in(double *restrict y, double *restrict x, integer n, integer s) { integer i; char *tmp;
if (s == sizeof(double)) memcpy(y, x, n*sizeof(double)); else { tmp = (char *)x; for (i=0; i<n; i++) { *y++ = *x; tmp += s; x = (double *)tmp; } } }
static void copy_out(double *restrict y, double *restrict x, integer n, integer s) { integer i; char *tmp; if (s == sizeof(double)) memcpy(y, x, n*sizeof(double)); else { tmp = (char *)y; for (i=0; i<n; i++) { *y = *x++; tmp += s; y = (double *)tmp; } } }
/* this computes the start adress for every vector along a dimension (axis) of an ndarray */
inline char *datapointer(ndarray *a, integer *indices) { char *data = a->data; int i; integer j; for (i=0; i < a->nd; i++) { j = indices[i]; data += j * a->strides[i]; } return data; }
static double ** _get_axis_pointer(integer *indices, int axis, ndarray *a, double **axis_ptr_array, int dim) { /* recursive axis pointer search for 4 dimensions or more */ integer i; double *axis_ptr; if (dim == a->nd) { /* done recursing dimensions, compute address of this axis */ axis_ptr = (double *) datapointer(a, indices); *axis_ptr_array = axis_ptr; return (axis_ptr_array + 1); } else if (dim == axis) { /* use first element only */ indices[dim] = 0; axis_ptr_array = _get_axis_pointer(indices, axis, a, axis_ptr_array, dim+1); return axis_ptr_array; } else { /* iterate range of indices */ integer length = a->dimensions[dim]; for (i=0; i < length; i++) { indices[dim] = i; axis_ptr_array = _get_axis_pointer(indices, axis, a, axis_ptr_array, dim+1); } return axis_ptr_array; } }
static double **get_axis_pointers(ndarray *a, int axis, integer *count) { integer indices[a->nd]; double **out, **tmp; integer i, j; j = 1; for (i=0; i < a->nd; i++) { if (i != axis) j *= a->dimensions[i]; } *count = j;
out = (double **) malloc(*count * sizeof(double*)); if (out == NULL) { *count = 0; return NULL; } tmp = out; switch (a->nd) { /* for one dimension we just return the data pointer */ case 1: *tmp = (double *)a->data; break; /* specialized for two dimensions */ case 2: switch (axis) { case 0: for (i=0; i<a->dimensions[1]; i++) *tmp++ = (double *)(a->data + i*a->strides[1]); break;
case 1: for (i=0; i<a->dimensions[0]; i++) *tmp++ = (double *)(a->data + i*a->strides[0]); break; } break; /* specialized for three dimensions */ case 3: switch (axis) { case 0: for (i=0; i<a->dimensions[1]; i++) for (j=0; j<a->dimensions[2]; j++) *tmp++ = (double *)(a->data + i*a->strides[1] + j*a->strides[2]); break; case 1: for (i=0; i<a->dimensions[0]; i++) for (j=0; j<a->dimensions[2]; j++) *tmp++ = (double *)(a->data + i*a->strides[0] + j*a->strides[2]); break;
case 2: for (i=0; i<a->dimensions[0]; i++) for (j=0; j<a->dimensions[1]; j++) *tmp++ = (double *)(a->data + i*a->strides[0] + j*a->strides[1]);
} break; /* four or more dimensions: use recursion */ default: for (i=0; i<a->nd; i++) indices[i] = 0; _get_axis_pointer(indices, axis, a, out, 0);
} done: return out; }
/* applies a linear filter along one dimension */
static void lfilter_1d( double *restrict b, double *restrict a, double *x, double *y, double *zi, integer K, integer N) { double Z[K]; double *restrict pz = zi, *restrict z = Z; register integer n, k;
for (n=0; n<N; n++) {
register double yn, xn = x[n];
yn = y[n] = b[0] * xn + pz[0];
for (k=0; k<K-1; k++) z[k] = b[k+1] * xn + pz[k+1] - a[k+1]*yn;
z[K-1] = b[K]*xn - a[K]*yn;
double *tmp = pz; pz = z; z = tmp; }
if (pz != zi) memcpy(zi, pz, K*sizeof(double)); }
static void rev_lfilter_1d( double *restrict b, double *restrict a, double *restrict x, double *restrict y, double *zi, integer K, integer N) { double Z[K]; double *restrict pz = zi, *restrict z = Z; register integer n, k;
for (n=N-1; n >= 0; n--) {
register double yn, xn = x[n];
yn = y[n] = b[0] * xn + pz[0];
for (k=0; k<K-1; k++) z[k] = b[k+1]*xn + pz[k+1] - a[k+1]*yn;
z[K-1] = b[K]*xn - a[K]*yn;
double *tmp = pz; pz = z; z = tmp; }
if (pz != zi) memcpy(zi, pz, K*sizeof(double)); }
typedef void (*lfilterfun_t)(double *restrict b, double *restrict a, double *restrict x, double *restrict y, double *zi, integer K, integer N);
int _linear_filter(ndarray *a, ndarray *b, ndarray *x, ndarray *y, ndarray *z, int axis, int direction) { int retval = 0; double *wx, *wy, *wz, *data_a, *data_b; integer xcount, ycount, zcount; integer xstride, ystride, zstride; double *xdata, *ydata, *zdata; double **x_axis_ptrs = NULL, **y_axis_ptrs = NULL, **z_axis_ptrs = NULL; lfilterfun_t lfilter; integer i, K, N;
/* release the GIL */ Py_BEGIN_ALLOW_THREADS
/* sanity check on a and b (these should never fail) */ if (a->nd != 1) goto error_ab; if (b->nd != 1) goto error_ab; if (b->dimensions[0] != a->dimensions[0]) goto error_ab; data_a = (double *)a->data; data_b = (double *)b->data; K = a->dimensions[0] - 1;
/* sanity ckeck on x, y and z */ if ((x->nd != y->nd) || (y->nd != z->nd)) goto error_xyz; for (int d=0; d < a->nd; d++) { if (d == axis) continue; if ((x->dimensions[d] != y->dimensions[d]) || (y->dimensions[d] != z->dimensions[d])) goto error_xyz; } N = x->dimensions[axis];
/* extract axis pointers */ x_axis_ptrs = get_axis_pointers(x, axis, &xcount); y_axis_ptrs = get_axis_pointers(y, axis, &ycount); z_axis_ptrs = get_axis_pointers(z, axis, &zcount);
/* sanity check */ if (!x_axis_ptrs) goto error_malloc; if (!y_axis_ptrs) goto error_malloc; if (!z_axis_ptrs) goto error_malloc; if ((xcount != ycount)||(ycount != zcount)) goto error_xyz;
/* all ok, we can start ... */
lfilter = direction < 0 ? rev_lfilter_1d : lfilter_1d;
xstride = x->strides[axis]; ystride = y->strides[axis]; zstride = z->strides[axis];
#pragma omp parallel \ firstprivate(lfilter, N, K, xcount, x_axis_ptrs, y_axis_ptrs, z_axis_ptrs, \ xstride, ystride, zstride, data_b, data_a) \ private(i, xdata, ydata, zdata, wx, wy, wz) \ shared(retval) \ default(none) { if ((xstride==sizeof(double)) && (ystride==sizeof(double)) && (xstride==sizeof(double))) {
#pragma omp for schedule(static) for (i=0; i < xcount; i++) { xdata = x_axis_ptrs[i]; ydata = y_axis_ptrs[i]; zdata = z_axis_ptrs[i]; /* strictly speaking we could be aliasing xdata and ydata here, but it should not matter. */ lfilter(data_b, data_a, xdata, ydata, zdata, K, N); }
} else {
/* axis is strided, use copy-in copy-out */
wx = (double *)malloc((2*N+K)*sizeof(double)); if (!wx) { #pragma omp critical retval = -1; /* error_malloc */ } #pragma omp barrier #pragma omp flush(retval) if (retval < 0) /* malloc failed in some thread */ goto filtering_complete; wy = wx + N; wz = wy + N;
#pragma omp for schedule(static) for (i=0; i < xcount; i++) {
/* get pointe to the first element of the axis */ xdata = x_axis_ptrs[i]; ydata = y_axis_ptrs[i]; zdata = z_axis_ptrs[i];
/* copy to local contiguous buffers */ copy_in(wx, xdata, N, xstride); copy_in(wz, zdata, K, zstride);
/* filter */ lfilter(data_b, data_a, wx, wy, wz, K, N);
/* copy data back */ copy_out(ydata, wy, N, ystride); copy_out(zdata, wz, K, zstride);
} filtering_complete: if (wx) free(wx); } } /* we are done... */ goto done;
/* error conditions */ error_ab: retval = -3; goto done; error_xyz: retval = -2; goto done; error_malloc: retval = -1;
/* clean up and exit */ done:
if (x_axis_ptrs) free(x_axis_ptrs); if (y_axis_ptrs) free(y_axis_ptrs); if (z_axis_ptrs) free(z_axis_ptrs);
/* get the GIL back and return */ Py_END_ALLOW_THREADS
return retval; }
_______________________________________________ Scipy-dev mailing list Scipy-dev@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-dev

Ralf Gommers skrev:
Looks like a very useful improvement.
Your docstring won't render well, and there is already a cleaned up version of the old one here: http://docs.scipy.org/scipy/docs/scipy.signal.signaltools.lfilter/ Could you please integrate those changes with yours?
The code from SVN I linked is current version used in signaltools. Also lost of the lfilter C code deal with complex numbers, which is something an ISO compliant C or C++ compiler can do for us. #ifdef __cplusplus #include <complex> typedef std::complex<float> complex_float; typedef std::complex<double> complex_double; typedef std::complex<long double> complex_long_double; #define restrict #else #include <complex.h> typedef float _Complex complex_float; typedef double _Complex complex_double; typedef long double _Complex complex_long_double; #endif Perhaps a better option is to use C++ only... In any case, it is better to leave the complex number arithmetics to the compiler, instead of bloating the source. Sturla Molden

On Tue, Sep 22, 2009 at 3:19 PM, Sturla Molden <sturla@molden.no> wrote:
Ralf Gommers skrev:
Looks like a very useful improvement.
Your docstring won't render well, and there is already a cleaned up version of the old one here: http://docs.scipy.org/scipy/docs/scipy.signal.signaltools.lfilter/ Could you please integrate those changes with yours?
The code from SVN I linked is current version used in signaltools.
Sure, I know. The docs I linked have improvements over svn. If you merge
your changes, there'll be a conflict and those docs will not get merged soon - I was hoping to avoid that. I also wanted to draw your attention to the doc wiki, since in this case you redid work that was already done by someone else on the wiki before. Hopefully this will save you some effort next time. Cheers, Ralf

Ralf Gommers skrev:
Sure, I know. The docs I linked have improvements over svn. If you merge your changes, there'll be a conflict and those docs will not get merged soon - I was hoping to avoid that.
I also wanted to draw your attention to the doc wiki, since in this case you redid work that was already done by someone else on the wiki before. Hopefully this will save you some effort next time.
Well, I changed all the code to C++, to use the std::complex type, std::vector instead of malloc, and templates for specializing the filter functions to all dtypes supported by lfilter. I'm quite happy with the way the C++ looks :-) It seems to be faster than signal.lfilter in most circumstances, except one (annoyingly the most common). The most extreme case were complex64 dtype with inplace filtering, for which I got 369% speed improvement. Here are some timings (speed improvement over signal.lfilter in percent): <type 'numpy.float32'> axis=0, shape=(1000000,), speed: 240 axis=0, shape=(1000, 1000), speed: 149 axis=1, shape=(1000, 1000), speed: 238 <type 'numpy.float64'> axis=0, shape=(1000000,), speed: 151 axis=0, shape=(1000, 1000), speed: 93 axis=1, shape=(1000, 1000), speed: 146 <type 'numpy.complex64'> axis=0, shape=(1000000,), speed: 297 axis=0, shape=(1000, 1000), speed: 204 axis=1, shape=(1000, 1000), speed: 292 <type 'numpy.complex128'> axis=0, shape=(1000000,), speed: 209 axis=0, shape=(1000, 1000), speed: 137 axis=1, shape=(1000, 1000), speed: 193 Inplace filter: <type 'numpy.float32'> axis=0, shape=(1000000,), speed: 291 axis=0, shape=(1000, 1000), speed: 182 axis=1, shape=(1000, 1000), speed: 300 <type 'numpy.float64'> axis=0, shape=(1000000,), speed: 227 axis=0, shape=(1000, 1000), speed: 137 axis=1, shape=(1000, 1000), speed: 228 <type 'numpy.complex64'> axis=0, shape=(1000000,), speed: 369 axis=0, shape=(1000, 1000), speed: 257 axis=1, shape=(1000, 1000), speed: 345 <type 'numpy.complex128'> axis=0, shape=(1000000,), speed: 288 axis=0, shape=(1000, 1000), speed: 179 axis=1, shape=(1000, 1000), speed: 276 To build on Windows I did this, using GCC 4.4.1 from http://www.equation.com/servlet/equation.cmd?fa=fortran g++ -c -O3 -ffast-math -msse3 -march=core2 -Ic:/Python26/include -Ic:/Python26/Lib/site-packages/numpy/core/include _linear_filter.cpp cython.py linear_filter.pyx gcc -c -O2 -ffast-math -msse3 -march=core2 -Ic:/Python26/include -Ic:/Python26/Lib/site-packages/numpy/core/include linear_filter.c g++ -shared -o linear_filter.pyd -Lc:/Python26/libs _linear_filter.o linear_filter.o -lpython26 -lmsvcr90 Best regards, Sturla Molden /* Copyright (C) 2009 Sturla Molden */ #include <vector> #include <complex> typedef std::complex<float> complex_float; typedef std::complex<double> complex_double; typedef std::complex<long double> complex_long_double; #include <Python.h> #include <numpy/arrayobject.h> #include <cstring> typedef PyArrayObject ndarray; /* PyArrayObject is an ugly name */ typedef Py_ssize_t integer; /* Py_ssize_t of int for 64 bit support, npy_intp is typedef'ed incorrectly. */ /* copy data to and from temporary work arrays */ template<class T> void copy_in(T *y, T *x, integer n, integer s) { integer i; char *tmp; if (s == sizeof(T)) std::memcpy(y, x, n*sizeof(T)); else { tmp = (char *)x; for (i=0; i<n; i++) { *y++ = *x; tmp += s; x = (T *)tmp; } } } template<class T> void copy_out(T *y, T *x, integer n, integer s) { integer i; char *tmp; if (s == sizeof(T)) std::memcpy(y, x, n*sizeof(T)); else { tmp = (char *)y; for (i=0; i<n; i++) { *y = *x++; tmp += s; y = (T *)tmp; } } } /* this computes the start adress for every vector along a dimension (axis) of an ndarray */ template<class T> inline T *datapointer(ndarray *a, integer *indices, T dummy) { char *data = a->data; int i; integer j; for (i=0; i < a->nd; i++) { j = indices[i]; data += j * a->strides[i]; } return (T *)data; } template<class T> T ** _get_axis_pointer(integer *indices, int axis, ndarray *a, T **axis_ptr_array, int dim, T dummy) { /* recursive axis pointer search for 4 dimensions or more */ integer i; T *axis_ptr; if (dim == a->nd) { /* done recursing dimensions, compute address of this axis */ axis_ptr = datapointer(a, indices, dummy); *axis_ptr_array = axis_ptr; return (axis_ptr_array + 1); } else if (dim == axis) { /* use first element only */ indices[dim] = 0; axis_ptr_array = _get_axis_pointer(indices, axis, a, axis_ptr_array, dim+1, dummy); return axis_ptr_array; } else { /* iterate range of indices */ integer length = a->dimensions[dim]; for (i=0; i < length; i++) { indices[dim] = i; axis_ptr_array = _get_axis_pointer(indices, axis, a, axis_ptr_array, dim+1, dummy); } return axis_ptr_array; } } template<class T> std::vector<T*> get_axis_pointers(ndarray *a, int axis, integer *count, T dummy) { std::vector<integer> indices(a->nd); integer i, j; j = 1; for (i=0; i < a->nd; i++) { if (i != axis) j *= a->dimensions[i]; } *count = j; std::vector<T*> out(*count); T **tmp = &out[0]; switch (a->nd) { /* for one dimension we just return the data pointer */ case 1: *tmp = (T *)a->data; break; /* specialized for two dimensions */ case 2: switch (axis) { case 0: for (i=0; i<a->dimensions[1]; i++) *tmp++ = (T *)(a->data + i*a->strides[1]); break; case 1: for (i=0; i<a->dimensions[0]; i++) *tmp++ = (T *)(a->data + i*a->strides[0]); break; } break; /* specialized for three dimensions */ case 3: switch (axis) { case 0: for (i=0; i<a->dimensions[1]; i++) for (j=0; j<a->dimensions[2]; j++) *tmp++ = (T *)(a->data + i*a->strides[1] + j*a->strides[2]); break; case 1: for (i=0; i<a->dimensions[0]; i++) for (j=0; j<a->dimensions[2]; j++) *tmp++ = (T *)(a->data + i*a->strides[0] + j*a->strides[2]); break; case 2: for (i=0; i<a->dimensions[0]; i++) for (j=0; j<a->dimensions[1]; j++) *tmp++ = (T *)(a->data + i*a->strides[0] + j*a->strides[1]); } break; /* four or more dimensions: use recursion */ default: for (i=0; i<a->nd; i++) indices[i] = 0; integer *pind = &indices[0]; _get_axis_pointer(pind, axis, a, tmp, 0, (T)0); } return out; } /* applies a linear filter along one dimension */ template<class T> static void lfilter_1d( const T *b, const T *a, T *x, T *y, T *zi, integer K, integer N) { std::vector<T> Z(K); T *pz = zi, *z = &Z[0]; register integer n, k; for (n=0; n<N; n++) { register T yn, xn = x[n]; yn = y[n] = b[0] * xn + pz[0]; for (k=0; k<K-1; k++) z[k] = b[k+1] * xn + pz[k+1] - a[k+1]*yn; z[K-1] = b[K]*xn - a[K]*yn; T *tmp = pz; pz = z; z = tmp; } if (pz != zi) std::memcpy(zi, pz, K*sizeof(T)); } template<class T> static void rev_lfilter_1d( const T *b, const T *a, T *x, T *y, T *zi, integer K, integer N) { std::vector<T> Z(K); T *pz = zi, *z = &Z[0]; register integer n, k; for (n=N-1; n >= 0; n--) { register T yn, xn = x[n]; yn = y[n] = b[0] * xn + pz[0]; for (k=0; k<K-1; k++) z[k] = b[k+1]*xn + pz[k+1] - a[k+1]*yn; z[K-1] = b[K]*xn - a[K]*yn; T *tmp = pz; pz = z; z = tmp; } if (pz != zi) std::memcpy(zi, pz, K*sizeof(T)); } template<class T> static void apply_filter(ndarray *x, ndarray *y, ndarray *z, T *b, T *a, int axis, int direction, integer N, integer K, T dummy) { int retval = 0; /* extract axis pointers */ integer xcount, ycount, zcount; std::vector<T*> x_axis_ptrs = get_axis_pointers(x, axis, &xcount, (T)0); std::vector<T*> y_axis_ptrs = get_axis_pointers(y, axis, &ycount, (T)0); std::vector<T*> z_axis_ptrs = get_axis_pointers(z, axis, &zcount, (T)0); /* all ok, we can start ... */ integer xstride, ystride, zstride; xstride = x->strides[axis]; ystride = y->strides[axis]; zstride = z->strides[axis]; if ((xstride==sizeof(T)) && (ystride==sizeof(T)) && (xstride==sizeof(T))) { /* axis is contiguous, do not copy */ for (integer i=0; i < xcount; i++) { T *xdata = x_axis_ptrs[i]; T *ydata = y_axis_ptrs[i]; T *zdata = z_axis_ptrs[i]; if (direction < 0) rev_lfilter_1d(b, a, xdata, ydata, zdata, K, N); else lfilter_1d(b, a, xdata, ydata, zdata, K, N); } } else { /* axis is strided, use copy-in copy-out */ std::vector<T> wx(N); std::vector<T> wy(N); std::vector<T> wz(K); for (integer i=0; i < xcount; i++) { /* get pointer to the first element of the axis */ T *xdata = x_axis_ptrs[i]; T *ydata = y_axis_ptrs[i]; T *zdata = z_axis_ptrs[i]; /* copy to local contiguous buffers */ copy_in(&wx[0], xdata, N, xstride); copy_in(&wz[0], zdata, K, zstride); /* filter */ if (direction < 0) rev_lfilter_1d(b, a, &wx[0], &wy[0], &wz[0], K, N); else lfilter_1d(b, a, &wx[0], &wy[0], &wz[0], K, N); /* copy data back */ copy_out(ydata, &wy[0], N, ystride); copy_out(zdata, &wz[0], K, zstride); } } } static int __linear_filter(ndarray *a, ndarray *b, ndarray *x, ndarray *y, ndarray *z, int axis, int direction) { int retval = 0; integer K = a->dimensions[0] - 1; integer N = x->dimensions[axis]; /* proceed to filter */ switch (a->descr->type_num) { case NPY_FLOAT: apply_filter<float>(x, y, z, (float *)(b->data), (float *)(a->data), axis, direction, N, K, (float)0); break; case NPY_DOUBLE: apply_filter(x, y, z, (double *)(b->data), (double *)(a->data), axis, direction, N, K, (double)0); break; case NPY_CFLOAT: apply_filter(x, y, z, (complex_float *)(b->data), (complex_float *)(a->data), axis, direction, N, K, (complex_float)0); break; case NPY_CDOUBLE: apply_filter(x, y, z, (complex_double *)(b->data), (complex_double *)(a->data), axis, direction, N, K, (complex_double)0); break; case NPY_LONGDOUBLE: apply_filter(x, y, z, (long double *)(b->data), (long double *)(a->data), axis, direction, N, K, (long double)0); break; case NPY_CLONGDOUBLE: apply_filter(x, y, z, (complex_long_double *)(b->data), (complex_long_double *)(a->data), axis, direction, N, K, (complex_long_double)0); break; default: retval = -1; } return retval; } extern "C" int _linear_filter(ndarray *a, ndarray *b, ndarray *x, ndarray *y, ndarray *z, int axis, int direction) { int retval = 0; Py_BEGIN_ALLOW_THREADS try { retval = __linear_filter(a, b, x, y, z, axis, direction); } catch (...) { /* possibly out of memory */ retval = -2; } Py_END_ALLOW_THREADS return retval; } # Copyright (C) 2009 Sturla Molden import numpy as np cimport numpy as np cdef extern int _linear_filter( object a, object b, object x, object y, object z, int axis, int direction) def lfilter(object B, object A, object X, object zi=None, object axis = None, int direction=1, object inplace=False): """ Filter data along one-dimension with an IIR or FIR filter. Description =========== Filter a data sequence, x, using a digital filter. This works for many fundamental data types (including Object type). The filter is a direct form II transposed implementation of the standard difference equation (see "Algorithm"). Inputs: ======= b : vector-like The numerator coefficient vector in a 1-D sequence. a : vector-like The denominator coefficient vector in a 1-D sequence. If a[0] is not 1, then both a and b are normalized by a[0]. x : array-like An N-dimensional input array. axis : int or None The axis of the input data array along which to apply the linear filter. The filter is applied to each subarray along this axis. If axis is None, the filter is applied to x flattened. Default: axis=None. zi : array-like Initial conditions for the filter delays. It is a vector (or array of vectors for an N-dimensional input) of length max(len(a),len(b)). If zi=None or is not given then initial rest is assumed. If axis is None, zi should be a vector of length K = max(M,N), where M=len(b)-1 and N=len(a)-1. If an axis is not None, zi should either be a vector of length K, or an array with same shape as x, except for zi.shape[axis] which should be K. If zi is a vector, the same initial conditions are applied to each vector along the specified axis. Default: zi=None (initial rest). See signal.lfiltic for more information. direction : int If negative, the filter is applied in reverse direction along the axis. Default: direction=1 (filter forward). inplace : bool If True, allow x or zi to be modified inplace if possible. Default: False. Outputs: (y, {zf}) ======= y : ndarray The output of the digital filter. zf : ndarray If zi is None, this is not returned, otherwise, zf holds the final filter delay values. Algorithm: ========== The filter function is implemented as a direct II transposed structure. This means that the filter implements a[0]*y[n] = b[0]*x[n] + b[1]*x[n-1] + ... + b[nb]*x[n-nb] - a[1]*y[n-1] - ... - a[na]*y[n-na] using the following difference equations: y[m] = b[0]*x[m] + z[0,m-1] z[0,m] = b[1]*x[m] + z[1,m-1] - a[1]*y[m] ... z[n-3,m] = b[n-2]*x[m] + z[n-2,m-1] - a[n-2]*y[m] z[n-2,m] = b[n-1]*x[m] - a[n-1]*y[m] where m is the output sample number and n=max(len(a),len(b)) is the model order. The rational transfer function describing this filter in the z-transform domain is -1 -nb b[0] + b[1]z + ... + b[nb] z Y(z) = ---------------------------------- X(z) -1 -na a[0] + a[1]z + ... + a[na] z """ cdef np.ndarray b, a, x, y, z cdef np.npy_intp i, ierr cdef object tmp, K, zshape, iaxis, dtype X = np.asarray(X) dtype = X.dtype # get filter coeffs b, a A = np.asarray(A, dtype=dtype) B = np.asarray(B, dtype=dtype) if (A.ndim != 1) or (B.ndim != 1): raise ValueError, 'a and b must be vectors' K = max(len(A)-1,len(B)-1) b = np.zeros(K+1, dtype=dtype) a = np.zeros(K+1, dtype=dtype) a[:A.shape[0]] = A[:] b[:B.shape[0]] = B[:] # normalize by a[0] tmp = a[0] a = a / tmp b = b / tmp # set up input signal if axis is None: X = X.ravel() elif type(axis) != int: raise ValueError, 'axis must be None or an integer >= 0' elif axis < 0: raise ValueError, 'axis must be None or an integer >= 0' x = X # set up output signal if inplace: y = X else: y = np.zeros(X.shape, dtype=dtype) # set up filter delay buffer iaxis = 0 if (axis is None) else axis if zi is None: # zi not specified, assume initial rest zshape = list(X.shape) zshape[iaxis] = K z = np.zeros(zshape, dtype=dtype) else: zi = np.asarray(zi, dtype=np.float64) # x and zi are both vectors if (x.ndim == 1) and (zi.ndim == 1): if (zi.shape[0] != K): raise ValueError, 'length of zi must be K' if inplace: z = zi else: z = zi.copy() # x is nd array, zi is vector # broadcast zi (cannot modify inplace) elif (x.ndim > 1) and (zi.ndim == 1): if (zi.shape[0] != K): raise ValueError, 'length of zi must be K' zshape = list(X.shape) zshape[iaxis] = K z = np.zeros(zshape, dtype=dtype) zshape = [1] * X.ndim zshape[iaxis] = K z = z + zi.reshape(zshape) # x and zi are both nd arrays else: zshape = list(X.shape) zshape[iaxis] = K zshape = tuple(zshape) if (zi.shape != zshape): raise ValueError, ('bad shape of zi: expected %r, got %r' % (zshape,zi.shape)) if inplace: z = zi else: z = zi.copy() # apply the linear filter ierr = _linear_filter(a, b, x, y, z, <int> iaxis, direction) # check for error conditions on return if ierr == -2: raise MemoryError, 'C++ exception trapped, possibly out of memory.' if ierr == -1: raise TypeError, ('dtype %r not supported' % (a.dtype)) # return y or (y, z) depending on zi # return (y if (zi is None) else (y, z)) ## Cython fails on this if (zi is None): return y else: return (y,z)

Sturla Molden skrev:
It seems to be faster than signal.lfilter in most circumstances, except one (annoyingly the most common). The most extreme case were complex64 dtype with inplace filtering, for which I got 369% speed improvement.
Here are some timings (speed improvement over signal.lfilter in percent):
Sorry, my english were bad. The numbers are not speed "improvement", they are speed relative to signal.lfilter. S.M.

Sturla Molden skrev:
Well, I changed all the code to C++, to use the std::complex type, std::vector instead of malloc, and templates for specializing the filter functions to all dtypes supported by lfilter. I'm quite happy with the way the C++ looks :-)
I forgot to add support for the object dtype. This also fixes a bug in the current signal.lfilter. This will crash the interpreter: import numpy as np import scipy from scipy.signal import butter, lfilter b,a = butter(4, .25) x = np.random.randn(1000000).astype(object).reshape((1000,1000)) fx = lfilter(b,a,x, axis=1) The C++ version here does not crash on this. :-) I just ripped some code from current lfilter and cleaned it up slightly; I hope that is ok. (Cython was not happy about "cdef object *ptr", so I reluctantly had to put it all in C++.) The reason my code does not crash like current lfilter, is copy-in copy-out for strided vectors. It seems lfilter messes up the strides and then segfaults. <type 'object'> axis=0, shape=(1000000,), speed: 102 axis=0, shape=(1000, 1000), speed: 105 axis=1, shape=(1000, 1000), scipy.signal.lfilter crashes the interpreter!!! Building on Windows: g++ -c -O3 -ffast-math -msse3 -march=core2 -Ic:/Python26/include -Ic:/Python26/Lib/site-packages/numpy/core/include _linear_filter.cpp cython.py --cplus linear_filter.pyx g++ -c -O2 -ffast-math -msse3 -march=core2 -Ic:/Python26/include -Ic:/Python26/Lib/site-packages/numpy/core/include linear_filter.cpp g++ -shared -o linear_filter.pyd -Lc:/Python26/libs _linear_filter.o linear_filter.o -lpython26 -lmsvcr90 Regards, Sturla Molden // Copyright (C) 2009 Sturla Molden // SciPy license #include <vector> #include <complex> typedef std::complex<float> complex_float; typedef std::complex<double> complex_double; typedef std::complex<long double> complex_long_double; #include <Python.h> #include <numpy/arrayobject.h> #include <cstring> typedef PyArrayObject ndarray; /* PyArrayObject is an ugly name */ typedef Py_ssize_t integer; /* Py_ssize_t of int for 64 bit support, npy_intp is typedef'ed incorrectly. */ typedef PyObject *object; /* copy data to and from temporary work arrays */ template<class T> void copy_in(T *y, T *x, integer n, integer s) { integer i; char *tmp; if (s == sizeof(T)) std::memcpy(y, x, n*sizeof(T)); else { tmp = (char *)x; for (i=0; i<n; i++) { *y++ = *x; tmp += s; x = (T *)tmp; } } } template<class T> void copy_out(T *y, T *x, integer n, integer s) { integer i; char *tmp; if (s == sizeof(T)) std::memcpy(y, x, n*sizeof(T)); else { tmp = (char *)y; for (i=0; i<n; i++) { *y = *x++; tmp += s; y = (T *)tmp; } } } /* this computes the start adress for every vector along a dimension (axis) of an ndarray */ template<class T> inline T *datapointer(ndarray *a, integer *indices, T dummy) { char *data = a->data; int i; integer j; for (i=0; i < a->nd; i++) { j = indices[i]; data += j * a->strides[i]; } return (T *)data; } template<class T> T ** _get_axis_pointer(integer *indices, int axis, ndarray *a, T **axis_ptr_array, int dim, T dummy) { /* recursive axis pointer search for 4 dimensions or more */ integer i; T *axis_ptr; if (dim == a->nd) { /* done recursing dimensions, compute address of this axis */ axis_ptr = datapointer(a, indices, dummy); *axis_ptr_array = axis_ptr; return (axis_ptr_array + 1); } else if (dim == axis) { /* use first element only */ indices[dim] = 0; axis_ptr_array = _get_axis_pointer(indices, axis, a, axis_ptr_array, dim+1, dummy); return axis_ptr_array; } else { /* iterate range of indices */ integer length = a->dimensions[dim]; for (i=0; i < length; i++) { indices[dim] = i; axis_ptr_array = _get_axis_pointer(indices, axis, a, axis_ptr_array, dim+1, dummy); } return axis_ptr_array; } } template<class T> std::vector<T*> get_axis_pointers(ndarray *a, int axis, integer *count, T dummy) { std::vector<integer> indices(a->nd); integer i, j; j = 1; for (i=0; i < a->nd; i++) { if (i != axis) j *= a->dimensions[i]; } *count = j; std::vector<T*> out(*count); T **tmp = &out[0]; switch (a->nd) { /* for one dimension we just return the data pointer */ case 1: *tmp = (T *)a->data; break; /* specialized for two dimensions */ case 2: switch (axis) { case 0: for (i=0; i<a->dimensions[1]; i++) *tmp++ = (T *)(a->data + i*a->strides[1]); break; case 1: for (i=0; i<a->dimensions[0]; i++) *tmp++ = (T *)(a->data + i*a->strides[0]); break; } break; /* specialized for three dimensions */ case 3: switch (axis) { case 0: for (i=0; i<a->dimensions[1]; i++) for (j=0; j<a->dimensions[2]; j++) *tmp++ = (T *)(a->data + i*a->strides[1] + j*a->strides[2]); break; case 1: for (i=0; i<a->dimensions[0]; i++) for (j=0; j<a->dimensions[2]; j++) *tmp++ = (T *)(a->data + i*a->strides[0] + j*a->strides[2]); break; case 2: for (i=0; i<a->dimensions[0]; i++) for (j=0; j<a->dimensions[1]; j++) *tmp++ = (T *)(a->data + i*a->strides[0] + j*a->strides[1]); } break; /* four or more dimensions: use recursion */ default: for (i=0; i<a->nd; i++) indices[i] = 0; integer *pind = &indices[0]; _get_axis_pointer(pind, axis, a, tmp, 0, (T)0); } return out; } /* linear filter along one dimension */ template<class T> void lfilter_1d( const T *b, const T *a, T *x, T *y, T *zi, integer K, integer N) { std::vector<T> Z(K); T *pz = zi, *z = &Z[0]; register integer n, k; for (n=0; n<N; n++) { register T yn, xn = x[n]; yn = y[n] = b[0] * xn + pz[0]; for (k=0; k<K-1; k++) z[k] = b[k+1] * xn + pz[k+1] - a[k+1]*yn; z[K-1] = b[K]*xn - a[K]*yn; T *tmp = pz; pz = z; z = tmp; } if (pz != zi) std::memcpy(zi, pz, K*sizeof(T)); } template<class T> void rev_lfilter_1d( const T *b, const T *a, T *x, T *y, T *zi, integer K, integer N) { std::vector<T> Z(K); T *pz = zi, *z = &Z[0]; register integer n, k; for (n=N-1; n >= 0; n--) { register T yn, xn = x[n]; yn = y[n] = b[0] * xn + pz[0]; for (k=0; k<K-1; k++) z[k] = b[k+1]*xn + pz[k+1] - a[k+1]*yn; z[K-1] = b[K]*xn - a[K]*yn; T *tmp = pz; pz = z; z = tmp; } if (pz != zi) std::memcpy(zi, pz, K*sizeof(T)); } /* template specialization for the general object case modified from current scipy/signal/lfilter.c.src */ template<> void lfilter_1d<object>(const object *b, const object *a, object *x, object *y, object *Z, integer K, integer N) { object *ptr_x = x, *ptr_y = y; object *ptr_Z; const object *ptr_b; const object *ptr_a; object *xn, *yn; const object *a0 = a; object tmp1, tmp2, tmp3; integer n; integer k; /* My reference counting might not be right */ for (n = 0; n < N; n++) { ptr_b = b; /* Reset a and b pointers */ ptr_a = a; xn = ptr_x; yn = ptr_y; if (K) { ptr_Z = Z; /* Calculate first delay (output) */ tmp1 = PyNumber_Multiply(*ptr_b, *xn); tmp2 = PyNumber_Divide(tmp1, *a0); tmp3 = PyNumber_Add(tmp2, *ptr_Z); Py_XDECREF(*yn); *yn = tmp3; Py_DECREF(tmp1); Py_DECREF(tmp2); ptr_b++; ptr_a++; /* Fill in middle delays */ for (k = 0; k < K-1; k++) { tmp1 = PyNumber_Multiply(*xn, *ptr_b); tmp2 = PyNumber_Divide(tmp1, *a0); tmp3 = PyNumber_Add(tmp2, ptr_Z[1]); Py_DECREF(tmp1); Py_DECREF(tmp2); tmp1 = PyNumber_Multiply(*yn, *ptr_a); tmp2 = PyNumber_Divide(tmp1, *a0); Py_DECREF(tmp1); Py_XDECREF(*ptr_Z); *ptr_Z = PyNumber_Subtract(tmp3, tmp2); Py_DECREF(tmp2); Py_DECREF(tmp3); ptr_b++; ptr_a++; ptr_Z++; } /* Calculate last delay */ tmp1 = PyNumber_Multiply(*xn, *ptr_b); tmp3 = PyNumber_Divide(tmp1, *a0); Py_DECREF(tmp1); tmp1 = PyNumber_Multiply(*yn, *ptr_a); tmp2 = PyNumber_Divide(tmp1, *a0); Py_DECREF(tmp1); Py_XDECREF(*ptr_Z); *ptr_Z = PyNumber_Subtract(tmp3, tmp2); Py_DECREF(tmp2); Py_DECREF(tmp3); } else { tmp1 = PyNumber_Multiply(*xn, *ptr_b); Py_XDECREF(*yn); *yn = PyNumber_Divide(tmp1, *a0); Py_DECREF(tmp1); } ptr_y++; /* Move to next input/output point */ ptr_x++; } } template<> void rev_lfilter_1d<object>(const object *b, const object *a, object *x, object *y, object *Z, integer K, integer N) { object *ptr_x = x + N - 1, *ptr_y = y + N - 1; object *ptr_Z; const object *ptr_b; const object *ptr_a; object *xn, *yn; const object *a0 = a; object tmp1, tmp2, tmp3; integer n; integer k; /* My reference counting might not be right */ for (n = 0; n < N; n++) { ptr_b = b; /* Reset a and b pointers */ ptr_a = a; xn = ptr_x; yn = ptr_y; if (K) { ptr_Z = Z; /* Calculate first delay (output) */ tmp1 = PyNumber_Multiply(*ptr_b, *xn); tmp2 = PyNumber_Divide(tmp1, *a0); tmp3 = PyNumber_Add(tmp2, *ptr_Z); Py_XDECREF(*yn); *yn = tmp3; Py_DECREF(tmp1); Py_DECREF(tmp2); ptr_b++; ptr_a++; /* Fill in middle delays */ for (k = 0; k < K-1; k++) { tmp1 = PyNumber_Multiply(*xn, *ptr_b); tmp2 = PyNumber_Divide(tmp1, *a0); tmp3 = PyNumber_Add(tmp2, ptr_Z[1]); Py_DECREF(tmp1); Py_DECREF(tmp2); tmp1 = PyNumber_Multiply(*yn, *ptr_a); tmp2 = PyNumber_Divide(tmp1, *a0); Py_DECREF(tmp1); Py_XDECREF(*ptr_Z); *ptr_Z = PyNumber_Subtract(tmp3, tmp2); Py_DECREF(tmp2); Py_DECREF(tmp3); ptr_b++; ptr_a++; ptr_Z++; } /* Calculate last delay */ tmp1 = PyNumber_Multiply(*xn, *ptr_b); tmp3 = PyNumber_Divide(tmp1, *a0); Py_DECREF(tmp1); tmp1 = PyNumber_Multiply(*yn, *ptr_a); tmp2 = PyNumber_Divide(tmp1, *a0); Py_DECREF(tmp1); Py_XDECREF(*ptr_Z); *ptr_Z = PyNumber_Subtract(tmp3, tmp2); Py_DECREF(tmp2); Py_DECREF(tmp3); } else { tmp1 = PyNumber_Multiply(*xn, *ptr_b); Py_XDECREF(*yn); *yn = PyNumber_Divide(tmp1, *a0); Py_DECREF(tmp1); } ptr_y--; /* Move to next input/output point */ ptr_x--; } } /* this function does copy-in copy-out optimization and applies the filter to the signal */ template<class T> static void apply_filter(ndarray *x, ndarray *y, ndarray *z, T *b, T *a, int axis, int direction, integer N, integer K, T dummy) { int retval = 0; /* extract axis pointers */ integer xcount, ycount, zcount; std::vector<T*> x_axis_ptrs = get_axis_pointers(x, axis, &xcount, (T)0); std::vector<T*> y_axis_ptrs = get_axis_pointers(y, axis, &ycount, (T)0); std::vector<T*> z_axis_ptrs = get_axis_pointers(z, axis, &zcount, (T)0); /* all ok, we can start ... */ integer xstride, ystride, zstride; xstride = x->strides[axis]; ystride = y->strides[axis]; zstride = z->strides[axis]; if ((xstride==sizeof(T)) && (ystride==sizeof(T)) && (xstride==sizeof(T))) { /* axis is contiguous, do not copy */ for (integer i=0; i < xcount; i++) { T *xdata = x_axis_ptrs[i]; T *ydata = y_axis_ptrs[i]; T *zdata = z_axis_ptrs[i]; if (direction < 0) rev_lfilter_1d(b, a, xdata, ydata, zdata, K, N); else lfilter_1d(b, a, xdata, ydata, zdata, K, N); } } else { /* axis is strided, use copy-in copy-out */ std::vector<T> wx(N); std::vector<T> wy(N); std::vector<T> wz(K); for (integer i=0; i < xcount; i++) { /* get pointer to the first element of the axis */ T *xdata = x_axis_ptrs[i]; T *ydata = y_axis_ptrs[i]; T *zdata = z_axis_ptrs[i]; /* copy to local contiguous buffers */ copy_in(&wx[0], xdata, N, xstride); copy_in(&wz[0], zdata, K, zstride); /* filter */ if (direction < 0) rev_lfilter_1d(b, a, &wx[0], &wy[0], &wz[0], K, N); else lfilter_1d(b, a, &wx[0], &wy[0], &wz[0], K, N); /* copy data back */ copy_out(ydata, &wy[0], N, ystride); copy_out(zdata, &wz[0], K, zstride); } } } static int __linear_filter(ndarray *a, ndarray *b, ndarray *x, ndarray *y, ndarray *z, int axis, int direction) { int retval = 0; integer K = a->dimensions[0] - 1; integer N = x->dimensions[axis]; /* proceed to filter */ switch (a->descr->type_num) { case NPY_FLOAT: Py_BEGIN_ALLOW_THREADS apply_filter<float>(x, y, z, (float *)(b->data), (float *)(a->data), axis, direction, N, K, (float)0); Py_END_ALLOW_THREADS break; case NPY_DOUBLE: Py_BEGIN_ALLOW_THREADS apply_filter(x, y, z, (double *)(b->data), (double *)(a->data), axis, direction, N, K, (double)0); Py_END_ALLOW_THREADS break; case NPY_CFLOAT: Py_BEGIN_ALLOW_THREADS apply_filter(x, y, z, (complex_float *)(b->data), (complex_float *)(a->data), axis, direction, N, K, (complex_float)0); Py_END_ALLOW_THREADS break; case NPY_CDOUBLE: Py_BEGIN_ALLOW_THREADS apply_filter(x, y, z, (complex_double *)(b->data), (complex_double *)(a->data), axis, direction, N, K, (complex_double)0); Py_END_ALLOW_THREADS break; case NPY_LONGDOUBLE: Py_BEGIN_ALLOW_THREADS apply_filter(x, y, z, (long double *)(b->data), (long double *)(a->data), axis, direction, N, K, (long double)0); Py_END_ALLOW_THREADS break; case NPY_CLONGDOUBLE: Py_BEGIN_ALLOW_THREADS apply_filter(x, y, z, (complex_long_double *)(b->data), (complex_long_double *)(a->data), axis, direction, N, K, (complex_long_double)0); Py_END_ALLOW_THREADS break; case NPY_OBJECT: apply_filter(x, y, z, (object *)(b->data), (object *)(a->data), axis, direction, N, K, (object)0); break; default: retval = -1; /* unsupported type */ } return retval; } extern "C" int _linear_filter(ndarray *a, ndarray *b, ndarray *x, ndarray *y, ndarray *z, int axis, int direction) { int retval = 0; try { retval = __linear_filter(a, b, x, y, z, axis, direction); } catch (int a) { /* Python exception */ retval = -3; } catch (...) { /* possibly out of memory */ retval = -2; } return retval; } # Copyright (C) 2009 Sturla Molden # SciPy license import numpy as np cimport numpy as np cdef extern int _linear_filter( object a, object b, object x, object y, object z, int axis, int direction) def lfilter(object B, object A, object X, object zi=None, object axis = None, int direction=1, object inplace=False): cdef np.ndarray b, a, x, y, z cdef np.npy_intp i, ierr cdef object tmp, K, zshape, iaxis, dtype X = np.asarray(X) dtype = X.dtype # get filter coeffs b, a A = np.asarray(A, dtype=dtype) B = np.asarray(B, dtype=dtype) if (A.ndim != 1) or (B.ndim != 1): raise ValueError, 'a and b must be vectors' K = max(len(A)-1,len(B)-1) b = np.zeros(K+1, dtype=dtype) a = np.zeros(K+1, dtype=dtype) a[:A.shape[0]] = A[:] b[:B.shape[0]] = B[:] # normalize by a[0] tmp = a[0] a = a / tmp b = b / tmp # set up input signal if axis is None: X = X.ravel() elif type(axis) != int: raise ValueError, 'axis must be None or an integer >= 0' elif axis < 0: raise ValueError, 'axis must be None or an integer >= 0' x = X # set up output signal if inplace: y = X else: y = np.zeros(X.shape, dtype=dtype) # set up filter delay buffer iaxis = 0 if (axis is None) else axis if zi is None: # zi not specified, assume initial rest zshape = list(X.shape) zshape[iaxis] = K z = np.zeros(zshape, dtype=dtype) else: zi = np.asarray(zi, dtype=np.float64) # x and zi are both vectors if (x.ndim == 1) and (zi.ndim == 1): if (zi.shape[0] != K): raise ValueError, 'length of zi must be K' if inplace: z = zi else: z = zi.copy() # x is nd array, zi is vector # broadcast zi (cannot modify inplace) elif (x.ndim > 1) and (zi.ndim == 1): if (zi.shape[0] != K): raise ValueError, 'length of zi must be K' zshape = list(X.shape) zshape[iaxis] = K z = np.zeros(zshape, dtype=dtype) zshape = [1] * X.ndim zshape[iaxis] = K z = z + zi.reshape(zshape) # x and zi are both nd arrays else: zshape = list(X.shape) zshape[iaxis] = K zshape = tuple(zshape) if (zi.shape != zshape): raise ValueError, ('bad shape of zi: expected %r, got %r' % (zshape,zi.shape)) if inplace: z = zi else: z = zi.copy() # apply the linear filter ierr = _linear_filter(a, b, x, y, z, <int> iaxis, direction) # check for error conditions on return if ierr == -3: raise Exception, 'Unknown Python exception propagated via C++.' if ierr == -2: raise MemoryError, 'C++ exception trapped, possibly out of memory.' if ierr == -1: raise TypeError, ('dtype %r not supported' % (a.dtype)) # return y or (y, z) depending on zi # return (y if (zi is None) else (y, z)) ## Cython fails on this if (zi is None): return y else: return (y,z)

On Wed, Sep 23, 2009 at 3:10 PM, Sturla Molden <sturla@molden.no> wrote:
Sturla Molden skrev:
Well, I changed all the code to C++, to use the std::complex type, std::vector instead of malloc, and templates for specializing the filter functions to all dtypes supported by lfilter. I'm quite happy with the way the C++ looks :-)
I forgot to add support for the object dtype.
This also fixes a bug in the current signal.lfilter. This will crash the interpreter:
import numpy as np import scipy from scipy.signal import butter, lfilter b,a = butter(4, .25) x = np.random.randn(1000000).astype(object).reshape((1000,1000)) fx = lfilter(b,a,x, axis=1)
The C++ version here does not crash on this. :-)
I just ripped some code from current lfilter and cleaned it up slightly; I hope that is ok. (Cython was not happy about "cdef object *ptr", so I reluctantly had to put it all in C++.) The reason my code does not crash like current lfilter, is copy-in copy-out for strided vectors. It seems lfilter messes up the strides and then segfaults.
I am all for improving the lfilter code, here a few comments: - it is much more likely that your improvements will be included if you provide patches instead of rewrite of the full code - it makes reviewing much easier. - I would also prefer having C instead of C++ as well - in this case, C++ does not bring much since we have our "templating" system and you don't use the STL much. - In any case, please do not use exception, it is not portable. - you cannot use Py_ssize_t, as it is python 2.5 >= feature - there is nothing wrong with npy_intp, I don't understand your comment. - using cython is good - that could be a first patch, to replace the existing manual wrapping by cython. You can declare pointer without trouble in cython, you would have to be more explicit about the exact problem you had. cheers, David

David Cournapeau skrev:
- I would also prefer having C instead of C++ as well - in this case, C++ does not bring much since we have our "templating" system and you don't use the STL much.
It was mainly for complex numbers, since MSVC does not support ISO C.
- In any case, please do not use exception, it is not portable.
Ok, the STL containers like vector<> can throw exceptions like std::bad_alloc. That is why I did this.
- you cannot use Py_ssize_t, as it is python 2.5 >= feature - there is nothing wrong with npy_intp, I don't understand your comment.
Yes there is, cf. PEP 353. Using Py_intptr_t for indexing would depend on sizeof(void*) == sizeof(size_t), which the C standard does not mandate. It can differ on segment and offset architectures. Two examples are 16-bit x86 (cf. far and near pointers) and x86 with 36-bit PAE. It accidentally works for flat 32- and 64-bit address spaces. This is why Python uses Py_ssize_t instead of Py_intptr_t. And as it happens, npy_intp is typedef'ed to the latter. (This might be pedantic, but it is a formal error.)
- using cython is good - that could be a first patch, to replace the existing manual wrapping by cython. You can declare pointer without trouble in cython, No, you cannot create pointers to a variable declared object. This is illegal Cython:
cdef object *ptr # would simliar to PyObject **ptr in C So if we want to filter with dtype object, one could use Dag Sverre's numpy syntax and fake "cdef object *ptr" with "cdef np.ndarray[object] ptr", but it would not be efficient. We have to store two z-buffers and swap them for each filter step. The other option is to use "cdef PyObject**" instead of "cdef object *" in Cython, but then Cython will not do reference counting. S.M.

Hi, On Wednesday 23 September 2009 04:31:40 David Cournapeau wrote:
- it is much more likely that your improvements will be included if you provide patches instead of rewrite of the full code - it makes reviewing much easier.
In this case, I respectfully disagree; the full rewrite actually makes sense when comparing the previous code to the current one.
- I would also prefer having C instead of C++ as well - in this case, C++ does not bring much since we have our "templating" system and you don't use the STL much. - In any case, please do not use exception, it is not portable.
Are there any such compilers on which scipy can be compiled? The only compiler on which exceptions are still a problem (as far as I know) is OpenWatcom, but scipy cannot be compiled using that compiler anyway. For all major *nixes (Linux, Solaris, HPUX, AIX(!), *BSD), both the system C++ compiler and the major external compilers (usually gcc & icc) provide full exception support. On Windows, only versions upto MSVC6 have problems (but they cannot be used to compile python extensions). Even the proprietary ARM and MIPS compilers support exceptions. On the embedded side, the only compiler that I can think of that does not support real C++ is Keil, but is anyone running scipy on an 8051? Regards, Ravi

On Wed, Sep 23, 2009 at 9:29 PM, Ravi <lists_ravi@lavabit.com> wrote:
Hi,
On Wednesday 23 September 2009 04:31:40 David Cournapeau wrote:
- it is much more likely that your improvements will be included if you provide patches instead of rewrite of the full code - it makes reviewing much easier.
In this case, I respectfully disagree; the full rewrite actually makes sense when comparing the previous code to the current one.
It can be a full rewrite, but still should be sent as patches. If I am the one to review, I would prefer this way. That's especially important to track regressions.
- I would also prefer having C instead of C++ as well - in this case, C++ does not bring much since we have our "templating" system and you don't use the STL much. - In any case, please do not use exception, it is not portable.
Are there any such compilers on which scipy can be compiled?
It is a fundamental problem of C++. Different compilers do not propagate exceptions the same way, and that's a problem when different compilers are involved (happens easily when the C and C++ compilers are not the same, for example). That has been a problem on every new platform I have tried to port numpy and scipy to. That's the same rationale as why avoiding fortran runtime calls - if you only use fortran, it is ok, but once you use the fortran runtime and the C runtime in the same extension, you get random crashes which are impossible to debug. cheers, David

David Cournapeau skrev:
It can be a full rewrite, but still should be sent as patches. If I am the one to review, I would prefer this way. That's especially important to track regressions.
Well.. I didn't intend this for inclusion in SciPy, at least not in the beginning. I needed it for some neuroscience software I am writing.
It is a fundamental problem of C++. Different compilers do not propagate exceptions the same way, and that's a problem when different compilers are involved (happens easily when the C and C++ compilers are not the same, for example). That has been a problem on every new platform I have tried to port numpy and scipy to.
Does this mean we cannot use g++ to compile extensions at all, when Python VM is compiled with MSVC? Or does the rule just apply to the particular extension, like C and Fortran runtimes? I don't like the complexity of C++. But it has some advantages over C for scientific computing; notably STL containers, templates for generics, the std::complex<> type, and so on. The problem with C++ is that it encourages bloat ugly style, and OOP stuff is etter left in Python. I personally like exceptions because they remove the need for lot or error checking. In C, we can achieve almost the same effect using setjmp/longjmp. Is that bad style as well? Sturla Molden

On Wed, Sep 23, 2009 at 9:07 AM, Sturla Molden <sturla@molden.no> wrote:
David Cournapeau skrev:
It can be a full rewrite, but still should be sent as patches. If I am the one to review, I would prefer this way. That's especially important to track regressions.
Well.. I didn't intend this for inclusion in SciPy, at least not in the beginning. I needed it for some neuroscience software I am writing.
It is a fundamental problem of C++. Different compilers do not propagate exceptions the same way, and that's a problem when different compilers are involved (happens easily when the C and C++ compilers are not the same, for example). That has been a problem on every new platform I have tried to port numpy and scipy to.
Does this mean we cannot use g++ to compile extensions at all, when Python VM is compiled with MSVC? Or does the rule just apply to the particular extension, like C and Fortran runtimes?
I don't like the complexity of C++. But it has some advantages over C for scientific computing; notably STL containers, templates for generics, the std::complex<> type, and so on. The problem with C++ is that it encourages bloat ugly style, and OOP stuff is etter left in Python.
I personally like exceptions because they remove the need for lot or error checking. In C, we can achieve almost the same effect using setjmp/longjmp. Is that bad style as well?
The setjmp/longjmp machinery has been used (zeros.c, for instance), but I thinks it makes the C code less transparent and should be avoided unless the traditional use of return values is so ugly you can't bear to look at the code. C really doesn't have very good error handling, that may even be a feature in a low level language. It might be useful to think up some way that setjmp/longjmp could be used in some sort of standard error handling template. Chuck

Charles R Harris skrev:
It might be useful to think up some way that setjmp/longjmp could be used in some sort of standard error handling template.
One way is to use a resource pool (a la Apache), see Cython code below. cdef int _foobar() nogil: cdef mempool *mp cdef int rv mp = mempool_init() if (mp == NULL): return -1 # error else: some_C_function(mp) mempool_destroy(mp) return int(rv) def foobar(): if (_foobar() < 0): raise MemoryError, 'malloc failed' This alleviates any need for error checking on the return value from mempool_malloc. It might not be obvious though. What happens is that a NULL return form malloc trigger a longjmp back to mempool_init, after which the memory error is raised. Anything left allocated by the pool is freed on mempool_destroy, so it cannot leak. Pools can be used to manage any type of resource, e.g. open file handles, not just memory. The advantage is that you get C code that don't have to do error checking everywhere. The problem with Cython is that a longjmp can mess up refcounts, so it probably should only be used in pure C mode, i.e. in a function safe to call with nogil. C++ exceptions are much cleaner than C resource pools though, for example because destructors are called automatically. And the fact that C++ can put objects on the stack, and references can be used instead of pointers, means it is very safe against resource leaks if used correctly. The problems people have with C++ comes from bad style, for example calling new outside a constructor, using new[] instead of std::vector, using pointers instead of references (a reference cannot be dangling), etc. Sturla Molden # memory pool # Copyright (C) 2009 Sturla Molden import numpy as np cimport numpy cdef extern from "setjmp.h": ctypedef struct jmp_buf: pass int setjmp(jmp_buf jb) nogil void longjmp(jmp_buf jb, int errcode) nogil cdef extern from "stdlib.h": # Assume malloc, realloc and free are thread safe # We may need to change this, in which case a lock # is needed in the memory pool. ctypedef unsigned long size_t void free(void *ptr) nogil void *malloc(size_t size) nogil void *realloc(void *ptr, size_t size) nogil cdef struct mempool: mempool *prev mempool *next cdef public mempool *mempool_init() nogil: cdef jmp_buf *jb cdef mempool *p = <mempool *> malloc(sizeof(jmp_buf) + sizeof(mempool)) if not p: return NULL jb = <jmp_buf*>(<np.npy_intp>p + sizeof(mempool)) p.prev = NULL p.next = NULL if setjmp(jb[0]): # mempool_error has been called mempool_destroy(p) return NULL else: return p cdef public void mempool_error(mempool *p) nogil: # rewind stack back to mempool_init() cdef jmp_buf *jb = <jmp_buf*>(<np.npy_intp>p + sizeof(mempool)) longjmp(jb[0], 1) cdef public void *mempool_destroy(mempool *p) nogil: # this releases all memory allocated to this pool cdef mempool *tmp while p: tmp = p p = p.next free(tmp) cdef public void *mempool_malloc(mempool *p, size_t n) nogil: cdef mempool *block cdef void *m = malloc(sizeof(mempool) + n) if not m: mempool_error(p) block = <mempool*> m block.next = p.next if block.next: block.next.prev = block block.prev = p p.next = block return <void*>(<np.npy_intp>m + sizeof(mempool)) cdef public void *mempool_realloc(mempool *p, void *ptr, size_t n) nogil: cdef void *retval cdef mempool *block, *prev, *next, *new_addr if not n: # realloc(p, 0) is free(p) mempool_free(ptr) retval = NULL elif not ptr: # realloc(0, n) is malloc(n) retval = mempool_malloc(p,n) else: # realloc(p, n) block = <mempool *>(<np.npy_intp>ptr - sizeof(mempool)) prev = block.prev next = block.next new_addr = <mempool*> realloc(block, n + sizeof(mempool)) if not new_addr: mempool_error(p) if new_addr != block: prev.next = new_addr if next: next.prev = new_addr new_addr.prev = prev new_addr.next = next retval = <void *>(<np.npy_intp>new_addr + sizeof(mempool)) return retval cdef public void *mempool_free(void *ptr) nogil: cdef mempool *prev, *next, *cur cdef mempool *block = <mempool *>(<np.npy_intp>ptr - sizeof(mempool)) prev = block.prev next = block.next if next: free(block) next.prev = prev prev.next = next else: free(block) prev.next = NULL

Sturla Molden wrote:
The advantage is that you get C code that don't have to do error checking everywhere.
Really, this is a big disadvantage. In C, you have to check error codes. Anything which gives the illusion you can get away with it is wrong IMHO. Technically, setjmp/longjmp are also problematic for many reasons (often the same as C++ exceptions: it is difficult to get right). At least with goto you can often understand what's wrong - not so much with longjmp/setjmp - if the error handling is too complicated to handle with simple goto, then the code itself is too complicated. If you need to handle deeply nested errors, then using the C API for python exception is good enough. I agree we are not as good as we should in numpy for error handling at the C level, but I think it boils down to not having a unified error handling (that is -3 may mean could not allocate in one function, and could not convert the argument to something in another). But that can be solved with a unified set of error codes and associated strings + a simple log system (that's how sndfile does it for example, and I really like it). cheers, David

On Thu, Sep 24, 2009 at 12:07 AM, Sturla Molden <sturla@molden.no> wrote:
David Cournapeau skrev:
It can be a full rewrite, but still should be sent as patches. If I am the one to review, I would prefer this way. That's especially important to track regressions.
Well.. I didn't intend this for inclusion in SciPy, at least not in the beginning. I needed it for some neuroscience software I am writing.
It is a fundamental problem of C++. Different compilers do not propagate exceptions the same way, and that's a problem when different compilers are involved (happens easily when the C and C++ compilers are not the same, for example). That has been a problem on every new platform I have tried to port numpy and scipy to.
Does this mean we cannot use g++ to compile extensions at all, when Python VM is compiled with MSVC?
Oh, you certainly *can*, as we can use gcc to compile extensions against python built with MS compiler. But there are limitations - with C, those limitations are known, with C++, they become atrocious once you use the C++ runtime (RTTI, exception, etc...). I had to debug some stuff in sparsetools on windows 64 bits a few days ago, and this was horrible because the exceptions are lost between DLL.
I don't like the complexity of C++. But it has some advantages over C for scientific computing; notably STL containers, templates for generics, the std::complex<> type, and so on.
For basic templates, the .src stuff is good enough. For complex, yes, that's a bit of a pain, although you could use the C99 complex type if you only care about modern C compilers in your own extension.
I personally like exceptions because they remove the need for lot or error checking.
I call this a disadvantage :) And having exceptions in a language without gc causes more trouble than it worths IMHO. Exception-safe C++ is incredibly hard to do right.
In C, we can achieve almost the same effect using setjmp/longjmp. Is that bad style as well?
The standard way to handle errors in C code using the python API is goto. It works well, and if you need to jump several calls, you can always use python exception mechanism. cheers, David

David Cournapeau skrev:
Oh, you certainly *can*, as we can use gcc to compile extensions against python built with MS compiler. But there are limitations - with C, those limitations are known, with C++, they become atrocious once you use the C++ runtime (RTTI, exception, etc...). I had to debug some stuff in sparsetools on windows 64 bits a few days ago, and this was horrible because the exceptions are lost between DLL.
Mingw (g++) will statically link its own C++ runtime. DLLs compiled with g++ do not share C++ runtime. This obviously is a problem if you need to propagate an exception from one DLL to another. MS Visual C++ does not have this issue. You get the same issue in C if a DLL statically links its own C runtime, or two DLLs use different shared C runtimes.
I call this a disadvantage :) And having exceptions in a language without gc causes more trouble than it worths IMHO. Exception-safe C++ is incredibly hard to do right.
I'd say exceptipn safe C++ is easy to do right, but C++ textbooks don't teach you how. They generally teach you all the mistakes you can do. And if a book has 'best practices' or 'effective' in the title, it is likely full of BS. The main issue is to never -- ever -- open an allocated reource outside a constructor. That includes calling new or new[]. Always use std::vector instead of new[], unless you have an extremely good reason. Always deallocate resources in a destructor. Objects should always be put on the stack (unless allocated in a constructor), and references should be used instead of pointers. For example, look at wxWidgets, that has an additional Destroy() method in the GUI classes. Destroy() must be called manually. This is a classic C++ mistake. Destroy() is not called automatically, so an exception will mess this up. Sturla Molden
In C, we can achieve almost the same effect using setjmp/longjmp. Is that bad style as well?
The standard way to handle errors in C code using the python API is goto. It works well, and if you need to jump several calls, you can always use python exception mechanism.
cheers,
David _______________________________________________ Scipy-dev mailing list Scipy-dev@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-dev

Sturla Molden wrote:
Mingw (g++) will statically link its own C++ runtime. DLLs compiled with g++ do not share C++ runtime. This obviously is a problem if you need to propagate an exception from one DLL to another. MS Visual C++ does not have this issue.
yes it does, when you mix C runtimes (/MD vs /MDd for example), different C++ code is compiled with different incompatible options, or when you mix mingw and msvc (Those are all concrete problems I had to solve on windows at one point or the other BTW). Those problems are not c++ specific, but they are much more manageable in C than in C++. Interoperability with C++ from other languages is a known problem - given that numpy supports so many compilers and platforms, and that using 'weird' compilers is common in scientific computing, I don't think it worths the hassle in most cases. cheers, David

Ralf Gommers skrev:
Sure, I know. The docs I linked have improvements over svn. If you merge your changes, there'll be a conflict and those docs will not get merged soon - I was hoping to avoid that.
Oh, the docs, not the code... Would something like this be better? The Cython module doesn't need to have docs in it. from linear_filter import lfilter as _lfilter def lfilter(b, a, x, axis=-1, zi=None, direction=1, inplace=-1): """Filter data along one-dimension with an IIR or FIR filter. Description Filter a data sequence, x, using a digital filter. This works for many fundamental data types (including Object type). The filter is a direct form II transposed implementation of the standard difference equation (see "Algorithm"). Inputs: b -- The numerator coefficient vector in a 1-D sequence. a -- The denominator coefficient vector in a 1-D sequence. If a[0] is not 1, then both a and b are normalized by a[0]. x -- An N-dimensional input array. axis -- The axis of the input data array along which to apply the linear filter. The filter is applied to each subarray along this axis. If axis is -1, the filter is applied to x flattened. (*Default* = -1) zi -- Initial conditions for the filter delays. It is a vector (or array of vectors for an N-dimensional input) of length max(len(a),len(b)). If zi=None or is not given then initial rest is assumed. If axis is None, zi should be a vector of length K = max(M,N), where M=len(b)-1 and N=len(a)-1. If an axis is not None, zi should either be a vector of length K, or an array with same shape as x, except for zi.shape[axis] which should be K. If zi is a vector, the same initial conditions are applied to each vector along the specified axis. Default: zi=None (initial rest). direction -- Filtered in the reverse direction if -1. (*Default* = 1) inplace -- If possible, modification to x is done implace. (*Default* = -1) Outputs: (y, {zf}) y -- The output of the digital filter. zf -- If zi is None, this is not returned, otherwise, zf holds the final filter delay values. Algorithm: The filter function is implemented as a direct II transposed structure. This means that the filter implements a[0]*y[n] = b[0]*x[n] + b[1]*x[n-1] + ... + b[nb]*x[n-nb] - a[1]*y[n-1] - ... - a[na]*y[n-na] using the following difference equations: y[m] = b[0]*x[m] + z[0,m-1] z[0,m] = b[1]*x[m] + z[1,m-1] - a[1]*y[m] ... z[n-3,m] = b[n-2]*x[m] + z[n-2,m-1] - a[n-2]*y[m] z[n-2,m] = b[n-1]*x[m] - a[n-1]*y[m] where m is the output sample number and n=max(len(a),len(b)) is the model order. The rational transfer function describing this filter in the z-transform domain is -1 -nb b[0] + b[1]z + ... + b[nb] z Y(z) = ---------------------------------- X(z) -1 -na a[0] + a[1]z + ... + a[na] z """ if isscalar(a): a = [a] if axis < 0: axis = None if inplace < 0: inplace = None return _lfilter(b, a, x, axis=axis, zi=zi, inplace=inplace, direction=direction) Regards, S.M.

On Tue, Sep 22, 2009 at 9:01 PM, Sturla Molden <sturla@molden.no> wrote:
Ralf Gommers skrev:
Sure, I know. The docs I linked have improvements over svn. If you merge your changes, there'll be a conflict and those docs will not get merged soon - I was hoping to avoid that.
Oh, the docs, not the code... Would something like this be better?
See attached for a new version which incorporates your changes, the latest wiki version, as well as some more edits. It fixes indentation, type descriptions and rest markup. I have one (very minor) comment on the options you added, "direction" and "inplace". I think these should both be bools, because they select between two options. The C++ code looks clean, I do not know enough about that language to say much more about it. Cheers, Ralf
The Cython module doesn't need to have docs in it.
from linear_filter import lfilter as _lfilter
def lfilter(b, a, x, axis=-1, zi=None, direction=1, inplace=-1): """Filter data along one-dimension with an IIR or FIR filter.
Description
Filter a data sequence, x, using a digital filter. This works for many fundamental data types (including Object type). The filter is a direct form II transposed implementation of the standard difference equation (see "Algorithm").
Inputs:
b -- The numerator coefficient vector in a 1-D sequence. a -- The denominator coefficient vector in a 1-D sequence. If a[0] is not 1, then both a and b are normalized by a[0]. x -- An N-dimensional input array. axis -- The axis of the input data array along which to apply the linear filter. The filter is applied to each subarray along this axis. If axis is -1, the filter is applied to x flattened. (*Default* = -1) zi -- Initial conditions for the filter delays. It is a vector (or array of vectors for an N-dimensional input) of length max(len(a),len(b)). If zi=None or is not given then initial rest is assumed. If axis is None, zi should be a vector of length K = max(M,N), where M=len(b)-1 and N=len(a)-1. If an axis is not None, zi should either be a vector of length K, or an array with same shape as x, except for zi.shape[axis] which should be K. If zi is a vector, the same initial conditions are applied to each vector along the specified axis. Default: zi=None (initial rest). direction -- Filtered in the reverse direction if -1. (*Default* = 1) inplace -- If possible, modification to x is done implace. (*Default* = -1)
Outputs: (y, {zf})
y -- The output of the digital filter. zf -- If zi is None, this is not returned, otherwise, zf holds the final filter delay values.
Algorithm:
The filter function is implemented as a direct II transposed structure. This means that the filter implements
a[0]*y[n] = b[0]*x[n] + b[1]*x[n-1] + ... + b[nb]*x[n-nb] - a[1]*y[n-1] - ... - a[na]*y[n-na]
using the following difference equations:
y[m] = b[0]*x[m] + z[0,m-1] z[0,m] = b[1]*x[m] + z[1,m-1] - a[1]*y[m] ... z[n-3,m] = b[n-2]*x[m] + z[n-2,m-1] - a[n-2]*y[m] z[n-2,m] = b[n-1]*x[m] - a[n-1]*y[m]
where m is the output sample number and n=max(len(a),len(b)) is the model order.
The rational transfer function describing this filter in the z-transform domain is -1 -nb b[0] + b[1]z + ... + b[nb] z Y(z) = ---------------------------------- X(z) -1 -na a[0] + a[1]z + ... + a[na] z
""" if isscalar(a): a = [a]
if axis < 0: axis = None if inplace < 0: inplace = None
return _lfilter(b, a, x, axis=axis, zi=zi, inplace=inplace, direction=direction)
Regards, S.M.
_______________________________________________ Scipy-dev mailing list Scipy-dev@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-dev
participants (6)
-
Charles R Harris
-
David Cournapeau
-
David Cournapeau
-
Ralf Gommers
-
Ravi
-
Sturla Molden