[SciPy-user] Efficient submatrix of a sparse matrix

Ed Schofield schofield at ftw.at
Wed Sep 13 08:51:45 EDT 2006


Brendan Simons wrote:
> Hi all,
>
> I have a neat little problem for which I think sparse matrices are  
> the answer.   I need to store a bunch of overlapping "intervals", (a,  
> b pairs for which any a <= val < b crosses that interval) in a manner  
> which makes "stabbing inquiries" (which intervals cross a specified  
> value) easy.  The canonical way to to this is with interval trees (a  
> good summary here: http://w3.jouy.inra.fr/unites/miaj/public/vigneron/ 
> cs4235/l5cs4235.pdf), however I think I can simplify things as follows:
>
> 1)  perform an argsort on the interval a and b endpoints.  This gives  
> the position of each interval in an "order" matrix M, where each row  
> and each column contains only one interval, and intervals are sorted  
> in rows by start point, and in columns by endpoint
>
> 2) for a "stab" point x, do a binary search to find the row i and  
> column j where x could be inserted into M and maintain its ordering.   
> The submatrix of M[:i, j:]  would contains all those intervals which  
> start before x, and end after x.
>
> Since the order matrix M is mostly empty it would make sense to use a  
> sparse matrix storage scheme, however the only one in scipy.sparse  
> that allows 2d slicing is the dok_matrix, and the slicing algorithm  
> is O(n), which is much too expensive for my purposes (I have to do  
> thousands of stabbing inquiries on a space containing thousands of  
> intervals).
>
> Is there no more efficient algorithm for 2d slicing of a sparse matrix?
>   

Hi Brendan,

You should be able to obtain 2d slices of sparse matrices much more
efficiently than O(n) using the csc_matrix, csr_matrix, and lil_matrix
objects. Some of the code you'd need, such as the binary search, is
there already, although it deserves to be cleaned up and streamlined.
Extending it to allow 2d slices should be simple.

I don't have much time to work on it myself right now, but I'd be happy
to give you pointers and suggestions ... I'd gladly accept a patch!

-- Ed




More information about the SciPy-User mailing list