Which sparse matrix package?

eric eric at ericaro.net
Fri Dec 19 00:42:10 CET 2008

On Dec 18, 11:52 pm, MRAB <goo... at mrabarnett.plus.com> wrote:
> James Mills wrote:
> > On Fri, Dec 19, 2008 at 8:18 AM, Martin Manns <mma... at gmx.net> wrote:
> >> Hi:
> > Hi,
> >> I am writing a spreadsheet application in Python
> > What's wrong with pyspread ?
> > [ ... snip ... ]
> >> The dict that I tried out is of the type:
> >> {(1,2,3): "2323", (1,2,545): "2324234", ... }
> >> It is too slow for my application when it grows. One slicing operation
> >> with list comprehensions takes about 1/2 s on my computer for 1E6
> >> elements.
> > Let me get this straight.
> > It's taking 0.5s to slice your matrix
> > of 1E7 (10000000.0 rows/columns)
> > Are you mad ? This is TEN Millions and you
> > required it faster than 0.5s ?
> No, 1E6, or ONLY 1 million. :-)

what does Internet explorer 6 & 7 have to do with that ? ;-)

> > Am I missing something here ?
> Would a matrix of matrices work better, ie a supermatrix where each
> element is either a submatrix of cells or None if the submatrix is
> empty? How much memory it would save would depend on how sparse the
> matrix is.

sparse matrix efficiency is all about guessing the probability for a
cell to be empty. My guess is that the probability for a cell (i,j) to
be empty grows with norm(i,j)

then, you should store your data in a list, and the indexes in another
"list", then use a BTree like algorithm, or keep you data ordered by
norm(i,j) growing.

something like:

data = [] #
kernel =[] # full of couples (i,j)

def add(i,j,value):
    data.append(value) # or put in the "right place" to keep it sorted
    kernel.append(i,j) # same comment

def get(i,j):
    k = kernel_of(i,j)
    return data[k]
def set(i,j,value):
   if is_probably_empty():
      k = kernel_of(i,j)
      if k:
        data[k]= value
      else: add(i,j,value)


where the 'kernel_of method" can take advantage of the sorted order
(i*j or max(i,j) should be as good) to make a fast dichotomy.

The set method is about finding if the cell is 'empty or not' and then
either perform an indexof, or a add.
to efficiently know if a cell is empty (the is_probably_empty method)
or not, try to use the bloom filter using
i*j%m as hash function (for instance)

here arrays have the size of your actual data (maybe thousands of cell
for a huge spreadsheet), that's MUCH better than the size of the

if my memory is correct, there is room in numpy for your own sparse
matrix implementation. You should try it with your own sparse matrix


More information about the Python-list mailing list