On Tue, Nov 29, 2011 at 7:14 AM, <josef.pktd@gmail.com> wrote:
Is there a simple or fast way to create a sparse indicator array, `a` below, without going through the dense matrix first?
The standard way is to use the LIL or DOK sparse formats. If you want to use them then you'll have to do your construction "by hand", though -- you can't do the nice broadcasting tricks you're using below. Alternatively, constructing CSC or CSR format directly is not that hard, though it may take some time to wrap your head around the definitions...
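A minimal sketch of the "by hand" LIL route, assuming the category labels are integers in `range(n_categories)`. The name `n_categories` is introduced here only for illustration; the example labels match the ones in the question.

```python
import numpy as np
from scipy import sparse

g = np.array([0, 0, 1, 1])   # category label for each row
n_categories = 2             # hypothetical name: number of distinct categories

# Fill a LIL matrix one entry at a time: row i gets a 1 in column g[i].
a = sparse.lil_matrix((len(g), n_categories), dtype=np.int8)
for row, category in enumerate(g):
    a[row, category] = 1

# LIL is convenient for incremental construction; convert once it is filled.
a_csr = a.tocsr()
print(a_csr.toarray())
```

The explicit loop is slow for very long `g`, which is the price of giving up the broadcasting approach.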
    import numpy as np
    from scipy import sparse

    g = np.array([0, 0, 1, 1])  # categories, integers
    u = np.arange(2)            # uniques, range(number_categories)
If 'u' is *always* going to be np.arange(number_categories), then actually this is quite trivial (untested code):

    data = np.ones(len(g), dtype=np.int8)
    indices = g                     # column index of the single nonzero in each row
    indptr = np.arange(len(g) + 1)  # row i owns data[i:i+1]; needs len(g) + 1 entries
    a = sparse.csr_matrix((data, indices, indptr), shape=(len(g), len(u)))

This gives you a CSR matrix, which you can either use as is or convert to CSC. If you want to build CSC directly, and want to support an arbitrary 'u' vector, then you could do something like (untested code):

    data = np.ones(len(g), dtype=np.int8)
    indices = np.empty(len(g), dtype=int)
    write_offset = 0
    indptr = np.empty(len(u) + 1, dtype=int)
    for col_i, category in enumerate(u):
        indptr[col_i] = write_offset
        rows = (g == category).nonzero()[0]  # rows belonging to this category
        indices[write_offset:write_offset + len(rows)] = rows
        write_offset += len(rows)
    indptr[-1] = write_offset  # final entry is the total number of stored values
    a = sparse.csc_matrix((data[:write_offset], indices[:write_offset], indptr),
                          shape=(len(g), len(u)))

Or you could just use a loop that fills in an LIL matrix :-)

-- Nathaniel
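As a sanity check on the CSR layout described above, the sketch below builds the indicator matrix for the example labels and compares it with a dense version. The dense broadcasting expression is an assumption about what the original dense approach looked like, since that code is not shown here.

```python
import numpy as np
from scipy import sparse

g = np.array([0, 0, 1, 1])   # categories, integers
u = np.arange(2)             # unique categories

# CSR with exactly one stored value per row:
# row i owns the slice [indptr[i], indptr[i + 1]) of data and indices.
data = np.ones(len(g), dtype=np.int8)
indices = g
indptr = np.arange(len(g) + 1)
a = sparse.csr_matrix((data, indices, indptr), shape=(len(g), len(u)))

# Dense reference built by broadcasting (assumed form, for comparison only).
dense = (g[:, None] == u).astype(np.int8)
assert (a.toarray() == dense).all()

# Converting to CSC shows the column-oriented arrays for the same matrix.
a_csc = a.tocsc()
print(a_csc.indptr)
print(a_csc.indices)
```

Looking at `a_csc.indptr` and `a_csc.indices` for a small input like this is a quick way to check a hand-written CSC construction against what scipy itself produces.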