Re: [SciPy-User] creating sparse indicator arrays

Nov. 29, 2011


      On Tue, Nov 29, 2011 at 2:50 PM,  <josef.pktd@gmail.com> wrote:
...
On Tue, Nov 29, 2011 at 2:07 PM, Nathaniel Smith <njs@pobox.com> wrote:
...
On Tue, Nov 29, 2011 at 7:14 AM,  <josef.pktd@gmail.com> wrote:
...
Is there a simple or fast way to create a sparse indicator array, `a`
below, without going through the dense matrix first?
The standard way is to use the LIL or DOK sparse formats. If you want
to use them then you'll have to do your construction "by hand", though
-- you can't do the nice broadcasting tricks you're using below.
Alternatively, constructing CSC or CSR format directly is not that
hard, though it may take some time to wrap your head around the
definitions...
...
...
...
...
from scipy import sparse
g = np.array([0, 0, 1, 1])   #categories, integers,
u = np.arange(2)    #unique's,  range(number_categories)
If 'u' is *always* going to be np.arange(number_categories), then
actually this is quite trivial (untested code):
data = np.ones(len(g), dtype=np.int8)
indices = g
indptr = np.arange(len(g))
a = np.csr_matrix((data, indices, indptr))
This works nicely  (only "sparse" namespace)
small correction:
indptr needs to be 1 longer or it drops the last row
(since I didn't RTFM, I don't know what it means, but it works)
...
...
...
g = np.array([0, 0, 1, 2, 1, 1, 2, 0])
data = np.ones(len(g), dtype=np.int8)
indptr = np.arange(len(g)+1)   #add 1
a = sparse.csr_matrix((data, g, indptr))
a.todense()
matrix([[1, 0, 0],
        [1, 0, 0],
        [0, 1, 0],
        [0, 0, 1],
        [0, 1, 0],
        [0, 1, 0],
        [0, 0, 1],
        [1, 0, 0]], dtype=int8)
np.all(a.todense() == (g[:,None] == np.arange(3)).astype(int))
True
Josef
...
u = np.arange(number_categories)  will be a code requirement
(group or period labels are consecutive ints)
...
This gives you a CSR matrix, which you can either use as is or convert to CSC.
If you want to build CSC directly, and want to support an arbitrary
'u' vector, then you could do something like (untested code):
data = np.ones(len(g), dtype=np.int8)
indices = np.empty(len(g), dtype=int)
write_offset = 0
indptr = np.empty(number_categories, dtype=int)
for col_i, category in enumerate(u):
 indptr[col_i] = write_offset
 rows = (data == category).nonzero()[0]
 indices[write_offset:write_offset + len(rows)] = rows
 write_offset += len(rows)
I still need to check this.
...
Or you could just use a loop that fills in an LIL matrix :-)
I'm playing with panel data or general error component models. The
main point of using sparse is to have a compact, non-loop version.
In some previous attempts at sparse the cost of constructing the array
with loops removed much of the advantage of using them, and I could
just loop in the algorithm directly.
Thanks,
Josef
...
-- Nathaniel
_______________________________________________
SciPy-User mailing list
SciPy-User@scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-user

Re: [SciPy-User] creating sparse indicator arrays

josef.pktd＠gmail.com