[Numpy-discussion] matrix indexing

Aronne Merrelli aronne.merrelli at gmail.com
Tue Feb 7 23:34:26 EST 2012


On Mon, Feb 6, 2012 at 11:44 AM, Naresh Pai <npai at uark.edu> wrote:

> I have two large matrices, say, ABC and DEF, each with a shape of 7000 by
> 4500. I have another list, say, elem, containing 850 values from ABC. I am
> interested in finding out the corresponding values in DEF where ABC has
> elem and store them *separately*. The code that I am using is:
>
> for i in range(len(elem)):
>      DEF_distr = DEF[ABC==elem[i]]
>
> DEF_distr gets used for further processing before it gets cleared from
> memory and the next round of the above loop begins. The loop above
> currently takes about 20 minutes! I think the bottleneck is where elem is
> getting searched repeatedly in ABC. So I am looking for a solution where
> all elem can get processed in a single call and the indices of ABC be
> stored in another variable (separately). I would appreciate if you suggest
> any faster method for getting DEF_distr.
>
>
You'll need to give some details about the contents of ABC/DEF to get the
best answer (the range of values, whether they have a certain structure,
etc.). I assumed that ABC and elem hold integers (I'm not sure it makes
sense to test ABC==elem[n] unless both are integers), and then used a sort
followed by searchsorted. This has the side effect of reordering the
elements within each DEF_distr. I don't know if that matters. The .copy()
calls are optional: argsort and the fancy indexing used below both return
new arrays, so ABC/DEF themselves are left unchanged either way.

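    import numpy as np

    # Flatten ABC and sort it, keeping the permutation that sorts it.
    ABC_1D = ABC.copy().ravel()
    ABC_1D_sorter = np.argsort(ABC_1D)
    ABC_1D = ABC_1D[ABC_1D_sorter]
    # Apply the same permutation to DEF so the two arrays stay aligned.
    DEF_1D = DEF.copy().ravel()
    DEF_1D = DEF_1D[ABC_1D_sorter]
    # Each elem[n] occupies the slice ind1[n]:ind2[n] of the sorted ABC.
    ind1 = np.searchsorted(ABC_1D, elem, side='left')
    ind2 = np.searchsorted(ABC_1D, elem, side='right')
    DEF_distr = []
    for n in range(len(elem)):
        DEF_distr.append(DEF_1D[ind1[n]:ind2[n]])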
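
As a small sanity check, here is a sketch of the same steps on made-up 3x3
integer arrays, compared against the direct boolean-mask result. The data
values and the kind='stable' argument are my assumptions, not part of the
code above. With a stable sort, equal values keep their original row-major
order, so each slice should match the mask-based result element for element.

```python
import numpy as np

# Made-up illustrative data (assumption): small integer arrays.
ABC = np.array([[3, 1, 2],
                [1, 3, 3],
                [2, 2, 1]])
DEF = np.array([[10, 20, 30],
                [40, 50, 60],
                [70, 80, 90]])
elem = [1, 3]

# Sort the flattened ABC once; a stable sort keeps ties in original order.
ABC_flat = ABC.ravel()
sorter = np.argsort(ABC_flat, kind='stable')
ABC_sorted = ABC_flat[sorter]
DEF_sorted = DEF.ravel()[sorter]

# Left/right insertion points bracket the run of each value in ABC_sorted.
ind1 = np.searchsorted(ABC_sorted, elem, side='left')
ind2 = np.searchsorted(ABC_sorted, elem, side='right')
DEF_distr = [DEF_sorted[lo:hi] for lo, hi in zip(ind1, ind2)]

# Compare each slice with the straightforward boolean-mask lookup.
for value, chunk in zip(elem, DEF_distr):
    assert np.array_equal(chunk, DEF[ABC == value])
```

If the order within each group does not matter, the default sort kind
should work just as well.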


I tried this on a large-memory workstation, and for the 7Kx4K size I get
about 100 seconds for the simple method and about 10 seconds for the more
complicated sort-based method. If you are seeing 20 minutes, maybe there is
a memory problem, or a different part of the code is the bottleneck?
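
To compare the two approaches on your own machine, something like the
following sketch could be used. The function names mask_method and
sort_method, the random test data, and the reduced 700 x 450 size are all
assumptions chosen so that it runs quickly; timings will vary with
hardware and with the real contents of ABC/DEF.

```python
import time
import numpy as np

def mask_method(ABC, DEF, elem):
    # Simple approach: one full boolean comparison per element.
    return [DEF[ABC == e] for e in elem]

def sort_method(ABC, DEF, elem):
    # Sort once, then locate each element's run with searchsorted.
    ABC_flat = ABC.ravel()
    sorter = np.argsort(ABC_flat, kind='stable')
    ABC_sorted = ABC_flat[sorter]
    DEF_sorted = DEF.ravel()[sorter]
    ind1 = np.searchsorted(ABC_sorted, elem, side='left')
    ind2 = np.searchsorted(ABC_sorted, elem, side='right')
    return [DEF_sorted[lo:hi] for lo, hi in zip(ind1, ind2)]

# Hypothetical test data: random integers, reduced from 7000 x 4500.
rng = np.random.default_rng(0)
shape = (700, 450)
ABC = rng.integers(0, 1000, size=shape)
DEF = rng.random(shape)
elem = rng.choice(1000, size=85, replace=False)

for func in (mask_method, sort_method):
    start = time.perf_counter()
    result = func(ABC, DEF, elem)
    elapsed = time.perf_counter() - start
    print(f"{func.__name__}: {elapsed:.3f} s")
```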

Hope that helps,
Aronne