[Numpy-discussion] matrix indexing

Val Kalatsky kalatsky at gmail.com
Wed Feb 8 01:25:58 EST 2012


Aronne made good suggestions.
Here is another weapon for your arsenal:
1) I assume that the shape of your array is irrelevant (reshape to 1-D if needed).
2) Depending on the structure of your data, np.unique can be handy:
arr_unique, idx = np.unique(arr1d, return_inverse=True)
then search arr_unique instead of arr1d.
3) Caveat: np.unique is a major memory hog; be prepared to waste ~1GB.
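
A minimal sketch of how steps 1) and 2) might fit together. The helper name group_by_unique is made up for illustration, and the grouping step (a stable argsort of the inverse index plus np.bincount) is just one way to turn the np.unique output into per-value slices; it assumes ABC and DEF have the same shape and that the values in ABC can be compared exactly (e.g. integers):

```python
import numpy as np

def group_by_unique(ABC, DEF, elem):
    """Return a list with one array of DEF values per entry of elem.

    Hypothetical helper: the name and the grouping strategy are
    illustrative assumptions, not part of NumPy.
    """
    abc1d = ABC.ravel()
    def1d = DEF.ravel()

    # Unique values of ABC (sorted) and, for every element of abc1d,
    # the index of its value within arr_unique.
    arr_unique, idx = np.unique(abc1d, return_inverse=True)
    idx = idx.ravel()

    # A stable sort keeps the original (row-major) order within each group.
    order = np.argsort(idx, kind="stable")
    def_sorted = def1d[order]

    # Group g occupies def_sorted[starts[g] : starts[g] + counts[g]].
    counts = np.bincount(idx, minlength=arr_unique.size)
    starts = np.concatenate(([0], np.cumsum(counts)[:-1]))

    # Search the (small) unique array instead of the full 1-D array.
    pos = np.searchsorted(arr_unique, elem)

    result = []
    for p, e in zip(pos, elem):
        if p < arr_unique.size and arr_unique[p] == e:
            result.append(def_sorted[starts[p]:starts[p] + counts[p]])
        else:
            # Value not present in ABC: return an empty slice.
            result.append(def_sorted[:0])
    return result
```

Each entry of the returned list should hold the same values as DEF[ABC == elem[i]]. Whether this beats the plain loop depends on how many distinct values ABC contains, so it is worth timing on your real data.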
Val

On Tue, Feb 7, 2012 at 10:34 PM, Aronne Merrelli
<aronne.merrelli at gmail.com> wrote:

>
>
> On Mon, Feb 6, 2012 at 11:44 AM, Naresh Pai <npai at uark.edu> wrote:
>
>> I have two large matrices, say, ABC and DEF, each with a shape of 7000 by
>> 4500. I have another list, say, elem, containing 850 values from ABC. I am
>> interested in finding out the corresponding values in DEF where ABC has
>> elem and store them *separately*. The code that I am using is:
>>
>> for i in range(len(elem)):
>>      DEF_distr = DEF[ABC==elem[i]]
>>
>> DEF_distr gets used for further processing before it gets cleared from
>> memory and the next round of the above loop begins. The loop above
>> currently takes about 20 minutes! I think the bottleneck is where elem is
>> getting searched repeatedly in ABC. So I am looking for a solution where
>> all elem can get processed in a single call and the indices of ABC be
>> stored in another variable (separately). I would appreciate it if you could
>> suggest any faster method for getting DEF_distr.
>>
>>
> You'll need to mention some details about the contents of ABC/DEF in order
> to get the best answer (what range of values, do they have a certain
> structure, etc). I made the assumption that ABC and elem have integers (I'm
> not sure it makes sense to search for ABC==elem[n] unless they are both
> integers), and then used a sort followed by searchsorted. This has a side
> effect of reordering the elements in DEF_distr. I don't know if that
> matters. You can skip the .copy() calls if you don't care that ABC/DEF are
> sorted.
>
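>     # Flatten ABC, sort it, and apply the same permutation to DEF
>     ABC_1D = ABC.copy().ravel()
>     ABC_1D_sorter = np.argsort(ABC_1D)
>     ABC_1D = ABC_1D[ABC_1D_sorter]
>     DEF_1D = DEF.copy().ravel()
>     DEF_1D = DEF_1D[ABC_1D_sorter]
>     # Each value of elem occupies the range [ind1, ind2) in the sorted ABC
>     ind1 = np.searchsorted(ABC_1D, elem, side='left')
>     ind2 = np.searchsorted(ABC_1D, elem, side='right')
>     # Collect the matching DEF values, one array per entry of elem
>     DEF_distr = []
>     for n in range(len(elem)):
>         DEF_distr.append(DEF_1D[ind1[n]:ind2[n]])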
>
>
> I tried this on a large-memory workstation, and for the 7Kx4K size I get
> about 100 seconds for the simple method and 10 seconds for this more
> complicated sort-based method. If you are getting 20 minutes for that,
> maybe there is a memory problem, or a different part of the code is the
> bottleneck?
>
> Hope that helps,
> Aronne