Problem migrating PDL's index() into NumPy
Hello,

Being quite new to NumPy and having previously used PDL in Perl, I am currently migrating one of my PDL projects to NumPy. Most of the functions can be migrated without problems, and there are functions in NumPy that let me do things in a much clearer way than in PDL. However, I have a problem with the following operation.

There are two 2D arrays with dimensions A[10000,1000] and B[10000,100]. The first dimension of both arrays corresponds to a list of 10000 objects. For each of the 10000 objects, the array A contains 1000 integer values between 0 and 99, each of which selects a corresponding value for that object in the array B.

I need a new array C[10000,1000] with values from B, built the following way:

    for x in range(10000):
        for y in range(1000):
            C[x,y] = B[x,A[x,y]]

In Perl's PDL, this can be done with

    $C = $B->index($A)

If in NumPy I do C = B[A], then I do not get a [10000,1000] 2D array, but rather a [10000,1000,1000] 3D array, in which I can find the correct values at the following positions:

    for x in range(10000):
        for y in range(1000):
            C[x,y,y]

That may seem nice, but it needs 1000 times more memory and very probably 1000 times more time to calculate... Impossible with such large arrays... :-(

Could anyone help me, please?

Regards,
Miroslav Sedivy
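For reference, a minimal sketch of the operation at toy sizes (n_obj, n_idx and n_vals are illustrative assumptions standing in for 10000, 1000 and 100). It spells out the double loop from the post and shows what B[A] does: an integer array in the first index position selects whole rows of B, so the result has shape A.shape + (B.shape[1],). At the stated sizes that would be 10000 × 1000 × 100 = 10^9 elements, about 8 GB in float64, versus 10^7 elements (about 80 MB) for C.

```python
import numpy as np

# Toy sizes (assumed for illustration): objects, indices per object, values per object
n_obj, n_idx, n_vals = 6, 4, 3

rng = np.random.default_rng(0)
B = rng.random((n_obj, n_vals))                   # one row of values per object
A = rng.integers(0, n_vals, size=(n_obj, n_idx))  # per-object indices into that row

# Reference implementation: the double loop from the original post
C_loop = np.empty((n_obj, n_idx), dtype=B.dtype)
for x in range(n_obj):
    for y in range(n_idx):
        C_loop[x, y] = B[x, A[x, y]]

# B[A] uses A to pick whole *rows* of B, so it gains an extra axis
full = B[A]
print(full.shape)  # (6, 4, 3)

# full[x, y, :] is row A[x, y] of B, not a value from row x
assert np.array_equal(full[1, 2], B[A[1, 2]])
```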
On Wed, Mar 17, 2010 at 7:12 AM, Miroslav Sedivy <miroslav.sedivy@weather-consult.com> wrote:
try

    C = B[:,A]

or

    C = B[np.arange(1000)[:,None], A]

I think one of the two (or both) should work (but I have no time to try it myself).

Josef
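A quick check of the first suggestion at toy sizes (assumed for illustration): with a slice in front, B[:, A] pairs every row of B with every entry of A, so its shape is (n_obj,) + A.shape. The wanted values sit on the diagonal over the first two axes, big[x, x, y], so this route needs n_obj times the memory of C (10^11 elements at the stated sizes).

```python
import numpy as np

n_obj, n_idx, n_vals = 5, 4, 3  # toy sizes, assumed for illustration
rng = np.random.default_rng(1)
B = rng.random((n_obj, n_vals))
A = rng.integers(0, n_vals, size=(n_obj, n_idx))

big = B[:, A]      # big[i, x, y] == B[i, A[x, y]]
print(big.shape)   # (5, 5, 4)

# The wanted C[x, y] = B[x, A[x, y]] is the diagonal big[x, x, y]
x = np.arange(n_obj)
C_diag = big[x, x, :]  # shape (n_obj, n_idx)
```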
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
josef.pktd@gmail.com wrote:
try C = B[:,A] or C = B[np.arange(1000)[:,None], A]
I think, one of the two (or both) should work (but no time for trying it myself) Josef
Thank you, Josef, for responding.

Neither of them works correctly. The first one only works as B.T[:,A], and then it gives me the same _3D_ array as B[A].T.

The second one tells me:

    ValueError: shape mismatch: objects cannot be broadcast to a single shape

Now I am using an iteration over all 10000 elements:

    C = np.empty_like(A)
    for i in range(10000):
        C[:,i] = B[:,i][A[:,i]]

which works perfectly. It is just a real pain to see such a for-loop in the NumPy world :-(

Thanks,
Miroslav
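The loop above suggests that in the actual data the objects run along the second axis (A of shape (1000, 10000), B of shape (100, 10000)); that layout is inferred from the loop, not stated in the thread. A minimal sketch of a loop-free equivalent for that layout pairs A with the column index np.arange(n_obj), which broadcasts across A's rows. It also allocates the loop's output with B's dtype: np.empty_like(A) inherits A's integer dtype, so float values from B would be silently truncated on assignment.

```python
import numpy as np

# Toy sizes (assumed); objects run along axis 1, as in the loop above
n_vals, n_idx, n_obj = 3, 4, 6
rng = np.random.default_rng(2)
B = rng.random((n_vals, n_obj))                   # column i holds object i's values
A = rng.integers(0, n_vals, size=(n_idx, n_obj))  # column i holds object i's indices

# The loop from the post, with the output allocated using B's dtype
C_loop = np.empty(A.shape, dtype=B.dtype)
for i in range(n_obj):
    C_loop[:, i] = B[:, i][A[:, i]]

# Loop-free version: A (n_idx, n_obj) broadcasts with arange(n_obj) of shape (n_obj,),
# giving C[j, i] = B[A[j, i], i]
C_vec = B[A, np.arange(n_obj)]
```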
On Wed, Mar 17, 2010 at 9:36 AM, Miroslav Sedivy <miroslav.sedivy@weather-consult.com> wrote:
Thank you, Josef, for responding.
None of them works correctly. The first one works only as B.T[:,A] and gives me the same _3D_ array as B[A].T
The second one tells me: ValueError: shape mismatch: objects cannot be broadcast to a single shape
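Because you have 10000 rows, not 1000 as in the example I typed. Index arrays are broadcast against each other, so their shapes have to be compatible: with your shapes the row index has to be np.arange(10000)[:,None], of shape (10000,1), to broadcast with A's shape (10000,1000).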
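To illustrate the broadcasting point at toy sizes (assumed stand-ins for 10000, 1000 and 100): a row index of shape (n_rows, 1) broadcasts with A's (n_rows, n_cols), while one built from the wrong length cannot. The thread reports ValueError for the mismatch; recent NumPy releases raise IndexError for mismatched index arrays instead, so the sketch catches both.

```python
import numpy as np

n_rows, n_cols, n_vals = 10, 4, 3  # toy stand-ins for 10000, 1000, 100
B = np.arange(n_rows * n_vals, dtype=float).reshape(n_rows, n_vals)
A = np.zeros((n_rows, n_cols), dtype=int)  # any valid column indices will do

good_rows = np.arange(n_rows)[:, None]   # shape (10, 1) broadcasts with (10, 4)
print(np.broadcast(good_rows, A).shape)  # (10, 4)

bad_rows = np.arange(n_cols)[:, None]    # shape (4, 1): wrong length, like arange(1000) vs 10000 rows
try:
    B[bad_rows, A]
    failed = False
except (IndexError, ValueError):  # IndexError in recent NumPy, ValueError as reported in this thread
    failed = True
print(failed)  # True
```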
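    import numpy as np

    n0 = 5  # number of rows
    B = np.ones((n0,3))*np.arange(3)      # every row of B is [0., 1., 2.]
    A = np.random.randint(3,size=(n0,3))  # random column indices 0..2
    C = B[np.arange(n0)[:,None],A]        # row index (n0,1) broadcasts with A (n0,3)
    assert (A == C).all()                 # B[i, j] == j, so C must equal A

    >>> A
    array([[2, 0, 1],
           [2, 0, 1],
           [2, 1, 2],
           [0, 0, 2],
           [2, 0, 0]])
    >>> C
    array([[ 2., 0., 1.],
           [ 2., 0., 1.],
           [ 2., 1., 2.],
           [ 0., 0., 2.],
           [ 2., 0., 0.]])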
Josef
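For readers on newer NumPy: since version 1.15, np.take_along_axis performs this per-row lookup directly, which is the closest NumPy counterpart to the PDL index() call in the original post (it did not exist when this thread was written). A minimal sketch at toy sizes (assumed), checked against Josef's broadcast-index form; passing axis=0 handles the transposed layout where objects run along the second axis.

```python
import numpy as np

n_obj, n_idx, n_vals = 7, 5, 3  # toy sizes, assumed for illustration
rng = np.random.default_rng(3)
B = rng.random((n_obj, n_vals))
A = rng.integers(0, n_vals, size=(n_obj, n_idx))

# Josef's form: row indices of shape (n_obj, 1) broadcast against A
C_fancy = B[np.arange(n_obj)[:, None], A]

# NumPy >= 1.15: look up values along axis 1 of B, row by row, using A
C_take = np.take_along_axis(B, A, axis=1)

# Transposed layout (objects along axis 1): look up along axis 0 instead
C_cols = np.take_along_axis(B.T, A.T, axis=0)
```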
Thank you, Josef, now it works! I had a problem with the shape of my arrays. Once I transposed them correctly, your solution worked!

Miroslav
participants (2)
- josef.pktd@gmail.com
- Miroslav Sedivy