
Hi, I don't think there is a solution for this, but perhaps anybody may offer some idea. Given: In [79]:a=numpy.arange(9,-1,-1) In [80]:b=numpy.arange(10) In [81]:numpy.random.shuffle(b) In [82]:b Out[82]:array([2, 6, 3, 5, 4, 9, 0, 8, 7, 1]) In [83]:a=a[b] In [84]:a Out[84]:array([7, 3, 6, 4, 5, 0, 9, 1, 2, 8]) is there a way to make the step 83 without having to keep 3 arrays in-memory at the same time? This is, some way of doing fancy indexing, but changing the elements *inplace*. The idea is to keep memory requeriments as low as possible when a and b are large arrays. Thanks! -- Francesc Altet | Be careful about using the following code -- Carabos Coop. V. | I've only proven that it works, www.carabos.com | I haven't tested it. -- Donald Knuth

On 3/1/07, Francesc Altet <faltet@carabos.com> wrote:
Hi,
I don't think there is a solution for this, but perhaps anybody may offer some idea. Given:
In [79]:a=numpy.arange(9,-1,-1) In [80]:b=numpy.arange(10) In [81]:numpy.random.shuffle(b) In [82]:b Out[82]:array([2, 6, 3, 5, 4, 9, 0, 8, 7, 1]) In [83]:a=a[b] In [84]:a Out[84]:array([7, 3, 6, 4, 5, 0, 9, 1, 2, 8])
is there a way to make the step 83 without having to keep 3 arrays in-memory at the same time? This is, some way of doing fancy indexing, but changing the elements *inplace*. The idea is to keep memory requeriments as low as possible when a and b are large arrays.
Thanks!
I think that would be tough because of overlap between the two sides. The permutation could be factored into cycles which would mostly avoid that, but that is more theoretical than practical here. What is it you are trying to do? Chuck

On 3/1/07, Charles R Harris <charlesr.harris@gmail.com> wrote:
On 3/1/07, Francesc Altet <faltet@carabos.com> wrote:
Hi,
I don't think there is a solution for this, but perhaps anybody may offer some idea. Given:
In [79]:a=numpy.arange(9,-1,-1) In [80]:b=numpy.arange(10) In [81]:numpy.random.shuffle(b) In [82]:b Out[82]:array([2, 6, 3, 5, 4, 9, 0, 8, 7, 1]) In [83]:a=a[b] In [84]:a Out[84]:array([7, 3, 6, 4, 5, 0, 9, 1, 2, 8])
is there a way to make the step 83 without having to keep 3 arrays in-memory at the same time? This is, some way of doing fancy indexing, but changing the elements *inplace*. The idea is to keep memory requeriments as low as possible when a and b are large arrays.
Thanks!
I think that would be tough because of overlap between the two sides. The permutation could be factored into cycles which would mostly avoid that, but that is more theoretical than practical here. What is it you are trying to do?
Chuck

El dj 01 de 03 del 2007 a les 13:26 -0700, en/na Charles R Harris va escriure:
On 3/1/07, Francesc Altet <faltet@carabos.com> wrote: Hi,
I don't think there is a solution for this, but perhaps anybody may offer some idea. Given:
In [79]:a=numpy.arange(9,-1,-1) In [80]:b=numpy.arange(10) In [81]:numpy.random.shuffle(b) In [82]:b Out[82]:array([2, 6, 3, 5, 4, 9, 0, 8, 7, 1]) In [83]:a=a[b] In [84]:a Out[84]:array([7, 3, 6, 4, 5, 0, 9, 1, 2, 8])
is there a way to make the step 83 without having to keep 3 arrays in-memory at the same time? This is, some way of doing fancy indexing, but changing the elements *inplace*. The idea is to keep memory requeriments as low as possible when a and b are large arrays.
Thanks!
I think that would be tough because of overlap between the two sides. The permutation could be factored into cycles which would mostly avoid that, but that is more theoretical than practical here. What is it you are trying to do?
Yeah, the problem is the overlap. Well, what I'm trying to do is, given two arrays on-disk (say, block and block_idx), sort one of them, and then, re-order the other following with the same order than the first one. My best approach until now is: block = tmp_sorted[nslice] # read block from disk sblock_idx = block.argsort() block.sort() # do things with block... del block # get rid of block block_idx = tmp_indices[nslice] # read bock_idx from disk indices[nslice] = block_idx[sblock_idx] but the last line will take 3 times the memory that takes block_idx alone. My goal would be that the algorithm above would take only 2 times the memory of block_idx, but I don't think this is going to be possible. Thanks! -- Francesc Altet | Be careful about using the following code -- Carabos Coop. V. | I've only proven that it works, www.carabos.com | I haven't tested it. -- Donald Knuth

On 3/1/07, Francesc Altet <faltet@carabos.com> wrote:
El dj 01 de 03 del 2007 a les 13:26 -0700, en/na Charles R Harris va escriure:
On 3/1/07, Francesc Altet <faltet@carabos.com> wrote: Hi,
I don't think there is a solution for this, but perhaps anybody may offer some idea. Given:
In [79]:a=numpy.arange(9,-1,-1) In [80]:b=numpy.arange(10) In [81]:numpy.random.shuffle(b) In [82]:b Out[82]:array([2, 6, 3, 5, 4, 9, 0, 8, 7, 1]) In [83]:a=a[b] In [84]:a Out[84]:array([7, 3, 6, 4, 5, 0, 9, 1, 2, 8])
is there a way to make the step 83 without having to keep 3 arrays in-memory at the same time? This is, some way of doing fancy indexing, but changing the elements *inplace*. The idea is to keep memory requeriments as low as possible when a and b are large arrays.
Thanks!
I think that would be tough because of overlap between the two sides. The permutation could be factored into cycles which would mostly avoid that, but that is more theoretical than practical here. What is it you are trying to do?
Yeah, the problem is the overlap. Well, what I'm trying to do is, given two arrays on-disk (say, block and block_idx), sort one of them, and then, re-order the other following with the same order than the first one. My best approach until now is:
block = tmp_sorted[nslice] # read block from disk sblock_idx = block.argsort() block.sort() # do things with block... del block # get rid of block block_idx = tmp_indices[nslice] # read bock_idx from disk indices[nslice] = block_idx[sblock_idx]
but the last line will take 3 times the memory that takes block_idx alone. My goal would be that the algorithm above would take only 2 times the memory of block_idx, but I don't think this is going to be possible.
Argsort sorta does what you want, it is an indirect sort but the other array is a list of integers and at present, it doesn't exchange the key array. If you want a special case sort it wouldn't be too difficult to modify it to do what you want. If you want to operate on small pieces disk to disk, an indirect version of merge sort might be the way to go. Chuck

On 3/1/07, Francesc Altet <faltet@carabos.com> wrote:
Hi,
I don't think there is a solution for this, but perhaps anybody may offer some idea. Given:
In [79]:a=numpy.arange(9,-1,-1) In [80]:b=numpy.arange(10) In [81]:numpy.random.shuffle(b) In [82]:b Out[82]:array([2, 6, 3, 5, 4, 9, 0, 8, 7, 1]) In [83]:a=a[b] In [84]:a Out[84]:array([7, 3, 6, 4, 5, 0, 9, 1, 2, 8])
is there a way to make the step 83 without having to keep 3 arrays in-memory at the same time? This is, some way of doing fancy indexing, but changing the elements *inplace*. The idea is to keep memory requeriments as low as possible when a and b are large arrays.
Thanks!
You can also put the arrays together and implement it as an inplace sort, which will save space at the price of n*log(n) operations. The idea is to sort on the shuffled array while carrying the corresponding elements of the other array along in the exchanges, which I think you can now do using fields and the order keyword in the sort. Chuck

El dj 01 de 03 del 2007 a les 13:40 -0700, en/na Charles R Harris va escriure:
On 3/1/07, Francesc Altet <faltet@carabos.com> wrote: Hi,
I don't think there is a solution for this, but perhaps anybody may offer some idea. Given:
In [79]:a=numpy.arange(9,-1,-1) In [80]:b=numpy.arange(10) In [81]:numpy.random.shuffle(b) In [82]:b Out[82]:array([2, 6, 3, 5, 4, 9, 0, 8, 7, 1]) In [83]:a=a[b] In [84]:a Out[84]:array([7, 3, 6, 4, 5, 0, 9, 1, 2, 8])
is there a way to make the step 83 without having to keep 3 arrays in-memory at the same time? This is, some way of doing fancy indexing, but changing the elements *inplace*. The idea is to keep memory requeriments as low as possible when a and b are large arrays.
Thanks!
You can also put the arrays together and implement it as an inplace sort, which will save space at the price of n*log(n) operations. The idea is to sort on the shuffled array while carrying the corresponding elements of the other array along in the exchanges, which I think you can now do using fields and the order keyword in the sort.
Nice idea! I think your approach is going to work: In [18]:a=numpy.arange(9.,-1,-1) In [19]:b=numpy.arange(10) In [20]:numpy.random.shuffle(b) In [21]:c=numpy.rec.fromarrays([a,b], dtype='i4,i4') In [22]:c Out[22]: recarray([(9, 1), (8, 6), (7, 9), (6, 5), (5, 3), (4, 4), (3, 7), (2, 8), (1, 0), (0, 2)], dtype=[('f0', '<i4'), ('f1', '<i4')]) In [23]:c.sort(order='f0') In [24]:c Out[24]: recarray([(0, 2), (1, 0), (2, 8), (3, 7), (4, 4), (5, 3), (6, 5), (7, 9), (8, 6), (9, 1)], dtype=[('f0', '<i4'), ('f1', '<i4')]) In [25]:c['f0'] Out[25]:array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) In [26]:c['f1'] Out[26]:array([2, 0, 8, 7, 4, 3, 5, 9, 6, 1]) Tomorrow I'll do some timings and will check that memory consumption is lower than with my current approach, but my guts are having a good feel. Many thanks!
-- Francesc Altet | Be careful about using the following code -- Carabos Coop. V. | I've only proven that it works, www.carabos.com | I haven't tested it. -- Donald Knuth

On 01/03/07, Francesc Altet <faltet@carabos.com> wrote:
Hi,
I don't think there is a solution for this, but perhaps anybody may offer some idea. Given:
In [79]:a=numpy.arange(9,-1,-1) In [80]:b=numpy.arange(10) In [81]:numpy.random.shuffle(b) In [82]:b Out[82]:array([2, 6, 3, 5, 4, 9, 0, 8, 7, 1]) In [83]:a=a[b] In [84]:a Out[84]:array([7, 3, 6, 4, 5, 0, 9, 1, 2, 8])
is there a way to make the step 83 without having to keep 3 arrays in-memory at the same time? This is, some way of doing fancy indexing, but changing the elements *inplace*. The idea is to keep memory requeriments as low as possible when a and b are large arrays.
Thanks!
From the example you gave, though, it's likely that what you want is to apply a permutation (this turns up frequently with argsorts). That's possible in principle, though even in C I think you need to either maintain an array of auxiliary data or destroy the indexing array. The problem is that every element you move, you need to clear space for it. You can chase this around one cycle in the permutation with no difficulty, but as you move from cycle to cycle you need to keep track of which elements of the array have already been shuffled and which haven't. You can do this with bitfields, or you can adjust
In general, this is impossible: a=arange(3) b=a[[0,1,0,1,0]] b is bigger than a, so you can't hope to do this in place. the indexing array as you go. Since either of these effectively require O(n) extra storage, they don't have much appeal - except that you might want to permute just one axis of an array, in which case the extra storage. could be much smaller than the array itself. I've attached an implementation of this (in Python, but it might be useful anyway since the row copies are done by numpy). Anne
participants (3)
-
Anne Archibald
-
Charles R Harris
-
Francesc Altet