convert strides/shape/offset into nd index?
Hi all, I'm curious as to what the most straightforward way is to convert an offset into a memory buffer representing an arbitrarily strided array into the nd index into that array. (Let's assume for simplicity that each element is one byte...) Does sorting the strides from largest to smallest and then using integer division and mod (in the obvious way) always work? (I can't seem to find a counterexample, but I am not used to thinking too deeply about bizarrelystrided configurations.) Is there a method that doesn't involve sorting? Thanks, Zach
Your question involves a few concepts:  an integer vector describing the position of an element  the logical shape (another int vector)  the physical strides (another int vector) Ignoring the case of negative offsets, a physical offset is the inner product of the physical strides with the position vector. In these terms, you are asking how to solve the innerproduct equation for the position vector. There can be many possible solutions (like, if there is a stride of 1, then you can make that dimension account for the entire offset. This is often not the solution you want.). For valid ndarrays though, there is at most one solution though with the property that every position element is less than the shape. You will also need to take into account that for certain stride vectors, there is no way to get certain offsets. Imagine all the strides were even, and you needed to get at an odd offset... it would be impossible. It would even be impossible if there were a dimension with stride 1 but it had shape of 1 too. I can't think of an algorithm off the top of my head that would do this in a quick and elegant way. James On Sun, Nov 29, 2009 at 10:36 AM, Zachary Pincus <zachary.pincus@yale.edu> wrote:
Hi all,
I'm curious as to what the most straightforward way is to convert an offset into a memory buffer representing an arbitrarily strided array into the nd index into that array. (Let's assume for simplicity that each element is one byte...)
Does sorting the strides from largest to smallest and then using integer division and mod (in the obvious way) always work? (I can't seem to find a counterexample, but I am not used to thinking too deeply about bizarrelystrided configurations.) Is there a method that doesn't involve sorting?
Thanks, Zach _______________________________________________ NumPyDiscussion mailing list NumPyDiscussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpydiscussion
2009/11/30 James Bergstra <bergstrj@iro.umontreal.ca>:
Your question involves a few concepts:
 an integer vector describing the position of an element
 the logical shape (another int vector)
 the physical strides (another int vector)
Ignoring the case of negative offsets, a physical offset is the inner product of the physical strides with the position vector.
In these terms, you are asking how to solve the innerproduct equation for the position vector. There can be many possible solutions (like, if there is a stride of 1, then you can make that dimension account for the entire offset. This is often not the solution you want.). For valid ndarrays though, there is at most one solution though with the property that every position element is less than the shape.
You will also need to take into account that for certain stride vectors, there is no way to get certain offsets. Imagine all the strides were even, and you needed to get at an odd offset... it would be impossible. It would even be impossible if there were a dimension with stride 1 but it had shape of 1 too.
I can't think of an algorithm off the top of my head that would do this in a quick and elegant way.
Not to be a downer, but this problem is technically NPcomplete. The socalled "knapsack problem" is to find a subset of a collection of numbers that adds up to the specified number, and it is NPcomplete. Unfortunately, it is exactly what you need to do to find the indices to a particular memory location in an array of shape (2,2,...,2). What that means in practice is that either you have to allow potentially very slow algorithms (though you know that there will never be more than 32 different values in the knapsack, which might or might not be enough to keep things tractable) or use heuristic algorithms that don't always work. There are probably fairly good heuristics, particularly if the array elements are all at distinct memory locations (arrays with overlapping elements can arise from broadcasting and other slightly more arcane operations). Anne
James
On Sun, Nov 29, 2009 at 10:36 AM, Zachary Pincus <zachary.pincus@yale.edu> wrote:
Hi all,
I'm curious as to what the most straightforward way is to convert an offset into a memory buffer representing an arbitrarily strided array into the nd index into that array. (Let's assume for simplicity that each element is one byte...)
Does sorting the strides from largest to smallest and then using integer division and mod (in the obvious way) always work? (I can't seem to find a counterexample, but I am not used to thinking too deeply about bizarrelystrided configurations.) Is there a method that doesn't involve sorting?
Thanks, Zach _______________________________________________ NumPyDiscussion mailing list NumPyDiscussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpydiscussion
 http://wwwetud.iro.umontreal.ca/~bergstrj _______________________________________________ NumPyDiscussion mailing list NumPyDiscussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpydiscussion
Anne Archibald wrote:
2009/11/30 James Bergstra <bergstrj@iro.umontreal.ca>:
Your question involves a few concepts:
 an integer vector describing the position of an element
 the logical shape (another int vector)
 the physical strides (another int vector)
Ignoring the case of negative offsets, a physical offset is the inner product of the physical strides with the position vector.
In these terms, you are asking how to solve the innerproduct equation for the position vector. There can be many possible solutions (like, if there is a stride of 1, then you can make that dimension account for the entire offset. This is often not the solution you want.). For valid ndarrays though, there is at most one solution though with the property that every position element is less than the shape.
You will also need to take into account that for certain stride vectors, there is no way to get certain offsets. Imagine all the strides were even, and you needed to get at an odd offset... it would be impossible. It would even be impossible if there were a dimension with stride 1 but it had shape of 1 too.
I can't think of an algorithm off the top of my head that would do this in a quick and elegant way.
Not to be a downer, but this problem is technically NPcomplete. The socalled "knapsack problem" is to find a subset of a collection of numbers that adds up to the specified number, and it is NPcomplete. Unfortunately, it is exactly what you need to do to find the indices to a particular memory location in an array of shape (2,2,...,2).
What that means in practice is that either you have to allow potentially very slow algorithms (though you know that there will never be more than 32 different values in the knapsack, which might or might not be enough to keep things tractable) or use heuristic algorithms that don't always work. There are probably fairly good heuristics, particularly if the array elements are all at distinct memory locations (arrays with overlapping elements can arise from broadcasting and other slightly more arcane operations).
Not that this should be done, but getting a chance to discuss NP is always fun: I think this particular problem can be solved in O(d*n^2) or better, where n is the offset in question and d the number of dimensions of the array, by using dynamic programming on the buffer offset in question (so first try for offset 1, then 2, and so on up to n). Which doesn't contradict the fact that the problem is exponential (n is exponential in terms of the length of the input to the problem), but it is still not *too* bad in many cases, because the exponential term is always smaller than the size of the array. Dag Sverre
Dag Sverre Seljebotn wrote:
Anne Archibald wrote:
2009/11/30 James Bergstra <bergstrj@iro.umontreal.ca>:
Your question involves a few concepts:
 an integer vector describing the position of an element
 the logical shape (another int vector)
 the physical strides (another int vector)
Ignoring the case of negative offsets, a physical offset is the inner product of the physical strides with the position vector.
In these terms, you are asking how to solve the innerproduct equation for the position vector. There can be many possible solutions (like, if there is a stride of 1, then you can make that dimension account for the entire offset. This is often not the solution you want.). For valid ndarrays though, there is at most one solution though with the property that every position element is less than the shape.
You will also need to take into account that for certain stride vectors, there is no way to get certain offsets. Imagine all the strides were even, and you needed to get at an odd offset... it would be impossible. It would even be impossible if there were a dimension with stride 1 but it had shape of 1 too.
I can't think of an algorithm off the top of my head that would do this in a quick and elegant way.
Not to be a downer, but this problem is technically NPcomplete. The socalled "knapsack problem" is to find a subset of a collection of numbers that adds up to the specified number, and it is NPcomplete. Unfortunately, it is exactly what you need to do to find the indices to a particular memory location in an array of shape (2,2,...,2).
What that means in practice is that either you have to allow potentially very slow algorithms (though you know that there will never be more than 32 different values in the knapsack, which might or might not be enough to keep things tractable) or use heuristic algorithms that don't always work. There are probably fairly good heuristics, particularly if the array elements are all at distinct memory locations (arrays with overlapping elements can arise from broadcasting and other slightly more arcane operations).
Not that this should be done, but getting a chance to discuss NP is always fun:
I think this particular problem can be solved in O(d*n^2) or better,
Hmm, I guess that should be O(d*n). http://en.wikipedia.org/wiki/Knapsack_problem has the exact algorithm (though it needs some customization). Dag Sverre
where n is the offset in question and d the number of dimensions of the array, by using dynamic programming on the buffer offset in question (so first try for offset 1, then 2, and so on up to n).
Which doesn't contradict the fact that the problem is exponential (n is exponential in terms of the length of the input to the problem), but it is still not *too* bad in many cases, because the exponential term is always smaller than the size of the array.
Dag Sverre _______________________________________________ NumPyDiscussion mailing list NumPyDiscussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpydiscussion
Not to be a downer, but this problem is technically NPcomplete. The socalled "knapsack problem" is to find a subset of a collection of numbers that adds up to the specified number, and it is NPcomplete. Unfortunately, it is exactly what you need to do to find the indices to a particular memory location in an array of shape (2,2,...,2).
Ha ha, right  that is the knapsack problem isn't it. Oh well... I'll just require fortran or Cstyle strided arrays, for which case it is easy to unravel offsets into indices. Thanks everyone! Zach
participants (4)

Anne Archibald

Dag Sverre Seljebotn

James Bergstra

Zachary Pincus