convert strides/shape/offset into nd index?
Hi all, I'm curious as to what the most straightforward way is to convert an offset into a memory buffer representing an arbitrarily strided array into the nd index into that array. (Let's assume for simplicity that each element is one byte...) Does sorting the strides from largest to smallest and then using integer division and mod (in the obvious way) always work? (I can't seem to find a counterexample, but I am not used to thinking too deeply about bizarrely-strided configurations.) Is there a method that doesn't involve sorting? Thanks, Zach
Your question involves a few concepts:
- an integer vector describing the position of an element
- the logical shape (another int vector)
- the physical strides (another int vector)
Ignoring the case of negative offsets, a physical offset is the inner
product of the physical strides with the position vector.
In these terms, you are asking how to solve the inner-product equation
for the position vector. There can be many possible solutions (like,
if there is a stride of 1, then you can make that dimension account
for the entire offset. This is often not the solution you want.).
For valid ndarrays though, there is at most one solution though with
the property that every position element is less than the shape.
You will also need to take into account that for certain stride
vectors, there is no way to get certain offsets. Imagine all the
strides were even, and you needed to get at an odd offset... it would
be impossible. It would even be impossible if there were a dimension
with stride 1 but it had shape of 1 too.
I can't think of an algorithm off the top of my head that would do
this in a quick and elegant way.
James
On Sun, Nov 29, 2009 at 10:36 AM, Zachary Pincus
Hi all,
I'm curious as to what the most straightforward way is to convert an offset into a memory buffer representing an arbitrarily strided array into the nd index into that array. (Let's assume for simplicity that each element is one byte...)
Does sorting the strides from largest to smallest and then using integer division and mod (in the obvious way) always work? (I can't seem to find a counterexample, but I am not used to thinking too deeply about bizarrely-strided configurations.) Is there a method that doesn't involve sorting?
Thanks, Zach _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
2009/11/30 James Bergstra
Your question involves a few concepts:
- an integer vector describing the position of an element
- the logical shape (another int vector)
- the physical strides (another int vector)
Ignoring the case of negative offsets, a physical offset is the inner product of the physical strides with the position vector.
In these terms, you are asking how to solve the inner-product equation for the position vector. There can be many possible solutions (like, if there is a stride of 1, then you can make that dimension account for the entire offset. This is often not the solution you want.). For valid ndarrays though, there is at most one solution though with the property that every position element is less than the shape.
You will also need to take into account that for certain stride vectors, there is no way to get certain offsets. Imagine all the strides were even, and you needed to get at an odd offset... it would be impossible. It would even be impossible if there were a dimension with stride 1 but it had shape of 1 too.
I can't think of an algorithm off the top of my head that would do this in a quick and elegant way.
Not to be a downer, but this problem is technically NP-complete. The so-called "knapsack problem" is to find a subset of a collection of numbers that adds up to the specified number, and it is NP-complete. Unfortunately, it is exactly what you need to do to find the indices to a particular memory location in an array of shape (2,2,...,2). What that means in practice is that either you have to allow potentially very slow algorithms (though you know that there will never be more than 32 different values in the knapsack, which might or might not be enough to keep things tractable) or use heuristic algorithms that don't always work. There are probably fairly good heuristics, particularly if the array elements are all at distinct memory locations (arrays with overlapping elements can arise from broadcasting and other slightly more arcane operations). Anne
James
On Sun, Nov 29, 2009 at 10:36 AM, Zachary Pincus
wrote: Hi all,
I'm curious as to what the most straightforward way is to convert an offset into a memory buffer representing an arbitrarily strided array into the nd index into that array. (Let's assume for simplicity that each element is one byte...)
Does sorting the strides from largest to smallest and then using integer division and mod (in the obvious way) always work? (I can't seem to find a counterexample, but I am not used to thinking too deeply about bizarrely-strided configurations.) Is there a method that doesn't involve sorting?
Thanks, Zach _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
-- http://www-etud.iro.umontreal.ca/~bergstrj _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Anne Archibald wrote:
2009/11/30 James Bergstra
: Your question involves a few concepts:
- an integer vector describing the position of an element
- the logical shape (another int vector)
- the physical strides (another int vector)
Ignoring the case of negative offsets, a physical offset is the inner product of the physical strides with the position vector.
In these terms, you are asking how to solve the inner-product equation for the position vector. There can be many possible solutions (like, if there is a stride of 1, then you can make that dimension account for the entire offset. This is often not the solution you want.). For valid ndarrays though, there is at most one solution though with the property that every position element is less than the shape.
You will also need to take into account that for certain stride vectors, there is no way to get certain offsets. Imagine all the strides were even, and you needed to get at an odd offset... it would be impossible. It would even be impossible if there were a dimension with stride 1 but it had shape of 1 too.
I can't think of an algorithm off the top of my head that would do this in a quick and elegant way.
Not to be a downer, but this problem is technically NP-complete. The so-called "knapsack problem" is to find a subset of a collection of numbers that adds up to the specified number, and it is NP-complete. Unfortunately, it is exactly what you need to do to find the indices to a particular memory location in an array of shape (2,2,...,2).
What that means in practice is that either you have to allow potentially very slow algorithms (though you know that there will never be more than 32 different values in the knapsack, which might or might not be enough to keep things tractable) or use heuristic algorithms that don't always work. There are probably fairly good heuristics, particularly if the array elements are all at distinct memory locations (arrays with overlapping elements can arise from broadcasting and other slightly more arcane operations).
Not that this should be done, but getting a chance to discuss NP is always fun: I think this particular problem can be solved in O(d*n^2) or better, where n is the offset in question and d the number of dimensions of the array, by using dynamic programming on the buffer offset in question (so first try for offset 1, then 2, and so on up to n). Which doesn't contradict the fact that the problem is exponential (n is exponential in terms of the length of the input to the problem), but it is still not *too* bad in many cases, because the exponential term is always smaller than the size of the array. Dag Sverre
Dag Sverre Seljebotn wrote:
Anne Archibald wrote:
2009/11/30 James Bergstra
: Your question involves a few concepts:
- an integer vector describing the position of an element
- the logical shape (another int vector)
- the physical strides (another int vector)
Ignoring the case of negative offsets, a physical offset is the inner product of the physical strides with the position vector.
In these terms, you are asking how to solve the inner-product equation for the position vector. There can be many possible solutions (like, if there is a stride of 1, then you can make that dimension account for the entire offset. This is often not the solution you want.). For valid ndarrays though, there is at most one solution though with the property that every position element is less than the shape.
You will also need to take into account that for certain stride vectors, there is no way to get certain offsets. Imagine all the strides were even, and you needed to get at an odd offset... it would be impossible. It would even be impossible if there were a dimension with stride 1 but it had shape of 1 too.
I can't think of an algorithm off the top of my head that would do this in a quick and elegant way.
Not to be a downer, but this problem is technically NP-complete. The so-called "knapsack problem" is to find a subset of a collection of numbers that adds up to the specified number, and it is NP-complete. Unfortunately, it is exactly what you need to do to find the indices to a particular memory location in an array of shape (2,2,...,2).
What that means in practice is that either you have to allow potentially very slow algorithms (though you know that there will never be more than 32 different values in the knapsack, which might or might not be enough to keep things tractable) or use heuristic algorithms that don't always work. There are probably fairly good heuristics, particularly if the array elements are all at distinct memory locations (arrays with overlapping elements can arise from broadcasting and other slightly more arcane operations).
Not that this should be done, but getting a chance to discuss NP is always fun:
I think this particular problem can be solved in O(d*n^2) or better,
Hmm, I guess that should be O(d*n). http://en.wikipedia.org/wiki/Knapsack_problem has the exact algorithm (though it needs some customization). Dag Sverre
where n is the offset in question and d the number of dimensions of the array, by using dynamic programming on the buffer offset in question (so first try for offset 1, then 2, and so on up to n).
Which doesn't contradict the fact that the problem is exponential (n is exponential in terms of the length of the input to the problem), but it is still not *too* bad in many cases, because the exponential term is always smaller than the size of the array.
Dag Sverre _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Not to be a downer, but this problem is technically NP-complete. The so-called "knapsack problem" is to find a subset of a collection of numbers that adds up to the specified number, and it is NP-complete. Unfortunately, it is exactly what you need to do to find the indices to a particular memory location in an array of shape (2,2,...,2).
Ha ha, right -- that is the knapsack problem isn't it. Oh well... I'll just require fortran- or C-style strided arrays, for which case it is easy to unravel offsets into indices. Thanks everyone! Zach
participants (4)
-
Anne Archibald
-
Dag Sverre Seljebotn
-
James Bergstra
-
Zachary Pincus