Funny business with 'is' operator?
Hi, could someone help me understand why this assertion fails?

    def test_is(self):
        a = np.empty(1)
        b = np.empty(1)
        if a.data is not b.data:
            assert id(a.data) != id(b.data)  # <-- fail

I'm trying to write an alternate may_share_memory function.

Thanks,
- James
On Fri, Sep 6, 2013 at 9:19 AM, James Bergstra <bergstrj@iro.umontreal.ca> wrote:

>     def test_is(self):
>         a = np.empty(1)
>         b = np.empty(1)
>         if a.data is not b.data:
>             assert id(a.data) != id(b.data)  # <-- fail
I'm not familiar with the internals, but:

    In [27]: a = np.empty(1)
    In [28]: a.data
    Out[28]: <read-write buffer for 0x122df30, size 8, offset 0 at 0x3501860>

so .data is a buffer object, wrapped around a particular pointer.

    In [29]: id(a.data)
    Out[29]: 55581376
    In [30]: id(a.data)
    Out[30]: 55581728

but it looks like the buffer object itself is created on the fly -- so you are getting a different Python object each time, even though it wraps the same data pointer. So you need to compare the value of the data pointer, not the identity of the buffer object. Not sure how to get that value, though, other than parsing __repr__:

    In [44]: a.data.__repr__().split()[-1][:-1]
    Out[44]: '0x3512100'

ugly hack -- there must be a better way! But:

    In [38]: a = np.zeros((100,))
    In [39]: b = a[50:]
    In [40]: a.data
    Out[40]: <read-write buffer for 0x1443990, size 800, offset 0 at 0x3501da0>
    In [41]: b.data
    Out[41]: <read-write buffer for 0x1617130, size 400, offset 0 at 0x3501c20>

a and b share memory, but they don't have the same data pointer, as b's is offset.

HTH,
-Chris
--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959  voice
7600 Sand Point Way NE   (206) 526-6329  fax
Seattle, WA 98115        (206) 526-6317  main reception

Chris.Barker@noaa.gov
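A less fragile way to get at the data pointer than parsing __repr__ is to ask the array itself: ndarray.ctypes.data and __array_interface__['data'][0] both report the address of the first element as a plain integer. A minimal sketch (the arrays a and b here are just an example, byte counts assume the default float64 dtype):

    import numpy as np

    a = np.zeros((100,))
    b = a[50:]   # a view into the second half of a

    # Address of the first element, as an integer -- no repr parsing needed.
    print(a.ctypes.data)
    print(a.__array_interface__['data'][0])   # same address, different route

    # b's data pointer is offset from a's by 50 elements (50 * 8 bytes).
    print(b.ctypes.data - a.ctypes.data)      # -> 400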
On Fri, Sep 6, 2013 at 10:19 AM, James Bergstra <bergstrj@iro.umontreal.ca> wrote:

> Hi, could someone help me understand why this assertion fails?
>
>     def test_is(self):
>         a = np.empty(1)
>         b = np.empty(1)
>         if a.data is not b.data:
>             assert id(a.data) != id(b.data)  # <-- fail
>
> I'm trying to write an alternate may_share_memory function.
id seems not as advertised:

    In [22]: for i in range(10): print id(a.data)
    66094640
    66095792
    66095792
    66095792
    66095792
    66095792
    66094640
    66094640
    66094640
    66094640

Not sure why.

Chuck
Thanks for the tips! FWIW my guess is that since '.data' is a dynamically generated property rather than a stored attribute, the buffer object it returns is being freed and re-allocated on each access -- once per iteration of the loop, and once for each of my id() expressions.
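That guess can be reproduced without NumPy at all: any property that builds a fresh object on every access behaves the same way, because CPython may reuse the address of a temporary that has already been freed. A small illustrative sketch (the Holder class is made up for the example; the id collision is likely but not guaranteed):

    class Holder(object):
        @property
        def data(self):
            # A new object is created on every attribute access,
            # much like ndarray.data building a fresh buffer object.
            return object()

    h = Holder()

    # Two accesses that are alive at the same time are distinct objects...
    print(h.data is h.data)          # False

    # ...but the first temporary is freed before the second is created,
    # so CPython may hand out the same address again and the ids collide.
    print(id(h.data) == id(h.data))  # often True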
I'm stumped. I can't figure out how to extract from e.g.

    view = A[:, 3]

that the view starts at element 3 of A.

I was planning to make a may_share_memory implementation based on the idea of swapping in a buffer of 0s, and using the shapes, strides, itemsize etc. to increment just the parts of the 0s buffer touched by the two ndarrays. If there are any 2s in the incremented buffer, it's because the two views overlapped. It's not the best algorithm for comparing tiny views of huge arrays, but I was wondering whether in my case it would have been quicker than the built-in method (I don't know how it works).

I actually have another data structure around to pull out that shape and stride info, but it's a shame to use it, because then I can't use the algorithm to compare ndarrays in general (or at least ones that have been created by typical construction methods and slicing).
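A rough sketch of the marking idea described above (the names overlap_by_marking and touched_bytes are made up for the example, and this is not how NumPy itself does it): rather than literally swapping in a buffer of zeros, it records the set of byte addresses each view touches and reports an overlap if the two sets intersect. Exact, but O(size of both views), so hopeless for big arrays.

    import numpy as np

    def overlap_by_marking(a, b):
        """Exact but very slow overlap test for two ndarrays."""
        def touched_bytes(x):
            base = x.__array_interface__['data'][0]   # address of element (0, ..., 0)
            marked = set()
            for index in np.ndindex(*x.shape):
                first = base + sum(i * s for i, s in zip(index, x.strides))
                marked.update(range(first, first + x.itemsize))
            return marked
        return bool(touched_bytes(a) & touched_bytes(b))

    A = np.zeros(100)
    print(overlap_by_marking(A[:10], A[5:15]))   # True  -- genuine overlap
    print(overlap_by_marking(A[::2], A[1::2]))   # False -- interleaved, no shared bytes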
On Fri, Sep 6, 2013 at 5:58 PM, James Bergstra <bergstrj@iro.umontreal.ca> wrote:
> I'm stumped. I can't figure out how to extract from e.g.
>
>     view = A[:, 3]
>
> that the view starts at element 3 of A. I was planning to make a may_share_memory implementation based on the idea of swapping in a buffer of 0s, and using the shapes, strides, itemsize etc. to increment just the parts of the 0s buffer touched by the two ndarrays. If there are any 2s in the incremented buffer, it's because the two views overlapped. It's not the best algorithm for comparing tiny views of huge arrays, but I was wondering whether in my case it would have been quicker than the built-in method (I don't know how it works).

It certainly won't be faster. may_share_memory() is very simplistic. It just checks whether the outermost bounds of each array overlap. Thus, it can give false positives for arrays that are interleaved but do not actually intersect. This is why it is named may_share_memory() instead of does_share_memory().
> I actually have another data structure around to pull out that shape and stride info, but it's a shame to use it, because then I can't use the algorithm to compare ndarrays in general (or at least ones that have been created by typical construction methods and slicing).

All of the information you need is stored in the __array_interface__ attribute of an ndarray.

The right way to solve this is to solve a bounded, linear Diophantine equation. That's where you should be looking if you want to crack this problem.

--
Robert Kern
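A short sketch of both points (A, view and a are just examples; byte counts assume the default float64 dtype): __array_interface__ exposes the address, strides and offset that the .data buffer hides, and the bounds-only check behind may_share_memory() really does say "maybe" for interleaved views that never touch the same element.

    import numpy as np

    A = np.zeros((4, 5))
    view = A[:, 3]

    # The offset in question: the view starts 3 elements into A.
    start_A = A.__array_interface__['data'][0]
    start_view = view.__array_interface__['data'][0]
    print((start_view - start_A) // A.itemsize)   # -> 3
    print(view.strides)                           # -> (40,), one row of A per step

    # Bounds-only check: these two views interleave but never share an element,
    # yet their overall memory ranges overlap, so the answer is "maybe".
    a = np.zeros(10)
    print(np.may_share_memory(a[::2], a[1::2]))   # -> True (a false positive)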
Thanks, this is exactly what I was looking for. I'll look into what this Diophantine equation is.

Also, relatedly, a few months ago Julian Taylor at least wrote what was there in C, which made it faster, if not better.

- James
On Fri, Sep 6, 2013 at 6:50 PM, James Bergstra <bergstrj@iro.umontreal.ca> wrote:
> Thanks, this is exactly what I was looking for. I'll look into what this Diophantine equation is.

Let's say we have two arrays with shape tuples `shape0` and `shape1`, stride tuples `stride0` and `stride1`, and memory offsets `offset0` and `offset1`. Without loss of generality, let's assume that itemsize=1, since it is trivial to convert the general case to itemsize=1. Now, if you will permit Einstein summation notation, you can generate the memory address of every item in the first array like so:

    index0[i]*stride0[i] + offset0
    0 <= i < len(shape0)
    0 <= index0[i] < shape0[i]

There exists an overlap between the two arrays iff there exist two tuples `index0` and `index1` such that

    index0[i]*stride0[i] + offset0 = index1[j]*stride1[j] + offset1
    0 <= i < len(shape0)
    0 <= index0[i] < shape0[i]
    0 <= j < len(shape1)
    0 <= index1[j] < shape1[j]

This is a bounded linear Diophantine equation: Diophantine because the variables `index0[i]` and `index1[j]` are integer-valued, linear because the variables only appear multiplied by a constant, and bounded because each variable must stay within the size of its corresponding axis. Unbounded linear Diophantine equations can be solved using an extended version of the Euclidean GCD algorithm. Bounded linear Diophantine equations are NP-<mumble> to solve. With the right heuristics, a branch-and-bound approach could probably work well.

> Also, relatedly, a few months ago Julian Taylor at least wrote what was there in C, which made it faster, if not better.

The main benefit of that was to make it available at the C level for other C-implemented routines, IIRC, not speed.

--
Robert Kern
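A brute-force rendering of that formulation, for intuition only (the name overlap_exists is made up, and this is nothing like the branch-and-bound approach suggested above -- it enumerates every pair of addresses, so it only works on tiny arrays): it reduces both views to itemsize=1 by treating each element as itemsize consecutive byte cells, then searches for index tuples that satisfy the address equation.

    import numpy as np
    from itertools import product

    def overlap_exists(a, b):
        """Brute-force search for a solution of the bounded Diophantine equation."""
        def cells(x):
            # Reduce to itemsize=1: append a byte axis of length itemsize, stride 1.
            offset = x.__array_interface__['data'][0]
            shape = x.shape + (x.itemsize,)
            strides = x.strides + (1,)
            return offset, shape, strides

        offset0, shape0, stride0 = cells(a)
        offset1, shape1, stride1 = cells(b)

        for index0 in product(*[range(n) for n in shape0]):
            addr0 = offset0 + sum(i * s for i, s in zip(index0, stride0))
            for index1 in product(*[range(n) for n in shape1]):
                addr1 = offset1 + sum(j * s for j, s in zip(index1, stride1))
                if addr0 == addr1:
                    return True   # found index0, index1 with equal byte addresses
        return False

    A = np.zeros(10)
    print(overlap_exists(A[::2], A[1::2]))   # False -- interleaved, never equal
    print(overlap_exists(A[:4], A[2:6]))     # True  -- genuinely overlapping views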
participants (4):
- Charles R Harris
- Chris Barker - NOAA Federal
- James Bergstra
- Robert Kern