np.unique with structured arrays
Hello,

I've found some strange behavior, or I'm missing something obvious (or np.unique is not supposed to work with structured arrays).

I'm trying to extract unique values from a simple structured array, but it does not seem to work as expected. Here is a minimal script showing the problem:

    import numpy as np

    V = np.zeros(4, dtype=[("v", np.float32, 3)])
    V["v"] = [[0.5,  0.0,     1.0],
              [0.5, -1.e-16,  1.0],   # [0.5, +1.e-16,  1.0] works
              [0.5,  0.0,    -1.0],
              [0.5, -1.e-16, -1.0]]   # [0.5, +1.e-16, -1.0] works
    V_ = np.zeros_like(V)
    V_["v"][:,0] = V["v"][:,0].round(decimals=3)
    V_["v"][:,1] = V["v"][:,1].round(decimals=3)
    V_["v"][:,2] = V["v"][:,2].round(decimals=3)

    print(np.unique(V_))

    [([0.5, 0.0, 1.0],) ([0.5, 0.0, -1.0],) ([0.5, -0.0, 1.0],) ([0.5, -0.0, -1.0],)]

While I would have expected:

    [([0.5, 0.0, 1.0],) ([0.5, 0.0, -1.0],)]

Can anyone confirm?

Nicolas
I can confirm, the issue seems to be in sorting:
>>> np.sort(V_)
array([([0.5, 0.0, 1.0],), ([0.5, 0.0, -1.0],), ([0.5, -0.0, 1.0],), ([0.5, -0.0, -1.0],)], dtype=[('v', '
I think these are handled by the generic sort functions, and it looks like the comparison function being used is the one for a VOID dtype with no fields, so the comparison is done byte-wise, hence the problems with 0.0 and -0.0. I'm not sure exactly where the bug is, though...

Jaime
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
--
(\__/)
( O.o)
( > <) This is Conejo. Copy Conejo into your signature and help him in his plans for world domination.
Oh yeah this could be. Floating point equality and bitwise equality are not the same thing.
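A small sketch of that distinction, using a plain float32 array in place of the structured array from the original post (the names `rows`, `byte_keys` and `value_keys` are illustrative only). Keys built from the raw bytes keep 0.0 and -0.0 apart, while keys built from the float values merge them:

```python
import numpy as np

# The rounded rows from the original post, held in a plain (4, 3) float32 array.
rows = np.array([[0.5,  0.0,  1.0],
                 [0.5, -0.0,  1.0],
                 [0.5,  0.0, -1.0],
                 [0.5, -0.0, -1.0]], dtype=np.float32)

# Bitwise identity: each key is the 12 raw bytes of a row.
byte_keys = {row.tobytes() for row in rows}

# Floating point equality: each key is a tuple of values, and 0.0 == -0.0.
value_keys = {tuple(row.tolist()) for row in rows}

print(len(byte_keys))   # 4
print(len(value_keys))  # 2
```

The byte-wise count matches the four entries np.unique returned, and the value-wise count matches the two entries Nicolas expected.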
Structured arrays are of VOID dtype, but with a non-None names attribute:

>>> V_.dtype.num
20
>>> V_.dtype.names
('v',)
>>> V_.view(np.void).dtype.num
20
>>> V_.view(np.void).dtype.names
>>>
The comparison function uses the STRING comparison function if names is None, or a proper field-by-field comparison if not; see here: https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/arrayty...

From a quick look at the source, the only fishy thing I see is that the original array has the sort axis moved to the end of the shape tuple and is then copied into a contiguous array here: https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/item_se...

But that new array should preserve the dtype unchanged, and hence the right compare function should be called. If no one with a better understanding of the internals spots it, I will try to debug it further over the weekend.

Jaime
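For anyone following along, the attributes Jaime inspects can be checked at two levels: the structured dtype itself, and the dtype of its single field "v". This is only a sketch of what can be inspected (the names `outer` and `inner` are illustrative); the thread does not establish whether the field-level dtype is what affects the sort.

```python
import numpy as np

V_ = np.zeros(4, dtype=[("v", np.float32, 3)])

outer = V_.dtype                  # structured dtype with one named field
inner = V_.dtype.fields["v"][0]   # dtype of field "v": a (3,) float32 subarray

print(outer.num, outer.names)     # 20 ('v',)
print(inner.num, inner.names)     # 20 None
print(inner.subdtype)             # (dtype('float32'), (3,))
```

The field dtype also reports type number 20 (VOID) and has no names of its own, because it is a subarray of three float32 values rather than a nested structure.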
It does not sound like an issue with unique, but rather like a matter of floating point equality and representation. Do the 'identical' elements pass an equality test?
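A quick way to answer that question, sketched on a plain float32 array holding the two values that end up in the middle column of the original script (the names `v`, `r` and `n` are illustrative). Adding 0.0 to clear the sign of a negative zero is a suggested workaround, not something proposed in the thread:

```python
import numpy as np

v = np.array([0.0, -1.0e-16], dtype=np.float32)
r = v.round(decimals=3)

print(r[0] == r[1])     # True: 0.0 and -0.0 pass an equality test
print(np.signbit(r))    # [False  True]: but their sign bits differ

# Under the default rounding mode, -0.0 + 0.0 is +0.0, so after this
# step the two values also agree byte for byte.
n = r + 0.0
print(n[0].tobytes() == n[1].tobytes())   # True
```

So the elements do compare equal as floats; only their byte representations differ, which matters when a comparison is done byte-wise.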
participants (3)
- Eelco Hoogendoorn
- Jaime Fernández del Río
- Nicolas P. Rougier