[Numpy-discussion] Compare NumPy arrays with threshold and return the differences

Robert Kern robert.kern at gmail.com
Wed May 17 13:16:09 EDT 2017


On Wed, May 17, 2017 at 9:50 AM, Nissim Derdiger <NissimD at elspec-ltd.com>
wrote:

> Hi,
> In my script, I need to compare big NumPy arrays (2D or 3D), and return a
> list of all cells with difference bigger than a defined threshold.
> The comparison itself can easily be done with the "allclose" function,
> like this:
> Threshold = 0.1
> if (np.allclose(Arr1, Arr2, Threshold, equal_nan=True)):
>     print('Same')
> But this compare does not return *which* cells are not the same.
>
> The easiest (yet naive) way to know which cells are not the same is to use
> a simple for loops code like this one:
> def CheckWhichCellsAreNotEqualInArrays(Arr1,Arr2,Threshold):
>    if not Arr1.shape == Arr2.shape:
>        return ['Arrays size not the same']
>    Dimensions = Arr1.shape
>    Diff = []
>    for i in range(Dimensions [0]):
>        for j in range(Dimensions [1]):
>            if not np.allclose(Arr1[i][j], Arr2[i][j], Threshold,
>                               equal_nan=True):
>                Diff.append(',' + str(i) + ',' + str(j) + ',' +
>                            str(Arr1[i,j]) + ',' + str(Arr2[i,j]) + ',' +
>                            str(Threshold) + ',Fail\n')
>    return Diff
> (and same for 3D arrays - with 1 more for loop)
> This way is very slow when the arrays are big and full of non-equal cells.
>
> Is there a fast, straightforward way, in case they are not the same, to
> get a list of the unequal cells? Maybe some built-in function in NumPy
> itself?
>

Use `close_mask = np.isclose(Arr1, Arr2, Threshold, equal_nan=True)` to
return a boolean mask the same shape as the arrays which is True where the
elements are close and False where they are not. You can invert it to get a
boolean mask which is True where they are "far" with respect to the
threshold: `far_mask = ~close_mask`. Then you can use `i_idx, j_idx =
np.nonzero(far_mask)` to get arrays of the `i` and `j` indices where the
values are far. For example:

for i, j in zip(i_idx, j_idx):
    print("{0}, {1}, {2}, {3}, {4}, Fail".format(
        i, j, Arr1[i, j], Arr2[i, j], Threshold))
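
The steps above can be put together in a small self-contained sketch that
also covers the 3D case from the question. The function name `far_cells` and
the use of `np.argwhere` (in place of unpacking `np.nonzero`) are assumptions
made for illustration, not something from the thread. Note that the third
positional argument of `np.isclose` is `rtol`, the relative tolerance, so the
threshold here is relative to the values in the second array; pass
`atol=...` by keyword if an absolute threshold is what you want.

```python
import numpy as np


def far_cells(arr1, arr2, threshold):
    """Return (index, value1, value2) for each cell that is not close.

    Hypothetical helper for illustration; works for any number of dimensions.
    """
    if arr1.shape != arr2.shape:
        raise ValueError("arrays must have the same shape")
    # As in the thread, the threshold is passed positionally, i.e. as rtol.
    close_mask = np.isclose(arr1, arr2, threshold, equal_nan=True)
    far_mask = ~close_mask
    # np.argwhere returns one row of indices per True cell of the mask.
    result = []
    for idx in np.argwhere(far_mask):
        key = tuple(int(k) for k in idx)
        result.append((key, arr1[key], arr2[key]))
    return result


if __name__ == "__main__":
    a = np.array([[1.0, 2.0], [3.0, np.nan]])
    b = np.array([[1.0, 2.5], [3.0, np.nan]])
    for idx, v1, v2 in far_cells(a, b, 0.1):
        print("{0}, {1}, {2}, Fail".format(idx, v1, v2))
```

The Python loop only runs over the cells that differ, so the expensive
comparison stays vectorized even when the arrays are large.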

-- 
Robert Kern