Mailman 3 Augment unique method - NumPy-Discussion

Augment unique method

Amin Sadeghi

16 Jul 2020

6:27 p.m.

It would be handy to add "atol" and "rtol" optional arguments to the "unique" method. I'm proposing this since uniqueness is a bit vague for floats. This change would be clearly backwards-compatible.
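A minimal illustration of what "vague for floats" means in practice, assuming only NumPy: two values that are equal in exact decimal arithmetic can be different doubles, and np.unique keeps both because it compares for exact equality.

```python
import numpy as np

# 0.1 + 0.2 evaluates to 0.30000000000000004 in IEEE 754 double precision,
# which is not the same double as the literal 0.3.
x = np.array([0.1 + 0.2, 0.3])

u = np.unique(x)   # exact-equality comparison keeps both values
print(u.size)      # 2
print(u[1] - u[0]) # a tiny but nonzero difference
```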


Roman Yurchak

16 Jul

6:41 p.m.

One issue with adding a tolerance to np.unique for floats: say you have [0, 0.1, 0.2, 0.3, 0.4, 0.5] with atol=0.15. Should this return a single element or multiple ones? On one side, each consecutive float is closer than the tolerance to the next one, but the first one and the last one are clearly not within atol.

Generally this is similar to what the DBSCAN clustering algorithm does (e.g. in scikit-learn), and that would probably be out of scope for np.unique.

Roman
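Roman's example can be checked directly. The sketch below assumes NumPy; the two grouping functions are hypothetical illustrations, not existing or proposed NumPy APIs. Chaining on neighboring gaps (similar in spirit to the DBSCAN comparison) merges all six values into one group, while anchoring each group at its first member yields three groups, so an atol argument alone does not pin down the answer.

```python
import numpy as np

# Roman's example: every neighboring gap is within atol, the overall span is not.
x = np.array([0, 0.1, 0.2, 0.3, 0.4, 0.5])
atol = 0.15
print(np.all(np.diff(x) <= atol))   # True
print(x[-1] - x[0] <= atol)         # False


def group_by_chaining(values, atol):
    """Hypothetical rule: start a new group only when the gap to the
    previous value exceeds atol (single-linkage style chaining)."""
    s = np.sort(values)
    reps = [s[0]]
    for prev, cur in zip(s[:-1], s[1:]):
        if cur - prev > atol:
            reps.append(cur)
    return reps


def group_by_anchor(values, atol):
    """Hypothetical rule: start a new group when a value is more than atol
    away from the first value (the representative) of the current group."""
    s = np.sort(values)
    reps = [s[0]]
    for cur in s[1:]:
        if cur - reps[-1] > atol:
            reps.append(cur)
    return reps


print(len(group_by_chaining(x, atol)))  # 1
print(len(group_by_anchor(x, atol)))    # 3
```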

_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion

Stephan Hoyer

7:06 p.m.

On Thu, Jul 16, 2020 at 11:41 AM Roman Yurchak wrote:

One issue with adding a tolerance to np.unique for floats: say you have [0, 0.1, 0.2, 0.3, 0.4, 0.5] with atol=0.15.

Should this return a single element or multiple ones? On one side, each consecutive float is closer than the tolerance to the next one, but the first one and the last one are clearly not within atol.

Generally this is similar to what the DBSCAN clustering algorithm does (e.g. in scikit-learn), and that would probably be out of scope for np.unique.

I agree; I don't think there's an easy answer for selecting "approximately unique" floats in the case of overlap.

np.unique() does actually have well-defined behavior for floats, comparing them for exact equality. This isn't always directly useful, but it definitely is well defined.

My suggestion for this use-case would be to round floats to the desired precision before passing them into np.unique().
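A sketch of this suggestion, assuming NumPy (the sample values are made up for illustration): round to a fixed number of decimals, then call np.unique. Its existing return_inverse=True option shows which original values were merged into each rounded value.

```python
import numpy as np

x = np.array([0.1 + 0.2, 0.3, 0.7000001, 0.7])

# Snap every value to 3 decimal places, then compare exactly.
u, inverse = np.unique(np.round(x, 3), return_inverse=True)
print(u)        # [0.3 0.7]
print(inverse)  # [0 0 1 1]
```

One consequence of this design: rounding partitions the number line into fixed bins, so every input maps to exactly one output and the overlap question never arises. The flip side is that two values very close to a bin boundary, such as 0.0004999 and 0.0005001 at 3 decimals, end up in different groups.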


aminthefresh＠gmail.com

8:04 p.m.

I see your point. How about passing the number of significant figures instead of atol?

In fact, that’s what I originally intended, but I thought that it could be expressed via atol and rtol, whereas the number of significant figures doesn’t seem to suffer from the ambiguity you pointed out.


Stephan Hoyer

8:14 p.m.

On Thu, Jul 16, 2020 at 1:04 PM wrote:

I see your point. How about passing the number of significant figures instead of atol?

In fact, that’s what I originally intended, but I thought that it could be expressed via atol and rtol, whereas the number of significant figures doesn’t seem to suffer from the ambiguity you pointed out.

This can already be expressed clearly* with a separate function call, e.g., np.unique(np.round(x, 3)).

In general, it's better software design practice to have separate composable functions rather than adding more features into a single function. So I don't think this would be an improvement for np.unique().

* Note: this is rounding to fixed precision rather than a fixed number of significant figures. I can see a case for adding a helper function for rounding to a number of significant digits, but this should be a separate change from np.unique(). You can certainly do this currently in NumPy, but it's a bit of work: https://stackoverflow.com/questions/18915378/rounding-to-significant-figures...
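A minimal sketch of the significant-figures helper described above, assuming NumPy and finite inputs. round_sig is a hypothetical name, not a NumPy function, and this is one possible way to write it rather than the approach in the linked Stack Overflow thread. It scales each value so the digits to keep sit left of the decimal point, rounds with np.rint, and scales back; when the number of decimals is negative it divides by a power of ten instead of multiplying by an inexact fraction such as 0.001.

```python
import numpy as np


def round_sig(x, sig):
    """Hypothetical helper: round each element of x to `sig` significant figures.

    Assumes finite inputs; zeros are passed through unchanged.
    """
    x = np.asarray(x, dtype=float)
    # Decimal exponent of the leading digit (e.g. 2 for 345.6, -3 for 0.0012).
    # log10(0) is -inf, so silence that warning and give zeros exponent 0.
    with np.errstate(divide="ignore"):
        exponent = np.where(x == 0, 0.0, np.floor(np.log10(np.abs(x))))
    decimals = sig - 1 - exponent   # decimal places to keep (may be negative)
    p = 10.0 ** np.abs(decimals)    # a power of ten >= 1
    # Positive decimals: multiply, round, divide. Negative: divide, round, multiply.
    scaled = np.where(decimals >= 0, x * p, x / p)
    r = np.rint(scaled)
    return np.where(decimals >= 0, r / p, r * p)


print(round_sig([123456, 0.0012345, 0, -98765], 3))
# Combined with np.unique, values agreeing to 3 significant figures collapse:
print(np.unique(round_sig([1234.5, 1234.9, 0.0012341, 0.0012349], 3)))
```

Because the number of decimals is chosen per element from its own magnitude, this behaves somewhat like a relative tolerance, yet each value still maps to exactly one rounded representative, so the chaining ambiguity does not arise. Values near a rounding boundary can still land in different groups.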

