Mailman 3 Proposal for new function to determine if a float contains an integer - NumPy-Discussion

Proposal for new function to determine if a float contains an integer

Joseph Fox-Rabinovitz

Dec. 30, 2021

7:33 p.m.

Hi,

I wrote a reference implementation for a C ufunc, `isint`, which returns True for integers and False for non-integers, found here: https://github.com/madphysicist/isint_ufunc.

The idea came from a Stack Overflow question of mine, which has gotten a fair number of views and even some upvotes: https://stackoverflow.com/q/35042128/2988730. The current "recommended" solution is to use ``((x % 1) == 0)``. This is slower and more cumbersome because of the math operations and the temporary storage. My version returns a single array of booleans with no intermediaries, and is between 5 and 40 times faster, depending on the type and size of the input.

If you are interested in taking a look, there is a suite of tests and a small benchmarking script that compares the ufunc against the modulo expression. The entire thing currently works with bit twiddling on an appropriately converted integer representation of the number. It assumes a standard IEEE 754 representation for float16, float32, and float64. The extended 80-bit float128 format gets some special treatment because of the explicit integer bit. Complex numbers are currently integers only if they are real and integral. Integer types (including bool) are always integers. Time and text types raise TypeErrors, since their integerness is meaningless.

If a consensus forms that this is something appropriate for numpy, I will need some pointers on how to package up C code properly. This was an opportunity for me to learn to write a basic ufunc. I am still a bit confused about where code like this would go, and how to harness numpy's code generation. I put comments in my .c and .h files showing how I would expect the generators to look, but I'm not sure where to plug something like that into numpy. It would also be nice to test on architectures where long double is something other than the 80-bit extended format, such as a proper float128 quad-precision number.

Please let me know your thoughts.

Regards,

- Joe
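For readers who want a feel for what "bit twiddling on an integer representation" can look like, here is a minimal pure-Python sketch for a single float64 value. It is not the C code from the linked repository: the function name `is_integer_bits` and the use of the `struct` module are assumptions made for illustration, and it only covers the IEEE 754 binary64 layout (1 sign bit, 11 exponent bits, 52 stored mantissa bits).

```python
import struct


def is_integer_bits(x: float) -> bool:
    """Return True if the float64 value x holds an integer, using only its bits.

    Hypothetical helper for illustration; not the ufunc from the repository.
    """
    # Reinterpret the 8 bytes of the double as an unsigned 64-bit integer.
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    exponent = (bits >> 52) & 0x7FF      # 11-bit biased exponent
    mantissa = bits & ((1 << 52) - 1)    # 52-bit stored fraction (sign dropped)

    if exponent == 0x7FF:                # inf or nan: never integers
        return False
    if exponent == 0:                    # zero or subnormal
        return mantissa == 0             # only +0.0 and -0.0 qualify
    unbiased = exponent - 1023
    if unbiased < 0:                     # 0 < |x| < 1
        return False
    if unbiased >= 52:                   # every mantissa bit is an integer bit
        return True
    # The low (52 - unbiased) mantissa bits encode the fractional part.
    frac_mask = (1 << (52 - unbiased)) - 1
    return (mantissa & frac_mask) == 0
```

The sketch uses no floating-point arithmetic at all, which is the property the message attributes to the ufunc; for finite and non-finite inputs alike it is meant to agree with the built-in `float.is_integer`.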

Brock Mendel

December 2021

9:55 p.m.

At least some of the commenters on that StackOverflow page need a slightly stronger check: not only is_integer(x), but also "np.iinfo(dtype).min <= x <= np.iinfo(dtype).max" for some particular dtype, i.e. "Can I losslessly set these values into the array I already have?"

_______________________________________________ NumPy-Discussion mailing list -- numpy-discussion@python.org To unsubscribe send an email to numpy-discussion-leave@python.org https://mail.python.org/mailman3/lists/numpy-discussion.python.org/ Member address: jbrockmendel@gmail.com
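The stronger check described in this reply (integral and within the range of a target integer dtype) can be sketched in plain Python for a single value. The helper name `fits_losslessly` and the hard-coded bounds are assumptions for illustration; the constants stand in for what `np.iinfo(dtype).min` and `np.iinfo(dtype).max` would report for int8 and int64.

```python
# Stand-ins for np.iinfo(np.int8) and np.iinfo(np.int64) limits (assumed here
# so the sketch needs nothing outside the standard library).
INT8_MIN, INT8_MAX = -(2**7), 2**7 - 1
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def fits_losslessly(x: float, lo: int, hi: int) -> bool:
    """Return True if x is integral and lies within the closed range [lo, hi].

    Hypothetical helper: is_integer() already rejects nan and +/-inf, and
    Python compares int and float values exactly, so the bounds are not
    rounded to float before the comparison.
    """
    return x.is_integer() and lo <= x <= hi
```

For example, `fits_losslessly(127.0, INT8_MIN, INT8_MAX)` is true while `fits_losslessly(128.0, INT8_MIN, INT8_MAX)` is false, even though both values are integers.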

Joseph Fox-Rabinovitz

10:46 p.m.

Is adding arbitrary optional parameters a thing with ufuncs? I could easily add upper and lower bounds checks.

Andras Deak

6:44 a.m.

Shouldn't we keep the name of the stdlib float method?

    >>> (3.0).is_integer()
    True

See https://docs.python.org/3/library/stdtypes.html#float.is_integer

András

Joseph Fox-Rabinovitz

1:47 p.m.

This sounds obvious in hindsight. I renamed it to is_integer, including the repo itself. The new link is here: https://github.com/madphysicist/is_integer_ufunc
