ENH: Uniform interface for accessing minimum or maximum value of a dtype
![](https://secure.gravatar.com/avatar/6c746c93bad1aa2312e48e9973657bd9.jpg?s=120&d=mm&r=g)
As discussed [here](https://github.com/numpy/numpy/issues/5032#issuecomment-1830838701), [here](https://github.com/numpy/numpy/issues/5032#issuecomment-2307927804), and [here](https://github.com/google/jax/issues/18661#issuecomment-1829031914), I'm interested in a uniform interface for accessing the minimum or maximum value of a given dtype.

Currently, this requires branching on the kind of dtype (boolean, integer, or floating point) and then (for the latter two) calling either [iinfo](https://numpy.org/doc/stable/reference/generated/numpy.iinfo.html) or [finfo](https://numpy.org/doc/stable/reference/generated/numpy.finfo.html), respectively.

It would be more ergonomic to have a single, uniform interface for accessing this information that is dtype-independent. Possible interfaces include:

```python3
import numpy as np
dt = np.dtype('int32')

dt.min
np.dtypes.info(dt).min
np.dtypes.min(dt)
np.dtypes.min_value(dt)
```
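For comparison, the branching that is needed today can be sketched roughly as below. The helper name `dtype_min_max` is hypothetical and not part of NumPy; only `np.dtype`, `np.iinfo`, and `np.finfo` are existing calls, and this sketch covers the three built-in kinds mentioned above and nothing else.

```python
import numpy as np


def dtype_min_max(dtype):
    """Hypothetical helper: return (min, max) for a built-in NumPy dtype."""
    dt = np.dtype(dtype)
    if dt.kind == "b":
        # Booleans have no iinfo/finfo entry, so the bounds are spelled out.
        return False, True
    if dt.kind in "iu":
        # Signed and unsigned integers go through np.iinfo.
        info = np.iinfo(dt)
        return info.min, info.max
    if dt.kind == "f":
        # Real floating point goes through np.finfo (finite bounds only).
        info = np.finfo(dt)
        return info.min, info.max
    # Everything else (strings, datetimes, user dtypes, ...) is left open here.
    raise TypeError(f"no min/max defined in this sketch for dtype {dt!r}")


lo, hi = dtype_min_max("int32")
```

Every caller that wants to be dtype-generic ends up carrying some variant of this function, which is the duplication a uniform interface would remove.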
![](https://secure.gravatar.com/avatar/a2c1b891fe9dd5c60430e823bfe8c298.jpg?s=120&d=mm&r=g)
+1 for the general idea! It may be nice to have such a function which sits at the top level of the API, to fit into https://data-apis.org/array-api/draft/API_specification/data_type_functions.... nicely. However, `min_value` or `min` won’t do then - we’d probably need to include `dtype` in the name somewhere. But I don’t really like `np.min_dtype(dt)`. Maybe `np.min_dtype_value(dt)`?

Cheers,
Lucas
![](https://secure.gravatar.com/avatar/a2c1b891fe9dd5c60430e823bfe8c298.jpg?s=120&d=mm&r=g)
Or how about `np.dtype_info(dt)`, which could return an object with attributes like `min` and `max`. Would that be possible?
![](https://secure.gravatar.com/avatar/7272106f3e0d0ac17272f94a8a71f9ca.jpg?s=120&d=mm&r=g)
That seems reasonable to me on its face. There are some corner cases to work out though.

Swayam is tinkering with a quad precision dtype written using the new DType API and just ran into the fact that finfo doesn’t support user dtypes: https://github.com/numpy/numpy/issues/27231

IMO any new feature along these lines should have some thought in the design about how to handle user-defined data types.

Another thing to consider is that data types can be non-numeric (things like categories) or number-like but not really just a number, like a quantity with a physical unit. That means you should also think about what to do where fields like min and max don’t make any sense or need to be a generic Python object rather than a numeric type.

I think if someone proposed a NEP that fully worked this out it would be welcome. My understanding is that the array API consortium prefers to standardize on APIs that gain traction in libraries rather than inventing APIs and telling libraries to adopt them, so I think a NEP is the right first step, rather than trying to standardize something in the array API.

On Mon, Aug 26, 2024 at 8:06 AM Lucas Colley <lucas.colley8@gmail.com> wrote:
![](https://secure.gravatar.com/avatar/73e482f0af57b624af58ddc78fe9b128.jpg?s=120&d=mm&r=g)
I think a NEP is a good idea. It would also seem to make sense to consider how the dtype itself can hold/calculate this type of information, since that will be the only way a generic ``info()`` function can get information for a user-defined dtype.

Indeed, taking that further, might a method or property on the dtype itself be the cleaner interface? I.e., one would do ``dtype.info().min`` or ``dtype.info.min``.

-- Marten

Nathan <nathan.goldbaum@gmail.com> writes:
![](https://secure.gravatar.com/avatar/b4f6d4f8b501cb05fd054944a166a121.jpg?s=120&d=mm&r=g)
On Mon, 2024-08-26 at 11:26 -0400, Marten van Kerkwijk wrote:
I agree, I think it should be properties/attributes (I don't think it needs to be a function, it should be cheap?) Now it might also be that `np.finfo()` could keep working via `dtype.finfo` or a dunder if we want to hide it. In general, I would lean towards some form of attributes, even if I am not sure if they should be `.info`, `.finfo`, or even directly on the dtype. (`.info.min` seems tricky, because I am not sure it is clear whether inf or the minimum finite value is "min".) A (potentially very short) NEP would probably help to get momentum on making a decision. I certainly would like to see this being worked on! - Sebastian
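The ambiguity around "min" for floating point can be seen with today's `np.finfo`, whose `min` is the most negative finite value rather than negative infinity. A small sketch of that distinction, using only existing NumPy calls:

```python
import numpy as np

info = np.finfo(np.float32)

# finfo.min is the most negative *finite* float32 value.
finite_min = info.min

# float32 can still hold -inf, which compares below finfo.min.
arr = np.array([-np.inf], dtype=np.float32)
smaller_than_min = bool(arr[0] < finite_min)
```

Whichever value a unified attribute reports, code that uses it as an identity element (for example, as the starting value of a running maximum) needs to know which of the two it is getting.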
![](https://secure.gravatar.com/avatar/d2aafb97833979e3668c61d36e697bfc.jpg?s=120&d=mm&r=g)
On Mon, Aug 26, 2024 at 5:42 PM Sebastian Berg <sebastian@sipsolutions.net> wrote:
A namespace attached to the dtype to hold useful constants seems like a good approach. This could also be used to hold type-dependent constants such as `pi`, `e`, etc. for the real floating point types. Over in https://github.com/numpy/numpy/issues/9698, I suggested the name `constants` (https://github.com/numpy/numpy/issues/9698#issuecomment-2186653455).

This would also be available for user-defined dtypes, where types such as `quaddtype` (https://github.com/numpy/numpy-user-dtypes/tree/main/quaddtype), `mpfdtype` (https://github.com/numpy/numpy-user-dtypes/tree/main/mpfdtype), and `logfloat32` (https://github.com/WarrenWeckesser/numtypes) would make available their own representations of `pi`, `e`, etc.

There will be some work required to define the semantics of the existing attributes. Not all attributes can be required for all data types. For example, a few considerations off the top of my head:

* The `min` and `max` values for `datetime64` and `timedelta64` would have values that depend on the time unit.
* Floating point types that are not IEEE 754 such as IBM double-double wouldn't necessarily have all the attributes IEEE 754 float types have.
* The StringDType has a well-defined `min` (the empty string), but not a `max`.

Warren
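The `datetime64` point can be illustrated with a small sketch. It assumes the current NumPy representation, in which a `datetime64` value is a 64-bit integer count of the given unit and the smallest `int64` is reserved for `NaT`. The helper name `datetime64_bounds` is hypothetical.

```python
import numpy as np

INT64_MIN = np.iinfo(np.int64).min
INT64_MAX = np.iinfo(np.int64).max


def datetime64_bounds(unit):
    """Hypothetical helper: smallest and largest non-NaT datetime64 for a unit."""
    # INT64_MIN itself is NaT, so the smallest real value is one tick above it.
    lo = np.datetime64(INT64_MIN + 1, unit)
    hi = np.datetime64(INT64_MAX, unit)
    return lo, hi


lo_s, hi_s = datetime64_bounds("s")
lo_ms, hi_ms = datetime64_bounds("ms")
```

The two pairs share the same underlying integers but carry different units, so they denote different instants. A `min`/`max` for `datetime64` therefore has to be attached to a dtype instance with a unit rather than to the dtype class.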
![](https://secure.gravatar.com/avatar/09e8193f35628be825c37595484370da.jpg?s=120&d=mm&r=g)
I’d advocate for something like a `DTypeInfo` object in the Array API itself, with `max_value` and `min_value` being members. Of course, one would have to imagine how this would work with complex-valued dtypes, but I’d like an API that returns an object rather than a million different calls, similar to `finfo` and `iinfo`, but unified.
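A minimal sketch of what such an object could look like when built on top of today's NumPy. `DTypeInfo` and `dtype_info` are hypothetical names taken from this thread, not an existing NumPy or Array API interface, and complex dtypes are deliberately left unresolved here.

```python
from dataclasses import dataclass
from typing import Any

import numpy as np


@dataclass(frozen=True)
class DTypeInfo:
    """Hypothetical unified info object; not an existing NumPy class."""
    dtype: np.dtype
    min_value: Any
    max_value: Any


def dtype_info(dtype) -> DTypeInfo:
    """Hypothetical constructor wrapping np.iinfo / np.finfo."""
    dt = np.dtype(dtype)
    if dt.kind == "b":
        return DTypeInfo(dt, False, True)
    if dt.kind in "iu":
        ii = np.iinfo(dt)
        return DTypeInfo(dt, ii.min, ii.max)
    if dt.kind == "f":
        fi = np.finfo(dt)
        return DTypeInfo(dt, fi.min, fi.max)
    # Complex, datetime, string and user-defined dtypes are open questions.
    raise TypeError(f"dtype_info sketch does not cover dtype {dt!r}")


info = dtype_info(np.int16)
```

With this shape, callers read `info.min_value` and `info.max_value` without knowing whether the dtype is boolean, integer, or floating point.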
participants (7)
- Carlos Martin
- Hameer Abbasi
- Lucas Colley
- Marten van Kerkwijk
- Nathan
- Sebastian Berg
- Warren Weckesser