Fwd: ndarray should offer __format__ that can adjust precision
It would be nice to be able to use the Python syntax we already use to format the precision of floating-point numbers in NumPy:

```
>>> import numpy as np
>>> a = np.array([-np.pi, np.pi])
>>> print(f"{a:+.2f}")
[-3.14 +3.14]
```

This is particularly useful when you have large arrays. The problem is that if you want to do it today, it is not implemented:

```
>>> print(f"{a:+.2f}")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported format string passed to numpy.ndarray.__format__
```
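For comparison, the same output can already be produced without `__format__`, either by passing the options to `np.array2string` directly or by setting them temporarily with the `np.printoptions` context manager. A minimal sketch using only existing NumPy functions:

```python
import numpy as np

a = np.array([-np.pi, np.pi])

# Pass the options explicitly to array2string...
print(np.array2string(a, precision=2, sign="+"))  # [-3.14 +3.14]

# ...or set them temporarily for str()/print() with a context manager.
with np.printoptions(precision=2, sign="+"):
    print(a)  # [-3.14 +3.14]
```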
In this PR (https://github.com/numpy/numpy/pull/19550) I propose a very basic formatting implementation for numeric arrays that uses `array2string`, just as `str` currently does.

Initially, since we are only considering numeric types (floating-point numbers specifically), we are only interested in being able to change the precision, the sign, and possibly the rounding or truncation. Since the `array2string` function already does everything we need, we only need to implement the `__format__` method of the `ndarray` class, which parses a predefined format (similar to the one Python already uses for built-in data types) to set those parameters.

I propose a mini format specification inspired by the [Format Specification Mini-Language](https://docs.python.org/3/library/string.html#formatspec).

```
format_spec ::= [sign][.precision][type]
sign        ::= "+" | "-" | " "
precision   ::= [0-9]+
type        ::= "f" | "e"
```

We are going to consider only 3 arguments of the `array2string` function: `precision`, `suppress_small`, and `sign`. In particular, the `type` token sets the `suppress_small` argument to True when the type is `f` and to False when it is `e`. This is meant to mimic Python's behavior of truncating decimals when using fixed-point notation.

As @brandon-rhodes said in gh-5543, when you try to format an array containing Python objects, the behavior should be the same as the default Python implementation in the `object` class: `format(a, "")` should be equivalent to `str(a)`, and `format(a, "not empty")` should raise an exception.

What remains to be defined is the behavior when trying to format an array with a non-numeric data type (one that is not a subtype of `np.number`) other than `np.object_`. Should we raise an exception? In my opinion, yes: then, if formatting is extended in the future (for example, to dates), it will be clear to people that it was not implemented before.

I'm open to suggestions.

- Ivan
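A rough sketch of how the proposed grammar could map onto `array2string`, written as a hypothetical `ndarray` subclass. `FormattableArray` and `_SPEC` are illustrative names, not part of NumPy and not the code in the PR. Assumptions beyond the proposal text: an empty spec returns `str(a)` for every dtype, an omitted `type` leaves `suppress_small` at its default, and a spec that does not match the grammar raises `ValueError`.

```python
import re

import numpy as np

# Hypothetical regex for the proposed grammar: [sign][.precision][type]
_SPEC = re.compile(r"(?P<sign>[+\- ])?(?:\.(?P<precision>[0-9]+))?(?P<type>[fe])?")


class FormattableArray(np.ndarray):
    """Hypothetical ndarray subclass sketching the proposed __format__."""

    def __format__(self, format_spec):
        arr = np.asarray(self)  # plain ndarray, so printing follows NumPy defaults
        if format_spec == "":
            # Same as object.__format__: an empty spec is equivalent to str().
            return str(arr)
        if not np.issubdtype(arr.dtype, np.number):
            # Object arrays (and, per the proposal, other non-numeric dtypes)
            # reject any non-empty spec.
            raise TypeError(
                f"unsupported format string passed to {type(self).__name__}.__format__"
            )
        match = _SPEC.fullmatch(format_spec)
        if match is None:
            raise ValueError(f"Invalid format specifier {format_spec!r}")
        kwargs = {}
        if match["sign"] is not None:
            kwargs["sign"] = match["sign"]
        if match["precision"] is not None:
            kwargs["precision"] = int(match["precision"])
        if match["type"] is not None:
            # 'f' -> suppress_small=True, 'e' -> suppress_small=False
            kwargs["suppress_small"] = match["type"] == "f"
        return np.array2string(arr, **kwargs)


a = np.array([-np.pi, np.pi]).view(FormattableArray)
print(f"{a:+.2f}")  # [-3.14 +3.14]
```

Because the sketch only translates the spec into keyword arguments, every formatting decision (alignment, line wrapping, summarization) stays with `array2string`, which matches the proposal's point that no new printing machinery is needed.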
On Mon, 2021-07-26 at 01:04 -0400, Ivan Gonzalez wrote:
> It would be nice to be able to use the Python syntax we already use to format the precision of floating-point numbers in NumPy:
>
> ```
> >>> a = np.array([-np.pi, np.pi])
> >>> print(f"{a:+.2f}")
> [-3.14 +3.14]
> ```
>
> This is particularly useful when you have large arrays. The problem is that if you want to do it today, it is not implemented:
>
> ```
> >>> print(f"{a:+.2f}")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: unsupported format string passed to numpy.ndarray.__format__
> ```
This discussion has recently surfaced again, and I am wondering what people's stance on it is. The PR is: https://github.com/numpy/numpy/pull/19550

I.e. things like `f"{arr:.2f}"` would be enabled in certain cases, at least for all floating-point and complex values. I am wondering more about the general API progression here, since I do not think we have any prior art to compare to.

* NumPy arrays are N-D objects (containers); do we want f/e formatting to work for them?
* NumPy printing has a lot more options than just how to format each element. Are we happy to say that implementing `.2f` is fine without unlocking other things?
* Some formatting might come with an expectation that the result has that length: `f"{3.123:30e}"` has a length of 30, but for an array that is obviously not true. Do we care about that?

I do not have much of an opinion on this. It feels a bit limited to me, but if users think it is useful in practice I am happy to allow it.

The PR had some other discussion about `__format__` currently not always being correct/matching the Python behavior for scalars. But I believe these problems are either fixed, or can be fixed before we merge.

Without any input, we may discuss this briefly in the community meeting and likely give the go-ahead based on no opposition :).

Cheers,

Sebastian
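Sebastian's third point can be made concrete. The sketch below compares a Python float with an array whose elements are each formatted with the same spec, using the existing `formatter` argument of `np.array2string`; treating the width as per-element is one possible interpretation, assumed here only for illustration.

```python
import numpy as np

# For a Python float, the width in the spec (30) fixes the length of the result.
scalar = f"{3.123:30e}"
print(len(scalar))  # 30

# Applying the same spec to every element through array2string's existing
# `formatter` argument pads each element to 30 characters instead, so the
# whole string is longer than 30 and depends on the array's size.
a = np.array([3.123, -1.5])
per_element = np.array2string(a, formatter={"float_kind": lambda x: f"{x:30e}"})
print(len(per_element) > 30)  # True
```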
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
On Fri, Dec 3, 2021 at 12:07 PM Sebastian Berg <sebastian@sipsolutions.net> wrote:
> This discussion has recently surfaced again, and I am wondering what people's stance on it is.
> The PR is: https://github.com/numpy/numpy/pull/19550
> I.e. things like `f"{arr:.2f}"` would be enabled in certain cases, at least for all floating-point and complex values. I am wondering more about the general API progression here, since I do not think we have any prior art to compare to.
> * NumPy arrays are N-D objects (containers); do we want f/e formatting to work for them?
Yes, I imagine this could be quite handy -- way nicer than figuring out the syntax for np.array2string. I would use this functionality myself.
> * NumPy printing has a lot more options than just how to format each element. Are we happy to say that implementing `.2f` is fine without unlocking other things?
If we want to add support for custom whole-array formatting in the future, I think it would be reasonable to constrain ourselves to backwards-compatible extensions of elementwise formatting.
> * Some formatting might come with an expectation that the result has that length: `f"{3.123:30e}"` has a length of 30, but for an array that is obviously not true. Do we care about that?
I'm not concerned about this. If you aren't checking the types of arguments that you are trying to format today, you are already going to encounter surprising errors when string formatting fails.
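Stephan's point can be illustrated with plain Python: a hypothetical helper `fmt` that applies `.2f` without checking its argument already fails for some built-in types today. A minimal sketch:

```python
def fmt(value):
    """Hypothetical helper: formats `value` with '.2f' without any type checks."""
    return f"{value:.2f}"


print(fmt(3.14159))  # 3.14
print(fmt(3))        # 3.00 (int supports the 'f' presentation type)

# Other argument types already raise, with different exception classes.
for bad in ("abc", [1.0, 2.0]):
    try:
        fmt(bad)
    except (TypeError, ValueError) as exc:
        print(type(bad).__name__, "->", type(exc).__name__)
# str -> ValueError
# list -> TypeError
```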
participants (3)
- Ivan Gonzalez
- Sebastian Berg
- Stephan Hoyer