real numbers with SI scale factors: next steps

Okay, let's try to wrap this up. In summary I proposed three things:

1. A change to the Python lexer to accept SI literals as an alternative to, but not a replacement for, E-notation. As an optional feature, simple units could be added to the end but would be largely ignored. So the following would be accepted:

       freq = 2.4GHz
       r = 1k
       l = 10nm

   The idea in accepting units was to allow them to be specified when convenient as additional documentation on the meaning of the number.

   Objections:

   a. Acceptance of the abbreviation for Exa (E) overlaps with E-notation (1E+1 could represent 1e18 + 1 or 10). A suggestion to change the prefix from E to X conflicts with a proposal to use X, W, and V to represent 10^27, 10^30, and 10^33 (en.wikipedia.org/wiki/Metric_prefix).

   b. Allowing the units to be specified will lead some users to assume a dimensional analysis is being performed when in fact the units are ignored. This false sense of security could lead to bugs.

   c. The proposal only supports simple units, not compound units such as m/s. So even if hooks were provided to allow access to the units to support an add-on dimensional analysis capability, an additional mechanism would have to be provided to support compound units.

   d. Many people objected to allowing the use of naked scale factors as a perversion of the standard.

2. A change to the float() function so that it accepts SI scale factors and units. This extension naturally follows from the first: the float function should accept anything the Python parser accepts. For example:

       freq = float('2.4GHz')
       r = float('1k')
       l = float('10nm')

   Objections:

   a. The Exa objection from the above proposal is problematic here as well.

   b. Things that used to be errors are now no longer errors. This could cause problems if a program was counting on float('1k') to be an error.

3. A change to the various string formatting mechanisms to allow outputting real numbers with SI scale factors:

       >>> print('Speed of light in a vacuum: {:r}m/s.'.format(2.9979e+08))
       Speed of light in a vacuum: 299.79 Mm/s.

       >>> print('Speed of sound in water: %rm/s.' % 1481)
       Speed of sound in water: 1.481 km/s.

   Objections: No objections were raised that I recall, however here is something else to consider:

   a. Should we also provide a mechanism for the binary scale factors (Ki, Mi, ..., Yi)? For example: '{:b}B'.format(2**30) --> 1 GiB.

On proposed extension 1 (native support for SI literals) my conclusion is that we did not reach any sense of consensus and there was considerable opposition to my proposal. There was much less discussion on extensions 2 & 3, so it is hard to say whether consensus was reached.

So, given all this, I would like to make the following recommendations:

1. No action should be taken.

2. The main justification for modifying float() was to make it consistent with the extended Python language. Without extension 1, this justification goes away. However the need to be able to easily convert strings of numbers with SI scale factors into floats still exists. This should be handled by adding a library or extending an existing library.

3. Allowing numbers to be formatted with SI prefixes is useful and not controversial. The 'r' and 'b' format codes should be added to the various string formatting mechanisms.

What do you think?

-Ken
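
Recommendation 2 (string conversion handled by a library rather than by float()) could look roughly like the sketch below. The name si_float, the prefix table and the regular expression are assumptions made for illustration, not an existing API. Exa (E) is deliberately left out so that E-notation strings such as '1E+1' keep their current meaning, and a trailing unit is parsed and then ignored, which is the behaviour objection 1.b warns about.

```python
import re

# Hypothetical helper: name, prefix table and pattern are illustrative only.
# "E" (exa) is omitted so that E-notation stays unambiguous.
_SCALE = {"Y": 1e24, "Z": 1e21, "P": 1e15, "T": 1e12, "G": 1e9, "M": 1e6,
          "k": 1e3, "": 1.0, "m": 1e-3, "u": 1e-6, "μ": 1e-6, "n": 1e-9,
          "p": 1e-12, "f": 1e-15, "a": 1e-18, "z": 1e-21, "y": 1e-24}

_PATTERN = re.compile(
    r"\s*([-+]?(?:\d+\.?\d*|\.\d+))\s*([YZPTGMkmuμnpfazy]?)([A-Za-z]*)\s*$")


def si_float(text):
    """Convert '2.4GHz', '1k' or '10nm' to a float; units are ignored."""
    match = _PATTERN.match(text)
    if match is None:
        # Anything else (E-notation, 'inf', ...) goes to the built-in parser.
        return float(text)
    number, prefix, _unit = match.groups()
    # Note the inherent ambiguity: '10m' is read as 10 milli, not 10 metres.
    return float(number) * _SCALE[prefix]


print(si_float("2.4GHz"), si_float("1k"), si_float("1E+1"))
```

Keeping this in a library means float('1k') stays an error for programs that rely on it.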

On 30 August 2016 at 21:34, Ken Kundert <python-ideas@shalmirane.com> wrote:
Thanks for the summary (which I mostly elided) which I think was fair. Regarding (3), the only one that remains proposed, I think it would be useful to see a 3rd-party library implementation of the formatting operation proposed. This would allow any corner cases or controversial points to be ironed out before proposing it for direct incorporation in the string formatting mini-language. Furthermore, in Python 3.6, it will be possible to write f"The value is {si_format(the_val)}" directly, using PEP 498 f-strings. The combination of a 3rd party function and f-strings may even make special formatting support unnecessary - but that will be easier to establish with practical experience. And there's little or no downside - the proposed feature won't be possible before 3.7, so we may as well use the lifetime of the 3.6 release to gain that experience. Paul
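
A minimal sketch of the third-party function plus f-string combination suggested above. Only the name si_format comes from the message; the signature, the prefix table and the default of four significant digits are assumptions. Values that round up to 1000 of a prefix (for example 999999 at four digits) are not renormalised here.

```python
import math

# Illustrative prefix table; the body of si_format is an assumption,
# only the function name appears in the thread.
_PREFIXES = {-24: "y", -21: "z", -18: "a", -15: "f", -12: "p", -9: "n",
             -6: "μ", -3: "m", 0: "", 3: "k", 6: "M", 9: "G", 12: "T",
             15: "P", 18: "E", 21: "Z", 24: "Y"}


def si_format(value, digits=4):
    """Return value with an auto-selected SI prefix, e.g. 1481 -> '1.481 k'."""
    if value == 0 or not math.isfinite(value):
        return format(value, "g")
    # Largest multiple of 3 not exceeding the decimal exponent of the value.
    exponent = 3 * math.floor(math.log10(abs(value)) / 3)
    exponent = max(-24, min(24, exponent))  # stay inside the prefix table
    mantissa = value / 10.0 ** exponent
    return "{:.{}g} {}".format(mantissa, digits, _PREFIXES[exponent])


speed = 1481
print(f"Speed of sound in water: {si_format(speed)}m/s.")
```

The unit stays in the f-string, so the function itself never needs to know about units.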

Given that something like this gets proposed from time to time, I wonder if it would make sense to actually write up (1) and (2) as a PEP that is immediately marked rejected. The PEP should make it clear *why* it is rejected. This would be a handy reference doc to have around the next time the idea comes up. -- --Guido van Rossum (python.org/~guido)

Thanks a lot for this comprehensive summary. :) Find my comments below. On 30.08.2016 22:34, Ken Kundert wrote:
I think this results from the possibility of omitting the SI units.
Same can be said for variable annotations for which a PEP is in the works.
I get the feeling that SI syntax should only work when the hook is provided. So this could be the dealbreaker here: enabling it only when the hook is provided changes the syntax/semantics of valid Python code depending on the presence of some hidden hooks. Enabling the syntax regardless of a working hook has the side effects you described above. So, no matter how it is done, it always has some negative connotation.
d. Many people objected to allowing the use of naked scale factors as a perversion of the standard.
Remove this and it also solves 1.a.
I like your conclusion. It seems a technical note is missing on why this won't happen the way you proposed it (maybe the hook + missing stdlib package for SI units). :) Isn't there some package already available for recommendation 3? Sven

On Tue, Aug 30, 2016 at 01:34:27PM -0700, Ken Kundert wrote:
3. A change to the various string formatting mechanisms to allow outputting real numbers with SI scale factors:
This is somewhat similar to a library I wrote for formatting bytes: https://pypi.python.org/pypi/byteformat Given that feature freeze for 3.6 is two weeks away, I don't think that this proposal will appear before 3.7. So I'm interested, but I'm less interested *right now*. So for now I'll limit myself to only a few observations.

    >>> print('Speed of light in a vacuum: {:r}m/s.'.format(2.9979e+08))
    Speed of light in a vacuum: 299.79 Mm/s.

Do you think that {:r} might be confused with {!r}? What's the mnemonic here? Why "r" for scale factor?

    >>> print('Speed of sound in water: %rm/s.' % 1481)
    Speed of sound in water: 1.481 km/s.

I doubt that you'll get any new % string formatting codes. That's a legacy interface, *not* deprecated but unlikely to get new features added, and it is intended to closely match the C printf codes.

A few more questions:

(1) Why no support for choosing a particular scale? If this only auto-scales, I'm not interested.

(2) Support for full prefix names, so we can format (say) "kilograms" as well as "kg"?

(3) Scientific notation and engineering notation?

(4) 1e5 versus 1×10^5 notation?

(5) Is this really something that format() needs to understand? We can get a *much* richer and more powerful interface by turning it into a generalised numeric pretty-printing library, at the cost of a little less convenience.
3. Allowing numbers to be formatted with SI prefixes is useful and not controversial.
I wouldn't quite go that far. You made an extremely controversial request (new syntax for scaling prefixes + ignored units) and nearly all the attention was on that. For what it's worth, I have no need for a format code which *only* auto-selects the scaling factor. If I don't have at least the option to choose which scaling factor I get, and hence the prefix, this is of little or no use to me, I likely wouldn't use it, and as far as I am concerned the nuisance value of having yet another format string code to learn outweighs the benefit. -- Steve

On Wed, Aug 31, 2016 at 12:05 PM, Steven D'Aprano <steve@pearwood.info> wrote:
Or just have a subclass of int or float that defines __format__, and can do whatever it likes - including specifying the scale, if you so choose. Say, something like:

    {:s} -- autoscale, prefix
    {:S} -- autoscale, full word
    {:sM} -- scale to mega, print "M"
    {:SM} -- scale to mega, print "Mega"

etc

ChrisA

What's the mnemonic here? Why "r" for scale factor?
My thinking was that r stands for real like f stands for float. With the base 2 scale factors, b stands for binary.
(1) Why no support for choosing a particular scale? If this only auto-scales, I'm not interested.
Auto-scaling is kind of the point. There is really little need for a special mechanism if you're going to specify the scale factor yourself.

    >>> print('Attenuation = {:.1f} dB at {:r}m.'.format(-13.7, 50e3))
    Attenuation = -13.7 dB at 50 km.

If you wanted to force the second number to be in km, you use a %f format and scale the argument:

    >>> print('Attenuation = {:.1f} dB at {:.1f} km.'.format(-13.7, 50e3/1e3))
    Attenuation = -13.7 dB at 50.0 km.

(2) Support for full prefix names, so we can format (say) "kilograms" as well as "kg"?
This assumes that somehow this code can access the units so that it can switch between long form 'grams' and short form 'g'. That is a huge expansion in the complexity for what seems like a small benefit.
(3) Scientific notation and engineering notation?
(4) 1e5 versus 1×10^5 notation?
Ah, okay. But all of these require auto-scaling. And I was still thinking that we need to provide input and output capability (ie, we still need to be able to convert whatever format we output back from strings into floats). Are you thinking that we should parse 1×10^5? And why 1×10^5 and not 1×10⁵?
This is suddenly a much bigger project than what I was envisioning. -Ken

On 31 August 2016 at 05:08, Ken Kundert <python-ideas@shalmirane.com> wrote:
This argument can just as easily be used against your proposal: If you want auto-scaling you use a %s format and a suitable library function:

    >>> print('Attenuation = {:.1f} dB at {}m.'.format(-13.7, scale(50e3)))
    Attenuation = -13.7 dB at 50 km.

Anything that's going to be included in the language has to consider other requirements than just your own.
This is suddenly a much bigger project than what I was envisioning.
You're going to have to write the scaling code one way or the other. Writing it in Python and publishing it as a library is *far* easier than writing it in C and hooking it into the format mechanism. You can leave others to offer pull requests to your library to add extra types of formatting. IMO, it's probably time to write some code. Publish a library on PyPI (call it a "prototype" if you like) implementing the scale() function above, publicise it here and elsewhere, and see what reception it gets. Paul

On Wed, Aug 31, 2016 at 2:08 PM, Ken Kundert <python-ideas@shalmirane.com> wrote:
"Real" has historically often been a synonym for "float", and it doesn't really say that it'll be shown in engineering notation. But then, we currently have format codes 'e', 'f', and 'g', and I don't think there's much logic there beyond "exponential", "floating-point", and... "general format"? I think that's a back-formation, frankly, and 'g' was used simply because it comes nicely after 'e' and 'f'. (C's decision, not Python's, fwiw.) I'll stick with 'r' for now, but it could just as easily become 'h' to avoid confusion with %r for repr.
AIUI, it's just giving the full word.

    class ScaledNumber(float):
        # Multipliers that convert a plain value into the named scale.
        invert = {"μ": 1e6, "m": 1e3, "": 1, "k": 1e-3, "M": 1e-6}
        words = {"μ": "micro", "m": "milli", "": "", "k": "kilo", "M": "mega"}
        aliases = {"u": "μ"}

        def autoscale(self):
            # Return the prefix for this magnitude, or None if out of range.
            if self < 1e-6: return None
            if self < 1e-3: return "μ"
            if self < 1: return "m"
            if self < 1e3: return ""
            if self < 1e6: return "k"
            if self < 1e9: return "M"
            return None

        def __format__(self, fmt):
            if fmt == "r" or fmt == "R":
                # Bare code: pick the scale, or fall back to plain "f".
                scale = self.autoscale()
                fmt = fmt + scale if scale else "f"
            if fmt.startswith("r"):
                scale = self.aliases.get(fmt[1], fmt[1])
                return "%g%s" % (self * self.invert[scale], scale)
            if fmt.startswith("R"):
                scale = self.aliases.get(fmt[1], fmt[1])
                return "%g %s" % (self * self.invert[scale], self.words[scale])
            return super().__format__(fmt)

It's a minor flexibility, but could be very useful. As you see, it's still not at all unit-aware; but grammatically, these formats only make sense if followed by an actual unit name. (And not an SI base unit, necessarily - you have to use "gram", not "kilogram", lest you get silly constructs like "microkilogram" for milligram.) Note that this *already works*. You do have to use an explicit class for your scaled numbers, since Python doesn't want you monkey-patching the built-in float type, but if you were to request that float.__format__ grow support for this, it'd be a relatively non-intrusive change. This class could live on PyPI until one day becoming subsumed into core, or just be a permanent third-party float formatting feature. ChrisA

On 31 August 2016 at 17:07, Chris Angelico <rosuav@gmail.com> wrote:
"h" would be a decent choice - it's not only a continuation of the e/f/g pattern, it's also very commonly used as a command line flag for "human-readable output" in system utilities that print numbers. The existing "alternate form" marker in string formatting could be used to request the use of the base 2 scaling prefixes rather than the base 10 ones: "#h". Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Wed, Aug 31, 2016 at 5:21 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I like it. So after all the drama we're just talking about adding an 'h' format code that's like 'g' but uses SI scale factors instead of exponents. I guess we need to debate what it should do if the value is way out of range of the SI scale system -- what's it going to do when I pass it 1e50? (I propose that it should fall back to 'g' style then, but use "engineering" style where exponents are always a multiple of 3.)
Not sure about this one. -- --Guido van Rossum (python.org/~guido)
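
The engineering-style fallback proposed above (exponents always a multiple of 3) can be sketched as follows. The helper name eng_format and its default precision are hypothetical; nothing like it was specified in the thread.

```python
import math


def eng_format(value, digits=6):
    """Engineering notation: like 'g', but the exponent is a multiple of 3.

    Hypothetical helper, not an existing format code.
    """
    if value == 0 or not math.isfinite(value):
        return format(value, "g")
    # Round the decimal exponent down to the nearest multiple of 3.
    exponent = 3 * math.floor(math.log10(abs(value)) / 3)
    mantissa = value / 10.0 ** exponent
    # The mantissa is in [1, 1000); the exponent is printed signed, two digits.
    return "{:.{}g}e{:+03d}".format(mantissa, digits, exponent)


print(eng_format(1e50))     # the out-of-range value from the message above
print(eng_format(1234.5))
```

An 'h' code could call something like this whenever the exponent falls outside the range covered by the SI prefixes.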

On 08/31/2016 01:07 PM, MRAB wrote:
Would you also want h to work with integers?
'#' already has a meaning for float's 'g' format:
So I think you'd want to pick another type character to mean base 2 scaling, or another character other than #. But it gets cryptic pretty quickly. You could indeed use type == 'b' for floats to mean base 2 scaling, since it has no current meaning, but I'm not sure that's a great idea because 'b' means binary for integers, and if you want to also be able to scale ints (see above), then there's a conflict. Maybe type == 'z'? Or, use something like '@' (or whatever) instead of '#' to mean "the other alternate form", base 2 scaling.
Does the 'type' have to be a single character?
As a practical matter, yes, it should just be a single character. You could make a special case for 'h' and 'hb', but I would not recommend that. Explaining it in the documentation would be confusing. Eric.
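
The existing meanings referred to above can be checked directly with the built-in format(). This uses only current behaviour of the format mini-language; nothing here is part of the proposal.

```python
# Existing behaviour of the "#" (alternate form) flag and the "b" type code.
plain = format(1.0, "g")           # trailing zeros and the point are dropped
alternate = format(1.0, "#g")      # "#" keeps them for the "g" type
hex_plain = format(255, "x")
hex_alternate = format(255, "#x")  # for integers "#" adds the base prefix
binary = format(5, "b")            # "b" already means binary for ints

print(plain, alternate, hex_plain, hex_alternate, binary)
```

Since '#g' and 'b' are already taken, reusing either for base 2 scaling would change or overload existing behaviour.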

On Wed, Aug 31, 2016, at 12:19, Guido van Rossum wrote:
One thing to consider is that this is very likely to be used with a unit (e.g. "%hA" intending to display in amperes), so maybe it should put a space after it? Though really people are probably going to want "1 A" vs "1 kA" in that case, rather than "1 A" vs "1kA".

Also, maybe consider that "1*10^50" [or, slightly less so, 1.0*10**50] is more human-readable than "1e+50". Er, with engineering style it'd be 100e+48 etc, but same basic issue.

Also, is it really necessary to use single-character codes not shared with any other language? The only rationale here seems to be a desire to support everything in % and its limited grammar rather than requiring anyone to use format. If this feature is only supported in format a more verbose description of the desired format could be used. What if, for example, you want engineering style without SI scale factors?

What should the "precision" field mean? %f takes a number of places after the decimal point whereas %e/%g takes a number of significant digits. Engineering or SI-scale-factor format suggests a third possibility: number of decimal places to be shown after the displayed decimal point, e.g. "%.1h" % (1.2345 * 10 ** x) for x in range(9): "1.2", "12.3", "123.5", "1.2k", "12.3k", "123.5k", "1.2M", "12.3M", "123.5M".

And the actual -h behavior of those system utilities you mentioned is "123k", "1.2M", "12M", with the effect being that the value always fits within a four-character field width, but this isn't a fixed number of decimal places *or* significant digits.
If base 2 scaling prefixes are used, should "engineering style" mean 2**[multiple of 10] instead of 10**[multiple of 3]?
Not sure about this one.

Random832 writes:
Also, interesting quirk - it always rounds up. 1025 bytes is "1.1K", and in SI mode, 1001 bytes is "1.1k"
That seems to be the right approach: in system administration, these numbers are used mostly to understand resource usage, and underestimates are almost never what you want, while quite large overestimates are tolerable, and are typically limited because the actual precision of calculations is much higher than that of the "human-readable" output. I don't know if that would be true in general-purpose programming. I suspect not.
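
The always-round-up behaviour described above can be imitated with math.ceil. This is only a sketch of the idea under assumed rules (one decimal place below 10, none above); the name human_ceil is hypothetical and the code is not taken from any system utility.

```python
import math


def human_ceil(n, base=1024):
    """Rounded-up, human-readable size for a non-negative integer n.

    Hypothetical helper: base=1024 gives 'K', base=1000 gives SI-style 'k'.
    """
    suffixes = ["", "K" if base == 1024 else "k", "M", "G", "T"]
    value = float(n)
    index = 0
    while value >= base and index < len(suffixes) - 1:
        value /= base
        index += 1
    if index == 0:
        return str(n)  # small values are printed exactly
    if value < 10:
        # One decimal place, always rounded up, never to nearest.
        return "{:.1f}{}".format(math.ceil(value * 10) / 10, suffixes[index])
    return "{}{}".format(math.ceil(value), suffixes[index])


print(human_ceil(1025), human_ceil(1001, base=1000))
```

A general-purpose formatter would more likely round to nearest, which is the asymmetry the message above points at.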

Guido van Rossum writes:
On Wed, Aug 31, 2016 at 8:57 PM, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
True, but I don't think the implications are symmetric. I buy storage to handle space (expected to be) used, not space available. But when I find myself caring about the "slop" in space available, the fact that I care about that is already very bad news. Time to head for Fry's Electronics! As I wrote before, I don't think the same argument applies to scientific computing.

Random832 wrote:
I don't think a space should be automatic. The typographical recommendation is to put a thin non-breaking space between the value and the unit, but this is not possible with a monospaced font, so some people might decide that it's better without a space, or they might want to use a character other than 0x20. Better to let the user put the space in the format string if wanted.
I'm inclined to think it should be the number of significant digits, not decimal places, to give a more consistent precision as the magnitude of the number changes. For example, if you're displaying some resistor values that are accurate to 2 digits, you would want to see 2.7k, 27k, 270k, but not 27.0k or 270.0k as those would suggest spurious precision. This would also help with fitting the value into a fixed width, since you would know that a precision of n would use at most n+1 characters for the numeric part. -- Greg
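
The significant-digits reading of the precision field can be sketched like this. The helper si_sig and its small prefix table are assumptions; it handles positive values only and does not renormalise a mantissa that rounds up to 1000.

```python
import math

_PREFIXES = {0: "", 3: "k", 6: "M", 9: "G"}  # deliberately small table


def si_sig(value, digits=2):
    """Format a positive value with `digits` significant digits and a prefix."""
    exponent = 3 * min(3, max(0, math.floor(math.log10(value)) // 3))
    mantissa = value / 10 ** exponent
    # Position of the mantissa's leading digit: 0 for 2.7, 1 for 27, 2 for 270.
    lead = math.floor(math.log10(mantissa))
    decimals = digits - 1 - lead
    rounded = round(mantissa, decimals)  # negative decimals round to tens etc.
    return "{:.{}f}{}".format(rounded, max(0, decimals), _PREFIXES[exponent])


print([si_sig(r) for r in (2700, 27000, 270000)])
```

With digits=2 the resistor values come out as 2.7k, 27k and 270k, with no spurious trailing zeros.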

On Thu, Sep 1, 2016, at 02:17, Greg Ewing wrote:
If the space needs to be between the number and the unit there's no good way to do this. I think this is an argument for a separate function that returns a tuple of (formatted number, prefix). Incidentally, do we have a good primitive to return (string of digits, exponent or position of decimal point) a la C's ecvt/fcvt? This would be something that might be useful in allowing users to build their own formatting code. It can be worked around though ("guess" the exponent, scale with multiplication or division, round to an integer to get the string of digits) so I guess it's not that important.
What I was getting at is that there are two different use cases possible here.
Exactly n+1, surely? And on the other hand a fixed number of decimal places allows easy alignment by right-justifying the text within a field (and will use at most n+4 characters for the numeric part).

On Aug 31 2016, Guido van Rossum <guido-+ZN9ApsXKcEdnm+yROfE0A@public.gmane.org> wrote:
There's also the important nitpick if 32e7 is best rendered as 320 M or 0.32 G. There's valid applications for both. Best, -Nikolaus -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F »Time flies like an arrow, fruit flies like a Banana.«

Nikolaus Rath wrote:
There's also the important nitpick if 32e7 is best rendered as 320 M or 0.32 G. There's valid applications for both.
If you want 0.32 G it's probably because you're showing it alongside other values >= 1 G, so you're really getting into the business of letting the user choose the prefix. The default should be 320 M, I think. (Unless it's a capacitor value, where there's a long-standing convention in some circles to use uF or pF, but never nF. :-) -- Greg
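
Letting the caller choose the prefix, as discussed above, needs very little code. The helper with_prefix and its factor table are hypothetical and only illustrate the 320 M versus 0.32 G choice.

```python
# Hypothetical helper: the caller picks the prefix, nothing is auto-scaled.
_FACTORS = {"k": 1e3, "M": 1e6, "G": 1e9}


def with_prefix(value, prefix, digits=3):
    """Scale value to the requested prefix and format it with 'g'."""
    return "{:.{}g} {}".format(value / _FACTORS[prefix], digits, prefix)


print(with_prefix(32e7, "M"))  # same quantity, mega
print(with_prefix(32e7, "G"))  # same quantity, giga
```

An auto-scaling default plus an override like this would cover both use cases raised in the thread.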

All, Armed with all of your requirements, suggestions and good ideas, I believe I am ready to try to put something together. Thank you all, and once again let me apologize for 'all the drama'. I'll let you know when I have something. -Ken

On Tue, Aug 30, 2016 at 09:08:01PM -0700, Ken Kundert wrote:
What's the mnemonic here? Why "r" for scale factor?
My thinking was that r stands for real like f stands for float.
Hmmm. Do you know many mathematicians who use SI prefixes when talking about real numbers? I don't think "real number" is relevant to SI prefixes.
With the base 2 scale factors, b stands for binary.
Well, obviously :-)
The point is not to have to repeat yourself. If I have to scale numbers in lots of places, I don't want to have to re-write the same code in each of them. I want to call a function. Understand that I'm not against auto-scaling. I think it is a good idea. But I strongly disagree that it is the *only* way to do this. If there's code in the std lib to format numbers to some scale, I should be able to loop through a bunch of numbers and format them all in a consistent unit if I so choose, without having to do my own formatting. It's not that I don't want you to be able to auto-scale. I just want the choice of being able to use a consistent scale or not. [...]
*shrug* Well, you could do exactly the same thing. You only need a short function that determines the scale you want, and then scale it yourself. The point of making this a standard function is so that we don't have to keep re-writing the same code.
No, I'm talking about choosing between "M" or "mega". The actual unit itself is up to the caller to supply. You have definitely prodded my interest in the output side of this. I'm rather busy at the moment, but in the coming weeks I think I'll brush the cobwebs off byteformat and see what can be done. https://pypi.python.org/pypi/byteformat in case you want to have a play with it. -- Steve

Steven D'Aprano wrote:
On Tue, Aug 30, 2016 at 09:08:01PM -0700, Ken Kundert wrote:
My thinking was that r stands for real like f stands for float.
The next available letter in the e, f, g sequence would be 'h'. If you want it to stand for something, it could be "human-readable" or "human-oriented". (There's a precedent for this in the "df" unix utility which has a -H option producing SI prefixes.)
I'm talking about choosing between "M" or "mega". The actual unit itself is up to the caller to supply.
Maybe 'h' for abbreviations and 'H' for full prefixes? -- Greg

On 30 August 2016 at 21:34, Ken Kundert <python-ideas@shalmirane.com> wrote:
Thanks for the summary (which I mostly elided) which I think was fair. Regarding (3), the only one that remains proposed, I think it would be useful to see a 3rd-party library implementation of the formatting operation proposed. This would allow any corner cases or controversial points to be ironed out before proposing it for direct incorporation in the string formatting mini-language. Furthermore, in Python 2.6, it will be possible to write f"The value is {si_format(the_val)}" directly, using PEP 498 f-strings. The combination of a 3rd party function and f-strings may even make special formatting support unnecessary - but that will be easier to establish with practical experience. And there's little or no downside - the proposed feature won't be possible before 3.7, so we may as well use lifetime of the 3.6 release to gain that experience. Paul

Given that something like this gets proposed from time to time, I wonder if it would make sense to actually write up (1) and (2) as a PEP that is immediately marked rejected. The PEP should make it clear *why* it is rejected. This would be a handy reference doc to have around the next time the idea comes up. -- --Guido van Rossum (python.org/~guido)

Thanks a lot for this comprehensive summary. :) Find my comments below. On 30.08.2016 22:34, Ken Kundert wrote:
I think this results from the possibility of omitting the SI units.
Same can be said for variable annotations for which a PEP is in the works.
I get the feeling that SI syntax should only work when the hook is provided. So this could be the dealbreaker here: only enabling it when the hook is provided, changes the syntax/semantics of valid Python code depending on the presence of some hidden hooks. Enabling the syntax regardless of a working hook, have those sideeffects like described by you above. So, no matter how done, it always has some negative connotation.
d. Many people objected to allowing the use of naked scale factors as a perversion of the standard.
Remove this and it also solves 1.a.
I like your conclusion. It seems there is missing some technical note of why this won't happen the way you proposed it (maybe the hook + missing stdlib package for SI units). :) Aren't there some package already available for recommendation 3? Sven

On Tue, Aug 30, 2016 at 01:34:27PM -0700, Ken Kundert wrote:
3. A change to the various string formatting mechanisms to allow outputting real numbers with SI scale factors:
This is somewhat similar to a library I wrote for formatting bytes: https://pypi.python.org/pypi/byteformat Given that feature freeze for 3.6 is two weeks way, I don't think that this proposal will appear before 3.7. So I'm interested, but I'm less interested *right now*. So for now I'll limit myself to only a few observations.
>>> print('Speed of light in a vacuum: {:r}m/s.'.format(2.9979e+08)) Speed of light in a vacuum: 299.79 Mm/s.
Do you think that {:r} might be confused with {!r}? What's the mnemonic here? Why "r" for scale factor?
>>> print('Speed of sound in water: %rm/s.' % 1481 Speed of sound in water: 1.481 km/s.
I doubt that you'll get any new % string formatting codes. That's a legacy interface, *not* deprecated but unlikely to get new features added, and it is intended to closely match the C printf codes. A few more questions: (1) Why no support for choosing a particular scale? If this only auto-scales, I'm not interested. (2) Support for full prefix names, so we can format (say) "kilograms" as well as "kg"? (3) Scientific notation and engineering notation? (4) 1e5 versus 1×10^5 notation? (5) Is this really something that format() needs to understand? We can get a *much* richer and more powerful interface by turning it into a generalise numeric pretty-printing library, at the cost of a little less convenience.
3. Allowing numbers to be formatted with SI prefixes is useful and not controversial.
I wouldn't quite go that far. You made an extremely controversial request (new syntax for scaling prefixes + ignored units) and nearly all the attention was on that. For what its worth, I have no need for a format code which *only* auto-selects the scaling factor. If I don't have at least the option to choose which scaling factor I get, and hence the prefix, this is of little or no use to me, I likely wouldn't use it, and as far as I am concerned the nuisance value of having yet another format string code to learn outweighs the benefit. -- Steve

On Wed, Aug 31, 2016 at 12:05 PM, Steven D'Aprano <steve@pearwood.info> wrote:
Or just have a subclass of int or float that defines __format__, and can do whatever it likes - including specifying the scale, if you so choose. Say, something like: {:s} -- autoscale, prefix {:S} -- autoscale, full word {:sM} -- scale to mega, print "M" {:SM} -- scale to mega, print "Mega" etc ChrisA

What's the mnemonic here? Why "r" for scale factor?
My thinking was that r stands for real like f stands for float. With the base 2 scale factors, b stands for binary.
(1) Why no support for choosing a particular scale? If this only auto-scales, I'm not interested.
Auto-scaling is kind of the point. There is really little need for a special mechanism if your going to specify the scale factor yourself. >>> print('Attenuation = {:.1f} dB at {:r}m.'.format(-13.7, 50e3)) Attenuation = -13.7 dB at 50 km. If you wanted to force the second number to be in km, you use a %f format and scale the argument: >>> print('Attenuation = {:.1f} dB at {:.1f} km.'.format(-13.7, 50e3/1e3)) Attenuation = -13.7 dB at 50 km.
(2) Support for full prefix names, so we can format (say) "kilograms" as well as "kg"?
This assumes that somehow this code can access the units so that it can switch between long form 'grams' and short form 'g'. That is a huge expansion in the complexity for what seems like a small benefit.
(3) Scientific notation and engineering notation?
(4) 1e5 versus 1×10^5 notation?
Ah, okay. But all of these require auto-scaling. And I was still thinking that we need to provide input and output capability (ie, we still need be able to convert whatever format we output back from strings into floats). Are you thinking that we should parse 1×10^5? And why 1×10^5 and not 1×10⁵?
This is suddenly a much bigger project than what I was envisioning. -Ken

On 31 August 2016 at 05:08, Ken Kundert <python-ideas@shalmirane.com> wrote:
This argument can just as easily be used against your proposal: If you want auto-scaling you use a %s format and a suitable library function: >>> print('Attenuation = {:.1f} dB at {}m.'.format(-13.7, scale(50e3))) Attenuation = -13.7 dB at 50 km. Anything that's going to be included in the language has to consider other requirements than just your own.
This is suddenly a much bigger project than what I was envisioning.
You're going to have to write the scaling code one way or the other. Writing it in Python and publishing it as a library is *far* easier than writing it in C and hooking it into the format mechanism. You can leave others to offer pull requests to your library to add extra types of formatting. IMO, it's probably time to write some code. Publish a library on PyPI (call it a "prototype" if you like) implementing the scale() function above, publicise it here and elsewhere, and see what reception it gets. Paul

On Wed, Aug 31, 2016 at 2:08 PM, Ken Kundert <python-ideas@shalmirane.com> wrote:
"Real" has historically often been a synonym for "float", and it doesn't really say that it'll be shown in engineering notation. But then, we currently have format codes 'e', 'f', and 'g', and I don't think there's much logic there beyond "exponential", "floating-point", and... "general format"? I think that's a back-formation, frankly, and 'g' was used simply because it comes nicely after 'e' and 'f'. (C's decision, not Python's, fwiw.) I'll stick with 'r' for now, but it could just as easily become 'h' to avoid confusion with %r for repr.
AIUI, it's just giving the full word. class ScaledNumber(float): invert = {"μ": 1e6, "m": 1e3, "": 1, "k": 1e-3, "M": 1e-6} words = {"μ": "micro", "m": "milli", "": "", "k": "kilo", "M": "mega"} aliases = {"u": "μ"} def autoscale(self): if self < 1e-6: return None if self < 1e-3: return "μ" if self < 1: return "m" if self < 1e3: return "" if self < 1e6: return "k" if self < 1e9: return "M" return None def __format__(self, fmt): if fmt == "r" or fmt == "R": scale = self.autoscale() fmt = fmt + scale if scale else "f" if fmt.startswith("r"): scale = self.aliases.get(fmt[1], fmt[1]) return "%g%s" % (self * self.invert[scale], scale) if fmt.startswith("R"): scale = self.aliases.get(fmt[1], fmt[1]) return "%g %s" % (self * self.invert[scale], self.words[scale]) return super().__format__(self, fmt)
It's a minor flexibility, but could be very useful. As you see, it's still not at all unit-aware; but grammatically, these formats only make sense if followed by an actual unit name. (And not an SI base unit, necessarily - you have to use "gram", not "kilogram", lest you get silly constructs like "microkilogram" for milligram.) Note that this *already works*. You do have to use an explicit class for your scaled numbers, since Python doesn't want you monkey-patching the built-in float type, but if you were to request that float.__format__ grow support for this, it'd be a relatively non-intrusive change. This class could live on PyPI until one day becoming subsumed into core, or just be a permanent third-party float formatting feature. ChrisA

On 31 August 2016 at 17:07, Chris Angelico <rosuav@gmail.com> wrote:
"h" would be a decent choice - it's not only a continuation of the e/f/g pattern, it's also very commonly used as a command line flag for "human-readable output" in system utilities that print numbers. The existing "alternate form" marker in string formatting could be used to request the use of the base 2 scaling prefixes rather than the base 10 ones: "#h". Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Wed, Aug 31, 2016 at 5:21 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
I like it. So after all the drama we're just talking about adding an 'h' format code that's like 'g' but uses SI scale factors instead of exponents. I guess we need to debate what it should do if the value is way out of range of the SI scale system -- what's it going to do when I pass it 1e50? I propose that it should fall back to 'g' style then, but use "engineering" style where exponents are always a multiple of 3.
Not sure about this one. -- --Guido van Rossum (python.org/~guido)
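To make the proposed fallback concrete, here is a rough sketch of what an 'h'-like formatter might do. The name format_h and the prefix table are assumptions made for illustration only; nothing like this exists in the standard library, and subnormal values are not handled.

```python
import math

# SI prefixes for exponents -24..+24 in steps of 3 (an assumed range).
SI_PREFIXES = {
    -24: "y", -21: "z", -18: "a", -15: "f", -12: "p", -9: "n", -6: "μ",
    -3: "m", 0: "", 3: "k", 6: "M", 9: "G", 12: "T", 15: "P", 18: "E",
    21: "Z", 24: "Y",
}

def format_h(value):
    """Hypothetical 'h' code: 'g'-style digits plus an SI prefix.

    Outside the prefix range it falls back to an exponent that is
    always a multiple of 3 ("engineering" style).
    """
    if value == 0 or not math.isfinite(value):
        return format(value, "g")
    # Decimal exponent rounded down to a multiple of 3.
    exp3 = 3 * math.floor(math.log10(abs(value)) / 3)
    # Round to 'g' precision first so that 999.9999 becomes 1000.
    mantissa = float(format(value / 10 ** exp3, "g"))
    if abs(mantissa) >= 1000:  # rounding spilled into the next group
        mantissa /= 1000
        exp3 += 3
    digits = format(mantissa, "g")
    if exp3 in SI_PREFIXES:
        return digits + SI_PREFIXES[exp3]
    return f"{digits}e{exp3:+03d}"

print(format_h(2.9979e8))  # 299.79M
print(format_h(1e50))      # 100e+48
```

The 1e50 case comes out as 100e+48, which is the engineering-style rendering mentioned later in the thread.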

On 08/31/2016 01:07 PM, MRAB wrote:
Would you also want h to work with integers?
'#' already has a meaning for float's 'g' format:
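A quick illustration of the existing behaviour (added here as an example, not quoted from the original message): without '#', the 'g' code strips trailing zeros, and with '#' it keeps them.

```python
# 'g' drops trailing zeros and a dangling decimal point.
print(format(1.0, "g"))     # 1
# '#g' keeps them, padding out to the precision (6 significant digits).
print(format(1.0, "#g"))    # 1.00000
# The same applies with an explicit precision.
print(format(1.0, "#.3g"))  # 1.00
```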
So I think you'd want to pick another type character to mean base 2 scaling, or another character other than #. But it gets cryptic pretty quickly. You could indeed use type == 'b' for floats to mean base 2 scaling, since it has no current meaning, but I'm not sure that's a great idea because 'b' means binary for integers, and if you want to also be able to scale ints (see above), then there's a conflict. Maybe type == 'z'? Or, use something like '@' (or whatever) instead of '#' to mean "the other alternate form", base 2 scaling.
Does the 'type' have to be a single character?
As a practical matter, yes, it should just be a single character. You could make a special case for 'h' and 'hb', but I would not recommend that. Explaining it in the documentation would be confusing. Eric.

On Wed, Aug 31, 2016, at 12:19, Guido van Rossum wrote:
One thing to consider is that this is very likely to be used with a unit (e.g. "%hA" intending to display in amperes), so maybe it should put a space after it? Though really people are probably going to want "1 A" vs "1 kA" in that case, rather than "1 A" vs "1kA". Also, maybe consider that "1*10^50" [or, slightly less so, 1.0*10**50] is more human-readable than "1e+50". Er, with engineering style it'd be 100e+48 etc, but same basic issue. Also, is it really necessary to use single-character codes not shared with any other language? The only rationale here seems to be a desire to support everything in % and its limited grammar rather than requiring anyone to use format. If this feature is only supported in format a more verbose description of the desired format could be used. What if, for example, you want engineering style without SI scale factors? What should the "precision" field mean? %f takes a number of places after the decimal point whereas %e/%g takes a number of significant digits. Engineering or SI-scale-factor format suggests a third possibility: number of decimal places to be shown after the displayed decimal point, e.g. "%.1h" % (1.2345 * 10 ** x) for x in range(9): "1.2", "12.3", "123.5", "1.2k", "12.3k", "123.5k", "1.2M", "12.3M", "123.5M". And the actual -h behavior of those system utilities you mentioned is "123k", "1.2M", "12M", with the effect being that the value always fits within a four-character field width, but this isn't a fixed number of decimal places *or* significant digits.
If base 2 scaling prefixes are used, should "engineering style" mean 2**[multiple of 10] instead of 10**[multiple of 3]?
Not sure about this one.
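The "decimal places after the displayed decimal point" reading of the precision field can be sketched as follows. The function name format_places is hypothetical, and only the prefixes up to T are covered to keep it short.

```python
def format_places(value, places=1):
    """Hypothetical reading of '%.1h': scale first, then fix the decimals."""
    prefixes = ["", "k", "M", "G", "T"]
    index = 0
    # Step up one prefix per factor of 1000; values below 1 are left alone.
    while abs(value) >= 1000 and index < len(prefixes) - 1:
        value /= 1000
        index += 1
    return f"{value:.{places}f}{prefixes[index]}"

for x in range(9):
    print(format_places(1.2345 * 10 ** x))
```

One wrinkle with this interpretation: rounding happens after the prefix is chosen, so a value such as 999.96 is displayed as "1000.0" rather than "1.0k" unless the function rounds first and rescales.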

Random832 writes:
Also, interesting quirk - it always rounds up. 1025 bytes is "1.1K", and in SI mode, 1001 bytes is "1.1k"
That seems to be the right approach: in system administration, these numbers are used mostly to understand resource usage, and underestimates are almost never what you want, while quite large overestimates are tolerable, and are typically limited because the actual precision of calculations is much higher than that of the "human-readable" output. I don't know if that would be true in general-purpose programming. I suspect not.
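For reference, the round-up behaviour described above can be imitated with integer arithmetic. This is only a sketch of the observed outputs ("1.1K" for 1025, "1.1k" for 1001); the helper name human_ceil is made up here, and it is not claimed to match any particular utility in every case.

```python
def human_ceil(n, base=1024, suffixes="KMGTPE"):
    """Format a non-negative integer in at most four characters, rounding up."""
    if n < base:
        return str(n)
    unit = 1
    for suffix in suffixes:
        unit *= base
        tenths = -(-n * 10 // unit)  # ceil(n * 10 / unit) without floats
        if tenths < 100:             # below 10 units: show one decimal
            return f"{tenths // 10}.{tenths % 10}{suffix}"
        whole = -(-n // unit)        # ceil(n / unit)
        if whole < base:
            return f"{whole}{suffix}"
    return f"{whole}{suffix}"        # larger than the last suffix

print(human_ceil(1025))                  # 1.1K
print(human_ceil(1001, 1000, "kMGTPE"))  # 1.1k
```

Using ceiling division means the displayed figure never understates the real size, which is the property argued for above.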

Guido van Rossum writes:
On Wed, Aug 31, 2016 at 8:57 PM, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
True, but I don't think the implications are symmetric. I buy storage to handle space (expected to be) used, not space available. But when I find myself caring about the "slop" in space available, the fact that I care about that is already very bad news. Time to head for Fry's Electronics! As I wrote before, I don't think the same argument applies to scientific computing.

Random832 wrote:
I don't think a space should be automatic. The typographical recommendation is to put a thin non-breaking space between the value and the unit, but this is not possible with a monospaced font, so some people might decide that it's better without a space, or they might want to use a character other than 0x20. Better to let the user put the space in the format string if wanted.
I'm inclined to think it should be the number of significant digits, not decimal places, to give a more consistent precision as the magnitude of the number changes. For example, if you're displaying some resistor values that are accurate to 2 digits, you would want to see 2.7k, 27k, 270k, but not 27.0k or 270.0k as those would suggest spurious precision. This would also help with fitting the value into a fixed width, since you would know that a precision of n would use at most n+1 characters for the numeric part. -- Greg
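The significant-digits interpretation, using the resistor values above as the example, might look like this. format_sig is a hypothetical name and the prefix list is deliberately short. Note that plain '.2g' would not do here, because it switches to exponent notation for 270.

```python
import math

def format_sig(value, sig=2):
    """Hypothetical: `sig` significant digits followed by an SI prefix."""
    prefixes = ["", "k", "M", "G"]
    if value == 0:
        return "0"
    index = 0
    scaled = float(value)
    while abs(scaled) >= 1000 and index < len(prefixes) - 1:
        scaled /= 1000
        index += 1
    # Position of the leading digit: 0 for 2.7, 1 for 27, 2 for 270.
    magnitude = math.floor(math.log10(abs(scaled)))
    decimals = sig - 1 - magnitude
    # A negative `decimals` rounds to tens or hundreds (274 -> 270).
    rounded = round(scaled, decimals)
    return f"{rounded:.{max(decimals, 0)}f}{prefixes[index]}"

print([format_sig(r) for r in (2700, 27000, 270000)])  # ['2.7k', '27k', '270k']
```

As with the decimal-places variant, a value that rounds up to 1000 (for example 999 with two digits) is not rescaled in this sketch.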

On Thu, Sep 1, 2016, at 02:17, Greg Ewing wrote:
If the space needs to be between the number and the unit there's no good way to do this. I think this is an argument for a separate function that returns a tuple of (formatted number, prefix). Incidentally, do we have a good primitive to return (string of digits, exponent or position of decimal point) a la C's ecvt/fcvt? This would be something that might be useful in allowing users to build their own formatting code. It can be worked around though ("guess" the exponent, scale with multiplication or division, round to an integer to get the string of digits) so I guess it's not that important.
What I was getting at is that there are two different use cases possible here.
Exactly n+1, surely? And on the other hand a fixed number of decimal places allows easy alignment by right-justifying the text within a field (and will use at most n+4 characters for the numeric part).
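Both ideas from this message, an ecvt-like primitive and a function returning a (formatted number, prefix) tuple, can be sketched on top of the existing 'e' format code. The names digits_and_exponent and split_si are invented for this example, and values outside the prefix table raise KeyError.

```python
SI_PREFIXES = {
    -24: "y", -21: "z", -18: "a", -15: "f", -12: "p", -9: "n", -6: "μ",
    -3: "m", 0: "", 3: "k", 6: "M", 9: "G", 12: "T", 15: "P", 18: "E",
    21: "Z", 24: "Y",
}

def digits_and_exponent(value, ndigits):
    """Return (digit string, decimal point position), loosely like C's ecvt().

    The magnitude is approximately 0.DIGITS * 10**position.
    """
    mantissa, exponent = f"{abs(value):.{ndigits - 1}e}".split("e")
    return mantissa.replace(".", ""), int(exponent) + 1

def split_si(value, ndigits=3):
    """Return (formatted number, prefix); the caller picks the separator."""
    digits, decpt = digits_and_exponent(value, ndigits)
    exp3 = 3 * ((decpt - 1) // 3)     # exponent rounded down to a multiple of 3
    point = decpt - exp3              # 1 to 3 digits before the decimal point
    digits = digits.ljust(point, "0")
    whole, frac = digits[:point], digits[point:]
    number = whole + ("." + frac if frac else "")
    if value < 0:
        number = "-" + number
    return number, SI_PREFIXES[exp3]

number, prefix = split_si(2.9979e8, 5)
# U+202F is a narrow no-break space; any other separator works too.
print(f"Speed of light in a vacuum: {number}\u202f{prefix}m/s")
```

Because rounding is done by the 'e' format before the prefix is chosen, a value like 999.96 with three digits comes out as ("1.00", "k") rather than "1000".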

On Aug 31 2016, Guido van Rossum <guido-+ZN9ApsXKcEdnm+yROfE0A@public.gmane.org> wrote:
There's also the important nitpick if 32e7 is best rendered as 320 M or 0.32 G. There's valid applications for both. Best, -Nikolaus

Nikolaus Rath wrote:
There's also the important nitpick if 32e7 is best rendered as 320 M or 0.32 G. There's valid applications for both.
If you want 0.32 G it's probably because you're showing it alongside other values >= 1 G, so you're really getting into the business of letting the user choose the prefix. The default should be 320 M, I think. (Unless it's a capacitor value, where there's a long-standing convention in some circles to use uF or pF, but never nF. :-) -- Greg
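A small sketch of "default to 320 M, but let the user choose the prefix". The function name format_with_prefix and its prefix argument are assumptions made for illustration.

```python
SCALE = {"": 1.0, "k": 1e3, "M": 1e6, "G": 1e9}

def format_with_prefix(value, prefix=None):
    """Autoscale by default; an explicit prefix overrides the choice."""
    if prefix is None:
        prefix = ""
        # Largest prefix whose factor does not exceed the value.
        for candidate, factor in SCALE.items():
            if abs(value) >= factor:
                prefix = candidate
    return f"{value / SCALE[prefix]:g} {prefix}".rstrip()

print(format_with_prefix(32e7))       # 320 M
print(format_with_prefix(32e7, "G"))  # 0.32 G
```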

All, Armed with all of your requirements, suggestions and good ideas, I believe I am ready to try to put something together. Thank you all, and once again let me apologize for 'all the drama'. I'll let you know when I have something. -Ken

On Tue, Aug 30, 2016 at 09:08:01PM -0700, Ken Kundert wrote:
What's the mnemonic here? Why "r" for scale factor?
My thinking was that r stands for real like f stands for float.
Hmmm. Do you know many mathematicians who use SI prefixes when talking about real numbers? I don't think "real number" is relevant to SI prefixes.
With the base 2 scale factors, b stands for binary.
Well, obviously :-)
The point is not to have to repeat yourself. If I have to scale numbers in lots of places, I don't want to have to re-write the same code in each of them. I want to call a function. Understand that I'm not against auto-scaling. I think it is a good idea. But I strongly disagree that it is the *only* way to do this. If there's code in the std lib to format numbers to some scale, I should be able to loop through a bunch of numbers and format them all in a consistent unit if I so choose, without having to do my own formatting. It's not that I don't want you to be able to auto-scale. I just want the choice of being able to use a consistent scale or not. [...]
*shrug* Well, you could do exactly the same thing. You only need a short function that determines the scale you want, and then scale it yourself. The point of making this a standard function is so that we don't have to keep re-writing the same code.
No, I'm talking about choosing between "M" or "mega". The actual unit itself is up to the caller to supply. You have definitely prodded my interest in the output side of this. I'm rather busy at the moment, but in the coming weeks I think I'll brush the cobwebs off byteformat and see what can be done. https://pypi.python.org/pypi/byteformat in case you want to have a play with it. -- Steve
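The two requests in this message, one consistent scale for a whole batch of numbers and a choice between "M" and "mega", could be covered by a helper along these lines. This is only an illustrative sketch; format_all is a made-up name and is unrelated to the byteformat package.

```python
# (factor, symbol, word), largest first.
PREFIXES = [(1e9, "G", "giga"), (1e6, "M", "mega"), (1e3, "k", "kilo"), (1.0, "", "")]

def format_all(values, unit, words=False):
    """Format every value with the prefix that suits the largest magnitude."""
    largest = max(abs(v) for v in values)
    for factor, symbol, word in PREFIXES:
        if largest >= factor:
            break
    # If nothing matched, the loop leaves the last entry (factor 1.0) selected.
    name = word if words else symbol
    return [f"{v / factor:g} {name}{unit}" for v in values]

print(format_all([1500, 250000, 3.2e6], "Hz"))
print(format_all([1500, 250000, 3.2e6], "hertz", words=True))
```

The caller supplies the unit in both cases; the helper only decides the scale and whether the prefix is spelled out.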

Steven D'Aprano wrote:
On Tue, Aug 30, 2016 at 09:08:01PM -0700, Ken Kundert wrote:
My thinking was that r stands for real like f stands for float.
The next available letter in the e, f, g sequence would be 'h'. If you want it to stand for something, it could be "human-readable" or "human-oriented". (There's a precedent for this in the "df" unix utility which has a -H option producing SI prefixes.)
I'm talking about choosing between "M" or "mega". The actual unit itself is up to the caller to supply.
Maybe 'h' for abbreviations and 'H' for full prefixes? -- Greg
participants (14)
- Barry Warsaw
- Chris Angelico
- Eric V. Smith
- Greg Ewing
- Guido van Rossum
- Ken Kundert
- MRAB
- Nick Coghlan
- Nikolaus Rath
- Paul Moore
- Random832
- Stephen J. Turnbull
- Steven D'Aprano
- Sven R. Kunze