On Fri, Aug 26, 2016 at 10:47 PM, Steven D'Aprano
(1) Are the results floats, ints, or something else?
I would expect that 1K would be int 1000, not float 1000. But what about fractional prefixes, like 1m? Should that be a float or a decimal?
If I write 7981m I would expect 7.981, not 7.9809999999999999, so maybe I want a decimal float, not a binary float?
Introduce "d" as a prefix meaning 1, and this could be the way of creating something that people have periodically asked for: Decimal literals. (Though IIRC there were some complexities involving Decimal literals and decimal.getcontext(), which would have to be resolved before 1m could represent a Decimal.)
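To make the binary-versus-decimal concern concrete, here is a minimal sketch using the existing decimal module. It assumes a hypothetical reading of 7981m as 7981 scaled by Decimal("0.001"); no such literal exists today. The second half shows one way the active context can affect a result, which may or may not be the complexity alluded to above.

```python
from decimal import Decimal, localcontext

# Hypothetical meaning of the literal 7981m: 7981 scaled by 10**-3.
milli = Decimal("0.001")

value = Decimal(7981) * milli
print(value)  # 7.981

# Decimal arithmetic is rounded to the precision of the active context,
# so the same expression can give a different answer elsewhere.
with localcontext() as ctx:
    ctx.prec = 3  # keep only three significant digits
    rounded = Decimal(7981) * milli
print(rounded)  # 7.98
```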
Actually, what I would really want is for the scale factor to be tracked separately. If I write 7981m * 1M, I should end up with 7981000 as an int, not a float. Am I being unreasonable?
Easy. Make them Fraction literals instead. You'll end up with 7981000/1 as a rational, rather than a pure int, but if you want accurate handling of SI prefixes, rationals will serve you fairly well.
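A quick sketch of the Fraction idea with today's fractions module. The names milli and mega are stand-ins for what the proposed literals might expand to; they are assumptions for illustration, not part of any proposal text.

```python
from fractions import Fraction

# Stand-ins for the scale factors behind the hypothetical 7981m and 1M.
milli = Fraction(1, 1000)
mega = Fraction(10**6)

product = (7981 * milli) * (1 * mega)

print(product)                 # 7981000
print(product.denominator)     # 1
print(type(product).__name__)  # Fraction
print(product == 7981000)      # True
```

The result compares equal to the int 7981000, but it is still a Fraction, which is the trade-off mentioned above.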
Obviously if I write 1.1K then I'm expecting a float. So I'm not *entirely* unreasonable :-)
Obviously :)
(2) Decimal or binary scale factors?
The SI units are all decimal, and I think if we support these, we should insist that K == 1000, not 1024. For binary scale factors, there is the IEC standard:
http://physics.nist.gov/cuu/Units/binary.html
which defines Ki = 2**10, Mi = 2**20, etc. (Fortunately this doesn't have to deal with fractional prefixes.) So it would be easy enough to support them as well.
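For illustration, the two families of scale factors could be kept in separate lookup tables. The table names below are hypothetical, and only a few prefixes are listed.

```python
# SI (decimal) prefixes: powers of 1000.
DECIMAL_FACTORS = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}

# IEC (binary) prefixes: powers of 1024.
BINARY_FACTORS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}

print(DECIMAL_FACTORS["k"])   # 1000
print(BINARY_FACTORS["Ki"])   # 1024

# The gap between the two families grows with each step.
difference = BINARY_FACTORS["Mi"] - DECIMAL_FACTORS["M"]
print(difference)             # 48576
```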
from __future__ import binary_scale_factors as scale_factors
from __future__ import decimal_scale_factors as scale_factors

# tongue only partly in cheek
(3) µ or u, k or K?
I'm going to go to the barricades to fight for the real SI prefixes µ and k to be supported. If people want to support the common fakes u and K as well, that's fine, I have no objection, but I think that it's important to support the actual prefixes too.
I would strongly support the use of µ and weakly u. With k vs K, no opinion. If both can be supported without being confusing, grab 'em both. With output formats, it's less clear, but I would still be inclined toward µ for output.
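Supporting both spellings could be as simple as an alias table consulted before the real lookup. This is only a sketch: canonical_prefix and ALIASES are made-up names, and treating the Greek letter mu as a third spelling of micro is an extra assumption not raised in the thread.

```python
MICRO = "\u00b5"  # MICRO SIGN, the SI prefix character

# Alternative spellings mapped onto the real SI prefix.
ALIASES = {
    "u": MICRO,       # common ASCII stand-in
    "\u03bc": MICRO,  # GREEK SMALL LETTER MU (assumption: accept it too)
    "K": "k",         # common capitalised stand-in for kilo
}

def canonical_prefix(prefix):
    """Return the real SI prefix for an accepted spelling."""
    return ALIASES.get(prefix, prefix)

print(canonical_prefix("K"))           # k
print(canonical_prefix("u") == MICRO)  # True
```

Output code would then always emit the canonical form, which matches the preference for µ on output.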
(4) What about E?
E is tricky if we want 1E to be read as the integer 10**18, because it matches the floating point syntax 1E (which is currently a syntax error). So there's a nasty bit of ambiguity where it may be unclear whether or not 1E is intended as an int or an incomplete float, and then there's 1E1E which might be read as 1E1*10**18 or as just an error.
It's worse than that. Currently, 1E+2 is a perfectly legal 100.0 (float), but under this proposal, it would be a constant expression yielding 1_000_000_000_000_000_002, so it wouldn't just be giving meaning to things that are currently errors.
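The clash can be checked against current Python. In the sketch below, exa and proposed are illustrative names for what the proposal would make 1E+2 mean; the rest is existing behaviour.

```python
# Today: 1E+2 is a float literal.
current = 1E+2
print(current, type(current).__name__)  # 100.0 float

# Today: a bare 1E does not compile.
try:
    compile("1E", "<demo>", "eval")
except SyntaxError:
    bare_1e_is_error = True
else:
    bare_1e_is_error = False
print(bare_1e_is_error)  # True

# Under the proposal: 1E would be 10**18, so 1E+2 becomes (1 * 10**18) + 2.
exa = 10**18
proposed = 1 * exa + 2
print(proposed == 1_000_000_000_000_000_002)  # True
```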
Replacing E with (say) X is risky. The two largest current SI prefixes are Z and Y, so it seems very likely that the next one added (if that ever happens) will be X. Actually, using any other letter risks clashing with a future expansion of the SI prefixes.
Anything's risky. Probably the least risky option is to simply stop before Exa and implement the feature without.
(7) What about repr() and str()?
I don't think that the repr() or str() of numeric types should change. But perhaps format() could grow some new codes to display numbers using either the most obvious scale factor, or some specific scale factor.
Agreed. And I'd have them simply pick the one most obvious - if you want a specific factor, you can simply invert and display.
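As a rough sketch of "pick the most obvious factor", a helper could walk the prefixes from largest to smallest and use the first one that fits. The function si_format, its prefix list, and the use of the "g" format code are all assumptions for illustration; no such format() code exists.

```python
# (exponent, symbol) pairs, largest first. Stops before Exa on purpose.
_PREFIXES = [
    (15, "P"), (12, "T"), (9, "G"), (6, "M"), (3, "k"),
    (0, ""), (-3, "m"), (-6, "\u00b5"), (-9, "n"),
]

def si_format(value):
    """Format a number using the largest SI prefix not exceeding it."""
    for exponent, symbol in _PREFIXES:
        scale = 10 ** exponent
        if abs(value) >= scale:
            return f"{value / scale:g}{symbol}"
    # Zero, or smaller than the smallest prefix listed: leave unscaled.
    return f"{value:g}"

print(si_format(1500))     # 1.5k
print(si_format(7981000))  # 7.981M
print(si_format(0.0042))   # 4.2m
```

Displaying with a specific factor is then just a division by that factor before formatting, which is the "invert and display" step mentioned above.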
This leads to my first proposal: require an explicit numeric prefix on numbers before scale factors are allowed, similar to how we treat non-decimal bases.
8M    # remains a syntax error
0s8M  # unambiguously an int with a scale factor of M = 10**6

0s1E1E  # a float 1E1 with a scale factor of E = 10**18
0s1.E   # a float 1. with a scale factor of E, not an exponent

int('8M')            # remains a ValueError
int('0s8M', base=0)  # returns 8*10**6
Hmm, interesting. Feels clunky but could work.
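To see how the 0s prefix removes the ambiguity, here is a small string parser that mimics the rule. Everything in it (parse_scaled, the regex, the short prefix table) is a hypothetical sketch, not an implementation of the proposal.

```python
import re

# A few scale factors only; a real table would cover all SI prefixes.
_SCALE = {"k": 10**3, "M": 10**6, "G": 10**9, "E": 10**18}

# "0s", then an ordinary int or float, then exactly one prefix letter.
_PATTERN = re.compile(
    r"0s(?P<number>\d+\.?\d*(?:[eE][+-]?\d+)?)(?P<prefix>[A-Za-z])"
)

def parse_scaled(text):
    """Parse strings such as '0s8M' under the hypothetical 0s rule."""
    match = _PATTERN.fullmatch(text)
    if match is None or match["prefix"] not in _SCALE:
        raise ValueError(f"invalid scaled literal: {text!r}")
    number = match["number"]
    # Pure digits stay an int; anything else goes through float().
    base = int(number) if number.isdigit() else float(number)
    return base * _SCALE[match["prefix"]]

print(parse_scaled("0s8M"))    # 8000000
print(parse_scaled("0s1E1E"))  # 1e+19
print(parse_scaled("0s1.E"))   # 1e+18
```

Because the final letter is always the scale factor, 0s1E1E reads as the float 1E1 times 10**18 with no guessing.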
Or if that's too heavy (two whole characters, plus the suffix!) perhaps we could have a rule that the suffix must follow the final underscore of the number:
8_M            # int 8*10**6
123_456_789_M  # int 123456789*10**6
123_M_456      # still an error
8._M           # float 8.0*10**6
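The underscore rule can be mimicked the same way with a string parser. Again a sketch under assumptions: parse_suffixed, its regex, and the two-entry prefix table are invented here for illustration.

```python
import re

_FACTORS = {"k": 10**3, "M": 10**6}

# Digit groups joined by underscores, an optional trailing ".",
# then a final underscore followed by the prefix and nothing else.
_SUFFIXED = re.compile(r"(?P<number>\d+(?:_\d+)*\.?)_(?P<prefix>[A-Za-z]+)")

def parse_suffixed(text):
    """Parse strings such as '8_M' under the hypothetical suffix rule."""
    match = _SUFFIXED.fullmatch(text)
    if match is None or match["prefix"] not in _FACTORS:
        raise ValueError(f"invalid scaled literal: {text!r}")
    digits = match["number"].replace("_", "")
    # A trailing "." marks a float, as in the 8._M example.
    base = float(digits) if digits.endswith(".") else int(digits)
    return base * _FACTORS[match["prefix"]]

print(parse_suffixed("8_M"))            # 8000000
print(parse_suffixed("123_456_789_M"))  # 123456789000000
print(parse_suffixed("8._M"))           # 8000000.0
```

An input like 123_M_456 fails the full match and raises ValueError, so it stays an error.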
This sounds better IMO. It's not legal syntax in any version of Python older than 3.6, so there's minimal backward compatibility trouble.

ChrisA