SI scale factors alone, without units or dimensional analysis - Python-ideas

Aug. 26, 2016

Ken has made what I consider a very reasonable suggestion, to introduce 
SI prefixes to Python syntax for numbers. For example, typing 1K will be 
equivalent to 1000.

However, there are some complexities that have been glossed over.

(1) Are the results floats, ints, or something else?

I would expect that 1K would be int 1000, not float 1000. But what about 
fractional prefixes, like 1m? Should that be a float or a decimal?

If I write 7981m I would expect 7.981, not 7.9809999999999999, so maybe 
I want a decimal float, not a binary float?

Actually, what I would really want is for the scale factor to be tracked 
separately. If I write 7981m * 1M, I should end up with 7981000 as an 
int, not a float. Am I being unreasonable?

Obviously if I write 1.1K then I'm expecting a float. So I'm not 
*entirely* unreasonable :-)
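
To make the "track the scale factor separately" idea concrete, here is a minimal sketch using exact rational arithmetic. The EXPONENT table and the scaled() helper are names invented for this illustration; they are not part of any proposal.

```python
from decimal import Decimal
from fractions import Fraction

# Hypothetical prefix table: symbol -> power of ten.
EXPONENT = {"m": -3, "k": 3, "M": 6}

def scaled(mantissa, prefix):
    """Return mantissa * 10**exponent as an exact Fraction."""
    return Fraction(mantissa) * Fraction(10) ** EXPONENT[prefix]

# 7981m * 1M stays exact, so it can be turned back into an int.
product = scaled(7981, "m") * scaled(1, "M")
as_int = int(product) if product.denominator == 1 else product

# A decimal float also represents 7981m exactly.
as_decimal = Decimal(7981).scaleb(-3)
```

Nothing is rounded until the caller asks for a float, which is the behaviour I would want from the literal syntax as well.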

(2) Decimal or binary scale factors?

The SI units are all decimal, and I think if we support these, we should 
insist that K == 1000, not 1024. For binary scale factors, there is the 
IEC standard:

http://physics.nist.gov/cuu/Units/binary.html

which defines Ki = 2**10, Mi = 2**20, etc. (Fortunately this doesn't 
have to deal with fractional prefixes.) So it would be easy enough to 
support them as well.
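
For illustration only, the IEC prefixes can be generated in a couple of lines. The IEC_SYMBOLS and BINARY names are my own, assumed for this sketch.

```python
# IEC binary prefixes: each step is another factor of 2**10.
IEC_SYMBOLS = ["Ki", "Mi", "Gi", "Ti", "Pi", "Ei", "Zi", "Yi"]
BINARY = {symbol: 2 ** (10 * (i + 1)) for i, symbol in enumerate(IEC_SYMBOLS)}

# The decimal kilo and the binary kibi are different quantities.
kilo = 10**3
kibi = BINARY["Ki"]
```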

(3) µ or u, k or K?

I'm going to go to the barricades to fight for the real SI prefixes µ 
and k to be supported. If people want to support the common fakes u and 
K as well, that's fine, I have no objection, but I think that it's 
important to support the actual prefixes too.

(Python 3 assumes UTF-8 as the default source encoding, so it shouldn't 
cause any technical difficulties to support µ as syntax. The political 
difficulties though...)
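
One wrinkle that might be worth checking: Unicode has two characters that render as mu, the MICRO SIGN and the Greek small letter. The sketch below only shows how Python's existing unicodedata module treats them; whether a scale-factor syntax should accept one or both is an open question.

```python
import unicodedata

MICRO_SIGN = "\u00b5"   # MICRO SIGN (from Latin-1)
GREEK_MU = "\u03bc"     # GREEK SMALL LETTER MU

# The two look alike but are distinct code points.
same_code_point = MICRO_SIGN == GREEK_MU

# NFKC normalization, which Python applies to identifiers, folds
# the micro sign into the Greek letter.
normalized = unicodedata.normalize("NFKC", MICRO_SIGN)
```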

(4) What about E?

E is tricky if we want 1E to be read as the integer 10**18, because it 
matches the floating point syntax 1E (which is currently a syntax 
error). So there's a nasty bit of ambiguity where it may be unclear 
whether or not 1E is intended as an int or an incomplete float, and then 
there's 1E1E which might be read as 1E1*10**18 or as just an error.
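
The current behaviour is easy to confirm. The two helper functions below are just scaffolding for this check, not proposed APIs.

```python
def is_syntax_error(source):
    """Return True if source fails to compile as an expression."""
    try:
        compile(source, "<demo>", "eval")
    except SyntaxError:
        return True
    return False

def is_valid_float_string(text):
    """Return True if float() accepts the string."""
    try:
        float(text)
    except ValueError:
        return False
    return True
```

Today is_syntax_error("1E") and is_syntax_error("1E1E") are both true, while "1E1" compiles as an ordinary float, so any new meaning for a trailing E has to squeeze in next to the exponent syntax.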

Replacing E with (say) X is risky. The two largest current SI prefixes 
are Z and Y, so it seems very likely that the next one added (if that 
ever happens) will be X. Actually, using any other letter risks clashing 
with a future expansion of the SI prefixes.

(5) What about other numeric types?

Just because there's no syntactic support for Fraction and Decimal 
shouldn't mean we can't use these scale factors with them.

(6) What happens to int(), float() etc?

I wouldn't want int("23K") to suddenly change from being an error to 
returning 23000. Presumably we would want int to take an optional 
argument to allow the interpretation of scale factors.

This gives us an advantage: int("23E", scale=True) is unambiguously an 
int, and we can ignore the fact that it looks like a float.
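
A minimal sketch of what such an opt-in might look like, written as a standalone function rather than a change to int() itself. The name scaled_int and the contents of the SCALE table are assumptions made for this example.

```python
# Hypothetical table of integer scale factors, including the common K.
SCALE = {"k": 10**3, "K": 10**3, "M": 10**6, "G": 10**9,
         "T": 10**12, "P": 10**15, "E": 10**18}

def scaled_int(text, scale=False):
    """Convert text to int, honouring a trailing scale factor on request."""
    text = text.strip()
    if scale and text and text[-1] in SCALE:
        return int(text[:-1]) * SCALE[text[-1]]
    # Without scale=True this behaves exactly like int(text).
    return int(text)
```

With this, scaled_int("23K") still raises ValueError, while scaled_int("23E", scale=True) is an int with no float parsing involved.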

(7) What about repr() and str()?

I don't think that the repr() or str() of numeric types should change. 
But perhaps format() could grow some new codes to display numbers using 
either the most obvious scale factor, or some specific scale factor.
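
As a rough idea of what "the most obvious scale factor" could mean for ints, here is a sketch that picks the largest prefix dividing the value exactly. The function name format_scaled and the PREFIXES table are hypothetical; a real format code would also need to handle floats and fractional prefixes.

```python
# Largest factors first, so the first exact divisor wins.
PREFIXES = [(10**18, "E"), (10**15, "P"), (10**12, "T"),
            (10**9, "G"), (10**6, "M"), (10**3, "k")]

def format_scaled(n):
    """Format int n using the largest prefix that divides it exactly."""
    for factor, symbol in PREFIXES:
        if n != 0 and n % factor == 0:
            return f"{n // factor}{symbol}"
    return str(n)
```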

* * * 

This leads to my first proposal: require an explicit numeric prefix on 
numbers before scale factors are allowed, similar to how we treat 
non-decimal bases.

8M  # remains a syntax error
0s8M  # unambiguously an int with a scale factor of M = 10**6

0s1E1E  # a float 1E1 with a scale factor of E = 10**18
0s1.E  # a float 1. with a scale factor of E, not an exponent

int('8M')  # remains a ValueError
int('0s8M', base=0)  # returns 8*10**6

Or if that's too heavy (two whole characters, plus the suffix!) perhaps 
we could have a rule that the suffix must follow the final underscore 
of the number:

8_M  # int 8*10**6
123_456_789_M  # int 123456789*10**6
123_M_456  # still an error
8._M  # float 8.0*10**6

int() and float() take a keyword only argument to allow a scale factor 
when converting from strings:

int("8_M")  # remains an error
int("8_M", scale=True)  # allowed

This solves the problem with E and floats. It's only a scale factor if it 
immediately follows the final underscore in the float, otherwise it is 
the regular exponent sign.
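
The final-underscore rule can be prototyped with a regular expression. This is only a sketch under my own assumptions: the parse_scaled name, the small SCALE table and the exact pattern are all invented here, and it relies on int() accepting underscores in strings (Python 3.6 and later).

```python
import re

# Hypothetical subset of scale factors, enough for the examples.
SCALE = {"k": 10**3, "M": 10**6, "G": 10**9, "E": 10**18}

# A number, then a final underscore, then exactly one suffix letter.
PATTERN = re.compile(r"(?P<number>[0-9][0-9_.eE+-]*?)_(?P<suffix>[A-Za-z])")

def parse_scaled(text):
    """Parse strings such as '8_M' or '8._M' into a scaled int or float."""
    match = PATTERN.fullmatch(text)
    if match is None or match["suffix"] not in SCALE:
        raise ValueError(f"not a scaled number: {text!r}")
    number, factor = match["number"], SCALE[match["suffix"]]
    try:
        return int(number) * factor
    except ValueError:
        # Not an int, so fall back to float (raises ValueError if invalid).
        return float(number) * factor
```

Because the suffix has to follow an underscore, "1E1_E" parses as the float 1E1 scaled by 10**18, while "8M" and "123_M_456" are rejected.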

Proposal number two: don't make any changes to the syntax, but treat 
these as *literally* numeric scale factors. Add a simple module to the 
std lib defining the various factors:

k = kilo = 10**3
M = mega = 10**6
G = giga = 10**9

etc. and then allow the user to literally treat them as scale factors by 
multiplying:

from fractions import Fraction
from decimal import Decimal
from scaling import *
int_value = 8*M
float_value = 8.0*M
fraction_value = Fraction(1, 8)*M
decimal_value = Decimal("1.2345")*M

and so forth. The biggest advantages of this are that no syntactic 
changes are needed, it is completely backwards compatible, and it works 
with any numeric type and even non-numbers:

py> x = [None]*M
py> len(x)
1000000
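
Since the scaling module does not exist, here is a self-contained version with the constants defined inline as stand-ins, to show that each result keeps the type of the value being scaled.

```python
from decimal import Decimal
from fractions import Fraction

# Stand-ins for the contents of the hypothetical scaling module.
k = kilo = 10**3
M = mega = 10**6
G = giga = 10**9

int_value = 8 * M                      # int
float_value = 8.0 * M                  # float
fraction_value = Fraction(1, 8) * M    # Fraction
decimal_value = Decimal("1.2345") * M  # Decimal
```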

You can even scale by multiple factors:

x = 8*M*k

Disadvantages: none I can think of.

(Some cleverness may be needed to have fractional scale values work with 
both floats and Decimals, but that shouldn't be hard.)
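
One possible shape for that cleverness, as a sketch only: a small object that picks its arithmetic based on the type it is multiplied with. The ScaleFactor class is hypothetical. A plain Fraction would not do here, because multiplying a Decimal by a Fraction raises TypeError.

```python
from decimal import Decimal
from fractions import Fraction

class ScaleFactor:
    """Hypothetical power-of-ten factor that adapts to its operand."""

    def __init__(self, exponent):
        self.exponent = exponent

    def __rmul__(self, other):
        if isinstance(other, Decimal):
            # Shift the decimal exponent; no binary rounding involved.
            return other.scaleb(self.exponent)
        if isinstance(other, float):
            power = 10 ** abs(self.exponent)
            return other * power if self.exponent >= 0 else other / power
        # ints and Fractions stay exact.
        return other * Fraction(10) ** self.exponent

    __mul__ = __rmul__

m = milli = ScaleFactor(-3)
```

With this, Decimal("1.2345")*m stays a Decimal, 2.5*m stays a float, and 7981*m becomes an exact Fraction.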

-- 
Steve
