[Python-ideas] SI scale factors alone, without units or dimensional analysis

Chris Angelico rosuav at gmail.com
Fri Aug 26 09:34:18 EDT 2016


On Fri, Aug 26, 2016 at 10:47 PM, Steven D'Aprano <steve at pearwood.info> wrote:
> (1) Are the results floats, ints, or something else?
>
> I would expect that 1K would be int 1000, not float 1000. But what about
> fractional prefixes, like 1m? Should that be a float or a decimal?
>
> If I write 7981m I would expect 7.981, not 7.9809999999999999, so maybe
> I want a decimal float, not a binary float?

Introduce "d" as a prefix meaning 1, and this could be the way of
creating something that people have periodically asked for: Decimal
literals.

(Though IIRC there were some complexities involving Decimal literals
and decimal.getcontext(), which would have to be resolved before 1m
could represent a Decimal.)
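For comparison, today's `decimal` module can already express the "7981m means exactly 7.981" behaviour asked for above. This is a minimal sketch, not part of any proposal: `scale_decimal` and its prefix table are hypothetical names. The arithmetic runs under the current `decimal.getcontext()`, which is where the context complications would come in.

```python
from decimal import Decimal

# Hypothetical prefix table: SI prefix -> exponent of ten.
SI_EXPONENTS = {"m": -3, "": 0, "k": 3, "M": 6}

def scale_decimal(digits: str, prefix: str) -> Decimal:
    """Return Decimal(digits) scaled by an SI prefix, with no binary rounding."""
    # scaleb adjusts the decimal exponent; it honours the current context,
    # but short coefficients like 7981 come through unchanged.
    return Decimal(digits).scaleb(SI_EXPONENTS[prefix])

print(scale_decimal("7981", "m"))                # 7.981
# The nearest binary float to 7.981 is not exactly 7.981:
print(Decimal(7981 / 1000) == Decimal("7.981"))  # False
```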

> Actually, what I would really want is for the scale factor to be tracked
> separately. If I write 7981m * 1M, I should end up with 7981000 as an
> int, not a float. Am I being unreasonable?

Easy. Make them Fraction literals instead. You'll end up with
7981000/1 as a rational, rather than a pure int, but if you want
accurate handling of SI prefixes, rationals will serve you fairly
well.
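A minimal sketch of the Fraction idea using the standard `fractions` module. `MILLI` and `MEGA` are hypothetical stand-ins for whatever prefix literals might evaluate to.

```python
from fractions import Fraction

# Hypothetical values that "m" and "M" suffixes might denote as exact rationals.
MILLI = Fraction(1, 1000)
MEGA = Fraction(10**6)

x = 7981 * MILLI      # Fraction(7981, 1000): exactly 7.981
y = x * (1 * MEGA)    # Fraction(7981000, 1)
print(repr(y))        # Fraction(7981000, 1)
# When the denominator is 1, converting back to a plain int loses nothing.
print(int(y) if y.denominator == 1 else y)  # 7981000
```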

> Obviously if I write 1.1K then I'm expecting a float. So I'm not
> *entirely* unreasonable :-)

Obviously :)

> (2) Decimal or binary scale factors?
>
> The SI units are all decimal, and I think if we support these, we should
> insist that K == 1000, not 1024. For binary scale factors, there is the
> IEC standard:
>
> http://physics.nist.gov/cuu/Units/binary.html
>
> which defines Ki = 2**10, Mi = 2**20, etc. (Fortunately this doesn't
> have to deal with fractional prefixes.) So it would be easy enough to
> support them as well.

from __future__ import binary_scale_factors as scale_factors
from __future__ import decimal_scale_factors as scale_factors
# tongue only partly in cheek
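Because the SI and IEC prefix names don't overlap, one lookup table could in principle serve both. A sketch under that assumption: the values follow the definitions quoted above (k = 1000, Ki = 2**10, and so on), and `scale` is a hypothetical helper, not a proposed API.

```python
# SI (decimal) prefixes; fractional ones such as m and µ are left out here.
SI = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15}
# IEC binary prefixes: Ki = 2**10, Mi = 2**20, and so on.
IEC = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50}

# The two sets of names are disjoint, so a single merged table works.
PREFIXES = {**SI, **IEC}

def scale(value: int, prefix: str) -> int:
    """Multiply an integer by a decimal (SI) or binary (IEC) prefix."""
    return value * PREFIXES[prefix]

print(scale(1, "k"), scale(1, "Ki"))  # 1000 1024
```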

> (3) µ or u, k or K?
>
> I'm going to go to the barricades to fight for the real SI prefixes µ
> and k to be supported. If people want to support the common fakes u and
> K as well, that's fine, I have no objection, but I think that it's
> important to support the actual prefixes too.

I would strongly support the use of µ, and weakly support u. With k vs K, no
opinion. If both can be supported without being confusing, grab 'em
both. With output formats, it's less clear, but I would still be
inclined toward µ for output.
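One wrinkle with µ is that there are two look-alike code points: U+00B5 MICRO SIGN and U+03BC GREEK SMALL LETTER MU. Python already NFKC-normalizes identifiers (PEP 3131), and NFKC maps the micro sign to mu, so a parser could accept both, plus an ASCII u. A minimal sketch, assuming a hypothetical `prefix_exponent` helper and alias table:

```python
import unicodedata

# Hypothetical table of canonical SI spellings -> exponent of ten.
CANONICAL = {"k": 3, "M": 6, "m": -3, "\u03bc": -6, "n": -9}
# Assumed ASCII stand-ins, accepted on input only.
ALIASES = {"u": "\u03bc", "K": "k"}

def prefix_exponent(prefix: str) -> int:
    """Map a prefix spelling to its exponent of ten, accepting common fakes."""
    # NFKC turns MICRO SIGN (U+00B5) into GREEK SMALL LETTER MU (U+03BC).
    canon = unicodedata.normalize("NFKC", prefix)
    canon = ALIASES.get(canon, canon)
    return CANONICAL[canon]
```

Keeping "M" and "m" distinct matters, so the alias table deliberately does no case folding; only the specific fakes listed are mapped.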

> (4) What about E?
>
> E is tricky if we want 1E to be read as the integer 10**18, because it
> matches the floating point syntax 1E (which is currently a syntax
> error). So there's a nasty bit of ambiguity where it may be unclear
> whether or not 1E is intended as an int or an incomplete float, and then
> there's 1E1E which might be read as 1E1*10**18 or as just an error.

It's worse than that. Currently, 1E+2 is a perfectly legal 100.0
(float), but under this proposal, it would be a constant expression
yielding 1_000_000_000_000_000_002, so it wouldn't just be giving
meaning to things that are currently errors.
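The 1E+2 problem is easy to see with the standard `tokenize` module: today the whole string is a single NUMBER token. The second half just spells out the value that a hypothetical "1E means 10**18" reading would produce.

```python
import ast
import io
import tokenize

src = "1E+2"
# Today the tokenizer reads the whole string as one NUMBER token.
numbers = [tok.string
           for tok in tokenize.generate_tokens(io.StringIO(src).readline)
           if tok.type == tokenize.NUMBER]
print(numbers, ast.literal_eval(src))   # ['1E+2'] 100.0

# Under a hypothetical "1E == 10**18" rule it would instead mean 1E + 2:
print(1 * 10**18 + 2)                   # 1000000000000000002
```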

> Replacing E with (say) X is risky. The two largest current SI prefixes
> are Z and Y, and it seems very likely that the next one added (if that ever
> happens) will be X. Actually, using any other letter risks clashing with
> a future expansion of the SI prefixes.

Anything's risky. Probably the least risky option is to simply stop
before Exa and implement the feature without it.

> (7) What about repr() and str()?
>
> I don't think that the repr() or str() of numeric types should change.
> But perhaps format() could grow some new codes to display numbers using
> either the most obvious scale factor, or some specific scale factor.

Agreed. And I'd have them simply pick the most obvious one; if you
want a specific factor, you can divide by it yourself and display that.
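A rough sketch of "pick the most obvious factor" formatting, written as a plain function because format() has no such presentation type today. `si_format`, its prefix table (which stops at P, before Exa), and the rule "largest power of 1000 not exceeding |x|" are all assumptions for illustration. The exponent is read from the float's shortest repr via `Decimal`, so exact powers such as 1000.0 are not misclassified by floating-point `log10`.

```python
from decimal import Decimal

# Hypothetical exponent -> prefix table; it stops at P, before Exa.
PREFIXES = {-12: "p", -9: "n", -6: "\u00b5", -3: "m", 0: "",
            3: "k", 6: "M", 9: "G", 12: "T", 15: "P"}

def si_format(x: float, digits: int = 3) -> str:
    """Format x using the largest power of 1000 that does not exceed |x|."""
    if x == 0:
        return "0"
    d = Decimal(repr(x))                # exact decimal form of x's repr
    exp3 = 3 * (d.adjusted() // 3)      # round the exponent down to a multiple of 3
    exp3 = max(-12, min(15, exp3))      # clamp to the prefixes in the table
    mantissa = float(d.scaleb(-exp3))
    return f"{mantissa:.{digits}g}{PREFIXES[exp3]}"

print(si_format(4700), si_format(1000.0), si_format(1.5e6))  # 4.7k 1k 1.5M
```

Small values work the same way: 2.2e-6 comes out as 2.2µ and 0.5 as 500m.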

> This leads to my first proposal: require an explicit numeric prefix on
> numbers before scale factors are allowed, similar to how we treat
> non-decimal bases.
>
> 8M  # remains a syntax error
> 0s8M  # unambiguously an int with a scale factor of M = 10**6
>
> 0s1E1E  # a float 1E1 with a scale factor of E = 10**18
> 0s1.E  # a float 1. with a scale factor of E, not an exponent
>
> int('8M')  # remains a ValueError
> int('0s8M', base=0)  # returns 8*10**6

Hmm, interesting. Feels clunky but could work.
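The explicit-marker idea can be prototyped as a string parser today. A minimal sketch: `parse_0s` is a hypothetical stand-in for the proposed `int('0s8M', base=0)` behaviour (current Python rejects that string with ValueError), and the regular expression and prefix table are assumptions. It mirrors the examples above: a bare `8M` is rejected, `0s1E1E` is 1E1 scaled by E, and `0s1.E` is 1. scaled by E.

```python
import re

# Hypothetical decimal scale factors; E is usable because "0s" removes the ambiguity.
SCALE = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15, "E": 10**18}

# "0s", then an int or float literal (optionally with an exponent), then one prefix.
_SCALED = re.compile(r"0s(?P<num>\d+(?:\.\d*)?(?:[eE][-+]?\d+)?)(?P<suffix>[kMGTPE])")

def parse_0s(text: str):
    """Parse a hypothetical '0s'-marked literal such as '0s8M' or '0s1.E'."""
    m = _SCALED.fullmatch(text)
    if m is None:
        raise ValueError(f"invalid scaled literal: {text!r}")
    num, factor = m["num"], SCALE[m["suffix"]]
    # Pure digits give an int; a decimal point or exponent gives a float.
    return int(num) * factor if num.isdigit() else float(num) * factor
```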

> Or if that's too heavy (two whole characters, plus the suffix!) perhaps
> we could have a rule that the suffix must follow the final underscore
> of the number:
>
> 8_M  # int 8*10**6
> 123_456_789_M  # int 123456789*10**6
> 123_M_456  # still an error
> 8._M  # float 8.0*10**6

This sounds better IMO. It's not legal syntax in any version of Python
older than 3.6 (underscores in numeric literals only arrive in 3.6, via
PEP 515), so there's minimal backward-compatibility trouble.
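The underscore rule is also easy to prototype. A sketch, assuming a hypothetical `parse_suffixed` helper and a decimal-only prefix table: the prefix must be the last thing in the literal and must follow an underscore, PEP 515 digit grouping is allowed before it, and a decimal point makes the result a float.

```python
import re

# Hypothetical decimal scale factors for the underscore-suffix rule.
SCALE = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15}

# Digits with optional PEP 515 underscores and an optional fractional part,
# then "_" and exactly one prefix, which must end the literal.
_SUFFIXED = re.compile(
    r"(?P<num>\d+(?:_\d+)*(?P<frac>\.(?:\d+(?:_\d+)*)?)?)_(?P<suffix>[kMGTP])"
)

def parse_suffixed(text: str):
    """Parse a hypothetical literal such as '8_M', '123_456_789_M' or '8._M'."""
    m = _SUFFIXED.fullmatch(text)
    if m is None:
        raise ValueError(f"scale factor must follow the final underscore: {text!r}")
    digits = m["num"].replace("_", "")
    factor = SCALE[m["suffix"]]
    # A decimal point makes it a float; otherwise the result stays an int.
    return float(digits) * factor if m["frac"] is not None else int(digits) * factor
```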

ChrisA

