[Python-ideas] SI scale factors in Python

Thu Aug 25 06:16:22 EDT 2016

On Thu, Aug 25, 2016 at 6:19 PM, Ken Kundert
<python-ideas at shalmirane.com> wrote:
>> So you can have 1000mm or 0.001km but not 1m?
>
> If the scale factor is optional, then numbers like 1m are problematic because
> the m can represent either milli or meter. This is resolved by requiring the
> scale factor and defining a unity scale factor. I propose '_'. So 1m represents
> milli and 1_m represents 1 meter.

This could also be hashed out using a constructor-only API. You'll
probably want to avoid '_', as it's just been added as a digit
separator for numeric literals (PEP 515).
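
To illustrate the clash, here is a small sketch of how the underscore
already behaves in numeric literals and in the int()/float()
constructors. It assumes Python 3.6 or later; the variable names are
just for illustration.

```python
# Underscores between digits are ignored by the parser (Python 3.6+).
million = 1_000_000
assert million == 1000000

# int() and float() accept the same grouping when parsing strings.
assert int("1_000") == 1000
assert float("1_0.5") == 10.5

# A trailing underscore is not a valid number today, so "1_" followed
# by a unit would need new parsing rules layered on top of this.
try:
    float("1_")
except ValueError:
    trailing_underscore_ok = False
else:
    trailing_underscore_ok = True
```

So a reader who sees 1_m has to work out whether the underscore is a
separator or a unity scale factor, which is the confusion I'd want to
avoid.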

>> If units are retained, what you have is no longer a simple number, but
>> a value with a unit, and is a quite different beast. (For instance,
>> addition would have to cope with unit mismatches (probably by throwing
>> an error), and multiplication would have to combine the units (length
>> * length = area).) That would be a huge new feature.
>
> Indeed. I am not proposing that anything be done with the units other than
> possibly retain them for later output. Doing dimensional analysis on expressions
> would be a huge burden both for those implementing the language and for those
> using them in a program.  Just allowing the units to be present, even if not
> retained, is a big advantage because it can bring a great deal of clarity to the
> meaning of the number. For example, even if the language does not flag an error
> when a user writes:
>
>     vdiff = 1mV - 30uA
>
> the person that wrote the line will generally see it as a problem and fix it.

How often do you do arithmetic on literals like that? More likely,
what you'd do is tag your variable names, so it'll be something like:

    input_volts = 1m#V
    inefficiency = 30u#A
    vdiff = input_volts - inefficiency

> In my experience, providing units is the most efficient form of documentation
> available in numerical programming in the sense that one or two additional
> characters can often clarify otherwise very confusing code.
>
> My feeling is that retaining the units on real literals is of little value if
> you don't also extend the real variable type to hold units, or to create another
> variable type that would carry the units. Extending reals does not seem like
> a good idea, but creating a new type, quantity, seems practical. In this case,
> the units would be rather ephemeral in that they would not survive any
> operation.  Thus, the result of an operation between a quantity and either
> an integer, real or quantity would always be a real, meaning that the units are
> lost.  In this way, units are very light-weight and only really serve as
> documentation (for both programmers and end users).
>
> But this idea of retaining the units is the least important aspect of this
> proposal. The important aspects are:
> 1. It allows numbers to be entered in a clean form that is easy to type and easy
>    to interpret
> 2. It allows numbers to be output in a clean form that is easy to interpret.
> 3. In many cases it allows units to be inserted into the code in a very natural
>    and clean way to improve the clarity of the code.

The decimal.Decimal and fractions.Fraction types have no syntactic
support. I would suggest imitating their styles initially, sorting out
all the details of which characters mean what, and appealing for
syntax once it's all settled - otherwise, it's too likely that
something will end up being baked into the language half-baked, if
that makes any sense.
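
As a rough sketch of what a constructor-only version might look like,
in the same spirit as Decimal("1.5"): the Quantity class, its units
attribute, and the exponent table below are all hypothetical names
made up for this example, not an existing library. It follows the
"ephemeral units" idea quoted above, where any arithmetic gives back a
plain float.

```python
import re

# Hypothetical table: scale-factor character -> power of ten.
# Limited to the common factors; "_" is the proposed unity factor.
EXPONENTS = {"T": 12, "G": 9, "M": 6, "k": 3, "": 0, "_": 0,
             "m": -3, "u": -6, "n": -9, "p": -12, "f": -15, "a": -18}

# number, optional scale factor, optional units (letters only).
_PATTERN = re.compile(r"^([-+]?\d+(?:\.\d+)?)([TGMk_munpfa]?)([A-Za-z]*)$")


class Quantity(float):
    """A float that remembers the units it was created with."""

    def __new__(cls, text):
        match = _PATTERN.match(text)
        if match is None:
            raise ValueError(f"not a quantity: {text!r}")
        number, scale, units = match.groups()
        # Let float() parse "30e-6" directly so the value is rounded once.
        self = super().__new__(cls, f"{number}e{EXPONENTS[scale]}")
        self.units = units
        return self


input_volts = Quantity("1mV")     # 1e-3, units "V"
one_metre = Quantity("1_m")       # 1.0, units "m"
one_milli = Quantity("1m")        # 1e-3, no units

# float arithmetic returns a plain float, so the units do not survive.
vdiff = Quantity("1mV") - Quantity("30uA")
```

Since the scale factor is matched first, "1m" is always milli and a
bare metre has to be spelled "1_m" (or with some other unity marker,
if '_' is avoided). All of that can be changed freely while it lives
in a class rather than in the grammar.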

>> Question, though: What happens with exa-? Currently, if the parser
>> sees "1E", it'll expect to see another number, eg 1E+1 == 10.0. Will
>> this double meaning cause confusion?
>
> Oh, I did not see this. Both SPICE and Verilog limit the scale factors to the
> common ones (T, G, M, k, _, m, u, n, p, f, a). I work in electrical engineering,
> and in that domain exa never comes up. My suggestion would be to limit ourselves
> to the common scale factors as most people know them. Using P, E, Z, Y, z, and
> y often actually works against us as most people are not familiar with them and
> so cannot interpret them easily.

That seems pretty reasonable. At the very least, it'd be something that
can be extended later. Even femto and atto are rare enough that they
could be dropped if necessary (pico too, perhaps).
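
For the record, the exa clash is easy to see with today's parser, and
restricting output to the common factors is simple too. The to_si
helper and PREFIXES table below are hypothetical names for this
sketch, and the four-significant-digit format is an arbitrary choice.

```python
import math

# Today "E" introduces an exponent, so an exa suffix would collide.
assert 1E+1 == 10.0
try:
    float("1E")
except ValueError:
    bare_e_parses = False
else:
    bare_e_parses = True

# Hypothetical output table, limited to the common scale factors.
PREFIXES = {12: "T", 9: "G", 6: "M", 3: "k", 0: "",
            -3: "m", -6: "u", -9: "n", -12: "p", -15: "f", -18: "a"}


def to_si(value, units=""):
    """Format value with the nearest common scale factor."""
    if value == 0:
        return "0" + units
    # Round the exponent down to a multiple of three.
    exp = int(math.floor(math.log10(abs(value)) / 3)) * 3
    # Clamp to the range covered by the table (atto .. tera).
    exp = max(-18, min(12, exp))
    mantissa = value / float(f"1e{exp}")
    return f"{mantissa:.4g}{PREFIXES[exp]}{units}"


millivolt = to_si(1e-3, "V")
microamps = to_si(30e-6, "A")
```

Dropping femto and atto (or adding P and E later) is then just a
matter of editing the table.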

Easy scaling seems general enough to include in the language. Tagging
numbers with units, though, feels like the domain of a third-party
library. Maybe I'm wrong.

ChrisA