Re: [Python-ideas] SI scale factors in Python

So you can have 1000mm or 0.001km but not 1m?
If the scale factor is optional, then numbers like 1m are problematic because the m can represent either milli or meter. This is resolved by requiring the scale factor and defining a unity scale factor; I propose '_'. So 1m represents 0.001 (milli) and 1_m represents 1 meter.
Indeed. I am not proposing that anything be done with the units other than possibly retaining them for later output. Doing dimensional analysis on expressions would be a huge burden, both for those implementing the language and for those using it in a program. Just allowing the units to be present, even if not retained, is a big advantage because it can bring a great deal of clarity to the meaning of the number. For example, even if the language does not flag an error when a user writes:

    vdiff = 1mV - 30uA

the person who wrote the line will generally see it as a problem and fix it. In my experience, providing units is the most efficient form of documentation available in numerical programming, in the sense that one or two additional characters can often clarify otherwise very confusing code.

My feeling is that retaining the units on real literals is of little value unless you also extend the real variable type to hold units, or create another variable type that carries the units. Extending reals does not seem like a good idea, but creating a new type, quantity, seems practical. In this case, the units would be rather ephemeral in that they would not survive any operation. Thus, the result of an operation between a quantity and either an integer, real, or quantity would always be a real, meaning that the units are lost. In this way, units are very lightweight and only really serve as documentation (for both programmers and end users).

But this idea of retaining the units is the least important aspect of this proposal. The important aspects are:
1. It allows numbers to be entered in a clean form that is easy to type and easy to interpret.
2. It allows numbers to be output in a clean form that is easy to interpret.
3. In many cases it allows units to be inserted into the code in a very natural and clean way to improve the clarity of the code.
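[Editorial sketch, not part of the proposal: one rough way to get the "ephemeral units" behaviour described above with an ordinary class in today's Python. The class name Q and its details are invented for illustration; arithmetic falls back to plain float, so the unit is dropped exactly as described.]

    class Q(float):
        """A float that carries a unit string purely for display."""
        def __new__(cls, value, unit=''):
            obj = super().__new__(cls, value)
            obj.unit = unit
            return obj

        def __str__(self):
            return '{:g} {}'.format(float(self), self.unit).rstrip()

    vdd = Q(1.8, 'V')
    print(vdd)                  # 1.8 V
    print(vdd / Q(0.5, 'A'))    # 3.6  (a plain float; the unit is gone)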
Oh, I did not see this. Both SPICE and Verilog limit the scale factors to the common ones (T, G, M, k, _, m, u, n, p, f, a). I work in electrical engineering, and in that domain exa never comes up. My suggestion would be to limit ourselves to the common scale factors as most people know them. Using P, E, Z, Y, z, and y often actually works against us as most people are not familiar with them and so cannot interpret them easily. -Ken
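[Editorial sketch, not part of the proposal: what the "clean output" side could look like today with a small helper restricted to the common scale factors listed above (T down to a). The name si_format and the prefix table are invented for illustration.]

    import math

    _PREFIXES = {12: 'T', 9: 'G', 6: 'M', 3: 'k', 0: '',
                 -3: 'm', -6: 'u', -9: 'n', -12: 'p', -15: 'f', -18: 'a'}

    def si_format(value, unit=''):
        """Render a number using the nearest common SI scale factor."""
        if value == 0:
            return '0' + unit
        exp = int(math.log10(abs(value)) // 3) * 3   # exponent rounded down to a multiple of 3
        exp = max(-18, min(12, exp))                 # clamp to the supported prefixes
        return '{:g}{}{}'.format(value / 10**exp, _PREFIXES[exp], unit)

    print(si_format(2.5e-9, 'F'))    # 2.5nF
    print(si_format(2200000, 'Hz'))  # 2.2MHz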

On Thu, Aug 25, 2016 at 6:19 PM, Ken Kundert <python-ideas@shalmirane.com> wrote:
This could also be hashed out using a constructor-only API. You'll probably want to avoid '_', as it's just been added as a digit separator in numeric literals.
How often do you do arithmetic on literals like that? More likely, what you'd do is tag your variable names, so it'll be something like:

    input_volts = 1m#V
    inefficiency = 30u#A
    vdiff = input_volts - inefficiency
The decimal.Decimal and fractions.Fraction types have no syntactic support. I would suggest imitating their styles initially, sorting out all the details of which characters mean what, and appealing for syntax once it's all settled - otherwise, it's too likely that something will end up being baked into the language half-baked, if that makes any sense.
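[Editorial sketch of the constructor-only approach mentioned above, in the style of the Decimal constructor. The name SI, the regex, and the scale table are invented for illustration and carry the same 1m ambiguity discussed earlier in the thread.]

    import re

    _SCALE = {'T': 1e12, 'G': 1e9, 'M': 1e6, 'k': 1e3, '': 1.0,
              'm': 1e-3, 'u': 1e-6, 'n': 1e-9, 'p': 1e-12, 'f': 1e-15, 'a': 1e-18}

    def SI(text):
        """Parse strings such as '2.5nF' or '10k'; any trailing unit letters are accepted but ignored."""
        match = re.fullmatch(r'([-+0-9.eE]+)\s*([TGMkmunpfa]?)([A-Za-z]*)', text)
        if match is None:
            raise ValueError('not an SI-prefixed number: {!r}'.format(text))
        number, prefix, unit = match.groups()
        return float(number) * _SCALE[prefix]

    cap = SI('2.5nF')    # 2.5e-09
    res = SI('10k')      # 10000.0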
That seems pretty reasonable. At the very least, it'd be something that can be extended later. Even femto and atto are rare enough that they could be dropped if necessary (pico too, perhaps). Easy scaling seems general enough to include in the language. Tagging numbers with units, though, feels like the domain of a third-party library. Maybe I'm wrong.

ChrisA

On 8/25/16, Ken Kundert <python-ideas@shalmirane.com> wrote: [...]
It reminds me: "Metric mishap caused loss of NASA's Mars Climate Orbiter." It could be nice to have language support helping to avoid something similar. [...]
Are SI prefixes frozen? Would it not be safer to use E_ instead of X in case of possible new future prefixes?

------

What you are proposing reminds me of "[Python-ideas] Trial balloon: adding variable type declarations in support of PEP 484", except adding constant type declarations instead. Sorry, this is just a really quick idea, around thinking that it could be good to give the parser the possibility to check metric mishaps:

    distance1 = 1:km       # or? -> distance1:length = 1:km
    distance2 = 1000:cm    # or? -> distance2:length = 1000:cm
    length = distance1 + distance2   # our parser could yell :) (or compiler could translate it with warning)

On Fri, Aug 26, 2016 at 07:35:36AM +0200, Pavol Lisy wrote:
This proposal won't help to avoid this sort of disastrous misuse of units. It will make that sort of mistake *easier*, not harder, by giving the user a false sense of security. A good description of the Mars Orbiter mishap can be found here, with a link to the NASA report:

http://pint.readthedocs.io/en/0.7.2/

Suppose I am programming the Mars lander. I read in some thruster data, in pound-force seconds:

    thrust = sm_forces(arg)  # say, it returns 100 lbf·s

I don't expect to see the tag "lbf·s" anywhere unless I explicitly print the value out or view it in a debugger. So the tag gives me no visual assistance in avoiding unit conversion bugs. It is worse than having no unit attached at all, because now I have the false sense of security that it is tagged with a unit. Much later on I pass that to a function that expects the thrust to be in Newton seconds:

    result = fire_engines(thrust)

There's no dimensional analysis, so I could just as easily pass 100 kilograms per second cubed, or 100 volts. I have no protection from passing wrong units. But let's ignore that possibility, and trust that I do actually pass a thrust rather than something completely different. The original poster Ken has said that he doesn't want to do unit conversions. So I pass a measurement in pound-force seconds, which is compatible with Newton seconds, and quite happily use 100 lbf·s as if it were 100 N·s.

End result: a repeat of the original Mars lander debacle, when my lander crashes directly into the surface of Mars, due to a failure to convert units. This could have been avoided if I had used a real units package that applied the conversion factor 1 lbf·s = 4.45 N·s, but Ken's suggestion won't prevent that. You can't avoid bugs caused by using the wrong units by just labelling values with a unit. You actually have to convert from the wrong units to the right units, something this proposal avoids.

I think that Ken is misled by his experience in one narrow field, circuit design, where everyone uses precisely the same SI units and there are no conversions needed. This is a field where people can drop dimensions because everyone understands what you mean when you say that a current equals a voltage. But in the wider world, that's disastrous.

Take v = s/t (velocity equals distance over time). If I write v = s because it is implicitly understood that the time t is "one":

    s = 100 miles
    v = s

Should v be understood as 100 miles per hour or 100 miles per second or 100 miles per year? That sort of ambiguity doesn't come up in circuit design, but it is common elsewhere.

--
Steve
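[Editorial sketch: roughly what the missing conversion looks like with an actual units package such as pint (the library behind the link above). This is an illustration only, not part of the proposal.]

    import pint

    ureg = pint.UnitRegistry()
    thrust = 100 * ureg.lbf * ureg.second          # 100 lbf·s, as read in
    print(thrust.to(ureg.newton * ureg.second))    # roughly 444.8 newton * second
    thrust + 5 * ureg.volt                         # raises pint.DimensionalityError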

On 26 August 2016 at 13:01, Steven D'Aprano <steve@pearwood.info> wrote:
[snip]
If one writes this as

    from units import m, s, miles
    s = miles(100)
    v: m/s = s

this could be flagged as an error by a static type checker. Let me add some clarifications here:

1. By defining __mul__ and __truediv__ on m, s, and other units one can achieve the desirable semantics.

2. An arbitrary (reasonable) unit can be described by a tuple of 7 rational numbers (the powers of the basic SI units; m/s will be e.g. (1, -1, 0, 0, 0, 0, 0)). If one also wants non-SI units, then there will be one more float number in the tuple.

3. It is impossible to write down all the possible overloads for operations on units, e.g. 1 m / 1 s should be 1 m/s, 1 m/s / 1 s should be 1 m/s**2, and so on to infinity. Only a finite number of overloads can be described with PEP 484 type hints.

4. It is very easy to specify all overloads with very basic dependent types: a unit will depend on the above-mentioned tuple, and multiplication would be overloaded like this (I write three numbers instead of seven for simplicity):

    class Unit(Dependent[k, l, m]):
        def __mul__(self, other: Unit[ko, lo, mo]) -> Unit[k+ko, l+lo, m+mo]: ...

5. Currently neither "mainstream" Python type checkers nor PEP 484 support dependent types.

6. For those who are not familiar with dependent types, this concept is very similar to generics. A generic type (e.g. List) is like a "function" that takes a concrete type (e.g. int) and "returns" another concrete type (e.g. List[int], lists of integers). Dependent types do the same, but they are also allowed to receive values, not only types, as "arguments". The most popular example is matrices of fixed size n by m: Mat[n, m]. Matrix multiplication could then be overloaded as

    class Mat(Dependent[n, m]):
        def __matmul__(self, other: Mat[m, k]) -> Mat[n, k]: ...

7. I like the formulation by Nick: if e.g. the library circuit_units defines sufficiently many overloads, then it will safely cover 99.9% of use cases *without* dependent types. (An operation for which the type checker does not find an overload will be flagged as an error, although the operation might be correct.)

--
Ivan
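[Editorial sketch: a runtime analogue of point 2 above, not the static checking Ivan describes. A unit is represented as a tuple of exponents of the base SI units and the tuples are combined in __mul__ / __truediv__; all names are invented for illustration.]

    class Quantity:
        # dims holds the exponents of the base SI units (m, kg, s, A, K, mol, cd)
        def __init__(self, value, dims=(0, 0, 0, 0, 0, 0, 0)):
            self.value, self.dims = value, tuple(dims)

        def __mul__(self, other):
            return Quantity(self.value * other.value,
                            [a + b for a, b in zip(self.dims, other.dims)])

        def __truediv__(self, other):
            return Quantity(self.value / other.value,
                            [a - b for a, b in zip(self.dims, other.dims)])

    metre = Quantity(1.0, (1, 0, 0, 0, 0, 0, 0))
    second = Quantity(1.0, (0, 0, 1, 0, 0, 0, 0))
    v = Quantity(100.0) * metre / second    # v.dims == (1, 0, -1, 0, 0, 0, 0)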

On Fri, Aug 26, 2016 at 02:49:51PM +0300, Ivan Levkivskyi wrote:
1. By defining __mul__ and __truediv__ on m, s, and other units one can achieve the desirable semantics
I'm not entirely sure about that. I'm not even close to an expert on the theory of types, so I welcome correction, but it doesn't seem reasonable to me to model units as types. Or at least not using a standard type checker. ("Define standard," I hear you ask. Um, the sort of type checker that Lingxiao Jiang and Zhendong Su had in mind when they wrote Osprey?)

http://web.cs.ucdavis.edu/~su/publications/icse06-unit.pdf

Okay, so you certainly can do dimensional analysis with *some* type checkers. But should you? I think that units are orthogonal to types: I can have a float of unit "gram" and a Fraction of unit "gram", and they shouldn't necessarily be treated as the same type. Likewise I can have a float of unit "gram" and a float of unit "inch", and while they are the same type, they aren't the same dimension. So I think that you need *two* distinct checkers: one to check types, and one to check dimensions (and do unit conversions), even if they're both built on the same or similar technologies.

Another issue is that a decent dimensional system should allow the user to create their own dimensions, not just their own units. The seven standard SI dimensions are a good starting point, but there are applications where you may want more; currency and bits are two common ones. And no, you (probably) don't want bits to be a dimensionless number: it makes no sense to add "7 bits" and "2 radians". Your application may want to track "number of cats" and "number of dogs" as separate dimensions, rather than treat both as dimensionless quantities. And then use the type checker to ensure that they are both ints, not floats.
A decent unit converter/dimension analyser needs to support arbitrary dimensions, not just the seven SI dimensions. But let's not get bogged down with implementation details.
Right. This is perhaps why the authors of Osprey say that "standard type checking algorithms are not powerful enough to handle units because of their abelian group nature (e.g., being commutative, multiplicative, and associative)."

Another factor: dimensions should support rational powers, not just integer powers.

--
Steve
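[Editorial sketch of the rational-power point, nothing from the thread itself: with fractions.Fraction exponents a dimension vector can be raised to non-integer powers, and keying the vector by dimension name rather than a fixed seven-tuple also leaves room for user-defined dimensions as discussed above.]

    from fractions import Fraction

    def raise_dims(dims, power):
        """Raise a dimension vector (a dict of dimension -> exponent) to a rational power."""
        return {dim: exp * Fraction(power) for dim, exp in dims.items()}

    area = {'m': Fraction(2)}
    side = raise_dims(area, Fraction(1, 2))                  # {'m': Fraction(1, 1)}
    noise = raise_dims({'V': 2, 'Hz': -1}, Fraction(1, 2))   # V/sqrt(Hz): {'V': Fraction(1, 1), 'Hz': Fraction(-1, 2)}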

Steven,

This keeps coming up, so let me address it again.

First, I concede that you are correct that my proposal does not provide dimensional analysis, so any dimensional errors that exist in this new code will not be caught by Python itself, as is currently the case. However, you should concede that by bringing the units front and center in the language, they are more likely to be caught by the users themselves. Yes, it is true that my proposal only addresses units on literals and not variables, expressions, functions, etc. But my proposal addresses not only real literals in the program itself, but also real values in the input and output. Extending this to variables, expressions, functions, etc. would only make sense if the units were checked, which of course is dimensional analysis. It is my position that dimensional analysis is so difficult and burdensome that there is no way it should be in the base Python language. If available, it should be an add-on. This proposal is more about adding capabilities to the base language that happen to make dimensional analysis easier and more attractive than about providing dimensional analysis itself.

Second, I concede that there is some chance that users may be lulled into a false sense of complacency and that some dimensional errors would get missed by these otherwise normally very diligent users. But I would point out that I have been intensively using and supporting languages that provide this feature for 40 years and have never seen it happen.

Finally, let's consider the incident on Mars. The problem occurred because one software package output numbers in English units (what were they thinking?) that were then entered into another program that was expecting metric units. The only way this could have been caught in an automated fashion is if the first package output the units for its numbers, and the second package accessed and checked those units. And it is precisely this that I am trying to make easier and more likely with this extension. Of the three steps that must occur (output the units, input the units, check the units), this proposal addresses the first two.

-Ken
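[Editorial sketch of the three steps just described (output the units, input the units, check the units), using plain strings; the format and function names are invented purely to make the flow concrete.]

    def write_quantity(value, unit):
        """Output step: attach the unit to the number."""
        return '{:g} {}'.format(value, unit)          # e.g. '100 N·s'

    def read_quantity(text, expected_unit):
        """Input and check steps: parse the unit back and refuse a mismatch."""
        number, unit = text.split(None, 1)
        if unit != expected_unit:
            raise ValueError('expected {}, got {}'.format(expected_unit, unit))
        return float(number)

    thrust = read_quantity(write_quantity(100, 'N·s'), 'N·s')   # 100.0
    thrust = read_quantity('100 lbf·s', 'N·s')                  # ValueError: caught, not crashed on Mars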

Participants (5):
- Chris Angelico
- Ivan Levkivskyi
- Ken Kundert
- Pavol Lisy
- Steven D'Aprano