real numbers with SI scale factors

Wow, things have suddenly turned very negative. I understand that this is very normal for these types of forums, where it is easy to lose sight of the big picture. So let me try to refocus this discussion.

MOTIVATION

The way the scientific and engineering communities predominantly write real numbers is with SI scale factors. These numbers almost always represent physical quantities, so it is common to write the number with both scale factor and units. So for example, the distance to Andromeda is 780kpc, the pressure at the bottom of the Mariana Trench is 108MPa, the total power produced by a typical hurricane is 600TW, the number of base pairs in human DNA is 3.2Gb, and the Bohr radius is 53pm.

These numbers are the language of science and engineering, but in recent years they have also entered the realm of popular culture. For example, an article by Ars Technica calculates the value of the music that can fit on a classic iPod at over $8G (that is the total penalty that could be assessed if it were all stolen copyrighted works). http://arstechnica.com/tech-policy/2012/06/from-gigabytes-to-petadollars-cop...

In all of these examples the numbers are either very large or very small, and they employ SI scale factors to make them easy to write and communicate. This way of writing numbers is so well established that it was formally standardized as part of the International System of Units (the Système international d'unités) in 1960.

The problem is that most general purpose programming languages do not support this way of writing numbers. Instead they force people to convert the numbers to and from exponential notation, which is rather inconvenient and hard to read, write, say, or hear (it is much easier to say or hear 53 picometers than 5.3e-11 meters).
When working with a general purpose programming language, the above numbers become:

    780kpc -> 7.8e+05
    108MPa -> 1.08e+08
    600TW  -> 6e+14
    3.2Gb  -> 3.2e+09
    53pm   -> 5.3e-11
    $8G    -> 8e+09

Notice that the numbers become longer, harder to read, harder to type, harder to say, and harder to hear.

NEXT STEP

So the question is: should Python accommodate this widely used method of writing real numbers? Python already has many ways of writing numbers. For example, Python allows integers to be written in binary, octal, decimal and hex. Any number you can write in one form you can write in another, so the only real reason for providing all these ways is the convenience of the users. Don't Python's users in the scientific and engineering communities deserve the same treatment? These are, after all, core communities for Python.

Now, Python is a very flexible language, and it is certainly possible to add simple libraries to make it easy to read and write numbers with SI scale factors. I have written such a library (engfmt). But with such libraries this common form of writing numbers remains a second-class citizen. The language itself does not understand SI scale factors; any time you want to input or output numbers in their natural form you must manually convert them yourself. Making Python understand SI scale factors on real numbers as first-class citizens inherently requires a change to the base language; it cannot be done solely through libraries.

The question before you is: should we do it? The same question confronted the developers of Python when it was decided to add binary, octal and hexadecimal number support. You could have done it with libraries, but you didn't, because binary, octal and hexadecimal numbers were too common and important to be left as second-class citizens. Well, the use of binary, octal and hexadecimal numbers is tiny compared to the use of real numbers with SI scale factors.
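To make the conversion being discussed concrete, translating SI-suffixed strings to floats takes only a few lines of ordinary Python. This is a sketch of my own; the function name and suffix table are illustrative, not part of any existing API or of the proposal itself:

```python
# Illustrative sketch: map SI suffix letters to scale factors
# and parse unitless strings such as '780k' or '53p' into floats.
_SI_SCALE = {
    'y': 1e-24, 'z': 1e-21, 'a': 1e-18, 'f': 1e-15, 'p': 1e-12,
    'n': 1e-9,  'u': 1e-6,  'm': 1e-3,  'k': 1e3,   'M': 1e6,
    'G': 1e9,   'T': 1e12,  'P': 1e15,  'E': 1e18,  'Z': 1e21,
    'Y': 1e24,
}

def parse_si(text):
    """Convert a string with an optional SI scale suffix to a float."""
    if text and text[-1] in _SI_SCALE:
        return float(text[:-1]) * _SI_SCALE[text[-1]]
    return float(text)
```

The point of contention in the thread is not whether this function can be written, but whether it belongs in a library or in the language's literal syntax.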
Before we expend any more effort on this topic, let's put aside the question of how it should be done, or how it will be used after it's done, and just focus on whether we should do it at all. Should Python support real numbers specified with SI scale factors as first-class citizens? What do you think? -Ken

-1 on Python ever having any syntactic support for SI scale factors. It makes the language needlessly complicated, has no benefit I've discerned (vs using libraries), and is a magnet for a large class of bugs.

Btw, the argument below feels dishonest in another respect. Within a domain there is a general size scale of quantities of interest. I worked in a molecular dynamics lab for a number of years, and we would deal with simulated timesteps of a few femtoseconds. A total simulation might run into microseconds (or with our custom supercomputer, a millisecond). There were lots of issues, which I don't begin to understand, about exactly how many femtoseconds might be possible to squeeze into a timestep while retaining good behavior. But the numbers of interest were in the range 0.5-5, and anyone in the field knows that. In contrast, cosmologists deal with intervals of petaseconds. Yeah, I know it's not as simple as that, but just to get at the symmetry.

No one would write 2.5e-15 every time they were doing something with an MD timestep. The scaling, if anywhere at all, would be defined once as a general factor at the boundaries. The number of interest is simply, e.g., 2.5, not some large negative exponent on that. In fact, at a certain point I proposed that we should deal with rounding issues by calling the minimum domain-specific time unit an attosecond, and only use integers of this unit. That wasn't what was adopted, but it wasn't absurd. If we had done that, we would simply deal with, say, 1500 "inherent units" in the program. The fact that it relates to a physical quantity is at most something for documentation (this principle isn't different because we used floats in this case).

On Aug 28, 2016 8:44 PM, "Ken Kundert" <python-ideas@shalmirane.com> wrote:

On Mon, Aug 29, 2016 at 12:32 PM, David Mertz <mertz@gnosis.cx> wrote:
Definitely not absurd. I've done the same kind of thing numerous times (storing monetary values in cents, distances in millimeters, or timepoints in music in milliseconds), because it's just way, WAY simpler than working with fractional values. So the SI prefix gets attached to the (implicit) *unit*, not to the value. I believe this is the correct way to handle things. ChrisA

It makes the language needlessly complicated, has no benefit I've discerned (vs using libraries), and is a magnet for a large class of bugs.
Well, the comment about bugs is speculation that does not fit with the extensive experience of the electrical engineering community. But other than that, these arguments could be used against supporting binary, octal, and hexadecimal notation. Are you saying building those into the language was a mistake?

On Sun, Aug 28, 2016 at 07:32:35PM -0700, David Mertz wrote:
Yes, without a convenient way of specifying real numbers, the computational communities have to resort to things like this. And they can work for a while, but over time the scale of things often changes, and a good choice of scale can turn bad after a few years. For example, when I first started in electrical engineering, the typical size of capacitors was in the microfarad range, and software would just assume that if you gave a capacitance it was in uF. But then, with the advancement of technology, the capacitors got smaller. They went from uF to nF to pF, and now a growing fraction of capacitors are specified in the fF range. The fact that SPICE allowed values to be specified with SI scale factors meant that it continued to be easy to use over the years, whereas the programs that hard-coded the scale of their numbers became increasingly difficult to use and eventually became simply absurd.

Even your example is a good argument for specifying numbers with SI scale factors. If I am using one of your molecular simulators, I don't want to specify the simulation time range as being from 1 to 1_000_000_000_000 fs. That is ridiculous. There are 12 orders of magnitude between the minimum resolvable time and the maximum. There are only two practical ways of representing values over such a wide range: using SI scale factors and using exponential notation. And we can tell which is preferred. You said femtoseconds; you did not say 1e-15 seconds. Even you prefer SI scale factors. -Ken

On Mon, Aug 29, 2016 at 11:44 AM, Ken Kundert <python-ideas@shalmirane.com> wrote:
And easier to compare. The SI prefixes are almost consistent in using uppercase for larger units and lowercase for smaller, but not quite; and there's no particular pattern in which letter is larger. For someone who isn't extremely familiar with them, that makes them completely unordered - which is larger, peta or exa? Which is smaller, nano or pico?

Plus, there's a limit to how far you can go with these kinds of numbers, currently yotta at e+24. Exponential notation scales to infinity (to 1e308 in IEEE 64-bit binary floats, but plenty further in decimal.Decimal - I believe its limit is about 1e+(1e6), and REXX on OS/2 had a limit of 1e+(1e10) for its arithmetic), remaining equally readable at all scales. So we can't get rid of exponential notation, no matter what happens. Mathematics cannot usefully handle a system in which we have to represent large exponents with ridiculous compound scale factors:

    sys.float_info.max = 179.76931348623157*Y*Y*Y*Y*Y*Y*Y*Y*Y*Y*Y*Y*E

(It's even worse if the Exa collision means you stop at Peta. 179.76931348623157*P*P*P*P*P*P*P*P*P*P*P*P*P*P*P*P*P*P*P*P*M, anyone?)

Which means that these tags are a duplicate way of representing a specific set of exponents.
Except that those are exactly the important questions to be answered. How *could* it be done? With the units stripped off, your examples become:

    780k == 7.8e+05 == 780*k
    108M == 1.08e+08 == 108*M
    600T == 6e+14 == 600*T
    3.2G == 3.2e+09 == 3.2*G
    53p == 5.3e-11 == 53*p
    8G == 8e+09 == 8*G

Without any support whatsoever, you can already use the third column notation, simply by creating this module:

    # si.py
    k, M, G, T, P, E, Z, Y = 1e3, 1e6, 1e9, 1e12, 1e15, 1e18, 1e21, 1e24
    m, μ, n, p, f, a, z, y = 1e-3, 1e-6, 1e-9, 1e-12, 1e-15, 1e-18, 1e-21, 1e-24
    u = μ
    K = k

And using it as "from si import *" at the top of your code. Do we see a lot of code in the wild doing this? "[H]ow it will be used after it's done" is exactly the question that this would answer.
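For what it's worth, such a module is usable today with no language changes at all. A quick self-contained check (constants inlined from the si.py module above so the snippet stands alone):

```python
# Constants inlined from the si.py module sketched above.
k, M, G, p = 1e3, 1e6, 1e9, 1e-12

r_load = 100 * k   # a "100k" resistance value
f_osc = 2.4 * G    # a "2.4G" frequency value
c_par = 53 * p     # a "53p" capacitance value

# The results are ordinary floats, usable in any arithmetic.
ratio = r_load / (1 * M)
```

The only syntactic cost relative to the proposal is the `*` between the coefficient and the scale factor.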
Don't Python's users in the scientific and engineering communities deserve the same treatment? These are, after all, core communities for Python.
Yes. That's why we have things like the @ matrix multiplication operator (because the numeric computational community asked for it), and %-formatting for bytes strings (because the networking, mainly HTTP serving, community asked for it). Python *does* have a history of supporting things that are needed by specific sub-communities of Python coders. But there first needs to be a demonstrable need. How much are people currently struggling due to the need to transform "gigapascal" into "e+9"? Can you show convoluted real-world code that would be made dramatically cleaner by language support? ChrisA

On Mon, Aug 29, 2016 at 12:33:16PM +1000, Chris Angelico wrote:
Yes, of course. No one is suggesting abandoning exponential notation. I am not suggesting we force people to use SI scale factors, only that we allow them to. What I am suggesting is that we stop saying to them things like 'you must use exponential notation because we have decided that it's better. See, you can easily compare the size of numbers by looking at the exponents.' What is wrong with having two ways of doing things? We have many ways of specifying the value of the integer 16: 0b10000, 0o20, 16, 0x10, 16L, ....
Because by focusing on the implementation details, we miss the big picture. We have already done that, and we ended up going down countless ratholes.
Can you show code that would have been convoluted if Python had used a library rather than built-in support for hexadecimal numbers?

So, in summary, you are suggesting that we tell the scientific and engineering communities that we refuse to provide native support for their preferred way of writing numbers because:

1. our way is better,
2. their way is bad because some uneducated person might see the numbers and not understand them,
3. we already have a way of representing numbers that we came up with in the '60s and we simply cannot do another,
4. well, we could do it, but we have decided that if you would only adapt to this new way of doing things that we just came up with, then we would not have to do any work, and that is better for us. Oh, and this new way of writing numbers only works in the program itself. You're out of luck when it comes to IO.

These do not seem like good reasons for not doing this. -Ken

On 2016-08-28 20:29, Ken Kundert wrote:
What is wrong with having two ways of doing things? We have many ways of specifying the value of the integer 16: 0b10000, 0o20, 16, 0x10, 16L, ....
Zen of Python: "There should be one-- and preferably only one --obvious way to do it." If Python didn't have binary or octal notation and someone came here proposing it, I would not support it, for the same reasons I don't support your proposal. If someone proposed eliminating binary or octal notation for Python 4 (or even maybe Python 3.8), I would probably support it for the same reason. Those notations are not useful enough to justify their existence. Hexadecimal is more justifiable as it is far more widely used, but I would be more open to removing hexadecimal than I would be to adding octal. Also, "L" as a long-integer suffix is already gone in Python 3. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown

On Mon, Aug 29, 2016 at 1:40 PM, Brendan Barnwell <brenbarn@brenbarn.net> wrote:
I agree with you on octal - there are very few places where it's the one obvious way to do things, and you can always use int("755", 8) if you have data that's best represented octally. But hex is incredibly helpful when you do any sort of bit manipulations, and decimal quickly becomes unwieldy and error-prone. Here's some code from Lib/stat.py:

    S_IFDIR  = 0o040000  # directory
    S_IFCHR  = 0o020000  # character device
    S_IFBLK  = 0o060000  # block device
    S_IFREG  = 0o100000  # regular file
    S_IFIFO  = 0o010000  # fifo (named pipe)
    S_IFLNK  = 0o120000  # symbolic link
    S_IFSOCK = 0o140000  # socket file

These are shown in octal, because Unix file modes are often written in octal. If Python didn't support octal, the obvious alternative would be hex:

    S_IFDIR  = 0x4000  # directory
    S_IFCHR  = 0x2000  # character device
    S_IFBLK  = 0x6000  # block device
    S_IFREG  = 0x8000  # regular file
    S_IFIFO  = 0x1000  # fifo (named pipe)
    S_IFLNK  = 0xA000  # symbolic link
    S_IFSOCK = 0xC000  # socket file

About comparable for these; not as good for the actual permission bits, since there are three blocks of three bits. Python could manage without octal literals, as long as hex literals are available. (I don't support their *removal*, because that's completely unnecessary backward incompatibility; but if Python today didn't have octal support, I wouldn't support its addition.) But the decimal equivalents? No thank you.

    S_IFDIR  = 16384  # directory
    S_IFCHR  = 8192   # character device
    S_IFBLK  = 24756  # block device
    S_IFREG  = 32768  # regular file
    S_IFIFO  = 4096   # fifo (named pipe)
    S_IFLNK  = 40960  # symbolic link
    S_IFSOCK = 49152  # socket file

One of these is wrong. Which one? You know for certain that each of these values has at most two bits set. Can you read these? If you're familiar with the powers of two, you should have no trouble eyeballing the single-bit examples, but what about the others? We need hex constants for anything that involves bitwise manipulations.
Having binary constants is nice, but (like with octal) not strictly necessary; but we need at least one out of bin/oct/hex. (Also, 16L doesn't actually mean the integer 16 - it means the *long* integer 16, which is as different from 16 as 16.0 is.) ChrisA

On 29 August 2016 at 13:40, Brendan Barnwell <brenbarn@brenbarn.net> wrote:
Octal literals were on the Python 3 chopping block, with only two things saving them:

- *nix file permissions (i.e. the existing sysadmin user base)
- the proposal to switch to "0o" as the prefix

The addition of "0b" was to make bitwise operators easier to work with, rather than requiring folks to mentally convert between binary and hexadecimal just to figure out how to set a particular bit flag, with the ability to understand binary math being seen as an essential requirement for working with computers at the software development level (since it impacts so many things, directly and indirectly). Hexadecimal then sticks around as a way of more concisely writing binary literals.

However, the readability-as-a-general-purpose-language argument in the case of SI scaling factors goes as follows:

- exponential notation (both the scientific and engineering variants) falls into the same "required to understand computers" category as binary and hexadecimal notation
- for folks that have memorised the SI scaling factors, the engineering notation equivalents should be just as readable
- for folks that have not memorised the SI scaling factors, the engineering notation equivalents are *more* readable
- therefore, at the language level, this is a style guide recommendation to use engineering notation for quantitative literals over scientific notation (since engineering notation is easier to mentally convert to SI prefixes)

However, once we're talking about domain specific languages (like circuit design), rather than a general purpose programming language, then knowledge of the SI prefixes can be included in the assumed set of user knowledge, and made a language level feature.

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Mon, Aug 29, 2016 at 1:29 PM, Ken Kundert <python-ideas@shalmirane.com> wrote:
Because by focusing on the implementation details, we miss the big picture. We have already done that, and we ended up going down countless ratholes.
They're important ratholes though. Without digging into those questions, all you have is an emotive argument of "but we NEEEEEEED to support SI prefixes as integer suffixes!".
See my other email, with examples of bit flags. It's not too bad if you only ever work with a single bit at a time, but bit masks combine beautifully in binary, fairly cleanly in hex, and really badly in decimal. Hex is a great trade-off between clean bit handling and compact representation. (Octal is roughly the same trade-off, and in days of yore was the one obvious choice, but hex has overtaken it.)
Is more general, yes. If all you have is SI prefixes, you're badly scuppered. If all you have is exponential notation, you can do everything.
2. their way is bad because some uneducated person might see the numbers and not understand them,
Is, again, less general. It's a way of writing numbers that makes sense only in a VERY narrow area.
3. we already have way of representing numbers that we came up with in the '60s and we simply cannot do another,
False.
I'm not sure what you mean by "IO" here, but if you're writing a program that accepts text strings and prints text strings, it's free to do whatever it wants.
These do not seem like good reasons for not doing this.
Not worded the way you have them, no, because you've aimed for an extremely emotional argument instead of answering concrete questions like "where's the code that this would improve". Find some real-world code that would truly benefit from this. Show us how it's better.

Something that I don't think you've acknowledged is that the SI scaling markers are *prefixes to units*, not *suffixes to numbers*. That is to say, you don't have "132G" of a thing called a "pascal"; you have "132" of a thing called a "gigapascal". Outside of a unit-based system, SI prefixes don't really have meaning. I don't remember ever, in a scientific context, seeing a coefficient of friction listed as "30 milli-something"; it's always "0.03". So if unitless values are simply numbers, and Python's ints and floats are unitless, they won't see much benefit from prefixes-on-nothing. ChrisA

Sorry, I am trying very hard not to let my emotions show through, and instead provide answers, examples, and comments that are well reasoned and well supported. I do find it frustrating that I appear to be the only one involved in the conversation with a strong background in numerical computation, meaning that I have to carry one side of the argument without any support. It is especially frustrating when that background is used as a reason to discount my position. Let me try to make the case in an unemotional way.

It is hard to justify the need for SI scale factors being built into the language with an example, because it is relatively simple to do the conversion. For example, with built-in support for SI scale factors:

    h_line = 1.4204GHz
    print('hline = {:r}Hz'.format(h_line))

... and in Python today:

    from engfmt import Quantity
    h_line = Quantity('1.4204GHz')
    print('hline = {:q}'.format(h_line))
    h_line = float(h_line)

Not really much harder to use the library. This is very similar to the situation with octal numbers. With built-in support for octal numbers:

    S_IFREG = 0o100000  # regular file

Without built-in support for octal numbers:

    S_IFREG = int('100000', base=8)  # regular file

So just giving a simple example is not enough to see the importance of native support. The problem with using a library is that you always have to convert from SI scale factors as the number is input, and then convert back as the number is output. So you can spend a fair amount of effort converting to and from representations that support SI scale factors. Not a big deal if there are only a few, but it can be burdensome if there are many.

But the real benefit of building in a native capability is that it puts pressure on the rest of the ecosystem to also adopt the new way of representing real numbers. For example, now the interchange packages and formats (Pickle, YAML, etc.) need to come up with a way of passing the information without losing its essential character.
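To show what that boundary conversion looks like in practice, here is a rough sketch of the to-and-from code a library approach imposes at every input and output site. The helper names are mine and the formatting logic is deliberately simplified; engfmt's actual interface may differ:

```python
# Illustrative round-trip helpers, not engfmt's real interface.
_SCALE = {'T': 1e12, 'G': 1e9, 'M': 1e6, 'k': 1e3,
          'm': 1e-3, 'u': 1e-6, 'n': 1e-9, 'p': 1e-12}

def from_si(text):
    """'1.4204G' -> 1.4204e9 (conversion on every input)."""
    if text[-1] in _SCALE:
        return float(text[:-1]) * _SCALE[text[-1]]
    return float(text)

def to_si(value):
    """1.4204e9 -> '1.4204G' (conversion on every output)."""
    for suffix, scale in _SCALE.items():
        if 1 <= abs(value) / scale < 1000:
            return '{:g}{}'.format(value / scale, suffix)
    # Out-of-range values fall back to plain formatting.
    return '{:g}'.format(value)

h_line = from_si('1.4204G')
print('hline = {}Hz'.format(to_si(h_line)))
```

Each call site is small, but the calls accumulate at every IO boundary, which is the burden being described above.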
This in turn puts pressure on other languages to follow suit. It would also put pressure on documenting and formatting packages, such as Sphinx, Jinja, and matplotlib, to adapt. Now it becomes easier to generate clean documentation. Also the interactive environments, such as ipython, need to adapt. The more this occurs, the better life gets for scientists and engineers.
Yeah, this is why I suggested that we support the ability for users to specify units with the numbers, but it is not a hard and fast rule. Once you add support for SI scale factors, people find them so convenient that they tend to use them whether there are units or not. For example, it is common for circuit designers to specify the gain of an amplifier using SI scale factors even though gain is often unitless, e.g. gain=50k. Also, electrical engineers will often drop the units when they are obvious, especially if they are long. For example, it is common to see a resistance specified as 100k. When values are given in a table and all the values in a column have the same units, it is common to give numbers with scale factors but without units, to save space. -Ken

On Mon, Aug 29, 2016 at 7:08 PM, Ken Kundert <python-ideas@shalmirane.com> wrote:
An "emotional argument" doesn't necessarily mean that your emotions are governing everything - it's more that your line of reasoning is to play on other people's emotions, rather than on concrete data. Your primary argument has been "But think of all the scientific developers - don't you care about them??", without actually giving us code to work with. (In a recent post, you did at least give notes from a conversation with a lady of hard science, but without any of her code, we still can't evaluate the true benefit of unitless SI scaling.)
No no no; your background is not a reason to discount your position. However, on its own, it's insufficient justification for your position. Suppose I come to python-ideas and say "Hey, the MUD community would really benefit from a magic decoder that would use UTF-8 where possible, ISO-8859-1 as fall-back, and Windows-1252 for characters not in 8859-1". Apart from responding that 8859-1 is a complete subset of 1252, there's not really a lot that you could discuss about that proposal, unless I were to show you some of my code. I can tell you about the number of MUDs that I play, the number of MUD clients that I've written, and some stats from my MUD server, and say "The MUD community needs this support", but it's of little value compared to actual code. (For the record, a two-step decode of "UTF-8, fall back on 1252" is exactly what I do... in half a dozen lines of code. So this does NOT need to be implemented.) That's why I keep asking you for code examples. Real-world code, taken from important projects, that would be significantly improved by this proposal. It has to be Python 3 compatible (unless you reckon that this is the killer feature that will make people take the jump from 2.7), and it has to be enough of an improvement that its authors will be willing to drop support for <3.6 (which might be a trivial concern, eg if the author expects to be the only person running the code).
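For what it's worth, the two-step decode described above really is tiny. A reconstruction along the lines stated (my code, not the actual MUD client's):

```python
def decode_text(raw):
    """Decode bytes as UTF-8 where possible, falling back to
    Windows-1252 (whose printable range covers ISO-8859-1 text too).

    Note: a handful of bytes (e.g. 0x81) are undefined in cp1252,
    so a real client might add errors='replace' as a last resort.
    """
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        return raw.decode('cp1252')
```

Half a dozen lines, as claimed, which is exactly the argument being made: when a need can be met this cheaply in user code, it does not justify a language change.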
Maybe; or maybe you'd already be doing a lot of I/O work, and it's actually quite trivial to slap in one little function call at one call site, and magically apply it to everything you do. Without code, we can't know. (I sound like a broken record here.)
Ewwww, I doubt it. That would either mean a whole lot of interchange formats would need to have a whole new form of support. You didn't mention JSON, but that's a lot more common than Pickle; and JSON is unlikely to gain a feature like this unless ECMAScript does (because JSON is supposed to be a strict subset of JavaScript's Object Notation). Pickle - at least the one in Python - doesn't need any way to store non-semantic information, so unless you intend for the scale factor to be a fundamental part of the number, it won't need changes. (People don't manually edit pickle files the way they edit JSON files.) YAML might need to be enhanced. That's the only one I can think of. And it's a big fat MIGHT.
Eww eww eww. You're now predicating your benefit on a huge number of different groups gaining support for this. To what extent do they need to take notice of the SI scales on numbers? Input-only? Output? Optionally on output? How do you control this? Are SI-scaled numbers somehow different from raw numbers, or are they equivalent forms (like 1.1e3 and 1100.0)? If they're equivalent forms, how do you decide how to represent on output? Are all these questions to be answered globally, across all systems, or is it okay for one group to decide one thing and another another?
Gain of 50k, that makes reasonable sense. Resistance as 100k is a shorthand for "100 kiloohms", and it's still fundamentally a unit-bearing value. All of this would be fine if you were building a front-end that was designed *SOLELY* for electrical engineers. So maybe that's the best way. Fork IDLE or iPython and build the very best electrical engineering interactive Python; it doesn't matter, then, how crazy it is for everyone else. You can mess with stuff on the way in and on the way out, you can interpret numbers as unitless values despite being written as "100kPa", and you can figure out what to do in all the edge cases based on actual real-world usage. You'd have your own rules for backward compatibility, rather than being bound by Python's release model (18 months between feature improvements, and nothing gets dropped without a VERY good reason), so you could chop and change as you have need. The base language would still be Python, but it'd be so optimized for electrical engineers that you'd never want to go back to vanilla CPython 3.6. Sound doable? ChrisA

On 29.08.2016 11:37, Chris Angelico wrote:
There were no reasonable real-world code examples, taken from important projects, that would be significantly improved by underscores in numbers. Still, we got them, so your argument here is void. That's not different for this proposal. Sven

On Mon, Aug 29, 2016 at 1:55 PM, Sven R. Kunze <srkunze@mail.de> wrote:
There were no reasonable real-world code examples taken from important projects, that would be significantly improved by underscores in numbers.
I recall dozens of real-world examples that came up during the discussion, and have written very numerous such examples in code of my own. This is something that directly affects readability in code I write almost every day, and is a clear and obvious win. I taught examples *today* where I would have badly liked to have underscore separators, and the awkwardness was obvious to me and my students. Writing, e.g., `range(int(1e7))` feels contrived (but is usually the way I do it). Writing `range(10000000)` is nearly impossible to parse visually. In contrast, writing `range(10_000_000)` is immediately clear and obvious. None of those things can be said of SI units as Python literals. -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
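For concreteness, the three spellings compare like this under PEP 515 (Python 3.6+), all denoting the same value:

```python
# Three ways to spell ten million; PEP 515 underscores (3.6+)
# make the magnitude legible at a glance.
a = int(1e7)       # common workaround, but a float round-trip
b = 10000000       # hard to count the zeros
c = 10_000_000     # grouping mirrors how we read the number

assert a == b == c
```

The underscore form improves readability without adding any new semantics, which is the distinction being drawn against SI-suffix literals.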

On 30.08.2016 04:34, David Mertz wrote:
Huh? None of those things? I do think you exaggerate quite a lot here. If your real-world example works for underscores, it works for SI units and scales as well. I for one have no use for either, so having such a distance from the subject at hand, I don't see this as a compelling argument against/for his proposal. Sven

On 29 August 2016 at 19:08, Ken Kundert <python-ideas@shalmirane.com> wrote:
Also the interactive environments, such as ipython, need to adapt. The more this occurs, the better life gets for scientists and engineers.
This theory of change is backwards - we follow IPython and Project Jupyter when it comes to understanding what's a desirable UX for scientists (primarily) and engineers (somewhat), rather than the other way around.

(Ditto for SciPy and NumPy for the computational requirements side of things - in addition to the already referenced https://www.python.org/dev/peps/pep-0465/ for matrix multiplication, there's also https://www.python.org/dev/peps/pep-0357/ which defined the __index__ protocol, and https://www.python.org/dev/peps/pep-3118/ which defined a rich C-level protocol for shaped data export. Even before there was a PEP process, extended slicing and the Ellipsis literal were added for the benefit of folks writing multidimensional array indexing libraries.)

So if your aim is to make a "scientists & engineers will appreciate it" UX argument, then you're unlikely to gain much traction here if you haven't successfully made that argument in the Project Jupyter and/or SciPy ecosystems first - if there were a popular "%%siunits" cell magic, or a custom Project Jupyter kernel that added support for SI literals, we'd be having a very different discussion (and you wouldn't feel so alone in making the case for the feature).

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

I teach working scientists about numeric computing on a daily basis. There are a few special fields where Ken's ideas are the norm, at least in informal notation. The large majority of working scientists would find a syntax change like the one he proposes an annoyance and a nuisance. Alienating and confusing everyone who isn't a circuit designer is a bad goal. It's not going to happen in Python. If you really want this syntax, you need to use a different language, or maybe write a preprocessor that turns a slightly different language back into Python. On Aug 29, 2016 4:09 AM, "Ken Kundert" <python-ideas@shalmirane.com> wrote:
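A preprocessor of the kind suggested here is easy to prototype: a single regex pass could rewrite SI-suffixed literals into exponential notation before handing the text to Python. This is an illustrative sketch only; a real tool would have to avoid strings, comments, and identifier clashes, and the pattern here deliberately ignores suffixes glued to unit names:

```python
import re

# SI suffix letters mapped to decimal exponents ('u' for micro).
_EXP = {'y': -24, 'z': -21, 'a': -18, 'f': -15, 'p': -12, 'n': -9,
        'u': -6, 'm': -3, 'k': 3, 'M': 6, 'G': 9, 'T': 12,
        'P': 15, 'E': 18, 'Z': 21, 'Y': 24}

# A number followed immediately by a lone SI suffix letter.
_LITERAL = re.compile(r'\b(\d+(?:\.\d+)?)([yzafpnumkMGTPEZY])\b')

def preprocess(source):
    """Rewrite '1.4204G' as '1.4204e9', '100k' as '100e3', etc."""
    return _LITERAL.sub(
        lambda m: '{}e{}'.format(m.group(1), _EXP[m.group(2)]),
        source)
```

The output is ordinary Python source, so the host language needs no changes; this is the "slightly different language" route made concrete.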

On Aug 28 2016, Ken Kundert <python-ideas-jl/PDlM0qtzz1n+OaKNE4w@public.gmane.org> wrote:
I think you're making some incorrect assumptions here. Who, exactly, do you mean with "we" and "them"? I consider myself part of the scientific community and think your proposal is a bad idea, and Google finds some Python modules from you, but no prior CPython contributions...

Best,
-Nikolaus

--
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

»Time flies like an arrow, fruit flies like a Banana.«

Nikolaus,

I have belatedly realized that this kind of hyperbole is counterproductive. So let me back away from that statement and instead try to understand your reasons for not liking the proposal. Do you think there is no value in being able to naturally read and write numbers with SI scale factors from Python? Or is your issue with something about my proposal?

-Ken

On Mon, Aug 29, 2016 at 06:59:15PM -0700, Nikolaus Rath wrote:

On 30 August 2016 at 04:19, Ken Kundert <python-ideas@shalmirane.com> wrote:
Ken,

Answering these questions from my perspective (and thanks for taking note of the comments and toning down your delivery, by the way):

I have an issue with the way your proposal is vague about the relationship between SI scales and units - it's been mentioned a couple of times, but never adequately addressed, that scales are tightly linked to units (we routinely talk about kilometers and milligrams, but almost never about kilos or millis). There have been some strong (and justified, IMO) objections to adding units without giving them proper semantic meaning, and your response to that seems to be to talk for a while about scale factors in isolation, but then go straight back to examples using scaled units. It's hard to understand exactly what you're proposing when your examples don't match your suggestions.

If we assume you *are* simply talking about pure scales (k=1000, M=1000000 etc), then you haven't addressed the many suggestions of alternatives with anything more substantive than "well, I (and colleagues I know) prefer scale factors", plus some examples with scaled *units* again. Your comparisons tend to show your preferred approach in the best light, while using the least attractive alternative options. And there's almost no proper discussion of pros and cons. In short, you offer almost nothing in the way of objective arguments for your proposals.

You mention "reading and writing numbers with scale factors from Python". It's easy enough to do external IO with scale factors - you just read strings and parse them as you wish. A language syntax only affects internal constants, and it's not clear to me that the benefit is significant even then, as I'd expect (as a matter of good style) that any constant needing this type of syntax should be named anyway. Again, this isn't something you address.

You've offered no examples of real-world code in existing public projects that would be improved by your proposal.
While that's not always necessary to a successful proposal, it certainly makes it more compelling, and helps to confirm that a proposal isn't limited to "niche" areas.

So to summarise, I don't think you've made objective arguments for your proposal (your *subjective* enthusiasm for the proposal has never been in doubt), or addressed many of the comments that have already been made. To be honest, I don't think there's much chance of your proposal being accepted at this point in time. As Steven noted, Python tends not to be a leader in matters like this, and so the lack of mainstream prior art is probably sufficient to kill this proposal.

But for your own benefit (and the benefit of any future proposals you may make in this or other areas - please don't feel put off by the fact that this specific proposal has met with a lot of resistance) you might want to review the thread and consider what a PEP might look like for this discussion, and how you would have incorporated and responded to the objections raised here - https://www.python.org/dev/peps/pep-0001/#what-belongs-in-a-successful-pep is a good summary of the sort of things you should be looking at. There's no need to actually complete a PEP or post it - the proposal here hasn't reached a stage where a PEP is useful - but thinking about the PEP structure might help you understand the context a bit better.

I hope this helps,
Paul

On Aug 29 2016, Ken Kundert <python-ideas-jl/PDlM0qtzz1n+OaKNE4w@public.gmane.org> wrote:
* I think there is no value gained by being able to write 32.3M instead of 32.3e6. I think the second one is clear to everyone who uses SI prefixes, while the first one just introduces a lot of complexities. Most of them have been mentioned already:
  - no deducible ordering if one doesn't know the prefixes
  - potential for ambiguity with Exa
  - question about base 2 vs base 10, e.g. what do you expect to be stored in *size* if you read this: "size = 10M # we need that many bytes"
  - rather arbitrary naming ("M" and "m" vs "T" and "p")

* I think having SI *unit* support, "32.3 kN", would be nice, but only if there is a space between number and unit, and only if the unit actually gets attached to the number. But your proposal would result in 1km + 1µN == 2 being true.

Best,
-Nikolaus

--
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

»Time flies like an arrow, fruit flies like a Banana.«
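Nikolaus's base-2 vs base-10 point can be made concrete in two lines (illustrative only; the variable names are mine, not from the thread):

```python
# "10M" of bytes could plausibly mean either of these,
# and they differ by almost 5%.
si_mega = 10 * 10**6      # SI: M = 1e6  -> 10,000,000
binary_mega = 10 * 2**20  # memory-size convention -> 10,485,760
assert si_mega != binary_mega
```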

On 2016-08-29 02:44, Ken Kundert wrote: [snip]
For currency, it's usually "million" or "m"/"M", "billion" or "bn" (or maybe "b"/"B"), "trillion" (or maybe "tn" or "t"/"T"). Using a suffixed SI scale factor with a prefixed currency symbol is not that common, in my experience. [snip]
There's also "engineering notation", where the exponent is a multiple of 3. [snip]
I expect that octal and hexadecimal number support was there from the start. CPython is written in C and Python borrowed the notation. The binary notation was added in Python 2.6 and followed the same pattern as the hexadecimal notation. The octal notation of a leading "0" was later replaced with a clearer one that followed the same pattern. C had octal and hexadecimal from the start. (Actually, I'm not entirely sure about hexadecimal, octal being the preferred form, but if it wasn't there from the very start, it was an early addition.) C descends from BCPL, which had octal and hexadecimal, and BCPL dates from 1967. There are other languages too that had hexadecimal and octal. They've been around in programming languages for decades. How many languages have scale factors? Does Fortran? Not that I know of. [snip]

The reason why hexadecimal and octal are in general purpose languages and real numbers with SI scale factors are not is because languages are developed by computer scientists and not by scientists. I keep using SPICE and Verilog as examples of languages that support SI scale factors, and that is because they are the extremely rare cases where the languages were either developed or specified by end users and not by computer scientists. The reason why computer scientists tend to add hexadecimal and octal numbers to their languages and not SI scale factors is that they use hexadecimal and octal numbers and, as we have seen in this discussion, are rather unfamiliar with real numbers with SI scale factors. It is easy for them to justify adding hex because they know from personal experience that it is useful, but if you don't use widely scaled real numbers day in and day out it is hard to understand just how tedious exponential notation is and how useful it would be to use SI scale factors.

I didn't follow the previous discussion so far, so excuse me if I repeat something somebody already mentioned. But these are intriguing points you made here. On 29.08.2016 09:31, Ken Kundert wrote:
I didn't know that THERE ARE languages that already feature SI factors. You could be right about their development.

I for one wouldn't have an issue with this being in Python, for the following reasons:

1) I wouldn't use it as I don't have the use-cases right now
2) if I needed to read such code, it wouldn't hurt my reading experience as I am used to SI
3) there will be two classes of code here: code that has use for it and thus uses it quite extensively, and code that doesn't; depending on where you work you will encounter this feature or you won't even know it exists (this is true for many features in Python, which is a good thing: each domain should use what is the best tool for them)

The biggest issue I have is the following: SI scale factors without SI units do not make much sense, I think (especially considering those syntax changes). So the potential, if any, can only be illustrated in combination with them. But Python does not feature any SI units so far, as those are provided by external packages. If you can resolve that I am +1 on this proposal, but otherwise just +0.

Sven

PS: If I think about it this way, I might have a use-case in a small side-project.

On Sun, Aug 28, 2016 at 6:44 PM, Ken Kundert <python-ideas@shalmirane.com> wrote:
These are not equal. 780kpc is a *number with units*. 7.8e+05 == 780000 is a *unitless number*. All the numbers on the right hand side above have no units, so I can't tell which are pc or W or m or $. It's asking for trouble to go halfway in representing units. On the left hand side, 780kpc + 108MPa is invalid while 780kpc + 53pm is valid. On the right hand side, sums of any two numbers are valid, as they would be with the unitless SI prefixes.

So if you want to solve this problem, write a module that supports units. For example, m(780, 'kpc') == m('780kpc') == m(780, kpc), and it's legal to write m(780, kpc) + m(53, pm), but an exception gets raised if you write m(2, kW) + m(3, kh) instead of m(2, kW) * m(3, kh) == m(6, MWh).

In fact, several people have already done that. Here are three I found in < 1 minute of searching:

https://pint.readthedocs.io/en/0.7.2/
https://pypi.python.org/pypi/units/
https://python-measurement.readthedocs.io/en/latest/

Should support for this ever get added to the core language? I doubt it. But if one of these modules becomes enormously popular and semi-standard, you never know. I think you're much more likely to get this into your Python code by way of a preprocessor.

--- Bruce
Check out my puzzle book and get it free here:
http://J.mp/ingToConclusionsFree (available on iOS)
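The m() API Bruce describes can be sketched in a few lines of stdlib-only Python. This is a hypothetical sketch following his examples, not the design of pint or any real library; the unit table and class names are mine:

```python
# Each unit maps to a (dimension, factor-to-SI-base-unit) pair.
_UNITS = {
    'kpc': ('length', 3.0857e19),  # kiloparsec, in meters
    'pm':  ('length', 1e-12),      # picometer, in meters
    'kW':  ('power',  1e3),        # kilowatt, in watts
    'kh':  ('time',   3.6e6),      # kilohour, in seconds
}

class Quantity:
    def __init__(self, si_value, dim):
        self.si_value, self.dim = si_value, dim

    def __add__(self, other):
        # Adding incompatible dimensions raises, as Bruce suggests.
        if self.dim != other.dim:
            raise TypeError('cannot add %s and %s' % (self.dim, other.dim))
        return Quantity(self.si_value + other.si_value, self.dim)

    def __mul__(self, other):
        # Multiplication is always legal; dimensions combine.
        return Quantity(self.si_value * other.si_value,
                        '%s*%s' % (self.dim, other.dim))

def m(value, unit):
    dim, factor = _UNITS[unit]
    return Quantity(value * factor, dim)
```

With this sketch, m(780, 'kpc') + m(53, 'pm') is legal, m(2, 'kW') + m(3, 'kh') raises TypeError, and m(2, 'kW') * m(3, 'kh') carries 2.16e10 joules, i.e. 6 MWh.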

On 2016-08-28 18:44, Ken Kundert wrote:
You've continually repeated this assertion, but I don't buy it. For the general case, exponential notation is easier to read because you can always see exactly what the exponent is as a number. To read SI units, you have to know all the SI prefixes. This may well be common within scientific communities, but to say that it is "easier" is really a bit much.

The same is true of "harder to type". "kpc" is three characters; e+5 is also three (note that you don't need to write e+05), and one of those is a number that transparently indicates how many places to move the decimal, whereas all of the letters in "kpc" are opaque unless you already know what the number is meant to represent.

If you have concrete evidence (e.g., from actual user experience research) showing that it is across-the-board "easier" to read or type SI prefixes than exponential notation, that would be good to see. In the absence of that, these assertions are just doubling down on the same initial claim, namely that adding SI units to Python would make things more convenient *for those using it to compute with literally-entered quantities in SI units*. I quite agree that that is likely true, but to my mind that is not enough to justify the disruption of adding it at the syntactic level. (Unless, again, you have some actual evidence showing that this particular kind of use of numeric literals occurs in a large proportion of Python code.)
My current opinion is no. There are lots of things that are common. (Should we include a spellchecker in Python because many people frequently make spelling errors?) The fact that SI units are de rigueur in the physical science community isn't enough. I would want to see some actual attempt to quantify how much benefit there would be in the PYTHON community (which of course includes, but is not limited to, those using Python for physical-science computations).

--
Brendan Barnwell
"Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown

On Sun, Aug 28, 2016 at 08:26:38PM -0700, Brendan Barnwell wrote:
For the record, I don't know what kpc might mean. "kilo pico speed of light"? So I looked it up using units, and it is kilo-parsecs. That demonstrates that unless your audience is intimately familiar with the domain you are working with, adding units (especially units that aren't actually used for anything) adds confusion.

Python is not a specialist application targeted at a single domain. It is a general purpose programming language where you can expect a lot of cross-domain people (e.g. a system administrator asked to hack on a script in a domain they know nothing about).
You don't have to write e+5 either, just e5 is sufficient.
I completely believe Ken that within a single tightly focussed user community, using their expected conventions (including SI prefixes) works really well. But Python users do not belong to a single tightly focussed user community. -- Steve

On Mon, Aug 29, 2016 at 01:45:20PM +1000, Steven D'Aprano wrote:
I talked to an astrophysicist about your comments, and what she said was:

1. She would love it if Python had built-in support for real numbers with SI scale factors.
2. I told her about my library for reading and writing numbers with SI scale factors, and she was much less enthusiastic because using it would require convincing the rest of the group, which would be too much effort.
3. She was amused by the "kilo pico speed of light" comment, but she was adamant that the fact that you, or some system administrator, does not understand what kpc means has absolutely no effect on her desire to use SI scale factors. Her comment: I did not write it for him.
4. She pointed out that the software she writes and uses is intended either for herself or other astrophysicists. No system administrators involved.
You think that Python is only used by generalists? That is silly. Have you seen SciPy? If you think that, take a look at Casa (casaguides.nrao.edu). It is written by astrophysicists for astrophysicists doing observations on radio telescope arrays. That is pretty specialized. -Ken

On 2016-08-29 00:07, Ken Kundert wrote:
I think you misunderstand. My position (reiterated by the text you quote from Steven D'Aprano) is not that Python is used only by generalists. It is that we shouldn't change Python in a way that ONLY helps specialists.

--
Brendan Barnwell
"Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown

On Mon, Aug 29, 2016 at 12:18:02AM -0700, Brendan Barnwell wrote:
But surely we should consider changing Python if the change benefits a wide variety of specialists, especially if the change is small and fits cleanly into the language. In this case, our specialists come from most of the disciplines of science and engineering. That is a pretty big group. -Ken

Chris,

I was not able to get an astrophysics example, but I do have a reasonable one that performs a spectral analysis of the output of an analog to digital converter, something radio astronomers are known to do. I am including the code, but it requires a rather large data file to run, which I will not include. The code uses my 'engfmt' library from PyPI to perform conversion to SI form. In this example, there is no need for conversion from SI form.

    #!/usr/bin/env python3

    import numpy as np
    from numpy.fft import fft, fftfreq, fftshift
    import matplotlib as mpl
    mpl.use('SVG')
    from matplotlib.ticker import FuncFormatter
    import matplotlib.pyplot as pl
    from engfmt import Quantity, set_preferences
    set_preferences(spacer=' ')

    def mag(spectrum):
        return np.absolute(spectrum)

    def freq_fmt(val, pos):
        return Quantity(val, 'Hz').to_eng()
    freq_formatter = FuncFormatter(freq_fmt)

    def volt_fmt(val, pos):
        return Quantity(val, 'V').to_eng()
    volt_formatter = FuncFormatter(volt_fmt)

    data = np.fromfile('delta-sigma.smpl', sep=' ')
    time, wave = data.reshape((2, len(data)//2), order='F')
    timestep = time[1] - time[0]
    nonperiodicity = wave[-1] - wave[0]
    period = timestep * len(time)
    print('timestep = {}'.format(Quantity(timestep, 's')))
    print('nonperiodicity = {}'.format(Quantity(nonperiodicity, 'V')))
    print('timepoints = {}'.format(len(time)))
    print('freq resolution = {}'.format(Quantity(1/period, 'Hz')))

    window = np.kaiser(len(time), 11)/0.37
        # beta=11 corresponds to alpha=3.5 (beta = pi*alpha)
        # processing gain with alpha=3.5 is 0.37
    #window = 1
    windowed = window*wave

    spectrum = 2*fftshift(fft(windowed))/len(time)
    freq = fftshift(fftfreq(len(wave), timestep))

    fig = pl.figure()
    ax = fig.add_subplot(111)
    ax.plot(freq, mag(spectrum))
    ax.set_yscale('log')
    ax.xaxis.set_major_formatter(freq_formatter)
    ax.yaxis.set_major_formatter(volt_formatter)
    pl.savefig('spectrum.svg')
    ax.set_xlim((0, 1e6))
    pl.savefig('spectrum-zoomed.svg')

When run, this program prints the following diagnostics to stdout:

    timestep = 20 ns
    nonperiodicity = 2.3 pV
    timepoints = 27994
    freq resolution = 1.7861 kHz

It also generates two SVG files. I have converted one to PNG and attached it.

A few comments:

1. The data in the input file ('delta-sigma.smpl') has low dynamic range and is machine generated, and not really meant for direct human consumption. As such, it does not benefit from using SI scale factors. But there are certainly cases where the data has both high dynamic range and is intended for people to examine it directly. In those cases it would be very helpful if NumPy were able to directly read the file. As the language exists today, I would need to read the file myself, manually convert it, and feed the result to NumPy.

2. Many of the numbers that are output do have high dynamic range and are intended to be consumed directly by humans. These benefit from using SI scale factors. For example, the 'freq resolution' can vary from Hz to MHz and 'nonperiodicity' can vary from fV to mV.

3. Extra effort was expended to make the axis labels on the graph use SI scale factors so as to make the results 'publication quality'. My hope is that if Python accepted SI literals directly, then both NumPy and MatPlotLib would also be extended to accept/use these formats directly, eliminating the need for me to do the conversions and manage the axes.

-Ken

On Mon, Aug 29, 2016 at 06:02:29PM +1000, Chris Angelico wrote:
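The "read the file myself, manually convert it" step Ken mentions can be sketched with a small stdlib-only parser. This is a hypothetical helper (not from engfmt or NumPy), assuming base-10 SI factors and discarding any trailing unit:

```python
import re

# SI scale factors (base 10). Note the 'E' (exa) ambiguity the thread
# discusses: '8E9' parses as exponential notation, but '8E' as 8 exa.
_SCALE = {
    'Y': 1e24, 'Z': 1e21, 'E': 1e18, 'P': 1e15, 'T': 1e12, 'G': 1e9,
    'M': 1e6, 'k': 1e3, '': 1.0, 'm': 1e-3, 'u': 1e-6, 'n': 1e-9,
    'p': 1e-12, 'f': 1e-15, 'a': 1e-18, 'z': 1e-21, 'y': 1e-24,
}

_NUM = re.compile(r'([-+]?[0-9.]+(?:[eE][-+]?\d+)?)\s*'
                  r'([YZEPTGMkmunpfazy]?)([A-Za-z]*)')

def from_si(text):
    """Parse a number with an SI scale factor, e.g. '780kpc' -> 780000.0."""
    match = _NUM.fullmatch(text.strip())
    if not match:
        raise ValueError('cannot parse %r' % text)
    mantissa, prefix, _unit = match.groups()  # the unit is discarded here
    return float(mantissa) * _SCALE[prefix]
```

With something like this, a column of '20 ns' / '1.7861 kHz' style values could be converted to floats before handing them to NumPy.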

On Mon, Aug 29, 2016 at 9:07 AM, Ken Kundert <python-ideas@shalmirane.com> wrote:
Astropy also has a very powerful units package - originally derived from pyunit I think, but long since diverged and grown:

http://docs.astropy.org/en/stable/units/index.html

It was originally developed especially for astronomy/astrophysics use and has some pre-defined units that many other packages don't have, as well as support for logarithmic units like decibel and optional (and customizable) unit equivalences (e.g. frequency/wavelength or flux/power). That said, its power extends beyond astronomy, and I heard through last week's EuroScipy that even some biology people have been using it. There's been some (informal) talk about splitting it out from Astropy into a stand-alone package. This is tricky since almost everything in Astropy has been built around it (dimensional calculations are always used where possible), but not impossible.

One of the other big advantages of astropy.units is the Quantity class representing scale+dimension values. This is deeply integrated into Numpy so that units can be attached to Numpy arrays, and all Numpy ufuncs can operate on them in a dimensionally meaningful way. The needs for this have driven a number of recent features in Numpy. This is work that, unfortunately, could never be integrated into the Python stdlib.

On Mon, Aug 29, 2016 at 3:05 PM, Erik Bray <erik.m.bray@gmail.com> wrote:
I'll also add that syntactic support for units has rarely been an issue in Astropy. The existing algebraic rules for units work fine with Python's existing order of operations. It can be *nice* to be able to write "1m" instead of "1 * m", but ultimately it doesn't add much for clarity (and if really desired it could be handled with a preparser - something I've considered adding for Astropy sources via codecs).

Best,
Erik

I just want to add, as an astrophysicist who uses astropy.units: the astropy solution is pretty great, and I don’t mind the library overhead. I’d much rather have astropy.units, which does dimensional analysis as well as handling SI prefixes, for 2 reasons:

1. I don’t normally see or use SI prefixes without units, so bare SI prefixes are fairly worthless to me as a scientist. IF the units are going to be there, I’d much rather have a library that does a good job at dimensional analysis, and has taken my domain-specific concerns into account, for reasons fairly well covered in this thread.

2. I don’t find it cumbersome at all to use something like astropy.units, which provides both the prefix and units for my code on input and output. The added syntactic weight of a single import, plus multiplication, is really not that big a burden, and makes it both clear what I am trying to write, and easy for the library to maintain this meaning when I use the variable later. e.g.

    from astropy.units import *
    distance = 10 * km

If that multiplication symbol is really too much to handle, then I’d rather see Python support implicit multiplication as suggested above (i.e. “10 km” is parsed as “10 * km”) and domain-specific libraries can support SI prefixes and units.

~ Alex

Erik,

One aspect of astropy.units that differs significantly from what I am proposing is that with astropy.units a user would explicitly specify the scale factor along with the units, and that scale factor would not change even if the value became very large or very small. For example:

    >>> from astropy import units as u
    >>> d_andromeda = 7.8e5 * u.parsec
    >>> print(d_andromeda)
    780000.0 pc
    >>> d_sun = 93e6*u.imperial.mile
    >>> print(d_sun.to(u.parsec))
    4.850441695494146e-06 pc
    >>> print(d_andromeda.to(u.kpc))
    780.0 kpc
    >>> print(d_sun.to(u.kpc))
    4.850441695494146e-09 kpc

I can see where this can be helpful at times, but it kind of goes against the spirit of SI scale factors, where you are generally expected to 'normalize' the scale factor (use the scale factor that results in the digits presented before the decimal point falling between 1 and 999). So I would expect:

    d_andromeda = 780 kpc
    d_sun = 4.8504 upc

Is the normalization available in astropy.units and I just did not find it? Is there some reason not to provide the normalization?

It seems to me that pre-specifying the scale factor might be preferred if one is generating data for a table and all the magnitudes of the values are known in advance to within 2-3 orders of magnitude. It also seems to me that if these assumptions were not true, then normalizing the scale factors would generally be preferred. Do you believe that?

-Ken

On Mon, Aug 29, 2016 at 03:05:50PM +0200, Erik Bray wrote:
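The normalization Ken describes (mantissa between 1 and 999) fits in a few lines of stdlib Python. This is a hypothetical helper, not part of astropy.units:

```python
import math

# SI prefixes from yocto (1e-24) to yotta (1e24), in steps of 1e3.
_PREFIXES = ['y', 'z', 'a', 'f', 'p', 'n', 'u', 'm', '',
             'k', 'M', 'G', 'T', 'P', 'E', 'Z', 'Y']

def normalize(value, unit=''):
    """Render value with the SI prefix that puts the mantissa in [1, 1000)."""
    if value == 0:
        return '0 ' + unit
    exp = math.floor(math.log10(abs(value)) / 3)
    exp = max(-8, min(8, exp))  # clamp to the prefixes we have
    mantissa = value / 10**(3 * exp)
    return '%.4g %s%s' % (mantissa, _PREFIXES[exp + 8], unit)
```

Applied to the values above, normalize(7.8e5, 'pc') gives '780 kpc' and normalize(4.850441695494146e-06, 'pc') gives '4.85 upc'.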

On 30 August 2016 at 13:48, Ken Kundert <python-ideas@shalmirane.com> wrote:
The "imperial.mile" example here highlights one key benefit that expression based approaches enjoy over dedicated syntax: easy access to Python's existing namespace features. As a quick implementation sketch, consider something like:
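Nick's sketch did not survive in this archive, so what follows is a guess at its shape (purely illustrative): a namespace object where the first attribute selects the SI prefix and the second names the unit.

```python
# Illustrative guess at the sketch: si.k.pc reads as "kilo-parsecs" but
# evaluates to a bare scale factor. A dimensional-analysis version would
# return a Quantity here instead of a float.
class _Scaled:
    def __init__(self, factor):
        self._factor = factor

    def __getattr__(self, unit):
        # The unit name is accepted but carries no semantics in this sketch.
        return self._factor

class _SI:
    _prefixes = {'k': 1e3, 'M': 1e6, 'G': 1e9, 'T': 1e12,
                 'm': 1e-3, 'u': 1e-6, 'n': 1e-9, 'p': 1e-12}

    def __getattr__(self, prefix):
        return _Scaled(self._prefixes[prefix])

si = _SI()

d_andromeda = 780 * si.k.pc   # 780000.0
```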
You could also relatively easily adapt that such that there was only one level of lookup, and you could write the examples without the second dot (you'd just need to do some parsing of the key value in __getattr__ to separate the SI prefix from the nominal units).

One particular benefit of this kind of approach is that you automatically avoid the "E" ambiguity problem, since there's nothing wrong with "si.E" from Python's perspective. You also gain an easy hook to attach interactive help: "help(si)" (or si? in IPython terms). Expanding out to full dimensional analysis with something like astropy.units also becomes relatively straightforward, just by changing the kind of value that __getattr__ returns.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
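The one-level adaptation Nick mentions might look like this. The prefix parsing is deliberately naive (e.g. si.pc would wrongly split as pico + 'c'; a real version needs a unit table), but it shows the __getattr__ hook he describes:

```python
_PREFIXES = {'k': 1e3, 'M': 1e6, 'G': 1e9, 'T': 1e12, 'E': 1e18,
             'm': 1e-3, 'u': 1e-6, 'n': 1e-9, 'p': 1e-12}

class _SI:
    def __getattr__(self, name):
        # Naive split: a known prefix letter followed by a unit name.
        # Note si.E is an ordinary attribute lookup, so the "E" exponent
        # ambiguity disappears.
        if len(name) > 1 and name[0] in _PREFIXES:
            return _PREFIXES[name[0]]  # unit part name[1:] ignored here
        return _PREFIXES.get(name, 1.0)  # bare prefix or bare unit

si = _SI()
```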

On Mon, Aug 29, 2016 at 08:48:55PM -0700, Ken Kundert wrote:
Let me see if I get this straight... you *explicitly* asked for the distance to the sun in kpc (kiloparsecs), but you expected a result in µpc (microparsecs)? When you ask the waiter for a short black, do you get upset that he doesn't bring you a latte with soy milk? *wink*

I can see that such a normalising function would be useful, but I don't think it should be the default. (If I ask for millimetres, I want millimetres, not gigametres.) I've written and used code like that for bytes, and it makes sense to apply it to other measurement units. But only if the caller requests normalisation, never by default.

I don't think there is any such general expectation that values should be normalised in that way, and certainly not that your conversion program should automatically do it for you. For example, see this list of long-lived radioactive isotopes:

http://w.astro.berkeley.edu/~dperley/areopagus/isotopetable.html

Values above 650,000,000,000 (650e9) years are shown in "scientific format", not "engineering format", e.g. Selenium-82 is given as 1.1 x 10^20 rather than 110 x 10^18. Likewise:

http://www.nist.gov/pml/data/halflife-html.cfm

displays a range of units (minutes, hours, days) with the base value ranging up to over ten thousand, e.g. Ti-44 is shown as 22154 ± 456 days. This is NIST, which makes it pretty official. I don't think there's any general expectation that values should be shown in the range 1 to 999. (Perhaps in certain specialist areas.)

--
Steve

Steve,

Actually I initially asked for the distances in parsecs and was expecting that they would be presented in a convenient format. So, to frame it in terms of your analogy, I ordered a short black and became upset when I was delivered 8oz of coffee in a 55 gallon drum.

This seems to be one of those unstated assumptions that have caused confusion in these discussions. Sometimes you want to fix the prefix, sometimes you don't. For example, the bel (B) is a unit of measure for ratios, but we never use it directly, we always use decibels (dB). Nobody uses mB or kB or even B, it is always dB. But with other units we do use the scale factors and we do tend to normalize the presentation. For example, nobody says that Usain Bolt won the 100000mm dash, or the 0.1km dash. Similarly, when people refer to the length of the Olympic road race in Rio, they say 56km, not 56000m.

This is really only an issue with output. What I am suggesting is adding support for the second case into stdlib. For example:

    >>> print('Attenuation = {:.1f}dB at {:r}m.'.format(-13.7, 50e3))
    Attenuation = -13.7dB at 50km.

-Ken

On Tue, Aug 30, 2016 at 11:41:10PM +1000, Steven D'Aprano wrote:
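Until stdlib grows such a format code, the behavior can be sketched with a string.Formatter subclass. The 'r' code itself is Ken's hypothetical; str.format does not support it today:

```python
import math
import string

_PREFIXES = ['y', 'z', 'a', 'f', 'p', 'n', 'u', 'm', '',
             'k', 'M', 'G', 'T', 'P', 'E', 'Z', 'Y']

class SIFormatter(string.Formatter):
    """Adds a hypothetical 'r' format code that normalizes the SI prefix."""
    def format_field(self, value, spec):
        if spec == 'r':
            if value == 0:
                return '0'
            exp = max(-8, min(8, math.floor(math.log10(abs(value)) / 3)))
            return '%g%s' % (value / 10**(3 * exp), _PREFIXES[exp + 8])
        # Everything else falls through to the normal format() machinery.
        return super().format_field(value, spec)

fmt = SIFormatter().format
```

With this, fmt('Attenuation = {:.1f}dB at {:r}m.', -13.7, 50e3) produces the output Ken shows.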

On Tue, Aug 30, 2016 at 11:48 AM, Ken Kundert <python-ideas@shalmirane.com> wrote:
[...] Similarly when people refer to the length of the Olympic road race in Rio, they say 56km, not 56000m.
However I can't help but point out that if I said the distance to the sun is 149.6 Gm, most people would do a double-take.
This is really only an issue with output.
So maybe the proposal should be toned down to just a way to request SI units when formatting numbers? -- --Guido van Rossum (python.org/~guido)

Guido, I am in the process of summarizing the discussion as a way of wrapping this up. As part of that I will be making a proposal that I think has a chance of being accepted, and it will largely be what you suggest. -Ken On Tue, Aug 30, 2016 at 11:59:19AM -0700, Guido van Rossum wrote:

On Tue, Aug 30, 2016 at 5:48 AM, Ken Kundert <python-ideas@shalmirane.com> wrote:
Hi Ken, I see what you're getting at, and that's a good idea. There's also nothing in the current implementation preventing it, and I think I'll even suggest this to Astropy (with proper attribution)! I think there are reasons not to always do this, but it's a nice option to have. Point being nothing about this particular feature requires special support from the language, unless I'm missing something obvious. And given that Astropy (or any other units library) is third-party chances are a feature like this will land in place a lot faster than it has any chance of showing up in Python :) Best, Erik

It has been pointed out to me that the above comes off as being condescending towards Steven, system administrators and language developers in general. For this I am profoundly sorry. It was not my intent. My only point was that the output of these numerical programs are often so highly specialized that only the authors and their peers understand it. Let me go further in saying that if anything I have said in this discussion has come off as critical or insulting please know that that was not my intent. I have tremendous respect for what you all have accomplished and I am extremely appreciative of all the feedback and help you have given me. -Ken

On Mon, Aug 29, 2016 at 04:24:42PM -0700, Ken Kundert wrote:
No offense taken Ken! I completely understand what your astrophysicist friend means, and I don't expect that she should write code for me. But we have to consider code written for her, and me, and you, and system administrators, children learning their first language, Java gurus, Haskell experts, people whose only other language was BASIC in 1980, animators, scientists, web developers, and many, many other disparate groups of people. We have to do it without breaking backwards compatibility. And somehow we have to try to balance all those different needs without compromising the essential "Pythonicity" of the language.

The culture of Python is very conservative. I don't know of any features in Python that haven't come from some other language. Sometimes, like significant indentation, it was only a single language at the time that Python copied the feature. Sometimes a feature is not accepted unless it is in widespread use. It's a good sign that unit tracking is (slowly) becoming a language feature, like in F#, but I think you doomed your proposal as soon as you said (paraphrasing) "no other language does this, Python should lead the way and blaze this trail". (That's actually not the case, but when the prior art is niche languages like F# and Frink and calculator languages like RPL, it was always going to be a tough sell.)

Sometimes it just means that the time is not right for a new feature. The ternary if operator was resisted for many years until a compelling reason to add it was found, then it was accepted rapidly. Maybe the time will never be right: nearly everyone agrees that there is no reason to actively avoid having multi-statement anonymous lambda functions, if only we can find the right syntax. But nobody has found a syntax that isn't ambiguous, or goes against the style of Python, or requires changes to the parser that are unacceptable for other reasons.
Personally, I think that your proposal has a lot of merit; it's just the details that I don't like. Absent an excellent reason why it MUST be a language feature, it should stay in libraries, where people are free to experiment more freely, or in projects like IPython and Sage, which can explore their own interactive interpreters that add features Python the language can't.

And maybe, in Python 3.7 or 3.9 or 4.9, somebody will come up with the right proposal, or the right syntax, or notice that (let's imagine) Javascript or C++ has done it and the world didn't end, and Python will get unit tracking and/or multiplicative scaling factors as a language feature and you'll be vindicated. Personally, I hope it does. But it has to be done right, and I'm not convinced your proposal is the right way. So until then, I'm happy to stick to it being in libraries.

But most importantly, thanks for caring about this!

--
Steve

On 29 August 2016 at 11:44, Ken Kundert <python-ideas@shalmirane.com> wrote:
A better comparison here would be to engineering notation with comments stating the units, since Python doesn't restrict the mantissa to a single integer digit (unlike strict scientific notation):

    780kpc -> 780e3   # parsecs
    108MPa -> 108e6   # pascals
    600TW  -> 600e12  # watts
    3.2Gb  -> 3.2e9   # base pairs
    53pm   -> 53e-12  # meters
    $8G    -> 8e9     # dollars

The fundamental requirements for readable code are:

- code authors want their code to be readable
- the language makes readable code possible

So this starts to look like a style guide recommendation:

1. use engineering notation rather than scientific notation
2. annotate your literals with units if they're not implied by context

I find the pressure example a particularly interesting one, as I don't know any meteorologists who work with pascals directly - they work with hectopascals or kilopascals instead. That would make the second example more likely to be one of:

    108e3  # kilopascals
    1.08e6 # hectopascals

Similarly, depending on what you're doing (and this gets into the "natural unit of work" concept David Mertz raised), your base unit of mass may be micrograms, milligrams, grams, kilograms, or tonnes, and engineering notation lets you freely shift those scaling factors between your literals and your (implied or explicit) units, while native SI scaling would be very confusing if you're working with anything other than the base SI unit.

Accordingly, I'm starting to wonder if a better way for us to go here might be to finally introduce the occasionally-discussed-but-never-written Informational PEP that spells out "reserved syntax for Python supersets", where we carve out certain things we plan *NOT* to do with the base language, so folks writing Python supersets (e.g. for electronics design, data analysis or code generation) can use that syntax without needing to worry about future versions of Python potentially treading on their toes. 
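The style recommendation above can be sketched directly in today's Python (the constant names here are invented for illustration):

```python
# Engineering notation plus unit comments, per the style suggestion above.
ANDROMEDA_DISTANCE = 780e3   # parsecs
TRENCH_PRESSURE    = 108e6   # pascals
HURRICANE_POWER    = 600e12  # watts

# The scale factor can shift between the literal and the implied unit:
TRENCH_PRESSURE_KPA = 108e3  # kilopascals (same quantity as above)
assert TRENCH_PRESSURE_KPA * 1e3 == TRENCH_PRESSURE
```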
Specifically, in addition to this SI scaling idea, I'm thinking we could document:

- the Cython extensions for C code generation in .pyx files (cdef, ctypedef, cimport, nogil, NULL)
- the IPython extensions for cell magics and shell invocation (unary '%', unary '!'), and maybe their help syntax (postfix '?')

The reason I think this may be worth doing is that some of these ideas are ones that only make sense *given a suitably constrained domain*. Python supersets like Cython and IPython get to constrain their target domain in a way that makes these extensions appropriate there in a way that wouldn't be appropriate at the level of the base language, but we can still be explicit at the base language level that we're not doing certain things because we're delegating them to a tool with a more focused target audience.

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 8/28/2016 9:44 PM, Ken Kundert wrote:
The way the scientific and engineering communities predominately write real numbers is by using SI scale factors.
I don't believe it, not with naked scale factors as you have proposed. I have worked in science and I never saw naked scale factors until this proposal. The scale factors are usually attached to units.
The scale factor is part of the unit, and people now learn this in grade school, I presume.
These are all scaled units and to me not relevant to the proposed addition of scale factors without units. At this point I quit reading. -- Terry Jan Reedy

On 29/08/2016 02:44, Ken Kundert wrote:
No, no, no, if the people who provide this http://www.scipy.org/ can do without it. Now would you please be kind enough to give up with this dead horse before I take a ride to the Clifton Suspension Bridge or Beachy Head, whichever is closest. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence

Note that the Sage computer algebra system uses Python with some syntactic changes implemented by a "pre-parser". The current proposal could be implemented in a similar way and then integrated into, say, IPython. If it proved to be wildly popular, that would make a stronger case for incorporation into the core. Stephan On 29 Aug. 2016 at 2:12 p.m., "Mark Lawrence via Python-ideas" < python-ideas@python.org> wrote:

On 29/08/2016 13:35, Stephan Houben wrote:
As iPython is a core part of scipy, which I linked above, why would the developers want to incorporate this suggestion? I'd have also thought that if this idea was to be "wildly popular" it would have been done years ago. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence

On 29 August 2016 at 22:55, Mark Lawrence via Python-ideas <python-ideas@python.org> wrote:
While "If this was a good idea, it would have been done years ago" is a useful rule of thumb, it's also useful to look for ways to test that heuristic to see if there were actually just other incidental barriers in the way of broader adoption. That's particularly so for cases like this one, where a common practice in a handful of domains has failed to make the leap into the broader computing context of general purpose programming.

One of the nice things about IPython for this kind of experimentation is that it's *significantly* more pluggable than the default interpreter (where you can do some interesting things with import hooks, but it's hard to change the default REPL). That means it's easier for people to try out ideas as code pre-processors, rather than as full Python implementations (in this case, something that translated SI unit suffixes into runtime scaling factors).

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
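As a rough illustration of the kind of pre-processor described above, here is a toy translator (an assumption-laden sketch, not IPython's actual API; a real tool would use a proper tokenizer rather than a regex, and would have to handle strings, comments, and identifiers):

```python
import re

# Toy pre-processor: rewrite SI-suffixed numeric literals (e.g. 2.5n,
# 780k) into plain exponential notation before the source reaches
# Python. Deliberately conservative: it only matches a number
# immediately followed by a lone prefix letter at a word boundary.
_SCALE = {'k': 'e3', 'M': 'e6', 'G': 'e9', 'T': 'e12',
          'm': 'e-3', 'u': 'e-6', 'n': 'e-9', 'p': 'e-12', 'f': 'e-15'}

_LITERAL = re.compile(r'\b(\d+\.?\d*)([kMGTmunpf])\b')

def preprocess(source):
    """Translate SI-suffixed literals to exponential notation."""
    return _LITERAL.sub(lambda m: m.group(1) + _SCALE[m.group(2)], source)

print(preprocess("C = 1.4p; R = 100k"))   # -> C = 1.4e-12; R = 100e3
```

Note that ordinary exponential literals pass through untouched, since `e` is not a recognised suffix.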

On Mon, Aug 29, 2016 at 10:55 PM, Mark Lawrence via Python-ideas <python-ideas@python.org> wrote:
I'd have also thought that if this idea was to be "wildly popular" it would have been done years ago.
Here's my question, though, if you want to see the lanterns so badly, why haven't you gone before? -- Flynn Rider, to Rapunzel There are a good few reasons, one of which is simply "nobody's actually done the work to implement it". And a lot of end users might be excited to use something if it were implemented, but wouldn't think to ask for it if nobody mentioned it. Spotting the feature that isn't there is a pretty hard thing to do (unless you're comparing two products and can say "what I want is program X but with feature Q from program Y"). ChrisA

On Mon, Aug 29, 2016 at 02:35:26PM +0200, Stephan Houben wrote:
Indeed. My own personal feeling is that eventually unit tracking and dimensional checking will be considered as mainstream as garbage collection and type checking. But I don't think Python should try to blaze this trail, especially not with the current proposal. In the meantime, if Ken is right about this being of interest to scientists, Sage and IPython would be the most likely places to start. -- Steve

-1 on Python ever having any syntactic support for SI scale factors. It makes the language needlessly complicated, has no benefit I've discerned (vs using libraries), and is a magnet for a large class of bugs.

Btw, the argument below feels dishonest in another respect. Within a domain there is a general size scale of quantities of interest. I worked in a molecular dynamics lab for a number of years, and we would deal with simulated timesteps of a few femtoseconds. A total simulation might run into microseconds (or with our custom supercomputer, a millisecond). There were lots of issues, which I don't begin to understand, around exactly how many femtoseconds could be squeezed into a timestep while retaining good behavior. But the numbers of interest were in the range 0.5-5, and anyone in the field knows that. In contrast, cosmologists deal with intervals of petaseconds. Yeah, I know it's not as simple as that, but just to illustrate the symmetry.

No one would write 2.5e-15 every time they were doing something with an MD timestep. The scaling, if anywhere at all, would be defined once as a general factor at the boundaries. The number of interest is simply, e.g. 2.5, not some large negative exponent on that. In fact, at a certain point I proposed that we should deal with rounding issues by calling the minimum domain-specific time unit an attosecond, and using only integers in this unit. That wasn't what was adopted, but it wasn't absurd. If we had done that, we would simply deal with, say, 1500 "inherent units" in the program. The fact that it relates to a physical quantity is at most something for documentation (this principle isn't different because we used floats in this case). On Aug 28, 2016 8:44 PM, "Ken Kundert" <python-ideas@shalmirane.com> wrote:

On Mon, Aug 29, 2016 at 12:32 PM, David Mertz <mertz@gnosis.cx> wrote:
Definitely not absurd. I've done the same kind of thing numerous times (storing monetary values in cents, distances in millimeters, or timepoints in music in milliseconds), because it's just way, WAY simpler than working with fractional values. So the SI prefix gets attached to the (implicit) *unit*, not to the value. I believe this is the correct way to handle things. ChrisA
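A tiny sketch of the inherent-unit idea being endorsed here, using the monetary example (the function names are invented for illustration):

```python
# Store money as integer cents: the scaling lives in the implied unit,
# not in the value, so arithmetic stays exact and simple.
def dollars_to_cents(dollars):
    return round(dollars * 100)

def format_cents(cents):
    return f"${cents // 100}.{cents % 100:02d}"

price = dollars_to_cents(19.99)   # 1999 cents
tax   = dollars_to_cents(1.60)    # 160 cents
print(format_cents(price + tax))  # -> $21.59
```

The same pattern works for millimeters, milliseconds, or attoseconds: pick the smallest unit you care about, store integers, and convert only at the boundaries.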

It makes the language needlessly complicated, has no benefit I've discerned (vs using libraries), and is a magnet for a large class of bugs.
Well, the comment about bugs is speculation that does not fit with the extensive experience of the electrical engineering community. But other than that, these arguments could be used against supporting binary, octal, and hexadecimal notation. Are you saying building those into the language was a mistake? On Sun, Aug 28, 2016 at 07:32:35PM -0700, David Mertz wrote:
Yes, without a convenient way of specifying real numbers, the computational communities have to resort to things like this. And they can work for a while, but over time the scale of things often changes, and a good choice of scale can turn bad after a few years. For example, when I first started in electrical engineering, the typical size of capacitors was in the microfarad range, and software would just assume that if you gave a capacitance it was in uF. But then, with the advancement of technology, the capacitors got smaller. They went from uF to nF to pF, and now a growing fraction of capacitors are specified in the fF range. The fact that SPICE allowed values to be specified with SI scale factors meant that it continued to be easy to use over the years, whereas the programs that hard-coded the scale of their numbers became increasingly difficult to use and then eventually became simply absurd.

Even your example is a good argument for specifying numbers with SI scale factors. If I am using one of your molecular simulators, I don't want to specify the simulation time range as being from 1 to 1_000_000_000_000 fs. That is ridiculous. There are 12 orders of magnitude between the minimum resolvable time and the maximum. There are only two practical ways of representing values over such a wide range: using SI scale factors and using exponential notation. And we can tell which is preferred. You said femtoseconds, you did not say 1e-15 seconds. Even you prefer SI scale factors. -Ken
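The two "practical ways" can of course coexist at the library level. A minimal sketch of output-side SI formatting (illustrative only, not the actual engfmt API) might look like:

```python
# Pick the largest scale factor that keeps the mantissa at or above 1,
# then format with a general-purpose precision.
_PREFIXES = [(1e12, 'T'), (1e9, 'G'), (1e6, 'M'), (1e3, 'k'),
             (1.0, ''), (1e-3, 'm'), (1e-6, 'u'), (1e-9, 'n'),
             (1e-12, 'p'), (1e-15, 'f')]

def to_si(value, unit=''):
    """Render a float with an SI scale factor -- sketch only."""
    for scale, prefix in _PREFIXES:
        if abs(value) >= scale:
            return f"{value / scale:g}{prefix}{unit}"
    return f"{value:g}{unit}"   # zero or below femto: fall back

print(to_si(1.4204e9, 'Hz'))   # -> 1.4204GHz
print(to_si(5.3e-11, 'm'))     # -> 53pm
```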

On Mon, Aug 29, 2016 at 11:44 AM, Ken Kundert <python-ideas@shalmirane.com> wrote:
And easier to compare. The SI prefixes are almost consistent in using uppercase for larger units and lowercase for smaller, but not quite; and there's no particular pattern in which letter is larger. For someone who isn't extremely familiar with them, that makes them completely unordered - which is larger, peta or exa? Which is smaller, nano or pico? Plus, there's a limit to how far you can go with these kinds of numbers, currently yotta at e+24. Exponential notation scales to infinity (to 1e308 in IEEE 64-bit binary floats, but plenty further in decimal.Decimal - I believe its limit is about 1e+(1e6), and REXX on OS/2 had a limit of 1e+(1e10) for its arithmetic), remaining equally readable at all scales. So we can't get rid of exponential notation, no matter what happens. Mathematics cannot usefully handle a system in which we have to represent large exponents with ridiculous compound scale factors: sys.float_info.max = 179.76931348623157*Y*Y*Y*Y*Y*Y*Y*Y*Y*Y*Y*Y*E (It's even worse if the Exa collision means you stop at Peta. 179.76931348623157*P*P*P*P*P*P*P*P*P*P*P*P*P*P*P*P*P*P*P*P*M, anyone?) Which means that these tags are a duplicate way of representing a specific set of exponents.
Except that those are exactly the important questions to be answered. How *could* it be done? With the units stripped off, your examples become:

    780k == 7.8e+05 == 780*k
    108M == 1.08e+08 == 108*M
    600T == 6e+14   == 600*T
    3.2G == 3.2e+09 == 3.2*G
    53p  == 5.3e-11 == 53*p
    8G   == 8e+09   == 8*G

Without any support whatsoever, you can already use the third-column notation, simply by creating this module:

    # si.py
    k, M, G, T, P, E, Z, Y = 1e3, 1e6, 1e9, 1e12, 1e15, 1e18, 1e21, 1e24
    m, μ, n, p, f, a, z, y = 1e-3, 1e-6, 1e-9, 1e-12, 1e-15, 1e-18, 1e-21, 1e-24
    u = μ
    K = k

and using it as "from si import *" at the top of your code. Do we see a lot of code in the wild doing this? "[H]ow it will be used after it's done" is exactly the question that this would answer.
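For concreteness, here is the module-based spelling in use (the constants are restated inline so the snippet runs without an actual si.py file):

```python
# Scale factors as ordinary names, per the si.py sketch in the email.
k, M, G, T = 1e3, 1e6, 1e9, 1e12
n, p = 1e-9, 1e-12

r_load = 100*k        # 100 kiloohms, written as 100*k
h_line = 1.4204*G     # 1.4204 GHz
bohr_radius = 53*p    # 53 pm

assert r_load == 100e3
```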
Don't Python's users in the scientific and engineering communities deserve the same treatment? These are, after all, core communities for Python.
Yes. That's why we have things like the @ matrix multiplication operator (because the numeric computational community asked for it), and %-formatting for bytes strings (because the networking, mainly HTTP serving, community asked for it). Python *does* have a history of supporting things that are needed by specific sub-communities of Python coders. But there first needs to be a demonstrable need. How much are people currently struggling due to the need to transform "gigapascal" into "e+9"? Can you show convoluted real-world code that would be made dramatically cleaner by language support? ChrisA

On Mon, Aug 29, 2016 at 12:33:16PM +1000, Chris Angelico wrote:
Yes, of course. No one is suggesting abandoning exponential notation. I am not suggesting we force people to use SI scale factors, only that we allow them to. What I am suggesting is that we stop saying to them things like 'you must use exponential notation because we have decided that it's better. See, you can easily compare the size of numbers by looking at the exponents.' What is wrong with having two ways of doing things? We have many ways of specifying the value of the integer 16: 0b10000, 0o20, 16, 0x10, 16L, ....
Because by focusing on the implementation details, we miss the big picture. We have already done that, and we ended up going down countless ratholes.
Can you show code that would have been convoluted if Python had used a library rather than built-in support for hexadecimal numbers?

So, in summary, you are suggesting that we tell the scientific and engineering communities that we refuse to provide native support for their preferred way of writing numbers because:

1. our way is better,
2. their way is bad because some uneducated person might see the numbers and not understand them,
3. we already have a way of representing numbers that we came up with in the '60s and we simply cannot do another,
4. well, we could do it, but if you would only adapt to this new way of doing things that we just came up with, then we would not have to do any work, and that is better for us.

Oh, and this new way of writing numbers only works in the program itself. You're out of luck when it comes to IO.

These do not seem like good reasons for not doing this. -Ken

On 2016-08-28 20:29, Ken Kundert wrote:
What is wrong with having two ways of doing things? We have many ways of specifying the value of the integer 16: 0b10000, 0o20, 16, 0x10, 16L, ....
Zen of Python: "There should be one-- and preferably only one --obvious way to do it." If Python didn't have binary or octal notation and someone came here proposing it, I would not support it, for the same reasons I don't support your proposal. If someone proposed eliminating binary or octal notation for Python 4 (or even maybe Python 3.8), I would probably support it for the same reason. Those notations are not useful enough to justify their existence. Hexadecimal is more justifiable as it is far more widely used, but I would be more open to removing hexadecimal than I would be to adding octal. Also, "L" as a long-integer suffix is already gone in Python 3. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown

On Mon, Aug 29, 2016 at 1:40 PM, Brendan Barnwell <brenbarn@brenbarn.net> wrote:
I agree with you on octal - there are very few places where it's the one obvious way to do things, and you can always use int("755", 8) if you have data that's best represented octally. But hex is incredibly helpful when you do any sort of bit manipulation, and decimal quickly becomes unwieldy and error-prone. Here's some code from Lib/stat.py:

    S_IFDIR  = 0o040000  # directory
    S_IFCHR  = 0o020000  # character device
    S_IFBLK  = 0o060000  # block device
    S_IFREG  = 0o100000  # regular file
    S_IFIFO  = 0o010000  # fifo (named pipe)
    S_IFLNK  = 0o120000  # symbolic link
    S_IFSOCK = 0o140000  # socket file

These are shown in octal, because Unix file modes are often written in octal. If Python didn't support octal, the obvious alternative would be hex:

    S_IFDIR  = 0x4000  # directory
    S_IFCHR  = 0x2000  # character device
    S_IFBLK  = 0x6000  # block device
    S_IFREG  = 0x8000  # regular file
    S_IFIFO  = 0x1000  # fifo (named pipe)
    S_IFLNK  = 0xA000  # symbolic link
    S_IFSOCK = 0xC000  # socket file

About comparable for these; not as good for the actual permission bits, since there are three blocks of three bits. Python could manage without octal literals, as long as hex literals are available. (I don't support their *removal*, because that's completely unnecessary backward incompatibility; but if Python today didn't have octal support, I wouldn't support its addition.) But the decimal equivalents? No thank you.

    S_IFDIR  = 16384  # directory
    S_IFCHR  = 8192   # character device
    S_IFBLK  = 24756  # block device
    S_IFREG  = 32768  # regular file
    S_IFIFO  = 4096   # fifo (named pipe)
    S_IFLNK  = 40960  # symbolic link
    S_IFSOCK = 49152  # socket file

One of these is wrong. Which one? You know for certain that each of these values has at most two bits set. Can you read these? If you're familiar with the powers of two, you should have no trouble eyeballing the single-bit examples, but what about the others? We need hex constants for anything that involves bitwise manipulation. 
Having binary constants is nice, but (like with octal) not strictly necessary; but we need at least one out of bin/oct/hex. (Also, 16L doesn't actually mean the integer 16 - it means the *long* integer 16, which is as different from 16 as 16.0 is.) ChrisA
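A small sketch of the bit-mask point above: with hex constants, the file-type test is legible at a glance (the constants are from Lib/stat.py; the example mode value is made up):

```python
S_IFMT  = 0xF000   # mask covering the file-type bits
S_IFREG = 0x8000   # regular file
S_IFLNK = 0xA000   # symbolic link

mode = 0xA1FF      # a symlink with permissions 0777; as decimal, 41471
print((mode & S_IFMT) == S_IFLNK)   # -> True
print((mode & S_IFMT) == S_IFREG)   # -> False
```

The same test written with decimal constants (41471 & 61440 == 40960) conveys nothing about which bits are involved.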

On 29 August 2016 at 13:40, Brendan Barnwell <brenbarn@brenbarn.net> wrote:
Octal literals were on the Python 3 chopping block, with only two things saving them:

- *nix file permissions (i.e. the existing sysadmin user base)
- the proposal to switch to "0o" as the prefix

The addition of "0b" was to make bitwise operators easier to work with, rather than requiring folks to mentally convert between binary and hexadecimal just to figure out how to set a particular bit flag, with the requirement to understand binary math being seen as an essential requirement for working with computers at the software development level (since it impacts so many things, directly and indirectly). Hexadecimal then sticks around as a way of writing binary literals more concisely.

However, the readability-as-a-general-purpose-language argument in the case of SI scaling factors goes as follows:

- exponential notation (both the scientific and engineering variants) falls into the same "required to understand computers" category as binary and hexadecimal notation
- for folks that have memorised the SI scaling factors, the engineering notation equivalents should be just as readable
- for folks that have not memorised the SI scaling factors, the engineering notation equivalents are *more* readable
- therefore, at the language level, this is a style guide recommendation to use engineering notation for quantitative literals over scientific notation (since engineering notation is easier to mentally convert to SI prefixes)

However, once we're talking about domain specific languages (like circuit design) rather than a general purpose programming language, knowledge of the SI prefixes can be included in the assumed set of user knowledge, and made a language-level feature.

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Mon, Aug 29, 2016 at 1:29 PM, Ken Kundert <python-ideas@shalmirane.com> wrote:
Because by focusing on the implementation details, we miss the big picture. We have already done that, and we ended up going down countless ratholes.
They're important ratholes though. Without digging into those questions, all you have is an emotive argument of "but we NEEEEEEED to support SI prefixes as integer suffixes!".
See my other email, with examples of bit flags. It's not too bad if you only ever work with a single bit at a time, but bit masks combine beautifully in binary, fairly cleanly in hex, and really badly in decimal. Hex is a great trade-off between clean bit handling and compact representation. (Octal is roughly the same trade-off, and in days of yore was the one obvious choice, but hex has overtaken it.)
Is more general, yes. If all you have is SI prefixes, you're badly scuppered. If all you have is exponential notation, you can do everything.
2. their way is bad because some uneducated person might see the numbers and not understand them,
Is, again, less general. It's a way of writing numbers that makes sense only in a VERY narrow area.
3. we already have way of representing numbers that we came up with in the '60s and we simply cannot do another,
False.
I'm not sure what you mean by "IO" here, but if you're writing a program that accepts text strings and prints text strings, it's free to do whatever it wants.
These do not seem like good reasons for not doing this.
Not worded the way you have them, no, because you've aimed for an extremely emotional argument instead of answering concrete questions like "where's the code that this would improve". Find some real-world code that would truly benefit from this. Show us how it's better. Something that I don't think you've acknowledged is that the SI scaling markers are *prefixes to units*, not *suffixes to numbers*. That is to say, you don't have "132G" of a thing called a "pascal", you have "132" of a thing called a "gigapascal". Outside of a unit-based system, SI prefixes don't really have meaning. I don't remember ever, in a scientific context, seeing a coefficient of friction listed as "30 milli-something"; it's always "0.03". So if unitless values are simply numbers, and Python's ints and floats are unitless, they won't see much benefit from prefixes-on-nothing. ChrisA

Sorry, I am trying very hard not to let my emotions show through, and instead to provide answers, examples, and comments that are well reasoned and well supported. I do find it frustrating that I appear to be the only one involved in the conversation with a strong background in numerical computation, meaning that I have to carry one side of the argument without any support. It is especially frustrating when that background is used as a reason to discount my position. Let me try to make the case in an unemotional way.

It is hard to justify the need for SI scale factors being built into the language with an example, because it is relatively simple to do the conversion. For example ...

With built-in support for SI scale factors:

    h_line = 1.4204GHz
    print('hline = {:r}Hz'.format(h_line))
    ...

In Python today:

    from engfmt import Quantity
    h_line = Quantity('1.4204GHz')
    print('hline = {:q}'.format(h_line))
    h_line = float(h_line)
    ...

Not really much harder to use the library. This is very similar to the situation with octal numbers ...

With built-in support for octal numbers:

    S_IFREG = 0o100000  # regular file

Without built-in support for octal numbers:

    S_IFREG = int('100000', base=8)  # regular file

So just giving a simple example is not enough to show the importance of native support. The problem with using a library is that you always have to convert from SI scale factors as the number is input, and then convert back as the number is output. So you can spend a fair amount of effort converting to and from representations that support SI scale factors. Not a big deal if there are only a few, but it can be burdensome if there are many.

But the real benefit of building in native support is that it puts pressure on the rest of the ecosystem to also adopt the new way of representing real numbers. For example, now the interchange packages and formats (Pickle, YAML, etc.) need to come up with a way of passing the information without losing its essential character. 
This in turn puts pressure on other languages to follow suit. It would also put pressure on documenting and formatting packages, such as Sphinx, Jinja, and matplotlib, to adapt. Now it becomes easier to generate clean documentation. Also the interactive environments, such as ipython, need to adapt. The more this occurs, the better life gets for scientists and engineers.
Yeah, this is why I suggested that we support the ability for users to specify units with the numbers, but it is not a hard and fast rule. Once you add support for SI scale factors, people find them so convenient that they tend to use them whether the quantities carry units or not. For example, it is common for circuit designers to specify the gain of an amplifier using SI scale factors even though gain is often unitless: for example, gain=50k. Also, electrical engineers will often drop the units when they are obvious, especially if they are long; it is common to see a resistance specified as simply 100k. When values are given in a table and all the values in a column have the same units, it is common to give numbers with scale factors but without units to save space. -Ken
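The input-side conversion Ken describes can also be sketched in a few lines (a minimal illustration, not the actual engfmt API; note the inherent ambiguity it glosses over - '100m' reads here as the milli prefix, never as meters):

```python
import re

_SCALES = {'T': 1e12, 'G': 1e9, 'M': 1e6, 'k': 1e3,
           'm': 1e-3, 'u': 1e-6, 'n': 1e-9, 'p': 1e-12, 'f': 1e-15}

def from_si(text):
    """Split a string like '1.4204GHz' into (value, unit) -- sketch only."""
    m = re.fullmatch(r'([0-9.]+)([TGMkmunpf]?)([A-Za-z]*)', text)
    if m is None:
        raise ValueError(f'could not parse {text!r}')
    mantissa, prefix, unit = m.groups()
    return float(mantissa) * _SCALES.get(prefix, 1.0), unit

print(from_si('100kHz'))   # -> (100000.0, 'Hz')
```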

On Mon, Aug 29, 2016 at 7:08 PM, Ken Kundert <python-ideas@shalmirane.com> wrote:
An "emotional argument" doesn't necessarily mean that your emotions are governing everything - it's more that your line of reasoning is to play on other people's emotions, rather than on concrete data. Your primary argument has been "But think of all the scientific developers - don't you care about them??", without actually giving us code to work with. (In a recent post, you did at least give notes from a conversation with a lady of hard science, but without any of her code, we still can't evaluate the true benefit of unitless SI scaling.)
No no no; your background is not a reason to discount your position. However, on its own, it's insufficient justification for your position. Suppose I come to python-ideas and say "Hey, the MUD community would really benefit from a magic decoder that would use UTF-8 where possible, ISO-8859-1 as fall-back, and Windows-1252 for characters not in 8859-1". Apart from responding that 8859-1 is a complete subset of 1252, there's not really a lot that you could discuss about that proposal, unless I were to show you some of my code. I can tell you about the number of MUDs that I play, the number of MUD clients that I've written, and some stats from my MUD server, and say "The MUD community needs this support", but it's of little value compared to actual code. (For the record, a two-step decode of "UTF-8, fall back on 1252" is exactly what I do... in half a dozen lines of code. So this does NOT need to be implemented.) That's why I keep asking you for code examples. Real-world code, taken from important projects, that would be significantly improved by this proposal. It has to be Python 3 compatible (unless you reckon that this is the killer feature that will make people take the jump from 2.7), and it has to be enough of an improvement that its authors will be willing to drop support for <3.6 (which might be a trivial concern, eg if the author expects to be the only person running the code).
Maybe; or maybe you'd already be doing a lot of I/O work, and it's actually quite trivial to slap in one little function call at one call site, and magically apply it to everything you do. Without code, we can't know. (I sound like a broken record here.)
Ewwww, I doubt it. That would either mean a whole lot of interchange formats would need to have a whole new form of support. You didn't mention JSON, but that's a lot more common than Pickle; and JSON is unlikely to gain a feature like this unless ECMAScript does (because JSON is supposed to be a strict subset of JavaScript's Object Notation). Pickle - at least the one in Python - doesn't need any way to store non-semantic information, so unless you intend for the scale factor to be a fundamental part of the number, it won't need changes. (People don't manually edit pickle files the way they edit JSON files.) YAML might need to be enhanced. That's the only one I can think of. And it's a big fat MIGHT.
Eww eww eww. You're now predicating your benefit on a huge number of different groups gaining support for this. To what extent do they need to take notice of the SI scales on numbers? Input-only? Output? Optionally on output? How do you control this? Are SI-scaled numbers somehow different from raw numbers, or are they equivalent forms (like 1.1e3 and 1100.0)? If they're equivalent forms, how do you decide how to represent on output? Are all these questions to be answered globally, across all systems, or is it okay for one group to decide one thing and another another?
Gain of 50k, that makes reasonable sense. Resistance as 100k is a shorthand for "100 kiloohms", and it's still fundamentally a unit-bearing value. All of this would be fine if you were building a front-end that was designed *SOLELY* for electrical engineers. So maybe that's the best way. Fork IDLE or iPython and build the very best electrical engineering interactive Python; it doesn't matter, then, how crazy it is for everyone else. You can mess with stuff on the way in and on the way out, you can interpret numbers as unitless values despite being written as "100kPa", and you can figure out what to do in all the edge cases based on actual real-world usage. You'd have your own rules for backward compatibility, rather than being bound by Python's release model (18 months between feature improvements, and nothing gets dropped without a VERY good reason), so you could chop and change as you have need. The base language would still be Python, but it'd be so optimized for electrical engineers that you'd never want to go back to vanilla CPython 3.6. Sound doable? ChrisA

On 29.08.2016 11:37, Chris Angelico wrote:
There were no reasonable real-world code examples taken from important projects that would have been significantly improved by underscores in numbers. Still, we got them, so your argument here is void. That's no different for this proposal. Sven

On Mon, Aug 29, 2016 at 1:55 PM, Sven R. Kunze <srkunze@mail.de> wrote:
There were no reasonable real-world code examples taken from important projects that would have been significantly improved by underscores in numbers.
I recall dozens of real-world examples that came up during the discussion, and have written very many such examples in code of my own. This is something that directly affects readability in code I write almost every day, and is a clear and obvious win. I taught examples *today* where I would have badly liked to have underscore separators, and the awkwardness was obvious to me and my students. Writing, e.g., `range(int(1e7))` feels contrived (but is usually the way I do it). Writing `range(10000000)` is nearly impossible to parse visually. In contrast, writing `range(10_000_000)` is immediately clear and obvious. None of those things can be said of SI units as Python literals. -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

On 30.08.2016 04:34, David Mertz wrote:
Huh? None of those things? I do think you exaggerate quite a lot here. If your real-world example works for underscores, it works for SI units and scale factors as well. I for one don't have usage of either, so having such a distance from the subject at hand, I don't see this as a compelling argument for or against this proposal. Sven

On 29 August 2016 at 19:08, Ken Kundert <python-ideas@shalmirane.com> wrote:
Also the interactive environments, such as ipython, need to adapt. The more this occurs, the better life gets for scientists and engineers.
This theory of change is backwards - we follow IPython and Project Jupyter when it comes to understanding what's a desirable UX for scientists (primarily) and engineers (somewhat), rather than the other way around. (Ditto for SciPy and Numpy for the computational requirements side of things - in addition to the already referenced https://www.python.org/dev/peps/pep-0465/ for matrix multiplication, there's also https://www.python.org/dev/peps/pep-0357/ which defined the __index__ protocol, and https://www.python.org/dev/peps/pep-3118/ which defined a rich C-level protocol for shaped data export. Even before there was a PEP process, extended slicing and the Ellipsis literal were added for the benefits of folks writing multidimensional array indexing libraries) So if your aim is to make a "scientists & engineers will appreciate it" UX argument, then you're unlikely to gain much traction here if you haven't successfully made that argument in the Project Jupyter and/or SciPy ecosystems first - if there was a popular "%%siunits" cell magic, or a custom Project Jupyter kernel that added support for SI literals, we'd be having a very different discussion (and you wouldn't feel so alone in making the case for the feature). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

I teach working scientists about numeric computing on a daily basis. There are a few special fields where Ken's ideas are the norm, at least in informal notation. The large majority of working scientists would find a syntax change like the one he proposes an annoyance and a nuisance. Alienating and confusing everyone who isn't a circuit designer is a bad goal. It's not going to happen in Python. If you really want this syntax, you need to use a different language, or maybe write a preprocessor that turns a slightly different language back into Python. On Aug 29, 2016 4:09 AM, "Ken Kundert" <python-ideas@shalmirane.com> wrote:

On Aug 28 2016, Ken Kundert <python-ideas-jl/PDlM0qtzz1n+OaKNE4w@public.gmane.org> wrote:
I think you're making some incorrect assumptions here. Who, exactly, do you mean with "we" and "them"? I consider myself part of the scientific community and think your proposal is a bad idea, and Google finds some Python modules from you, but no prior CPython contributions... Best, -Nikolaus -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F »Time flies like an arrow, fruit flies like a Banana.«

Nikolaus, I have belatedly realized that this kind of hyperbole is counterproductive. So let me back away from that statement and instead try to understand your reasons for not liking the proposal. Do you think there is no value in being able to naturally read and write numbers with SI scale factors from Python? Or is your issue with something about my proposal? -Ken On Mon, Aug 29, 2016 at 06:59:15PM -0700, Nikolaus Rath wrote:

On 30 August 2016 at 04:19, Ken Kundert <python-ideas@shalmirane.com> wrote:
Ken, Answering these questions from my perspective (and thanks for taking note of the comments and toning down your delivery, by the way) I have an issue with the way your proposal is vague about the relationship between SI scales and units - it's been mentioned a couple of times, but never adequately addressed, that scales are tightly linked to units (we routinely talk about kilometers and milligrams, but almost never about kilos or millis). There have been some strong (and justified, IMO) objections to adding units without giving them proper semantic meaning, and your response to that seems to be to talk for a while about scale factors in isolation, but then go straight back to examples using scaled units. It's hard to understand exactly what you're proposing when your examples don't match your suggestions. If we assume you *are* simply talking about pure scales (k=1000, M=1000000 etc), then you haven't addressed the many suggestions of alternatives with anything more substantive than "well, I (and colleagues I know) prefer scale factors", plus some examples with scaled *units* again. Your comparisons tend to show your preferred approach in the best light, while using the least attractive alternative options. And there's almost no proper discussion of pros and cons. In short, you offer almost nothing in the way of objective arguments for your proposals. You mention "reading and writing numbers with scale factors from Python". It's easy enough to do external IO with scale factors, you just read strings and parse them as you wish. A language syntax only affects internal constants - and it's not clear to me that the benefit is significant even then, as I'd expect (as a matter of good style) that any constant needing this type of syntax should be named anyway. Again, this isn't something you address. You've offered no examples of real-world code in existing public projects that would be improved by your proposal. 
While that's not always necessary to a successful proposal, it certainly makes it more compelling, and helps to confirm that a proposal isn't limited to "niche" areas. So to summarise, I don't think you've made objective arguments for your proposal (your *subjective* enthusiasm for the proposal has never been in doubt), or addressed many of the comments that have already been made. To be honest, I don't think there's much chance of your proposal being accepted at this point in time. As Steven noted, Python tends not to be a leader in matters like this, and so the lack of mainstream prior art is probably sufficient to kill this proposal. But for your own benefit (and the benefit of any future proposals you may make in this or other areas - please don't feel put off by the fact that this specific proposal has met with a lot of resistance) you might want to review the thread and consider what a PEP might look like for this discussion, and how you would have incorporated and responded to the objections raised here - https://www.python.org/dev/peps/pep-0001/#what-belongs-in-a-successful-pep is a good summary of the sort of things you should be looking at. There's no need to actually complete a PEP or post it, the proposal here hasn't reached a stage where a PEP is useful, but thinking about the PEP structure might help you understand the context a bit better. I hope this helps, Paul

On Aug 29 2016, Ken Kundert <python-ideas-jl/PDlM0qtzz1n+OaKNE4w@public.gmane.org> wrote:
* I think there is no value gained by being able to write 32.3m instead of 32.3e-3. I think the second one is clear to everyone who uses SI prefixes, while the first one just introduces a lot of complexities. Most of them have been mentioned already:
  - no deducible ordering if one doesn't know the prefixes
  - potential for ambiguity with Exa
  - questions about base 2 vs base 10, e.g. what do you expect to be stored in *size* if you read this: "size = 10M # we need that many bytes"
  - rather arbitrary naming ("M" and "m" vs "T" and "p")
* I think having SI *unit* support "32.3 kN" would be nice, but only if there is a space between number and unit, and only if the unit actually gets attached to the number. But your proposal would result in 1km + 1µN == 2 being true. Best, -Nikolaus -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F »Time flies like an arrow, fruit flies like a Banana.«

On 2016-08-29 02:44, Ken Kundert wrote: [snip]
For currency, it's usually "million" or "m"/"M", "billion" or "bn" (or maybe "b"/"B"), "trillion" (or maybe "tn" or "t"/"T"). Using a suffixed SI scale factor with a prefixed currency symbol is not that common, in my experience. [snip]
There's also "engineering notation", where the exponent is a multiple of 3. [snip]
I expect that octal and hexadecimal number support was there from the start. CPython is written in C and Python borrowed the notation. The binary notation was added in Python 2.6 and followed the same pattern as the hexadecimal notation. The octal notation of a leading "0" was later replaced with a clearer one that followed the same pattern. C had octal and hexadecimal from the start. (Actually, I'm not entirely sure about hexadecimal, octal being the preferred form, but if it wasn't there from the very start, it was an early addition.) C descends from BCPL, which had octal and hexadecimal, and BCPL dates from 1967. There are other languages too that had hexadecimal and octal. They've been around in programming languages for decades. How many languages have scale factors? Does Fortran? Not that I know of. [snip]

The reason why hexadecimal and octal are in general purpose languages and real numbers with SI scale factors are not is that languages are developed by computer scientists and not by scientists. I keep using SPICE and Verilog as examples of languages that support SI scale factors, because they are the extremely rare cases where the languages were either developed or specified by end users and not by computer scientists. The reason why computer scientists tend to add hexadecimal and octal numbers to their languages and not SI scale factors is that they use hexadecimal and octal numbers, and, as we have seen in this discussion, are rather unfamiliar with real numbers with SI scale factors. It is easy for them to justify adding hex because they know from personal experience that it is useful, but if you don't use widely scaled real numbers day in and day out it is hard to understand just how tedious exponential notation is and how useful it would be to use SI scale factors.

I didn't follow the previous discussion so far, so excuse me if I repeat something somebody already mentioned. But these are intriguing points you made here. On 29.08.2016 09:31, Ken Kundert wrote:
I didn't know that THERE ARE languages that already feature SI factors. You could be right about their development. I for one wouldn't have an issue with this being in Python, for the following reasons: 1) I wouldn't use it, as I don't have the use-cases right now 2) if I needed to read such code, it wouldn't hurt my reading experience, as I am used to SI 3) there will be two classes of code here: code that has use for it and thus uses it quite extensively, and code that doesn't; depending on where you work you will either encounter this feature or not even know it exists (this is true for many features in Python, which is a good thing: each domain should use what is the best tool for it) The biggest issue I have is the following: SI scale factors without SI units do not make much sense, I think (especially considering those syntax changes). So the potential, if any, can only be illustrated in combination with them. But Python does not feature any SI units so far, as those are provided by external packages. If you can resolve that, I am +1 on this proposal, but otherwise just +0. Sven PS: If I think about it this way, I might have a use-case in a small side-project.

On Sun, Aug 28, 2016 at 6:44 PM, Ken Kundert <python-ideas@shalmirane.com> wrote:
These are not equal. 780kpc is a *number with units*. 7.8e+05 == 780000 is a *unitless number*. All the numbers on the right hand side above have no units, so I can't tell which are pc or W or m or $. It's asking for trouble to go halfway in representing units. On the left hand side, 780kpc + 108MPa is invalid while 780kpc + 53pm is valid. On the right hand side, sums of any two numbers are valid, as they would be with unitless SI prefixes. So if you want to solve this problem, write a module that supports units. For example, m(780, 'kpc') == m('780kpc') == m(780, kpc), and it's legal to write m(780, kpc) + m(53, pm), but an exception gets raised if you write m(2, kW) + m(3, kh) instead of m(2, kW) * m(3, kh) == m(6, MWh). In fact, several people have already done that. Here are three I found in < 1 minute of searching:

    https://pint.readthedocs.io/en/0.7.2/
    https://pypi.python.org/pypi/units/
    https://python-measurement.readthedocs.io/en/latest/

Should support for this ever get added to the core language? I doubt it. But if one of these modules becomes enormously popular and semi-standard, you never know. I think you're much more likely to get this into your Python code by way of a preprocessor. --- Bruce Check out my puzzle book and get it free here: http://J.mp/ingToConclusionsFree (available on iOS)
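
A minimal sketch of the kind of wrapper Bruce describes (every name here is hypothetical, and the unit whitelist is my own simplification; the libraries linked above do this properly, with full dimensional analysis):

```python
import re

class m:
    # Toy value-with-units wrapper in the spirit of the m() API above.
    # A whitelist of bare units resolves the prefix/unit ambiguity
    # ('pc' is parsec, not pico-c); real libraries are far more thorough.
    _scale = {'p': 1e-12, 'n': 1e-9, 'u': 1e-6, 'm': 1e-3,
              'k': 1e3, 'M': 1e6, 'G': 1e9, 'T': 1e12}
    _units = {'pc', 'm', 'W', 'Pa', 'b', 'h', 'Hz', 'V', 's', '$'}

    def __init__(self, value, units=''):
        if isinstance(value, str) and not units:
            # split '780kpc' into a number and a trailing unit string
            number, units = re.fullmatch(r'([0-9.eE+-]+)\s*(\S*)', value).groups()
            value = float(number)
        if units and units not in self._units:
            # the leading character must be an SI prefix on a known unit
            prefix, bare = units[0], units[1:]
            if prefix not in self._scale or bare not in self._units:
                raise ValueError('unknown unit: {!r}'.format(units))
            value, units = value * self._scale[prefix], bare
        self.value, self.units = float(value), units

    def __repr__(self):
        return 'm({!r}, {!r})'.format(self.value, self.units)

    def __eq__(self, other):
        if not isinstance(other, m):
            return NotImplemented
        return (self.value, self.units) == (other.value, other.units)

    def __add__(self, other):
        if self.units != other.units:
            raise ValueError(
                'incompatible units: {} + {}'.format(self.units, other.units))
        return m(self.value + other.value, self.units)
```

With this toy, m('780kpc') == m(780, 'kpc') == m(780000, 'pc') holds, and m(2, 'kW') + m(3, 'kh') raises ValueError, as described.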

On 2016-08-28 18:44, Ken Kundert wrote:
You've continually repeated this assertion, but I don't buy it. For the general case, exponential notation is easier to read because you can always see exactly what the exponent is as a number. To read SI units, you have to know all the SI prefixes. This may well be common within scientific communities, but to say that it is "easier" is really a bit much. The same is true of "harder to type". "kpc" is three characters; e+5 is also three (note that you don't need to write e+05), and one of those is a number that transparently indicates how many places to move the decimal, whereas all of the letters in "kpc" are opaque unless you already know what the number is meant to represent. If you have concrete evidence (e.g., from actual user experience research) showing that it is across-the-board "easier" to read or type SI prefixes than exponential notation, that would be good to see. In the absence of that, these assertions are just doubling down on the same initial claim, namely that adding SI units to Python would make things more convenient *for those using it to compute with literally-entered quantities in SI units*. I quite agree that that is likely true, but to my mind that is not enough to justify the disruption of adding it at the syntactic level. (Unless, again, you have some actual evidence showing that this particular kind of use of numeric literals occurs in a large proportion of Python code.)
My current opinion is no. There are lots of things that are common. (Should we include a spellchecker in Python because many people frequently make spelling errors?) The fact that SI units are de rigueur in the physical science community isn't enough. I would want to see some actual attempt to quantify how much benefit there would be in the PYTHON community (which of course includes, but is not limited to, those using Python for physical-science computations). -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown

On Sun, Aug 28, 2016 at 08:26:38PM -0700, Brendan Barnwell wrote:
For the record, I don't know what kpc might mean. "kilo pico speed of light"? So I looked it up using units, and it is kilo-parsecs. That demonstrates that unless your audience is intimately familiar with the domain you are working with, adding units (especially units that aren't actually used for anything) adds confusion. Python is not a specialist application targeted at a single domain. It is a general purpose programming language where you can expect a lot of cross-domain people (e.g. a system administrator asked to hack on a script in a domain they know nothing about).
You don't have to write e+5 either, just e5 is sufficient.
I completely believe Ken that within a single tightly focussed user community, using their expected conventions (including SI prefixes) works really well. But Python users do not belong to a single tightly focussed user community. -- Steve

On Mon, Aug 29, 2016 at 01:45:20PM +1000, Steven D'Aprano wrote:
I talked to an astrophysicist about your comments, and what she said was:
1. She would love it if Python had built-in support for real numbers with SI scale factors.
2. I told her about my library for reading and writing numbers with SI scale factors, and she was much less enthusiastic because using it would require convincing the rest of the group, which would be too much effort.
3. She was amused by the "kilo pico speed of light" comment, but she was adamant that the fact that you, or some system administrator, does not understand what kpc means has absolutely no effect on her desire to use SI scale factors. Her comment: I did not write it for him.
4. She pointed out that the software she writes and uses is intended either for herself or other astrophysicists. No system administrators involved.
You think that Python is only used by generalists? That is silly. Have you seen SciPy? If you think that, take a look at Casa (casaguides.nrao.edu). It is written by astrophysicists for astrophysicists doing observations on radio telescope arrays. That is pretty specialized. -Ken

On 2016-08-29 00:07, Ken Kundert wrote:
I think you misunderstand. My position (reiterated by the text you quote from Steven D'Aprano) is not that Python is used only by generalists. It is that we shouldn't change Python in a way that ONLY helps specialists. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown

On Mon, Aug 29, 2016 at 12:18:02AM -0700, Brendan Barnwell wrote:
But surely we should consider changing Python if the change benefits a wide variety of specialists, especially if the change is small and fits cleanly into the language. In this case, our specialists come from most of the disciplines of science and engineering. That is a pretty big group. -Ken

Chris, I was not able to get an astrophysics example, but I do have a reasonable one that performs a spectral analysis of the output of an analog-to-digital converter, something radio astronomers are known to do. I am including the code, but it requires a rather large data file to run, which I will not include. The code uses my 'engfmt' library from PyPI to perform conversion to SI form. In this example, there is no need for conversion from SI form.

    #!/usr/bin/env python3
    import numpy as np
    from numpy.fft import fft, fftfreq, fftshift
    import matplotlib as mpl
    mpl.use('SVG')
    from matplotlib.ticker import FuncFormatter
    import matplotlib.pyplot as pl
    from engfmt import Quantity, set_preferences

    set_preferences(spacer=' ')

    def mag(spectrum):
        return np.absolute(spectrum)

    def freq_fmt(val, pos):
        return Quantity(val, 'Hz').to_eng()

    def volt_fmt(val, pos):
        return Quantity(val, 'V').to_eng()

    freq_formatter = FuncFormatter(freq_fmt)
    volt_formatter = FuncFormatter(volt_fmt)

    data = np.fromfile('delta-sigma.smpl', sep=' ')
    time, wave = data.reshape((2, len(data)//2), order='F')
    timestep = time[1] - time[0]
    nonperiodicity = wave[-1] - wave[0]
    period = timestep * len(time)
    print('timestep = {}'.format(Quantity(timestep, 's')))
    print('nonperiodicity = {}'.format(Quantity(nonperiodicity, 'V')))
    print('timepoints = {}'.format(len(time)))
    print('freq resolution = {}'.format(Quantity(1/period, 'Hz')))

    # beta=11 corresponds to alpha=3.5 (beta = pi*alpha);
    # the processing gain with alpha=3.5 is 0.37
    window = np.kaiser(len(time), 11)/0.37
    #window = 1
    windowed = window*wave
    spectrum = 2*fftshift(fft(windowed))/len(time)
    freq = fftshift(fftfreq(len(wave), timestep))

    fig = pl.figure()
    ax = fig.add_subplot(111)
    ax.plot(freq, mag(spectrum))
    ax.set_yscale('log')
    ax.xaxis.set_major_formatter(freq_formatter)
    ax.yaxis.set_major_formatter(volt_formatter)
    pl.savefig('spectrum.svg')
    ax.set_xlim((0, 1e6))
    pl.savefig('spectrum-zoomed.svg')

When run, this program prints the following
diagnostics to stdout:

    timestep = 20 ns
    nonperiodicity = 2.3 pV
    timepoints = 27994
    freq resolution = 1.7861 kHz

It also generates two SVG files. I have converted one to PNG and attached it. A few comments:

1. The data in the input file ('delta-sigma.smpl') has low dynamic range and is machine generated, and not really meant for direct human consumption. As such, it does not benefit from using SI scale factors. But there are certainly cases where the data has both high dynamic range and is intended for people to examine directly. In those cases it would be very helpful if NumPy were able to read the file directly. As the language exists today, I would need to read the file myself, manually convert it, and feed the result to NumPy.

2. Many of the numbers that are output do have high dynamic range and are intended to be consumed directly by humans. These benefit from using SI scale factors. For example, the 'freq resolution' can vary from Hz to MHz and 'nonperiodicity' can vary from fV to mV.

3. Extra effort was expended to make the axis labels on the graph use SI scale factors so as to make the results 'publication quality'. My hope is that if Python accepted SI literals directly, then both NumPy and Matplotlib would also be extended to accept/use these formats directly, eliminating the need for me to do the conversions and manage the axes.

-Ken On Mon, Aug 29, 2016 at 06:02:29PM +1000, Chris Angelico wrote:

On Mon, Aug 29, 2016 at 9:07 AM, Ken Kundert <python-ideas@shalmirane.com> wrote:
Astropy also has a very powerful units package--originally derived from pyunit, I think, but long since diverged and grown: http://docs.astropy.org/en/stable/units/index.html It was originally developed especially for astronomy/astrophysics use and has some pre-defined units that many other packages don't have, as well as support for logarithmic units like decibel and optional (and customizable) unit equivalences (e.g. frequency/wavelength or flux/power). That said, its power extends beyond astronomy, and I heard through last week's EuroSciPy that even some biology people have been using it. There's been some (informal) talk about splitting it out from Astropy into a stand-alone package. This is tricky since almost everything in Astropy has been built around it (dimensional calculations are always used where possible), but not impossible. One of the other big advantages of astropy.units is the Quantity class representing scale+dimension values. This is deeply integrated into NumPy so that units can be attached to NumPy arrays, and all NumPy ufuncs can operate on them in a dimensionally meaningful way. The needs for this have driven a number of recent features in NumPy. This is work that, unfortunately, could never be integrated into the Python stdlib.

On Mon, Aug 29, 2016 at 3:05 PM, Erik Bray <erik.m.bray@gmail.com> wrote:
I'll also add that syntactic support for units has rarely been an issue in Astropy. The existing algebraic rules for units work fine with Python's existing order of operations. It can be *nice* to be able to write "1m" instead of "1 * m", but ultimately it doesn't add much for clarity (and if really desired it could be handled with a preparser -- something I've considered adding for Astropy sources via codecs). Best, Erik

I just want to add, as an astrophysicist who uses astropy.units: the astropy solution is pretty great, and I don't mind the library overhead. I'd much rather have astropy.units, which does dimensional analysis as well as handling SI prefixes, for 2 reasons: 1. I don't normally see or use SI prefixes without units, so bare SI prefixes are fairly worthless to me as a scientist. IF the units are going to be there, I'd much rather have a library that does a good job at dimensional analysis, and has taken my domain-specific concerns into account, for reasons fairly well covered in this thread. 2. I don't find it cumbersome at all to use something like astropy.units, which provides both the prefix and units for my code on input and output. The added syntactic weight of a single import, plus multiplication, is really not that big a burden, and makes it both clear what I am trying to write, and easy for the library to maintain this meaning when I use the variable later. e.g.

    from astropy.units import *
    distance = 10 * km

If that multiplication symbol is really too much to handle, then I'd rather see Python support implicit multiplication as suggested above (i.e. "10 km" is parsed as "10 * km") and domain-specific libraries can support SI prefixes and units. ~ Alex

Erik, One aspect of astropy.units that differs significantly from what I am proposing is that with astropy.units a user explicitly specifies the scale factor along with the units, and that scale factor does not change even if the value becomes very large or very small. For example:

    >>> from astropy import units as u
    >>> d_andromeda = 7.8e5 * u.parsec
    >>> print(d_andromeda)
    780000.0 pc
    >>> d_sun = 93e6*u.imperial.mile
    >>> print(d_sun.to(u.parsec))
    4.850441695494146e-06 pc
    >>> print(d_andromeda.to(u.kpc))
    780.0 kpc
    >>> print(d_sun.to(u.kpc))
    4.850441695494146e-09 kpc

I can see where this can be helpful at times, but it kind of goes against the spirit of SI scale factors, where you are generally expected to 'normalize' the scale factor (use the scale factor that results in the digits presented before the decimal point falling between 1 and 999). So I would have expected:

    d_andromeda = 780 kpc
    d_sun = 4.8504 upc

Is the normalization available in astropy.units and I just did not find it? Is there some reason not to provide the normalization? It seems to me that pre-specifying the scale factor might be preferred if one is generating data for a table and the magnitudes of the values are all known in advance to within 2-3 orders of magnitude. It also seems to me that if these assumptions were not true, then normalizing the scale factors would generally be preferred. Do you believe that? -Ken On Mon, Aug 29, 2016 at 03:05:50PM +0200, Erik Bray wrote:
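
The 'normalization' Ken describes is easy to sketch in plain Python (the helper name and formatting choices below are hypothetical, not part of astropy.units):

```python
import math

def normalize(value, unit=''):
    # Choose the SI prefix that leaves 1-999 before the decimal point.
    prefixes = 'yzafpnum-kMGTPEZY'      # '-' marks the empty (unscaled) slot
    if value == 0:
        return '0 ' + unit
    exp = math.floor(math.log10(abs(value)) / 3)
    exp = max(-8, min(8, exp))          # clamp to the prefixes we know
    mantissa = value / 10 ** (3 * exp)
    prefix = prefixes[exp + 8].replace('-', '')
    return '{:.5g} {}{}'.format(mantissa, prefix, unit)

print(normalize(7.8e5, 'pc'))                  # 780 kpc
print(normalize(4.850441695494146e-06, 'pc'))  # 4.8504 upc
```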

On 30 August 2016 at 13:48, Ken Kundert <python-ideas@shalmirane.com> wrote:
The "imperial.mile" example here highlights one key benefit that expression based approaches enjoy over dedicated syntax: easy access to Python's existing namespace features. As a quick implementation sketch, consider something like:
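
(Nick's inline sketch did not survive in this archive. What follows is a hypothetical reconstruction consistent with his description: a two-level attribute lookup where the first attribute names the SI prefix and the second is a purely decorative unit, yielding a plain float.)

```python
class _Prefixed:
    # Second lookup level: any attribute name acts as a nominal unit
    # and simply returns the prefix's multiplier.
    def __init__(self, multiplier):
        self._multiplier = multiplier

    def __getattr__(self, unit):
        # the unit carries no semantics here; it's pure self-documentation
        return self._multiplier

class _SI:
    _prefixes = {'y': 1e-24, 'z': 1e-21, 'a': 1e-18, 'f': 1e-15,
                 'p': 1e-12, 'n': 1e-9, 'u': 1e-6, 'm': 1e-3,
                 'k': 1e3, 'M': 1e6, 'G': 1e9, 'T': 1e12,
                 'P': 1e15, 'E': 1e18, 'Z': 1e21, 'Y': 1e24}

    def __getattr__(self, prefix):
        return _Prefixed(self._prefixes[prefix])

si = _SI()

d_andromeda = 780 * si.k.pc     # 780000.0 -- a plain, unitless float
hurricane_power = 600 * si.T.W  # 6e+14
```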
You could also relatively easily adapt that such that there was only one level of lookup, and you could write the examples without the second dot (you'd just need to do some parsing of the key value in __getattr__ to separate the SI prefix from the nominal units). One particular benefit of this kind of approach is that you automatically avoid the "E" ambiguity problem, since there's nothing wrong with "si.E" from Python's perspective. You also gain an easy hook to attach interactive help: "help(si)" (or si? in IPython terms). Expanding out to full dimensional analysis with something like astropy.units also becomes relatively straightforward, just by changing the kind of value that __getattr__ returns. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Mon, Aug 29, 2016 at 08:48:55PM -0700, Ken Kundert wrote:
Let me see if I get this straight... you *explicitly* asked for the distance to the sun in kpc (kiloparsecs), but you expected a result in µpc (microparsecs)? When you ask the waiter for a short black, do you get upset that he doesn't bring you a latte with soy milk? *wink* I can see that such a normalising function would be useful, but I don't think it should be the default. (If I ask for millimetres, I want millimetres, not gigametres.) I've written and used code like that for bytes, it makes sense to apply it to other measurement units. But only if the caller requests normalisation, never by default. I don't think there is any such general expectation that values should be normalised in that way, and certainly not that your conversion program should automatically do it for you. For example, see this list of long-lived radioactive isotopes: http://w.astro.berkeley.edu/~dperley/areopagus/isotopetable.html Values above 650,000,000,000 (650e9) years are shown in "scientific format", not "engineering format", e.g. Selenium-82 is given as 1.1 x 10^20 rather than 110 x 10^18. Likewise: http://www.nist.gov/pml/data/halflife-html.cfm displays a range of units (minutes, hours, days) with the base value ranging up to over ten thousand, e.g. Ti-44 is shown as 22154 ± 456 days. This is NIST, which makes it pretty official. I don't think there's any general expectation that values should be shown in the range 1 to 999. (Perhaps in certain specialist areas.) -- Steve

Steve, Actually I initially asked for the distances in parsecs and was expecting that they would be presented in a convenient format. So, to frame it in terms of your analogy, I ordered a short black and became upset when I was delivered 8oz of coffee in a 55 gallon drum. This seems to be one of those unstated assumptions that have caused confusion in these discussions. Sometimes you want to fix the prefix, sometimes you don't. For example, the bel (B) is a unit of measure for ratios, but we never use it directly, we always use decibels (dB). Nobody uses mB or kB or even B, it is always dB. But with other units we do use the scale factors and we do tend to normalize the presentation. For example, nobody says that Usain Bolt won the 100000mm dash, or the 0.1km dash. Similarly, when people refer to the length of the Olympic road race in Rio, they say 56km, not 56000m. This is really only an issue with output. What I am suggesting is adding support for the second case into the stdlib. For example:

    >>> print('Attenuation = {:.1f}dB at {:r}m.'.format(-13.7, 50e3))
    Attenuation = -13.7dB at 50km.

-Ken On Tue, Aug 30, 2016 at 11:41:10PM +1000, Steven D'Aprano wrote:
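
Ken's proposed {:r} behavior can be approximated today with the stdlib's extensible string.Formatter. A hedged sketch ('r' is not a real format type, and the rounding choices here are my own):

```python
import math
import string

class SIFormatter(string.Formatter):
    # Treats the (hypothetical) 'r' format type as "engineering notation
    # with an SI scale factor"; everything else is delegated to the
    # normal formatting machinery.
    _prefixes = 'yzafpnum kMGTPEZY'    # the space marks the unscaled slot

    def format_field(self, value, spec):
        if spec == 'r':
            if value == 0:
                return '0'
            exp = max(-8, min(8, math.floor(math.log10(abs(value)) / 3)))
            mantissa = value / 10 ** (3 * exp)
            return '{:.4g}{}'.format(mantissa, self._prefixes[exp + 8].strip())
        return super().format_field(value, spec)

fmt = SIFormatter().format
print(fmt('Attenuation = {:.1f}dB at {:r}m.', -13.7, 50e3))
# Attenuation = -13.7dB at 50km.
```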

On Tue, Aug 30, 2016 at 11:48 AM, Ken Kundert <python-ideas@shalmirane.com> wrote:
[...] Similarly when people refer to the length of the Olympic road race in Rio, they say 56km, not 56000m.
However I can't help to point out that if I said the distance to the sun is 149.6 Gm, most people would do a double-take.
This is really only an issue with output.
So maybe the proposal should be toned down to just a way to request SI units when formatting numbers? -- --Guido van Rossum (python.org/~guido)

Guido, I am in the process of summarizing the discussion as a way of wrapping this up. As part of that I will be making a proposal that I think has a chance of being accepted, and it will largely be what you suggest. -Ken On Tue, Aug 30, 2016 at 11:59:19AM -0700, Guido van Rossum wrote:

On Tue, Aug 30, 2016 at 5:48 AM, Ken Kundert <python-ideas@shalmirane.com> wrote:
Hi Ken, I see what you're getting at, and that's a good idea. There's also nothing in the current implementation preventing it, and I think I'll even suggest this to Astropy (with proper attribution)! I think there are reasons not to always do this, but it's a nice option to have. Point being nothing about this particular feature requires special support from the language, unless I'm missing something obvious. And given that Astropy (or any other units library) is third-party chances are a feature like this will land in place a lot faster than it has any chance of showing up in Python :) Best, Erik

It has been pointed out to me that the above comes off as being condescending towards Steven, system administrators and language developers in general. For this I am profoundly sorry. It was not my intent. My only point was that the output of these numerical programs are often so highly specialized that only the authors and their peers understand it. Let me go further in saying that if anything I have said in this discussion has come off as critical or insulting please know that that was not my intent. I have tremendous respect for what you all have accomplished and I am extremely appreciative of all the feedback and help you have given me. -Ken

On Mon, Aug 29, 2016 at 04:24:42PM -0700, Ken Kundert wrote:
No offense taken Ken! I completely understand what your astrophysicist friend means, and I don't expect that she should write code for me. But we have to consider code written for her, and me, and you, and system administrators, children learning their first language, Java gurus, Haskell experts, people whose only other language was BASIC in 1980, animators, scientists, web developers, and many, many other disparate groups of people. We have to do it without breaking backwards compatibility. And somehow we have to try to balance all those different needs without compromising the essential "Pythonicity" of the language. The culture of Python is very conservative. I don't know of any features in Python that haven't come from some other language. Sometimes, like significant indentation, it was only a single language at the time that Python copied the feature. Sometimes a feature is not accepted unless it is in widespread use. It's a good sign that unit tracking is (slowly) becoming a language feature, like in F#, but I think you doomed your proposal as soon as you said (paraphrasing) "no other language does this, Python should lead the way and blaze this trail". (That's actually not the case, but when the prior art is niche languages like F# and Frink and calculator languages like RPL, it was always going to be a tough sell.) Sometimes it just means that the time is not right for a new feature. The ternary if operator was resisted for many years until a compelling reason to add it was found, then it was accepted rapidly. Maybe the time will never be right: nearly everyone agrees that there is no reason to actively avoid having multi-statement anonymous lambda functions, if only we can find the right syntax. But nobody has found the right syntax that isn't ambiguous, or goes against the style of Python, or requires changes to the parser that are unacceptable for other reasons. 
Personally, I think that your proposal has a lot of merit; it's just the details that I don't like. Absent an excellent reason why it MUST be a language feature, it should stay in libraries, where people are free to experiment, or in projects like IPython and Sage, which can explore their own interactive interpreters and add features that Python the language can't.

And maybe, in Python 3.7 or 3.9 or 4.9, somebody will come up with the right proposal, or the right syntax, or notice that (let's imagine) Javascript or C++ has done it and the world didn't end, and Python will get unit tracking and/or multiplicative scaling factors as a language feature, and you'll be vindicated. Personally, I hope it does. But it has to be done right, and I'm not convinced your proposal is the right way. So until then, I'm happy for it to stay in libraries.

But most importantly, thanks for caring about this! -- Steve
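[Editor's note: the library approach discussed above can be sketched in a few lines. This is an illustrative parser, not the actual API of engfmt or any other library; the name `parse_si` and the suffix table are assumptions.]

```python
# Minimal sketch of a library-level SI scale-factor parser: accept a
# string like '780k' or '53p' and return a plain float. Illustrative
# only -- not the engfmt API.
import re

_SCALE = {
    'y': 1e-24, 'z': 1e-21, 'a': 1e-18, 'f': 1e-15, 'p': 1e-12,
    'n': 1e-9, 'u': 1e-6, 'm': 1e-3, '': 1.0,
    'k': 1e3, 'M': 1e6, 'G': 1e9, 'T': 1e12, 'P': 1e15,
    'E': 1e18, 'Z': 1e21, 'Y': 1e24,
}

def parse_si(text):
    """Convert a string with an optional SI suffix to a float."""
    match = re.fullmatch(
        r'([-+]?[0-9.]+(?:e[-+]?\d+)?)([yzafpnumkMGTPEZY]?)', text)
    if match is None:
        raise ValueError(f'invalid SI number: {text!r}')
    mantissa, suffix = match.groups()
    return float(mantissa) * _SCALE[suffix]

print(parse_si('780k'))   # 780000.0
print(parse_si('53p'))    # ~5.3e-11
```

Note that units ('780kpc') are deliberately out of scope here; a real library would also have to decide how to separate the scale factor from the unit string.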

On 29 August 2016 at 11:44, Ken Kundert <python-ideas@shalmirane.com> wrote:
A better comparison here would be to engineering notation with comments stating the units, since Python doesn't restrict the mantissa to a single integer digit (unlike strict scientific notation):

    780kpc -> 780e3   # parsecs
    108MPa -> 108e6   # pascals
    600TW  -> 600e12  # watts
    3.2Gb  -> 3.2e9   # base pairs
    53pm   -> 53e-12  # meters
    $8G    -> 8e9     # dollars

The fundamental requirements for readable code are:

- code authors want their code to be readable
- the language makes readable code possible

So this starts to look like a style guide recommendation:

1. use engineering notation rather than scientific notation
2. annotate your literals with units if they're not implied by context

I find the pressure example a particularly interesting one, as I don't know any meteorologists who work with pascals directly - they work with hectopascals or kilopascals instead. That would make the second example more likely to be one of:

    108e3  # kilopascals
    1.08e6 # hectopascals

Similarly, depending on what you're doing (and this gets into the "natural unit of work" concept David Mertz raised), your base unit of mass may be micrograms, milligrams, grams, kilograms, or tonnes, and engineering notation lets you freely shift those scaling factors between your literals and your (implied or explicit) units, while native SI scaling would be very confusing if you're working with anything other than the base SI unit.

Accordingly, I'm starting to wonder if a better way for us to go here might be to finally introduce the occasionally-discussed-but-never-written Informational PEP that spells out "reserved syntax for Python supersets", where we carve out certain things we plan *NOT* to do with the base language, so folks writing Python supersets (e.g. for electronics design, data analysis or code generation) can use that syntax without needing to worry about future versions of Python potentially treading on their toes.
Specifically, in addition to this SI scaling idea, I'm thinking we could document:

- the Cython extensions for C code generation in .pyx files (cdef, ctypedef, cimport, nogil, NULL)
- the IPython extensions for cell magics and shell invocation (unary '%', unary '!'), and maybe their help syntax (postfix '?')

The reason I think this may be worth doing is that some of these ideas only make sense *given a suitably constrained domain*. Python supersets like Cython and IPython get to constrain their target domain in a way that makes these extensions appropriate there, in a way that wouldn't be appropriate at the level of the base language. But we can still be explicit at the base language level that we're not doing certain things because we're delegating them to a tool with a more focused target audience. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 8/28/2016 9:44 PM, Ken Kundert wrote:
The way the scientific and engineering communities predominately write real numbers is by using SI scale factors.
I don't believe it, not with naked scale factors as you have proposed. I have worked in science and I never saw naked scale factors until this proposal. The scale factors are usually attached to units.
The scale factor is part of the unit, and people now learn this in grade school, I presume.
These are all scaled units and to me not relevant to the proposed addition of scale factors without units. At this point I quit reading. -- Terry Jan Reedy

On 29/08/2016 02:44, Ken Kundert wrote:
No, no, no, if the people who provide this http://www.scipy.org/ can do without it. Now would you please be kind enough to give up with this dead horse before I take a ride to the Clifton Suspension Bridge or Beachy Head, whichever is closest. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence

Note that the Sage computer algebra system uses Python with some syntactic changes implemented by a "pre-parser". The current proposal could be implemented in a similar way and then integrated into, say, IPython. If it proved to be wildly popular, that would make a stronger case for incorporation into the core. Stephan

On 29 Aug. 2016 2:12 p.m., "Mark Lawrence via Python-ideas" < python-ideas@python.org> wrote:

On 29/08/2016 13:35, Stephan Houben wrote:
As IPython is a core part of scipy, which I linked above, why would the developers want to incorporate this suggestion? I'd have also thought that if this idea were to be "wildly popular" it would have been done years ago. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence

On 29 August 2016 at 22:55, Mark Lawrence via Python-ideas <python-ideas@python.org> wrote:
While "If this was a good idea, it would have been done years ago" is a useful rule of thumb, it's also useful to look for ways to test that heuristic, to see whether there were actually just other incidental barriers in the way of broader adoption. That's particularly so for cases like this one, where a common practice in a handful of domains has failed to make the leap into the broader computing context of general purpose programming.

One of the nice things about IPython for this kind of experimentation is that it's *significantly* more pluggable than the default interpreter (where you can do some interesting things with import hooks, but it's hard to change the default REPL). That means it's easier for people to try out ideas as code pre-processors, rather than as full Python implementations (in this case, something that translates SI unit suffixes into runtime scaling factors). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
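[Editor's note: the pre-processor idea described above can be sketched as a simple source rewriter that translates SI-suffixed literals into standard exponent notation before the code reaches the interpreter. The function name and suffix table are illustrative assumptions, not an existing IPython API, and a real transformer would need to skip string literals and comments.]

```python
# Naive sketch of an SI-suffix pre-processor: rewrite literals such as
# 600T into 600e12 in a line of source text. Illustrative only -- it
# does not parse Python and would also rewrite inside strings.
import re

_EXP = {'p': 'e-12', 'n': 'e-9', 'u': 'e-6', 'm': 'e-3',
        'k': 'e3', 'M': 'e6', 'G': 'e9', 'T': 'e12'}

_LITERAL = re.compile(r'\b(\d+(?:\.\d+)?)([pnumkMGT])\b')

def rewrite_si(line):
    """Replace SI-suffixed numeric literals with exponent notation."""
    return _LITERAL.sub(lambda m: m.group(1) + _EXP[m.group(2)], line)

print(rewrite_si('power = 600T / cities'))  # power = 600e12 / cities
```

Something of this shape could be registered as an IPython input transformer or a Sage-style pre-parser, which is exactly the kind of low-cost experiment that doesn't require changing the base language.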

On Mon, Aug 29, 2016 at 10:55 PM, Mark Lawrence via Python-ideas <python-ideas@python.org> wrote:
I'd have also thought that if this idea was to be "wildly popular" it would have been done years ago.
"Here's my question, though: if you want to see the lanterns so badly, why haven't you gone before?" -- Flynn Rider, to Rapunzel

There are a good few reasons, one of which is simply "nobody's actually done the work to implement it". And a lot of end users might be excited to use something if it were implemented, but wouldn't think to ask for it if nobody mentioned it. Spotting the feature that isn't there is a pretty hard thing to do (unless you're comparing two products and can say "what I want is program X but with feature Q from program Y"). ChrisA

On Mon, Aug 29, 2016 at 02:35:26PM +0200, Stephan Houben wrote:
Indeed. My own personal feeling is that eventually unit tracking and dimensional checking will be considered as mainstream as garbage collection and type checking. But I don't think Python should try to blaze this trail, especially not with the current proposal. In the meantime, if Ken is right about this being of interest to scientists, Sage and IPython would be the most likely places to start. -- Steve
participants (19)
- Alex Rudy
- Brendan Barnwell
- Bruce Leban
- Chris Angelico
- David Mertz
- Erik Bray
- Guido van Rossum
- João Santos
- Ken Kundert
- Mark Lawrence
- MRAB
- Nick Coghlan
- Nikolaus Rath
- Paul Moore
- Stephan Houben
- Stephen J. Turnbull
- Steven D'Aprano
- Sven R. Kunze
- Terry Reedy