Custom literals, a la C++

C++ has custom literal syntax <https://en.cppreference.com/w/cpp/language/user_literal>. I couldn't find a PEP on the subject— has this been considered and rejected, or is there a reason it's unpopular?

Will Bradley writes:
I'm pretty sure there is no PEP on that specifically, and I can't recall it being proposed. I don't know about C++-style user literals, but in general Python has rejected user-defined syntax on the grounds that it makes the language harder to parse, both for the compiler and for human beings.

Python has a fixed set of operator symbols in any version (operator symbols have been added in the past, and more may be added in the future), with a standard way of providing semantics for each operator. There are a few exceptions: assignment is name binding, not an operation on objects, and its semantics can't be changed; 'is' is object identity and can't be overloaded; and the ternary operator 'x if test else y' can't be overloaded. Proposals for user-defined operator symbols, as in Haskell's backtick notation or R's %% notation, have been outright rejected on that principle.

There was a proposal to provide literal syntax for physical units like meters, kilograms, and seconds, along with the SI magnitude prefixes. I think that got to the "proto-PEP" stage, but it drew a lot of weak opposition for a number of reasons, mostly "Python isn't intended for creating DSLs, and the units module gives the semantics needed via classes", and petered out without an actual PEP IIRC.

There are frequent proposals to give the Decimal constructor a literal syntax, always rejected on the grounds that it's not needed and that there hasn't been a really compelling syntax that everybody likes. There are also frequent proposals to create special string literals, with occasional successes like raw strings (the r"" syntax) and more recently f-strings. The most popular is some kind of lookup syntax whose primary use case is internationalization, which typically grabs the single underscore as the marker for a localizable string (it's aliased to the gettext function). This is annoying and inconvenient because it conflicts with the common usage of "_" as a placeholder.
This hasn't yet had a compelling proposal; I think there's a dormant PEP for "i-strings". I'm sorry that my memory is not entirely clear, but that gives you some idea of how such discussions have gone in the past. The f-string PEP and the i-string PEP (there's also a PEP for "null coalescing" operators, I believe) would give you some idea of how the core devs think about these issues. Perhaps somebody with a better memory can be more precise. Until then, this is what I've got. :-)

Steve

On Sun, Apr 03, 2022 at 01:09:00PM +0900, Stephen J. Turnbull wrote:
Python is excellent for creating DSLs. It is one of the things it is well known for. https://www.startpage.com/sp/search?query=writing+dsls+in+python
That's not my recollection. My recollection is that in principle, at least, there is a reasonable level of support for a built-in decimal type, no strong opposition, and consensus that the syntax that makes the most sense is a "d" suffix:

    6.0123d

The implementation would be a fixed-precision (64- or 128-bit) type rather than the variable-precision implementation used in the decimal module, which would massively reduce the complexity of the implementation and the public interface. (No context manager for the builtin decimal, fixed precision, only one rounding mode, no user control over what signals are trapped, etc. If you need all those bells and whistles, use the decimal module.) The discussion fizzled out rather than being rejected. Whether it would be rejected *now*, two or four(?) years later, by a different Steering Council, is another story.
There are also frequent proposals to create special string literals, with occasionals successes like rawstrings (the r"" syntax)
Raw strings were added in Python 1.5 to support the new re module: https://www.python.org/download/releases/1.5/whatsnew/ There was no formal mechanism for adding new features back then. -- Steve

On 2022-04-02 22:28, Steven D'Aprano wrote:
I'm not the person you're replying to, but a lot of those search results are pretty clearly not what was meant here. Python is fine for creating "real" DSLs, where the L is actually a separate language and Python is just parsing/interpreting it. What Python isn't so good at is creating quasi-DSLs or "DSDs" (domain specific dialects), where Python itself is the language and the domain-specific part is grafted on by use of objects, operator overloading, etc., so that what you run is actually a Python program that just looks and behaves a bit different from what you might expect from "vanilla" Python. This is the "Python isn't for DSLs" argument that I've seen mentioned on this list and elsewhere (although I agree that it's a pretty loose use of "DSL"). -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown

*SOAP BOX WARNING*

It's not often that I would say that C++ is easier to read or more WYSIWYG than Python, but in this case, C++ is clearly well ahead of Python. I have spent a fair amount of my own time, and I have seen so much of others' time wasted, because command-line or input fields do not include units, or input fields are accidentally handled with inconsistent units, or units are not used at all.

I get the sentiment that Python, or programming languages in general, are not meant to deal with units. From the perspective of a computer scientist, I can understand why this would be seen as a level of abstraction too high for programming languages and core libraries to aspire to. But from the perspective of a scientist or engineer, units are a CORE part of language. Anyone who has taken science or engineering classes in college knows what happens when you turn in homework with missing units in your answers - zero credit. Anyone who has worked out complicated calculations by hand, or with the help of packages like "units", knows the sinking feeling and the red flags raised when your answer comes out in the wrong units.

There has also been a shift in the expectations of scientists and engineers regarding their programming capabilities. A generation ago, a good many of them would not be expected to use their computers for anything more than writing documents, crunching numbers in a spreadsheet, or using a fully integrated task-specific application for which their employer paid dearly. These assumptions were codified in workflows and job descriptions. Today, if your workflow, especially in R&D, has a gap that Microsoft Office or task-specific software doesn't solve for you, then you are pretty much expected to write your own code. Job postings for engineering roles (other than software engineering) regularly include programming in their required skills. Software design, on the other hand, is rarely a required or hired skill.
And even though these scientists and engineers are required to know how to program, they are almost never *paid* to write code. Spending any more time than needed writing code, even if it is to fill a critical gap in a workflow, is seen as a negative. So software design best practices are non-existent. All of this leads to very poor practices around, and improper handling of, an absolutely essential part of scientific and engineering language - units.

If you had asked me twenty years ago if I thought units should be a native part of any programming language, I would have said absolutely - because in my youthful ignorance I had no idea what it would take to make such a thing work. Five years later, I would have said "not worth it". Now I'm back where I started. The lack of native language support for SI units is a problem for an entire segment of programmers.

Programming languages took a big step forward in deciding that EVERYTHING is a pointer/reference, and EVERYTHING is an object. They need to take another step forward to say that EVERY number has a unit, including "unitless". Not having this language feature is becoming (or already is) a problem. The question is, is it Python's problem?

HEAR HEAR! BUT- SI units aren't enough. Engineers in the US and Canada (I have many colleagues in Canada, and when I ask they always say: we pretend to use SI but we don't) have all kinds of units. Give us native, customizable units, or give us death! Who's with me??!! ... .... I'm kidding to a degree, but I did feel a swell of excitement as I read this response. :)

The libraries out there - pint is probably the biggest one - have filled those gaps as much as they can, but there are so many shortfalls... The old engineering disciplines - mine (civil engineering), structural, electrical, etc. - are the next frontier in the "software eats the world" revolution, and they desperately need a language with native units support. I was just on an interview call yesterday for a senior engineer role at a large multinational earthworks engineering firm, and we spent 15 minutes talking about software and what we see coming down the road when it comes to the need for our discipline to grow in its software creation capabilities.

Python SHOULD be that language we do this with. It is awesome in every other way. But if it isn't DEAD SIMPLE to use units in Python, it won't happen. I don't know what the solution is. I'm looking to you software engineers, you true geniuses and giants of your fields, to figure that out for me. But once you hand it to me I promise I will evangelize it to the ends of the Earth.

On Sun, Apr 3, 2022, 2:56 PM Brian McCall <brian.patrick.mccall@gmail.com> wrote:

Looks like this segue moved on to a new thread, but I'm glad I'm not the only one who thinks this way!

On Mon, 4 Apr 2022 at 04:53, Brian McCall <brian.patrick.mccall@gmail.com> wrote:
If you had asked me twenty years ago if I thought units should be a native part of any programming language, I would have said absolutely - because in my youthful ignorance I had no idea what it would take to make such a thing work. Five years later, I would have said "not worth it". Now I'm back where I started. The lack of native language support for SI units is a problem for an entire segment of programmers. Programming languages took a big step forward in deciding that EVERYTHING is a pointer/reference, and EVERYTHING is an object. They need to take another step forward to say that EVERY number has a unit, including "unitless". Not having this language feature is becoming (or already is) a problem. The question is, is it Python's problem?
Part of the problem here is that Python has to be many many things. Which set of units is appropriate? For instance, in a lot of contexts, it's fine to simply attach K to the end of something to mean "a thousand", while still keeping it unitless; but in other contexts, 273K clearly is a unit of temperature. (Although I think the solution there is to hard-disallow prefixes without units, as otherwise there'd be all manner of collisions.) Is it valid to refer to fifteen Angstroms as 15A, or do you have to say 15Å, or 15e-10m and accept that it's now a float not an int? Similarly, what if you want to write a Python script that works in natural units - the Planck length, mass, time, and temperature?

Purity and practicality are at odds here. Practicality says that you should be able to have "miles" as a unit; purity says that the only valid units are pure SI fundamentals and everything else is transformed into those. Leaving it to libraries would allow different Python programs to make different choices.

But I would very much like to see a measure of language support for "number with alphabetic tag", without giving it any semantic meaning whatsoever. Python currently has precisely one such tag, and one conflicting piece of syntax: "10j" means "complex(imag=10)", and "10e1" means "100.0". (They can of course be combined; 10e1j does indeed mean 100*sqrt(-1).) This is what could be expanded. C++ does things differently, since it can actually compile things in, and declarations earlier in the file can redefine how later parts of the file get parsed. In Python, I think it'd make sense to syntactically accept *any* suffix, and then have a run-time translation table that can have anything registered; if you use a suffix that isn't registered, it's a run-time error.
Something like this:

    import sys
    # sys.register_numeric_suffix("j", lambda n: complex(imag=n))
    sys.register_numeric_suffix("m", lambda n: unit(n, "meter"))
    sys.register_numeric_suffix("mol", lambda n: unit(n, "mole"))

(For backward compatibility, the "j" suffix probably still has to be handled at compilation time, which would mean you can't actually do that first one.) Using it would look something like this:

    def spread():
        """Calculate the thickness of avocado when spread on a single slice of bread"""
        qty = 1.5mol
        area = 200mm * 200mm
        return qty / area

Unfortunately, these would no longer be "literals" in the same way that imaginary numbers are, but let's call them "unit displays". To evaluate a unit display, you take the literal (1.5) and the unit (stored as a string, "mol"), and do a lookup into the core table (CPython would probably have an opcode for this, rather than doing it with a method that could be overridden, but it would basically be "sys.lookup_unit(1.5, 'mol')" or something). Whatever it gives back is the object you use. Does this seem like a plausible way to go about it?

ChrisA
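The run-time translation table Chris describes can be sketched today with ordinary functions. Everything here is invented for illustration: `register_numeric_suffix` and `lookup_unit` stand in for the proposed sys hooks, and an ordinary function call stands in for the `1.5mol` literal syntax, which would of course need compiler support.

```python
# Hypothetical sketch of a run-time suffix table. No real sys API is used;
# the names are placeholders for the proposal above.
_suffix_table = {}

def register_numeric_suffix(suffix, converter):
    """Map a literal suffix to a callable that builds the final object."""
    _suffix_table[suffix] = converter

def lookup_unit(value, suffix):
    """What the hypothetical opcode would do for a unit display like 1.5mol."""
    try:
        return _suffix_table[suffix](value)
    except KeyError:
        raise NameError(f"unregistered numeric suffix: {suffix!r}") from None

# Registering "j" to mirror the one suffix Python already has:
register_numeric_suffix("j", lambda n: complex(0, n))

print(lookup_unit(10, "j"))   # the object that 10j would denote
```

An unregistered suffix raises at run time, matching the "run-time error" behaviour proposed above rather than a SyntaxError at compile time.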

Why don't we allow different libraries to use different, incompatible implementations of integers, floating points, and bool? Standard units are just as immutable as any of these data types.

On Mon, 4 Apr 2022 at 18:28, Brian McCall <brian.patrick.mccall@gmail.com> wrote:
Why don't we allow different libraries to use different, incompatible implementations of integers, floating points, and bool? Standard units are just as immutable as any of these data types.
Those three data types are unambiguous, but a more reasonable parallel would be: Why don't we allow different libraries to use different, incompatible implementations of numbers? And we do. There are rationals and decimal floats in the standard library, and plenty of third party libraries with additional numeric data types. And guess what? There have been lots of calls for Decimal literals too :) So, yup, not that different. ChrisA

Asked and answered! Although, see below*, the additional representations of these numbers do not mean that "int", "bool", and "float" have no place in the core language. *Here is a URL to a GIF of the good people of Letterkenny saying "to be fair": https://media.giphy.com/media/Nl6T837bDWE1DPczq3/giphy.gif
And guess what? There have been lots of calls for Decimal literals too :)
I believe it, and I support it!

On Mon, Apr 04, 2022 at 08:27:45AM -0000, Brian McCall wrote:
Why don't we allow different libraries to use different, incompatible implementations of integers, floating points, and bool?
We do. numpy supports 32-bit and 64-bit ints and possibly others, gmpy supports mpz integers. I don't know about floats, but there's nothing stopping anyone from developing a library for 32-bit floats, or minifloats, or posits, or whatever.
Standard units are just as immutable as any of these data types.
Immutability vs mutability is just one design decision out of many that we would have to make. Regardless of which way we go, we still have to deal with the facts that:

* There are an unlimited number of derived (non-SI) and compound units that people will want to use.

* Many of those can have conflicting names, e.g. "mile" can refer to any of Roman mile, international mile, nautical mile, U.S. survey mile, Italian mile, Chinese mile, imperial mile, English *miles* (note plural), and many more.

* To say nothing of having to deal with adjustments to the definitions, e.g. a kilometre in 1920 is not the same as a kilometre in 2020, and applications that care about high precision may care about the difference.

Having a single interpreter-wide namespace for units will cause many name collisions. I expect that units should be scoped like variables are, with some variant of the LEGB (Local, Enclosing, Global, Builtin) scoping rules in place. At the very least, we should have two namespaces, per module and per interpreter. That will allow modules to register their own units without stomping all over those of other modules.

-- Steve

Asked and answered!
* There are an unlimited number of derived (non-SI) and compound units that people will want to use.
Unlimited? You sure that problem can't be bounded? There are a few things I can think of that could bound this problem in a performance-friendly manner. In terms of the internal representation of units - the representation that is used for machine calculations - there are only 7 base units that need to be supported. Everything else is a product of powers of these 7 units, so you can represent every combination with 7 counters. And those counters do not need to have lots of bits: if you're using units in a way that leads to meters**255, then you might have a bug in your code, or you might be doing something that doesn't really need units. 4-8 bits are enough to store the power of each of the 7 SI quantities (4-8 bytes in total). Translating those 7 quantities to the few hundred standard derived units can be handled by higher-level libraries, which may still require counters of multiple types of units depending on the level and breadth of support being maintained.
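The "7 counters" representation sketched above is easy to demonstrate: carry a tuple of exponents over the SI base units (m, kg, s, A, K, mol, cd) alongside the value, and add or subtract exponents on multiplication and division. The `Quantity` class and its fields are invented for illustration, not a proposal for the actual API.

```python
# Sketch of units as 7 exponent counters over the SI base quantities.
from dataclasses import dataclass

BASE = ("m", "kg", "s", "A", "K", "mol", "cd")

@dataclass(frozen=True)
class Quantity:
    value: float
    dims: tuple   # 7 small integers, one exponent per base unit

    def __mul__(self, other):
        return Quantity(self.value * other.value,
                        tuple(a + b for a, b in zip(self.dims, other.dims)))

    def __truediv__(self, other):
        return Quantity(self.value / other.value,
                        tuple(a - b for a, b in zip(self.dims, other.dims)))

metre  = Quantity(1.0, (1, 0, 0, 0, 0, 0, 0))
second = Quantity(1.0, (0, 0, 1, 0, 0, 0, 0))

speed = metre / second          # dims become (1, 0, -1, 0, 0, 0, 0): m/s
area  = metre * metre           # dims become (2, 0, 0, 0, 0, 0, 0): m**2
```

Derived units like newtons or pascals reduce to tuples in the same way, which is why the core representation stays bounded even though the set of *named* units is not.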
True. It's a problem. Might require additional unit sets and/or namespaces. But in 3020, will we still be using Python?
Yes, yes, yes!

On Tue, 5 Apr 2022 at 00:48, Brian McCall <brian.patrick.mccall@gmail.com> wrote:
That would only be true if we had infinite-precision numbers. You can't simply store "this is two lengths divided by a time" and expect everything else to work perfectly.
The trouble with namespacing like this is that you need to be extremely verbose about which units you're using in which module. With a single interpreter-wide namespace, all you have to do is ask the SI module to register itself, and you're done, they're available. Is it really that much of a problem? Tell me: How often do you REALLY expect to have collisions within an application, but in different modules? YAGNI. ChrisA

On Tue, Apr 05, 2022 at 04:02:24AM +1000, Chris Angelico wrote:
You have no idea how many different definitions there are for "mile", do you? :-) And I don't just mean historical miles, before standardisation. I mean even in current use, in English-speaking countries. (And it's not just miles that this problem affects.)

Sure, we can demand that every application that needs to deal with US survey miles and imperial miles and international miles give them all distinct names. That's one solution, but not the only solution. But even if you do that, having one interpreter-wide database that any random library can write to is asking for trouble. If this becomes widespread, expecting libraries to "just don't overwrite existing units" is not good enough. Wait until you import some library which is not quite so accurate in its definitions as yours, and it tramples all over your system-wide database with its own (slightly different) definitions.

How would you like your unit conversions to differ according to the order in which you import your libraries? "If I import cheddar first, then camembert, my lander safely lands on Mars, but if I import camembert first, then cheddar, it crashes into the planet at 215 miles per hour." Awesome.

It's 2022, and you're arguing in favour of a single system-wide database where any random module can monkey-patch the definitions used by all other modules. Ouch. This is exactly analogous to the situation Python would have if there were no per-module globals, just the system-wide builtins, and every library stored top-level variables and functions in that namespace. *shudders*

Look, I know that in Python, any module *might* sneak into my globals and modify them. But in practice, they don't, and that would be considered malware if they did it without very good reason and very obvious documentation. But to have a situation where *by design* all modules trample over each other's defined units, that's a suboptimal design. (I'm being polite there.)

-- Steve

On Tue, 5 Apr 2022 at 13:00, Steven D'Aprano <steve@pearwood.info> wrote:
I don't, but I know there are many. But that's not the problem. The problem is: Do you ever have one module that's using statute miles and another that's using nautical miles, but *not both in the same module*? The only reason to have them namespaced to modules is to allow different modules to use them independently. If your application needs to use both statute and nautical miles in the same module (most likely the main module), then it's going to have an issue, and your proposal adds a ton of complexity (that's a real unit, by the way, I totally didn't make it up) for no benefit whatsoever.
My solution is to allow the very very few applications that need both to do some sort of disambiguation. Of course, this is only significant if you need *literals* of all of them. The units themselves can be distinct, even if each one would want to register itself with the name "mile".
What's your proposal? from units.SI import * ? This pollutes your main namespace *and* still has all the same problems.
"If I import * from cheddar first, then camembert, then I have issues". What's the difference? You're looking at a fundamentally identical problem, and thinking that it's fundamentally solved by module-level separation? Show me some evidence.
Yup I am! Have you ever mutated sys.modules? That's a system-wide database. And there are lots of good reasons to insert things into it. What about importing the logging module and configuring it prior to importing something that spews a ton of messages during its own import? Been there, done that. Yes, a system-wide database isn't actually as terrifying as you seem to think - AND a module-scale separation doesn't even help.
Straw man. It's more like using decimal.getcontext() and making changes. That's global. Do we have per-module Decimal contexts? Do we need them? No. In fact, the only way to change context is globally - though you can do so temporarily. That means you do not have any module-level separation *at all*. I don't hear the Decimal folks screaming about that. You want to add large amounts of completely unnecessary complexity on the basis that the module is the fundamental and sole correct place to namespace these. I'm not seeing any supporting arguments, other than "what if there were collisions? WON'T SOMEONE THINK OF THE COLLISIONS!". Please, show me where there are collisions across modules, and not within a module. That's what I asked, in the snippet you quoted.
I disagree, and I'm also being polite here. Let's keep it that way. ChrisA

On 2022-04-05 12:17 a.m., Chris Angelico wrote:
    from units.survey import mile as s_mile
    from units.imperial import mile as i_mile
    from units.roman import mile as r_mile

We could bikeshed endlessly on how exactly to tell the interpreter to use an imported name as a literal suffix (it could just be that it calls a new dunder), but it seems to me that the way to disambiguate a name conflict in imported modules is very much already a solved problem. I don't quite understand why you want to add a different system that introduces a name conflict issue.

AlexB

On 5/04/22 4:17 pm, Chris Angelico wrote:
If there's a single global registry, and they both register the unit under the same name, then there will be problems if both modules are imported by the same program, even if the two units are never used together in the same module.
What's your proposal?
I'm not sure, but we really need to avoid having a global registry. Treating units as ordinary names looked up as usual would be the simplest thing to do. If you really want units to be in a separate namespace, I think it would have to be per-module, with some variant of the import statement for getting things into it:

    from units.si import units *
    from units.imperial import units inch, ft, mile
    from units.nautical import units mile as nm
It's more like using decimal.getcontext() and making changes. That's global.
Personally I think giving Decimal a global context was a mistake, so arguing that "it's no worse than Decimal" isn't going to do much to convince me. :-) But in any case, a Decimal context and the proposed global unit registry are very different things. Just because one doesn't seem to cause problems doesn't mean the other won't either. -- Greg

On Tue, Apr 05, 2022 at 02:17:00PM +1000, Chris Angelico wrote:
That's not the real problem. The real problem is that my program may:

* import ham, which registers mile as 1609.3412 m
* import spam, which registers mile as 1609.344 m
* import cheese, which registers mile as 1609.3472 m
* import aardvark, which registers mile as 1609.3426 m
* import hovercraft, which registers mile as 1853.181 m

and then do calculations in miles, before converting to metres, and the results I get will be subtly (or not so subtly) different depending on the order I import those modules. (By the way, none of the above are nautical miles; of which there are at least three.)
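The import-order hazard Steve describes is just last-writer-wins semantics on a shared dict, which a toy demonstration makes concrete. The `registry` dict and the `module_*` functions are invented stand-ins for a global unit table and for the registration each import would perform.

```python
# Toy demonstration: with one shared registry, the value of "mile"
# depends purely on which module happened to register last.
registry = {}

def import_ham():
    registry["mile"] = 1609.3412   # ham's definition

def import_spam():
    registry["mile"] = 1609.344    # spam's (slightly different) definition

import_ham()
import_spam()
first_order = registry["mile"]     # spam registered last

registry.clear()
import_spam()
import_ham()
second_order = registry["mile"]    # now ham registered last
```

The two orderings yield different values for the same name, which is exactly the Mars-lander failure mode: no code changed, only the import order.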
"If I import * from cheddar first, then camembert, then I have issues".
And that is why you shouldn't `import *`. This is an old, well-known issue with wildcard imports.
You are correct that this is fundamentally identical to the problem that namespaces are designed to solve. This is why modern languages don't have one single system-wide namespace. We have 30+ years of Python programming, and 40-odd years of programming prior to Python, showing that the solution to the name collusion problem is to have distinct namespaces rather than one single system-wide namespace that everything writes to. That's exactly my point. Of course if I do this: from spam import mile from eggs import mile then I have a namespace collision that results in last value winning. But that's kinda obvious doncha think? :-) Importantly, just doing from spam import mile import eggs will not collide, except under the very unusual case that eggs gets up to no good by writing to the importing module's namespace. (Is that even possible? At import time, can eggs tell which module is importing it?)
Have you ever mutated sys.modules?
Not directly, no, except by the approved method of calling `import`, which never overwrites an existing entry, only adds new entries. Nor have I ever mutated the central registry of codecs to *replace* an existing encoder (like UTF-8) with my own. Likewise for error handlers. There's only a relatively small number of each, and the two registries change so rarely that there is next to zero chance that I might accidentally trample over an existing codec or error handler with my own. And I do not expect that arbitrary imports will make changes to those registries. Never have I worried that `import spam` might change the meaning of the 'utf-8' codec, or replace some error handler with one with the same name but different behaviour.

But if there were thousands of codecs, and every second module I imported could potentially add or delete those codecs, then I would have to worry about these things. The system would be unworkable and we would have to find a better one. With units, there are thousands of named units, with many name collisions. The system would be unworkable with only a single interpreter-wide registry.
The situation is analogous, but not identical. The decimal context is not an interpreter-wide registry of long-lasting entities intended to be used by any and all modules. It is a per-thread part of the decimal API. It's not even a very close analogy: aside from sharing the vague concept of "global state" with your units registry, there's nothing like registering a unit ("furlongs per fortnight") in the decimal context. There are only eight settings, and you cannot set arbitrary attributes in decimal contexts.

The decimal context environment isn't even truly interpreter-wide. It is per-thread storage, so every thread has its own independent environment. Other modules (except possibly your application's main module) are not expected to modify the current context, although that's not enforced. (This is Python: you can shoot yourself in the foot if you really want to.) It would be considered *badly-behaved* for other modules or functions to directly modify the current decimal context instead of using localcontext() to temporarily change the current context.

P.S. localcontext makes a copy of the context. Just sayin'.
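The well-behaved pattern Steve alludes to is worth showing, since it is a real, documented part of the decimal API: localcontext() copies the current context, so changes made inside the with-block never leak out to other modules.

```python
# decimal.localcontext copies the active context; mutations inside the
# with-block are discarded on exit, so no other module sees them.
from decimal import Decimal, getcontext, localcontext

getcontext().prec = 28            # application-level choice

with localcontext() as ctx:
    ctx.prec = 4                  # temporary, local override
    low = Decimal(1) / Decimal(7) # computed at 4 significant digits

print(low)                        # 0.1429
print(getcontext().prec)          # 28 -- the outer context is untouched
```

This is the contrast with a writable interpreter-wide unit registry: decimal gives library code a sanctioned way to change settings *without* affecting anyone else, and the registry proposal, as stated, does not.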
That's global. Do we have per-module Decimal contexts?
If you want it, you can have it. See PEP 567.
They don't need to, because arbitrary modules don't make changes to the decimal current context. That would be harmful, so well-behaved code uses localcontext(), and badly-behaved code doesn't get used. But with your central, interpreter-wide registry of units, modules which wish to use a named unit have no other choice than to register it with the central registry.

If my module aardvark.py wants to use a named unit, the pottle (a historical measurement equal to 1/2 a British gallon), what can I do? I can check the registry to see if pottle is already defined. If it isn't, great, I can install it, and it will now be visible to every module, whether they need it or not. But what if it's already there, with a different value? Now I have only two equally bad choices:

1. overwrite the system-wide pottle entry, breaking other modules;

2. or do without.

Because the registry is system-wide, I cannot define my own pottle unit without nuking other modules' pottle unit.

-- Steve

On Sat, 9 Apr 2022 at 00:34, Steven D'Aprano <steve@pearwood.info> wrote:
Would it be better if you wrote it like this?

    import si; si.register()

I would be hard-pressed to imagine a situation as pathological as you suggest. Aside from a (relatively small) number of common systems, most measurement systems are going to be sufficiently special-purpose that they're going to be the entire application. If you have a library that chooses to register a common name like "mile", it's no different from that library doing something like "decimal.getcontext().prec = 2", which is a fully documented feature. Some features belong to the application, not the library, and I don't think that has spoiled other things before. We cope.
Right. Remind me why most command shells have a single system-wide namespace, then? Or is it a really good idea in programming but not in scripting?
Yes. I have never disputed the value of namespaces as a way of stopping names from colluding. Or colliding. What I'm disputing is that the *module* is the one and only namespace that is right here. You haven't yet shown a single bit of evidence for that.
(Is that even possible? At import time, can eggs tell which module is importing it?)
I'm sure anything's possible with sys._getframe.
It's an incredibly useful way to mock things. You provide a module before something else calls on it. (It's also a good way for a module to replace *itself*, although that's less commonly needed now that you can do module-level getattr.)
Right. And, again, these namespaces are not per-module, yet you aren't bothered by someone registering a name that you want. Why is the module the perfect scope for units?
You don't expect it. But somehow you DO expect arbitrary imports to mutate the unit namespace. Why?
[citation needed] Do libraries tend to work in this way, giving unitted values in a system different from the one the application uses? Is that actually a thing, or are you just guessing?
It's an interpreter-wide registry of long-lasting settings that affect any and all modules that use decimals. And yes, it's per-thread, but given your deep-seated fear of threads, I'm surprised you even consider that to be a difference.
So? You can most certainly mess up some other module that uses decimals. Any module you import could set the default context's precision to a really low value, which would mess up all kinds of things. Yet we do not fear this, because we expect that libraries won't do that.
Right. So, wouldn't it be equally ill-behaved for a module to randomly register units? Why is this different? Aside from those whose specific purpose is to be unit providers, libraries shouldn't be registering units. Otherwise they are behaving badly. I don't see this as any different from what we already have.
P.S. localcontext makes a copy of the context. Just sayin'.
(By default)
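For concreteness, here is the difference between mutating the shared default context and working on the copy that localcontext() hands you:

```python
from decimal import Decimal, getcontext, localcontext

getcontext().prec = 28  # mutates the interpreter-wide (per-thread) context

with localcontext() as ctx:  # ctx is a COPY of the current context
    ctx.prec = 4
    print(Decimal(1) / Decimal(3))  # 0.3333

# Outside the block the copy is discarded; full precision is back.
print(Decimal(1) / Decimal(3))  # 0.3333333333333333333333333333
```

So the granularity already ranges from "whole program" (getcontext) down to "a few statements" (localcontext), with nothing special at module scope.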
Does it need that to be in the source code? The registry applies ONLY to source code (although, of course, it would also be a good place for parsers to look).
3. Use the unit in a non-source-code way.
Because the registry is system-wide, I cannot define my own pottle unit without nuking other modules' pottle unit.
Good. You shouldn't be defining your own pottle unless you are the application. Global settings belong to the application. Libraries shouldn't be calling os.chdir(), messing with the default Decimal context, or monkeypatching the Random module to give a fixed sequence of numbers, without the consent of the application. I don't see a problem here. ChrisA

On Fri, Apr 8, 2022 at 8:29 AM Chris Angelico <rosuav@gmail.com> wrote:
another that's using nautical miles, but *not both in the same module*?
Absolutely! There is talk about "the Application" as though that's one thing, but Python applications these days can be quite large collections of third-party packages -- each of which does not know about the others, and each of which may be using units in different ways. For example, I have an application that literally depends on four different JSON libraries -- each used by a different third-party package. Imagine if the configurable JSON encoding/decoding settings were global state -- that would be a disaster. Granted: * Python is dynamic and has a global module namespace, so packages CAN monkey patch and make a mess of virtually anything. * "Well behaved" packages would not mess with the global configuration. But that doesn't mean that it wouldn't happen -- why encourage it? Why have a global registry and then tell people not to use it? Having a global registry/context/whatever for something that is designed/expected to be configured is dangerous and essentially useless. I'm not sure if this is really a good analogy, but it reminds me of the issues with system locale settings: back in the day, it seemed like a great idea to have one central place on a computer to set these nifty things that apply to that particular computer. But enter the internet, where the location of the computer the code is running on could be completely unrelated to where the user is and what the user wants to see, and it's a complete mess. Add to that different operating systems, etc. To this day, Python struggles with these issues -- if you use the default settings to open a text file, it may get virtually any encoding depending on what system the program is running on -- there is a PEP in progress to fix that, but it's been a long time! Datetime handling has the same issues -- I think the C libs STILL use the system timezone settings. And an early version of the numpy datetime implementation did too -- really bad idea. 
In short: The context in which code is run should be in complete control of the person writing the code, not the person writing the "application". Again: practical use case with units: I maintain a primitive unit conversion lib -- in that lib, I have a "registry" of units and names and synonyms, etc. That registry is loaded at module import, and at that time it checks for conflicts, etc. Being Python, the registry could be altered at run time, but that is not exposed as part of the public API, and it's not a recommended or standard practice. And this lets me make all sorts of arbitrary decisions about what "mile" and "oz" and all that means, and it's not going to get broken by someone else that prefers different uses -- at least if they use the public API. -CHB -- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Sat, 9 Apr 2022 at 02:31, Christopher Barker <pythonchb@gmail.com> wrote:
You're misunderstanding the difference between "application" and "library" here. Those are four separate libraries, and each one has a single purpose: encoding/decoding stuff. It is not the application. It is not the primary purpose of the process. If one of those JSON libraries were to change your process's working directory, you would be extremely surprised. We aren't bothered by the fact that os.chdir() is global, we just accept that it belongs to the application, not a library. The Application *is* one thing. It calls on libraries, but there's only one thing that has command of this sort of thing. General rule: A library is allowed to change things that belong to the application if, and only if, it is at the behest of the application. That's a matter of etiquette rather than a hard-and-fast rule, but we decry badly-behaved libraries for violating it, rather than blaming the feature for being global.
For precisely the same reason that we have so many other global registries. It is simplest and cleanest to maintain consistency rather than try to have per-module boundaries. When you have per-module features, refactoring becomes more of a hassle. I've fielded multiple questions from people who do "import sys" in one module, and then try to use "sys.argv" in another module, not realising that the namespace into which the name 'sys' was imported belonged only to that module. It's not too hard to explain, but it's a thing that has to be learned. The more things that are per-module, the more things you have to think about when you refactor. It is a *good thing*, not a bad thing, that a large number of settings are completely global. We do not need per-module settings for everything, and it would be a nightmare to work with if we did.
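The "import sys in one module, use it in another" confusion can be demonstrated with two synthetic modules built in-process:

```python
import types

# "Module" a imports sys, which binds the name only in a's own namespace.
mod_a = types.ModuleType("mod_a")
exec("import sys", mod_a.__dict__)
print(hasattr(mod_a, "sys"))  # True

# "Module" b never imports sys, so the name simply isn't there.
mod_b = types.ModuleType("mod_b")
try:
    exec("print(sys.argv)", mod_b.__dict__)
except NameError as e:
    print(e)  # name 'sys' is not defined
```

The sys module object itself is interpreter-wide; only the *name binding* is per-module, which is the distinction being argued about.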
Having a global registry/context/whatever for something that is designed/expected to be configured is dangerous and essentially useless.
Only if it's expected to be configured with some granularity. And, as with decimal.localcontext(), it's perfectly possible to have scopes much smaller than modules. So my question to you, just as to D'Aprano, is: why should this be at the module scope, not global, and not narrower?
What we now have is an even broader setting: the entire *planet* is being set into a default of UTF-8, one programming language at a time. We don't need it to be per-process any more, and we definitely never wanted it to be per-module or any other finer scope. The reason for having it centralized on the computer has always been that different applications could then agree on something. Let's say you set your computer to use ISO-8859-7 (or, if you're a Microsoft shop, you might use code page 1253 for the same purpose). You're telling every single application that you're planning to use Greek text, and that it should assume that eight-bit data is most likely to be in Greek. Since text files don't have inherent metadata identifying their encodings, it's not unreasonable to let the system decide it. Of course, that never worked all that well, so I'm not sorry to see more and more things go UTF-8 by default...
Datetime handling has the same issues -- I think the C libs STILL use the system timezone settings. And an early version of the numpy datetime implementation did too -- really bad idea.
In short: The context in which code is run should be in complete control of the person writing the code, not the person writing the "application".
Not sure what you mean there. Obviously any date/time with inherent timezone data should simply use that, but if a library is parsing something like "2022-04-09 02:46:17", should every single library have a way for you to tell it what timezone that is, or should it just use the system settings? I put it to you that this is something that belongs to the application, unless there's a VERY VERY VERY good reason for the library to override that. (In the case of timezone settings, that could mean having some sort of hidden metadata about that string, eg you're working with the Widgets Inc API and the library knows that Widgets Inc always send their timestamps in the Europe/Elbonia timezone.) And if you mean the interpretation of timezones themselves... that definitely does NOT belong in the library. I don't want to have to dig through every single dependency to see if it needs to have tzdata updated. One single global tzdata is absolutely fine, thank you very much. You may want to use one from PyPI or one from your operating system, and there's good reasons for both, but you definitely don't want every single library having its own copy. (It's big, anyhow.)
Again: practical use case with units:
I maintain a primitive unit conversion lib -- in that lib, I have a "registry" of units and names and synonyms, etc. That registry is loaded at module import, and at that time it checks for conflicts, etc. Being Python, the registry could be altered at run time, but that is not exposed as part of the public API, and it's not a recommended or standard practice. And this lets me make all sorts of arbitrary decisions about what "mile" and "oz" and all that means, and it's not going to get broken by someone else that prefers different uses -- at least if they use the public API.
Cool. The global repository that I suggest would be completely independent, unless you choose to synchronize them. The registry that you have would be used by your tools, and source code would use the interpreter-wide ones. This is not a conflict. Of course, since you have all the functionality already, it would make a lot of sense to offer an easy way to register all of your library's units with the system repository, thus making them all available; but that would be completely optional to both you and your users. ChrisA

Not sure this conversation actually relates to any proposals at this point, but ... On Fri, Apr 8, 2022 at 9:54 AM Chris Angelico <rosuav@gmail.com> wrote:
You're misunderstanding the difference between "application" and "library" here.
No, I'm not -- you're misunderstanding my (poorly made, I guess) point. Those are four separate libraries, and each one has a
single purpose: encoding/decoding stuff. It is not the application.
Of course it's not -- my point is that my application is using a bunch of third party libraries, and a number of them are using JSON, and clearly they are all using it in a somewhat different way, and the people writing that library code absolutely don't want some global settings to change how they work. os.chdir()
is global, we just accept that it belongs to the application, not a library.
Sure -- but I'd actually say that a "current working dir" is actually not helpful -- libraries shouldn't use it, ever. It can be handy for command line applications, but as you say, it's only the application that should be working with it.
Sure -- but I'm talking about the application changing global state that then affects how libraries will work -- that can only be helpful if there's a very well established standard protocol -- like current working directory, and maybe logging.
exactly.
I'm not necessarily saying that a global registry is always a bad idea, but I think it's a bad idea for most things, including Decimal behavior and units.
But that is the very idea of non-global namespaces -- you aren't going to get far in Python if you don't get that. Only if it's expected to be configured with some granularity. And, as
I do like the narrower option provided by decimal.localcontext(). As for module scope, not global: the principle here is that the person writing the code needs to control the context that is used -- only that person, at that time, knows what's appropriate -- the "application developer" may have no idea whatsoever how Decimal is being used in third-party libraries. In fact, they may not even know that it is being used. You could say that library writers need to be careful not to use the global context -- which I agree with -- but then it's a really bad idea to make that the default or the easy thing to use. And given the rarity of a large application not using any third-party libs, I don't see the benefit of a global context at all. Contrast this with, e.g., logging -- in that case, a third-party lib generally will want to simply log stuff to the application-defined logging system; it does not need to know (or care) where a debug message is sent, only that it is sent where the application configuration wants it to go.
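The "library controls its own context" principle can be followed today with localcontext(): a hypothetical accounting helper pins its own precision and rounding in a copied context, so nothing the application sets globally can leak in:

```python
from decimal import Decimal, localcontext, ROUND_HALF_EVEN

# Invented example: round a money amount to cents with banker's rounding,
# regardless of what the caller has done to the default context.
def to_cents(amount):
    with localcontext() as ctx:          # operates on a COPY of the context
        ctx.prec = 28
        ctx.rounding = ROUND_HALF_EVEN
        return Decimal(amount).quantize(Decimal("0.01"))

print(to_cents("2.345"))  # 2.34 (ties round to the even digit)
print(to_cents("2.355"))  # 2.36
```

Whether libraries *should* be required to do this, rather than inheriting the application's context, is exactly the disagreement in this thread.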
I'm not sure if this is really a good analogy, but it reminds me of the issues with system locale settings:
The reason for having it centralized on the computer has always been that different applications could then agree on something.
Sure -- having a locale is a fine idea; the problem is when a programming language uses that locale by default, or even worse, without even the choice of overriding it. If an application wants to, for instance, "display this information in the local computer's locale-appropriate way" -- that's a perfect use case. But "read this text file, that could have come from anywhere, using the computer's locale settings" was always a bad idea. Sure -- you may actually want to do that -- but it should be an explicit choice, not the default. text, and that it should assume that eight-bit data is most likely to
be in Greek. Since text files don't have inherent metadata identifying their encodings, it's not unreasonable to let the system decide it.
Well, it wasn't unreasonable back in the day, but it is now -- the odds are slim that a text file comes from the local system, and even worse, it's very unlikely that the code is being written and tested on a system with the same settings.
That ISO string has no TZ offset -- it is a naive datetime, and it absolutely, positively should be interpreted as such. That is EXACTLY what was wrong with the first numpy datetime implementation -- it interpreted an ISO string without an offset as "local time" (which I believe is what the ISO spec says) and so applied the locale timezone -- which was almost always the wrong thing to do, period. We all had to go through machinations to work around that.
I think there's some confusion here: I'm not saying that libraries should override system settings -- I'm saying libraries should only use these types of system settings when explicitly asked to -- not by default -- and, worst of all, libraries should not use system settings that can't be overridden by the application (which is what the old C time functions did (still do?)). Again, the behavior of some code should be clear and obvious to the person writing the code. If I write code such as: np.datetime64('2022-04-10T09:33:00') I should know exactly what I'm going to get (which I do now -- numpy fixed this a good while ago -- but in its first implementation, it would literally provide a different result depending on the local computer's settings). That doesn't mean that: np.datetime64('2022-04-10T09:33:00', apply_locale_tz_offset=True) isn't useful; it's just that it shouldn't be the default, or even worse, a non-overridable default -- e.g. a "global context". And if you mean the interpretation of timezones themselves... of course not, no. That belongs in its own library, which libraries that need it can then choose (or not) to use.
One single global tzdata is absolutely fine, thank you very much.
Of course it is -- I'm not saying that nothing global is useful; I'm saying that sets of global defaults and all that are very useful, but they should always be explicitly specified. If I'm writing a library, I may choose to depend on pytz. But when I write the code, I'm making that choice -- I'm not writing code simply hoping that the application using my code has made the right choice of how to deal with timezones.
Again: practical use case with units:
But if I did that, then one lib registering my units with the global repository might break some other lib's use of the global repository. A global "default units" is fine, but then anyone wanting to override it should be working with a copy, not changing the same one that other packages might be using. Which I believe is exactly what pint does, for instance. -CHB

I'll summarize by quoting only a small part, since most of the line of argument is the same, with different examples. On Mon, 11 Apr 2022 at 02:59, Christopher Barker <pythonchb@gmail.com> wrote:
Yes, exactly. You, as application author, should know exactly what you're going to get.
What if you have an intermediary library that calls on something else? What if, say, numpy calls on the datetime module rather than doing the parsing itself? Even if there were some way to make global config changes, *the application* will still know what's happening - because the application is the one that's in charge. Libraries should not be making changes to application configs except at the behest of the application. Libraries should do things that are affected by application configs, when they are working at the behest of the application. When a library is doing its own thing, it should be independent of such configs. This applies to system-wide things too; if you use os.sep anywhere, it doesn't mean you don't know what will happen. It specifically means that you DO know what will happen, and that you'll get the OS-appropriate directory separator. If you use os.sep and then assume that you're getting backslashes, you're doing it wrong. If you use os.sep and then print it out for the human to see, you're doing it right, and it'll probably be less surprising than hard-coding slash or backslash.
Then... libraries should not register units unless it's at the behest of the application? That's exactly what I've been saying. You might as well say that one lib adding something to sys.path might break some other lib's changes to sys.path. We're not bothered by that; we call that ill-behaved libraries.
A global "default units" is fine, but then anyone wanting to override it should be working with a copy, not changing the same one that other packages might be using.
Which I believe is exactly what pint does, for instance.
What does pint do if you want to have rules like "kilograms of mass and kilograms of force should be interchangeable"? Or does it simply mandate that they're not? I put it to you that these kinds of conversion rules *belong to the application*, not to any library that calls on pint. ChrisA

On Mon, 11 Apr 2022 at 06:48, Ethan Furman <ethan@stoneleaf.us> wrote:
Probably the same way that it's always been possible - with clunkier syntax and explicit multiplications. I'm not sure. Do you actually have a use-case where a library needs to do unit-aware arithmetic independently of the application, or is this purely hypothetical? ChrisA

On 4/10/22 14:09, Chris Angelico wrote:
So mostly with none of the benefits of the new syntax.
At this point I am not a unit user, but it seems that several who are would like finer grained control, and you are arguing that global is fine; they offer json as an example where the application imposing global settings on their activities would really mess them up, and you don't seem to be hearing/acknowledging that. -- ~Ethan~

On Mon, 11 Apr 2022 at 09:45, Ethan Furman <ethan@stoneleaf.us> wrote:
Yes. If there are libraries that need to be completely independent of the application, they won't be able to take advantage of the new syntax.
I'm hearing it, I'm just not seeing the parallel. Remember: Nobody is ever saying that existing unit libraries have to go away. This is a proposal for a syntax that will allow for a more convenient way of writing them. So far, I've yet to see anything more than a 100% hypothetical "what if multiple libraries do things" concern, and at no point has ANYONE ever shown why the module level is the correct scope. Comparing with Decimal contexts shows that sometimes we need broader, and sometimes narrower, scope than the module, and that the module is simply not sufficient. Why bind unit definitions to the module if it's not even going to be useful? Having never personally used multiple JSON libraries at the same time, I have no idea what sort of global settings would get in their way, but I have most certainly used Decimal contexts, and I generally assume and expect that calls to library functions will be affected by changes to the context. If I do that change with decimal.getcontext().prec = N once at the top of the program, I fully expect that it will affect every module. Is this considered to be a bad thing? If not, why is it bad for units? ChrisA

On Sat, Apr 09, 2022 at 02:52:50AM +1000, Chris Angelico wrote:
You might not be, but those of us who use it, or *would* use it if it wasn't so dangerous, think differently. In any case, the idea that *units of measurement* (of which there are over 3000 in the "units" program, and an unbounded number of combinations of such) are in any way comparable to the single "current working directory" is foolish. Units are *values* that are used in calculations, not application wide settings. The idea that libraries shouldn't use their own units is as silly as the idea that libraries shouldn't use their own variables. Units are not classes, but they are sort of like them. You wouldn't insist on a single, interpreter wide database of classes, or claim that "libraries shouldn't create their own classes". -- Steve

On Sun, Apr 10, 2022 at 7:25 PM Chris Angelico <rosuav@gmail.com> wrote:
I don't think it helps to be making this parallel -- there is one "current working dir" at once -- that's how it's defined; it's by definition global, and good or bad, that's what it is. But I would say in my library code I NEVER use the current working dir -- and certainly don't change it. If a path is passed in, then it's either relative or absolute, and that's OK -- it's in control of the user of the library (the application) -- not the library itself. I have seen code that caches the working dir, changes it, then puts it back -- but that's very much not thread safe, and I'd only recommend it maybe in tests. Back to units and Decimal -- I'm not necessarily advocating module scope -- that is a bit tricky -- but in general, I argue for the principle that the person writing the code (library author, say) can know and control what's going to happen: if my library converts from lbs to kg -- that had better work, and work the same way regardless of what the application author thinks is allowed, and regardless of what other libraries might need either. For Decimal -- if someone has written an accounting library -- it had damn better well use the proper precision and rounding rules for the kind of accounting it's doing. And it sure as heck shouldn't change because someone writing an application has a different need for Decimals. If all this means that it's impossible to have built-in syntax, then fine -- we shouldn't have built-in syntax -- it would only be useful for one-file scripts and the like, and there are other options for that -- like preprocessors.
I sure do -- have I not made this clear?
Libraries should do things that are affected by application configs, when they are working at the behest of the application.
When a library is doing its own thing, it should be independent of such configs.
Exactly -- the problem is that I can't imagine a single case where I'd want the behavior of any of my libraries to be altered by the "application" -- so I'd never turn that on, so I don't think there's any point in a global context -- and it's dangerous if using the global context is the default and the easier thing to do -- folks will use it, and their code will break when it's used in a different context. -CHB

On Sun, Apr 10, 2022 at 10:16:04PM -0700, Christopher Barker wrote:
If you google for it, there are about a million recipes and blog posts etc for changing the current working directory in a context manager, and starting with 3.11 it will be available in the stdlib: https://docs.python.org/3.11/library/contextlib.html#contextlib.chdir -- Steve
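For interpreters before 3.11, the commonly-recipe'd version of that context manager is only a few lines (this is a sketch of the usual pattern, not the stdlib implementation):

```python
import contextlib
import os
import tempfile

@contextlib.contextmanager
def chdir(path):
    prev = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(prev)  # restore even if the body raises

# Usage: the process-wide working directory changes only inside the block.
with tempfile.TemporaryDirectory() as tmp:
    with chdir(tmp):
        print(os.path.samefile(os.getcwd(), tmp))  # True
```

Note that, like the stdlib version, this changes the *whole process's* working directory for the duration of the block, so it is not thread-safe.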

On Mon, Apr 11, 2022 at 12:21:41PM +1000, Chris Angelico wrote:
You know how every OS process has its own working directory? Just like that, except every module. It's probably too hard to implement in Python, at least for the benefit. (Lots of effort, only a small benefit, net negative worth.) Especially since we would probably want the WD to use dynamic scoping, not lexical scoping. This is not a PEP proposing per-module WDs, nor even a serious proposal for it. "One WD per process" is baked so deep into file I/O on Posix systems (and I presume Windows) that it's probably impossible to implement in current systems. -- Steve

On Mon, 11 Apr 2022 at 19:30, Steven D'Aprano <steve@pearwood.info> wrote:
That's a really lovely theory. The problem is, it doesn't work that way. Every process is spawned in the working directory of its parent (modulo deliberate changes), and thereafter is completely independent. If one process sends a signal to another process, they have independent working directories. That doesn't make sense with modules, since they constantly call back and forth to each other. Imagine: import subprocess import os os.change_local_dir(...) What's the working directory of the subprocess module? Is it independent of the calling module? If so, what's the point of even HAVING a per-module working directory, since no Python code can ever directly open a file - it always calls a function in another module?
It's probably too hard to implement in Python, at least for the benefit. (Lots of effort, only a small benefit, net negative worth.)
Massive negative worth.
The context manager changes the entire process's WD for a section of code. This makes sense, although it has its own consequences. Per-module *simply does not work*, nor does it make any sense. The module-scope hammer does not fit every nail. Stop trying to hammer in screws. ChrisA

The context manager changes the entire process's WD for a section of code. This makes sense, although it has its own consequences.
Actually, now that you say that— I think it makes my point: the fact that this context manager is necessary, and “has consequences” is because the working dir is global — not a good choice. The module-scope hammer does not fit every nail. Stop trying to hammer
in screws.
I don’t know about anyone else, but I’m not arguing for module scope. I’m arguing against implicit global configuration. -CHB

On Mon, Apr 11, 2022 at 11:10 AM Chris Angelico <rosuav@gmail.com> wrote:
I'm agreeing with namespaces as well -- which I think is different from the idea of module scope for implicit contexts. Then we are using names, and can use all the rules for managing the scope of those names. To use the example of one proposed unit syntax: distance = 500[miles] -- "miles" would need to be a valid name accessible to that scope -- the writer of that code can choose exactly what that name is, likely by some sort of import: from my_unit_system import miles. As opposed to what I understand was being proposed via a "global registry", which is that the code: distance = 500[miles] would work even if there were no name "miles" in that namespace -- it would instead go look for it in the global registry -- which could have been manipulated by the application to mean nautical miles, or statute miles, or whatever. And THAT I think is a bad idea. What I'm not suggesting, because I think it wouldn't be that helpful, and maybe not possible, would be to have something like: set_units_registry_to(my_units_system) and then have: distance = 500[miles] use my_units_system's definition of miles in that module without having explicitly imported the name. -CHB
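The "units are ordinary names you import" approach works with no new syntax at all; a minimal sketch (the Quantity class and the unit constants are invented for illustration):

```python
class Quantity:
    def __init__(self, value, unit):
        self.value, self.unit = value, unit

    def __rmul__(self, n):                 # enables  500 * miles
        return Quantity(n * self.value, self.unit)

    def __repr__(self):
        return f"{self.value} {self.unit}"

miles = Quantity(1, "mile")           # one module's definition
nautical_miles = Quantity(1, "NM")    # another module could export this

# Ordinary scoping rules (imports, locals) decide which 'miles' this is.
distance = 500 * miles
print(distance)  # 500 mile
```

The proposed 500[miles] literal syntax would only sweeten this; the scoping question is whether "miles" is resolved like any other name or looked up in a registry.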

On Tue, 12 Apr 2022 at 05:14, Christopher Barker <pythonchb@gmail.com> wrote:
It's a good thing we don't have a mutable builtins module, then. Oh right. :)
The trouble is, you now force everyone to do bulk imports - almost certainly star imports. How are you going to populate the namespace appropriately if not with a star import? What happens if you don't want the entire contents of the module? Having a registry means you can get finer granularity with registration functions, without a massively complex module and submodule system, or heaps of manual imports. ChrisA
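The granularity point can be illustrated with a hypothetical provider module whose register() copies only the requested units into the shared registry (all names here are invented):

```python
# Interpreter-wide registry, as in the proposal under discussion.
GLOBAL_REGISTRY = {}

# A provider module's catalogue of units (factor to the base unit).
SI_UNITS = {"m": 1.0, "kg": 1.0, "s": 1.0, "km": 1000.0}

def register(*names):
    # With no arguments, register everything; otherwise just the named units.
    for name in (names or SI_UNITS):
        GLOBAL_REGISTRY[name] = SI_UNITS[name]

register("m", "km")  # opt in to just two units -- finer than a star import
print(sorted(GLOBAL_REGISTRY))  # ['km', 'm']
```

A star import, by contrast, is all-or-nothing and pollutes the importing module's namespace rather than a dedicated registry.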

On Mon, Apr 11, 2022 at 12:22 PM Chris Angelico <rosuav@gmail.com> wrote:
Python is a highly dynamic language; you can monkey patch the heck out of almost anything -- but you surely don't think it is good programming practice to alter the behavior of builtins at the "program" level?
1) If you really want really easy-to-write units, then sure, use star imports. But most code is going to use what, half a dozen (or fewer) units? Is it so bad to write: from my_unit_system import miles, ft, kg, lbs, gallons or: import my_unit_system as U and go from there -- frankly, it's how most of Python already works. If you want a quick and easy unit-aware calculator -- then write an application or pre-processor for the code.
If there were a way to use a module-level registry, then sure -- I'm not sure that's possible or easy, but please don't make it global so that when I write code in a library, I have no idea what context I'll be working in. But honestly, I don't think I like the idea -- though no one has actually fleshed out exactly how it would work -- so maybe I would like the actual proposal -- who knows? But I would hope that if anyone does come up with a proposal, they will address the core issue I'm harping on here: when I write code that may be run in the context of someone else's "application" (or my own, two years later :-) ) -- I want to know exactly what the unit calculations will mean, and that they won't be messed with at run time by a standard recommended practice. -CHB

On 4/11/22 11:06, Chris Angelico wrote:
Steven is, as are a few who have agreed that namespaces are the One True Way™ to do things.
That seems a grossly unfair characterization of those who don't agree with you. I think everyone should take a break from this thread -- it is apparent that no one is convincing any one else, so the final decision will be by the SC (assuming a PEP is ever made). -- ~Ethan~

On 4/11/22, Steven D'Aprano <steve@pearwood.info> wrote:
You know how every OS process has its own working directory? Just like that, except every module.
A per-thread working directory makes more sense to me. But it would be a lot of work to implement support for this in the os and io modules, for very little gain.
Windows has up to 27 working directories per process. There's the overall process working directory, plus one for each drive.

On Mon, Apr 11, 2022 at 11:53:18AM -0500, Eryk Sun wrote:
Hmmm, yes, that does seem sensible.
Sure.
Today I learned something new, thank you. How does that work in practice? In Windows, if you just say the equivalent to `open('spam')`, how does the OS know which drive and WD to use? -- Steve

On 4/11/22, Steven D'Aprano <steve@pearwood.info> wrote:
"spam" is resolved against the process working directory, which could be a UNC path instead of a drive. OTOH, "Z:spam" is relative to the working directory on drive "Z:". If the latter is r"Z:\foo\bar", then "Z:spam" resolves to r"Z:\foo\bar\spam". The working directory on a drive gets set via os.chdir() when the process working directory is set to a path on the drive. It's implemented via reserved environment variables with names that begin with "=", such as "=Z:" set to r"Z:\foo\bar". Python's os.environ doesn't support getting or setting these variables, but WinAPI GetEnvironmentVariableW() and SetEnvironmentVariableW() do.
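The resolution rules Eryk describes can be sketched as a small pure-Python function. This is a toy model only: an explicit dict stands in for the hidden "=Z:" environment variables, and UNC paths and normalisation are ignored.

```python
def resolve_dos_path(path, process_cwd, per_drive_cwd):
    """Toy model of DOS path resolution.

    per_drive_cwd stands in for the hidden "=Z:" environment variables,
    mapping a drive ("Z:") to that drive's remembered working directory.
    """
    if len(path) >= 2 and path[1] == ":":
        drive, rest = path[:2].upper(), path[2:]
        if rest.startswith("\\"):
            return path  # fully qualified, e.g. Z:\spam
        # Drive-relative, e.g. "Z:spam": use that drive's working directory.
        base = per_drive_cwd.get(drive, drive + "\\")
        return base.rstrip("\\") + "\\" + rest
    if path.startswith("\\"):
        # Rooted, e.g. \spam: current drive + explicit directory.
        return process_cwd[:2] + path
    # Plain relative: resolve against the process working directory.
    return process_cwd.rstrip("\\") + "\\" + path
```

With the working directory on drive Z: set to `Z:\foo\bar`, the path `Z:spam` comes out as `Z:\foo\bar\spam`, matching the behaviour described above.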

On Tue, 12 Apr 2022 at 03:49, Steven D'Aprano <steve@pearwood.info> wrote:
It uses the "default drive" + "current directory on that drive". If you say `open("c:spam")`, Windows uses "drive C" + "current directory on drive C". If you say `open("/spam")`, Windows uses "default drive" + "explicit directory". Hence there are 26 current directories (one per drive), plus the selection of current drive, which effectively chooses your current directory. ChrisA

On 4/11/22, Chris Angelico <rosuav@gmail.com> wrote:
If you say `open("/spam")`, Windows uses "default drive" + "explicit directory".
You can think of a default drive as being the drive of the current working directory, but there is no "default drive" per se that's stored separate from the working directory. Python and most other filesystem libraries generalize a UNC "\\server\share" path as a 'drive', in addition to drive-letter drives such as "Z:". However, the working directory is only remembered separately from the process working directory in the case of drive-letter drives, not UNC shares. If the working directory is r"\\server\share\foo\bar", then r"\spam" resolves to r"\\server\share\spam". If the working directory is r"\\server\share\foo\bar", then "spam" resolves to r"\\server\share\foo\bar\spam". However, the system will actually access this path relative to an open handle for the working directory. A handle for the process working directory is always kept open and thus protected from being renamed or deleted. Per-drive working directories are not kept open. They're just stored as path names in reserved environment variables.
If the process working directory is a DOS drive path, then 26 working directories are possible. If the process working directory is a UNC path, then 27 working directories are possible.

On Tue, 12 Apr 2022 at 04:46, Eryk Sun <eryksun@gmail.com> wrote:
Ah, fair. I described it that way because I thought that it was equivalent, but I stand corrected.
Which raises the question: what if the current directory no longer has a path name? Or is that simply not possible on Windows? I know that on Linux, I can unlink a directory while being in it (which creates interesting problems for bash, git, and other tools, but not fundamental ones for the process model itself), and many references WILL be resolved correctly. Or I can move the directory to another parent, and "pwd" says the original name (because that's a shell feature), but "ls .." will list the contents of the new directory.
Yup yup. This is what I get for oversimplifying and forgetting that UNC names are paths/drives too. (Don't even get me started on prefixing paths with \\?\ and what that changes. Windows has bizarre backward compatibility constraints.) ChrisA

On 4/11/22, Chris Angelico <rosuav@gmail.com> wrote:
Which raises the question: what if the current directory no longer has a path name? Or is that simply not possible on Windows?
The process working directory is opened without FILE_SHARE_DELETE sharing. This prevents opening the directory with DELETE access from any security context in user mode, even by the SYSTEM account. If the handle for the working directory is forcefully closed (e.g. via Process Explorer) and the directory is deleted, then accessing a relative path in the affected process fails with ERROR_INVALID_HANDLE (6) until the working directory is changed to a valid directory.
(Don't even get me started on prefixing paths with \\?\ and what that changes. Windows has bizarre backward compatibility constraints.)
Paths prefixed by \\?\ or \\.\ are not supported for the process working directory and should not be used in this case. The Windows API is buggy if the working directory is set to a prefixed path. For example, it fails to identify a drive such as r"\\?\C:" or r"\\?\UNC\server\share" in the working directory, in which case a rooted path such as r"\spam" can't be accessed.

I’ve only scanned this thread and may have missed an explanation for this, but why must there be a global registry? Explicit imports, possibly with a new dunder protocol for literal convertors, feels like a better fit to the language to me. This also avoids the problem with a global registry you mention earlier. For example:

    # si_units.py:

    class MeterConvertor:
        def __literal__(self, value):
            …
            return converted

    m = MeterConvertor()

    # script.py:

    from si_units import m

    length = 10_m  # calls m.__literal__(…)

That’s assuming this has to be in the language in the first place, I don’t have a strong opinion about that because I don’t have a use case for this feature myself. Ronald — Twitter / micro.blog: @ronaldoussoren Blog: https://blog.ronaldoussoren.net/
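Ronald's sketch leaves the convertor body elided. A minimal runnable fleshing-out of the same idea might look like the following; note that the `__literal__` dunder is hypothetical (no such protocol exists in Python today), and returning a `(magnitude, unit)` tuple is purely an illustrative assumption.

```python
# Hypothetical __literal__ protocol, fleshed out just enough to run.
# Neither the dunder name nor the tuple representation is an existing
# Python feature; both are assumptions for illustration.
class MeterConvertor:
    def __literal__(self, value):
        # Convert the bare number into a (magnitude, unit) pair.
        return (float(value), "m")

m = MeterConvertor()

# Until something like `10_m` is real syntax, the hook must be
# called explicitly:
length = m.__literal__(10)
```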

On Sun, 10 Apr 2022 at 18:44, Ronald Oussoren via Python-ideas <python-ideas@python.org> wrote:
That's precisely what Steven is arguing for, and I'm arguing against. Primarily because YAGNI, but also for a more fundamental reason: the registry of units is not simply a definition of name -> value, which a namespace is capable of (but I think is wrong for this situation), but it's also a definition of which units can be converted into which. That would affect how two objects interact with each other - can you add meters and miles? Can you treat the electron-volt as a unit of mass? All these sorts of questions belong to the application, and there's no useful way to divide them on module boundaries, much less other scopes. There's no point trying to make this scoped, just as we don't have lots of other things scoped (like sys.modules). Shared state across the application is a *good* thing, not a bad one. ChrisA

On Sun, Apr 10, 2022 at 2:37 AM Chris Angelico <rosuav@gmail.com> wrote:
Oh my god, no! I guess we do disagree -- if someone's using one of my packages that make use of unit conversions, I absolutely do not want its behavior to change because of the application developer's different ideas about what units mean. Or even worse, some other third party library changing the behavior of my lib, and the app developer having no idea that they are incompatible.
That could be true, if you are trying to build it into syntax -- which to me is an argument for not making it a language feature. There's no point trying to make this scoped, just as we don't have
lots of other things scoped (like sys.modules). Shared state across the application is a *good* thing, not a bad one.
Not in this case, it isn't (IMHO :-; ) -CHB -- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

The registry is a mapping from names to objects implementing some kind of interface, and modules are a good solution for that.
Why not, that’s what we do for all other types and interfaces? Compatibility and interaction should IMHO be part of the implementation of a set of cooperating types, and that’s orthogonal to how you get access to those types.
I’m pretty sure singletons are considered to be a code smell in general ;-) But as I wrote earlier, I don’t have a use case for using this feature myself. The closest I get are tuples with the same shape but different interpretation, such as various color representations. For those, using regular class syntax is obviously a better choice than using tuples (e.g. RGB(1, 2, 3)). The few times I need to deal with units I can either ignore them, or use a normal type (e.g. meters(1)). Ronald
— Twitter / micro.blog: @ronaldoussoren Blog: https://blog.ronaldoussoren.net/

Brian McCall writes:
According to you. I would like dollars and shares to be supported as well as perhaps "kg of apples" (vs "kg of oranges"). What is a unit? It is an attribute of a quantity that says how it may be combined with other quantities to produce more quantities. In some situations (determining whether I should have "driven the backroads so I wouldn't get weighed") I'm happy to ignore the commodities and add up the kg, but in others (making out the invoice) I don't want to multiply price of one commodity by the quantity of another. I'm sure there are other applications besides accounting and physics, and each will have its own idiosyncratic constraints on combining quantities, ie, its own units.
This is a job for ... Superlibrary! IOW, I'm with David. There may be a justification for supporting custom literals in the language, but really, in this case it's just syntactic sugar for constructing an object of a derived class (ie, a (number, unit) couple) and should be designed that way. I think it's probably way overkill to support quantities (ie, numbers with units) as a primitive type in the language. For many computations, a library will be fast enough. For iterative computation or large data sets, I would guess it's very rare to want to check compatibility of units on every operation. Rather, for function calls you want to treat them like types of the actual arguments, and for vectorized computations, normally the whole vector will have the same units. And even if you did treat quantity as a primitive type, few existing programs are likely to be ported to use the feature (and if the guts are some ancient Fortran library, they can't be; you'll have to strip the type information off the numbers before operating on them anyway).

On Mon, Apr 04, 2022 at 02:47:48PM -0000, Brian McCall wrote:
Every problem can be bounded by the amount of matter and energy in the universe :-) More practically, the problem is bounded by the number of addressable memory locations (2^64) and more practically still, by the amount of memory you actually have. Presumably there is only a finite number of named measurement units which have ever been used in history, maybe a few thousand or so. A few days ago I pointed out that the Unix "units" program listed 2000+ units. I upgraded to a more recent version, and it now has over 3000:

    [steve ~]$ units
    Currency exchange rates from FloatRates (USD base) on 2018-10-20
    3070 units, 109 prefixes, 109 nonlinear units

If the implementation had some reasonable hard limit like 20,000 named units, I wouldn't complain. But why? The obvious mapping from unit names to values is to use a dict, which is limited only by memory.
If you use the geometrized unit system, you need only one base unit, the metre. Everything can be written as a small power of length. But for a more practical system, I count a minimum of 12 base dimensions:

* length
* mass
* time
* electric current
* thermodynamic temperature
* luminous intensity
* amount of substance
* amount of information
* currency
* plane angle
* solid angle
* other dimensionless (the name "uno" was proposed in 2003)

Some of these are strictly dimensionless in terms of the standard physics dimensional quantities, but we really don't want to be able to add 30 degrees to 1 yoctomol and get 1.1 radians. (Reasonable people may disagree. So may unreasonable people.) But as discussed in other parts of this loooong thread, there are other dimensionless quantities which we may wish to treat as distinct. Ratios? Then there are at least three types of non-linear units:

* temperatures
* log scales such as bels (decibels) and the moment magnitude scale for earthquakes
* piecewise linear (wire gauges etc)

and let's not forget prefixes. -- Steve

This hasn't yet had a compelling proposal; I think there's a dormant PEP for "i-strings". I'm sorry that my memory is not entirely clear, but that gives you some idea of how such discussions have gone in the past. The f-string PEP, the i-string PEP, and also, I believe, a PEP for "null coalescing" operators would give you some idea of how the core devs think about these issues. Perhaps somebody with a better memory can be more precise. Until then, this is what I've got. :-) Steve

On Sun, Apr 03, 2022 at 01:09:00PM +0900, Stephen J. Turnbull wrote:
Python is excellent for creating DSLs. It is one of the things it is well known for. https://www.startpage.com/sp/search?query=writing+dsls+in+python
That's not my recollection. My recollection is that in principle, at least, there is a reasonable level of support for a built-in decimal type, no strong opposition, and consensus that the syntax that makes the most sense is a "d" suffix: 6.0123d The implementation would be a fixed-precision (64- or 128-bit) type rather than the variable precision implementation used in the decimal module, which would massively reduce the complexity of the implementation and the public interface. (No context manager for the builtin decimal, fixed precision, only one rounding mode, no user-control over what signals are trapped, etc. If you need all those bells and whistles, use the decimal module.) The discussion fizzled out rather than being rejected. Whether it would be rejected *now*, two or four(?) years later, by a different Steering Council, is another story.
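For concreteness, the behaviour such a `d` suffix would buy is what the decimal module already provides, minus the syntax. A short sketch (the proposed builtin would be fixed-precision rather than decimal.Decimal, but the exactness argument is the same):

```python
from decimal import Decimal

# Binary floats cannot represent 0.1 exactly, so the arithmetic drifts:
assert 0.1 + 0.2 != 0.3

# Decimal keeps the base-10 value exactly. This is the behaviour a "d"
# literal suffix (6.0123d spelling Decimal('6.0123')) would give direct
# syntax for:
assert Decimal("0.1") + Decimal("0.2") == Decimal("0.3")
```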
There are also frequent proposals to create special string literals, with occasionals successes like rawstrings (the r"" syntax)
Raw strings were added in Python 1.5 to support the new re module: https://www.python.org/download/releases/1.5/whatsnew/ There was no formal mechanism for adding new features back then. -- Steve

On 2022-04-02 22:28, Steven D'Aprano wrote:
I'm not the person you're replying to, but a lot of those search results are pretty clearly not what was meant here. Python is fine for creating "real" DSLs, where the L is actually a separate language and Python is just parsing/interpreting it. What Python isn't so good at is creating quasi-DSLs or "DSDs" (domain specific dialects), where Python itself is the language and the domain-specific part is grafted on by use of objects, operator overloading, etc., so that what you run is actually a Python program that just looks and behaves a bit different from what you might expect from "vanilla" Python. This is the "Python isn't for DSLs" argument that I've seen mentioned on this list and elsewhere (although I agree that it's a pretty loose use of "DSL"). -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown

*SOAP BOX WARNING*

It's not often that I would say that C++ is easier to read or more WYSIWYG than Python, but in this case, C++ is clearly well ahead of Python. I have spent a fair amount of my own time, and I have seen so many others' time wasted, because command-line or input fields do not include units, or input-field units are accidentally handled with different units, or units are not used at all.

I get the sentiment that Python, or programming languages in general, are not meant to deal with units. From the perspective of a computer scientist, I can understand why this would be seen as a level of abstraction too high for programming languages and core libraries to aspire to. But from the perspective of a scientist or engineer, units are a CORE part of language. Anyone who has taken science or engineering classes in college knows what happens when you turn in homework with missing units in your answers - zero credit. Anyone who has worked out complicated calculations by hand, or with the help of packages like "units", knows the sinking feeling and the red flags raised when your answer comes out in the wrong units.

There has also been a shift in the expectations of scientists and engineers regarding their programming capabilities. A generation ago, a good many of them would not be expected to use their computers for anything more than writing documents, crunching numbers in a spreadsheet, or using a fully integrated task-specific application for which their employer paid dearly. These assumptions were codified in workflows and job descriptions. Today, if your workflow, especially in R&D, has a gap that Microsoft Office or task-specific software doesn't solve for you, then you are pretty much expected to write your own code. Job postings for engineering roles (other than software engineering) regularly include programming in their required skills. Software design, on the other hand, is rarely a required or hired skill.
And even though these scientists and engineers are required to know how to program, they are almost never *paid* to write code. Spending any more time than needed writing code, even if it is to fill a critical gap in a workflow, is seen as a negative. So software design best practices are non-existent. All of this leads to very poor practices around, and improper handling of, an absolutely essential part of scientific and engineering language - units.

If you had asked me twenty years ago if I thought units should be a native part of any programming language, I would have said absolutely - because in my youthful ignorance I had no idea what it would take to make such a thing work. Five years later, I would have said "not worth it". Now I'm back where I started. The lack of native language support for SI units is a problem for an entire segment of programmers.

Programming languages took a big step forward in deciding that EVERYTHING is a pointer/reference, and EVERYTHING is an object. They need to take another step forward to say that EVERY number has a unit, including "unitless". Not having this language feature is becoming (or already is) a problem. The question is, is it Python's problem?

HEAR HEAR! BUT- SI units isn't enough. Engineers in the US and Canada (I have many colleagues in Canada, and when I ask they always say: we pretend to use SI but we don't) have all kinds of units. Give us native, customizable units, or give us death! Who's with me??!! ... .... I'm kidding to a degree, but I did feel a swell of excitement as I read this response. :) The libraries out there - pint is probably the biggest one - have filled those gaps as much as they can, but there are so many shortfalls... The old engineering disciplines - mine (civil engineering), structural, electrical, etc - are the next frontier in the "software eats the world" revolution, and they desperately need a language with native units support. I was just on an interview call yesterday for a senior engineer role at a large multinational earthworks engineering firm, and we spent 15 minutes talking about software and what we see coming down the road when it comes to the need for our discipline to grow in its software creation capabilities. Python SHOULD be that language we do this with. It is awesome in every other way. But if it isn't DEAD SIMPLE to use units in python, it won't happen. I don't know what the solution is. I'm looking to you software engineers, you true geniuses and giants of your fields, to figure that out for me. But once you hand it to me I promise I will evangelize it to the ends of the Earth. On Sun, Apr 3, 2022, 2:56 PM Brian McCall <brian.patrick.mccall@gmail.com> wrote:

Looks like this segue moved on to a new thread, but I'm glad I'm not the only one who thinks this way!

On Mon, 4 Apr 2022 at 04:53, Brian McCall <brian.patrick.mccall@gmail.com> wrote:
If you had asked me twenty years ago if I thought units should be a native part of any programming language, I would have said absolutely - because in my youthful ignorance I had no idea what it would take to make such a thing work. Five years later, I would have said "not worth it". Now I'm back where I started. The lack of native language support for SI units is a problem for an entire segment of programmers. Programming languages took a big step forward in deciding that EVERYTHING is a pointer/reference, and EVERYTHING is an object. They need to take another step forward to say that EVERY number has a unit, including "unitless". Not having this language feature is becoming (or already is) a problem. The question is, is it Python's problem?
Part of the problem here is that Python has to be many many things. Which set of units is appropriate? For instance, in a lot of contexts, it's fine to simply attach K to the end of something to mean "a thousand", while still keeping it unitless; but in other contexts, 273K clearly is a unit of temperature. (Although I think the solution there is to hard-disallow prefixes without units, as otherwise there'd be all manner of collisions.) Is it valid to refer to fifteen Angstroms as 15A, or do you have to say 15Å, or 15e-10m and accept that it's now a float not an int? Similarly, what if you want to write a Python script that works in natural units - the Planck length, mass, time, and temperature? Purity and practicality are at odds here. Practicality says that you should be able to have "miles" as a unit, purity says that the only valid units are pure SI fundamentals and everything else is transformed into those. Leaving it to libraries would allow different Python programs to make different choices. But I would very much like to see a measure of language support for "number with alphabetic tag", without giving it any semantic meaning whatsoever. Python currently has precisely one such tag, and one conflicting piece of syntax: "10j" means "complex(imag=10)", and "10e1" means "100.0". (They can of course be combined, 10e1j does indeed mean 100*sqrt(-1).) This is what could be expanded. C++ does things differently, since it can actually compile things in, and declarations earlier in the file can redefine how later parts of the file get parsed. In Python, I think it'd make sense to syntactically accept *any* suffix, and then have a run-time translation table that can have anything registered; if you use a suffix that isn't registered, it's a run-time error. 
Something like this:

    import sys
    # sys.register_numeric_suffix("j", lambda n: complex(imag=n))
    sys.register_numeric_suffix("m", lambda n: unit(n, "meter"))
    sys.register_numeric_suffix("mol", lambda n: unit(n, "mole"))

(For backward compatibility, the "j" suffix probably still has to be handled at compilation time, which would mean you can't actually do that first one.) Using it would look something like this:

    def spread():
        """Calculate the thickness of avocado when spread on a single slice of bread"""
        qty = 1.5mol
        area = 200mm * 200mm
        return qty / area

Unfortunately, these would no longer be "literals" in the same way that imaginary numbers are, but let's call them "unit displays". To evaluate a unit display, you take the literal (1.5) and the unit (stored as a string, "mol"), and do a lookup into the core table (CPython would probably have an opcode for this, rather than doing it with a method that could be overridden, but it would basically be "sys.lookup_unit(1.5, 'mol')" or something). Whatever it gives back is the object you use. Does this seem like a plausible way to go about it? ChrisA
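Pending real syntax, the run-time half of this sketch is implementable today as an ordinary module-level table. The names below mirror Chris's hypothetical register_numeric_suffix API (which does not exist in sys), a plain call u(1.5, "mol") stands in for the proposed 1.5mol display, and the converters use toy float semantics rather than a real units type:

```python
# Run-time emulation of the proposed suffix registry (all names are
# from the proposal or invented here, not an existing API).
_suffix_registry = {}

def register_numeric_suffix(suffix, converter):
    _suffix_registry[suffix] = converter

def u(value, suffix):
    """Stand-in for the proposed `1.5mol` unit display."""
    try:
        converter = _suffix_registry[suffix]
    except KeyError:
        # Unregistered suffixes are a run-time error, as proposed.
        raise NameError(f"no numeric suffix {suffix!r} registered") from None
    return converter(value)

register_numeric_suffix("mm", lambda n: n * 0.001)  # millimetres -> metres
register_numeric_suffix("mol", lambda n: float(n))  # dimensionless stand-in

def spread():
    """Avocado thickness with toy numbers: qty / area."""
    qty = u(1.5, "mol")
    area = u(200, "mm") * u(200, "mm")
    return qty / area
```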

Why don't we allow different libraries to use different, incompatible implementations of integers, floating points, and bool? Standard units are just as immutable as any of these data types.

On Mon, 4 Apr 2022 at 18:28, Brian McCall <brian.patrick.mccall@gmail.com> wrote:
Why don't we allow different libraries to use different, incompatible implementations of integers, floating points, and bool? Standard units are just as immutable as any of these data types.
Those three data types are unambiguous, but a more reasonable parallel would be: Why don't we allow different libraries to use different, incompatible implementations of numbers? And we do. There are rationals and decimal floats in the standard library, and plenty of third party libraries with additional numeric data types. And guess what? There have been lots of calls for Decimal literals too :) So, yup, not that different. ChrisA

Asked and answered! Although, see below*, the additional representations of these numbers does not mean that "int", "bool", and "float" have no place in the core language. *Here is a URL to a GIF of the good people of Letterkenny saying "to be fair": https://media.giphy.com/media/Nl6T837bDWE1DPczq3/giphy.gif
And guess what? There have been lots of calls for Decimal literals too :)
I believe it, and I support it!

On Mon, Apr 04, 2022 at 08:27:45AM -0000, Brian McCall wrote:
Why don't we allow different libraries to use different, incompatible implementations of integers, floating points, and bool?
We do. numpy supports 32-bit and 64-bit ints and possibly others, gmpy supports mpz integers. I don't know about floats, but there's nothing stopping anyone from developing a library for 32-bit floats, or minifloats, or posits, or whatever.
Standard units are just as immutable as any of these data types.
Immutability vs mutability is just one design decision out of many that we would have to make. Regardless of which way we go, we still have to deal with the facts that:

* There are an unlimited number of derived (non-SI) and compound units that people will want to use.
* Many of those can have conflicting names, e.g. "mile" can refer to any of Roman mile, international mile, nautical mile, U.S. survey mile, Italian mile, Chinese mile, imperial mile, English *miles* (note plural), and many more.
* To say nothing of having to deal with adjustments to the definitions, e.g. a kilometre in 1920 is not the same as a kilometre in 2020, and applications that care about high precision may care about the difference.

Having a single interpreter-wide namespace for units will cause many name collisions. I expect that units should be scoped like variables are, with some variant of the LEGB (Local, Enclosing, Global, Builtin) scoping rules in place. At the very least, we should have two namespaces, per module and per interpreter. That will allow modules to register their own units without stomping all over those of other modules. -- Steve
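A minimal sketch of the two-namespace (per-module plus per-interpreter) lookup described here can be built from collections.ChainMap. The string values are placeholders for real unit objects, and the function names are invented for this example:

```python
from collections import ChainMap

# Interpreter-wide "builtin" unit table (placeholder values).
BUILTIN_UNITS = {"m": "metre", "s": "second", "mile": "international mile"}

def module_units(local_defs):
    """Per-module view: local definitions shadow the builtins, LEGB-style."""
    return ChainMap(local_defs, BUILTIN_UNITS)

# Two modules can each register their own "mile" without stomping on
# the shared table or on each other:
surveying = module_units({"mile": "US survey mile"})
shipping = module_units({"mile": "nautical mile"})
```

Lookups fall through to the interpreter-wide table only when a module has no local definition, which is exactly the collision behaviour being argued for.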

Asked and answered!
* There are an unlimited number of derived (non-SI) and compound units that people will want to use.
Unlimited? You sure that problem can't be bounded? There are few things I can think of that could bound this problem in a performance-friendly manner. In terms of the internal representation of units, the representation that is used for machine calculations, there are only 7 base units that need to be supported. Everything else is a product of powers of these 7 units. So you can represent every combination with 7 counters. And those counters do not need to have lots of bits. If you're using units in a way that leads to meters**255, then you might have a bug in your code, or you might be doing something that doesn't really need units. 4-8 bits are enough to store the powers of the 7 SI quantities (4-8 bytes). Translating those 7 quantities to the few hundred standard derived units can be handled by higher level libraries, which may still require counters of multiple types of units depending on the level and breadth of support being maintained.
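The 7-counter representation sketched above is easy to make concrete: a tuple of seven small integer exponents, one per SI base quantity, where multiplying quantities adds exponents and dividing subtracts them. The class and constant names here are illustrative, not a proposed API:

```python
from dataclasses import dataclass

# One small integer exponent per SI base quantity, in this fixed order:
# metre, kilogram, second, ampere, kelvin, mole, candela.
@dataclass(frozen=True)
class Dim:
    exps: tuple

    def __mul__(self, other):
        # Multiplying quantities adds the exponents of their dimensions.
        return Dim(tuple(a + b for a, b in zip(self.exps, other.exps)))

    def __truediv__(self, other):
        # Dividing quantities subtracts them.
        return Dim(tuple(a - b for a, b in zip(self.exps, other.exps)))

METRE    = Dim((1, 0, 0, 0, 0, 0, 0))
KILOGRAM = Dim((0, 1, 0, 0, 0, 0, 0))
SECOND   = Dim((0, 0, 1, 0, 0, 0, 0))

# Derived units fall out of the counter arithmetic: a newton is kg*m/s**2.
NEWTON = KILOGRAM * METRE / (SECOND * SECOND)
```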
True. It's a problem. Might require additional unit sets and/or namespaces. But in 3020, will we still be using Python?
Yes, yes, yes!

On Tue, 5 Apr 2022 at 00:48, Brian McCall <brian.patrick.mccall@gmail.com> wrote:
That would only be true if we had infinite-precision numbers. You can't simply store "this is two lengths divided by a time" and expect everything else to work perfectly.
The trouble with namespacing like this is that you need to be extremely verbose about which units you're using in which module. With a single interpreter-wide namespace, all you have to do is ask the SI module to register itself, and you're done, they're available. Is it really that much of a problem? Tell me: How often do you REALLY expect to have collisions within an application, but in different modules? YAGNI. ChrisA

On Tue, Apr 05, 2022 at 04:02:24AM +1000, Chris Angelico wrote:
You have no idea how many different definitions there are for "mile", do you? :-) And I don't just mean historical miles, before standardisation. I mean even in current use, in English-speaking countries. (And it's not just miles that this problem affects.) Sure, we can demand that every application that needs to deal with US survey miles and imperial miles and international miles give them all distinct names. That's one solution, but not the only solution.

But even if you do that, having one interpreter-wide database that any random library can write to is asking for trouble. If this becomes widespread, expecting libraries to "just don't overwrite existing units" is not good enough. Wait until you import some library which is not quite so accurate in its definitions as yours, and it tramples all over your system-wide database with its own (slightly different) definitions. How would you like your unit conversions to differ according to the order in which you import your libraries? "If I import cheddar first, then camembert, my lander safely lands on Mars, but if I import camembert first, then cheddar, it crashes into the planet at 215 miles per hour." Awesome.

It's 2022, and you're arguing in favour of a single system-wide database where any random module can monkey-patch the definitions used by all other modules. Ouch. This is exactly analogous to the situation Python would have if there were no per-module globals, just the system-wide builtins, and every library stored top-level variables and functions in that namespace. *shudders*

Look, I know that in Python, any module *might* sneak into my globals and modify them. But in practice, they don't, and that would be considered malware if they did it without very good reason and very obvious documentation. But to have a situation where *by design* all modules trample over each other's defined units, that's a suboptimal design. (I'm being polite there.) -- Steve

On Tue, 5 Apr 2022 at 13:00, Steven D'Aprano <steve@pearwood.info> wrote:
I don't, but I know there are many. But that's not the problem. The problem is: Do you ever have one module that's using statute miles and another that's using nautical miles, but *not both in the same module*? The only reason to have them namespaced to modules is to allow different modules to use them independently. If your application needs to use both statute and nautical miles in the same module (most likely the main module), then it's going to have an issue, and your proposal adds a ton of complexity (that's a real unit, by the way, I totally didn't make it up) for no benefit whatsoever.
My solution is to allow the very very few applications that need both to do some sort of disambiguation. Of course, this is only significant if you need *literals* of all of them. The units themselves can be distinct, even if each one would want to register itself with the name "mile".
What's your proposal? from units.SI import * ? This pollutes your main namespace *and* still has all the same problems.
"If I import * from cheddar first, then camembert, then I have issues". What's the difference? You're looking at a fundamentally identical problem, and thinking that it's fundamentally solved by module-level separation? Show me some evidence.
Yup I am! Have you ever mutated sys.modules? That's a system-wide database. And there are lots of good reasons to insert things into it. What about importing the logging module and configuring it prior to importing something that spews a ton of messages during its own import? Been there, done that. Yes, a system-wide database isn't actually as terrifying as you seem to think - AND a module-scale separation doesn't even help.
Straw man. It's more like using decimal.getcontext() and making changes. That's global. Do we have per-module Decimal contexts? Do we need them? No. In fact, the only way to change context is globally - though you can do so temporarily. That means you do not have any module-level separation *at all*. I don't hear the Decimal folks screaming about that. You want to add large amounts of completely unnecessary complexity on the basis that the module is the fundamental and sole correct place to namespace these. I'm not seeing any supporting arguments, other than "what if there were collisions? WON'T SOMEONE THINK OF THE COLLISIONS!". Please, show me where there are collisions across modules, and not within a module. That's what I asked, in the snippet you quoted.
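The Decimal behaviour under discussion can be demonstrated directly with the stdlib decimal module: the current context is global to the thread, and localcontext() is the temporary way to change it.

```python
from decimal import Decimal, getcontext, localcontext

# The current context is global to the thread: changing it here
# affects Decimal arithmetic in every module running in this thread.
getcontext().prec = 4
print(Decimal(1) / Decimal(7))       # 0.1429

# localcontext() is the well-behaved way to change it temporarily:
# it enters a copy of the context and restores the original on exit.
with localcontext() as ctx:
    ctx.prec = 10
    print(Decimal(1) / Decimal(7))   # 0.1428571429

print(Decimal(1) / Decimal(7))       # 0.1429 again
```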
I disagree, and I'm also being polite here. Let's keep it that way. ChrisA

On 2022-04-05 12:17 a.m., Chris Angelico wrote:
from units.survey import mile as s_mile
from units.imperial import mile as i_mile
from units.roman import mile as r_mile

We could bikeshed endlessly on how exactly to tell the interpreter to use an imported name as a literal suffix (it could just be that it calls a new dunder), but it seems to me that the way to disambiguate a name conflict in imported modules is very much already a solved problem. I don't quite understand why you want to add a different system that introduces a name conflict issue.

AlexB
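The as-import disambiguation works today for units-as-values, with no new syntax at all. A minimal sketch, where the Unit class and the three mile definitions stand in for the hypothetical units.survey / units.imperial / units.roman modules:

```python
# Sketch: Unit and the three values below stand in for the hypothetical
# units.* modules being discussed; none of this is an existing library.
class Unit:
    def __init__(self, metres):
        self.metres = metres
    def __rmul__(self, value):       # 5 * mile -> metres
        return value * self.metres

s_mile = Unit(1609.3472)   # US survey mile
i_mile = Unit(1609.344)    # international mile
r_mile = Unit(1480.0)      # Roman mile (approximate)

# Three "mile" units coexist happily, disambiguated by ordinary names.
print(5 * s_mile, 5 * i_mile, 5 * r_mile)
```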

On 5/04/22 4:17 pm, Chris Angelico wrote:
If there's a single global registry, and they both register the unit under the same name, then there will be problems if both modules are imported by the same program, even if the two units are never used together in the same module.
What's your proposal?
I'm not sure, but we really need to avoid having a global registry. Treating units as ordinary names looked up as usual would be the simplest thing to do. If you really want units to be in a separate namespace, I think it would have to be per-module, with some variant of the import statement for getting things into it:

from units.si import units *
from units.imperial import units inch, ft, mile
from units.nautical import units mile as nm
It's more like using decimal.getcontext() and making changes. That's global.
Personally I think giving Decimal a global context was a mistake, so arguing that "it's no worse than Decimal" isn't going to do much to convince me. :-) But in any case, a Decimal context and the proposed global unit registry are very different things. Just because one doesn't seem to cause problems doesn't mean the other won't either. -- Greg

On Tue, Apr 05, 2022 at 02:17:00PM +1000, Chris Angelico wrote:
That's not the real problem. The real problem is that my program may:

* import ham, which registers mile as 1609.3412 m
* import spam, which registers mile as 1609.344 m
* import cheese, which registers mile as 1609.3472 m
* import aardvark, which registers mile as 1609.3426 m
* import hovercraft, which registers mile as 1853.181 m

and then do calculations in miles, before converting to metres, and the results I get will be subtly (or not so subtly) different depending on the order I import those modules. (By the way, none of the above are nautical miles, of which there are at least three.)
"If I import * from cheddar first, then camembert, then I have issues".
And that is why you shouldn't `import *`. This is an old, well-known issue with wildcard imports.
You are correct that this is fundamentally identical to the problem that namespaces are designed to solve. This is why modern languages don't have one single system-wide namespace. We have 30+ years of Python programming, and 40-odd years of programming prior to Python, showing that the solution to the name collision problem is to have distinct namespaces rather than one single system-wide namespace that everything writes to. That's exactly my point.

Of course if I do this:

from spam import mile
from eggs import mile

then I have a namespace collision that results in the last value winning. But that's kinda obvious, doncha think? :-) Importantly, just doing

from spam import mile
import eggs

will not collide, except under the very unusual case that eggs gets up to no good by writing to the importing module's namespace. (Is that even possible? At import time, can eggs tell which module is importing it?)
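Both behaviours can be demonstrated; the throwaway spam and eggs modules are fabricated here with types.ModuleType purely so the example is self-contained:

```python
import sys
import types

# Fabricate two modules that each define "mile" differently.
spam = types.ModuleType("spam"); spam.mile = 1609.344
eggs = types.ModuleType("eggs"); eggs.mile = 1852.0
sys.modules["spam"], sys.modules["eggs"] = spam, eggs

from spam import mile
from eggs import mile            # silently rebinds: last value wins
print(mile)                      # 1852.0

import spam, eggs                # qualified access never collides
print(spam.mile, eggs.mile)      # 1609.344 1852.0
```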
Have you ever mutated sys.modules?
Not directly, no, except by the approved method of calling `import`, which never over-writes an existing entry, only adds new entries. Nor have I ever mutated the central registry of codecs to *replace* an existing encoder (like UTF-8) with my own. Likewise for error handlers.

There's only a relatively small number of each, and the two registries change so rarely that there is next to zero chance that I might accidentally trample over an existing codec or error handler with my own. And I do not expect that arbitrary imports will make changes to those registries. Never have I worried that `import spam` might change the meaning of the 'utf-8' codec, or replace some error handler with one with the same name but different behaviour.

But if there were thousands of codecs, and every second module I imported could potentially add or delete those codecs, then I would have to worry about these things. The system would be unworkable and we would have to find a better one. With units, there are thousands of named units, with many name collisions. The system would be unworkable with only a single interpreter-wide registry.
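The codec registry really does work this way: codecs.register() adds a search function that is consulted only for names no earlier search function (including the built-in one) has resolved, so existing codecs can't be accidentally replaced. A harmless registration, with a made-up alias name:

```python
import codecs

def search(name):
    # Consulted only after earlier search functions (including the
    # built-in one) fail, so "utf-8" itself can never be hijacked here.
    if name == "my_utf8":
        return codecs.lookup("utf-8")   # alias to the standard codec
    return None

codecs.register(search)
print(b"spam".decode("my_utf8"))   # spam
print("ham".encode("my_utf8"))     # b'ham'
```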
The situation is analogous, but not identical. The decimal context is not an interpreter-wide registry of long-lasting entities intended to be used by any and all modules. It is a per-thread part of the decimal API. It's not even a very close analogy: aside from sharing the vague concept of "global state" with your units registry, there's nothing like registering a unit ("furlongs per fortnight") in the decimal context. There are only eight settings, and you cannot set arbitrary attributes in decimal contexts.

The decimal context environment isn't even truly interpreter-wide. It is per-thread storage, so every thread has its own independent environment. Other modules (except possibly your application's main module) are not expected to modify the current context, although that's not enforced. (This is Python: you can shoot yourself in the foot if you really want to.) It would be considered *badly-behaved* for other modules or functions to directly modify the current decimal context instead of using localcontext() to temporarily change the current context.

P.S. localcontext makes a copy of the context. Just sayin'.
That's global. Do we have per-module Decimal contexts?
If you want it, you can have it. See PEP 567.
They don't need to, because arbitrary modules don't make changes to the decimal current context. That would be harmful, so well-behaved code uses localcontext(), and badly-behaved code doesn't get used. But with your central, interpreter-wide registry of units, modules which wish to use a named unit have no choice other than to register it with the central registry.

If my module aardvark.py wants to use a named unit, the pottle (a historical measurement equal to 1/2 a British gallon), what can I do? I can check the registry to see if pottle is already defined. If it isn't, great, I can install it, and it will now be visible to every module, whether they need it or not. But what if it's already there, with a different value? Now I have only two equally bad choices:

1. overwrite the system-wide pottle entry, breaking other modules; or
2. do without.

Because the registry is system-wide, I cannot define my own pottle unit without nuking other modules' pottle unit.

-- Steve
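The pottle dilemma in code form; the registry and its API are hypothetical, invented to make the two bad choices concrete:

```python
# Hypothetical interpreter-wide registry (values in litres).
REGISTRY = {}

def register_unit(name, value):
    REGISTRY[name] = value

# Some module imported earlier already claimed the name...
register_unit("pottle", 2.5)         # somebody else's pottle

# ...and now aardvark.py wants pottle = 1/2 British gallon (~2.273 l).
wanted = 2.273
if "pottle" not in REGISTRY:
    register_unit("pottle", wanted)
elif REGISTRY["pottle"] != wanted:
    # The dilemma: overwrite (breaking whoever registered it first),
    # or do without the name. A single registry offers no third way.
    pass

print(REGISTRY["pottle"])            # 2.5: aardvark did without
```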

On Sat, 9 Apr 2022 at 00:34, Steven D'Aprano <steve@pearwood.info> wrote:
Would it be better if you wrote it like this?

import si; si.register()

I would be hard pressed to imagine a situation as pathological as you suggest. Aside from a (relatively small) number of common systems, most measurement systems are going to be sufficiently special-purpose that they're going to be the concern of the entire application. If you have a library that chooses to register a common name like "mile", it's no different from that library doing something like "decimal.getcontext().prec = 2", which is a fully documented feature. Some features belong to the application, not the library, and I don't think that's spoiled other things before. We cope.
Right. Remind me why most command shells have a single system-wide namespace, then? Or is it a really good idea in programming but not in scripting?
Yes. I have never disputed the value of namespaces as a way of stopping names from colluding. Or colliding. What I'm disputing is that the *module* is the one and only namespace that is right here. You haven't yet shown a single bit of evidence for that.
(Is that even possible? At import time, can eggs tell which module is importing it?)
I'm sure anything's possible with sys._getframe.
It's an incredibly useful way to mock things. You provide a module before something else calls on it. (It's also a good way for a module to replace *itself*, although that's less commonly needed now that you can do module-level getattr.)
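That mocking technique, in miniature (the module name flaky_dep is invented; any not-yet-imported name works the same way):

```python
import sys
import types

# Install a stub *before* anything imports the real module; the import
# system checks sys.modules first, so later imports get the stub.
stub = types.ModuleType("flaky_dep")
stub.fetch = lambda url: "canned test response"
sys.modules["flaky_dep"] = stub

import flaky_dep                     # resolved straight from sys.modules
print(flaky_dep.fetch("https://example.com"))
```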
Right. And, again, these namespaces are not per-module, yet you aren't bothered by someone registering a name that you want. Why is the module the perfect scope for units?
You don't expect it. But somehow you DO expect arbitrary imports to mutate the unit namespace. Why?
[citation needed] Do libraries tend to work in this way, giving unitted values in a system different from the one the application uses? Is that actually a thing, or are you just guessing?
It's an interpreter-wide registry of long-lasting settings that affect any and all modules that use decimals. And yes, it's per-thread, but given your deep-seated fear of threads, I'm surprised you even consider that to be a difference.
So? You can most certainly mess up some other module that uses decimals. Any module you import could set the default context's precision to a really low value, which would mess up all kinds of things. Yet we do not fear this, because we expect that libraries won't do that.
Right. So, wouldn't it be equally ill-behaved for a module to randomly register units? Why is this different? Aside from those whose specific purpose is to be unit providers, libraries shouldn't be registering units. Otherwise they are behaving badly. I don't see this as any different from what we already have.
P.S. localcontext makes a copy of the context. Just sayin'.
(By default)
Does it need that to be in the source code? The registry applies ONLY to source code (although, of course, it would also be a good place for parsers to look).
3. Use the unit in a non-source-code way.
Because the registry is system-wide, I cannot define my own pottle unit without nuking other modules' pottle unit.
Good. You shouldn't be defining your own pottle unless you are the application. Global settings belong to the application. Libraries shouldn't be calling os.chdir(), messing with the default Decimal context, or monkeypatching the Random module to give a fixed sequence of numbers, without the consent of the application. I don't see a problem here. ChrisA

On Fri, Apr 8, 2022 at 8:29 AM Chris Angelico <rosuav@gmail.com> wrote:
Do you ever have one module that's using statute miles and another that's using nautical miles, but *not both in the same module*?
Absolutely! There is talk about "the Application" as though that's one thing, but Python applications these days can be quite large collections of third party packages -- each of which does not know about the others, and each of which may be using units in different ways.

For example, I have an application that literally depends on four different JSON libraries -- each used by a different third-party package. Imagine if the configurable JSON encoding/decoding settings were global state -- that would be a disaster.

Granted:

* Python is dynamic and has a global module namespace, so packages CAN monkey patch and make a mess of virtually anything.
* "Well behaved" packages would not mess with the global configuration.

But that doesn't mean that it wouldn't happen -- why encourage it? Why have a global registry and then tell people not to use it? Having a global registry/context/whatever for something that is designed/expected to be configured is dangerous and essentially useless.

I'm not sure if this is really a good analogy, but it reminds me of the issues with system locale settings: back in the day, it seemed like a great idea to have one central place on a computer to set these nifty things that apply to that particular computer. But enter the internet, where the location of the computer the code is running on could be completely unrelated to where the user is and what the user wants to see, and it's a complete mess. Add to that different operating systems, etc.

To this day, Python struggles with these issues -- if you use the default settings to open a text file, it may get virtually any encoding depending on what system the program is running on. There is a PEP in progress to fix that, but it's been a long time! Datetime handling has the same issues -- I think the C libs STILL use the system timezone settings. And an early version of the numpy datetime implementation did too -- really bad idea.
In short: the context in which code is run should be under the complete control of the person writing the code, not the person writing the "application".

Again, a practical use case with units: I maintain a primitive unit conversion lib -- in that lib, I have a "registry" of units and names and synonyms, etc. That registry is loaded at module import, and at that time it checks for conflicts, etc. Being Python, the registry could be altered at run time, but that is not exposed as part of the public API, and it's not a recommended or standard practice. And this lets me make all sorts of arbitrary decisions about what "mile" and "oz" and all that mean, and it's not going to get broken by someone else who prefers different uses -- at least if they use the public API.

-CHB

--
Christopher Barker, PhD (Chris)
Python Language Consulting
- Teaching
- Scientific Software Development
- Desktop GUI and Web Development
- wxPython, numpy, scipy, Cython
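A library-private registry of the kind described might look roughly like this; the names and API are invented for illustration and are not Christopher's actual library:

```python
# Sketch of a module-private registry with import-time conflict checks.
_registry = {}   # name/synonym -> size in metres

def _register(name, metres, *synonyms):
    for key in (name, *synonyms):
        if key in _registry and _registry[key] != metres:
            raise ValueError(f"conflicting definition for {key!r}")
        _registry[key] = metres

# Loaded once at import time; any conflict fails loudly, immediately.
_register("mile", 1609.344, "mi", "miles")
_register("foot", 0.3048, "ft", "feet")

def convert(value, from_unit, to_unit):
    """Public API; the registry itself is never exposed."""
    return value * _registry[from_unit] / _registry[to_unit]

print(convert(1, "mile", "ft"))   # ~5280
```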

On Sat, 9 Apr 2022 at 02:31, Christopher Barker <pythonchb@gmail.com> wrote:
You're misunderstanding the difference between "application" and "library" here. Those are four separate libraries, and each one has a single purpose: encoding/decoding stuff. It is not the application. It is not the primary purpose of the process. If one of those JSON libraries were to change your process's working directory, you would be extremely surprised. We aren't bothered by the fact that os.chdir() is global, we just accept that it belongs to the application, not a library. The Application *is* one thing. It calls on libraries, but there's only one thing that has command of this sort of thing. General rule: A library is allowed to change things that belong to the application if, and only if, it is at the behest of the application. That's a matter of etiquette rather than a hard-and-fast rule, but we decry badly-behaved libraries for violating it, rather than blaming the feature for being global.
For precisely the same reason that we have so many other global registries. It is simplest and cleanest to maintain consistency rather than try to have per-module boundaries. When you have per-module features, refactoring becomes more of a hassle. I've fielded multiple questions from people who do "import sys" in one module, and then try to use "sys.argv" in another module, not realising that the namespace into which the name 'sys' was imported belonged only to that module. It's not too hard to explain, but it's a thing that has to be learned. The more things that are per-module, the more things you have to think about when you refactor. It is a *good thing*, not a bad thing, that a large number of settings are completely global. We do not need per-module settings for everything, and it would be a nightmare to work with if we did.
Having a global registry/context/whatever for something that is designed/expected to be configured is dangerous and essentially useless.
Only if it's expected to be configured with some granularity. And, as with decimal.localcontext(), it's perfectly possible to have scopes much smaller than modules. So my question to you, just as to D'Aprano, is: why should this be at the module scope, not global, and not narrower?
What we now have is an even broader setting: the entire *planet* is being set into a default of UTF-8, one programming language at a time. We don't need it to be per-process any more, and we definitely never wanted it to be per-module or any other finer scope. The reason for having it centralized on the computer has always been that different applications could then agree on something. Let's say you set your computer to use ISO-8859-7 (or, if you're a Microsoft shop, you might use code page 1253 for the same purpose). You're telling every single application that you're planning to use Greek text, and that it should assume that eight-bit data is most likely to be in Greek. Since text files don't have inherent metadata identifying their encodings, it's not unreasonable to let the system decide it. Of course, that never worked all that well, so I'm not sorry to see more and more things go UTF-8 by default...
Datetime handling has the same issues -- I think the C libs STILL use the system timezone settings. And an early version of the numpy datetime implementation did too -- really bad idea.
In short: the context in which code is run should be under the complete control of the person writing the code, not the person writing the "application".
Not sure what you mean there. Obviously any date/time with inherent timezone data should simply use that, but if a library is parsing something like "2022-04-09 02:46:17", should every single library have a way for you to tell it what timezone that is, or should it just use the system settings? I put it to you that this is something that belongs to the application, unless there's a VERY VERY VERY good reason for the library to override that. (In the case of timezone settings, that could mean having some sort of hidden metadata about that string, eg you're working with the Widgets Inc API and the library knows that Widgets Inc always send their timestamps in the Europe/Elbonia timezone.) And if you mean the interpretation of timezones themselves... that definitely does NOT belong in the library. I don't want to have to dig through every single dependency to see if it needs to have tzdata updated. One single global tzdata is absolutely fine, thank you very much. You may want to use one from PyPI or one from your operating system, and there's good reasons for both, but you definitely don't want every single library having its own copy. (It's big, anyhow.)
Again: practical use case with units:
I maintain a primitive unit conversion lib -- in that lib, I have a "registry" of units and names and synonyms, etc. That registry is loaded at module import, and at that time it checks for conflicts, etc. Being Python, the registry could be altered at run time, but that is not exposed as part of the public API, and it's not a recommended or standard practice. And this lets me make all sorts of arbitrary decisions about what "mile" and "oz" and all that means, and it's not going to get broken by someone else that prefers different uses -- at least if they use the public API.
Cool. The global repository that I suggest would be completely independent, unless you choose to synchronize them. The registry that you have would be used by your tools, and source code would use the interpreter-wide ones. This is not a conflict. Of course, since you have all the functionality already, it would make a lot of sense to offer an easy way to register all of your library's units with the system repository, thus making them all available; but that would be completely optional to both you and your users. ChrisA

Not sure this conversation actually relates to any proposals at this point, but ... On Fri, Apr 8, 2022 at 9:54 AM Chris Angelico <rosuav@gmail.com> wrote:
You're misunderstanding the difference between "application" and "library" here.
No, I'm not -- you're misunderstanding my (poorly made, I guess) point.

Those are four separate libraries, and each one has a single purpose: encoding/decoding stuff. It is not the application.

Of course it's not -- my point is that my application is using a bunch of third party libraries, and a number of them are using JSON, and clearly they are all using it in a somewhat different way, and the people writing that library code absolutely don't want some global settings to change how they work.

os.chdir() is global, we just accept that it belongs to the application, not a library.

Sure -- but I'd actually say that a "current working dir" is actually not helpful -- libraries shouldn't use it, ever. It can be handy for command line applications, but as you say, it's only the application that should be working with it.
Sure -- but I'm talking about the application changing global state that then affects how libraries will work -- that can only be helpful if there's a very well established standard protocol -- like current working directory, and maybe logging.
exactly.
I'm not necessarily saying that a global registry is always a bad idea, but I think it's a bad idea for most things, and is for Decimal behavior, and Units
But that is the very idea of non-global namespaces -- you aren't going to get far in Python if you don't get that.

Only if it's expected to be configured with some granularity. And, as with decimal.localcontext(), it's perfectly possible to have scopes much smaller than modules.

I do like the narrower option provided by decimal.localcontext().

As for module scope, not global: the principle here is that the person writing the code needs to control the context that is used -- only that person, at that time, knows what's appropriate. The "application developer" may have no idea whatsoever how Decimal is being used in third party libraries. In fact, they may not even know that it is being used. You could say that library writers need to be careful not to use the global context -- which I agree with -- but then it's a really bad idea to make that a default or easy to use. And given the rarity of a large application not using any third-party libs, I don't see the benefit of a global context at all.

Contrast this with, e.g. logging -- in that case, a third party lib generally will want to simply log stuff to the application-defined logging system. It does not need to know (or care) where a debug message is sent, only that it is sent where the application configuration wants it to go.
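The well-behaved pattern being argued for -- a library pinning down its own Decimal behaviour with localcontext() instead of trusting (or trampling) the global context -- can be sketched like this; add_tax and its rounding rules are invented for illustration:

```python
from decimal import Decimal, ROUND_HALF_EVEN, getcontext, localcontext

def add_tax(amount, rate):
    """Accounting-style arithmetic, independent of the caller's context."""
    with localcontext() as ctx:      # enters a *copy* of the current context
        ctx.prec = 28
        ctx.rounding = ROUND_HALF_EVEN
        total = Decimal(amount) * (1 + Decimal(rate))
        return total.quantize(Decimal("0.01"))

# Even if the application has mangled the global context...
getcontext().prec = 2
print(add_tax("19.99", "0.0825"))   # 21.64, unaffected by prec=2
```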
I'm not sure if this is really a good analogy, but it reminds me of the issues with system locale settings:
The reason for having it centralized on the computer has always been that different applications could then agree on something.
Sure -- having a locale is a fine idea; the problem is when a programming language uses that locale by default, or even worse without even the choice of overriding it. If an application wants to, for instance, "display this information in the local computer's locale-appropriate way" -- that's a perfect use case. But "read this text file, that could have come from anywhere, using the computer's locale settings" was always a bad idea. Sure -- you may actually want to do that -- but it should be an explicit choice, not the default.

You're telling every single application that you're planning to use Greek text, and that it should assume that eight-bit data is most likely to be in Greek. Since text files don't have inherent metadata identifying their encodings, it's not unreasonable to let the system decide it.

Well, it wasn't unreasonable back in the day, but it is now -- the odds that a text file comes from the local system are slim, and even worse, it's very unlikely that the code is being written and tested on a system with the same settings.
That ISO string has no TZ offset -- it is a naive datetime, and it absolutely, positively should be interpreted as such. That is EXACTLY what was wrong with the first numpy datetime implementation -- it interpreted an ISO string without an offset as "local time" (which I believe the ISO spec says) and so applied the locale timezone -- which was almost always the wrong thing to do, period. We all had to go through machinations to work around that.
I think there's some confusion here: I'm not saying that libraries should override system settings -- I'm saying libraries should only use these types of system settings when explicitly asked to, not by default -- and, worst of all, libraries should not use system settings that can't be overridden by the application (which is what the old C time functions did (still do?)).

Again, the behavior of some code should be clear and obvious to the person writing the code. If I write code such as:

np.datetime64('2022-04-10T09:33:00')

I should know exactly what I'm going to get (which I do now -- numpy fixed this a good while ago -- but in its first implementation, it would literally provide a different result depending on the local computer's settings). That doesn't mean that:

np.datetime64('2022-04-10T09:33:00', apply_locale_tz_offset=True)

isn't useful; it's just that it shouldn't be the default, or even worse, a non-overridable default -- e.g. a "global context".

And if you mean the interpretation of timezones themselves... of course not, no. That belongs in its own library, which libraries that need it can then choose (or not) to use.
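The stdlib datetime module behaves the way being argued for here: parsing an ISO string with no offset yields a naive value, and a timezone is attached only when the code's author explicitly asks for one:

```python
from datetime import datetime, timedelta, timezone

dt = datetime.fromisoformat("2022-04-10T09:33:00")
print(dt.tzinfo)    # None: naive, no hidden locale guessing

# Attaching a zone is an explicit, visible choice in the code:
explicit = dt.replace(tzinfo=timezone(timedelta(hours=-7)))
print(explicit.isoformat())   # 2022-04-10T09:33:00-07:00
```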
One single global tzdata is absolutely fine, thank you very much.
Of course it is -- I'm not saying that nothing global is useful. I'm saying that sets of global defaults and all that are very useful, but they should always be explicitly specified. If I'm writing a library, I may choose to depend on pytz. But when I write the code, I'm making that choice -- I'm not writing code simply hoping that the application using my code has made the right choice of how to deal with timezones.
Again: practical use case with units:
But if I did that, then one lib registering my units with the global repository might break some other lib's use of the global repository. A global "default units" registry is fine, but then anyone wanting to override it should be working with a copy, not changing the same one that other packages might be using. Which I believe is exactly what pint does, for instance.

-CHB
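The copy-on-override discipline can be sketched without reference to any particular library (pint's real mechanism, independent UnitRegistry instances, differs in detail):

```python
# Shared defaults plus cheap per-consumer copies (illustrative sketch).
DEFAULT_UNITS = {"mile": 1609.344, "ft": 0.3048}   # metres

def private_units(**overrides):
    """Return a private copy; the shared defaults are never mutated."""
    units = dict(DEFAULT_UNITS)
    units.update(overrides)
    return units

mine = private_units(mile=1852.0)   # my "mile" is nautical
print(mine["mile"])                 # 1852.0
print(DEFAULT_UNITS["mile"])        # 1609.344: untouched
```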

I'll summarize by quoting only a small part, since most of the line of argument is the same, with different examples. On Mon, 11 Apr 2022 at 02:59, Christopher Barker <pythonchb@gmail.com> wrote:
Yes, exactly. You, as application author, should know exactly what you're going to get.
What if you have an intermediary library that calls on something else? What if, say, numpy calls on the datetime module rather than doing the parsing itself? Even if there were some way to make global config changes, *the application* will still know what's happening - because the application is the one that's in charge. Libraries should not be making changes to application configs except at the behest of the application. Libraries should do things that are affected by application configs, when they are working at the behest of the application. When a library is doing its own thing, it should be independent of such configs. This applies to system-wide things too; if you use os.sep anywhere, it doesn't mean you don't know what will happen. It specifically means that you DO know what will happen, and that you'll get the OS-appropriate directory separator. If you use os.sep and then assume that you're getting backslashes, you're doing it wrong. If you use os.sep and then print it out for the human to see, you're doing it right, and it'll probably be less surprising than hard-coding slash or backslash.
Then... libraries should not register units unless it's at the behest of the application? That's exactly what I've been saying. You might as well say that one lib adding something to sys.path might break some other lib's changes to sys.path. We're not bothered by that; we call that ill-behaved libraries.
A global "default units" is fine, but then anyone wanting to override it should be working with a copy, not changing the same one that other packages might be using.
Which I believe is exactly what pint does, for instance.
What does pint do if you want to have rules like "kilograms of mass and kilograms of force should be interchangeable"? Or does it simply mandate that they're not? I put it to you that these kinds of conversion rules *belong to the application*, not to any library that calls on pint. ChrisA

On Mon, 11 Apr 2022 at 06:48, Ethan Furman <ethan@stoneleaf.us> wrote:
Probably the same way that it's always been possible - with clunkier syntax and explicit multiplications. I'm not sure. Do you actually have a use-case where a library needs to do unit-aware arithmetic independently of the application, or is this purely hypothetical? ChrisA

On 4/10/22 14:09, Chris Angelico wrote:
So mostly with none of the benefits of the new syntax.
At this point I am not a unit user, but it seems that several who are would like finer grained control, and you are arguing that global is fine; they offer json as an example where the application imposing global settings on their activities would really mess them up, and you don't seem to be hearing/acknowledging that. -- ~Ethan~

On Mon, 11 Apr 2022 at 09:45, Ethan Furman <ethan@stoneleaf.us> wrote:
Yes. If there are libraries that need to be completely independent of the application, they won't be able to take advantage of the new syntax.
I'm hearing it, I'm just not seeing the parallel. Remember: Nobody is ever saying that existing unit libraries have to go away. This is a proposal for a syntax that will allow for a more convenient way of writing them. So far, I've yet to see anything more than a 100% hypothetical "what if multiple libraries do things" concern, and at no point has ANYONE ever shown why the module level is the correct scope. Comparing with Decimal contexts shows that sometimes we need broader, and sometimes narrower, scope than the module, and that the module is simply not sufficient. Why bind unit definitions to the module if it's not even going to be useful? Having never personally used multiple JSON libraries at the same time, I have no idea what sort of global settings would get in their way, but I have most certainly used Decimal contexts, and I generally assume and expect that calls to library functions will be affected by changes to the context. If I do that change with decimal.getcontext().prec = N once at the top of the program, I fully expect that it will affect every module. Is this considered to be a bad thing? If not, why is it bad for units? ChrisA

On Sat, Apr 09, 2022 at 02:52:50AM +1000, Chris Angelico wrote:
You might not be, but those of us who use it, or *would* use it if it wasn't so dangerous, think differently. In any case, the idea that *units of measurement* (of which there are over 3000 in the "units" program, and an unbounded number of combinations of such) are in any way comparable to the single "current working directory" is foolish. Units are *values* that are used in calculations, not application-wide settings. The idea that libraries shouldn't use their own units is as silly as the idea that libraries shouldn't use their own variables. Units are not classes, but they are sort of like them. You wouldn't insist on a single, interpreter-wide database of classes, or claim that "libraries shouldn't create their own classes". -- Steve

On Sun, Apr 10, 2022 at 7:25 PM Chris Angelico <rosuav@gmail.com> wrote:
I don't think it helps to be making this parallel -- there is one "current working dir" at once -- that's how it's defined; it's by definition global, and good or bad, that's what it is. But I would say in my library code I NEVER use the current working dir -- and certainly don't change it. If a path is passed in, then it's either relative or absolute, and that's OK -- it's in control of the user of the library (the application), not the library itself. I have seen code that caches the working dir, changes it, then puts it back -- but that's very much not thread-safe, and I'd only recommend it maybe in tests.

Back to units and Decimal -- I'm not necessarily advocating module scope -- that is a bit tricky -- but in general I hold to the principle that the person writing the code (library author, say) can know and control what's going to happen: if my library converts from lbs to kg, that had better work, and work the same way, regardless of what the application author thinks is allowed, and regardless of what other libraries might need either. For Decimal -- if someone has written an accounting library, it had damn well better use the proper precision and rounding rules for the kind of accounting it's doing. And it sure as heck shouldn't change because someone writing an application has a different need for Decimals.

If all this means that it's impossible to have built-in syntax, then fine -- we shouldn't have built-in syntax -- it would only be useful for one-file scripts and the like, and there are other options for that -- like preprocessors.
I sure do -- have I not made this clear?
Libraries should do things that are affected by application configs, when they are working at the behest of the application.
When a library is doing its own thing, it should be independent of such configs.
Exactly -- the problem is that I can't imagine a single case where I'd want the behavior of any of my libraries to be altered by the "application" -- so I'd never turn that on, so I don't think there's any point in a global context -- and it's dangerous if using the global context is the default and easier to do -- folks will use them, and their code will break when it's used in a different context. -CHB -- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Sun, Apr 10, 2022 at 10:16:04PM -0700, Christopher Barker wrote:
If you google for it, there are about a million recipes and blog posts etc for changing the current working directory in a context manager, and starting with 3.11 it will be available in the stdlib: https://docs.python.org/3.11/library/contextlib.html#contextlib.chdir -- Steve
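The recipe Steven alludes to is usually only a few lines; a minimal sketch of the common pattern (contextlib.chdir in 3.11 does essentially this, and is additionally reentrant):

```python
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def chdir(path):
    """Temporarily change the process-wide working directory.

    Note: this mutates global state -- every thread sees the change
    while the with-block is active, which is the "consequences"
    discussed elsewhere in this thread.
    """
    old = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(old)

before = os.getcwd()
with chdir(tempfile.gettempdir()):
    pass  # relative paths here resolve under the temp directory
assert os.getcwd() == before  # original directory restored
```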

On Mon, Apr 11, 2022 at 12:21:41PM +1000, Chris Angelico wrote:
You know how every OS process has its own working directory? Just like that, except every module. It's probably too hard to implement in Python, at least for the benefit. (Lots of effort, only a small benefit, net negative worth.) Especially since we would probably want the WD to use dynamic scoping, not lexical scoping. This is not a PEP proposing per-module WDs, not even a serious proposal for it. "One WD per process" is baked so deep into file I/O on POSIX systems (and I presume Windows) that it's probably impossible to implement in current systems. -- Steve

On Mon, 11 Apr 2022 at 19:30, Steven D'Aprano <steve@pearwood.info> wrote:
That's a really lovely theory. The problem is, it doesn't work that way. Every process is spawned in the working directory of its parent (modulo deliberate changes), and thereafter is completely independent. If one process sends a signal to another process, they have independent working directories. That doesn't make sense with modules, since they constantly call back and forth to each other. Imagine:

    import subprocess
    import os
    os.change_local_dir(...)

What's the working directory of the subprocess module? Is it independent of the calling module? If so, what's the point of even HAVING a per-module working directory, since no Python code can ever directly open a file - it always calls a function in another module?
It's probably too hard to implement in Python, at least for the benefit. (Lots of effort, only a small benefit, net negative worth.)
Massive negative worth.
The context manager changes the entire process's WD for a section of code. This makes sense, although it has its own consequences. Per-module *simply does not work*, nor does it make any sense. The module-scope hammer does not fit every nail. Stop trying to hammer in screws. ChrisA

The context manager changes the entire process's WD for a section of code. This makes sense, although it has its own consequences.
Actually, now that you say that -- I think it makes my point: the fact that this context manager is necessary, and "has consequences", is because the working dir is global -- not a good choice.

The module-scope hammer does not fit every nail. Stop trying to hammer in screws.
I don’t know about anyone else, but I’m not arguing for module scope. I’m arguing against implicit global configuration. -CHB -- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Mon, Apr 11, 2022 at 11:10 AM Chris Angelico <rosuav@gmail.com> wrote:
I'm agreeing with namespaces as well -- which I think is different from the idea of module scope for implicit contexts. Then we are using names, and can use all the rules for managing the scope of the names. To use the example of one proposed unit syntax:

    distance = 500[miles]

"miles" would need to be a valid name accessible in that scope -- the writer of that code can choose exactly what that name is, likely by some sort of import:

    from my_unit_system import miles

as opposed to what I understand was being proposed via a "global registry", which is that the code:

    distance = 500[miles]

would work even if there were no name "miles" in that namespace -- it would, instead, go look for it in the global registry -- which could have been manipulated by the application to mean nautical miles, or statute miles, or whatever. And THAT I think is a bad idea.

What I'm not suggesting, because I think it wouldn't be that helpful, and maybe not possible, would be to have something like:

    set_units_registry_to(my_units_system)

and then have:

    distance = 500[miles]

use my_units_system's definition of miles in that module without having explicitly imported the name. -CHB -- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
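The namespace-based approach works today without new syntax, if units are ordinary objects you import; a hypothetical sketch (the Unit class, module name, and conversion factors here are invented for illustration -- real libraries like pint do something richer):

```python
# my_unit_system.py (hypothetical module)
class Unit:
    def __init__(self, name, factor):
        # factor: size of this unit in SI base units (metres here)
        self.name, self.factor = name, factor

    def __rmul__(self, value):
        # 500 * miles -> magnitude in metres (a real library would
        # return a Quantity object carrying the unit along)
        return value * self.factor

miles = Unit("mile", 1609.344)     # metres per statute mile
km = Unit("kilometre", 1000.0)

# Client code: the meaning of "miles" is fixed by the import,
# not by any application-wide registry.
distance = 500 * miles
print(distance)  # distance in metres
```

The point of the sketch: "miles" is just a name resolved by normal scoping rules, so no application-level registry can silently change what it means.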

On Tue, 12 Apr 2022 at 05:14, Christopher Barker <pythonchb@gmail.com> wrote:
It's a good thing we don't have a mutable builtins module, then. Oh right. :)
The trouble is, you now force everyone to do bulk imports - almost certainly star imports. How are you going to populate the namespace appropriately if not with a star import? What happens if you don't want the entire contents of the module? Having a registry means you can get finer granularity with registration functions, without a massively complex module and submodule system, or heaps of manual imports. ChrisA

On Mon, Apr 11, 2022 at 12:22 PM Chris Angelico <rosuav@gmail.com> wrote:
Python is highly dynamic -- you can monkey-patch the heck out of almost anything -- but you surely don't think it is good programming practice to alter the behavior of builtins at the "program" level?
1) if you really want really easy-to-write units, then sure, use star imports. But most code is going to use what, half a dozen (or fewer) units? Is it so bad to write:

    from my_unit_system import miles, ft, kg, lbs, gallons

or:

    import my_unit_system as U

and go from there -- frankly, it's how most of Python already works. If you want a quick and easy unit-aware calculator, then write an application or pre-processor for the code.
if there were a way to use a module-level registry, then sure -- I'm not sure that's possible or easy, but please don't make it global, so that when I write code in a library, I have no idea what context I'll be working in. But honestly, I don't think I like the idea -- though no one has actually fleshed out exactly how it would work -- so maybe I would like the actual proposal -- who knows? But I would hope that if anyone does come up with a proposal, they will address the core issue I'm harping on here: when I write code that may be run in the context of someone else's "application" (or my own, two years later :-) ), I want to know exactly what the unit calculations will mean, and that they won't be messed with at run time by a standard recommended practice. -CHB -- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On 4/11/22 11:06, Chris Angelico wrote:
Steven is, as are a few who have agreed that namespaces are the One True Way™ to do things.
That seems a grossly unfair characterization of those who don't agree with you. I think everyone should take a break from this thread -- it is apparent that no one is convincing any one else, so the final decision will be by the SC (assuming a PEP is ever made). -- ~Ethan~

On 4/11/22, Steven D'Aprano <steve@pearwood.info> wrote:
You know how every OS process has its own working directory? Just like that, except every module.
A per-thread working directory makes more sense to me. But it would be a lot of work to implement support for this in the os and io modules, for very little gain.
Windows has up to 27 working directories per process. There's the overall working directory, plus one for each drive.

On Mon, Apr 11, 2022 at 11:53:18AM -0500, Eryk Sun wrote:
Hmmm, yes, that does seem sensible.
Sure.
Today I learned something new, thank you. How does that work in practice? In Windows, if you just say the equivalent to `open('spam')`, how does the OS know which drive and WD to use? -- Steve

On 4/11/22, Steven D'Aprano <steve@pearwood.info> wrote:
"spam" is resolved against the process working directory, which could be a UNC path instead of a drive. OTOH, "Z:spam" is relative to the working directory on drive "Z:". If the latter is r"Z:\foo\bar", then "Z:spam" resolves to r"Z:\foo\bar\spam". The working directory on a drive gets set via os.chdir() when the process working directory is set to a path on the drive. It's implemented via reserved environment variables with names that begin with "=", such as "=Z:" set to r"Z:\foo\bar". Python's os.environ doesn't support getting or setting these variables, but WinAPI GetEnvironmentVariableW() and SetEnvironmentVariableW() do.

On Tue, 12 Apr 2022 at 03:49, Steven D'Aprano <steve@pearwood.info> wrote:
It uses the "default drive" + "current directory on that drive". If you say `open("c:spam")`, Windows uses "drive C" + "current directory on drive C". If you say `open("/spam")`, Windows uses "default drive" + "explicit directory". Hence there are 26 current directories (one per drive), plus the selection of current drive, which effectively chooses your current directory. ChrisA

On 4/11/22, Chris Angelico <rosuav@gmail.com> wrote:
If you say `open("/spam")`, Windows uses "default drive" + "explicit directory".
You can think of a default drive as being the drive of the current working directory, but there is no "default drive" per se that's stored separate from the working directory. Python and most other filesystem libraries generalize a UNC "\\server\share" path as a 'drive', in addition to drive-letter drives such as "Z:". However, the working directory is only remembered separately from the process working directory in the case of drive-letter drives, not UNC shares. If the working directory is r"\\server\share\foo\bar", then r"\spam" resolves to r"\\server\share\spam". If the working directory is r"\\server\share\foo\bar", then "spam" resolves to r"\\server\share\foo\bar\spam". However, the system will actually access this path relative to an open handle for the working directory. A handle for the process working directory is always kept open and thus protected from being renamed or deleted. Per-drive working directories are not kept open. They're just stored as path names in reserved environment variables.
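Python's ntpath module models this drive/rest split and can illustrate it from any platform (actual path resolution, of course, only happens on Windows):

```python
import ntpath

# "Z:spam" is drive-relative: a drive letter with no root, so it is
# resolved against the remembered working directory on drive Z:.
drive, rest = ntpath.splitdrive("Z:spam")
print(drive, rest)   # Z: spam

# A rooted path has a root but no drive; it resolves against the
# drive of the current working directory.
print(ntpath.splitdrive(r"\spam"))   # ('', '\\spam')

# UNC shares are treated as drives too, matching the generalization
# Eryk describes.
print(ntpath.splitdrive(r"\\server\share\foo"))  # ('\\\\server\\share', '\\foo')
```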
If the process working directory is a DOS drive path, then 26 working directories are possible. If the process working directory is a UNC path, then 27 working directories are possible.

On Tue, 12 Apr 2022 at 04:46, Eryk Sun <eryksun@gmail.com> wrote:
Ah, fair. I described it that way because I thought that it was equivalent, but I stand corrected.
Which raises the question: what if the current directory no longer has a path name? Or is that simply not possible on Windows? I know that on Linux, I can unlink a directory while being in it (which creates interesting problems for bash, git, and other tools, but not fundamental ones for the process model itself), and many references WILL be resolved correctly. Or I can move the directory to another parent, and "pwd" says the original name (because that's a shell feature), but "ls .." will list the contents of the new directory.
Yup yup. This is what I get for oversimplifying and forgetting that UNC names are paths/drives too. (Don't even get me started on prefixing paths with \\?\ and what that changes. Windows has bizarre backward compatibility constraints.) ChrisA

On 4/11/22, Chris Angelico <rosuav@gmail.com> wrote:
Which raises the question: what if the current directory no longer has a path name? Or is that simply not possible on Windows?
The process working directory is opened without FILE_SHARE_DELETE sharing. This prevents opening the directory with DELETE access from any security context in user mode, even by the SYSTEM account. If the handle for the working directory is forcefully closed (e.g. via Process Explorer) and the directory is deleted, then accessing a relative path in the affected process fails with ERROR_INVALID_HANDLE (6) until the working directory is changed to a valid directory.
(Don't even get me started on prefixing paths with \\?\ and what that changes. Windows has bizarre backward compatibility constraints.)
Paths prefixed by \\?\ or \\.\ are not supported for the process working directory and should not be used in this case. The Windows API is buggy if the working directory is set to a prefixed path. For example, it fails to identify a drive such as r"\\?\C:" or r"\\?\UNC\server\share" in the working directory, in which case a rooted path such as r"\spam" can't be accessed.

I’ve only scanned this thread and may have missed an explanation for this, but why must there be a global registry? Explicit imports, possibly with a new dunder protocol for literal convertors, feels like a better fit to the language to me. This also avoids the problem with a global registry you mention earlier. For example:

    # si_units.py:

    class MeterConvertor:
        def __literal__(self, value):
            ...
            return converted

    m = MeterConvertor()


    # script.py:

    from si_units import m

    length = 10_m  # calls m.__literal__(...)

That’s assuming this has to be in the language in the first place; I don’t have a strong opinion about that because I don’t have a use case for this feature myself. Ronald — Twitter / micro.blog: @ronaldoussoren Blog: https://blog.ronaldoussoren.net/
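There is no `__literal__` protocol in today's Python (and `10_m` is currently a syntax error, since underscores in numeric literals must sit between digits); the closest current spellings are an explicit call or multiplication, which a converter class can support directly. A sketch with invented names:

```python
class MeterConvertor:
    """Converts a bare number into a metre quantity.

    For this sketch the "quantity" is just a float; a real library
    would return an object that remembers its unit.
    """
    def __call__(self, value):
        return float(value)  # identity conversion for the sketch

    def __rmul__(self, value):
        # allows the `10 * m` spelling that libraries like pint use
        return self(value)

m = MeterConvertor()

length = m(10)       # explicit-call spelling
also = 10 * m        # multiplication spelling
print(length, also)  # 10.0 10.0
```

Either spelling keeps the meaning of `m` tied to the import, which is the property Ronald's proposal is after.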

On Sun, 10 Apr 2022 at 18:44, Ronald Oussoren via Python-ideas <python-ideas@python.org> wrote:
That's precisely what Steven is arguing for, and I'm arguing against. Primarily because YAGNI, but also for a more fundamental reason: the registry of units is not simply a definition of name -> value, which a namespace is capable of (but I think is wrong for this situation), but it's also a definition of which units can be converted into which. That would affect how two objects interact with each other - can you add meters and miles? Can you treat the electron-volt as a unit of mass? All these sorts of questions belong to the application, and there's no useful way to divide them on module boundaries, much less other scopes. There's no point trying to make this scoped, just as we don't have lots of other things scoped (like sys.modules). Shared state across the application is a *good* thing, not a bad one. ChrisA

On Sun, Apr 10, 2022 at 2:37 AM Chris Angelico <rosuav@gmail.com> wrote:
Oh my god, no! I guess we do disagree -- if someone's using one of my packages that make use of unit conversions, I absolutely do not want its behavior to change because of the application developer's different ideas about what units mean. Or even worse, some other third party library changing the behavior of my lib, and the app developer having no idea that they are incompatible.
That could be true, if you are trying to build it into syntax -- which to me is an argument for not having it as a language feature.

There's no point trying to make this scoped, just as we don't have lots of other things scoped (like sys.modules). Shared state across the application is a *good* thing, not a bad one.
Not in this case, it isn't (IMHO :-; ) -CHB -- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

The registry is a mapping from names to objects implementing some kind of interface, and modules are a good solution for that.
Why not, that’s what we do for all other types and interfaces? Compatibility and interaction should IMHO be part of the implementation of a set of cooperating types, and that’s orthogonal to how you get access to those types.
I’m pretty sure singletons are considered to be a code smell in general ;-) But as I wrote earlier, I don’t have a use case for using this feature myself. The closest I get are tuples with the same shape but different interpretation, such as various color representations. For those, using regular class syntax is obviously a better choice than using tuples (e.g. RGB(1,2,3)). The few times I need to deal with units I can either ignore them, or use a normal type (e.g. meters(1)). Ronald
— Twitter / micro.blog: @ronaldoussoren Blog: https://blog.ronaldoussoren.net/

Brian McCall writes:
According to you. I would like dollars and shares to be supported as well as perhaps "kg of apples" (vs "kg of oranges"). What is a unit? It is an attribute of a quantity that says how it may be combined with other quantities to produce more quantities. In some situations (determining whether I should have "driven the backroads so I wouldn't get weighed") I'm happy to ignore the commodities and add up the kg, but in others (making out the invoice) I don't want to multiply price of one commodity by the quantity of another. I'm sure there are other applications besides accounting and physics, and each will have its own idiosyncratic constraints on combining quantities, ie, its own units.
This is a job for ... Superlibrary! IOW, I'm with David. There may be a justification for supporting custom literals in the language, but really, in this case it's just syntactic sugar for constructing an object of a derived class (ie, a (number, unit) couple) and should be designed that way. I think it's probably way overkill to support quantities (ie, numbers with units) as a primitive type in the language. For many computations, a library will be fast enough. For iterative computation or large data sets, I would guess it's very rare to want to check compatibility of units on every operation. Rather, for function calls you want to treat them like types of the actual arguments, and for vectorized computations, normally the whole vector will have the same units. And even if you did treat quantity as a primitive type, few existing programs are likely to be ported to use the feature (and if the guts are some ancient Fortran library, they can't be; you'll have to strip the type information off the numbers before operating on them anyway).

On Mon, Apr 04, 2022 at 02:47:48PM -0000, Brian McCall wrote:
Every problem can be bounded by the amount of matter and energy in the universe :-) More practically, the problem is bounded by the number of addressable memory locations (2^64) and more practically still, by the amount of memory you actually have. Presumably there is only a finite number of named measurement units which have ever been used in history, maybe a few thousand or so. A few days ago I pointed out that the Unix "units" program listed 2000+ units. I upgraded to a more recent version, and it now has over 3000:

    [steve ~]$ units
    Currency exchange rates from FloatRates (USD base) on 2018-10-20
    3070 units, 109 prefixes, 109 nonlinear units

If the implementation had some reasonable hard limit like 20,000 named units, I wouldn't complain. But why? The obvious mapping from unit names to values is to use a dict, which is limited only by memory.
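The dict approach needs nothing exotic; a toy sketch (unit names and factors here are illustrative only, covering linear units -- non-linear ones like temperatures need more than a single factor):

```python
# A registry is just a mapping from unit name to its size in SI base
# units; memory is the only practical limit on how many it can hold.
UNITS = {
    "m": 1.0,
    "mile": 1609.344,
    "ft": 0.3048,
}

def to_si(value, unit):
    """Convert a magnitude in the named unit to SI base units."""
    return value * UNITS[unit]

print(to_si(3, "mile"))  # about 4828 metres
```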
If you use the geometrized unit system, you need only one base unit, the metre. Everything can be written as a small power of length. But for a more practical system, I count a minimum of 12 base dimensions:

* length
* mass
* time
* electric current
* thermodynamic temperature
* luminous intensity
* amount of substance
* amount of information
* currency
* plane angle
* solid angle
* other dimensionless (the name "uno" was proposed in 2003)

Some of these are strictly dimensionless in terms of the standard physics dimensional quantities, but we really don't want to be able to add 30 degrees to 1 yoctomol and get 1.1 radians. (Reasonable people may disagree. So may unreasonable people.) But as discussed in other parts of this loooong thread, there are other dimensionless quantities which we may wish to treat as distinct. Ratios?

Then there are at least three types of non-linear units:

- temperatures
- log scales such as bels (decibels) and the moment magnitude scale for earthquakes
- piecewise linear (wire gauges etc)

and let's not forget prefixes. -- Steve
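Dimensional bookkeeping over a fixed set of base dimensions is commonly done with an exponent vector; a minimal sketch using just three of the twelve dimensions listed above (length, mass, time), with invented names -- real libraries like pint or unyt do this with far more machinery:

```python
class Quantity:
    """A value plus an exponent vector over (length, mass, time)."""
    def __init__(self, value, dims):
        self.value, self.dims = value, dims

    def __add__(self, other):
        # Addition only makes sense between like dimensions:
        # 30 degrees + 1 yoctomol should be an error.
        if self.dims != other.dims:
            raise TypeError(f"incompatible dimensions: {self.dims} vs {other.dims}")
        return Quantity(self.value + other.value, self.dims)

    def __mul__(self, other):
        # Multiplication adds exponents: m * 1/s -> m/s, i.e. (1,0,-1)
        return Quantity(self.value * other.value,
                        tuple(a + b for a, b in zip(self.dims, other.dims)))

d = Quantity(3.0, (1, 0, 0)) + Quantity(4.0, (1, 0, 0))        # 7 metres
speed = Quantity(6.0, (1, 0, 0)) * Quantity(0.5, (0, 0, -1))   # 3 m/s
print(d.value, speed.dims)  # 7.0 (1, 0, -1)
```

Non-linear units (temperatures, decibels, wire gauges) don't fit this linear-exponent model and need special-case conversion functions, which is exactly why Steven lists them separately.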
participants (14)
- Alexandre Brault
- Brendan Barnwell
- Brian McCall
- Chris Angelico
- Christopher Barker
- Eryk Sun
- Ethan Furman
- Greg Ewing
- MRAB
- Ricky Teachey
- Ronald Oussoren
- Stephen J. Turnbull
- Steven D'Aprano
- Will Bradley