
On Fri, May 28, 2021 at 5:19 AM Brendan Barnwell <brenbarn@brenbarn.net> wrote:
On 2021-05-27 05:18, Steven D'Aprano wrote:
On Thu, May 27, 2021 at 07:56:16AM -0000, Shreyan Avigyan wrote:
Lots of programming languages have something known as static variable storage in *functions*, not *classes*. Static variable storage means a variable limited to a function, yet the data it points to persists until the end of the program.
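For context, that kind of per-function persistent state is usually emulated in today's Python with the mutable-default idiom; a minimal sketch (the function and parameter names are invented for illustration):

    def counter(_state={"count": 0}):
        # The default dict is created once, when the def statement runs,
        # and the same object persists across calls - roughly what a
        # "static" variable would give you.
        _state["count"] += 1
        return _state["count"]

    print(counter())  # 1
    print(counter())  # 2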
+1 on this idea.
One common use for function defaults is to turn slow global or builtin name lookups into fast local lookups:
    def func(arg, len=len):
        # now len is a fast local lookup instead of a slow name lookup
        ...
Benchmarking shows that this actually does make a significant difference to performance, but the technique is under-used because of the horribleness of a len=len parameter.
(Raymond Hettinger is, I think, a proponent of this optimization trick. At least I learned it from his code.)
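As a rough illustration of that benchmarking claim (not a rigorous measurement; the function names and data here are invented for the example):

    import timeit

    def count_global(items):
        total = 0
        for item in items:
            total += len(item)    # len re-resolved via globals/builtins on every call
        return total

    def count_local(items, len=len):
        total = 0
        for item in items:
            total += len(item)    # len is now a fast local lookup
        return total

    data = [[0] * 10 for _ in range(1000)]
    print(timeit.timeit(lambda: count_global(data), number=1000))
    print(timeit.timeit(lambda: count_local(data), number=1000))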
I don't see this as a great motivation for this feature. If the goal is to make things faster, I think that would be better served by making the interpreter smarter or adding other global-level optimizations. As it is, you're just trading one "manual" optimization (len=len) for another (static len).
Yes, the new one is perhaps slightly less ugly, but it still puts the onus on the user to manually "declare" variables as local, not because they are semantically local in any way, but just because we want a faster lookup. I see that as still a hack. A non-hack would be some kind of JIT or optimizing interpreter that actually reasons about how the variables are used so that the programmer doesn't have to waste time worrying about hand-tuning optimizations like this.
If you're doing a lot of length checks, the standard semantics of Python demand that the name 'len' be looked up every time it's called. That's expensive - first you check the module globals, then you check the builtins. With some sort of local reference, the semantics change: now the name 'len' is looked up once, and the result is cached. That means that creating globals()["len"] in the middle of the function will no longer affect its behaviour.
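A small demonstration of those current semantics (the shadowing here is contrived, purely to show the lookup order):

    def demo(items):
        print(len(items))               # finds the builtin: prints 3
        globals()["len"] = lambda x: -1
        print(len(items))               # the same call site now finds the global shadow: prints -1
        del globals()["len"]            # remove the shadow again

    demo([1, 2, 3])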
An optimizing compiler that did this would be a nightmare. Explicitly choosing which names to retain means that the programmer is in control.
The biggest problem with the default argument trick is that it makes it look as if those arguments are part of the function's API, where they're really just an optimization. Consider:
    def shuffle(things, *, randrange=random.randrange): ...
    def merge_shortest(things, *, len=len): ...
Is it reasonable to pass a different randrange function to shuffle()? Absolutely! You might have a dedicated random.Random instance (maybe a seeded PRNG for reproducible results). Is it reasonable to pass a different len function to merge_shortest()? Probably not - it looks like it's probably an optimization. Yes, you could say "_len=len", but now your optimization infects the entire body of the function, instead of being a simple change in the function header.
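For example, a seeded random.Random instance slots straight into the randrange parameter; the shuffle body below is only a sketch of such a function, not the real implementation under discussion:

    import random

    def shuffle(things, *, randrange=random.randrange):
        # Fisher-Yates shuffle, driven by whichever randrange we were given
        for i in range(len(things) - 1, 0, -1):
            j = randrange(i + 1)
            things[i], things[j] = things[j], things[i]

    rng = random.Random(42)    # seeded PRNG for reproducible results
    deck = list(range(10))
    shuffle(deck, randrange=rng.randrange)
    print(deck)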
With statics, you could write it like this:
    def merge_shortest(things):
        static len = len
        ...
Simple. Easy. Reliable. (And this usage would work with pretty much any of the defined semantics.) There's no more confusion.
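Spelled with today's tools, and assuming definition-time binding (only one of the possible semantics), that is roughly a closure; the factory function and the one-line placeholder body are invented purely for illustration:

    def _make_merge_shortest():
        _len = len                      # looked up once, when the factory runs
        def merge_shortest(things):
            # placeholder body: the real function would use _len throughout
            return min(things, key=_len)
        return merge_shortest

    merge_shortest = _make_merge_shortest()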
So basically for me anything that involves the programmer saying "Please make this part faster" is a hack. :-) We all want everything to be as fast as possible all the time, and insofar as we're concerned about speed we should focus on making the entire interpreter smarter so everything is faster, rather than adding new ways for the programmer to do extra work to make just a few things faster.
It's never a bad thing to make the interpreter smarter and faster, if it can be done without semantic changes. (Mark Shannon has some current plans that, I believe, fit that description.) This is different, though - the behaviour WILL change, so it MUST be under programmer control.
Even something like a way of specifying constants (which has been proposed in another thread) would be better to my eye. That would let certain variables be marked as "safe" so that they could always be looked up fast because we'd be sure they're never going to change.
Question: When does this constant get looked up?
    def merge_shortest(things):
        constant len = len
        ...
Is it looked up as the function begins execution, or when the function is defined? How much are you going to assume that it won't change?
As to the original proposal, I'm not in favor of it. It's fairly uncommon for me to want to do this, and in the cases where I do, Python classes are simple enough that I can just make a class with a method (or a __call__ if I want to be really cool) that stores the data in a way that's more transparent and more clearly connected to the normal ways of storing state in Python. It just isn't worth adding yet another complexity to the language for this minor use case.
Yes, static state can always be implemented with a class or closure instead. You may notice that the optimization technique still exists, and for very good reason :) Plus, it's often just unnecessary overhead to lay out your code that way. It should be trivially easy to convert something from a global to a function-scoped static, but reworking it to a closure/class is an actual refactor with notable changes.
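For comparison, the class-with-__call__ spelling mentioned above looks something like this (a minimal sketch with invented names, mainly to show the extra ceremony involved):

    class Counter:
        # Callable object whose state persists on the instance.
        def __init__(self):
            self.count = 0

        def __call__(self):
            self.count += 1
            return self.count

    counter = Counter()
    print(counter())  # 1
    print(counter())  # 2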
Python could have been defined with no classes. Instead, we could have all just used factory functions, with a bunch of local variables in the constructor for private state, and a bunch of things packaged up into a dict as the public API. Why do we have classes? Because they are better at expressing the things we need to express.
ChrisA