Bad programming style in decorators?
In most decorator tutorials, decorators are taught as functions defined inside functions. Isn't this inefficient, because every time the decorator is called you're redefining the inner function, which takes up much more memory? I prefer defining decorators as classes with __call__ overridden. Is there a reason why decorators are taught as functions inside functions?

--
-Surya Subbarao
On Jan 2, 2016, at 10:14, u8y7541 The Awesome Person <surya.subbarao1@gmail.com> wrote:
In most decorator tutorials, it's taught using functions inside functions. Isn't this inefficient because every time the decorator is called, you're redefining the function, which takes up much more memory?
No. First, most decorators are only called once. For example:

    @lru_cache(maxsize=None)
    def fib(n):
        if n < 2:
            return n
        return fib(n-1) + fib(n-2)

The lru_cache function gets called once, and creates and returns a decorator function. That decorator function is passed fib, and creates and returns a wrapper function. That wrapper function is what gets stored in the globals as fib. It may then get called a zillion times, but it doesn't create or call any new function.

So, why should you care whether lru_cache is implemented with a function or a class? You're talking about a difference of a few dozen bytes, once in the entire lifetime of your program.

Plus, where do you get the idea that a function object is "much larger"? Each new function that gets built uses the same code object, globals dict, etc., so you're only paying for the cost of a function object header, plus a tuple of cell objects (pointers) for any state variables. Your alternative is to create a class instance header (maybe a little smaller than a function object header), and store all those state variables in a dict (33-50% bigger even with the new split-instance-dict optimizations). Anyway, I'm willing to bet that in this case, the function is ~256 bytes while the class is ~1024, so you're actually wasting rather than saving memory. But either way, it's far too little memory to care about.
I prefer defining decorators as classes with __call__ overridden. Is there a reason why decorators are taught with functions inside functions?
Because most decorators are more concise, more readable, and easier to understand that way. And that's far more important than a micro-optimization which may actually be a pessimization but which even more likely isn't going to matter at all.
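As a side-by-side illustration of the two styles under discussion (not code from the thread itself - the names trace and Trace are invented for the example), here is the same trivial decorator written both ways:

    import functools

    # Function-based: a closure over "func" holds the wrapped function.
    def trace(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            print("calling", func.__name__)
            return func(*args, **kwargs)
        return wrapper

    # Class-based: an instance attribute holds the wrapped function,
    # and __call__ makes the instance itself callable.
    class Trace:
        def __init__(self, func):
            functools.update_wrapper(self, func)
            self.func = func

        def __call__(self, *args, **kwargs):
            print("calling", self.func.__name__)
            return self.func(*args, **kwargs)

    @trace
    def f(): pass

    @Trace
    def g(): pass

    f()  # calling f
    g()  # calling g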
On 3 January 2016 at 04:56, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
On Jan 2, 2016, at 10:14, u8y7541 The Awesome Person <surya.subbarao1@gmail.com> wrote:
In most decorator tutorials, it's taught using functions inside functions. Isn't this inefficient because every time the decorator is called, you're redefining the function, which takes up much more memory?
No.
First, most decorators are only called once. For example:
    @lru_cache(maxsize=None)
    def fib(n):
        if n < 2:
            return n
        return fib(n-1) + fib(n-2)
The lru_cache function gets called once, and creates and returns a decorator function. That decorator function is passed fib, and creates and returns a wrapper function. That wrapper function is what gets stored in the globals as fib. It may then get called a zillion times, but it doesn't create or call any new function.
So, why should you care whether lru_cache is implemented with a function or a class? You're talking about a difference of a few dozen bytes, once in the entire lifetime of your program.
We need to make a slight terminology clarification here, as the answer to Surya's question changes depending on whether we're talking about implementing wrapper functions inside decorators (like the "_lru_cache_wrapper" that lru_cache wraps around the passed-in callable), or about implementing decorators inside decorator factories (like the transient "decorating_function" that lru_cache uses to apply the wrapper to the function being defined). Most of the time that distinction isn't important, so folks use the more informal approach of using "decorator" to refer to both decorators and decorator factories, but this is a situation where the difference matters.

Every decorator factory does roughly the same thing: when called, it produces a new instance of a callable type which accepts a single function as its sole argument. From the perspective of the *user* of the decorator factory, it doesn't matter whether internally that's handled using a def statement, a lambda expression, functools.partial, instantiating a class that defines a custom __call__ method, or some other technique. It's also rare for decorator factories to be invoked in code that's a performance bottleneck, so it's generally more important to optimise for readability and maintainability when writing them than it is to optimise for speed.

The wrapper functions themselves, though, exist in a one:one correspondence with the functions they're applied to - when you apply functools.lru_cache to a function, the transient decorator produced by the decorator factory only lasts as long as the execution of the function definition, but the wrapper function lasts for as long as the wrapped function does, and gets invoked every time that function is called (and if a function is performance critical enough for the results to be worth caching, then it's likely performance critical enough to be thinking about micro-optimisations).

As such, from a micro-optimisation perspective, it's reasonable to want to know the answers to:

* Which is faster, defining a new function object, or instantiating an existing class?
* Which is faster, calling a function object that accepts a single parameter, or calling a class with a custom __call__ method?
* Which uses more memory, defining a new function object, or instantiating an existing class?

The answers to these questions can technically vary by implementation, but in practice, CPython's likely to be representative of their *relative* performance for any given implementation, so we can use it to check whether or not our intuitions about relative speed and memory consumption are correct.

For the first question then, here are the numbers I get locally for CPython 3.4:

    $ python3 -m timeit "def f(): pass"
    10000000 loops, best of 3: 0.0744 usec per loop
    $ python3 -m timeit -s "class C: pass" "c = C()"
    10000000 loops, best of 3: 0.113 usec per loop

The trick here is to realise that *at runtime*, a def statement is really just instantiating a new instance of types.FunctionType - most of the heavy lifting has already been done at compile time. The reason it manages to be faster than typical class instantiation is that we get to use customised bytecode operating on constant values, rather than having to look the class up by name and make a standard function call:
dis.dis("def f(): pass") 1 0 LOAD_CONST 0 (<code object f at 0x7fe875aff0c0, file "<dis>", line 1>) 3 LOAD_CONST 1 ('f') 6 MAKE_FUNCTION 0 9 STORE_NAME 0 (f) 12 LOAD_CONST 2 (None) 15 RETURN_VALUE dis.dis("c = C()") 1 0 LOAD_NAME 0 (C) 3 CALL_FUNCTION 0 (0 positional, 0 keyword pair) 6 STORE_NAME 1 (c) 9 LOAD_CONST 0 (None) 12 RETURN_VALUE
For the second question:

    $ python3 -m timeit -s "def f(arg): pass" "f(None)"
    10000000 loops, best of 3: 0.111 usec per loop
    $ python3 -m timeit -s "class C:" -s " def __call__(self, arg): pass" -s "c = C()" "c(None)"
    1000000 loops, best of 3: 0.232 usec per loop

Again, we see that the native function outperforms the class with a custom __call__ method. There's no difference in the bytecode this time, but rather a difference in what happens inside the CALL_FUNCTION opcode: for the second case, we first have to retrieve the bound c.__call__() method, and then call *that* as c.__call__(None), which in turn internally calls C.__call__(c, None), while for the native function case we get to skip straight to running the called function.

The speed difference can be significantly reduced (but not entirely eliminated) by caching the bound method during setup:

    $ python3 -m timeit -s "class C:" -s " def __call__(self, arg): pass" -s "c_call = C().__call__" "c_call(None)"
    10000000 loops, best of 3: 0.115 usec per loop

Finally, we get to the question of relative size: are function instances larger or smaller than your typical class instance? Again, we don't have to guess, we can use the interpreter to experiment and check our assumptions:

    >>> import sys
    >>> def f(): pass
    ...
    >>> sys.getsizeof(f)
    136
    >>> class C(): pass
    ...
    >>> sys.getsizeof(C())
    56

That's a potentially noticeable difference if we're applying the wrapper often enough - the native function is 80 bytes larger than an empty standard class instance. Looking at the available data attributes on f, we can see the likely causes of the difference:

    >>> set(dir(f)) - set(dir(C()))
    {'__code__', '__defaults__', '__name__', '__closure__', '__get__', '__kwdefaults__', '__qualname__', '__annotations__', '__globals__', '__call__'}

There are 10 additional attributes there, although 2 of them (__get__ and __call__) relate to methods our native function defines that the empty class doesn't. The other 8 represent additional pieces of data stored (or potentially stored) per function that we don't store for a typical class instance.

However, we also need to account for the overhead of defining a new class object, and that's a non-trivial amount of memory when we're talking about a size difference of only 80 bytes per wrapped function:

    >>> sys.getsizeof(C)
    976

That means if a wrapper function is only used a few times in any given run of the program, then native functions will be faster *and* use less memory (at least on CPython). If the wrapper is used more often than that, then native functions will still be the fastest option, but not the lowest memory option.

Furthermore, if we decide to cache the bound __call__ method to reduce the speed impact of using a custom __call__ method, we give up most of the memory gains:

    >>> sys.getsizeof(C().__call__)
    64

This all suggests that if your application is severely memory constrained (e.g. it's running on an embedded interpreter like MicroPython), then it *might* make sense to incur the extra complexity of using classes with a custom __call__ method to define wrapper functions, over just using a nested function.
For more typical cases though, the difference is going to disappear into the noise, so you're likely to be better off defaulting to using nested function definitions, and only switching to the class-based version in cases where it's more readable and maintainable (and in those cases considering whether or not it might make sense to return the bound __call__ method from the decorator, rather than the callable itself).

Cheers, Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
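A minimal sketch of that last suggestion - returning the bound __call__ method from the decorator rather than the instance itself. This is illustrative code, not from the thread; the CountCalls name is invented:

    class CountCalls:
        def __init__(self, func):
            self.func = func
            self.calls = 0

        def __call__(self, *args, **kwargs):
            self.calls += 1
            return self.func(*args, **kwargs)

    def count_calls(func):
        # Returning the bound method up front avoids the per-call
        # __call__ lookup that calling the instance directly incurs.
        return CountCalls(func).__call__

    @count_calls
    def square(x):
        return x * x

    square(3)
    square(4)
    print(square.__self__.calls)  # the instance stays reachable: 2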
The wrapper functions themselves, though, exist in a one:one correspondence with the functions they're applied to - when you apply functools.lru_cache to a function, the transient decorator produced by the decorator factory only lasts as long as the execution of the function definition, but the wrapper function lasts for as long as the wrapped function does, and gets invoked every time that function is called (and if a function is performance critical enough for the results to be worth caching, then it's likely performance critical enough to be thinking about micro-optimisations). (Nick Coghlan)

Yes, that is what I was thinking of. Just like Quake's fast inverse square root. Even though it is a micro-optimization, it greatly affects how fast the game runs.

But, as I explained, the function will _not_ be redefined and trashed every frame; it will be created one time. (Andrew Barnert)

Hmm... Nick says different...

This all suggests that if your application is severely memory constrained (e.g. it's running on an embedded interpreter like MicroPython), then it *might* make sense to incur the extra complexity of using classes with a custom __call__ method to define wrapper functions, over just using a nested function. (Nick Coghlan)

Yes, I was thinking of that when I started this thread, but this thread is just from my speculation.

--
-Surya Subbarao
If you want more discussion, please discuss a specific example, showing the decorator code itself (not just the decorator call). The upshot is that creating a function object is pretty efficient and probably more efficient than instantiating a class -- if you don't believe that, write a micro-benchmark.

On Sat, Jan 2, 2016 at 8:00 PM, u8y7541 The Awesome Person <surya.subbarao1@gmail.com> wrote:
The wrapper functions themselves, though, exist in a one:one correspondence with the functions they're applied to - when you apply functools.lru_cache to a function, the transient decorator produced by the decorator factory only lasts as long as the execution of the function definition, but the wrapper function lasts for as long as the wrapped function does, and gets invoked every time that function is called (and if a function is performance critical enough for the results to be worth caching, then it's likely performance critical enough to be thinking about micro-optimisations). (Nick Coghlan)

Yes, that is what I was thinking of. Just like Quake's fast inverse square root. Even though it is a micro-optimization, it greatly affects how fast the game runs.

But, as I explained, the function will _not_ be redefined and trashed every frame; it will be created one time. (Andrew Barnert)

Hmm... Nick says different...

This all suggests that if your application is severely memory constrained (e.g. it's running on an embedded interpreter like MicroPython), then it *might* make sense to incur the extra complexity of using classes with a custom __call__ method to define wrapper functions, over just using a nested function. (Nick Coghlan)

Yes, I was thinking of that when I started this thread, but this thread is just from my speculation.

--
-Surya Subbarao
-- --Guido van Rossum (python.org/~guido)
On 3 January 2016 at 13:00, u8y7541 The Awesome Person <surya.subbarao1@gmail.com> wrote:
The wrapper functions themselves, though, exist in a one:one correspondence with the functions they're applied to - when you apply functools.lru_cache to a function, the transient decorator produced by the decorator factory only lasts as long as the execution of the function definition, but the wrapper function lasts for as long as the wrapped function does, and gets invoked every time that function is called (and if a function is performance critical enough for the results to be worth caching, then it's likely performance critical enough to be thinking about micro-optimisations). (Nick Coghlan)
Yes, that is what I was thinking of. Just like Quake's fast inverse square root. Even though it is a micro-optimization, it greatly affects how fast the game runs.
For Python, much bigger performance pay-offs are available without changing the code by adopting tools like PyPy, Cython and Numba. Worrying about micro-optimisations like this usually only makes sense if a profiler has identified the code as a hotspot for your particular workload (and sometimes not even then).
But, as I explained, the function will _not_ be redefined and trashed every frame; it will be created one time. (Andrew Barnert)
Hmm... Nick says different...
This all suggests that if your application is severely memory constrained (e.g. it's running on an embedded interpreter like MicroPython), then it *might* make sense to incur the extra complexity of using classes with a custom __call__ method to define wrapper functions, over just using a nested function. (Nick Coghlan)
The memory difference is only per function defined using the wrapper, not per call. The second speed difference I described (how long the CALL_FUNCTION opcode takes) is per call, and there native functions are the clear winner (followed by bound methods, and custom callable objects a relatively distant third).

The other thing to keep in mind is that the examples I showed were focused specifically on measuring the differences in overhead, so the function bodies don't actually do anything, and the class instances didn't contain any state of their own. Adding even a single instance/closure variable is likely to swamp the differences in memory consumption between a native function and a class instance.

Cheers, Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
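Nick's three-way ranking can be checked with a single script. A minimal sketch, added for illustration (not from the thread; exact numbers vary by machine and Python version, and it assumes Python 3.6+ for f-strings and timeit's globals parameter):

    import timeit

    def f(arg):
        pass

    class C:
        def __call__(self, arg):
            pass

    c = C()
    c_call = c.__call__  # cache the bound method up front

    for label, stmt in [("plain function", "f(None)"),
                        ("callable instance", "c(None)"),
                        ("cached bound method", "c_call(None)")]:
        secs = timeit.timeit(stmt, globals=globals(), number=1000000)
        print(f"{label:20s} {secs:.3f}s per million calls")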
Whoops, Nick already did the micro-benchmarks, and showed that creating a function object is faster than instantiating a class. He also measured the size, but I think he forgot that sys.getsizeof() doesn't report the size (recursively) of contained objects -- a class instance references a dict which is another 288 bytes (though if you care you can get rid of this by using __slots__).

I expect that calling an instance using __call__ is also slower than calling a function (but you can do your own benchmarks :-).

On Sat, Jan 2, 2016 at 8:39 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 3 January 2016 at 13:00, u8y7541 The Awesome Person <surya.subbarao1@gmail.com> wrote:
The wrapper functions themselves, though, exist in a one:one correspondence with the functions they're applied to - when you apply functools.lru_cache to a function, the transient decorator produced by the decorator factory only lasts as long as the execution of the function definition, but the wrapper function lasts for as long as the wrapped function does, and gets invoked every time that function is called (and if a function is performance critical enough for the results to be worth caching, then it's likely performance critical enough to be thinking about micro-optimisations). (Nick Coghlan)
Yes, that is what I was thinking of. Just like Quake's fast inverse square root. Even though it is a micro-optimization, it greatly affects how fast the game runs.
For Python, much bigger performance pay-offs are available without changing the code by adopting tools like PyPy, Cython and Numba. Worrying about micro-optimisations like this usually only makes sense if a profiler has identified the code as a hotspot for your particular workload (and sometimes not even then).
But, as I explained, the function will _not_ be redefined and trashed every frame; it will be created one time. (Andrew Barnert)
Hmm... Nick says different...
This all suggests that if your application is severely memory constrained (e.g. it's running on an embedded interpreter like MicroPython), then it *might* make sense to incur the extra complexity of using classes with a custom __call__ method to define wrapper functions, over just using a nested function. (Nick Coghlan)
The memory difference is only per function defined using the wrapper, not per call. The second speed difference I described (how long the CALL_FUNCTION opcode takes) is per call, and there native functions are the clear winner (followed by bound methods, and custom callable objects a relatively distant third).
The other thing to keep in mind is that the examples I showed were focused specifically on measuring the differences in overhead, so the function bodies don't actually do anything, and the class instances didn't contain any state of their own. Adding even a single instance/closure variable is likely to swamp the differences in memory consumption between a native function and a class instance.
Cheers, Nick.
--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
-- --Guido van Rossum (python.org/~guido)
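To make Guido's point concrete, here is a small sketch (added for illustration; exact sizes vary across CPython versions) showing the instance dict that sys.getsizeof() misses, and the effect of __slots__:

    import sys

    class Plain:
        def __init__(self):
            self.func = None

    class Slotted:
        __slots__ = ("func",)
        def __init__(self):
            self.func = None

    p = Plain()
    s = Slotted()
    print(sys.getsizeof(p))           # instance header only
    print(sys.getsizeof(p.__dict__))  # the separately allocated dict
    print(sys.getsizeof(s))           # __slots__: no __dict__ at all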
On 3 January 2016 at 13:48, Guido van Rossum <guido@python.org> wrote:
Whoops, Nick already did the micro-benchmarks, and showed that creating a function object is faster than instantiating a class. He also measured the size, but I think he forgot that sys.getsizeof() doesn't report the size (recursively) of contained objects -- a class instance references a dict which is another 288 bytes (though if you care you can get rid of this by using __slots__).
You're right, I forgot to account for that (56 bytes without __slots__ did seem surprisingly small!), but functions also always allocate f.__annotations__ at the moment.

Always allocating f.__annotations__ actually puzzled me a bit - did we do that for a specific reason, or did we just not think of setting it to None when it's unused to save space the way we do for other function attributes? (__closure__, __defaults__, etc.)

Cheers, Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Thanks for explaining the differences, though. I got confused between the decorator and the decorator factory, thinking the decorator had a function inside it. Sorry :)

On Sat, Jan 2, 2016 at 10:42 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 3 January 2016 at 13:48, Guido van Rossum <guido@python.org> wrote:

Whoops, Nick already did the micro-benchmarks, and showed that creating a function object is faster than instantiating a class. He also measured the size, but I think he forgot that sys.getsizeof() doesn't report the size (recursively) of contained objects -- a class instance references a dict which is another 288 bytes (though if you care you can get rid of this by using __slots__).
You're right, I forgot to account for that (56 bytes without __slots__ did seem surprisingly small!), but functions also always allocate f.__annotations__ at the moment.
Always allocating f.__annotations__ actually puzzled me a bit - did we do that for a specific reason, or did we just not think of setting it to None when it's unused to save space the way we do for other function attributes? (__closure__, __defaults__, etc)
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
-- -Surya Subbarao
On Sat, Jan 2, 2016 at 10:42 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 3 January 2016 at 13:48, Guido van Rossum <guido@python.org> wrote:

Whoops, Nick already did the micro-benchmarks, and showed that creating a function object is faster than instantiating a class. He also measured the size, but I think he forgot that sys.getsizeof() doesn't report the size (recursively) of contained objects -- a class instance references a dict which is another 288 bytes (though if you care you can get rid of this by using __slots__).
You're right, I forgot to account for that (56 bytes without __slots__ did seem surprisingly small!), but functions also always allocate f.__annotations__ at the moment.
Always allocating f.__annotations__ actually puzzled me a bit - did we do that for a specific reason, or did we just not think of setting it to None when it's unused to save space the way we do for other function attributes? (__closure__, __defaults__, etc)
Where do you see that happening? The code in funcobject.c seems to indicate that it's created on demand. (And that's how I remember it always being.)

--
--Guido van Rossum (python.org/~guido)
Output from both 3.4.1 and 3.5.0:
def foo(): pass>>> foo.__annotations__{} Probably an oversight. I'm also not a C expert, but func_get_annotations (line 396 and onwards in funcobject.c) explicitely returns a new, empty dict if the function doesn't have any annotations (unlike all the other slots, like __defaults__ or __kwdefaults__, which merely return None if they're not present). From: guido@python.org Date: Mon, 4 Jan 2016 16:26:53 -0800 To: ncoghlan@gmail.com Subject: Re: [Python-ideas] Bad programming style in decorators? CC: python-ideas@python.org; surya.subbarao1@gmail.com
On Sat, Jan 2, 2016 at 10:42 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:

On 3 January 2016 at 13:48, Guido van Rossum <guido@python.org> wrote:
Whoops, Nick already did the micro-benchmarks, and showed that creating a
function object is faster than instantiating a class. He also measured the
size, but I think he forgot that sys.getsizeof() doesn't report the size
(recursively) of contained objects -- a class instance references a dict
which is another 288 bytes (though if you care you can get rid of this by
using __slots__).
You're right, I forgot to account for that (56 bytes without __slots__ did seem surprisingly small!), but functions also always allocate f.__annotations__ at the moment.

Always allocating f.__annotations__ actually puzzled me a bit - did we do that for a specific reason, or did we just not think of setting it to None when it's unused to save space the way we do for other function attributes? (__closure__, __defaults__, etc.)

Where do you see that happening? The code in funcobject.c seems to indicate that it's created on demand. (And that's how I remember it always being.)

--
--Guido van Rossum (python.org/~guido)
On Jan 4, 2016, at 19:08, Emanuel Barry <vgr255@live.ca> wrote:
Output from both 3.4.1 and 3.5.0:
    >>> def foo(): pass
    ...
    >>> foo.__annotations__
    {}
Probably an oversight. I'm also not a C expert, but func_get_annotations (line 396 and onwards in funcobject.c) explicitly returns a new, empty dict if the function doesn't have any annotations (unlike all the other slots, like __defaults__ or __kwdefaults__, which merely return None if they're not present).
But that code implies that if you just create a new function object and never check its __annotations__, it's not wasting any space for them. Otherwise, it wouldn't have to check for NULL there.

We can't test this _directly_ from Python, but with a bit of ctypes hackery and funcobject.h, we can define a PyFunctionObject(Structure)... or, keeping things a bit more concise but a lot more hacky for the purposes of email:

    from ctypes import POINTER, c_voidp, cast

    def func(): pass
    pf = cast(id(func), POINTER(c_voidp))
    assert pf[2] == id(func.__code__)  # sanity check: func_code slot
    assert not pf[12]                  # func_annotations slot is NULL
    func.__annotations__               # first access
    assert pf[12]                      # now it points at a dict

So it is indeed NULL until you check it, and then it becomes something (an empty dict).
On 5 January 2016 at 10:26, Guido van Rossum <guido@python.org> wrote:
On Sat, Jan 2, 2016 at 10:42 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Always allocating f.__annotations__ actually puzzled me a bit - did we do that for a specific reason, or did we just not think of setting it to None when it's unused to save space the way we do for other function attributes? (__closure__, __defaults__, etc)
Where do you see that happening? The code in funcobject.c seems to indicate that it's created on demand. (And that's how I remember it always being.)
I didn't check the code, only the behaviour, so I missed that querying f.__annotations__ was implicitly creating the dictionary.

Cheers, Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
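The on-demand behaviour is also observable from pure Python, without the ctypes hackery above. A small sketch, added for illustration: the first read creates an empty dict and stores it on the function, and every later read returns that same object.

    def foo():
        pass

    # First access creates an empty dict and stores it on the function.
    d = foo.__annotations__
    assert d == {}

    # Subsequent accesses return the very same dict object.
    assert foo.__annotations__ is d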
On Jan 2, 2016, at 19:00, u8y7541 The Awesome Person <surya.subbarao1@gmail.com> wrote:
The wrapper functions themselves, though, exist in a one:one correspondence with the functions they're applied to - when you apply functools.lru_cache to a function, the transient decorator produced by the decorator factory only lasts as long as the execution of the function definition, but the wrapper function lasts for as long as the wrapped function does, and gets invoked every time that function is called (and if a function is performance critical enough for the results to be worth caching, then it's likely performance critical enough to be thinking about micro-optimisations). (Nick Coghlan)
Yes, that is what I was thinking of. Just like Quake's fast inverse square root. Even though it is a micro-optimization, it greatly affects how fast the game runs.
Of course micro-optimizations _can_ matter--when you're optimizing the work done in the inner loop of a program that's CPU-bound, even a few percent can make a difference. But that doesn't mean they _always_ matter. Saving 50ns in some code that runs thousands of times per frame makes a difference; saving 50ns in some code that happens once at startup does not. That's why we have profiling tools: so you can find the bit of your program where you're spending 99% of your time doing something a billion times, and optimize that part.

And it also doesn't mean that everything that sounds like it should be lighter is worth doing. You have to actually test it and see. In the typical case where you're replacing one function object with one class object and one instance object, that's actually taking more space, not less.
But, as I explained, the function will _not_ be redefined and trashed every frame; it will be created one time. (Andrew Barnert)
Hmm... Nick says different...
No, Nick doesn't say different. Read it again. The wrapper function lives as long as the wrapped function lives. It doesn't get created anew each time you call it.

If you don't understand this, it may help to profile [fib(i) for i in range(10000)]. You'll see that the wrapper function gets called a ton of times, the wrapped function gets called 10000 times, and the factory function (which created the wrapper function) gets called 0 times.
This all suggests that if your application is severely memory constrained (e.g. it's running on an embedded interpreter like MicroPython), then it *might* make sense to incur the extra complexity of using classes with a custom __call__ method to define wrapper functions, over just using a nested function. (Nick Coghlan)
Yes, I was thinking of that when I started this thread, but this thread is just from my speculation.
Nick is saying that there may be some cases where it might make sense to use a class. That doesn't at all support your idea that tutorials should teach using classes instead of functions. In general, using functions will be faster; in the most common case, using functions will use less memory; most importantly, in the vast majority of cases, it won't matter anyway. Maybe a MicroPython tutorial should have a section on how running on a machine with only 4KB changes a lot of the usual tradeoffs, using a decorator as an example. But a tutorial on decorators should show using a function, because it's the simplest, most readable way to do it.
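For anyone who wants to see those call counts directly, here is a small instrumented sketch along the lines Andrew suggests (invented names, added for illustration; counting_cache plays the role lru_cache's decorating function plays in the real library):

    import functools

    factory_calls = wrapper_calls = wrapped_calls = 0

    def counting_cache(func):
        # Runs once, at decoration time.
        global factory_calls
        factory_calls += 1
        cache = {}

        @functools.wraps(func)
        def wrapper(n):
            # Lives as long as the decorated fib does.
            global wrapper_calls
            wrapper_calls += 1
            if n not in cache:
                cache[n] = func(n)
            return cache[n]

        return wrapper

    @counting_cache
    def fib(n):
        global wrapped_calls
        wrapped_calls += 1
        return n if n < 2 else fib(n-1) + fib(n-2)

    [fib(i) for i in range(1000)]
    print(factory_calls, wrapper_calls, wrapped_calls)
    # -> 1 2996 1000: factory once, wrapper most often,
    #    wrapped once per distinct value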
If you don't understand this, it may help to profile [fib(i) for i in range(10000)]. You'll see that the wrapper function gets called a ton of times, the wrapped function gets called 10000 times, and the factory function (which created the wrapper function) gets called 0 times.
Ah, I see now. Thank you.

On Sat, Jan 2, 2016 at 7:50 PM, Andrew Barnert <abarnert@yahoo.com> wrote:
On Jan 2, 2016, at 19:00, u8y7541 The Awesome Person <surya.subbarao1@gmail.com> wrote:
The wrapper functions themselves, though, exist in a one:one correspondence with the functions they're applied to - when you apply functools.lru_cache to a function, the transient decorator produced by the decorator factory only lasts as long as the execution of the function definition, but the wrapper function lasts for as long as the wrapped function does, and gets invoked every time that function is called (and if a function is performance critical enough for the results to be worth caching, then it's likely performance critical enough to be thinking about micro-optimisations). (Nick Coghlan)
Yes, that is what I was thinking of. Just like Quake's fast inverse square root. Even though it is a micro-optimization, it greatly affects how fast the game runs.
Of course micro-optimizations _can_ matter--when you're optimizing the work done in the inner loop of a program that's CPU-bound, even a few percent can make a difference.
But that doesn't mean they _always_ matter. Saving 50ns in some code that runs thousands of times per frame makes a difference; saving 50ns in some code that happens once at startup does not. That's why we have profiling tools: so you can find the bit of your program where you're spending 99% of your time doing something a billion times, and optimize that part.
And it also doesn't mean that everything that sounds like it should be lighter is worth doing. You have to actually test it and see. In the typical case where you're replacing one function object with one class object and one instance object, that's actually taking more space, not less.
But, as I explained, the function will _not_ be redefined and trashed every frame; it will be created one time. (Andrew Barnert)
Hmm... Nick says different...
No, Nick doesn't say different. Read it again. The wrapper function lives as long as the wrapped function lives. It doesn't get created anew each time you call it.
If you don't understand this, it may help to profile [fib(i) for i in range(10000)]. You'll see that the wrapper function gets called a ton of times, the wrapped function gets called 10000 times, and the factory function (which created the wrapper function) gets called 0 times.
This all suggests that if your application is severely memory constrained (e.g. it's running on an embedded interpreter like MicroPython), then it *might* make sense to incur the extra complexity of using classes with a custom __call__ method to define wrapper functions, over just using a nested function. (Nick Coghlan)
Yes, I was thinking of that when I started this thread, but this thread is just from my speculation.
Nick is saying that there may be some cases where it might make sense to use a class. That doesn't at all support your idea that tutorials should teach using classes instead of functions. In general, using functions will be faster; in the most common case, using functions will use less memory; most importantly, in the vast majority of cases, it won't matter anyway. Maybe a MicroPython tutorial should have a section on how running on a machine with only 4KB changes a lot of the usual tradeoffs, using a decorator as an example. But a tutorial on decorators should show using a function, because it's the simplest, most readable way to do it.
-- -Surya Subbarao
Nick Coghlan writes:
(and if a function is performance critical enough for the results to be worth caching, then it's likely performance critical enough to be thinking about micro-optimisations).
Maybe. It could be that the "real" implementation is Very Expensive to invoke, and/or that, depending on how the function gets called, caching changes the complexity class of an algorithm that calls it.
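A standard concrete example of that effect (added for illustration, not the poster's own): memoizing the naive recursive Fibonacci changes the call count from exponential in n to linear in n.

    import functools

    def fib_plain(n):
        # Naive recursion: roughly phi**n calls for one evaluation.
        return n if n < 2 else fib_plain(n-1) + fib_plain(n-2)

    @functools.lru_cache(maxsize=None)
    def fib_cached(n):
        # Memoized: each distinct n is computed exactly once.
        return n if n < 2 else fib_cached(n-1) + fib_cached(n-2)

    print(fib_cached(300))  # returns immediately
    # fib_plain(300) would effectively never finish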
On 03.01.16 04:42, Nick Coghlan wrote:
Finally, we get to the question of relative size: are function instances larger or smaller than your typical class instance? Again, we don't have to guess, we can use the interpreter to experiment and check our assumptions:
>>> import sys >>> def f(): pass ... >>> sys.getsizeof(f) 136 >>> class C(): pass ... >>> sys.getsizeof(C()) 56
sys.getsizeof() returns only the bare size of the object, not including the sizes of subobjects. To calculate the total size you have to sum the sizes of all subobjects recursively. [1]

[1] http://bugs.python.org/file31822/gettotalsizeof.py
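The script at that link isn't reproduced in the archive; the idea looks roughly like the following sketch (my approximation, not Serhiy's actual code). Note that for a function this will also traverse __globals__, so a fair comparison needs to exclude shared objects.

    import sys
    from gc import get_referents

    def total_sizeof(obj):
        # Sum sys.getsizeof() over obj and everything reachable from
        # it, counting each object only once.
        seen = set()
        stack = [obj]
        total = 0
        while stack:
            o = stack.pop()
            if id(o) in seen:
                continue
            seen.add(id(o))
            total += sys.getsizeof(o)
            stack.extend(get_referents(o))
        return total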
participants (7)
- Andrew Barnert
- Emanuel Barry
- Guido van Rossum
- Nick Coghlan
- Random832
- Serhiy Storchaka
- u8y7541 The Awesome Person