Whoops, Nick already did the micro-benchmarks and showed that creating a function object is faster than instantiating a class. He also measured the size, but I think he forgot that sys.getsizeof() doesn't report the size (recursively) of contained objects -- a class instance references a dict, which is another 288 bytes (though if you care, you can get rid of this by using __slots__). I expect that calling an instance via __call__ is also slower than calling a function (but you can do your own benchmarks :-).
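As a quick illustration of the shallow getsizeof() behaviour described above (exact byte counts vary by Python version and platform, so treat the numbers as indicative only), a minimal sketch with hypothetical Adder examples:

```python
import sys

def make_adder(n):
    # Closure-based wrapper: one function object per call to make_adder.
    def adder(x):
        return x + n
    return adder

class Adder:
    # Ordinary class: each instance carries a separately allocated __dict__.
    def __init__(self, n):
        self.n = n
    def __call__(self, x):
        return x + self.n

class SlottedAdder:
    # __slots__ suppresses the per-instance __dict__ entirely.
    __slots__ = ("n",)
    def __init__(self, n):
        self.n = n
    def __call__(self, x):
        return x + self.n

f = make_adder(1)
a = Adder(1)
s = SlottedAdder(1)

# sys.getsizeof() is shallow: for a plain instance it omits the
# __dict__, which must be counted separately.
print("function:", sys.getsizeof(f))
print("instance:", sys.getsizeof(a), "+ dict:", sys.getsizeof(a.__dict__))
print("slotted: ", sys.getsizeof(s))  # no __dict__ to add
```

Note that the plain instance's true footprint is its own size plus its __dict__'s, which is what getsizeof() alone hides.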

On Sat, Jan 2, 2016 at 8:39 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 3 January 2016 at 13:00, u8y7541 The Awesome Person
<surya.subbarao1@gmail.com> wrote:
>> The wrapper functions themselves, though, exist in a one:one
>> correspondence with the functions they're applied to - when you apply
>> functools.lru_cache to a function, the transient decorator produced by
>> the decorator factory only lasts as long as the execution of the
>> function definition, but the wrapper function lasts for as long as the
>> wrapped function does, and gets invoked every time that function is
>> called (and if a function is performance critical enough for the
>> results to be worth caching, then it's likely performance critical
>> enough to be thinking about micro-optimisations). (Nick Coghlan)
> Yes, that is what I was thinking of. Just like Quake's fast inverse square
> root. Even though it is a micro-optimization, it greatly affects how fast
> the game runs.

For Python, much bigger performance pay-offs are available without
changing the code by adopting tools like PyPy, Cython and Numba.
Worrying about micro-optimisations like this usually only makes sense
if a profiler has identified the code as a hotspot for your particular
workload (and sometimes not even then).
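In the spirit of "profile first": a minimal cProfile sketch for identifying whether a function is actually a hotspot before reaching for micro-optimisations (the hotspot() function here is a made-up stand-in for real workload code).

```python
import cProfile
import io
import pstats

def hotspot():
    # Placeholder for the code you suspect is expensive.
    return sum(i * i for i in range(10_000))

pr = cProfile.Profile()
pr.enable()
for _ in range(100):
    hotspot()
pr.disable()

# Report the five most expensive entries by cumulative time.
s = io.StringIO()
pstats.Stats(pr, stream=s).sort_stats("cumulative").print_stats(5)
print(s.getvalue())
```

Only if a function dominates a report like this is it worth weighing wrapper-function overhead at all.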

>> But, as I explained, the function will _not_ be redefined and trashed
>> every frame; it will be created one time. (Andrew Barnert)
> Hmm... Nick says different...
>> This all suggests that if your application is severely memory
>> constrained (e.g. it's running on an embedded interpreter like
>> MicroPython), then it *might* make sense to incur the extra complexity
>> of using classes with a custom __call__ method to define wrapper
>> functions, over just using a nested function. (Nick Coghlan)

The memory difference is only per function defined using the wrapper,
not per call. The second speed difference I described (how long the
CALL_FUNCTION opcode takes) is per call, and there native functions
are the clear winner (followed by bound methods, and custom callable
objects a relatively distant third).
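The per-call difference Nick mentions can be checked with timeit; this sketch compares a do-nothing native function against an equally empty callable instance (absolute timings depend heavily on interpreter and hardware, so only the relative gap is meaningful).

```python
import timeit

def f():
    pass  # empty body: measures only call overhead

class C:
    def __call__(self):
        pass  # same empty body, dispatched through __call__

c = C()

# Each timeit() call invokes its callable 100,000 times.
t_func = timeit.timeit(f, number=100_000)
t_call = timeit.timeit(c, number=100_000)
print(f"native function: {t_func:.4f}s")
print(f"callable object: {t_call:.4f}s")
```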

The other thing to keep in mind is that the examples I showed were
focused specifically on measuring the differences in overhead, so the
function bodies don't actually do anything, and the class instances
didn't contain any state of their own. Adding even a single
instance/closure variable is likely to swamp the differences in memory
consumption between a native function and a class instance.


Nick Coghlan   |   ncoghlan@gmail.com   |   Brisbane, Australia
Python-ideas mailing list
Code of Conduct: http://python.org/psf/codeofconduct/

--Guido van Rossum (python.org/~guido)