On Thu, Sep 5, 2019 at 9:31 PM Steven D'Aprano <steve@pearwood.info> wrote:
On Thu, Sep 05, 2019 at 07:15:27PM +1000, Chris Angelico wrote:
Hang on hang on.... what's this situation where you're checking types of a zillion objects?
An earlier version of the statistics module used lots of isinstance checks in order to support arbitrary numeric types, and was a lot slower. The current version avoids most of those at the cost of being a lot less clear and elegant, but it improved performance somewhat.
That's fair, although if someone's concerned about squeezing out every last bit of performance, they'll probably be using numpy.
On my PC, an isinstance check against a single concrete type (not an ABC) is about three times as expensive as a float arithmetic operation, so in a tight loop it may not be an insignificant cost.
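A rough way to reproduce that kind of comparison is with timeit. This is only a sketch: the helper name time_stmt and the iteration count are made up for illustration, and the ratio will vary by machine and Python version.

```python
import timeit


def time_stmt(stmt, setup, number=200_000):
    """Return the total seconds taken to run `stmt` `number` times."""
    return timeit.timeit(stmt, setup=setup, number=number)


if __name__ == "__main__":
    # Single concrete type, a tuple of types, and a plain float addition.
    t_single = time_stmt("isinstance(x, float)", "x = 1.5")
    t_tuple = time_stmt("isinstance(x, (int, float))", "x = 1.5")
    t_add = time_stmt("x + y", "x = 1.5; y = 2.5")

    print(f"isinstance, single type: {t_single:.4f}s")
    print(f"isinstance, tuple:       {t_tuple:.4f}s")
    print(f"float addition:          {t_add:.4f}s")
```

The numbers include timeit's own loop overhead, so they are better read as a relative comparison than as absolute costs.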
How often do you check against a union type (or, in current code, against a tuple of types)? This proposal wouldn't affect anything that checks against a single type.
I think there's a bigger problem there than whether isinstance(x, int|str) is slower than isinstance(x, (int, str))! Even if this change DOES have a measurable impact on the time to do those checks, it only applies to unions, and if that's a notable proportion of your total run time, maybe there's a better way to architect this.
Maybe... but in my experience, only at the cost of writing quite ugly code.
Perhaps. But for this to matter, you would need:

1) some sort of complicated dispatch handler that has to handle subclasses (so you can't just look up type(x) in a dict)
2) handling of multiple types the same way (so you want to do union isinstances rather than each one being done individually)
3) little enough other code that a performance regression in isinstance makes a measurable difference
4) clean code that you don't want to disrupt for the sake of performance

Seems like a fairly uncommon case to me. Maybe I'm wrong.
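To make conditions 1 and 2 concrete, here is a small sketch of the two dispatch styles. All names in it (EXACT_TABLE, describe_exact, describe_isinstance, MyInt) are hypothetical and exist only for illustration.

```python
# Exact-type dispatch: a dict keyed on type(x). Fast, but blind to subclasses.
EXACT_TABLE = {
    int: lambda x: "number",
    float: lambda x: "number",
    str: lambda x: "text",
    bytes: lambda x: "text",
}


def describe_exact(x, table=EXACT_TABLE):
    """Look up the handler by the object's exact type only."""
    handler = table.get(type(x))
    return handler(x) if handler is not None else "unhandled"


def describe_isinstance(x):
    """Group several types per branch; subclasses are matched too."""
    if isinstance(x, (int, float)):
        return "number"
    if isinstance(x, (str, bytes)):
        return "text"
    return "unhandled"


class MyInt(int):
    """A trivial int subclass used to show the difference."""


if __name__ == "__main__":
    print(describe_exact(MyInt(3)))       # exact lookup misses the subclass
    print(describe_isinstance(MyInt(3)))  # isinstance still matches it
```

Only the second style would be touched by any change to how isinstance treats unions, and only in the branches that check against more than one type.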
But having said all that, I'm not sure that we should be rejecting this proposal on the basis of performance when we haven't got any working code to measure performance of :)
Definitely. I don't think performance should be a major consideration until code cleanliness has been proven or disproven.
isinstance is a wrapper around PyObject_IsInstance(obj, class_or_tuple), and if I'm reading the C code correctly, PyObject_IsInstance is roughly equivalent to this Python pseudo-code:
    # Except in C, not Python
    def isinstance(obj, class_or_tuple):
        if type(class_or_tuple) is tuple:
            for C in class_or_tuple:
                if isinstance(obj, C):
                    return True
        else:
            ...
If Union is a built-in, we could have something like this:
    def isinstance(obj, class_or_tuple):
        if type(class_or_tuple) is Union:
            class_or_tuple = class_or_tuple.__union_params__
        # followed by the same code as above
typing.Union already defines .__union_params__ which returns a tuple of the classes used to construct the union, so in principle at least, there need be no significant performance hit from supporting Unions.
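As a runnable approximation of that idea in pure Python, here is a minimal sketch. It is not how CPython does or would implement it: the wrapper name isinstance_with_union is hypothetical, and it assumes Python 3.8 or later, where typing.get_origin and typing.get_args expose a union's member types (rather than the __union_params__ attribute mentioned above).

```python
from typing import Union, get_args, get_origin


def isinstance_with_union(obj, class_or_tuple):
    """Like isinstance(), but also accepts typing.Union[...] of plain classes."""
    # Unwrap the union into the tuple of its member types, then fall
    # through to the ordinary tuple handling in the built-in isinstance.
    if get_origin(class_or_tuple) is Union:
        class_or_tuple = get_args(class_or_tuple)
    return isinstance(obj, class_or_tuple)


if __name__ == "__main__":
    print(isinstance_with_union(1, Union[int, str]))
    print(isinstance_with_union(1.5, Union[int, str]))
    print(isinstance_with_union("x", (int, str)))
```

The extra cost over a tuple check is one origin test and one attribute fetch per call. The sketch only works when every member of the union is an ordinary class; a member such as List[int] would still be rejected by the built-in isinstance.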
That seems like a pretty good optimization!

ChrisA