Using vectorcall for tp_new and tp_init
Hello,

I'm starting this thread to brainstorm ways of using vectorcall to speed up creating instances of Python classes.

Currently, the following happens when creating an instance of a Python class X using X(.....), assuming that __new__ and __init__ are Python functions and that the metaclass of X is simply "type":

1. type_call (the tp_call wrapper for type) is invoked with arguments (X, args, kwargs).
2. type_call calls slot_tp_new with arguments (X, args, kwargs).
3. slot_tp_new calls X.__new__, prepending X to the args tuple. A new object obj is returned.
4. type_call calls slot_tp_init with arguments (obj, args, kwargs).
5. slot_tp_init calls the type(obj).__init__ method, prepending obj to the args tuple. __init__ returns None, and type_call then returns obj.

In the worst case, no fewer than 6 temporary objects are needed just to pass arguments around (numbered by the step that creates them):

1. An args tuple and kwargs dict for tp_call
3. An args array with X prepended and a kwnames tuple for __new__
5. An args array with obj prepended and a kwnames tuple for __init__

This is clearly not as efficient as it could be.

An obvious solution would be to introduce variants of tp_new and tp_init using the vectorcall protocol. Assuming PY_VECTORCALL_ARGUMENTS_OFFSET is used, all 6 temporary allocations could be dropped. The implementation could take the form of two new slots, tp_vector_new and tp_vector_init. Since we're just dealing with type slots here (as opposed to offsets in an object structure), this should be easier to implement than PEP 590 itself.

Jeroen.
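For readers who want to see the five steps above in concrete form, here is a rough pure-Python model of the call sequence. It is only a sketch: type_call_sketch and Point are hypothetical names made up for illustration, the real logic lives in C inside CPython, and the model ignores special cases such as the one-argument form type(x). The *args and **kwargs packing stands in for the args tuple and kwargs dict that the C code passes between the slots.

```python
def type_call_sketch(cls, *args, **kwargs):
    """Hypothetical pure-Python model of what calling cls(*args, **kwargs) does."""
    # Steps 2-3: tp_new ends up calling cls.__new__ with cls prepended.
    obj = cls.__new__(cls, *args, **kwargs)
    # Steps 4-5: tp_init is only called if __new__ returned an instance of cls.
    if isinstance(obj, cls):
        result = type(obj).__init__(obj, *args, **kwargs)
        if result is not None:
            raise TypeError("__init__() should return None")
    return obj


class Point:
    """Example class whose __new__ and __init__ are both Python functions."""

    def __new__(cls, x, y=0):
        # The same arguments reach __new__ and __init__.
        return super().__new__(cls)

    def __init__(self, x, y=0):
        self.x = x
        self.y = y


p = type_call_sketch(Point, 1, y=2)
q = Point(1, y=2)  # the real call path, for comparison
```

Both p and q end up as Point instances with the same attributes. The point of the model is that the same positional and keyword arguments are repackaged once for __new__ and once more for __init__, which is where the temporary objects listed above come from.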