Why does __getitem__ slot of builtin call sequence methods first?

The new ndarray object of scipy core (successor to Numeric Python) is a C extension type that has a getitem defined in both the as_mapping and the as_sequence structures. The as_sequence entry is there just so PySequence_GetItem will work correctly. As exposed to Python, the ndarray object has a .__getitem__ wrapper method. Why does this wrapper call the sequence getitem instead of the mapping getitem method? Is there any way to get at a mapping-style __getitem__ method from Python? This looks like a bug to me (which is why I'm posting here...) Thanks for any help or insight. -Travis Oliphant

On 10/1/05, Travis Oliphant <oliphant@ee.byu.edu> wrote:
Hmm... I'm sure the answer is in typeobject.c, but that is one of the more obfuscated parts of Python's guts. I wrote it four years ago and since then I've apparently lost enough brain cells (or migrated them from language implementation to language design service :) that I don't understand it inside out any more like I did while I was in the midst of it. However, I wonder if the logic isn't such that if you define both sq_item and mp_subscript, __getitem__ calls sq_item; I wonder if by removing sq_item it might call mp_subscript? Worth a try, anyway. -- --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
Thanks for the tip. I think I figured out the problem, and it was my misunderstanding of how types inherit in C that was the source of my problem. Basically, Python is doing what you would expect: mp_subscript is used for __getitem__ if both mp_subscript and sq_item are present. However, the addition of these descriptors (and therefore the resolution of any competition for __getitem__ calls) is done *before* the inheritance of any slots takes place. The new ndarray object inherits from a "big" array object that doesn't define the sequence and buffer protocols (which have the size-limiting int dependence in their interfaces). The ndarray object has the standard tp_as_sequence and tp_as_buffer slots filled. Figuring the array object would inherit its tp_as_mapping protocol from "big" array (which it does just fine), I did not explicitly set that slot in its Type object. Thus, when PyType_Ready was called on the ndarray object, tp_as_mapping was NULL and so __getitem__ mapped to the sequence-defined version. Later the tp_as_mapping slots were inherited, but too late for __getitem__ to be what I expected. The easy fix was to initialize the tp_as_mapping slot before calling PyType_Ready. Hopefully, somebody else searching in the future for an answer to their problem will find this discussion useful. Thanks for your help, -Travis
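The ordering problem described above can be sketched with a small toy model. This is not CPython's typeobject.c: ToyType, its ready() method, and the two getitem functions are hypothetical names used only to mimic the sequence "pick the __getitem__ wrapper first, inherit empty slots afterwards".

```python
# Toy model only: these classes are hypothetical and mimic the *order* of
# steps described in the message, not CPython's actual implementation.

class ToyType:
    def __init__(self, name, base=None, sq_item=None, mp_subscript=None):
        self.name = name
        self.base = base
        self.slots = {"sq_item": sq_item, "mp_subscript": mp_subscript}
        self.dict = {}

    def ready(self):
        # Step 1: build the __getitem__ wrapper from the slots the type
        # fills in itself; mp_subscript is preferred when it is present.
        for slot in ("mp_subscript", "sq_item"):
            if self.slots[slot] is not None:
                self.dict["__getitem__"] = self.slots[slot]
                break
        # Step 2: only now inherit any slots that are still empty.
        if self.base is not None:
            for slot, func in self.base.slots.items():
                if self.slots[slot] is None:
                    self.slots[slot] = func
        return self


def seq_getitem(obj, index):
    return "sequence"


def map_getitem(obj, key):
    return "mapping"


big = ToyType("bigarray", mp_subscript=map_getitem).ready()

# Relying on inheritance: the wrapper is chosen before mp_subscript arrives.
late = ToyType("ndarray", base=big, sq_item=seq_getitem).ready()

# The fix: fill the mapping slot explicitly before calling ready().
fixed = ToyType("ndarray", base=big, sq_item=seq_getitem,
                mp_subscript=big.slots["mp_subscript"]).ready()

print(late.dict["__getitem__"](None, 0))
print(fixed.dict["__getitem__"](None, 0))
```

In the "late" case the mapping slot does end up inherited, yet the wrapper stored under __getitem__ still points at the sequence function, which matches the symptom reported in the original question.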

Travis Oliphant <oliphant@ee.byu.edu> writes:
Oof. That'd do it.
I guess the reason this hasn't come up before is that non-trivial C inheritance is still pretty rare.
Well, it sounds like a bug that should be easy to fix. I can't think of a reason to do slot wrapper generation before slot inheritance, though I wouldn't like to bet more than a beer on not having missed something... Cheers, mwh -- There are two kinds of large software systems: those that evolved from small systems and those that don't work. -- Seen on slashdot.org, then quoted by amk

Guido van Rossum wrote:
As near as I can tell, the C/API documentation is silent on how slots are populated when multiple methods mapping to the same slot are defined by a C object, but this is a quote from the comment describing add_operators() in typeobject.c:
Further, in PyObject_GetItem (in abstract.c), tp_as_mapping->mp_subscript is checked first, with tp_as_sequence->sq_item only being checked if mp_subscript isn't found. Importantly, this is the function invoked by the BINARY_SUBSCR opcode. So, the *intent* certainly appears to be that mp_subscript should be preferred both by the C abstract object API and from normal Python code.

*However*, the precedence applied by add_operators() is governed by the slotdefs structure in typeobject.c, which, according to the above comment, is meant to match the order the slots appear in memory in the _typeobject structure in object.h, and favour the mapping methods over the sequence methods. There are actually two serious problems with the description in this comment.

Firstly, the two orders don't actually match. In the object layout, the ordering of the abstract object methods is as follows:

    PyNumberMethods *tp_as_number;
    PySequenceMethods *tp_as_sequence;
    PyMappingMethods *tp_as_mapping;

But in the slotdefs table, the PySequence and PyMapping slots are listed first, followed by the PyNumber methods.

Secondly, in both the object layout and the slotdefs table, the PySequence methods appear *before* the PyMapping methods, which means that tp_as_sequence->sq_item appears as "__getitem__" even though a subscript operation will actually invoke "tp_as_mapping->mp_subscript". In short, I think Travis is right in calling this behaviour a bug.

There's a similar problem with the methods that exist in both tp_as_number and tp_as_sequence - the abstract C API and the Python interpreter will favour the tp_as_number methods, but the slot definitions will favour tp_as_sequence. The fix is actually fairly simple: reorder the slotdefs table so that the sequence of slots is "Number, Mapping, Sequence" rather than adhering strictly to the sequence of methods given in the definition of _typeobject.
The only objects affected by this change would be C extension objects which define two C-level methods which map to the same Python-level slot name. The observed behavioural change is that the methods accessible via the Python-level slot names would change (either from the Sequence method to the Mapping method, or from the Sequence method to the Number method). Given that the only documentation I can find of the behaviour in that scenario is a comment in typeobject.c, that the implementation doesn't currently match the comment, and that the current implementation means that the methods accessed via the slot names don't match the methods normal Python syntax actually invokes, I find it hard to see how fixing it could cause any significant problems. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia --------------------------------------------------------------- http://boredomandlaziness.blogspot.com

Nick Coghlan wrote: [A load of baloney] Scratch everything I said in my last message - init_slotdefs() sorts the slotdefs table correctly, so that the order it is written in the source is irrelevant. Travis found the real answer to his problem. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia --------------------------------------------------------------- http://boredomandlaziness.blogspot.com
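A rough sketch of why the written order of the table does not matter once it is sorted. It assumes the sort key is each slot's offset in the heap type object, with number methods before mapping methods before sequence methods; the offsets, SLOT_OFFSETS, sorted_slotdefs and pick_wrappers below are invented for illustration and are not taken from typeobject.c.

```python
# Toy model: the offsets are made-up numbers that only preserve the assumed
# relative order number < mapping < sequence.
SLOT_OFFSETS = {"nb_add": 10, "mp_subscript": 20, "sq_concat": 30, "sq_item": 31}

# Entries in "source order": sequence first, then mapping, then number.
slotdefs = [
    ("sq_concat", "__add__"),
    ("sq_item", "__getitem__"),
    ("mp_subscript", "__getitem__"),
    ("nb_add", "__add__"),
]


def sorted_slotdefs(defs):
    # Sort by slot offset, so the written order of the table is irrelevant.
    return sorted(defs, key=lambda d: SLOT_OFFSETS[d[0]])


def pick_wrappers(defs, filled):
    # For each special-method name, the first filled slot in sorted order wins.
    chosen = {}
    for slot, name in sorted_slotdefs(defs):
        if slot in filled and name not in chosen:
            chosen[name] = slot
    return chosen


winners = pick_wrappers(slotdefs, {"sq_item", "mp_subscript", "sq_concat", "nb_add"})
print(winners)
```

With every slot filled, the mapping slot wins __getitem__ and the number slot wins __add__; if only sq_item is filled at the time the wrappers are picked, sq_item wins, which is the situation Travis ran into.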
participants (4)
- Guido van Rossum
- Michael Hudson
- Nick Coghlan
- Travis Oliphant