[Python-Dev] Why does getitem slot of builtin call sequence methods first?

Sun Oct 2 05:23:19 CEST 2005

Guido van Rossum wrote:
> Hmm... I'm sure the answer is in typeobject.c, but that is one of the
> more obfuscated parts of Python's guts. I wrote it four years ago and
> since then I've apparently lost enough brain cells (or migrated them
> from language implementation to to language design service :) that I
> don't understand it inside out any more like I did while I was in the
> midst of it.
> 
> However, I wonder if the logic isn't such that if you define both
> sq_item and mp_subscript, __getitem__ calls sq_item; I wonder if by
> removing sq_item it might call mp_subscript? Worth a try, anyway.

As near as I can tell, the C/API documentation is silent on how slots are 
populated when multiple methods mapping to the same slot are defined by a C 
object, but this is a quote from the comment describing add_operators() in 
typeobject.c:
>   In the latter case, the first slotdef entry encoutered wins.  Since
>    slotdef entries are sorted by the offset of the slot in the
>    PyHeapTypeObject, this gives us some control over disambiguating
>    between competing slots: the members of PyHeapTypeObject are listed
>    from most general to least general, so the most general slot is
>    preferred.  In particular, because as_mapping comes before as_sequence,
>    for a type that defines both mp_subscript and sq_item, mp_subscript
>    wins.

Further, in PyObject_GetItem (in abstract.c), tp_as_mapping->mp_subscript is 
checked first, with tp_as_sequence->mp_item only being checked if mp_subscript 
isn't found. Importantly, this is the function invoked by the BINARY_SUBSCR 
opcode.

So, the *intent* certainly appears to be that mp_subscript should be preferred 
both by the C abstract object API and from normal Python code.

*However*, the precedence applied by add_operators() is governed by the 
slotdefs structure in typeobject.c, which, according to the above comment, is 
meant to match the order the slots appear in memory in the _typeobject 
structure in object.h, and favour the mapping methods over the sequence methods.

There's actually two serious problems with the description in this comment:

Firstly, the two orders don't actually match. In the object layout, the 
ordering of the abstract object methods is as follows:
	PyNumberMethods *tp_as_number;
	PySequenceMethods *tp_as_sequence;
	PyMappingMethods *tp_as_mapping;

But in the slotdefs table, the PySequence and PyMapping slots are listed 
first, followed by the PyNumber methods.

Secondly, in both the object layout and the slotdefs table, the PySequence 
methods appear *before* the PyMapping methods, which means that 
tp_as_sequence->sq_item appears as "__getitem__" even though a subscript 
operation will actually invoke "tp_as_mapping->mp_subscript".

In short, I think Travis is right in calling this behaviour a bug. There's a 
similar problem with the methods that exist in both tp_as_number and 
tp_as_sequence - the abstract C API and the Python intepreter will favour the 
tp_as_number methods, but the slot definitions will favour tp_as_sequence.

The fix is actually fairly simple: reorder the slotdefs table so that the 
sequence of slots is "Number, Mapping, Sequence" rather than adhering strictly 
to the sequence of methods given in the definition of _typeobject.

The only objects affected by this change would be C extension objects which 
define two C-level methods which map to the same Python-level slot name. The 
observed behavioural change is that the methods accessible via the 
Python-level slot names would change (either from the Sequence method to the 
Mapping method, or from the Sequence method to the Number method).

Given that the only documentation I can find of the behaviour in that scenario 
is a comment in typeobject.c, that the implementation doesn't currently match 
the comment, and that the current implementation means that the methods 
accessed via the slot names don't match the methods normal Python syntax 
actually invokes, I find it hard to see how fixing it could cause any 
signficant problems.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://boredomandlaziness.blogspot.com

[Python-Dev] Why does __getitem__ slot of builtin call sequence methods first?

[Python-Dev] Why does getitem slot of builtin call sequence methods first?