[Python-Dev] Why does __getitem__ slot of builtin call sequence methods first?
ncoghlan at gmail.com
Sun Oct 2 05:23:19 CEST 2005
Guido van Rossum wrote:
> Hmm... I'm sure the answer is in typeobject.c, but that is one of the
> more obfuscated parts of Python's guts. I wrote it four years ago and
> since then I've apparently lost enough brain cells (or migrated them
> from language implementation to language design service :) that I
> don't understand it inside out any more like I did while I was in the
> midst of it.
> However, I wonder if the logic isn't such that if you define both
> sq_item and mp_subscript, __getitem__ calls sq_item; I wonder if by
> removing sq_item it might call mp_subscript? Worth a try, anyway.
As near as I can tell, the C/API documentation is silent on how slots are
populated when a C object defines multiple methods that map to the same slot,
but this is a quote from the comment describing add_operators() in
typeobject.c:
> In the latter case, the first slotdef entry encountered wins. Since
> slotdef entries are sorted by the offset of the slot in the
> PyHeapTypeObject, this gives us some control over disambiguating
> between competing slots: the members of PyHeapTypeObject are listed
> from most general to least general, so the most general slot is
> preferred. In particular, because as_mapping comes before as_sequence,
> for a type that defines both mp_subscript and sq_item, mp_subscript
> wins.
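To make the "first entry wins" rule concrete, here is a toy Python model of
it. It is only a sketch under simplifying assumptions: SLOTDEFS,
add_operators_model() and the slot-name strings are hypothetical stand-ins,
not code from typeobject.c, and only the two competing __getitem__ entries are
modelled, listed in the order the comment intends (mapping before sequence).

```python
# Toy model of the rule quoted above: walk a slotdefs-style table in order and
# expose each filled C slot under its Python name, unless an earlier entry has
# already claimed that name. All names here are hypothetical stand-ins.

# (python_name, c_slot) pairs in the order the comment intends:
# as_mapping before as_sequence.
SLOTDEFS = [
    ("__getitem__", "mp_subscript"),
    ("__getitem__", "sq_item"),
]


def add_operators_model(c_slots, slotdefs):
    """Return {python_name: c_slot} for a type whose filled slots are c_slots."""
    exposed = {}
    for name, slot in slotdefs:
        if slot not in c_slots:
            continue  # the C type leaves this slot NULL
        if name in exposed:
            continue  # an earlier slotdef entry already won this name
        exposed[name] = slot
    return exposed


# A hypothetical C type that fills in both competing slots.
both = {"mp_subscript", "sq_item"}
print(add_operators_model(both, SLOTDEFS))  # {'__getitem__': 'mp_subscript'}
```

A type that fills only sq_item still gets it exposed as __getitem__; the
ordering only matters when both slots are filled.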
Further, in PyObject_GetItem (in abstract.c), tp_as_mapping->mp_subscript is
checked first, with tp_as_sequence->sq_item only being checked if mp_subscript
isn't found. Importantly, this is the function invoked by the BINARY_SUBSCR
opcode.
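The lookup order in PyObject_GetItem can be modelled the same way. This is a
hedged sketch, not the abstract.c code: get_item_model() is a hypothetical
helper, a plain dict stands in for a C type's filled slots, and the
integer-key check and error messages are simplifications of the real fallback
to the sequence protocol.

```python
# Toy model of the lookup order described above for PyObject_GetItem():
# the mapping slot is tried first, and the sequence slot is only consulted
# when mp_subscript is missing. The dict of "filled slots" is a hypothetical
# stand-in for a C type's tp_as_mapping / tp_as_sequence tables.


def get_item_model(c_slots, obj, key):
    """Mimic obj[key] for a type whose filled C slots are given in c_slots."""
    mp_subscript = c_slots.get("mp_subscript")
    if mp_subscript is not None:
        return mp_subscript(obj, key)
    sq_item = c_slots.get("sq_item")
    if sq_item is not None:
        # Simplification: only integer keys fall back to the sequence slot.
        if not isinstance(key, int):
            raise TypeError("sequence index must be an integer")
        return sq_item(obj, key)
    raise TypeError("object is not subscriptable")


both = {
    "mp_subscript": lambda obj, key: ("mp_subscript", key),
    "sq_item": lambda obj, i: ("sq_item", i),
}
seq_only = {"sq_item": both["sq_item"]}

print(get_item_model(both, None, 0))      # ('mp_subscript', 0)
print(get_item_model(seq_only, None, 0))  # ('sq_item', 0)
```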
So, the *intent* certainly appears to be that mp_subscript should be preferred
both by the C abstract object API and from normal Python code.
*However*, the precedence applied by add_operators() is governed by the
slotdefs structure in typeobject.c, which, according to the above comment, is
meant to match the order the slots appear in memory in the _typeobject
structure in object.h, and favour the mapping methods over the sequence methods.
There are actually two serious problems with the description in this comment:
Firstly, the two orders don't actually match. In the object layout, the
ordering of the abstract object methods is as follows:

    tp_as_number   (PyNumberMethods)
    tp_as_sequence (PySequenceMethods)
    tp_as_mapping  (PyMappingMethods)

But in the slotdefs table, the PySequence and PyMapping slots are listed
first, followed by the PyNumber methods.
Secondly, in both the object layout and the slotdefs table, the PySequence
methods appear *before* the PyMapping methods, which means that, for a type
defining both, tp_as_sequence->sq_item is the method exposed as "__getitem__",
even though a subscript operation will actually invoke
tp_as_mapping->mp_subscript.
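Putting the two rules side by side shows the mismatch. Again this is only a
toy model under stated assumptions: CURRENT_SLOTDEFS reproduces just the
sequence-before-mapping ordering described above, and add_operators_model()
and subscript_slot_model() are hypothetical simplifications, not CPython code.

```python
# Toy model of the mismatch described above. Both helpers are hypothetical
# simplifications, not CPython code.

# Ordering as described for the current slotdefs table: the sequence entry
# precedes the mapping entry.
CURRENT_SLOTDEFS = [
    ("__getitem__", "sq_item"),
    ("__getitem__", "mp_subscript"),
]


def add_operators_model(c_slots, slotdefs):
    """First slotdef entry whose C slot is filled wins the Python-level name."""
    exposed = {}
    for name, slot in slotdefs:
        if slot in c_slots and name not in exposed:
            exposed[name] = slot
    return exposed


def subscript_slot_model(c_slots):
    """Slot that obj[key] ends up calling: mapping first, then sequence."""
    for slot in ("mp_subscript", "sq_item"):
        if slot in c_slots:
            return slot
    return None


both = {"sq_item", "mp_subscript"}
print(add_operators_model(both, CURRENT_SLOTDEFS)["__getitem__"])  # sq_item
print(subscript_slot_model(both))                                  # mp_subscript
```

In other words, for such a type T, T.__getitem__(x, k) and x[k] reach two
different C functions.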
In short, I think Travis is right in calling this behaviour a bug. There's a
similar problem with the methods that exist in both tp_as_number and
tp_as_sequence - the abstract C API and the Python interpreter will favour the
tp_as_number methods, but the slot definitions will favour tp_as_sequence.
The fix is actually fairly simple: reorder the slotdefs table so that the
slot groups appear in the order "Number, Mapping, Sequence" rather than
adhering strictly to the order of the method tables in the definition of
_typeobject.
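The effect of that reordering can be sketched with the same kind of toy
model, now including the __add__ clash between nb_add and sq_concat. The
tables below are a hypothetical excerpt reduced to the competing entries, and
add_operators_model() is an illustrative helper, not the real add_operators().

```python
# Hypothetical excerpt of a slotdefs-style table, reduced to the entries
# where two C slots compete for one Python-level name.
SEQUENCE = [("__getitem__", "sq_item"), ("__add__", "sq_concat")]
MAPPING = [("__getitem__", "mp_subscript")]
NUMBER = [("__add__", "nb_add")]

CURRENT_ORDER = SEQUENCE + MAPPING + NUMBER   # ordering described above
PROPOSED_ORDER = NUMBER + MAPPING + SEQUENCE  # the suggested reordering


def add_operators_model(c_slots, slotdefs):
    """First slotdef entry whose C slot is filled wins the Python-level name."""
    exposed = {}
    for name, slot in slotdefs:
        if slot in c_slots and name not in exposed:
            exposed[name] = slot
    return exposed


# A hypothetical C type that fills every competing slot.
c_slots = {"sq_item", "sq_concat", "mp_subscript", "nb_add"}
print(add_operators_model(c_slots, CURRENT_ORDER))
# {'__getitem__': 'sq_item', '__add__': 'sq_concat'}
print(add_operators_model(c_slots, PROPOSED_ORDER))
# {'__add__': 'nb_add', '__getitem__': 'mp_subscript'}
```

A type that fills only one slot of each competing pair exposes the same
methods under either ordering, which is why only extension types defining
both would see a difference.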
The only objects affected by this change would be C extension objects which
define two C-level methods which map to the same Python-level slot name. The
observed behavioural change is that the methods accessible via the
Python-level slot names would change (either from the Sequence method to the
Mapping method, or from the Sequence method to the Number method).
Given that the only documentation I can find of the behaviour in that scenario
is a comment in typeobject.c, that the implementation doesn't currently match
the comment, and that the current implementation means that the methods
accessed via the slot names don't match the methods normal Python syntax
actually invokes, I find it hard to see how fixing it could cause any problems.
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia