
Hello,
I find a strange discrepancy in Python with regard to slice subscripting of objects at the C API level. I mean things like obj[start:end:step].
I'd expect slice subscripts to be part of the sequence interface, and yet they are not. In fact, they are part of the mapping interface. For example, the list object has its slice get/set methods assigned to a PyMappingMethods struct. So does a bytes object, and pretty much every other object that wants to support subscripts.
This doesn't align well with the documentation, in at least two places.
1) The library documentation (http://docs.python.org/dev/library/stdtypes.html) in 4.8 says: "Mappings are mutable objects. There is currently only one standard mapping type, the dictionary". Why then does a list implement the mapping interface? Moreover, why does bytes, an immutable object, implement the mapping interface?
2) The same documentation page in 4.6 says, in the operation table:
    s[i:j]      slice of s from i to j
    s[i:j:k]    slice of s from i to j with step k
But in the implementation, the slice subscripts are part of the mapping interface, not the sequence interface. The PySequenceMethods structure does have fields for slice accessors, but their naming (was_sq_slice, was_sq_ass_slice) suggests they're just deprecated placeholders.
This also doesn't align well with logic, since mappings like dict have no real meaning for slice subscripts. These logically belong to a sequence. Moreover, it separates subscripts for a single numeric index from subscripts for a slice by putting them in different protocols (the former in sequence, the latter in mapping).
I realize I must be missing some piece of the history here, and I'm not suggesting changing anything. I do think that the documentation, especially in the area of the type object that defines the sequence and mapping protocols, could be clarified to express what is expected of a new type that wants to act as a sequence. In particular, it should be said explicitly that such a type must implement the mapping protocol if it wants slice subscripting.
If this makes any sense at all, I will open an issue.
Eli
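
As a Python-level illustration of the point above (not part of the original message): every subscript, whether an integer or a slice, reaches a Python class through the same `__getitem__` entry point, and slice syntax arrives as a single `slice` object. The `Probe` class is a hypothetical helper that just returns whatever key it was given.

```python
class Probe:
    """Hypothetical helper: returns the key that each subscript passes in."""

    def __getitem__(self, key):
        return key


probe = Probe()

int_key = probe[3]         # plain integer index
simple = probe[1:5]        # slice with start and end only
extended = probe[1:10:2]   # extended slice with a step

for key in (int_key, simple, extended):
    print(type(key).__name__, key)
```

At the C level the thread describes the same thing from the other side: a key that is a slice object has to go through mp_subscript, because sq_item only takes a numeric index.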

This doesn't align well with the documentation, in at least two places.
<snip> Another place is in http://docs.python.org/dev/reference/datamodel.html: " object.__getitem__(self, key) Called to implement evaluation of self[key]. For sequence types, the accepted keys should be integers and slice objects. [...] " Once again, at the C API level this isn't accurate since only integer keys are handled by the sequence protocol, leaving slice keys to the mapping protocol. The datamodel doc should stay as it is, because it's correct for Python-written classes. But the relevant C API sections really need some clarification. Eli

Eli Bendersky, 03.03.2012 09:36:
I think that's (partly?) for historical reasons. Originally, there were the slicing functions as part of the sequence interface. They took a start and an end index of the slice. Then, extended slicing was added to the language, and that used a slice object, which didn't fit into the sequence slicing interface. So the interface was unified using the existing mapping getitem interface, and the sequence slicing functions were eventually deprecated and removed in Py3. Stefan
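
To make the difference between the two kinds of slice concrete, here is a small Python sketch (an illustration added to this archive, not code from the thread). A simple slice is fully described by a start and an end, which is what the old sequence slicing functions took; an extended slice also carries a step, so it is passed around as a `slice` object. `slice.indices()` turns either form into concrete bounds for a given length.

```python
data = list(range(10))

simple = slice(2, 5)               # key produced by data[2:5]
extended = slice(None, None, -2)   # key produced by data[::-2]

# indices(length) resolves missing or negative fields into
# a concrete (start, stop, step) triple for that length.
simple_bounds = simple.indices(len(data))
extended_bounds = extended.indices(len(data))

# Walking the resolved bounds selects the same items as the subscript.
picked = [data[i] for i in range(*extended_bounds)]

print(simple_bounds, extended_bounds, picked)
```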

On Sat, Mar 3, 2012 at 11:24, Stefan Behnel <stefan_ml@behnel.de> wrote:
This makes sense. Note that now there's also duplication in almost all objects, because the mapping protocol essentially supersedes the sequence protocol for accessing elements. I.e. sq_item and sq_ass_item are no longer needed if an object implements the mapping protocol, because the mapping interface has precedence, and mp_subscript and mp_ass_subscript are called instead, respectively. Because of that, the first thing they do is check whether the index is a simple number and, if so, do the work of their sequence protocol cousins. This duplicates code in almost all objects that need to support __getitem__.
Eli
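
The pattern described here can be sketched in Python (an illustrative model only, not the CPython source). The class `Squares` and its `_item` helper are hypothetical names: `_item` plays the role of sq_item and accepts integers only, while `__getitem__` plays the role of mp_subscript, first checking for an integer-like key and only then handling slices.

```python
import operator


class Squares:
    """Hypothetical read-only sequence holding the first n squares."""

    def __init__(self, n):
        self._n = n

    def __len__(self):
        return self._n

    def _item(self, index):
        # Stand-in for sq_item: integer indices only.
        if index < 0:
            index += self._n
        if not 0 <= index < self._n:
            raise IndexError("index out of range")
        return index * index

    def __getitem__(self, key):
        # Stand-in for mp_subscript: it receives every kind of key.
        # First check for a "simple number" and redo the sq_item work.
        if hasattr(type(key), "__index__"):
            return self._item(operator.index(key))
        # Otherwise the key may be a slice object.
        if isinstance(key, slice):
            return [self._item(i) for i in range(*key.indices(self._n))]
        raise TypeError("indices must be integers or slices")


sq = Squares(6)
print(sq[2], sq[-1], sq[1:5:2], sq[::-1])
```

The integer branch is the duplicated part: a type that already has an integer-only accessor still has to route integer keys through its general subscript function.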

Hi,
It comes from: http://hg.python.org/cpython/rev/245224d1b8c9 http://bugs.python.org/issue400998 Written by Michael Hudson and reviewed by Guido. I wonder why this patch chose to add mapping protocol support to tuples and lists, rather than add a tp_ slot for extended slicing. Regards Antoine.

Why a separate tp_ slot for extended slicing? ISTM slicing pertains to sequences, similarly to other numeric indices. If you look at PySequenceMethods it has these (apparently no longer used fields): void *was_sq_slice; void *was_sq_ass_slice; These were "simple" slices (pairs of numbers). I suppose if any change is considered, these fields can be re-incarnated to accept PyObject* slices similarly to the current mp_subscript and mp_ass_subscript. Eli

On Sat, Mar 3, 2012 at 4:20 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
That's long ago... IIRC it was for binary compatibility -- I didn't want to add an extra slot to the sq struct because it would require recompilation of 3rd party extensions. At the time that was an important concern. -- --Guido van Rossum (python.org/~guido)

On Sat, Mar 3, 2012 at 19:58, Guido van Rossum <guido@python.org> wrote:
Perhaps the situation can be fixed now without binary compatibility concerns. PySequenceMethods is:
    typedef struct {
        lenfunc sq_length;
        binaryfunc sq_concat;
        ssizeargfunc sq_repeat;
        ssizeargfunc sq_item;
        void *was_sq_slice;
        ssizeobjargproc sq_ass_item;
        void *was_sq_ass_slice;
        objobjproc sq_contains;
        binaryfunc sq_inplace_concat;
        ssizeargfunc sq_inplace_repeat;
    } PySequenceMethods;
The slots "was_sq_slice" and "was_sq_ass_slice" aren't used any longer. These can be re-incarnated to accept a slice object, and sequence objects can be rewritten to use them instead of implementing the mapping protocol (is there any reason listobject implements the mapping protocol, other than to gain the ability to use slices for __getitem__?). Existing 3rd party extensions don't *need* to be recompiled or changed, however. They *can* be, if their authors are interested, of course.
Eli

On Sat, Mar 3, 2012 at 10:18, Eli Bendersky <eliben@gmail.com> wrote:
Why even have separate tp_as_sequence and tp_as_mapping anymore? That particular distinction never existed for Python types, so why should it exist for C types at all? I forget if there was ever a real point to it, but all it seems to do now is create confusion, what with many sequence types implementing both, and PyMapping_Check() and PySequence_Check() doing seemingly random things to come up with somewhat sensible answers. Do note that the dict type actually implements tp_as_sequence (in order to support containment tests) and that PySequence_Check() has to explicitly return 0 for dicts -- which means that it will give the "wrong" answer for another type that behaves exactly like dicts. Getting rid of the misleading distinction seems like a much better idea than trying to re-conflate some of the issues.
-- Thomas Wouters <thomas@python.org> Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

On Sat, 3 Mar 2012 12:59:13 -0800 Thomas Wouters <thomas@python.org> wrote:
Ironically, most of the confusion stems from sequence types implementing the mapping protocol for extended slicing.
It seems to be a leftover:
    int
    PySequence_Check(PyObject *s)
    {
        if (PyDict_Check(s))
            return 0;
        return s != NULL && s->ob_type->tp_as_sequence &&
            s->ob_type->tp_as_sequence->sq_item != NULL;
    }
Dict objects have a NULL sq_item so even removing the explicit check would still return the right answer.
Getting rid of the misleading distinction seems like a much better idea than trying to re-conflate some of the issues.
This proposal sounds rather backwards, given that we now have separate Mapping and Sequence ABCs. Regards Antoine.
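
For readers who want to see what the two check functions discussed above report, here is a hedged sketch using ctypes (added to this archive; it assumes a CPython interpreter, since `ctypes.pythonapi` is CPython-specific). The helper name `checks` is hypothetical; `PySequence_Check` and `PyMapping_Check` are the real C API functions.

```python
import ctypes

api = ctypes.pythonapi  # CPython-only handle to the interpreter's C API

# Declare the C signatures: int f(PyObject *o)
for name in ("PySequence_Check", "PyMapping_Check"):
    func = getattr(api, name)
    func.argtypes = [ctypes.py_object]
    func.restype = ctypes.c_int


def checks(obj):
    """Return (PySequence_Check(obj), PyMapping_Check(obj)) as ints."""
    return api.PySequence_Check(obj), api.PyMapping_Check(obj)


for sample in ([], {}):
    print(type(sample).__name__, checks(sample))
```

On CPython 3 a list is expected to pass both checks, because it fills in the mapping slots for slicing, while a dict is expected to pass only the mapping check. That overlap is the confusion the thread is talking about.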

On Sat, Mar 3, 2012 at 13:02, Antoine Pitrou <solipsis@pitrou.net> wrote:
I'm not sure how the ABCs, which are abstract declarations of semantics, tie into this specific implementation detail. ABCs work just as well for Python types as for C types, and Python types don't have this distinction. The distinction in C types has been *practically* useless for years, so why should it stay? What is the actual benefit here?
-- Thomas Wouters <thomas@python.org>
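
A short Python sketch of the ABC side of this exchange (an added illustration, not from the thread): the Sequence and Mapping ABCs classify types by declared semantics, and a pure-Python class gets that classification with a single `__getitem__`, with no separate sequence and mapping slots involved. `Pair` is a hypothetical example class.

```python
from collections.abc import Mapping, Sequence


class Pair(Sequence):
    """Hypothetical two-element sequence written in pure Python."""

    def __init__(self, first, second):
        self._items = (first, second)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, key):
        # One method serves both integer and slice keys.
        return self._items[key]


pair = Pair("a", "b")

print(isinstance(pair, Sequence), isinstance(pair, Mapping))
print(isinstance([], Sequence), isinstance({}, Mapping))
print(pair[0], pair[0:2], "b" in pair)
```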

On Sat, Mar 3, 2012 at 13:12, Stefan Behnel <stefan_ml@behnel.de> wrote:
It's not hard to do this in a backward-compatible way. Either grow one of the tp_as_* structs to include everything a 'unified' tp_as_everything struct would need, or add a new tp_as_everything slot in the type struct. Then add a tp_flag to indicate that the type has this new layout/slot, and guard all uses of the new slots with a check for that flag. If the type doesn't have the new layout, or has it but doesn't have the slots in it set, the code can fall back to the old try-one-and-then-the-other behaviour of dealing with tp_as_sequence and tp_as_mapping. (Let's not forget about tp_as_sequence.sq_concat, tp_as_number.nb_add, tp_as_sequence.sq_repeat and tp_as_number.nb_mul either.)
-- Thomas Wouters <thomas@python.org>

There's nothing to unify, really, since PyMappingMethods is just a subset of PySequenceMethods:
    typedef struct {
        lenfunc mp_length;
        binaryfunc mp_subscript;
        objobjargproc mp_ass_subscript;
    } PyMappingMethods;
with the small difference that in PySequenceMethods sq_item and sq_ass_item just accept numeric indices. However, if the was_sq_slice and was_sq_ass_slice fields of PySequenceMethods are reinstated to accept a generic PyObject, PyMappingMethods will be a true subset. If we look at the code, this becomes even clearer: in a full grep on the Python 3.3 source, there is no object that defines tp_as_mapping but does not also define tp_as_sequence, except Modules/_sqlite/row.c [I'm not familiar enough with the _sqlite module, but there's a chance it would make sense for the Row to be a sequence too].
Eli

On Sun, Mar 4, 2012 at 12:24 PM, Thomas Wouters <thomas@python.org> wrote:
(Let's not forget about tp_as_sequence.sq_concat, tp_as_number.nb_add, tp_as_sequence.sq_repeat and tp_as_number.nb_mul either.)
Indeed, let's not forget about those, which are a compatibility problem in and of themselves: http://bugs.python.org/issue11477
At most, the tp_as_mapping and tp_as_sequence overlap should be an FAQ entry in the devguide that says "yes, the implementation of this is weird. It's like that for historical reasons, and fixing it is a long way down the priority list for changes".
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Thomas Wouters wrote:
I imagine the original motivation was to provide a fast path for types that take ints as indexes. Also, it dates from the very beginnings of Python, before it had user defined classes. At that time the archetypal sequence (list) and the archetypal mapping (dict) were very distinct -- I don't think dicts supported 'in' then, so there was no overlap. It looks like a case of "it seemed like a good idea at the time". The distinction broke down fairly soon after, but it's so embedded in the extension module API that it's been very hard to get rid of. -- Greg

participants (7)
- Antoine Pitrou
- Eli Bendersky
- Greg Ewing
- Guido van Rossum
- Nick Coghlan
- Stefan Behnel
- Thomas Wouters