[Python-3000] Removing simple slicing

Thomas Wouters thomas at python.org
Tue Aug 28 17:42:26 CEST 2007


I updated the patches destined for the trunk (slice-object support for all
objects that supported simple slicing, and actual extended slicing support
for most of them) and checked them in. Next stop is cleaning up the actual
slice-removal bits. I do have two remaining issues: what do we do about
PyMapping_Check(), and should I make the post an actual PEP? I'm thinking
PyMapping_Check() should be removed, or made to look at
tp_as_sequence->sq_item instead of tp_as_sequence->sq_slice and deprecated.
I also think this change should be documented in an actual PEP, as it'll be
a pretty big change for any C extension implementing simple slices but not
slice-object support.

On 8/24/07, Thomas Wouters <thomas at python.org> wrote:
>
>
> I did some work at last year's Google sprint on removing the simple
> slicing API (__getslice__, tp_as_sequence->sq_slice) in favour of the more
> flexible sliceobject API (__getitem__ and tp_as_mapping->mp_subscript using
> slice objects as index.) For some more detail, see the semi-PEP below. (I
> hesitate to call it a PEP because it's way past the Py3k PEP deadline, but
> the email I was originally going to send on this subject grew in such a size
> that I figured I might as well use PEP layout and use the opportunity to
> record some best practices and behaviour. And the change should probably be
> recorded in a PEP anyway, even though it has never been formally proposed,
> just taken as a given.)
>
> If anyone is bored and/or interested in doing some complicated work, there
> is still a bit of (optional) work to be done in this area: I uploaded
> patches to be applied to the trunk SF 8 months ago -- extended slicing
> support for a bunch of types. Some of that extended slicing support is
> limited to step-1 slices, though, most notably UserString.MutableStringand ctypes. I can guarantee adding non-step-1 support to them is a
> challenging and fulfilling exercise, having done it for several types, but I
> can't muster the intellectual stamina to do it for these (to me) fringe
> types. The patches can be found in Roundup: http://bugs.python.org/issue?%40search_text=&title=&%40columns=title&id=&%40columns=id&creation=&creator=twouters&activity=&%40columns=activity&%40sort=activity&actor=&type=&components=&versions=&severity=&dependencies=&assignee=&keywords=&priority=&%40group=priority&status=1&%40columns=status&resolution=&%40pagesize=50&%40startwith=0&%40action=search
> (there doesn't seem to be a shorter URL; just search for issues created by
> 'twouters' instead.)
>
> If nobody cares, I will be checking these patches into the trunk this
> weekend (after updating them), and then update and check in the rest of the
> p3yk-noslice branch into the py3k branch.
>
> Abstract
> ========
>
> This proposal discusses getting rid of the two types of slicing Python
> uses,
> ``simple`` and ``extended``. Extended slicing was added later, and uses a
> different API at both the C and the Python level for backward
> compatibility.
> Extended slicing can express everything simple slicing can express,
> however, making the simple slicing API practically redundant.
>
> A Tale of Two APIs
> ==================
>
> Simple slicing is a slice operation without a step, Ellipsis or tuple of
> slices -- the archetypical slice of just `start` and/or `stop`, with a
> single colon separating them and both sides being optional::
>
>     L[1:3]
>     L[2:]
>     L[:-5]
>     L[:]
>
> An extended slice is any slice that isn't simple::
>
>     L[1:5:2]
>     L[1:3, 8:10]
>     L[1, ..., 5:-2]
>     L[1:3:]
>
> (Note that the presence of an extra colon in the last example makes the
> very
> first simple slice an extended slice, but otherwise expresses the exact
> same
> slicing operation.)
>
> In applying a simple slice, Python does the work of translating omitted,
> out
> of bounds or negative indices into the appropriate actual indices, based
> on
> the length of the sequence. The normalized ``start`` and ``stop`` indices
> are then passed to the appropriate method: ``__getslice__``,
> ``__setslice__`` or ``__delslice__`` for Python classes,
> ``tp_as_sequence``'s ``sq_slice`` or ``sq_ass_slice`` for C types.
>
> For extended slicing, no special handling of slice indices is done. The
> indices in ``start:stop:step`` are wrapped in a ``slice`` object, with
> missing indices represented as None. The indices are otherwise taken
> as-is.
> The sequence object is then indexed with the slice object as if it were a
> mapping: ``__getitem__``,`` __setitem__`` or ``__delitem__`` for Python
> classes, ``tp_as_mapping``'s ``mp_subscript`` or ``mp_ass_subscript``.
> It is entirely up to the sequence to interpret the meaning of missing, out
>
> of bounds or negative indices, let alone non-numerical indices like tuples
> or Ellipsis or arbitrary objects.
>
> Since at least Python 2.1, applying a simple slice to an object that does
> not
> implement the simple slicing API will fall back to using extended slicing,
>
> calling __getitem__ (or mp_subscript) instead of __getslice__ (or
> sq_slice),
> and similarly for slice assignment/deletion.
>
> Problems
> ========
>
> Aside from the obvious disadvantage of having two ways to do the same
> thing,
> simple slicing is an inconvenient wart for several reasons:
>
>  1) It (passively) promotes supporting only simple slicing, as observed by
>     the builtin types only supporting extended slicing many years after
>     extended slicing was introduced.
>
>  2) The Python VM dedicates 12 of its opcodes, about 11%, to support
>     simple slicing, and effectively reserves another 13 for code
>     convenience. Reducing the Big Switch in the bytecode interpreter
>     would certainly not hurt Python performance.
>
>  5) The same goes for the number of functions, macros and
> function-pointers
>     supporting simple slicing, although the impact would be
> maintainability
>     and readability of the source rather than performance.
>
> Proposed Solution
> =================
>
> The proposed solution, as implemented in the p3yk-noslice SVN branch, gets
> rid of the simple slicing methods and PyType entries. The simple C API
> (using ``Py_ssize_t`` for start and stop) remains, but creates a slice
> object as necessary instead. Various types had to be updated to support
> slice objects, or improve the simple slicing case of extended slicing.
>
> The result is that ``__getslice__``, ``__setslice__`` and ``__delslice__``
> are no longer
> called in any situation. Classes that delegate ``__getitem__`` (or the C
> equivalent) to a sequence type get any slicing behaviour of that type for
> free. Classes that implement their own slicing will have to be modified to
>
> accept slice objects and process the indices themselves. This means that
> at
> the C level, like is already the case at the Python level, the same method
> is used for mapping-like access as for slicing. C types will still want to
>
> implement ``tp_as_sequence->sq_item``, but that function will only be
> called
> when using the ``PySequence_*Item()`` API. Those API functions do not
> (yet) fall
> back to using ``tp_as_mapping->mp_subscript``, although they possibly
> should.
>
> A casualty of this change is ``PyMapping_Check()``. It used to check for
> ``tp_as_mapping`` being available, and was modified to check for
> ``tp_as_mapping`` but *not* ``tp_as_sequence->sq_slice`` when extended
> slicing was added to the builtin types. It could conceivably check for
> ``tp_as_sequence->sq_item`` instead of ``sq_slice``, but the added value
> is
> unclear (especially considering ABCs.) In the standard library and CPython
>
> itself, ``PyMapping_Check()`` is used mostly to provide early errors, for
> instance by checking the arguments to ``exec()``.
>
> Alternate Solution
> ------------------
>
> A possible alternative to removing simple slicing completely, would be to
> introduce a new typestruct hook, with the same signature as
> ``tp_as_mapping->mp_subscript``, which would be called for slicing
> operations. All as-mapping index operations would have to fall back to
> this
> new ``sq_extended_slice`` hook, in order for ``seq[slice(...)]`` to work
> as
> expected. For some added efficiency and error-checking, expressions using
> actual slice syntax could compile into bytecodes specific for slicing (of
> which there would only be three, instead of twelve.) This approach would
> simplify C types wanting to support extended slicing but not
> arbitrary-object indexing (and vice-versa) somewhat, but the benefit seems
> too small to warrant the added complexity in the CPython runtime itself.
>
>
> Implementing Extended Slicing
> =============================
>
> Supporting extended slicing in C types is not as easily done as supporting
> simple slicing. There are a number of edgecases in interpreting the odder
> combinations of ``start``, ``stop`` and ``step``. This section tries to
> give
> some explanations and best practices.
>
> Extended Slicing in C
> ---------------------
>
> Because the mapping API takes precedence over the sequence API, any
> ``tp_as_mapping->mp_subscript`` and ``tp_as_mapping->mp_ass_subscript``
> functions need to proper typechecks on their argument. In Python 2.5 and
> later, this is best done using ``PyIndex_Check()`` and ``PySlice_Check()``
>
> (and possibly ``PyTuple_Check()`` and comparison against ``Py_Ellipsis``.)
> For compatibility with Python 2.4 and earlier, ``PyIndex_Check()`` would
> have to be replaced with ``PyInt_Check()`` and ``PyLong_Check()``.
>
> Indices that pass ``PyIndex_Check()`` should be converted to a
> ``Py_ssize_t`` using ``PyIndex_AsSsizeT()`` and delegated to
> ``tp_as_sequence->sq_item``. (For compatibility with Python 2.4, use
> ``PyNumber_AsLong()`` and downcast to an ``int`` instead.)
>
> The exact meaning of tuples of slices, and of Ellipsis, is up to the type,
> as no standard-library types support it. It may be useful to use the same
> convention as the Numpy package. Slices inside tuples, if supported,
> should
> probably follow the same rules as direct slices.
>
> From slice objects, correct indices can be extracted with
> ``PySlice_GetIndicesEx()``. Negative and out-of-bounds indices will be
> adjusted based on the provided length, but a negative ``step``, and a
> ``stop`` before a ``step`` are kept as-is. This means that, for a getslice
> operation, a simple for-loop can be used to visit the correct items in the
> correct order::
>
>     for (cur = start, i = 0; i < slicelength; cur += step, i++)
>         dest[i] = src[cur];
>
>
> If ``PySlice_GetIndicesEx()`` is not appropriate, the individual indices
> can
> be extracted from the ``PySlice`` object. If the indices are to be
> converted
> to C types, that should be done using ``PyIndex_Check()``,
> ``PyIndex_AsSsizeT()`` and the ``Py_ssize_t`` type, except that ``None``
> should be accepted as the default value for the index.
>
> For deleting slices (``mp_ass_subscript`` called with ``NULL`` as
> value) where the order does not matter, a reverse slice can be turned into
>
> the equivalent forward slice with::
>
>     if (step < 0) {
>         stop = start + 1;
>         start = stop + step*(slicelength - 1) - 1;
>         step = -step;
>     }
>
>
> For slice assignment with a ``step`` other than 1, it's usually necessary
> to
> require the source iterable to have the same length as the slice. When
> assigning to a slice of length 0, care needs to be taken to select the
> right
> insertion point. For a slice S[5:2], the correct insertion point is before
>
> index 5, not before index 2.
>
> For both deleting slice and slice assignment, it is important to remember
> arbitrary Python code may be executed when calling Py_DECREF() or
> otherwise
> interacting with arbitrary objects. Because of that, it's important your
> datatype stays consistent throughout the operation. Either operate on a
> copy
> of your datatype, or delay (for instance) Py_DECREF() calls until the
> datatype is updated. The latter is usually done by keeping a scratchpad of
>
> to-be-DECREF'ed items.
>
> Extended slicing in Python
> --------------------------
>
> The simplest way to support extended slicing in Python is by delegating to
> an underlying type that already supports extended slicing. The class can
> simply index the underlying type with the slice object (or tuple) it was
> indexed with.
>
> Barring that, the Python code will have to pretty much apply
> the same logic as the C type. ``PyIndex_AsSsizeT()`` is available as
> ``operator.index()``, with a ``try/except`` block replacing
> ``PyIndex_Check()``. ``isinstance(o, slice)`` and ``sliceobj.indices()``
> replace ``PySlice_Check()`` and ``PySlice_GetIndices()``, but the
> slicelength
> (which is provided by ``PySlice_GetIndicesEx()``) has to be calculated
> manually.
>
> Testing extended slicing
> ------------------------
>
> Proper tests of extended slicing capabilities should at least include the
> following (if the operations are supported), assuming a sequence of
> length 10. Triple-colon notation is used everywhere so it uses extended
> slicing even in Python 2.5 and earlier::
>
>    S[2:5:] (same as S[2:5])
>    S[5:2:] (same as S[5:2], an empty slice)
>    S[::] (same as S[:], a copy of the sequence)
>    S[:2:] (same as S[:2])
>    S[:11:] (same as S[:11], a copy of the sequence)
>    S[5::] (same as S[5:])
>    S[-11::] (same as S[-11:], a copy of the sequence)
>    S[-5:2:1] (same as S[:2])
>    S[-5:-2:2] (same as S[-5:-2], an empty slice)
>    S[5:2:-1] (the reverse of S[2:4])
>    S[-2:-5:-1] (the reverse of S[-4:-1])
>
>    S[:5:2] ([ S[0], S[2], S[4] ]))
>    S[9::2] ([ S[9] ])
>    S[8::2] ([ S[8] ])
>    S[7::2] ([ S[7], S[9]])
>    S[1::-1] ([ S[1], S[0] ])
>    S[1:0:-1] ([ S[1] ], does not include S[0]!)
>    S[1:-1:-1] (an empty slice)
>    S[::10] ([ S[0] ])
>    S[::-10] ([ S[9] ])
>
>    S[2:5:] = [1, 2, 3] ([ S[2], S[3], S[4] ] become [1, 2, 3])
>    S[2:5:] = [1] (S[2] becomes 1, S[3] and S[4] are deleted)
>    S[5:2:] = [1, 2, 3] ([1, 2, 3] inserted before S[5])
>    S[2:5:2] = [1, 2] ([ S[2], S[4] ] become [1, 2])
>    S[5:2:-2] = [1, 2] ([ S[3], S[5] ] become [2, 1])
>    S[3::3] = [1, 2, 3] ([ S[3], S[6], S[9] ] become [1, 2, 3])
>    S[:-5:-2] = [1, 2] ([ S[7], S[9] ] become [2, 1])
>
>    S[::-1] = S (reverse S in-place awkwardly)
>    S[:5:] = S (replaces S[:5] with a copy of S)
>
>    S[2:5:2] = [1, 2, 3] (error: assigning length-3 to slicelength-2)
>    S[2:5:2] = None (error: need iterable)
>
>
>


-- 
Thomas Wouters <thomas at python.org>

Hi! I'm a .signature virus! copy me into your .signature file to help me
spread!
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20070828/1d5ece79/attachment-0001.htm 


More information about the Python-3000 mailing list