[Python-3000] Removing simple slicing

Thomas Wouters thomas at python.org
Fri Aug 24 16:33:47 CEST 2007


I did some work at last year's Google sprint on removing the simple slicing
API (__getslice__, tp_as_sequence->sq_slice) in favour of the more flexible
sliceobject API (__getitem__ and tp_as_mapping->mp_subscript using slice
objects as index.) For some more detail, see the semi-PEP below. (I hesitate
to call it a PEP because it's way past the Py3k PEP deadline, but the email
I was originally going to send on this subject grew in such a size that I
figured I might as well use PEP layout and use the opportunity to record
some best practices and behaviour. And the change should probably be
recorded in a PEP anyway, even though it has never been formally proposed,
just taken as a given.)

If anyone is bored and/or interested in doing some complicated work, there
is still a bit of (optional) work to be done in this area: I uploaded
patches to be applied to the trunk SF 8 months ago -- extended slicing
support for a bunch of types. Some of that extended slicing support is
limited to step-1 slices, though, most notably UserString.MutableString and
ctypes. I can guarantee adding non-step-1 support to them is a challenging
and fulfilling exercise, having done it for several types, but I can't
muster the intellectual stamina to do it for these (to me) fringe types. The
patches can be found in Roundup:
http://bugs.python.org/issue?%40search_text=&title=&%40columns=title&id=&%40columns=id&creation=&creator=twouters&activity=&%40columns=activity&%40sort=activity&actor=&type=&components=&versions=&severity=&dependencies=&assignee=&keywords=&priority=&%40group=priority&status=1&%40columns=status&resolution=&%40pagesize=50&%40startwith=0&%40action=search(there
doesn't seem to be a shorter URL; just search for issues created by
'twouters' instead.)

If nobody cares, I will be checking these patches into the trunk this
weekend (after updating them), and then update and check in the rest of the
p3yk-noslice branch into the py3k branch.

Abstract
========

This proposal discusses getting rid of the two types of slicing Python uses,
``simple`` and ``extended``. Extended slicing was added later, and uses a
different API at both the C and the Python level for backward compatibility.
Extended slicing can express everything simple slicing can express,
however, making the simple slicing API practically redundant.

A Tale of Two APIs
==================

Simple slicing is a slice operation without a step, Ellipsis or tuple of
slices -- the archetypical slice of just `start` and/or `stop`, with a
single colon separating them and both sides being optional::

    L[1:3]
    L[2:]
    L[:-5]
    L[:]

An extended slice is any slice that isn't simple::

    L[1:5:2]
    L[1:3, 8:10]
    L[1, ..., 5:-2]
    L[1:3:]

(Note that the presence of an extra colon in the last example makes the very
first simple slice an extended slice, but otherwise expresses the exact same
slicing operation.)

In applying a simple slice, Python does the work of translating omitted, out
of bounds or negative indices into the appropriate actual indices, based on
the length of the sequence. The normalized ``start`` and ``stop`` indices
are then passed to the appropriate method: ``__getslice__``,
``__setslice__`` or ``__delslice__`` for Python classes,
``tp_as_sequence``'s ``sq_slice`` or ``sq_ass_slice`` for C types.

For extended slicing, no special handling of slice indices is done. The
indices in ``start:stop:step`` are wrapped in a ``slice`` object, with
missing indices represented as None. The indices are otherwise taken as-is.
The sequence object is then indexed with the slice object as if it were a
mapping: ``__getitem__``,`` __setitem__`` or ``__delitem__`` for Python
classes, ``tp_as_mapping``'s ``mp_subscript`` or ``mp_ass_subscript``.
It is entirely up to the sequence to interpret the meaning of missing, out
of bounds or negative indices, let alone non-numerical indices like tuples
or Ellipsis or arbitrary objects.

Since at least Python 2.1, applying a simple slice to an object that does
not
implement the simple slicing API will fall back to using extended slicing,
calling __getitem__ (or mp_subscript) instead of __getslice__ (or sq_slice),
and similarly for slice assignment/deletion.

Problems
========

Aside from the obvious disadvantage of having two ways to do the same thing,
simple slicing is an inconvenient wart for several reasons:

 1) It (passively) promotes supporting only simple slicing, as observed by
    the builtin types only supporting extended slicing many years after
    extended slicing was introduced.

 2) The Python VM dedicates 12 of its opcodes, about 11%, to support
    simple slicing, and effectively reserves another 13 for code
    convenience. Reducing the Big Switch in the bytecode interpreter
    would certainly not hurt Python performance.

 5) The same goes for the number of functions, macros and function-pointers
    supporting simple slicing, although the impact would be maintainability
    and readability of the source rather than performance.

Proposed Solution
=================

The proposed solution, as implemented in the p3yk-noslice SVN branch, gets
rid of the simple slicing methods and PyType entries. The simple C API
(using ``Py_ssize_t`` for start and stop) remains, but creates a slice
object as necessary instead. Various types had to be updated to support
slice objects, or improve the simple slicing case of extended slicing.

The result is that ``__getslice__``, ``__setslice__`` and ``__delslice__``
are no longer
called in any situation. Classes that delegate ``__getitem__`` (or the C
equivalent) to a sequence type get any slicing behaviour of that type for
free. Classes that implement their own slicing will have to be modified to
accept slice objects and process the indices themselves. This means that at
the C level, like is already the case at the Python level, the same method
is used for mapping-like access as for slicing. C types will still want to
implement ``tp_as_sequence->sq_item``, but that function will only be called
when using the ``PySequence_*Item()`` API. Those API functions do not (yet)
fall
back to using ``tp_as_mapping->mp_subscript``, although they possibly
should.

A casualty of this change is ``PyMapping_Check()``. It used to check for
``tp_as_mapping`` being available, and was modified to check for
``tp_as_mapping`` but *not* ``tp_as_sequence->sq_slice`` when extended
slicing was added to the builtin types. It could conceivably check for
``tp_as_sequence->sq_item`` instead of ``sq_slice``, but the added value is
unclear (especially considering ABCs.) In the standard library and CPython
itself, ``PyMapping_Check()`` is used mostly to provide early errors, for
instance by checking the arguments to ``exec()``.

Alternate Solution
------------------

A possible alternative to removing simple slicing completely, would be to
introduce a new typestruct hook, with the same signature as
``tp_as_mapping->mp_subscript``, which would be called for slicing
operations. All as-mapping index operations would have to fall back to this
new ``sq_extended_slice`` hook, in order for ``seq[slice(...)]`` to work as
expected. For some added efficiency and error-checking, expressions using
actual slice syntax could compile into bytecodes specific for slicing (of
which there would only be three, instead of twelve.) This approach would
simplify C types wanting to support extended slicing but not
arbitrary-object indexing (and vice-versa) somewhat, but the benefit seems
too small to warrant the added complexity in the CPython runtime itself.


Implementing Extended Slicing
=============================

Supporting extended slicing in C types is not as easily done as supporting
simple slicing. There are a number of edgecases in interpreting the odder
combinations of ``start``, ``stop`` and ``step``. This section tries to give
some explanations and best practices.

Extended Slicing in C
---------------------

Because the mapping API takes precedence over the sequence API, any
``tp_as_mapping->mp_subscript`` and ``tp_as_mapping->mp_ass_subscript``
functions need to proper typechecks on their argument. In Python 2.5 and
later, this is best done using ``PyIndex_Check()`` and ``PySlice_Check()``
(and possibly ``PyTuple_Check()`` and comparison against ``Py_Ellipsis``.)
For compatibility with Python 2.4 and earlier, ``PyIndex_Check()`` would
have to be replaced with ``PyInt_Check()`` and ``PyLong_Check()``.

Indices that pass ``PyIndex_Check()`` should be converted to a
``Py_ssize_t`` using ``PyIndex_AsSsizeT()`` and delegated to
``tp_as_sequence->sq_item``. (For compatibility with Python 2.4, use
``PyNumber_AsLong()`` and downcast to an ``int`` instead.)

The exact meaning of tuples of slices, and of Ellipsis, is up to the type,
as no standard-library types support it. It may be useful to use the same
convention as the Numpy package. Slices inside tuples, if supported, should
probably follow the same rules as direct slices.

>From slice objects, correct indices can be extracted with
``PySlice_GetIndicesEx()``. Negative and out-of-bounds indices will be
adjusted based on the provided length, but a negative ``step``, and a
``stop`` before a ``step`` are kept as-is. This means that, for a getslice
operation, a simple for-loop can be used to visit the correct items in the
correct order::

    for (cur = start, i = 0; i < slicelength; cur += step, i++)
        dest[i] = src[cur];


If ``PySlice_GetIndicesEx()`` is not appropriate, the individual indices can
be extracted from the ``PySlice`` object. If the indices are to be converted
to C types, that should be done using ``PyIndex_Check()``,
``PyIndex_AsSsizeT()`` and the ``Py_ssize_t`` type, except that ``None``
should be accepted as the default value for the index.

For deleting slices (``mp_ass_subscript`` called with ``NULL`` as
value) where the order does not matter, a reverse slice can be turned into
the equivalent forward slice with::

    if (step < 0) {
        stop = start + 1;
        start = stop + step*(slicelength - 1) - 1;
        step = -step;
    }


For slice assignment with a ``step`` other than 1, it's usually necessary to
require the source iterable to have the same length as the slice. When
assigning to a slice of length 0, care needs to be taken to select the right
insertion point. For a slice S[5:2], the correct insertion point is before
index 5, not before index 2.

For both deleting slice and slice assignment, it is important to remember
arbitrary Python code may be executed when calling Py_DECREF() or otherwise
interacting with arbitrary objects. Because of that, it's important your
datatype stays consistent throughout the operation. Either operate on a copy
of your datatype, or delay (for instance) Py_DECREF() calls until the
datatype is updated. The latter is usually done by keeping a scratchpad of
to-be-DECREF'ed items.

Extended slicing in Python
--------------------------

The simplest way to support extended slicing in Python is by delegating to
an underlying type that already supports extended slicing. The class can
simply index the underlying type with the slice object (or tuple) it was
indexed with.

Barring that, the Python code will have to pretty much apply
the same logic as the C type. ``PyIndex_AsSsizeT()`` is available as
``operator.index()``, with a ``try/except`` block replacing
``PyIndex_Check()``. ``isinstance(o, slice)`` and ``sliceobj.indices()``
replace ``PySlice_Check()`` and ``PySlice_GetIndices()``, but the
slicelength
(which is provided by ``PySlice_GetIndicesEx()``) has to be calculated
manually.

Testing extended slicing
------------------------

Proper tests of extended slicing capabilities should at least include the
following (if the operations are supported), assuming a sequence of
length 10. Triple-colon notation is used everywhere so it uses extended
slicing even in Python 2.5 and earlier::

   S[2:5:] (same as S[2:5])
   S[5:2:] (same as S[5:2], an empty slice)
   S[::] (same as S[:], a copy of the sequence)
   S[:2:] (same as S[:2])
   S[:11:] (same as S[:11], a copy of the sequence)
   S[5::] (same as S[5:])
   S[-11::] (same as S[-11:], a copy of the sequence)
   S[-5:2:1] (same as S[:2])
   S[-5:-2:2] (same as S[-5:-2], an empty slice)
   S[5:2:-1] (the reverse of S[2:4])
   S[-2:-5:-1] (the reverse of S[-4:-1])

   S[:5:2] ([ S[0], S[2], S[4] ]))
   S[9::2] ([ S[9] ])
   S[8::2] ([ S[8] ])
   S[7::2] ([ S[7], S[9]])
   S[1::-1] ([ S[1], S[0] ])
   S[1:0:-1] ([ S[1] ], does not include S[0]!)
   S[1:-1:-1] (an empty slice)
   S[::10] ([ S[0] ])
   S[::-10] ([ S[9] ])

   S[2:5:] = [1, 2, 3] ([ S[2], S[3], S[4] ] become [1, 2, 3])
   S[2:5:] = [1] (S[2] becomes 1, S[3] and S[4] are deleted)
   S[5:2:] = [1, 2, 3] ([1, 2, 3] inserted before S[5])
   S[2:5:2] = [1, 2] ([ S[2], S[4] ] become [1, 2])
   S[5:2:-2] = [1, 2] ([ S[3], S[5] ] become [2, 1])
   S[3::3] = [1, 2, 3] ([ S[3], S[6], S[9] ] become [1, 2, 3])
   S[:-5:-2] = [1, 2] ([ S[7], S[9] ] become [2, 1])

   S[::-1] = S (reverse S in-place awkwardly)
   S[:5:] = S (replaces S[:5] with a copy of S)

   S[2:5:2] = [1, 2, 3] (error: assigning length-3 to slicelength-2)
   S[2:5:2] = None (error: need iterable)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20070824/779e59cb/attachment-0001.htm 


More information about the Python-3000 mailing list