[Python-ideas] Where did we go wrong with negative stride?
Steven D'Aprano
steve at pearwood.info
Mon Oct 28 03:20:47 CET 2013
On Sun, Oct 27, 2013 at 01:38:05PM -0500, Tim Peters wrote:
> I may have a different slant on this. I've found that - by far - the
> most successful way to "teach slices" to newcomers is to invite them
> to view indices as being _between_ sequence elements.
>
> <position 0> <element> <position 1> <element> <position 2> <element>
> <position 3> ...
>
> Then i:j selects the elements between position i and position j.
>
> >>> "abcde"[2:4]
> 'cd'
I really like that view point, but it has a major problem. As
beautifully elegant as the "cut between positions" model is for
stride=1, it doesn't extend to non-unit strides. You cannot think about
non-contiguous slices in terms of a pair of cuts at position <start> and
<end>. I believe that the cleanest way to understand non-contiguous
slices with stride > 1 is to think of array indices. That same model
works for the negative stride case too.
Further details below.
> But for negative strides this is all screwed up ;-)
>
> >>> "abcde"[4:2:-1]
> 'ed'
>
> They're not getting the elements between "positions" 2 and 4 then,
> they're getting the elements between positions 3 and 5.
As I suggested above, the "between elements" model doesn't work for
non-unit strides. Consider this example:
py> s = "abcdefghi"
py> s[1:8:2]
'bdfh'
Here are the labelled between-element positions, best viewed with a
monospaced font:
|a|b|c|d|e|f|g|h|i|
0 1 2 3 4 5 6 7 8 9
Since the slice here is non-contiguous, we don't have a single pair of
cuts, but a series of them:
s[1:8:2] => s[1:2:1] + s[3:4:1] + s[5:6:1] + s[7:8:1]
=> 'bdfh'
that is, start at the <start> position and make a thin (one element)
slice, advance forward by step and repeat until you reach the <end>
position. But that's just a longer way of writing this:
s[1:8:2] => s[1] + s[3] + s[5] + s[7]
=> 'bdfh'
which I maintain is a cleaner way to think about non-unit step-sizes.
It's certainly *shorter* to think of indexing rather than repeated thin
slices, and it avoids the mistake (which I originally made) of thinking
that each subslice has to be <stride> wide.
# Not this!
s[1:8:2] => s[1:3] + s[3:5] + s[5:7] + s[7:9]
=> 'bcdefghi'
# Or this!
s[1:8:2] => s[1:3] + s[4:6] + s[7:9]
'bcefhi'
So I think that the cleanest way of thinking about *positive* non-unit
strides is terms of array indexing. *Negative* non-unit strides,
including -1, are no different. First, here's an example with negative
positions:
py> s[-1:-8:-2]
'igec'
which is just like
s[-1:-8:-2] => s[-1] + s[-3] + s[-5] + s[-7]
=> 'igec'
which is precisely the same as the positive step case: start at the
<start>, continue until the end, stepping by <step> each time. If you
insist on the "cut between positions" way of thinking, we can do that
too:
s[-1:-8:-2] => s[-1:-2:-1] + s[-3:-4:-1] + s[-5:-6:-1] + s[-7:-8:-1]
=> 'igec'
10 9 8 7 6 5 4 3 2 1 # all negative positions
|a|b|c|d|e|f|g|h|i|
The slice from -1 to -2 is "i", from -3 to -4 is "g", and so forth,
exactly as for positive positions, except that negative positions are
one-based instead of zero-based.
(If ints could distinguish -0 from 0, we could fix that.)
Here's an example like the one that Tim described as "screwed up", with
positive start and end positions and -1 stride:
py> s[6:2:-1]
'gfed'
This is not "take the slice from [2:6] and reverse it", which would give
'fedc'. That doesn't work, because slices are always closed at <start>
and open at <end>, no matter which direction you go:
* [2:6:1] is closed at 2 (the start), open at 6 (the end)
* [6:2:-1] is closed at 6 (the start), open at 2 (the end)
If you are expecting differently, then (I believe) you are expecting
that slices are closed on the *left* (lowest number), open on the
*right* (highest number). But that's not what slices do. (Whether they
*should* do it is another story.)
However, the array index viewpoint works just fine here too:
s[6:2:-1] => s[6] + s[5] + s[4] + s[3]
Start at index 6, step down by -1 each time, stop at 2.
As elegant as "cut between elements" is for the common case where the
stride is 1, it doesn't work for stride != 1. I'm not concerned about
the model breaking down for negative strides when it breaks down for
positive non-unit strides too :-)
> Why?
> "Because that's how it works" - they have to switch from thinking
> about positions to thinking about array indexing.
But you already have to do this as soon as you allow non-unit strides!
You even do it yourself, below:
"the slice indices selected will be..."
so you can't get away from indexes even if you try.
> So I would prefer that the i:j in s[i:j:k] _always_ specify the
> positions in play:
>
> If i < 0:
> i += len(s) # same as now
> if j < 0:
> j += len(s) # same as now
> if i >= j:
> the slice is empty! # this is different - the sign of k is irrelevant
> else:
> the slice indices selected will be
> i, i + abs(k), i + 2*abs(k), ...
> up to but not including j
> if k is negative, this index sequence will be taken in reverse order
In other words, you want negative strides to just mean "reverse the
slice". Perhaps that would have been a good design. But we already have
two good idioms for reversing slices:
reversed(seq[start:stop:step])
seq[start:stop:step][::-1]
> Then "abcde"[4:2:-1} would be "", while "abcde"[2:4:-1] would be "dc",
> the reverse of "abcde"[2:4]. And s[0:len(s):-1] would be the same as
> reversed(s).
>
> So it's always a semi-open range, inclusive "at the left" and
> exclusive "at the right". But that's more a detail:
It isn't a mere detail, it is the core of the change: changing from
inclusive at the start to inclusive on the left, which are not the same
thing. This is a significant semantic change.
(Of course it is. You don't like the current semantics, since they trick
you into off-by-one errors for negative strides. If the change was
insignificant, it wouldn't help.)
One consequence of this proposed change is that the <start> parameter is
no longer always the first element returned. Sometimes <start> will be
last rather than first. That disturbs me.
> the _point_ is
> to preserve the mental model of selecting the elements "between
> position". Of course I'd change range() similarly.
Currently, this is how you use range to count down from 10 to 1:
range(10, 0, -1) # 0 is excluded
To me, this makes perfect sense: I want to start counting at 10, so the
first argument I give is 10 no matter whether I'm counting up or
counting down.
With your suggestion, we'd have:
range(1, 11, -1) # 11 is excluded
So here I have to put one more than the number I want to start with as
the *second* argument, and the last number first, just because I'm
counting down. I don't consider that an improvement. Certainly not an
improvement worth breaking backwards compatibility for.
> > Are we stuck with this forever?
>
> Probably :-(
Assuming we want to change -- and I'm not convinced we should -- there's
always Python 4000, or if necessary
from __future__ import negative_slices_reverse
--
Steven
More information about the Python-ideas
mailing list