[Python-3000] Making more effective use of slice objects in Py3k

Mon Aug 28 13:14:14 CEST 2006

Nick Coghlan wrote:
> This idea is inspired by the find/rfind string discussion (particularly a 
> couple of comments from Jim and Ron), but I think the applicability may prove 
> to be wider than just string methods (e.g. I suspect it may prove useful for 
> the bytes() type as well).

If I'm following the ideas here, they are based (only in part) on my 
suggestion.  It's not a major feature request, but a combination of 
several small changes, each of which may have some benefit of its own. 
The proposal is mostly about cleaning things up so that they can (if 
one desires) be made to work together more easily.  But that needn't be 
the main reason for doing it.

I also recognize that Python has many very specific functions and 
modules, many of which are highly optimized.  Most of the major problems 
have already been solved in that way, so it is really hard to find 
things that make a big difference.  But I don't think that means we 
shouldn't work on making small improvements to things where they are 
possible, even if it's only to make it a bit easier to remember and/or 
learn.

> I think an enriched slicing model that allows sequence views to be expressed 
> easily as "this slice of this sequence" would allow this to be dealt with 
> cleanly, without requiring every sequence to provide a corresponding "sequence 
> view" with non-copying semantics. I think Guido's concern that people will 
> reach for string views when they don't need them is also valid (as I believe 
> that it is most often inexperience that leads to premature optimization that 
> then leads to needless code complexity).

I agree with both of these, but maybe we should concentrate on the 
individual changes and not a big picture to justify a group of changes. 
The individual changes or enhancements need to stand on their own.

So in that light, the following individual *separate* items are what I 
would focus on for now. (Not string views or slice partition functions. 
Let those come later if they prove useful.)

> The specific changes I suggest based on the find/rfind discussion are:
> 
>    1. make range() (what used to be xrange()) a subclass of slice(), so that 
> range objects can be used to index sequences. The only differences between 
> range() and slice() would then be that start/stop/step will never be None for 
> range instances, and range instances act like an immutable sequence while 
> slice instances do not (i.e. range objects would grow an indices() method).

1. Remove None stored as indices in slice objects. Depending on the step 
value, any Nones can be converted to 0 or -1 immediately, and the step 
should never be None or zero.

Once the slice is created the Nones are not needed, since valid index 
values can be determined. This moves the checks forward from slice 
object use time to slice object creation time.

If a slice object is reused, then there might be some (micro) 
performance benefits if it is defined outside a loop and then used 
multiple times inside a loop.

Also, the indices can be read and used directly via slice.start, etc., 
without having to check for None or invalid indices, if someone wants 
to do that.
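
For comparison, here is a small sketch of how the Nones get resolved 
today: only at use time, by calling slice.indices() with a length.  It is 
written in Python 3 syntax, and the lengths 20 and 5 and the sample 
strings are arbitrary values picked for the example.

```python
# A slice created with only a stop value stores None for start and step.
s = slice(None, 10, None)
fields = (s.start, s.stop, s.step)
print(fields)               # (None, 10, None)

# indices(length) resolves the Nones for a sequence of that length.
resolved = s.indices(20)
print(resolved)             # (0, 10, 1)

# With a negative step the resolved defaults are different.
reverse = slice(None, None, -1)
print(reverse.indices(5))   # (4, -1, -1)

# The slice can be built once outside a loop and reused inside it.
rows = ["abcdefghijklmnop", "0123456789abcdef"]
heads = [row[s] for row in rows]
print(heads)                # ['abcdefghij', '0123456789']
```

Anyone reading s.start or s.step directly currently has to handle the 
None case themselves, which is the check item 1 would remove.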

>    2. change range() and slice() to accept slice() instances as arguments so 
> that range(range(0)) is equivalent to range(0). (range(x) may throw ValueError 
> if x.stop is None).

2. Enable slices and ranges to be converted back and forth.

This works now.

 >>> xrange(*slice(1,-1,1).indices(10))
xrange(1, 9)

There is no way to get the indices from an xrange object.  They are not 
available via attributes or methods (that I know of), but they can be 
recovered by parsing the __repr__ string.

So this doesn't work.

     slice(*xrange(1,10,1).indices())   # no indices method

While I don't have any real specific use case for this item, it may have 
some educational or introspective value, i.e. something to teach the 
relationship between the two.  An xrange() object can also be defined 
outside a loop and then used multiple times in an inner loop.
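
A rough sketch of going in both directions, written for Python 3, where 
range() is what used to be xrange().  The helper name range_to_slice is 
made up for this example, and it assumes the range holds only 
non-negative indices.  It relies on len() and indexing rather than on 
parsing the repr.

```python
def range_to_slice(r):
    """Hypothetical helper: build a slice that selects the indices in r.

    Assumes every value in r is a non-negative index.
    """
    n = len(r)
    if n == 0:
        return slice(0, 0, 1)
    start = r[0]
    # The step is only observable when there are at least two values.
    step = r[1] - r[0] if n > 1 else 1
    stop = start + n * step
    if stop < 0:
        # A negative stop would be read as "counted from the end",
        # so use None to mean "run all the way down to index 0".
        stop = None
    return slice(start, stop, step)


# slice -> range: this direction already works via indices().
forward = range(*slice(1, -1, 1).indices(10))
print(list(forward))                      # [1, 2, 3, 4, 5, 6, 7, 8]

# range -> slice: uses the hypothetical helper above.
data = list(range(20))
print(data[range_to_slice(range(1, 10, 2))])   # [1, 3, 5, 7, 9]
print(data[range_to_slice(range(5, -1, -1))])  # [5, 4, 3, 2, 1, 0]
```

The special case for a negative stop is itself an instance of the 
wraparound problem in item 5.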

3. Continue to make xrange() and slice() a bit more alike in how they 
work and the values they return, but keep them separate and don't 
subclass range from slice.  Each has a definite, different purpose. 
Although they are related in some ways, I don't think either should try 
to 'be' the other.

The following examples show some inconsistencies in how they work or 
where they could be more alike.  For example, the repr of an xrange 
object and the repr of a slice object differ in which values they show, 
depending on what the indices are.  These are just minor (barely) 
annoyances, and there isn't anything actually wrong, but they could be 
improved a bit I think.

# slice always shows all three values if viewed. (This is ok)
 >>> slice(10)
slice(None, 10, None)    # None stored as indices.
 >>> slice(0, 10, 1)
slice(0, 10, 1)

# - xrange only shows values different from the defaults.
 >>> xrange(10)
xrange(10)
 >>> xrange(1, 10)
xrange(1, 10)
 >>> xrange(0, 10, 1)
xrange(10)              # hides 0 and 1

# - The xrange stop value shown is always start plus a whole
# multiple of the step value.
 >>> xrange(1, 10, 2)
xrange(1, 11, 2)        # 11! why not 10 here?
 >>> xrange(0, 10, 3)
xrange(0, 12, 3)        # and why not 10 here instead of 12?

# slice accepts anything!
 >>> slice(1, 10, 0)         # zero for step
slice(1, 10, 0)
 >>> slice(list, int, dict)
slice(<type 'list'>, <type 'int'>, <type 'dict'>)

# xrange rejects any invalid indices.
 >>> xrange(None, 10, None)           # None not an integer.
Traceback (most recent call last):
   File "<stdin>", line 1, in ?
TypeError: an integer is required

 >>> xrange(1, 10, 0)
Traceback (most recent call last):
   File "<stdin>", line 1, in ?
ValueError: xrange() arg 3 must not be zero
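
One way to picture slice() behaving more like xrange() here is a 
constructor that validates its arguments up front.  This is only a 
sketch in Python 3 syntax; strict_slice is a made-up name, not an 
existing function, and giving step a default of 1 is my own assumption.

```python
import operator


def strict_slice(start, stop, step=1):
    """Hypothetical constructor: validate like xrange, return a slice."""
    for value in (start, stop, step):
        # operator.index() raises TypeError for anything that is not
        # usable as an integer index (None, list, dict, ...).
        operator.index(value)
    if step == 0:
        raise ValueError("strict_slice() step must not be zero")
    return slice(start, stop, step)


print(strict_slice(1, 10, 2))        # slice(1, 10, 2)

for bad_args in [(1, 10, 0), (None, 10, 1), (list, int, dict)]:
    try:
        strict_slice(*bad_args)
    except (TypeError, ValueError) as exc:
        print(type(exc).__name__)    # ValueError, TypeError, TypeError
```

With this, the errors show up when the slice is created rather than 
when it is first used to index something.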

4. Allow slice objects to be subclassed. That will allow for 
experimentation and/or for programmers to modify slice in ways they may 
find useful for their own applications.  Most likely it would be a way 
to group together methods that all use the same start, stop and/or step 
indices.  Could those then be applied all at once via the slice 
operation?
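
Since slice can't be subclassed at the moment (CPython raises TypeError 
if you try), the grouping idea can only be approximated with a wrapper. 
A minimal sketch in Python 3 syntax follows.  The class name Span and 
its methods of() and find() are invented for this example.

```python
try:
    class MySlice(slice):       # not allowed in CPython today
        pass
    subclass_ok = True
except TypeError:
    subclass_ok = False
print(subclass_ok)              # False on CPython


class Span:
    """Hypothetical wrapper grouping methods that share one slice."""

    def __init__(self, start, stop, step=1):
        self._slice = slice(start, stop, step)

    def of(self, seq):
        # Apply the stored slice to any sequence.
        return seq[self._slice]

    def find(self, text, sub):
        # Reuse the same bounds for str.find(); the step is ignored,
        # so this is only meaningful for a step of 1.
        start, stop, _ = self._slice.indices(len(text))
        return text.find(sub, start, stop)


span = Span(2, 8)
print(span.of("abcdefghij"))          # cdefgh
print(span.find("abcdefghij", "e"))   # 4
print(span.find("abcdefghij", "a"))   # -1
```

If subclassing were allowed, something like Span could presumably be 
passed straight to the slice operation instead of needing the of() 
method.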

5. Find a way to avoid slice wraparounds.  These happen when iterating 
past zero in either direction.  Avoiding them usually requires a 
different approach and/or a check to keep from going past the zero/-1 
boundary.

One thought I've had on this is to allow only positive integers along 
with a symbol to indicate an index is to be counted from the far end. 
Then an exception could be raised if a negative index is used.

Possibly something like:
    [i:\j]     # '\' indicates j is to be counted from the far end.

The line continuation backslash could be special-cased for use with 
slices I think.  But some other symbol might be better.
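
To make the wraparound concrete, and to show what the "count from the 
far end" rule might look like without new syntax, here is a small 
sketch in Python 3 syntax.  The function name slice_from_end is made 
up, and clamping an oversized j to an empty result is my own choice.

```python
data = "abcdef"

# Trimming j items from the end with a negative index breaks at j == 0,
# because -0 is just 0 and the slice comes back empty.
trimmed = [data[1:-j] for j in (2, 1, 0)]
print(trimmed)                        # ['bcd', 'bcde', '']


def slice_from_end(seq, i, j):
    """Hypothetical helper: i counts from the front, j from the far end.

    Both must be non-negative; a negative value raises ValueError.
    """
    if i < 0 or j < 0:
        raise ValueError("indices must be non-negative")
    # Clamp so that j > len(seq) cannot wrap around either.
    stop = max(len(seq) - j, 0)
    return seq[i:stop]


fixed = [slice_from_end(data, 1, j) for j in (2, 1, 0)]
print(fixed)                          # ['bcd', 'bcde', 'bcdef']
```

The explicit ValueError plays the role of the exception suggested above 
for negative indices.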

I think this group of separate items taken together will do what the 
title of this thread suggests.  But each of these is a separate item in 
itself as well and has its own reasons why it could be helpful.

Regarding the other items...

The above changes possibly make some (or most) of the other suggestions 
possible and/or easier to implement.  So then a programmer can roll 
their own string views or slice partition functions in a clean way if 
they want to.  That's the point of the "Making more effective use of 
slice objects".  It's not a specific idea, but a generality that may 
come about by doing these other smaller things first.  And doing them 
as a group is probably a good way to address these things.

I hope this clarifies at least my viewpoint, if not Nick's. But I'll 
keep an open mind and see what he has to offer in his PEP.

Cheers,
    Ron