[Python-3000] Making more effective use of slice objects in Py3k
Ron Adam
rrr at ronadam.com
Mon Aug 28 13:14:14 CEST 2006
Nick Coghlan wrote:
> This idea is inspired by the find/rfind string discussion (particularly a
> couple of comments from Jim and Ron), but I think the applicability may prove
> to be wider than just string methods (e.g. I suspect it may prove useful for
> the bytes() type as well).
If I'm following the ideas here, they are based (only in part) on my
suggestion. It's not a major feature request, but instead a combination
of various small changes, each of which may have some benefits of its
own. The proposal is more in line with cleaning things up so that they
can (if one desires) be made to work together more easily. But that
needn't be the main reason for doing it.
I also recognize that python has many very specific functions and
modules, many of which are highly optimized. Most of the major problems
have already been solved in that way, so it is really hard to find
things that make a big difference. But I don't think that means we
shouldn't work on making small improvements to things where they are
possible, even if it's only to make it a bit easier to remember and/or
learn.
> I think an enriched slicing model that allows sequence views to be expressed
> easily as "this slice of this sequence" would allow this to be dealt with
> cleanly, without requiring every sequence to provide a corresponding "sequence
> view" with non-copying semantics. I think Guido's concern that people will
> reach for string views when they don't need them is also valid (as I believe
> that it is most often inexperience that leads to premature optimization that
> then leads to needless code complexity).
I agree with both of these, but maybe we should concentrate on the
individual changes and not a big picture to justify a group of changes.
The individual changes or enhancements need to stand on their own.
So in that light, the following individual *separate* items are what I
would focus on for now. (Not string views or slice partition functions.
Let those come later if they prove useful.)
> The specific changes I suggest based on the find/rfind discussion are:
>
> 1. make range() (what used to be xrange()) a subclass of slice(), so that
> range objects can be used to index sequences. The only differences between
> range() and slice() would then be that start/stop/step will never be None for
> range instances, and range instances act like an immutable sequence while
> slice instances do not (i.e. range objects would grow an indices() method).
1. Remove None stored as indices in slice objects. Depending on the step
value, any Nones can be converted to 0 or -1 immediately; the step
should never be None or zero.
Once the slice is created the Nones are not needed, since valid index
values can be determined. This moves the checks forward from slice
object use time to slice object creation time.
If a slice object is reused, then there might be some (micro)
performance benefits if it is defined outside a loop and then used
multiple times inside a loop.
Also the indices can be read and used directly via slice.start, etc.,
without having to check for None or invalid indexes, if someone wants to
do that.
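As a rough illustration of what "resolving the Nones" looks like with the existing API, here is a minimal sketch using slice.indices(), which needs the sequence length to do the job. It is written for Python 3; the names data, head, rows and heads are made up for the example.

```python
data = list(range(10))

# slice.indices(length) fills in None and clamps out-of-range values,
# returning a concrete (start, stop, step) tuple for that length.
print(slice(None).indices(len(data)))            # (0, 10, 1)
print(slice(None, None, -1).indices(len(data)))  # (9, -1, -1)
print(slice(2, 100).indices(len(data)))          # (2, 10, 1)

# A slice object defined once outside a loop and reused inside it.
head = slice(0, 3)
rows = [[1, 2, 3, 4], [5, 6, 7, 8]]
heads = [row[head] for row in rows]
print(heads)  # [[1, 2, 3], [5, 6, 7]]
```

Note that the resolved values depend on the length, so this sketch does not show how a length-independent conversion at creation time would work.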
> 2. change range() and slice() to accept slice() instances as arguments so
> that range(range(0)) is equivalent to range(0). (range(x) may throw ValueError
> if x.stop is None).
2. Enable slices and ranges to be converted back and forth.
This works now.
>>> xrange(*slice(1,-1,1).indices(10))
xrange(1, 9)
There is no way to get the indices from an xrange object. They are not
available via attributes or methods, (that I know of), but they can be
gotten by parsing the __repr__ string.
So this doesn't work.
slice(*xrange(1,10,1).indices()) # no indices method
While I don't have any real specific use case for this item, it may have
some educational or introspective value, i.e. something to teach the
relationships of each. An xrange() object can also be defined outside a
loop and then used multiple times in an inner loop.
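For comparison, a small sketch of the round trip in later Python 3 releases, where range objects expose start, stop and step attributes (xrange in the session above did not). The helper names slice_to_range and range_to_slice are hypothetical, and the range-to-slice direction is only assumed to be safe when the bounds are non-negative.

```python
def slice_to_range(s, length):
    """Resolve a slice against a sequence length and return a range."""
    return range(*s.indices(length))


def range_to_slice(r):
    """Build a slice from a range's attributes.

    Assumption: r.start and r.stop are non-negative; a negative stop
    such as the one in range(9, -1, -1) would be read by slicing as
    "count from the end".
    """
    return slice(r.start, r.stop, r.step)


data = list(range(10))
r = slice_to_range(slice(1, -1, 1), len(data))
s = range_to_slice(r)
print(r)  # range(1, 9)
print(s)  # slice(1, 9, 1)

# Both forms select the same items.
print(data[s] == [data[i] for i in r])  # True
```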
3. Continue to make xrange() and slice() a bit more alike in how they
work and the values they return, but keep them separate and don't
subclass range from slice. Each has a definitely different purpose;
although they are related in some ways, they shouldn't try to 'be' the
other, I think.
The following examples show some inconsistencies in how they work or
where they could be more alike. For example, viewing an xrange object
vs. a slice object returns differing representations depending on what
the values of the indices are. These are just minor (barely) annoyances,
and there isn't anything actually wrong, but they could be improved a
bit I think.
# slice always shows all three values if viewed. (This is ok)
>>> slice(10)
slice(None, 10, None) # None stored as indices.
>>> slice(0, 10, 1)
slice(0, 10, 1)
# - xrange only shows values different from the defaults.
>>> xrange(10)
xrange(10)
>>> xrange(1, 10)
xrange(1, 10)
>>> xrange(0, 10, 1)
xrange(10) # hides 0 and 1
# - The displayed xrange stop value is always start plus a whole
# multiple of the step value.
>>> xrange(1, 10, 2)
xrange(1, 11, 2) # 11! why not 10 here?
>>> xrange(0, 10, 3)
xrange(0, 12, 3) # and 10 here instead of 12?
# slice accepts anything!
>>> slice(1, 10, 0) # zero for step
slice(1, 10, 0)
>>> slice(list, int, dict)
slice(<type 'list'>, <type 'int'>, <type 'dict'>)
# xrange rejects any invalid indexes.
>>> xrange(None, 10, None) # None not an integer.
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: an integer is required
>>> xrange(1, 10, 0)
Traceback (most recent call last):
File "<stdin>", line 1, in ?
ValueError: xrange() arg 3 must not be zero
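The adjusted stop values in the xrange examples above (11 and 12) are consistent with stop being shown as start plus the item count times step. The following sketch reproduces those numbers in Python 3; the function name normalized_stop is made up, and the formula is inferred from the displayed output rather than taken from the xrange source.

```python
def normalized_stop(start, stop, step):
    """Return start + n * step, where n is the number of items produced.

    Assumption: this mirrors the stop value shown in the xrange reprs
    quoted above; it is inferred from those outputs only.
    """
    n = len(range(start, stop, step))
    return start + n * step


print(normalized_stop(1, 10, 2))  # 11
print(normalized_stop(0, 10, 3))  # 12
```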
4. Allow slice objects to be sub-classed. That will allow for
experimentation and/or for programmers to modify slice in ways they may
find useful for their "own" applications. Most likely it would be a way
to group methods together that all use the same start, stop and/or step
indices. Could it then be possible to apply those all at once via the
slice operation?
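Until slice can be sub-classed, one way to experiment with the same idea is composition. This is only a sketch in Python 3: SliceSpec and its methods take and count are hypothetical names, not an existing API, and the try block just records whatever the interpreter says when sub-classing is attempted.

```python
subclass_error = None
try:
    class MySlice(slice):  # CPython has historically rejected this
        pass
except TypeError as exc:
    subclass_error = str(exc)


class SliceSpec:
    """Hypothetical wrapper grouping methods that share one slice."""

    def __init__(self, start=None, stop=None, step=None):
        self.slice = slice(start, stop, step)

    def take(self, seq):
        """Return the items of seq selected by the stored slice."""
        return seq[self.slice]

    def count(self, seq):
        """Return how many items of seq the stored slice selects."""
        return len(range(*self.slice.indices(len(seq))))


spec = SliceSpec(1, 4)
print(spec.take("abcdef"))   # bcd
print(spec.count("abcdef"))  # 3
```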
5. Find a way to avoid slice wraparounds. These happen when iterating
past zero in either direction. It usually requires a different approach
and/or check to avoid going past the zero/-1 boundary.
One thought I've had on this is to allow only positive integers along
with a symbol to indicate an index is to be counted from the far end.
Then an exception could be raised if a negative index is used.
Possibly something like:
[i:\j] # '\' indicates j is to be counted from the far end.
The line continuation back slash could be special cased for use with
slices I think. But some other symbol might be better.
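To make the wraparound problem concrete, here is a short Python 3 sketch. The helpers window, safe_window and slice_from_end are hypothetical; slice_from_end is one possible reading of the [i:\j] idea, assuming j counts items back from the end and that negative values should raise an exception.

```python
data = list("abcdef")


def window(seq, i, before=1, after=1):
    """Naive neighbourhood of index i; wraps when i - before < 0."""
    return seq[i - before:i + after + 1]


def safe_window(seq, i, before=1, after=1):
    """Same as window(), but clamps the start at zero."""
    start = max(i - before, 0)
    return seq[start:i + after + 1]


def slice_from_end(seq, i, j):
    """Items from index i up to j items before the end (assumed meaning)."""
    if i < 0 or j < 0:
        raise ValueError("negative indexes are not allowed")
    return seq[i:len(seq) - j]


print(window(data, 2))       # ['b', 'c', 'd']
print(window(data, 0))       # []  (start became -1, i.e. the last item)
print(safe_window(data, 0))  # ['a', 'b']
print(slice_from_end(data, 1, 1))  # ['b', 'c', 'd', 'e']
```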
I think this group of separate items taken together will do what the
title in this thread suggests. But each of these is a separate item in
itself as well and has its own reasons why it could be helpful.
Regarding the other items...
The above changes possibly make some (or most) of the other suggestions
possible and/or easier to implement. So then a programmer can roll
their own string views or slice partition functions in a clean way if
they want to. That's the point of the "Making more effective use of
slice objects". It's not a specific idea, but a generality that may come
about by doing these other smaller things first. And doing them as a
group is probably a good way to address these things.
I hope this clarifies at least my viewpoint, if not Nick's. But I'll keep
an open mind and see what he has to offer in his PEP.
Cheers,
Ron