Possible new slice behaviour? (Was: Negative slice discussion.)

This is one solution to what we can do to make slices both easier to
understand and work in a much more consistent and flexible way.

This matches the slice semantics that Guido likes. (Possibly for Python 4.)

The ability to pass callables along with the slice creates an easy and
clean way to add new indexing modes. (As Nick suggested.)

Overall, this makes everything simpler and easier to do. :-)

Cheers,
    Ron


"""
    An improved slice implementation.

    Even the C source says:

        "It's harder to get right than you might think."

    both in code and in understanding.

    This requires changing slice indexing so that
    the following relationships are true.

        s[i:j:k] == s[i:j][::k]
        s[i:j:-1] == s[i:j:1][::-1]

    And it also adds the ability to apply callables to
    slices and indexes using the existing slice syntax.

    These alterations would need to be made to the __getitem__
    and __setitem__ methods of built-in types. Possibly in
    Python 4.0.

    * (I was not able to get a clean version of this behaviour
    with the existing slice semantics. But the slice and index
    behaviour of this implementation is much simpler and makes
    using callables to adjust indexes very easy. That seems
    like a good indication that changing slices to match the
    above relationships is worth doing.

    It may be possible to get the current behaviour by applying
    a callable to the slice like the open, closed, and ones index
    examples below.)
"""

# A string sub-class for testing.

class Str(str):

    def _fix_slice_indexes(self, slc):
        # Replace Nones and check step value.
        if isinstance(slc, int):
            return slc
        i, j, k = slc.start, slc.stop, slc.step
        if k == 0:
            raise ValueError("slice step cannot be zero")
        if i is None: i = 0
        if j is None: j = len(self)
        if k is None: k = 1
        return slice(i, j, k)

    def __getitem__(self, args):
        """
        Gets an item from a string with either an index or a slice.
        Apply any callables to the slice if they are present.

        Valid inputs...
            i
            (i, callables ...)
            slice()
            (slice(), callables ...)

        """
        # Apply callables if any.
        if isinstance(args, tuple):
            slc, *callables = args
            slc = self._fix_slice_indexes(slc)
            for fn in callables:
                slc = fn(self, slc)
        else:
            slc = self._fix_slice_indexes(args)

        # Just an index
        if isinstance(slc, int):
            return str.__getitem__(self, slc)

        # Handle slice.
        rval = []
        i, j, k = slc.start, slc.stop, slc.step
        ix = i if k > 0 else j-1
        while i <= ix < j:
            rval.append(str.__getitem__(self, ix))
            ix += k
        return type(self)('').join(rval)


"""
    These end with 'i' to indicate they make index adjustments,
    and also to make them less likely to clash with other
    functions.

    Some of these are so simple, you'd probably just
    adjust the index directly, e.g. reversei. But they
    make good examples of what is possible. And possibly
    there are other uses as well.

    Because they are just objects passed in, the names
    aren't important. They can be called anything and
    still work, and the programmer is free to create
    new alternatives.
"""

def reversei(obj, slc):
    """Return a new slice with reversed step."""
    if isinstance(slc, slice):
        i, j, k = slc.start, slc.stop, slc.step
        return slice(i, j, -k)
    return slc

def trimi(obj, slc):
    """Trim left and right so an IndexError is not produced."""
    if isinstance(slc, slice):
        ln = len(obj)
        i, j, k = slc.start, slc.stop, slc.step
        if i < 0: i = 0
        if j > ln: j = ln
        return slice(i, j, k)
    return slc

def openi(obj, slc):
    """Open interval - Does not include end points."""
    if isinstance(slc, slice):
        i, j, k = slc.start, slc.stop, slc.step
        return slice(i+1, j, k)
    return slc

def closedi(obj, slc):
    """Closed interval - Includes end points."""
    if isinstance(slc, slice):
        i, j, k = slc.start, slc.stop, slc.step
        return slice(i, j+1, k)
    return slc

def onei(obj, slc):
    """First element is 1 instead of zero."""
    if isinstance(slc, slice):
        i, j, k = slc.start, slc.stop, slc.step
        return slice(i-1, j-1, k)
    return slc - 1


def _test_cases1():
    """

    # test string
    >>> s = Str('0123456789')

    #   |0|1|2|3|4|5|6|7|8|9|
    #   0 1 2 3 4 5 6 7 8 9 10
    #  10 9 8 7 6 5 4 3 2 1 0

    >>> s[:]
    '0123456789'

    >>> s[:, trimi]
    '0123456789'

    >>> s[:, reversei]
    '9876543210'

    >>> s[:, reversei, trimi]
    '9876543210'

    >>> s[::, trimi, reversei]
    '9876543210'


    # Right side bigger than len(s)

    >>> s[:100]
    Traceback (most recent call last):
    IndexError: string index out of range

    >>> s[:100, trimi]
    '0123456789'

    >>> s[:100, trimi, reversei]
    '9876543210'

    >>> s[:100, reversei]
    Traceback (most recent call last):
    IndexError: string index out of range

    >>> s[:100, reversei, trimi]
    '9876543210'


    # Left side smaller than 0.

    >>> s[-100:]
    Traceback (most recent call last):
    IndexError: string index out of range

    >>> s[-100:, trimi]
    '0123456789'

    >>> s[-100:, trimi, reversei]
    '9876543210'


    # Slice bigger than s.

    >>> s[-100:100]
    Traceback (most recent call last):
    IndexError: string index out of range

    >>> s[-100:100, trimi]
    '0123456789'


    # Slice smaller than s.

    >>> s[3:7]
    '3456'

    >>> s[3:7, reversei]
    '6543'


    # From left with negative step.

    >>> s[::-1]
    '9876543210'

    >>> s[::-1, reversei]
    '0123456789'

    >>> s[:100:-1, trimi]                # j past right side
    '9876543210'

    >>> s[-100::-1, trimi]               # i before left side
    '9876543210'

    >>> s[-100:100:-1, trimi, reversei]  # slice is bigger
    '0123456789'


    # Null results

    >>> s[7:3:1, trimi]
    ''

    >>> s[7:3:-1, trimi]
    ''


    # Check None values.

    >>> s[:]
    '0123456789'

    >>> s[None:None]
    '0123456789'

    >>> s[None:None:None]
    '0123456789'

    >>> s[:: 1]
    '0123456789'

    >>> s[::-1]
    '9876543210'

    >>> s[None:None:1]
    '0123456789'

    >>> s[None:None:-1]
    '9876543210'


    # Check error messages.

    >>> s[0:0:0:0]
    Traceback (most recent call last):
    SyntaxError: invalid syntax

    >>> s[5:5:0]
    Traceback (most recent call last):
    ValueError: slice step cannot be zero


    # And various other combinations.

    >>> s = Str('123456789')

    >>> s[3, onei]
    '3'

    >>> s[4:8, onei]
    '4567'

    >>> s[4:8, onei, openi]
    '567'

    >>> s[4:8, onei, closedi]
    '45678'

    """


def _test():
    import doctest
    print(doctest.testmod(verbose=False))


if __name__ == "__main__":
    _test()
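
Editorial sketch, not part of Ron's posting: the two relationships the module docstring asks for, read here as s[i:j:k] == s[i:j][::k] and s[i:j:-1] == s[i:j:1][::-1], can be checked with a small stand-alone function. The name new_slice is made up for this illustration, and it works on any sequence rather than on the Str subclass. It uses the same loop as the posted __getitem__ and does not adjust negative or out-of-range bounds.

```python
def new_slice(seq, i=None, j=None, k=None):
    """Illustrative helper: slice seq with the proposed semantics.

    i and j always name the left and right edges of the span [i, j);
    k only sets the stride and, when negative, walks the span from its
    right end. Negative or out-of-range i and j are not adjusted here.
    """
    if k == 0:
        raise ValueError("slice step cannot be zero")
    i = 0 if i is None else i
    j = len(seq) if j is None else j
    k = 1 if k is None else k
    out = []
    ix = i if k > 0 else j - 1   # start at the left edge, or the right one
    while i <= ix < j:
        out.append(seq[ix])
        ix += k
    return out


s = list(range(10))
forward = new_slice(s, 2, 8, 2)
backward = new_slice(s, 2, 8, -1)
print(forward)    # [2, 4, 6]
print(backward)   # [7, 6, 5, 4, 3, 2]
print(s[2:8:-1])  # [] with the current semantics
```

With the current semantics a negative step swaps the roles of the two bounds, which is why s[2:8:-1] is empty today while the sketch returns the span 2..7 reversed.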

On 5 Nov 2013 03:35, "Ron Adam" <ron3200@gmail.com> wrote:
> This is one solution to what we can do to make slices both easier to
> understand and work in a much more consistent and flexible way.

> clean way to add new indexing modes. (As Nick suggested.)

Tuples can't really be used for this purpose, since that's incompatible with multi-dimensional indexing.

However, I also agree containment would be a better way to go than subclassing.

I'm currently thinking that a fourth "adjust" argument to the slice constructor may work, and call that from the indices method as:

    def adjust_indices(start, stop, step, length):
        ...

The values passed in would be those from the slice constructor plus the length passed to the indices method. The only preconditioning would be the check for a non-zero step. The result would be used as the result of the indices method.

Cheers,
Nick.
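
Editorial sketch of the idea in the message above, under stated assumptions: the built-in slice type cannot be subclassed, so a stand-in class is used. AdjustableSlice, closed_indices and take are hypothetical names invented for this illustration. Only the adjust(start, stop, step, length) signature and the non-zero step check come from the message.

```python
class AdjustableSlice:
    """Hypothetical stand-in for a slice with a fourth 'adjust' argument."""

    def __init__(self, start=None, stop=None, step=None, adjust=None):
        self.start, self.stop, self.step = start, stop, step
        self.adjust = adjust

    def indices(self, length):
        # The only preconditioning is the non-zero step check.
        if self.step == 0:
            raise ValueError("slice step cannot be zero")
        if self.adjust is None:
            # No adjuster: defer to the ordinary slice.indices().
            return slice(self.start, self.stop, self.step).indices(length)
        # The adjuster's result is used as the result of indices().
        return self.adjust(self.start, self.stop, self.step, length)


def closed_indices(start, stop, step, length):
    """Example adjuster: make stop inclusive.

    Assumes a non-negative stop and a positive (or None) step.
    """
    if stop is not None:
        stop += 1
    return slice(start, stop, step).indices(length)


def take(seq, slc):
    """What a container would do: ask the slice for concrete indices."""
    start, stop, step = slc.indices(len(seq))
    return [seq[n] for n in range(start, stop, step)]


seq = list(range(10))
print(take(seq, AdjustableSlice(3, 7)))                         # [3, 4, 5, 6]
print(take(seq, AdjustableSlice(3, 7, adjust=closed_indices)))  # [3, 4, 5, 6, 7]
```

The container code in take never learns that an adjuster was involved, which is the property the message is after.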

On 11/04/2013 04:40 PM, Nick Coghlan wrote:
Are there plans for Python's builtin types to use multidimensional indexing? I don't think what I'm suggesting would create an issue with it either. It may even be complementary.

Either I'm missing something, or you aren't quite understanding where the changes I'm suggesting are to be made. As long as the change is made local to the object that uses it, it won't affect any other type's use of slices. And what is passed in a tuple is different from specifying the meaning of a tuple.

There may be other reasons this may not be a good idea, but I can't think of any myself at the moment. Possibly because a callable passed with a slice may alter the object, but that could be limited by giving the callable a length instead of the object itself. But personally I like that it's open ended and not limited by the syntax.

Consider this...

Python lists currently don't know what to do with a tuple. In order to do anything else, the __getitem__ and __setitem__ methods need to be overridden. For that reason, it can't cause an issue with anything as long as the change is kept *local to the object(s)* that use it. Making changes at the syntax level, or even slice level could be disruptive though. (This doesn't do that.)

The slice syntax already constructs a tuple if it gets a complex set of arguments. That isn't being changed. The only thing that it does is expand what builtin types can accept through the existing syntax. It does not restrict, or make any change, at a level that will prevent anything else from using that same syntax in other ways.

As a way to allow new-slices and the current slices together/overlap in a transition period, we could just require one extra value to be passed, which would cause a tuple to be created and the __getitem__ method could then use the newer indexing on the slice.

    s[i:j:k]        # current indexing.
    s[i:j:k, '']    # new indexing... Null string or None causes
                    # tuple to be created.

(or a callable that takes a slice.)
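
Editorial sketch: what the subscript syntax actually hands to __getitem__ can be seen with a class that returns its key unchanged. ShowKey is a made-up name for this illustration. The last lines show that a plain list rejects a tuple key, which is the point made above about lists.

```python
class ShowKey:
    """Return whatever the subscript syntax passes to __getitem__."""

    def __getitem__(self, key):
        return key


show = ShowKey()
plain = show[1:5:2]        # a bare slice object
marked = show[1:5:2, '']   # a tuple: (slice, extra value)

print(plain)    # slice(1, 5, 2)
print(marked)   # (slice(1, 5, 2), '')

# A built-in list does not accept a tuple key.
seq = [0, 1, 2]
try:
    seq[plain, '']
    rejected = False
except TypeError:
    rejected = True
print(rejected)  # True
```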
Currently the length adjustment is made by the __getitem__ method calling the indices method as in this example.
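
Editorial sketch: the example referred to above is not preserved in this copy of the thread. A minimal version of the pattern, assuming a made-up container named Letters, looks like this. The only real API used is slice.indices(length), which clamps the bounds to the given length and fills in the defaults.

```python
class Letters:
    """Hypothetical container whose __getitem__ calls slice.indices()."""

    def __init__(self, text):
        self._text = text

    def __len__(self):
        return len(self._text)

    def __getitem__(self, key):
        if isinstance(key, slice):
            # The length adjustment happens here, inside __getitem__.
            start, stop, step = key.indices(len(self))
            return ''.join(self._text[n] for n in range(start, stop, step))
        return self._text[key]


letters = Letters('0123456789')
print(slice(-100, 100).indices(10))        # (0, 10, 1)
print(slice(None, None, -1).indices(10))   # (9, -1, -1)
print(letters[-100:100])                   # 0123456789
print(letters[::-1])                       # 9876543210
```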
So you don't need to add the fourth length argument if the change is made in __getitem__ and __setitem__. Or possibly you can do it just in the slice's indices method.

> The result would be used as the result of the indices method.

Did you see this part of the tests?

These all were very easy to implement, and did not require any extra logic added to the underlying __getitem__ code other than calling the passed functions in the tuple. It moves these cases out of the object being sliced in a nice way. Other ways of doing it would require keywords and logic for each case to be included in the objects.

Cheers,adjustment
    Ron

On 11/04/2013 07:48 PM, Ron Adam wrote:
> Cheers,adjustment Ron

I seem to be getting random inserts of pieces I've cut from other places in my Thunderbird email client. <shrug> 'adjustment' wasn't there when I posted.

I hope this will be fixed in an Ubuntu update soon.

Ron

On 5 Nov 2013 11:48, "Ron Adam" <ron3200@gmail.com> wrote:
> understand and work in a much more consistent and flexible way.

Yes, memoryview already has some support for multidimensional array shapes, and that's likely to be enhanced further in 3.5.

You're proposing a mechanism for slice index customisation that would be ambiguous and thoroughly confusing when used to define a slice as part of a multidimensional array access. Remember, the Ellipsis was first added specifically as part of multi-dimensional indexing notation for the scientific community. Even though the stdlib only partially supports it today, the conventions for multidimensional slicing are defined by NumPy and need to be taken into account in future design changes.

> that it's open ended and not limited by the syntax.

I like the idea of passing a callable. I just think it should be an extra optional argument to slice that is used to customise the result of calling indices() rather than changing the type seen by the underlying container.

> 2, 3, 4, 5, 6, 7, 8, 9]]

Except for all the humans that will have to read it, and the confusion of applying it to multidimensional array operations.

> Making changes at the syntax level, or even slice level could be
> disruptive though. (This doesn't do that.)

And hence only works with types that have been updated to support it. We already did that once for extended slicing support, so let's not do it again when there are other alternatives available.

However, using a custom container type is a good way to experiment, so I've gone back to not wanting to permit slice subclasses at this point (since containment is sufficient when experimenting with a custom container).

> argument. That isn't being changed.
> The only thing that it does is expand what builtin types can accept
> through the existing syntax. It does not restrict, or make any change,
> at a level that will prevent anything else from using that same syntax
> in other ways.

Yes, I realise it requires changes to all the container types. That's one of the problems with the idea.

So, yes, I did understand your proposal, and definitely like the general idea of passing in a callable to customise the index calculations. I just don't like the specifics of it, both because of the visual confusion with multidimensional indexing and because we've been through this before with extended slicing and requiring changes to every single container type is a truly painful way to make a transition.

Implementing a change through the slice object instead would mean that any transition problems would be isolated to using the new feature with containers that didn't call slice.indices or the C API equivalent when calculating slice indices.

Cheers,
Nick.
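
Editorial sketch of the ambiguity described in this message, under stated assumptions: Grid is a made-up two-dimensional container that follows the tuple-per-axis convention, where the second item of the key selects columns. It is not NumPy, but it reads keys the same way for the cases shown. A callable placed in the second slot is therefore taken as a column key, not as an index adjuster.

```python
class Grid:
    """Hypothetical 2-D container: key is a (row_key, col_key) tuple."""

    def __init__(self, rows):
        self._rows = [list(r) for r in rows]

    def __getitem__(self, key):
        row_key, col_key = key
        rows = self._rows[row_key]
        if isinstance(row_key, slice):
            # Several rows selected: apply the column key to each one.
            return [r[col_key] for r in rows]
        return rows[col_key]


grid = Grid([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
print(grid[0:2, 1])     # [1, 4]
print(grid[1, 0:2])     # [3, 4]
print(grid[0:2, 0:2])   # [[0, 1], [3, 4]]

# The second slot already means "columns", so a callable there fails.
try:
    grid[0:2, len]
    callable_accepted = True
except TypeError:
    callable_accepted = False
print(callable_accepted)  # False
```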

On 11/05/2013 05:02 AM, Nick Coghlan wrote:
Ok, I suppose other builtin types may (or may not) follow that pattern. But in this light, I agree, it's best not to create a more complex pattern to handle for those cases.

(... and it worked so nicely, Oh Well.)

The alternative is to have a function that does what the slice syntax does. And then extend that. It seems to me it's a good idea to have function equivalents of syntax when possible in any case.

    do_slice(obj, slices, *callables)

Where slices is either a single slice or a tuple of slices or indices.

# Examples

class GetSlice:
    """Return a slice from slice syntax."""
    def __getitem__(self, slc):
        return slc

gs = GetSlice()

seq = list(range(10))

print(do_slice(seq, gs[1, 5, 7]))
print(do_slice(seq, gs[3:7], openi))
print(do_slice(seq, gs[3:7], closedi))
print(do_slice(seq, gs[3:7], closedi, onei))
print(do_slice(seq, gs[3:5, 7:8, 9], reversei))
print(do_slice(seq, gs[:], reversei))
print(do_slice(seq, range(-5, 15), wrapi))
print(do_slice(seq, range(15, -5, -1), wrapi))

"""
[1, 5, 7]
[4, 5, 6]
[3, 4, 5, 6, 7]
[2, 3, 4, 5, 6]
[[4, 3], [7], 9]
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
[5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4]
[5, 4, 3, 2, 1, 0, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0, 9, 8, 7, 6]
"""

Cheers,
    Ron
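
Editorial sketch: the message does not include the body of do_slice or of wrapi, so the versions below are guesses that were written to reproduce the outputs listed above. The helpers normalize and resolve are made-up names. The adjusters follow the shape of the ones in the first posting, and wrapi is assumed to wrap plain indexes modulo the length and to leave slices alone.

```python
def normalize(seq, slc):
    """Replace None fields of a slice with explicit values; ints pass through."""
    if isinstance(slc, int):
        return slc
    if slc.step == 0:
        raise ValueError("slice step cannot be zero")
    start = 0 if slc.start is None else slc.start
    stop = len(seq) if slc.stop is None else slc.stop
    step = 1 if slc.step is None else slc.step
    return slice(start, stop, step)


def resolve(seq, slc):
    """Fetch one item, or walk the span [start, stop) in the step's direction."""
    if isinstance(slc, int):
        return seq[slc]
    i, j, k = slc.start, slc.stop, slc.step
    out = []
    ix = i if k > 0 else j - 1
    while i <= ix < j:
        out.append(seq[ix])
        ix += k
    return out


def do_slice(obj, slices, *callables):
    """Guessed implementation: apply the callables, then slice or index."""
    def one(slc):
        slc = normalize(obj, slc)
        for fn in callables:
            slc = fn(obj, slc)
        return resolve(obj, slc)

    if isinstance(slices, (slice, int)):
        return one(slices)
    return [one(slc) for slc in slices]   # tuple, range, or other iterable


def closedi(obj, slc):
    """Closed interval: include the right end point."""
    return slice(slc.start, slc.stop + 1, slc.step) if isinstance(slc, slice) else slc

def onei(obj, slc):
    """First element is 1 instead of zero."""
    return slice(slc.start - 1, slc.stop - 1, slc.step) if isinstance(slc, slice) else slc - 1

def reversei(obj, slc):
    """Reverse the step."""
    return slice(slc.start, slc.stop, -slc.step) if isinstance(slc, slice) else slc

def wrapi(obj, slc):
    """Assumed behaviour: wrap a plain index around the length."""
    return slc if isinstance(slc, slice) else slc % len(obj)


seq = list(range(10))
print(do_slice(seq, slice(3, 7), closedi, onei))                    # [2, 3, 4, 5, 6]
print(do_slice(seq, (slice(3, 5), slice(7, 8), 9), reversei))       # [[4, 3], [7], 9]
print(do_slice(seq, range(-5, 15), wrapi))
```

A single slice or index gives a flat result, while any other iterable of slices and indexes gives one result per element, which is how the nested list in the fifth output arises.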

On 5 November 2013 10:51, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Agreed. Numpy users are the biggest consumers of slicing. Any proposal to improve slicing had better improve it for numpy as well, which means it should work in a multidimensional slicing context, regardless of whether numpy is in the stdlib.

Oscar

participants (4)
- Greg Ewing
- Nick Coghlan
- Oscar Benjamin
- Ron Adam