List assignment - extended slicing inconsistency

According to what little documentation I could find, providing a stride on the assignment target for a list is supposed to trigger 'advanced slicing', causing element-wise replacement and hence requiring that the source iterable has the appropriate number of elements.
This is in contrast to regular slicing (*without* a stride), which allows replacing a *range* by another sequence of arbitrary length.
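As a minimal sketch of the two modes just described (the list contents are made up for illustration), a plain slice lets the list change size, while a slice with a stride of 2 replaces elements one for one and rejects a source of the wrong length:

```python
# Plain slice (no stride): the target range is replaced wholesale,
# so the list may grow or shrink.
plain = [0, 1, 2, 3, 4, 5]
plain[1:3] = ["a", "b", "c", "d"]    # 2 elements replaced by 4
print(plain)  # [0, 'a', 'b', 'c', 'd', 3, 4, 5]

# Extended slice (stride 2): element-wise replacement, lengths must match.
extended = [0, 1, 2, 3, 4, 5]
extended[::2] = ["x", "y", "z"]      # 3 slots, 3 values
print(extended)  # ['x', 1, 'y', 3, 'z', 5]

error_message = None
try:
    extended[::2] = ["only", "two"]  # 3 slots, 2 values
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```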

Issue
=====
When, however, a stride of `1` is specified, advanced slicing is not triggered.
If advanced slicing had been triggered, there should have been a ValueError instead. Expected behaviour:
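The interpreter session that followed here is missing from this archive copy. As a hedged illustration of the behaviour being described (the helper `try_assign` and the values are hypothetical, not taken from the original post), CPython currently treats an explicit stride of 1 like a plain slice, while a stride of 2 enforces matching lengths:

```python
def try_assign(seq, key, values):
    """Assign values to seq[key]; return the list, or the exception name."""
    try:
        seq[key] = values
    except ValueError as exc:
        return type(exc).__name__
    return seq

# Explicit stride of 1: handled like a plain slice, so the list shrinks.
stride_one = try_assign([0, 1, 2, 3, 4], slice(1, 4, 1), [9])

# Stride of 2: element-wise replacement, so a length mismatch is an error.
stride_two = try_assign([0, 1, 2, 3, 4], slice(0, 4, 2), [9])

print(stride_one)  # [0, 9, 4]
print(stride_two)  # ValueError
```

Under the proposal, the first call would raise ValueError as well, because a stride was given explicitly.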
I think that is an inconsistency in the language that should be fixed.

Why do we need this?
====================
One may want this as an extra check as well, so that the list does not change size. Depending on the implementation, it may come with performance benefits as well. One could, though, argue that you still get the same result if you do everything correctly.
But I disagree that there should be no error when it is wrong.
*Strides that are not None should always trigger advanced slicing.*

Other Data Types
================
This change should also be applied to bytearray, etc., though see below.

Concerns
========
It may break some code that uses advanced slicing and expects regular slicing to occur? These cases should be rare, and the error message should be clear enough to allow fixes? I assume these cases should be exceptionally rare.

If the implementation relies on `slice.indices(len(seq))[2] == 1` to decide whether to use advanced slicing, that would require some refactoring. If it is only `slice.step in (1, None)` then this could easily be replaced by checking against None.

Will there be issues with syntax consistency with other data types, in particular outside the core library?
- I always found the dynamic behaviour of lists w.r.t. non-advanced slicing to be somewhat peculiar in the first place, though, undeniably, it can be immensely useful.
- Most external data types with fixed memory, such as numpy, do not have this dynamic flexibility, and the behaviour of regular slicing on assignment is the same as that of advanced slicing. The proposed change would increase consistency with these other data types.

More surprises
==============
whereas
but numpy
and
The latter two as expected. memoryview behaves the same.

Issue 2
=======
Whereas NumPy is known to behave differently as a data type with fixed memory layout, and is not part of the standard library anyway, I find the difference in behaviour between lists and arrays disconcerting. This should be resolved to a consistent behaviour.

Proposal 2
==========
Arrays and bytearrays should adopt the same advanced slicing behaviour I suggest for lists.

Concerns 2
==========
This has the potential for a lot more side effects in existing code, but as before, in most cases an error message should be triggered.

Summary
=======
I do not find it acceptable as good language design that there is a large range of behaviour on slicing in assignment targets for the different native (and standard library) data types of seemingly similar kind, and that users have to figure out for each data type by testing - or at the very least remember, if documented - how it behaves on slicing in assignment targets. There should be a consistent behaviour at the very least, ideally even one with a clear user interface as suggested for lists.

-Alexander

On Thu, Feb 22, 2018 at 2:18 PM, Alexander Heger <python@2sn.net> wrote:
This makes sense. (I wonder if the discrepancy is due to some internal interface that loses the distinction between None and 1 before the decision is made whether to use advanced slicing or not. But that's a possible explanation, not an excuse.)
Sure.
Yeah, backwards compatibility sometimes prevents fixing a design bug. I don't know if that's the case here, we'll need reports from real-world code.
Things outside the stdlib are responsible for their own behavior. Usually they can move faster and with less worry about breaking backward compatibility.
If you're talking about the ability to resize a list by assigning to a slice, that's as intended. It predates advanced slicing by a decade or more.
How? Resizing through slice assignment will stay for builtin types -- if numpy doesn't support that, so be it.
OK, so array doesn't use the same rules. That should be fixed too probably (assuming whatever is valid today remains valid).
Bytearray should also follow the same rules.
Let's leave numpy out of this discussion. And memoryview is a special case because it can't change size (it provides a view into an inflexible structure).
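To compare the built-in and standard-library types under discussion, a small sketch can assign an empty source to an extended slice of each and record what happens. The helper `outcome` is hypothetical. My understanding, which the thread itself does not confirm, is that CPython's array and bytearray treat this case as a deletion rather than raising, so the sketch records their results instead of assuming them:

```python
from array import array

def outcome(seq, key, values):
    """Try seq[key] = values; return the new contents or the exception name."""
    try:
        seq[key] = values
    except (ValueError, TypeError, BufferError) as exc:
        return type(exc).__name__
    return list(seq)

every_other = slice(1, None, 2)

# An empty source assigned to an extended slice of each resizable type.
list_result = outcome([1, 2, 3, 4, 5], every_other, [])
array_result = outcome(array("i", [1, 2, 3, 4, 5]), every_other, array("i"))
bytearray_result = outcome(bytearray(b"12345"), every_other, b"")

# memoryview cannot change size, even through a plain slice.
view_result = outcome(memoryview(bytearray(b"12345")), slice(1, 3), b"x")

for name, result in [
    ("list", list_result),
    ("array", array_result),
    ("bytearray", bytearray_result),
    ("memoryview", view_result),
]:
    print(name, result)
```

The list and memoryview cases raise ValueError; whatever the other two print on a given interpreter is the behaviour that would need to be reconciled.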
Sure.
Side effects? No code that currently doesn't raise will break, right?
Fortunately, what *you* find acceptable or good language design is not all that important. (You can avoid making any mistakes in your own language. :-) You may by now realize that 100% consistent behavior is hard to obtain. However we'll gladly consider your feedback.

--
--Guido van Rossum (python.org/~guido)

On 23 February 2018 at 11:51, Guido van Rossum <guido@python.org> wrote:
That explanation seems pretty likely to me, as for the data types implemented in C, we tend to switch to the Py_ssize_t form of slices pretty early, and that can't represent the None/1 distinction. Even for Python level collections, you lose the distinction as soon as you call slice.indices (as that promises to return 3-tuple of integers).
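The loss of information at the Python level is easy to see. In this short sketch (the bounds and length are arbitrary), the raw `step` attribute still distinguishes the two spellings, but `slice.indices` normalises both to the same tuple:

```python
implicit = slice(1, 4)      # what seq[1:4] passes to __setitem__
explicit = slice(1, 4, 1)   # what seq[1:4:1] passes to __setitem__

# The raw attribute still carries the distinction.
print(implicit.step, explicit.step)  # None 1

# After normalisation against a sequence length it is gone.
length = 10
print(implicit.indices(length))  # (1, 4, 1)
print(explicit.indices(length))  # (1, 4, 1)
same_after_normalising = implicit.indices(length) == explicit.indices(length)
```

Any check that wants to honour an explicit stride of 1 therefore has to look at `step` before normalising.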
In this case, we should be able to start with a DeprecationWarning in 3.8, since we already have the checks in place to raise ValueError when the step is 2 or more - any patch would just need to make sure those checks either have access to the original slice object (so they can check the raw step value), or else an internal flag indicating whether or not an explicit step was provided.

So the next step would be to file an issue pointing back to this thread for acknowledgement that this is a design bug to be handled with a DeprecationWarning in 3.8, and a ValueError in 3.9+.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Thu, Feb 22, 2018 at 6:21 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
If this is the case -- backward compatibility issues aside, wouldn't it be very hard to fix? Which means that should be investigated before going too far down the "how much code might this break" route. And certainly before adding a DeprecationWarning.

-CHB

--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker@noaa.gov

On Sat, Feb 24, 2018 at 5:24 AM, Chris Barker <chris.barker@noaa.gov> wrote:
Ignoring backward compatibility, it ought to be possible to (ab)use a stride of zero for this. Calling slice.indices() on something with a stride of zero raises ValueError, so there's no ambiguity. But it would break code that iterates in a simple and obvious way, and (ugh ugh) break it in a very nasty way: an infinite loop. I'm not happy with that kind of breakage, even with multiple versions posting a warning.

In the C API, there's PySlice_GetIndices "[y]ou probably do not want to use this function" and PySlice_GetIndicesEx, the "[u]sable replacement". Much as I dislike adding *yet another* function to do basically the same job, I think that might be the less-bad way to do this.

ChrisA
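Both halves of the stride-zero argument can be sketched briefly. The first part shows that `slice.indices` rejects a zero stride today; the second uses a hypothetical hand-rolled loop, `naive_positions`, with a safety cap added purely for this demonstration, to show how simple iteration code would spin forever if a zero stride ever reached it:

```python
# slice.indices() refuses a stride of zero.
indices_error = None
try:
    slice(None, None, 0).indices(10)
except ValueError as exc:
    indices_error = str(exc)
print(indices_error)

def naive_positions(start, stop, step, limit=5):
    """Walk from start to stop by step; 'limit' is a cap added for this demo."""
    positions = []
    i = start
    while i < stop and len(positions) < limit:
        positions.append(i)
        i += step
    return positions

print(naive_positions(0, 3, 1))  # [0, 1, 2]
print(naive_positions(0, 3, 0))  # [0, 0, 0, 0, 0], stopped only by the cap
```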

23.02.18 20:50, Chris Angelico wrote:
Actually PySlice_GetIndicesEx is deprecated too. It is not safe for resizable sequences since it is vulnerable to a race condition. The pair of PySlice_Unpack() and PySlice_AdjustIndices() replaces it in new code. So now we have 4 functions for doing the same thing in C, 2 of which are deprecated. Do you want to deprecate the other two and add new replacements for them?

On Sat, Feb 24, 2018 at 6:38 AM, Serhiy Storchaka <storchaka@gmail.com> wrote:
Wow. Who'd have thought slice indexing was so hard... (If you look at the Python 3.6 docos, that deprecation isn't mentioned. Should it be?)

I presume it's already too late for 3.7 to change anything to fix this.

ChrisA

On 24 February 2018 at 06:00, Chris Angelico <rosuav@gmail.com> wrote:
I presume it's already too late for 3.7 to change anything to fix this.
Yeah, any changes in relation to this would be 3.8+ only.

To answer your previous question about "Wouldn't it be hard to fix this given the way slice processing works?", whether or not it's tricky depends more on the internal code structure of any given type implementation than it does on the API that [C]Python exposes for converting a slice definition to a set of indices given a particular sequence length.

For lists, for example, the code handling that dispatch is in the "list assign subscript" function, under a "PySlice_Check(item)" branch:
https://github.com/python/cpython/blob/master/Objects/listobject.c#L2775

There's an early return there for the "step == 1" case, but at the point where we run that check, we still have access to "item", so that early return can be modified to instead check "((PySliceObject *)item)->step == Py_None".

During the deprecation warning period, we'd then *also* add a second delegation point down where the exception normally gets raised, such that when "step == 1" we emit the deprecation warning and then call the same function as the existing early return does. In 3.9+, we'd delete that additional fallback code.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
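The deprecation-period logic described above can be modelled in pure Python. This is only a sketch under stated assumptions: `StrictSliceList` is a hypothetical subclass, the warning text is invented, and the real change would live in the C implementation of list rather than in a subclass.

```python
import warnings

class StrictSliceList(list):
    """Hypothetical model of the proposed rule, not CPython's implementation."""

    def __setitem__(self, key, value):
        # key.step is the raw value, so a slice written without a stride
        # (step is None) skips this branch and keeps the resizing behaviour.
        if isinstance(key, slice) and key.step == 1:
            value = list(value)
            expected = len(range(*key.indices(len(self))))
            if len(value) != expected:
                # Deprecation period: warn, then fall back to the old path.
                # After the period, this would raise ValueError instead.
                warnings.warn(
                    "size-changing assignment to a slice with an explicit "
                    "step of 1 is deprecated",
                    DeprecationWarning,
                    stacklevel=2,
                )
        super().__setitem__(key, value)

items = StrictSliceList([0, 1, 2, 3, 4])
items[1:3] = [9]  # no stride given: resizes silently, as today

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    items[0:2:1] = [7, 8, 6]  # explicit stride of 1 with a size mismatch

print(items)        # [7, 8, 6, 3, 4]
print(len(caught))  # 1
```

Checking the raw `step` before calling `indices` is the design point: once the slice has been normalised, an omitted stride and an explicit 1 can no longer be told apart.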

participants (6)
- Alexander Heger
- Chris Angelico
- Chris Barker
- Guido van Rossum
- Nick Coghlan
- Serhiy Storchaka