Implement comparison operators for range objects

There are circumstances, for example in unit testing, when it might be useful to check if two range objects describe the same range. Currently, this can't be done using the '==' operator:

    >>> range(5) == range(5)
    False

To get a useful comparison, you would either need to realise both range objects as lists or use a function like

    def ranges_equal(r0, r1):
        if not r0:
            return not r1
        return len(r0) == len(r1) and r0[0] == r1[0] and r0[-1] == r1[-1]

All other built-in sequence types (that is bytearray, bytes, list, str, and tuple) define equality by "all items of the sequence are equal". I think it would be both more consistent and more useful if range objects would pick up the same semantics.

When implementing '==' and '!=' for range objects, it would be natural to implement the other comparison operators, too (lexicographically, as for all other sequence types).

This change would be backwards incompatible, but I very much doubt there is much code out there relying on the current behaviour of considering two ranges as unequal just because they aren't the same object (and this code could be easily fixed by using 'is' instead of '==').

Opinions?

-- Sven
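As a rough sanity check of the helper proposed above, the sketch below compares it against the "realise both ranges as lists" approach over a small grid of ranges. The names small_ranges and helper_matches_lists are made up for this illustration, and the grid size is an arbitrary assumption.

```python
from itertools import product


def ranges_equal(r0, r1):
    """Constant-time check that two ranges yield the same items."""
    if not r0:
        return not r1
    return len(r0) == len(r1) and r0[0] == r1[0] and r0[-1] == r1[-1]


def small_ranges(limit=3):
    """Every range with start, stop, step in [-limit, limit] and step != 0."""
    values = range(-limit, limit + 1)
    return [range(a, b, s)
            for a, b, s in product(values, values, values)
            if s != 0]


def helper_matches_lists(limit=3):
    """True if ranges_equal agrees with list equality on every pair in the grid."""
    ranges = small_ranges(limit)
    return all(ranges_equal(r0, r1) == (list(r0) == list(r1))
               for r0 in ranges for r1 in ranges)
```

The helper never expands a range, so it stays cheap even for very long ranges, whereas the list comparison is only practical for short ones.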

Sven Marnach wrote:
I can see how that would be useful and straightforward, and certainly more useful than identity based equality. +1
I don't agree. Equality makes sense for ranges: two ranges are equal if they have the same start, stop and step values. But order comparisons don't have any sensible meaning: range objects are numeric ranges, integer-valued intervals, not generic lists, and it is meaningless to say that one range is less than or greater than another. Which is greater?

    range(1, 1000000, 1000)
    range(1000, 10000)

The question makes no sense, and should be treated as an error, just as it is for complex numbers. -1 on adding order comparison operators.

Aside: I'm astonished to see that range objects have a count method! What's the purpose of that? Any value's count will either be 0 or 1, and a more appropriate test would be `value in range`:
-- Steven

On Wed, Oct 12, 2011 at 6:33 PM, Steven D'Aprano <steve@pearwood.info> wrote:
I don't agree. Equality makes sense for ranges: two ranges are equal if they have the same start, stop and step values.
Hmm. I'm not sure that it's that clear cut. The other possible definition is that two ranges are equal if they're equal as lists. Should range(0, 10, 2) and range(0, 9, 2) be considered equal, or not? Agreed that it makes more sense to implement equality for ranges than the order comparisons. Mark

On Wed, Oct 12, 2011 at 1:58 PM, Mark Dickinson <dickinsm@gmail.com> wrote: ..
Should range(0, 10, 2) and range(0, 9, 2) be considered equal, or not?
I was going to ask the same question. I think ranges r1 and r2 should be considered equal iff list(r1) == list(r2). This is slightly harder to implement than just naively comparing (start, stop, step) tuples, but the advantage is that people won't run into surprises when they port 2.x code where the result of range() is a list.
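To make the two candidate definitions concrete, here is a minimal sketch of both. It assumes a Python version in which range objects expose start, stop and step as attributes (3.3 and later; they were not available when this thread was written), and the function names are invented for illustration.

```python
def same_parameters(r1, r2):
    """Strict definition: equal (start, stop, step) triples."""
    return (r1.start, r1.stop, r1.step) == (r2.start, r2.stop, r2.step)


def same_sequence(r1, r2):
    """Weaker definition: equal when realised as lists."""
    return list(r1) == list(r2)
```

For range(0, 10, 2) and range(0, 9, 2) the two functions disagree: the triples differ, but both ranges produce 0, 2, 4, 6, 8.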

On Wed, Oct 12, 2011 at 1:58 PM, Mark Dickinson <dickinsm@gmail.com> wrote:
For equality and comparison, this should be the standard. range objects are sequences, and they should compare just like other sequences. If implemented at all, equality should be that they have the same items in the same order. If implemented at all, comparison should be lexicographic. It seems to me you'd need a really good reason to have behavior different from every other sequence. Mike
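For readers wondering what a lexicographic ordering of ranges would look like in practice, the following is a small sketch, not a proposal for the actual implementation. The name lex_less is hypothetical; it walks both ranges in step and falls back to length when one is a prefix of the other, which is how list and tuple comparison behaves.

```python
def lex_less(r1, r2):
    """Lexicographic '<' for two ranges without building lists."""
    for a, b in zip(r1, r2):
        if a != b:
            # The first differing pair of items decides the ordering.
            return a < b
    # One range is a prefix of the other (or they are equal): shorter sorts first.
    return len(r1) < len(r2)
```

Under this definition the pair from Steven's message does get an answer: range(1, 1000000, 1000) sorts before range(1000, 10000) because its first item, 1, is smaller than 1000. Whether that answer is meaningful is exactly what the thread disputes.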

Steven D'Aprano schrieb am Do, 13. Okt 2011, um 04:33:49 +1100:
No, two ranges should be equal if they represent the same sequence, i.e. if they compare equal when converted to a list:

    range(0) == range(4, 4, 4)
    range(5, 10, 3) == range(5, 11, 3)
    range(3, 6, 3) == range(3, 4)
Well, it's meaningless unless you define what it means. Range objects are equal if they compare equal after converting to a list. You could define '<' or '>' the same way. All built-in sequence types support lexicographical comparison, so I thought it would be natural to bring the only one that behaves differently in line. (Special cases aren't special enough...) This is just to explain my thoughts, I don't have a strong opinion on this one. I'll try and prepare a patch for '==' and '!=' and add it to the issue tracker. Cheers, Sven

On Oct 12, 2011, at 12:36 PM, Sven Marnach wrote:
Given that there are two reasonably valid interpretations of equality (i.e. produces-equivalent-sequences or the stronger condition, has-an-equal-start/stop/step-tuple), we should acknowledge the ambiguity and refuse the temptation to guess. I vote for not defining equality for range objects -- it's not really an important service anyway (you really can live without it). Instead, let people explicitly compare the raw components, or if desired, compare components normalized by a slice:
    >>> range(2, 9, 2)[:]
    range(2, 10, 2)
Raymond
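A minimal sketch of the slice-based normalisation mentioned above, assuming CPython's behaviour of recomputing stop when a range is sliced and assuming the start, stop and step attributes are available (Python 3.3 and later). The helper name normalized_triple is hypothetical.

```python
def normalized_triple(r):
    """Return (start, stop, step) of the range after a full slice.

    Slicing recomputes stop from the length, so non-empty ranges that
    yield the same items end up with the same triple.
    """
    n = r[:]
    return (n.start, n.stop, n.step)
```

As far as I can tell this normalises only the stop value, so empty ranges with different start or step values would still produce different triples and would need separate handling.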

On Wed, Oct 12, 2011 at 9:31 AM, Sven Marnach <sven@marnach.net> wrote:
There are circumstances, for example in unit testing, when it might be useful to check if two range objects describe the same range.
Other than unit testing, what are the use cases? If I was writing a unit test, I'd be inclined to be very explicit about what I meant:

    r1 is r2
    repr(r1) == repr(r2)
    list(r1) == list(r2)

Absent another use case, -1

--- Bruce

Bruce Leban schrieb am Mi, 12. Okt 2011, um 11:30:17 -0700:
Even with a useful '==' operator defined, you could still use 'r1 == r2' or 'r1 is r2', depending on the intended semantics, just as with every other data type. You just wouldn't need to expand the range to a list.

Comparing the representations doesn't ever seem useful, though. The only way to access the original start, stop and step values is by parsing the representation, and these values don't affect the behaviour of the range object in any other way. Moreover, they might even change implicitly:

    >>> range(5, 10, 3)
    range(5, 10, 3)
    >>> range(5, 10, 3)[:]
    range(5, 11, 3)

I can't imagine any situation in which I would like to consider the above two ranges different.

Cheers, Sven

Sven Marnach schrieb am Mi, 12. Okt 2011, um 21:33:26 +0100:
start, stop and step of course *do* affect the behaviour of the range object. What I meant is that the only way to tell the difference between two range objects defining the same sequence but created with different values of start, stop and step is by looking at the representation. -- Sven

On Wed, Oct 12, 2011 at 1:33 PM, Sven Marnach <sven@marnach.net> wrote:
    def test_copy_range(self):
        """Make sure that every time we call copy_range we get a new identical copy of the range."""
        a = range(5, 10, 3)
        b = copy_range(a)
        c = copy_range(a)
        # 'assert' is a keyword, so the unittest method used here is assertTrue.
        self.assertTrue(a is not b)
        self.assertTrue(a is not c)
        self.assertTrue(b is not c)
        self.assertTrue(repr(a) == repr(b))
        self.assertTrue(repr(a) == repr(c))

Anyway, my thought is that if you think this change should be made, it would be helpful to have a use case other than unit tests, since for those purposes an explicit list() or repr() is clearer and performance is not typically an issue. Why would you normally be comparing ranges at all?

--- Bruce

Sven Marnach wrote:
Agreed -- comparing repr()s seems like a horrible way to do it. As far as comparing for equality, there's an excellent answer on StackOverflow -- http://stackoverflow.com/questions/7740796

    def ranges_equal(a, b):
        return len(a)==len(b) and (len(a)==0 or a[0]==b[0] and a[-1]==b[-1])

~Ethan~

I beg to differ with all those who want range(0, 10, 2) == range(0, 11, 2). After all the repr() shows the end point that was requested, not the end point after "normalization" (or whatever you'd call it) so the three parameters that went in should be considered state. OTOH range(10) == range(0, 10) == range(0, 10, 1). -- --Guido van Rossum (python.org/~guido)

Guido van Rossum wrote:
I beg to differ with all those who want range(0, 10, 2) == range(0, 11, 2).
I think practicality should beat purity here -- if the same results will be generated, then the ranges are the same and should be equal, no matter which exact parameters were used to create them.
Exactly. ~Ethan~

On Wed, Oct 12, 2011 at 3:57 PM, Ethan Furman <ethan@stoneleaf.us> wrote:
Then we'd be forever stuck with not exporting the start/stop/step values. I'd much rather export those (like the slice() object does). (IOW I find the lack of exported start/stop/step values an omission, not a feature, and would like to fix that too.)
Because their repr() is the same: "range(0, 10)", thus proving that the internal state is the same. For range objects, I believe that the internal state represents what they really "mean", and the sequence of values generated by iterating merely follows. -- --Guido van Rossum (python.org/~guido)

On 12 October 2011 22:12, Ethan Furman <ethan@stoneleaf.us> wrote:
While I'm agnostic on the question of whether range(0,9,2) and range(0,10,2) are the same, I'd point out that ranges_equal is straightforward to write and says they are equal. But if you're in the camp of saying they are not equal, you appear to have no way of determining that *except* by comparing reprs, as range objects don't seem to expose their start, step and end values as attributes - unless I've missed something.
Rather than worrying about supporting equality operators on ranges, I'd suggest exposing the start, step and end attributes and then leaving people who want them to roll their own equality functions. Paul.

On Wed, Oct 12, 2011 at 4:48 PM, Paul Moore <p.f.moore@gmail.com> wrote:
Unless I misunderstood, Guido is basically saying the same thing (the "exposing" part, that is). +1 on exposing start, step and end +1 on leaving it at that (unless it turns out to be a common case) -eric

On Thu, Oct 13, 2011 at 11:45 AM, Eric Snow <ericsnowcurrently@gmail.com> wrote:
My reading is that Guido has reserved judgment on the second part for now. Options are:

- do nothing for 3.3 (+0 from me)
- make sequence comparison the default (+1 from me)
- make start/stop/step comparison the default (-1 from me)

If we do either of the latter, range.__hash__ should be updated accordingly (since 3.x range objects are currently hashable due to their reliance on the default identity comparison).

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
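The point about keeping range.__hash__ in step with a new __eq__ can be illustrated with a small wrapper. This is only a sketch under the "sequence comparison" option; SeqRange and its _key method are hypothetical names, and nothing here is meant to describe how the patch would actually be written.

```python
class SeqRange:
    """Hypothetical wrapper giving a range sequence-style == and a matching hash."""

    def __init__(self, *args):
        self._r = range(*args)

    def _key(self):
        # Reduce the range to values that fully determine its items:
        # length, first item, and (when it matters) the step.
        r = self._r
        if len(r) == 0:
            return (0, None, None)
        if len(r) == 1:
            return (1, r[0], None)
        return (len(r), r[0], r[1] - r[0])

    def __eq__(self, other):
        if not isinstance(other, SeqRange):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        # Hashing the same key that __eq__ compares keeps equal objects
        # in the same hash bucket, as dict and set require.
        return hash(self._key())
```

With this in place, SeqRange(5, 10, 3) and SeqRange(5, 11, 3) compare equal and collapse to a single entry when put into a set.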

On Wed, Oct 12, 2011 at 7:44 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
+1
Actually when I wrote that I was +1 on start/stop/step comparison and -1 on sequence (really: list) comparison.

But I'd like to take a step back; we should really look at the use cases for comparing range objects. Since they don't return lists, you can't compare them to lists (or rather, they're always unequal). Because of this (and because it didn't work in 3.0, 3.1, 3.2) the proposed requirement that it should work the same as it did in Python 2 doesn't sway me.

So what's the most useful comparison for range objects? When comparing non-empty ranges with step 1, I think we all agree. So we're left arguing about whether all empty ranges should be equal, and about whether non-empty ranges with step > 1 should compare equal if they have the same start, step and length (regardless of the exact value of stop).

But why do we compare ranges? The first message in this thread (according to GMail) mentions unittests and suggests that it would be handy to check if two ranges are the same, but does not give a concrete example of such a unittest. The code example given uses list-wise comparison, but the use case is not elaborated further. Does anyone have an actual example of a unittest where being able to compare ranges would have been handy? Or of any other real-life example? Where it matters what happens if the range is empty or step is > 1?
So, let me say I'm undecided (except on the desirability of an == test for ranges that's more useful than identity). FWIW, I don't think the argument from numeric comparisons carries directly. The reason numeric comparisons (across int, float and Decimal) ignore certain "state" of the value (like precision or type) is that that's how we want our numbers to work. The open question so far is: How do we want our ranges to work? My intuition is weak, but says: range(0) != range(1, 1) != range(1, 1, 2) and range(0, 10, 2) != range(0, 11, 2); all because the arguments (after filling in the defaults) are different, and those arguments can come out using the start, stop, step attributes (once we implement them :-).
Sure. That's implied when __eq__ is updated (though a good reminder for whoever will produce the patch). (I'm also -1 on adding ordering comparisons; there's little disagreement on that issue.) PS. An (unrelated) oddity with range and Decimal:
So int() knows something that range() doesn't. :-) -- --Guido van Rossum (python.org/~guido)

On Thu, Oct 13, 2011 at 1:18 PM, Guido van Rossum <guido@python.org> wrote:
Between this and Raymond's point about slicing permitting easy and cheap normalisation of endpoints, I'm convinced that, if we add direct comparison of ranges at all, then start/stop/step comparison is the way to go.
Yeah, range() wants to keep floats far away, so it only checks __index__, not __int__. So Decimal gets handled the same way float does (i.e. not allowed directly, but permitted after explicit coercion to an integer). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
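The distinction between __index__ and __int__ can be seen with a short experiment. The class Three and the helper accepted_by_range are invented for this illustration; the rest uses only the standard library.

```python
from decimal import Decimal


class Three:
    """Hypothetical integer-like type: it defines __index__, so range() accepts it."""

    def __index__(self):
        return 3


def accepted_by_range(value):
    """Return True if range(value) is allowed, False if it raises TypeError."""
    try:
        range(value)
    except TypeError:
        return False
    return True
```

Decimal(3) and 3.0 both convert cleanly with int(), yet neither defines __index__, so range() rejects them until they are coerced explicitly, for example with range(int(Decimal(3))).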

On Wed, Oct 12, 2011 at 9:53 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Thanks. Maybe I can nudge you a little more in the direction of my proposal by speaking about equivalence classes. A proper == function partitions the space of all objects into equivalence classes, which are non-overlapping sets such that all objects within one equivalence class are equal to each other, while no two objects in different classes are equal. (Let's leave NaN out of it for now; it does not have a "proper" == function.) There's a nice picture on this Wikipedia page: http://en.wikipedia.org/wiki/Equivalence_relation

A trivial collection of equivalence classes is one where each object is in its own equivalence class. That's comparison-by-identity. It isn't very useful because we already have another operator that does the same partitioning.

A more useful partitioning is the one which puts all range objects with the same start/stop/step triple into the same equivalence class. This is the one I (still) like best.

Interestingly, the one that got the most votes so far is a proper "extension" of this one, in that equivalence according to equal start/stop/step triples implies equivalence according to this weaker definition. That's nice, because it means that there will probably be many use cases where either definition suffices (such as all use cases that only care about non-empty ranges with step==1).

(Note: __hash__ needs to create equivalence classes that are proper extensions of those created by __eq__. In terms of the Wikipedia picture, an extension is allowed to merge some equivalence classes but not to split them.)

BTW, I like Raymond's observation, and I agree that we should add slicing to range(), given that it already supports indexing; and slicing is a nice way to normalize the range. I just don't think that the status quo is better than either of the two proposed definitions for __eq__.

Finally. Still waiting for actual use cases.
Sorry, it all makes sense now. Please move on. Nothing to see here. :-) -- --Guido van Rossum (python.org/~guido)

On 13/10/2011 6:30pm, Guido van Rossum wrote:
Actually CPython's dict lookup does not check equivalence of keys using __eq__ directly. Instead it uses something similar to

    def eq(a, b):
        return a.__hash__() == b.__hash__() and a.__eq__(b)

This ensures compatibility with the equivalence classes for __hash__. (It is also an optimisation.)

Cheers, sbt
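The hash-before-__eq__ behaviour described above can be observed with a deliberately odd key type whose __eq__ always answers True. Key and probe are hypothetical names for this sketch, and the hash values 1 and 9 are arbitrary; the outcome does not depend on how they map to table slots.

```python
class Key:
    """Hypothetical key: fixed hash, __eq__ always True, and a call counter."""

    eq_calls = 0

    def __init__(self, h):
        self.h = h

    def __hash__(self):
        return self.h

    def __eq__(self, other):
        Key.eq_calls += 1
        return True


def probe():
    """Look up keys with a different and with an equal hash; report what happened."""
    Key.eq_calls = 0
    d = {Key(1): "stored"}
    miss = Key(9) in d               # different hash: __eq__ is not consulted
    calls_after_miss = Key.eq_calls
    hit = Key(1) in d                # same hash, different object: __eq__ decides
    return miss, calls_after_miss, hit, Key.eq_calls
```

Even though Key.__eq__ would claim the two keys are equal, the lookup with a different hash fails without ever calling it, which is why __hash__ and __eq__ have to agree.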

Guido van Rossum wrote:
Both proposals define proper equivalence classes -- there is no difference in this regard. In one proposal, equivalence is defined by identical behaviour, in the other equivalence is defined by identical parameters at creation time.

I still strongly lean towards the definition based on identical behaviour. If it wasn't for this particular choice of representation of ranges, there wouldn't be any way to distinguish the objects

    range(3, 8, 3)

and

    range(3, 9, 3)

They would be the same in every respect. It feels entirely artificial to me to consider them as not equal just because we used different parameters to create them (and someone chose to include these parameters in the representation).
range() already supports slicing, and it already does this:

    >>> r0 = range(3, 8, 3)
    >>> r1 = r0[:]
    >>> r1
    range(3, 9, 3)

If we adopted equality based on start/stop/step, this would lead to the somewhat paradoxical situation that r0 != r0[:], in contrast to the behaviour of all other sequences in Python.

Cheers, Sven

On Thu, Oct 13, 2011 at 1:29 PM, Sven Marnach <sven@marnach.net> wrote:
Both proposals define proper equivalence classes -- there is no difference in this regard.
I know. My point was that the equivalence classes "match up" in a way -- each equivalence class in the "as sequence" proposal comprises exactly one or more of the equivalence classes in the "as start/stop/step" proposal. (This is also why I had an aside about __hash__: each equivalence class for __hash__ comprises exactly one or more of the equivalence classes for __eq__.) -- --Guido van Rossum (python.org/~guido)
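The way the two sets of equivalence classes "match up" can be checked on a handful of ranges. The sketch below is illustrative only: it assumes the start, stop and step attributes exist (Python 3.3 and later), and the names triple_key, sequence_key, partition and refines are made up for the example.

```python
from collections import defaultdict


def triple_key(r):
    """Key for the 'as start/stop/step' proposal."""
    return (r.start, r.stop, r.step)


def sequence_key(r):
    """Key for the 'as sequence' proposal."""
    return tuple(r)


def partition(ranges, key):
    """Group ranges into classes by key; members are recorded as triples."""
    classes = defaultdict(set)
    for r in ranges:
        classes[key(r)].add(triple_key(r))
    return [frozenset(c) for c in classes.values()]


def refines(fine, coarse):
    """True if every class in `fine` lies inside a single class of `coarse`."""
    return all(any(f <= c for c in coarse) for f in fine)


SAMPLE = [range(0), range(1, 1), range(1, 1, 2),
          range(0, 9, 2), range(0, 10, 2),
          range(10), range(0, 10, 1)]
```

On this sample the triple-based partition has six classes and the sequence-based one has three, and each sequence class is a union of triple classes, never the other way round.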

+1 for refusing the temptation to guess. Neither equality definition seems obvious or handy enough to be favored by Python. That is, until some prevalent use cases are presented. --Yuval

On Thu, Oct 13, 2011 at 11:08 AM, Yuval Greenfield <ubershmekel@gmail.com> wrote:
Ah, but the stricter equality definition (by start/stop/step) also refuses to guess! It doesn't consider range(0, 0) and range(1, 1) as equivalent because, indeed, it would have to guess. But it will consider range(1) == range(1) since everybody considers those equivalent so there's no guess-work involved. The identity-based __eq__ does nobody any good. -- --Guido van Rossum (python.org/~guido)

On Wed, Oct 12, 2011 at 11:18 PM, Guido van Rossum <guido@python.org> wrote:
If range were a normal function, we would compare the output of the function call without worrying about what the inputs were. If range were a tuple, then the exact inputs would matter, but it isn't. The few times I wanted to compare ranges, I cared what sequence they produced, and *wanted* it to normalize out the arguments for me. That said, it was years ago, and I can't even remember whether or not I was working on a "real-world" problem at the time. -jJ

On Wed, Oct 12, 2011 at 05:31:44PM +0100, Sven Marnach wrote:
I'm -32767 on this whole idea. It's not obvious what comparing a range actually means, because a range is very abstract. If a developer needs to compare ranges they should write their own functions to do it in whatever way that fits their case.

Sven Marnach wrote:
I can see how that would be useful and straightforward, and certainly more useful than identity based equality. +1
I don't agree. Equality makes sense for ranges: two ranges are equal if they have the same start, stop and step values. But order comparisons don't have any sensible meaning: range objects are numeric ranges, integer-valued intervals, not generic lists, and it is meaningless to say that one range is less than or greater than another. Which is greater? range(1, 1000000, 1000) range(1000, 10000) The question makes no sense, and should be treated as an error, just as it is for complex numbers. -1 on adding order comparison operators. Aside: I'm astonished to see that range objects have a count method! What's the purpose of that? Any value's count will either be 0 or 1, and a more appropriate test would be `value in range`:
-- Steven

On Wed, Oct 12, 2011 at 6:33 PM, Steven D'Aprano <steve@pearwood.info> wrote:
I don't agree. Equality makes sense for ranges: two ranges are equal if they have the same start, stop and step values.
Hmm. I'm not sure that it's that clear cut. The other possible definition is that two ranges are equal if they're equal as lists. Should range(0, 10, 2) and range(0, 9, 2) be considered equal, or not? Agreed that it makes more sense to implement equality for ranges than the order comparisons. Mark

On Wed, Oct 12, 2011 at 1:58 PM, Mark Dickinson <dickinsm@gmail.com> wrote: ..
Should range(0, 10, 2) and range(0, 9, 2) be considered equal, or not?
I was going to ask the same question. I think ranges r1 and r2 should be considered equal iff list(r1) == list(r2). This is slightly harder to implement than just naively comparing (start, stop, step) tuples, but the advantage is that people won't run into surprises when they port 2.x code where result of range() is a list.

On Wed, Oct 12, 2011 at 1:58 PM, Mark Dickinson <dickinsm@gmail.com> wrote:
For equality and comparison, this should be the standard. range objects are sequences, and they should compare just like other sequences. If implemented at all, equality should be that they have the same items in the same order. If implemented at all, comparison should be lexicographic. It seems to me you'd need a really good reason to have behavior different from every other sequence. Mike

Steven D'Aprano schrieb am Do, 13. Okt 2011, um 04:33:49 +1100:
No, two ranges should be equal if they represent the same sequence, i.e. if they compare equal when converted to a list: range(0) == range(4, 4, 4) range(5, 10, 3) == range(5, 11, 3) range(3, 6, 3) == range(3, 4)
Well, it's meaningless unless you define what it means. Range objects are equal if they compare equal after converting to a list. You could define '<' or '>' the same way. All built-in sequence types support lexicographical comparison, so I thought it would be natural to bring the only one that behaves differently in line. (Special cases aren't special enough...) This is just to explain my thoughts, I don't have a strong opinion on this one. I'll try and prepare a patch for '==' and '!=' and add it to the issue tracker. Cheers, Sven

On Oct 12, 2011, at 12:36 PM, Sven Marnach wrote:
Given that there are two reasonably valid interpretations of equality (i.e. produces-equivalent-sequences or the stronger condition, has-an-equal-start/stop/step-tuple), we should acknowledge the ambiguity and refuse the temptation to guess. I vote for not defining equality for range objects -- it's not really an important service anyway (you really can live without it). Instead, let people explicitly compare the raw components, or if desired, compare components normalized by a slice:
range(2, 9, 2)[:] range(2, 10, 2)
Raymond

On Wed, Oct 12, 2011 at 9:31 AM, Sven Marnach <sven@marnach.net> wrote:
There are circumstances, for example in unit testing, when it might be useful to check if two range objects describe the same range.
Other than unit testing, what are the use cases? If I was writing a unit test, I'd be inclined to be very explicit about what I meant r1 is r2 repr(r1) == repr(r2) list(r1) == list(r2) Absent another use case, -1 --- Bruce w00t! Gruyere security codelab graduated from Google Labs! http://j.mp/googlelabs-gruyere Not to late to give it a 5-star rating if you like it. :-)

Bruce Leban schrieb am Mi, 12. Okt 2011, um 11:30:17 -0700:
Even with a useful '==' operator defined, you could still use 'r1 == r2' or 'r1 is r2', depending on the intended semnatics, just as with every other data type. You just wouldn't need to expand the range to a list. Comparing the representations doesn't ever seem useful, though. The only way to access the original start, stop and step values is by parsing the representation, and these values don't affect the behaviour of the range object in any other way. Moreover, they might even change implicitly: >>> range(5, 10, 3) range(5, 10, 3) >>> range(5, 10, 3)[:] range(5, 11, 3) I can't imagine any situation which I would like to consider the above two ranges different in. Cheers, Sven

Sven Marnach schrieb am Mi, 12. Okt 2011, um 21:33:26 +0100:
start, stop and step of course *do* affect the behaviour of the range object. What I meant is that the only way to tell the difference between two range objects defining the same sequence but creates with different values of start, stop and step is by looking at the representation. -- Sven

On Wed, Oct 12, 2011 at 1:33 PM, Sven Marnach <sven@marnach.net> wrote:
def test_copy_range(self): """Make sure that every time we call copy_range we get a new identical copy of the range.""" a = range(5, 10, 3) b = copy_range(a) c = copy_range(a) self.assert(a is not b) self.assert(a is not c) self.assert(b is not c) self.assert(repr(a) == repr(b)) self.assert(repr(a) == repr(c)) Anyway, my thought is that if you think this change should be made it would be helpful to have a use case other than unit tests as for those purposes, explicit list() or repr() is more clear and performance is not typically an issue. Why would you normally be comparing ranges at all? --- Bruce

Sven Marnach wrote:
Agreed -- comparing repr()s seems like a horrible way to do it. As far as comparing for equality, there's an excellent answer on StackOverflow -- http://stackoverflow.com/questions/7740796 def ranges_equal(a, b): return len(a)==len(b) and (len(a)==0 or a[0]==b[0] and a[-1]==b[-1]) ~Ethan~

I beg to differ with all those who want range(0, 10, 2) == range(0, 11, 2). After all the repr() shows the end point that was requested, not the end point after "normalization" (or whatever you'd call it) so the three parameters that went in should be considered state. OTOH range(10) == range(0, 10) == range(0, 10, 1). -- --Guido van Rossum (python.org/~guido)

Guido van Rossum wrote:
I beg to differ with all those who want range(0, 10, 2) == range(0, 11, 2).
I think practicality should beat purity here -- if the same results will be generated, then the ranges are the same and should be equal, no matter which exact parameters were used to create them.
Exactly. ~Ethan~

On Wed, Oct 12, 2011 at 3:57 PM, Ethan Furman <ethan@stoneleaf.us> wrote:
Then we'd be forever stuck with not exporting the start/stop/step values. I'd much rather export those (like the slice() object does). (IOW I find the lack of exported start/stop/step values an omission, not a feature, and would like to fix that too.)
Because their repr() is the same: "range(0, 10)", thus proving that the internal state is the same. For range objects, I believe that the internal state represents what theey really "mean", and the sequence of vallues generated by iterating merely follows. -- --Guido van Rossum (python.org/~guido)

On 12 October 2011 22:12, Ethan Furman <ethan@stoneleaf.us> wrote:
While I'm agnostic on the question if whether range(0,9,2) and range(0,10,2) are the same, I'd point out that ranges_equal is straightforward to write and says they are equal. But if you're in the camp of saying they are not equal, you appear to have no way of determining that *except* by comparing reprs, as range objects don't seem to expose their start, step and end values as attributes - unless I've missed something.
Rather than worrying about supporting equality operators on ranges, I'd suggest exposing the start, step and end attributes and then leaving people who want them to roll their own equality functions. Paul.

On Wed, Oct 12, 2011 at 4:48 PM, Paul Moore <p.f.moore@gmail.com> wrote:
Unless I misunderstood, Guido is basically saying the same thing (the "exposing" part, that is). +1 on exposing start, step and end +1 on leaving it at that (unless it turns out to be a common case) -eric

On Thu, Oct 13, 2011 at 11:45 AM, Eric Snow <ericsnowcurrently@gmail.com> wrote:
My reading is that Guido has reserved judgment on the second part for now. Options are: - do nothing for 3.3 (+0 from me) - make sequence comparison the default (+1 from me) - make start/stop/step comparison the default (-1 from me) If we do either of the latter, range.__hash__ should be updated accordingly (since 3.x range objects are currently hashable due to their reliance on the default identity comparison) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Wed, Oct 12, 2011 at 7:44 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
+1
Actually when I wrote that I was +1 on start/stop/step comparison and -1 on sequence (really: list) comparison. But I'd like to take a step back; we should really look at the use cases for comparing range objects. Since they don't return lists, you can't compare them to lists (or rather, they're always unequal). Because of this (and because it didn't work in 3.0, 3.1, 3.2) the proposed requirement that it should work the same as it did in Python 2 doesn't sway me. So what's the most useful comparison for range objects? When comparing non-empty ranges with step 1, I think we all agree. So we're left arguing about whether all empty ranges should be equal, and about whether non-empty ranges with step > 1 should compare equal if they have the same start, step and length (regardless of the exact value of stop). But why do we compare ranges? The first message in this thread (according to GMail) mentions unittests and suggests that it would be handy to check if two ranges are the same, but does not give a concrete example of such a unittest. The code example given uses list-wise comparison, but the use case is not elaborated further. Does anyone have an actual example of a unittest where being able to compare ranges would have been handy? Or of any other real-life example? Where it matter what happens if the range is empty or step is
1?
So, let me say I'm undecided (except on the desirability of an == test for ranges that's more useful than identity). FWIW, I don't think the argument from numeric comparisons carries directly. The reason numeric comparisons (across int, float and Decimal) ignore certain "state" of the value (like precision or type) is that that's how we want our numbers to work. The open question so far is: How do we want our ranges to work? My intuition is weak, but says: range(0) != range(1, 1) != range(1, 1, 2) and range(0, 10, 2) != range(0, 11, 2); all because the arguments (after filling in the defaults) are different, and those arguments can come out using the start, stop, step attributes (once we implement them :-).
Sure. That's implied when __eq__ is updated (though a good reminder for whoever will produce the patch). (I'm also -1 on adding ordering comparisons; there's little disagreement on that issue.) PS. An (unrelated) oddity with range and Decimal:
So int() knows something that range() doesn't. :-) -- --Guido van Rossum (python.org/~guido)

On Thu, Oct 13, 2011 at 1:18 PM, Guido van Rossum <guido@python.org> wrote:
Between this and Raymond's point about slicing permitting easy and cheap normalisation of endpoints, I'm convinced that, if we add direct comparison of ranges at all, then start/stop/step comparison is the way to go.
Yeah, range() wants to keep floats far away, so it only checks __index__, not __int__. So Decimal gets handled the same way float does (i.e. not allowed directly, but permitted after explicit coercion to an integer). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Wed, Oct 12, 2011 at 9:53 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Thanks. Maybe I can nudge you a little more in the direction of my proposal by speaking about equivalence classes. A proper == function partitions the space of all objects into equivalence classes, which are non-overlapping sets such that all objects within one equivalence class are equal to each other, while no two objects in different classes are equal. (Let's leave NaN out of it for now; it does not have a "proper" == function.) There's a nice picture on this Wikipedia page: http://en.wikipedia.org/wiki/Equivalence_relation A trivial collection of equivalence classes is one where each object is in its own equivalence class. That's comparison-by-identity. It isn't very useful because we already have another operator that does the same partitioning. A more useful partitioning is the one which puts all range objects with the same start/stop/step triple into the same equivalence class. This is the one I (still) like best. Interestingly, the one that got the most votes so far is a proper "extension" of this one, in that equivalence according to equal start/stop/step triples implies equivalence according to this weaker definition. That's nice, because it means that there will probably be many use cases where either definition suffices (such as all use cases that only care about non-empty ranges with step==1). (Note: __hash__ needs to create equivalence classes that are proper extensions of those created by __eq__. In terms of the Wikipedia picture, an extension is allowed to merge some equivalence classes but not to split them.) BTW, I like Raymond's observation, and I agree that we should add slicing to range(), given that it already supports indexing; and slicing is a nice way to normalize the range. I just don't think that the status quo is better than either of the two proposed definitions for __eq__. Finally. Still waiting for actual use cases.
Sorry, it all makes sense now. Please move on. Nothing to see here. :-) -- --Guido van Rossum (python.org/~guido)

On 13/10/2011 6:30pm, Guido van Rossum wrote:
Actually, CPython's dict lookup does not check equivalence of keys using __eq__ directly. Instead it uses something similar to

def eq(a, b):
    return a.__hash__() == b.__hash__() and a.__eq__(b)

This ensures compatibility with the equivalence classes for __hash__. (It is also an optimisation.) Cheers, sbt
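The effect of comparing hashes before calling `__eq__` can be observed with a deliberately inconsistent key type. This is only a sketch: `Loose` and `lookup_eq` are hypothetical names, and `lookup_eq` is an approximation of the comparison a dict performs (including the identity shortcut), not CPython's actual code.

```python
class Loose:
    """Hypothetical key whose __eq__ is coarser than its __hash__ (a contract violation)."""

    def __init__(self, tag):
        self.tag = tag

    def __eq__(self, other):
        return isinstance(other, Loose)   # every Loose equals every other Loose

    def __hash__(self):
        return hash(self.tag)             # but the hash depends on the tag


def lookup_eq(a, b):
    """Approximation of the key comparison used during dict lookup."""
    return a is b or (hash(a) == hash(b) and a == b)


a, b = Loose(1), Loose(2)
table = {a: "found"}

print(a == b)            # __eq__ alone says the keys are equal
print(lookup_eq(a, b))   # the hash check rejects them first
print(b in table)        # so the dict does not find b
print(Loose(1) in table) # same hash and equal: found
```

Because the hash comparison comes first, two keys that `__eq__` considers equal are still treated as different dict keys when their hashes differ. This is why `__hash__` may merge the classes defined by `__eq__` but must never split them.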

Guido van Rossum wrote:
Both proposals define proper equivalence classes -- there is no difference in this regard. In one proposal, equivalence is defined by identical behaviour, in the other equivalence is defined by identical parameters at creation time. I still strongly lean towards the definition based on identical behaviour. If it weren't for this particular choice of representation of ranges, there wouldn't be any way to distinguish the objects range(3, 8, 3) and range(3, 9, 3). They would be the same in every respect. It feels entirely artificial to me to consider them as not equal just because we used different parameters to create them (and someone chose to include these parameters in the representation).
range() already supports slicing, and it already does this:

>>> r0 = range(3, 8, 3)
>>> r1 = r0[:]
>>> r1
range(3, 9, 3)

If we adopted equality based on start/stop/step, this would lead to the somewhat paradoxical situation that r0 != r0[:], in contrast to the behaviour of all other sequences in Python. Cheers, Sven
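The two candidate definitions can be written as plain functions, which makes the slicing paradox easy to reproduce without relying on what `==` does in any particular Python version. This is a sketch under assumptions: `eq_by_params` needs the `start`/`stop`/`step` attributes (Python 3.3 or later), `eq_as_sequence` follows the `ranges_equal` helper from the opening message, and both names are hypothetical.

```python
def eq_by_params(r0, r1):
    """Stricter definition: equal start/stop/step triples."""
    return (r0.start, r0.stop, r0.step) == (r1.start, r1.stop, r1.step)


def eq_as_sequence(r0, r1):
    """Weaker definition: both ranges produce the same items."""
    if len(r0) == 0:
        return len(r1) == 0
    # Same length, first item and last item imply the same step as well.
    return len(r0) == len(r1) and r0[0] == r1[0] and r0[-1] == r1[-1]


r0 = range(3, 8, 3)
r1 = r0[:]                           # slicing normalizes the stop value

print(r1)                            # range(3, 9, 3)
print(list(r0) == list(r1))          # the items are identical
print(eq_as_sequence(r0, r1))        # equal as sequences
print(eq_by_params(r0, r1))          # not equal by creation parameters
print(eq_as_sequence(range(0, 0), range(1, 1)))  # all empty ranges are equal
print(eq_by_params(range(0, 0), range(1, 1)))    # the strict definition disagrees
```

Under the parameter-based definition a range differs from its own full slice, while the sequence-based definition treats them as the same object in every observable respect.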

On Thu, Oct 13, 2011 at 1:29 PM, Sven Marnach <sven@marnach.net> wrote:
Both proposals define proper equivalence classes -- there is no difference in this regard.
I know. My point was that the equivalence classes "match up" in a way -- each equivalence class in the "as sequence" proposal comprises exactly one or more of the equivalence classes in the "as start/stop/step" proposal. (This is also why I had an aside about __hash__: each equivalence class for __hash__ comprises exactly one or more of the equivalence classes for __eq__.) -- --Guido van Rossum (python.org/~guido)

+1 for refusing the temptation to guess. Neither equality definition seems obvious or handy enough to be favored by Python. That is, until some prevalent use cases are presented. --Yuval

On Thu, Oct 13, 2011 at 11:08 AM, Yuval Greenfield <ubershmekel@gmail.com> wrote:
Ah, but the stricter equality definition (by start/stop/step) also refuses to guess! It doesn't consider range(0, 0) and range(1, 1) as equivalent because, indeed, it would have to guess. But it will consider range(1) == range(1) since everybody considers those equivalent so there's no guess-work involved. The identity-based __eq__ does nobody any good. -- --Guido van Rossum (python.org/~guido)

On Wed, Oct 12, 2011 at 11:18 PM, Guido van Rossum <guido@python.org> wrote:
If range were a normal function, we would compare the output of the function call without worrying about what the inputs were. If range were a tuple, then the exact inputs would matter, but it isn't. The few times I wanted to compare ranges, I cared what sequence they produced, and *wanted* it to normalize out the arguments for me. That said, it was years ago, and I can't even remember whether or not I was working on a "real-world" problem at the time. -jJ

On Wed, Oct 12, 2011 at 05:31:44PM +0100, Sven Marnach wrote:
I'm -32767 on this whole idea. It's not obvious what comparing a range actually means, because a range is very abstract. If developers need to compare ranges, they should write their own functions to do it in whatever way fits their case.
participants (20)
- Alexander Belopolsky
- Antoine Pitrou
- Bruce Leban
- Eric Snow
- Ethan Furman
- Georg Brandl
- Greg Ewing
- Guido van Rossum
- Jim Jewett
- Mark Dickinson
- Mike Graham
- Nick Coghlan
- Paul Moore
- Raymond Hettinger
- Ron Adam
- shibturn
- Steven D'Aprano
- Sven Marnach
- Westley Martínez
- Yuval Greenfield