Re: [Python-ideas] [Python-Dev] Inclusive Range

On Mon, Oct 4, 2010 at 5:27 PM, Xavier Morel <python-dev@masklinn.net> wrote:
A flag doesn't have any chance either - you spell inclusive ranges by including a "+1" on the stop value.

Closed ranges actually do superficially appear more intuitive (especially to new programmers) because we often use inclusive ranges in ordinary speech ("10-15 people" allows 15 people, "ages 8-12" includes 12 year olds, "from A-Z" includes items starting with "Z"). However, there are some cases where we naturally use half-open ranges as well (such as "between 10 and 12" excluding 12:01 to 12:59) or explicitly invoke exclusive ranges as being easier to deal with (such as the "under 13s", "under 19s", etc. naming schemes used for age brackets in junior sports).

However, as soon as you move into the mathematical world (including programming), closed ranges turn out to require constant adjustments in the arithmetic, so it is far more natural to use half-open ranges consistently. Xavier noted the two most important properties of half-open ranges for Python: they match the definition of subtraction, such that len(range(start, stop)) == (stop - start), and they match the definition of slicing as being half-open.

As to whether slicing itself being half-open is beneficial, the value of that becomes clear once you start trying to manipulate ranges. With half-open slices, the following is true: s == s[:i] + s[i:]. With inclusive slices (which would be needed to complement inclusive range), you would need either a -1 on the stop value of the first slice, or a +1 on the start value of the second slice. Similarly, if you know the length of the slice you want, then you can grab it via s[i:i+slice_len], while you'd need a -1 correction on the stop value if slices were inclusive.

There are other benefits to half-open ranges when it comes to (approximately) continuous spectra like time values, floating point numbers and lexically ordered strings. Being able to say things like "10:00" <= x < "12:00", 10.0 <= x < 12.0, or "a" <= x < "n" is much clearer than trying to specify their closed range equivalents. While that isn't specifically applicable to the range() builtin, it is another factor in why it is important to drink the "half-open ranges are your friend" Kool-Aid as a serious programmer.

Cheers, Nick.

P.S. Many of the points above are just rephrased from http://www.siliconbrain.com/ranges.htm, which is the first hit when Googling "half-open ranges"

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
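A quick demonstration of the identities Nick describes, using only standard Python behaviour:

    s = list(range(10, 20))
    start, stop = 3, 7

    # The length of a half-open range matches plain subtraction.
    assert len(range(start, stop)) == stop - start

    # Slices recombine with no +1/-1 corrections.
    i = 4
    assert s == s[:i] + s[i:]

    # Grabbing a run of known length needs no correction either.
    slice_len = 5
    assert s[i:i + slice_len] == s[4:9]
    assert len(s[i:i + slice_len]) == slice_len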

Changing range would only make sense if lists were also changed to start at 1 instead of 0, and that's never gonna happen. It's a massively backwards incompatible change with no real offsetting advantage. Still, if you were designing a brand new language today, would you have arrays/lists start at 0 or 1? (Or compromise and do .5?) I personally lean towards 1, since I recall being frequently tripped up by the first element in an array being a[0] way back when I first learned C++ in the 20th century. But maybe this was because I had been messed up by writing BASIC for loops from 1 to n before that? Is there anyone with teaching experience here? Is this much of a problem for young people learning Python (or any other zero-based indexing language) as their first language? What do you guys think? Now that simplifying pointer arithmetic isn't such an important consideration, is it still better to do zero-based indexing? -- Carl Johnson

On 2010-10-05, at 10:54 , Carl M. Johnson wrote:
I will refer to EWD 831[0], which talks about ranges and starting indexes without *once* referring to pointers. Pointers are in fact entirely irrelevant to the discussion: FORTRAN and ALGOL 60, among many others, used 1-indexed collections. Some languages (Ada, I believe, though I am by no means certain) also allow for arbitrary starting indexes. [0] http://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF

I did some research before posting and saw that they talked about that Dijkstra paper on C2's page about zero indexing, and honestly, I count it as a point in favor of starting with 1. Dijkstra was a great computer scientist but a terrible computer programmer (with the exception of "Goto Considered Harmful" [a headline he didn't actually give to his article]) in the sense that he understood how to do things mathematically but not how to take into account the human factors in such a way that one can get normal people to program well. His theory that we should all be proving the correctness of our programs is, to my way of thinking, a crank's theory. If regular people can't be trusted to program, they certainly can't be trusted to write correctness proofs, which is a harder task, not a simpler one. Moreover, this ignores all of the stuff that Paul Graham would eventually say about the joys of exploratory programming, or to give an earlier reference, the need to build one to throw away as Brooks said. Proving correctness presumes that you know what you want to program before you start programming it, which is only rarely the case, mostly in the computer science classroom. So, I don't consider Dijkstra's expertise to be worth relying on in matters of programming, as distinct from matters of computer science.

In the particular case, the correct way to represent an integer between 2 and 12 wouldn't be a, b, c, or d. It would be i in range(2, 12) (if we were creating a new language that was 1 indexed and range was likewise adjusted), the list [1] would be range(1), and the empty list would be range(0), so the whole issue could be neatly sidestepped. :-)

As for l == l[:x] + l[x:y] + l[y:] where y > x, I think a case can be made that it would be less confusing as l == l[:x] + l[x+1:y] + l[y+1:], since you don't want to start again with x or y. You just ended at x. When you pick up again, you want to start at x+1 and y+1 so that you don't get the x-th and y-th elements again. ;-)

Of course this is speculation on my part. Maybe students of programming find 1-indexing just as confusing as 0-indexing. Any pedagogues want to chime in? -- Carl Johnson
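To make Carl's slicing comparison concrete: the first assertion below shows today's half-open slicing recombining cleanly, while the second emulates the inclusive slicing he describes. The incl() helper is hypothetical, introduced only for illustration.

    l = list("abcdefg")
    x, y = 2, 5

    # Current half-open slicing: the pieces recombine with no +1 shifts.
    assert l == l[:x] + l[x:y] + l[y:]

    # Hypothetical inclusive slicing (both endpoints included), emulated with
    # a helper; the +1 shifts Carl mentions become necessary.
    def incl(seq, start=None, stop=None):
        start = 0 if start is None else start
        stop = len(seq) - 1 if stop is None else stop
        return seq[start:stop + 1]

    assert l == incl(l, None, x) + incl(l, x + 1, y) + incl(l, y + 1, None)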

On Tue, Oct 5, 2010 at 1:05 AM, Masklinn <masklinn@masklinn.net> wrote:
He was trying to be language neutral by writing using < and <= but that's part of his problem. He's too much of a mathematician. Rewriting things so that they don't use < or <= at all is the best way to explain things to a non-math person. If you say "range(1, 5) gives a range from 1 to 5", your explanation doesn't have to use < or <= at all. This is unlike a C-like language, where you would write for (int i = 2; i < 12; i++). So the question of what mathematics "really" underlies it can be sidestepped by using a language that many people know better than the language of mathematics: the English language.
Because (speaking naively) I already FizzBuzzed the x-th element before. I don't want to double FizzBuzz it. So that means I should start up again with the +1 element.
Yup. TANSTAAFL. That's why we shouldn't actually bother to change things: you lose on the backend what you gain on the frontend. I'm just curious about whether starting programmers have a strong preference for one or the other convention or whether both are confusing.

On 5 October 2010 12:51, Carl M. Johnson <cmjohnson.mailinglist@gmail.com>wrote:
Both when teaching new programmers and when working with programmers coming from other languages, I've found them confused by the range behaviour, and I usually end up having to apologise for it (a sure sign of a language wart). It is *good* that range(5) produces 5 values (0 to 4) but *weird* that range(3, 10) doesn't include the 10. Changing it now would be *very* backwards incompatible of course. Python 4 perhaps? All the best, Michael Foord

On 05/10/2010 14:13, C. Titus Brown wrote:
Yes. That is why I said that the current behaviour of range for a single input is *good*. Perhaps I should have been clearer; it is only the behaviour of range(x, y) that I've found people-new-to-python confused by. All the best, Michael
?
--titus
-- http://www.voidspace.org.uk/blog

On Tue, Oct 5, 2010 at 9:16 AM, Michael Foord <fuzzyman@voidspace.org.uk> wrote:
... Perhaps I should have been clearer; it is only the behaviour of range(x, y) that I've found people-new-to-python confused by.
Teach them about range(x, y, z) and once you cover negative z they will stop complaining about range(x, y). :-) At least you don't have to deal with range vs. xrange in 3.x anymore. IMO, range([start,] stop[, step]) is one of the worst interfaces in Python. Is there any other function with an optional *first* argument? Why can't range(date(2010, 1, 1), date(2010, 2, 1), timedelta(1)) be used to produce the days in January? Why does range(2**300) succeed, but len(range(2**300)) raise OverflowError? No, I don't think much can be done about it. Py3k has already done everything that was practical about improving range(..).
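For what it's worth, a generic half-open range over anything that supports + and < is easy to sketch; generic_range() below is a hypothetical helper, not anything in the stdlib, and assumes an increasing step.

    from datetime import date, timedelta

    def generic_range(start, stop, step):
        # Yield start, start+step, ... while the value is still below stop.
        current = start
        while current < stop:
            yield current
            current += step

    january = list(generic_range(date(2010, 1, 1), date(2010, 2, 1), timedelta(1)))
    assert len(january) == 31
    assert january[-1] == date(2010, 1, 31)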

On Tue, Oct 5, 2010 at 10:47 AM, Masklinn <masklinn@masklinn.net> wrote: ..
This particular wart is the subject of issue 2690. http://bugs.python.org/issue2690

On 05/10/2010 15:33, Alexander Belopolsky wrote:
Well, it probably doesn't help (for those coming to Python from languages other than C) that some languages do-the-right-thing with ranges. <0.5 wink> $ irb
(1..3).to_a => [1, 2, 3]
All the best, Michael Foord

On 2010-10-05, at 16:51 , Michael Foord wrote:
On the other hand (for Ruby),
(1...3).to_a => [1, 2]
Ruby is also a bit different in that ranges are generally used more for containment-testing (via when) and there is a separate Fixnum.upto for iteration.

On 10/5/10, C. Titus Brown <ctb@msu.edu> wrote:
On Tue, Oct 05, 2010 at 02:07:41PM +0100, Michael Foord wrote:
It is *good* that range(5) produces 5 values (0 to 4)
If not for compatibility, the 5 values (1,2,3,4,5) would be even better. But even in a new language, changing the rest of the language so that (1,2,3,4,5) was more useful might not be a win.
Doesn't it make sense that ... for i in range(5): mimics the C/C++ behavior of for (i = 0; i < 5; i++)
If not for assumed familiarity with C idioms, why shouldn't it instead match for (i=1; i<=5; i++) -jJ

On the more general topic of *teaching* 0-based indexing, the best explanation I've seen is the one where 1-based indexing is explained as referring directly to the items in the sequence, while 0-based indexing numbers the implicit gaps between items and then returns the item immediately after the identified gap. Slicing for 0-based indexing can then be explained without needing to talk about half-open ranges at all - you just grab everything between the two identified gaps*.

I think the main point here is that these are not independent design decisions - the behaviour of range() (or its equivalent), indexing, slicing, enumeration and anything else related to sequences all comes back to a single fundamental design choice of 1-based vs 0-based indexing. Once you make that initial decision (regardless of the merits either way), other decisions are going to flow from it as consequences, and it isn't really something a language can ever practically tinker with.

Cheers, Nick.

*(unfortunately, it's a bit trickier to mesh that otherwise clear and concise explanation cleanly with Python's definition of ranges and slicing with negative step values, since those offset everything by one, such that "list(reversed(range(1, 5, 1))) == list(range(4, 0, -1))". If I was going to ask for a change to anything in Python's indexing semantics, it would be for negative step values to create ranges that were half-open at the beginning rather than the end, such that reversing a slice just involved swapping the start value with the stop value and negating the step value. As it is, you also have to subtract one from both the start and stop value to get the original range of values back. However, just like the idea of ranges starting from 1 rather than 0, the idea of negative slices giving ranges half-open at the start rather than the end is also doomed by significant problems with backwards compatibility. For a new language, you might be able to make the argument that the alternative behaviour is a better design choice. For an existing one like Python, any possible benefits are so nebulous as to not be worth the inevitable hassle involved in changing the semantics)

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
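The asymmetry described in that footnote is easy to check at the interpreter; this is current Python behaviour, nothing hypothetical.

    # Reversing a half-open range with a negative step needs both endpoints
    # shifted by one, not just swapped.
    assert list(reversed(range(1, 5, 1))) == [4, 3, 2, 1]
    assert list(range(4, 0, -1)) == [4, 3, 2, 1]   # start and stop both offset by 1
    assert list(range(5, 1, -1)) == [5, 4, 3, 2]   # a naive swap yields different values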

On 10/06/2010 08:58 AM, Nick Coghlan wrote:
Yes, negative slices are very tricky to get right. They could use some attention I think.
We don't need to change the current range function/generator to add inclusive or closed ranges. Just add a closed_range() function to the itertools or math module.

[n for n in closed_range(-5, 5, 2)] --> [-5, -3, -1, 1, 3, 5]

I just noticed the __getslice__ method is no longer on sequences. (?)

My preference is for slicing to be based more on practical terms for manipulating sequences rather than be defined in a purely mathematical way.

1. Have the direction determined by the start and stop values rather than by the step value, so that the following is true.

"abcdefg"[start:stop:step] == "abcdefg"[start:stop][::step]

Reversing the slice can be done by simply swapping the start and stop. Negating the slice too would give you ...

"abcdefg"[start:stop:step] == "abcdefg"[stop:start:-step]

Negating the step would not always give you the reverse sequence for steps larger than 1, because the result may not contain the same values.
This is the current behavior and wouldn't change. A positive step value would step from the left, and a negative step value would step from the right of the slice determined by start and stop. This already works if you don't give stop and start values.
And these can be used in for loops or list comps.
>>> [c for c in "abcdefg"[::2]]
['a', 'c', 'e', 'g']
If we could add a width value to slices we would be able to do this.
"abcdefg"[::2:2] 'abcdefg'
;-) As unimpressive as that looked, when used in a for loop or list comp it would give us an easy and useful way to step through data.

[cc for cc in "abcdefg"[::2:2]] --> ['ab', 'cd', 'ef', 'g']

You could also spell that as...

list("abcdefg")[::2:2] --> ['ab', 'cd', 'ef', 'g']

The problems start when you try to use actual index values to specify start and stop ranges. You can't index the last element with an explicit stop value.
"abcdefg"[0:-1] 'abcdef'
"abcdefg"[0:-0] ''
But we can use "None" which is awkward and requires testing the stop value when the index is supplied by a variable.
>>> 'abcdefg'[:None]
'abcdefg'
I'm not sure how to fix this one. We've been living with this for a long time so it's not like we need to fix it all at once. Negative indexes can be confusing.
With the suggested change we get...
I think these are easier to use than the current behavior. It doesn't change slices using positive indexes and steps so maybe it's not so backward incompatible to sneak in. ;-) Ron
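Two small sketches of the ideas above: closed_range() matches the example Ron gives, and chunked() approximates what the hypothetical "width" slice [::2:2] would produce. The names and any module placement are assumptions, for illustration only.

    def closed_range(start, stop, step=1):
        # Like range(), but including stop when the step lands on it.
        if step == 0:
            raise ValueError("step must not be zero")
        # Widen the half-open stop by one unit in the direction of travel.
        return range(start, stop + (1 if step > 0 else -1), step)

    assert list(closed_range(-5, 5, 2)) == [-5, -3, -1, 1, 3, 5]
    assert list(closed_range(5, 1, -1)) == [5, 4, 3, 2, 1]

    def chunked(seq, size):
        # Group seq into pieces of `size` items; the last piece may be shorter.
        return [seq[i:i + size] for i in range(0, len(seq), size)]

    assert chunked("abcdefg", 2) == ['ab', 'cd', 'ef', 'g']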

On Oct 6, 2010, at 12:21 PM, Ron Adam wrote:
We don't need to change the current range function/generator to add inclusive or closed ranges. Just add a closed_range() function to the itertools or math module.
[n for n in closed_range(-5, 5, 2)] --> [-5, -3, -1, 1, 3, 5]
If I were a betting man, I would venture that you could post a recipe for closed_range(), publicize it on various mailing lists, mention it in talks, and find that it would almost never get used. There's nothing wrong with the idea, but the YAGNI factor will be hard to overcome. IMO, this would become cruft on the same day it gets added to the library. OTOH for numerical applications, there is utility for a floating point variant, something like linspace() in MATLAB. Raymond
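For reference, a linspace-style helper of the sort Raymond alludes to can be sketched in a few lines; this is a hypothetical helper, not the MATLAB or numpy API.

    def linspace(start, stop, num):
        # Return num evenly spaced floats from start to stop, endpoint included.
        if num < 2:
            return [float(start)] * num
        step = (stop - start) / (num - 1)
        # Append the exact endpoint rather than accumulating rounding error.
        return [start + i * step for i in range(num - 1)] + [float(stop)]

    assert linspace(0.0, 1.0, 5) == [0.0, 0.25, 0.5, 0.75, 1.0]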

On 06/10/10 22:35, Raymond Hettinger wrote:
There are plenty of places in my code where I would find such a thing useful, though... usually where I'm working with pre-determined integer codes (one very specific use-case: elementary particle ID codes, which are integers constructed from quantum number values) and it's simply more elegant and intuitive to specify a range whose requested upper bound is a valid code rather than valid_code+1. IMHO, an extra keyword on range/xrange would allow us to write nicer code where applicable, without crufting up the library with whole extra functions. Depends on what you consider more crufty, I suppose, but I agree that ~no-one is going to find and import a new range function. numpy.linspace uses "endpoint" as the name for such a keyword: http://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html#nump... but again no-one wants to depend on numpy *just* to get that functionality! So how about range(start, realend, endpoint=True) and xrange(start, realend, endpoint=True), with endpoint=False as default? No backward compatibility or performance issues to my (admittedly inexpert) eye. Andy

On Wed, 06 Oct 2010 14:21:03 -0500 Ron Adam <rrr@ronadam.com> wrote:
Please provide an example with current and proposed semantics. If I understand correctly, this does not work in practice. When range bounds are variable (the result of a computation), the upper one can happen to be smaller than the lower one, and we just want the resulting sub-sequence to be empty. This is a normal and common use case, and this is good. (upper <= lower) ==> [] Otherwise many routines would have to special-case (upper < lower). Your proposal, again if I understand, would break this semantics, instead returning a sub-sequence in reverse order. Denis -- -- -- -- -- -- -- vit esse estrany ☣ spir.wikidot.com
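A tiny illustration of the property Denis is defending, under current semantics:

    # Crossed or equal bounds silently give an empty result, so calling code
    # needs no special case.
    assert list(range(5, 5)) == []
    assert list(range(7, 3)) == []
    assert "abcdef"[4:2] == ""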

On Wed, 6 Oct 2010 23:58:48 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:
On the more general topic of *teaching* 0-based indexing, the best explanation I've seen is the one where 1-based indexing is explained as referring directly to the items in the sequence, while 0-based indexing numbers the implicit gaps between items and then returns the item immediately after the identified gap. Slicing for 0-based indexing can then be explained without needing to talk about half-open ranges at all - you just grab everything between the two identified gaps*.
In my experience, the only explanation that makes sense for newcomers is that 1-based indexes are just ordinary ordinals like we use every day, while 0-based ones are _offsets_ measured from the start. It does not really help in practice (people make errors anyway), but at least they understand the logic, so they can reason when needed, namely to correct their errors.
I think there are languages with base 0 & closed ranges, unless it was base 1 & half-open ranges. Any convention works, practically. Also, the logic of every supporter of the C convention, namely the famous text by EWD, reverses your argumentation: he shows the advantages of half-open intervals (in his opinion), then that 0-based indexes fit better with that kind of interval (ditto). Denis -- -- -- -- -- -- -- vit esse estrany ☣ spir.wikidot.com

Carl M. Johnson wrote:
Starting programmers don't have enough experience to judge which will be less confusing in the long run, so their opinion shouldn't be given overriding weight when designing a language intended for real-life use. Speaking as an experienced programmer, I'm convinced that Python has made the right choice. Not because Dijkstra or any other authority says so, but because of my own personal experiences. -- Greg

With 1-based indexes, sometimes you have to add 1 and sometimes subtract 1 and sometimes neither. 0-based indexes avoid that problem. Personally, I think changing any of this behavior has about the same probability of success as adding bleen <http://www.urbandictionary.com/define.php?term=bleen>. --- Bruce http://www.vroospeak.com http://j.mp/gruyere-security On Tue, Oct 5, 2010 at 4:33 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:

On 10/5/2010 4:54 AM, Carl M. Johnson wrote:
Sequences are often used as and can be viewed as tabular representations of functions for equally spaced inputs a+0*b, a+1*b, ..., a+i*b, .... In the simplest case, a==0 and b==1, so that the sequence directly maps counts 0,1,2,... to values. Without the 0 index, one must subtract 1 from each index to have the same effect. Pointer arithmetic is an example of the utility of keeping the 0 term, but only one such example of many. When one uses iterators instead of sequences, as is more common in Python 3, there is no inherent index to worry about or argue over.

def inner_product(p, q):
    # no equal, finite len() check!
    sum = 0
    for a, b in zip(p, q):
        sum += a * b
    return sum

No index in sight. -- Terry Jan Reedy
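A quick check of that zip-based loop, assuming it returns the accumulated sum as written above:

    assert inner_product([1, 2, 3], [4, 5, 6]) == 32   # 1*4 + 2*5 + 3*6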

Nick Coghlan wrote:
makes one wonder about syntax like:

    for 10 <= x < 20:
        blah(x)

Mh, I suppose with rich comparison special methods, it's possible to turn chained comparisons into range factories without introducing new syntax. Something more like

    for x in (10 <= step(1) < 20):
        blah(x)
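That trick does work in principle, because "10 <= s < 20" evaluates as "(10 <= s) and (s < 20)" with s evaluated once, so the object can record the lower bound in its reflected __ge__ and hand back an iterable from the right-hand comparison. A minimal sketch, assuming integer bounds given lower-bound-first; the step class is hypothetical, for illustration only.

    class step:
        def __init__(self, stride=1):
            self.stride = stride
            self.lower = None

        # "10 <= s" falls back to s.__ge__(10) because int.__le__ returns
        # NotImplemented for an unknown right-hand type.
        def __ge__(self, bound):
            self.lower = bound
            return self             # truthy, so the chained "and" keeps going

        def __gt__(self, bound):
            self.lower = bound + 1  # exclusive lower bound
            return self

        # The right-hand comparison then produces the actual iterable.
        def __lt__(self, bound):
            return range(self.lower, bound, self.stride)

        def __le__(self, bound):
            return range(self.lower, bound + 1, self.stride)

    assert list(10 <= step(1) < 20) == list(range(10, 20))
    assert list(10 <= step(2) <= 20) == [10, 12, 14, 16, 18, 20]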

On Tue, 05 Oct 2010 13:45:56 +0200 Boris Borcic <bborcic@gmail.com> wrote:
About notation, even if loved right-hand-half-open intervals, I would wonder about [a,b] noting it. I guess 99.9% of programmers and novices (even purely amateur) have learnt about intervals at school in math courses. Both notations I know of use [a,b] for closed intervals, while half-open ones are noted either [a,b[ or [a,b). Thus, for me, the present C/python/etc notation is at best misleading. So, what about a hypothetical language using directly math *unambiguous* notation, thus also letting programmers chose their preferred semantics (without fooling others)? End of war? Denis -- -- -- -- -- -- -- vit esse estrany ☣ spir.wikidot.com

participants (16)
- Alexander Belopolsky
- Andy Buckley
- Boris Borcic
- Bruce Leban
- C. Titus Brown
- Carl M. Johnson
- Greg Ewing
- Jim Jewett
- Masklinn
- Michael Foord
- MRAB
- Nick Coghlan
- Raymond Hettinger
- Ron Adam
- spir
- Terry Reedy