use type hints and slices to specify a valid numerical range, example: `Angle = int[0:361]`
This was inspired by a tweet today from Brandon Rhodes. I looked for something like it on the mypy issues page and didn't find anything.

Would it make good semantic sense- and be useful- to specify valid numerical ranges using slices and type-hint syntax? My suggestion would be to, at minimum, provide this functionality for int and float.

Consider that currently we have:

x: int    # this means "all ints", today
x: float  # this means "all floating point numbers", today

Idea in a nutshell would be for the following type declarations to mean:

x: int[0:]      # any ints greater than or equal to zero would match, others would fail
x: int[:101]    # any ints less than 101 match
x: int[0:101:2] # even ints less than 101

# for floats, it's a bit odd since the upper bound is typically exclusive...
# but we might prefer it to be inclusive? Not sure.
# regardless, any float within the range matches
import math
x: float[-math.pi:math.pi]  # any angle in radians from -pi to pi
x: float[-180:180]          # any angle in degrees from -180 to 180

So going with the angle example, one use-case might be something like:

"""my_module.py"""
import math

def atan_deg_int(v: float) -> int[-90:91]:
    """Computes the angle in integer degrees using arctangent."""
    return round(math.atan(v) * 180 / math.pi)

if __name__ == "__main__":
    AngleDeg = float[0:360]  # a positive angle up to 360 degrees
    x: AngleDeg
    x = atan_deg_int(-0.5)

Then we would get an error similar to below:
mypy my_module.py
<string>:1: error: Incompatible types in assignment (expression has type "int[-90:91]", variable has type "float[0:360]")
Here the error occurs not because ints are considered inconsistent with floats (mypy allows ints in the place of floats, so no problem there), but because some of the ints from -90 to 90 (inclusive) fall outside the float range of 0 to 360.

---
Ricky.

"I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler
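The proposed `int[0:361]` spelling is hypothetical and is a TypeError at runtime today, but the intended meaning of the subscript can be sketched as a purely runtime check. A minimal sketch, assuming an invented `BoundedInt` helper that no static checker understands:

class BoundedInt(int):
    _lo = None  # inclusive lower bound; None means unbounded
    _hi = None  # exclusive upper bound; None means unbounded

    def __class_getitem__(cls, bounds):
        # BoundedInt[0:361] arrives here as slice(0, 361, None)
        name = f"BoundedInt[{bounds.start}:{bounds.stop}]"
        return type(name, (cls,), {"_lo": bounds.start, "_hi": bounds.stop})

    def __new__(cls, value):
        obj = super().__new__(cls, value)
        if cls._lo is not None and obj < cls._lo:
            raise ValueError(f"{int(obj)} is below the lower bound {cls._lo}")
        if cls._hi is not None and obj >= cls._hi:
            raise ValueError(f"{int(obj)} is at or above the upper bound {cls._hi}")
        return obj

Angle = BoundedInt[0:361]
Angle(360)  # fine
Angle(400)  # ValueError

The proposal itself is about static analysis; this only illustrates what the subscript could mean.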
On Fri, 7 Aug 2020 at 22:46, Ricky Teachey <ricky@teachey.org> wrote:
This was inspired by a tweet today from Brandon Rhodes. I looked for something like it on the mypy issues page and didn't find anything.
Would it make good semantic sense- and be useful- to specify valid numerical ranges using slices and type-hint syntax? My suggestion would be to, at minimum, provide this functionality for int and float.
Consider that currently we have:
x: int    # this means "all ints", today
x: float  # this means "all floating point numbers", today
Idea in a nutshell would be for the following type declarations to mean:
x: int[0:]      # any ints greater than or equal to zero would match, others would fail
x: int[:101]    # any ints less than 101 match
x: int[0:101:2] # even ints less than 101
I suspect the biggest issue with this is that it's likely to be extremely hard (given the dynamic nature of Python) to check such type assertions statically. Even in statically typed languages, you don't often see range-based types like this. And type assertions don't do runtime checks, so if they can't be usefully checked statically, they probably aren't going to be of much benefit (documentation is basically all).

Paul
On Fri, Aug 7, 2020 at 6:01 PM Paul Moore <p.f.moore@gmail.com> wrote:
On Fri, 7 Aug 2020 at 22:46, Ricky Teachey <ricky@teachey.org> wrote:
This was inspired by a tweet today from Brandon Rhodes. I looked for
something like it on the mypy issues page and didn't find anything.
Would it make good semantic sense- and be useful- to specify valid
numerical ranges using slices and type-hint syntax? My suggestion would be to, at minimum, provide this functionality for int and float.
Consider that currently we have:
x: int    # this means "all ints", today
x: float  # this means "all floating point numbers", today
Idea in a nutshell would be for the following type declarations to mean:
x: int[0:]      # any ints greater than or equal to zero would match, others would fail
x: int[:101]    # any ints less than 101 match
x: int[0:101:2] # even ints less than 101
I suspect the biggest issue with this is that it's likely to be extremely hard (given the dynamic nature of Python) to check such type assertions statically. Even in statically typed languages, you don't often see range-based types like this. And type assertions don't do runtime checks, so if they can't be usefully checked statically, they probably aren't going to be of much benefit (documentation is basically all).
Paul
You could be right and I don't know much about this subject. However, the question that comes up for me is: how does TypedDict perform its static checks? It seems like this:

class D(typing.TypedDict):
    a: int

d: D = dict(b=2)  # error

Shouldn't be any harder to type check than this:

x: int[0:] = -1  # error

No?

---
Ricky.

"I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler
On Fri, Aug 7, 2020, 6:03 PM Paul Moore <p.f.moore@gmail.com> wrote:
x: int[0:]      # any ints greater than or equal to zero would match, others would fail
x: int[:101]    # any ints less than 101 match
x: int[0:101:2] # even ints less than 101
I suspect the biggest issue with this is that it's likely to be extremely hard (given the dynamic nature of Python) to check such type assertions statically.
Yes, it's hard in the sense that it would require solving the halting problem.
On Fri, Aug 7, 2020 at 11:48 PM David Mertz <mertz@gnosis.cx> wrote:
On Fri, Aug 7, 2020, 6:03 PM Paul Moore <p.f.moore@gmail.com> wrote:
x: int[0:]      # any ints greater than or equal to zero would match, others would fail
x: int[:101]    # any ints less than 101 match
x: int[0:101:2] # even ints less than 101
I suspect the biggest issue with this is that it's likely to be extremely hard (given the dynamic nature of Python) to check such type assertions statically.
Yes, it's hard in the sense that it would require solving the halting problem.
That doesn't sound so hard. ;)

Thanks for educating me. Could it at least be useful for:

1. Providing semantic meaning to code (but this is probably not enough reason on its own)
2. Couldn't it still be useful for static analysis? Not in CPython, but when the type hints are used in Cython, for example?

---
Ricky.

"I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler
On Sat, Aug 8, 2020, 12:18 AM Ricky Teachey
Yes, it's hard in the sense that it would require solving the halting problem.
That doesn't sound so hard. ;)
Thanks for educating me. Could it at least be useful for:
1. Providing semantic meaning to code (but this is probably not enough reason on its own)
2. Couldn't it still be useful for static analysis? Not in CPython, but when the type hints are used in Cython, for example?
Being static like Cython doesn't help. You cannot know statically what the result of an arbitrary computation will be.

There are certainly languages with guards. For example, Python. I can definitely write a function like this:

def small_nums(i: int):
    assert 0 < i < 100
    do_stuff(i)

x = small_nums(arbitrary_computation())

In concept, an annotation could be another way to spell an assertion. But I don't think we need that.
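One way to picture "an annotation as another way to spell an assertion" is a decorator that turns explicit bounds into a call-time check. A rough sketch, assuming an invented `in_range` helper (nothing here is checked statically):

import functools

def in_range(lo, hi):
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(i, *args, **kwargs):
            # The guard from small_nums(), moved into the decorator.
            assert lo < i < hi, f"{i!r} is not strictly between {lo} and {hi}"
            return fn(i, *args, **kwargs)
        return wrapper
    return decorate

@in_range(0, 100)
def small_nums(i: int):
    return i  # stand-in for do_stuff(i) in the example above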
On Fri, Aug 07, 2020 at 11:48:40PM -0400, David Mertz wrote:
On Fri, Aug 7, 2020, 6:03 PM Paul Moore <p.f.moore@gmail.com> wrote:
x: int[0:]      # any ints greater than or equal to zero would match, others would fail
x: int[:101]    # any ints less than 101 match
x: int[0:101:2] # even ints less than 101
I suspect the biggest issue with this is that it's likely to be extremely hard (given the dynamic nature of Python) to check such type assertions statically.
Yes, it's hard in the sense that it would require solving the halting problem.
How so? I don't see how static bounds checking would be fundamentally more difficult than static type checking.

Static languages often check what bounds they can at compile time, and optionally insert bound checking runtime code for ambiguous places. See for example this comment:

    """
    2.5 Static Analysis
    Bounds checking has relied heavily on static analysis to optimize
    performance [15]. Checks can be eliminated if it can be statically
    determined that a pointer is safe, i.e. always within bounds, or that
    a check is redundant due to a previous check
    """

https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/baggy-US...

I believe that JS++ does something similar:

https://www.onux.com/jspp/blog/jspp-0-9-0-efficient-compile-time-analysis-of...

and Checked C aims for similar compile-time bounds checking:

https://www.microsoft.com/en-us/research/project/checked-c/

In any case, nobody expects to solve the Halting Problem for arbitrarily complex code. It is still useful to be able to detect some failures even if you can't detect them all:

https://web.archive.org/web/20080509165811/http://perl.plover.com/yak/typing...

-- Steven
On Sat, Aug 8, 2020, 1:12 AM Steven D'Aprano
Static languages often check what bounds they can at compile time, and optionally insert bound checking runtime code for ambiguous places.
Yep. That's an assert, or its moral equivalent.

Here's a deterministic program using the hypothetical new feature.

def plusone(i: int[1:1_000_000_000]):
    return i + 1

random.seed(42)
for n in range(1_000_000):
    random.randint(1, 1_000_000_001)

Is this program type safe? Tell me by static analysis of Mersenne Twister.

Or if you want to special case the arguments to randint, well, lots of things. Let's say a "random" walk on the integer number line where each time through the loop increments or decrements some (deterministic but hard to calculate) amount. After N steps are we within certain bounds?
Oops, obviously I meant:

def plusone(i: int[1:1_000_000_000]):
    return i + 1

random.seed(42)
for n in range(1_000_000):
    plusone(random.randint(1, 1_000_000_001))

Or a zillion other things. I can construct orbits of the Mandelbrot set that may or may not be bounded. Or bounds that depend on the twin prime conjecture. Or whatever. Mersenne Twister is just a non-obvious calculation that we have convenient functions for.

On Sat, Aug 8, 2020, 1:28 AM David Mertz <mertz@gnosis.cx> wrote:
On Sat, Aug 8, 2020, 1:12 AM Steven D'Aprano
Static languages often check what bounds they can at compile time, and optionally insert bound checking runtime code for ambiguous places.
Yep. That's an assert, or its moral equivalent.
Here's a deterministic program using the hypothetical new feature.
def plusone(i: int[1:1_000_000_000]):
    return i + 1
random.seed(42)
for n in range(1_000_000):
    random.randint(1, 1_000_000_001)
Is this program type safe? Tell me by static analysis of Mersenne Twister.
Or if you want to special case the arguments to randint, well, lots of things. Let's say a "random" walk on the integer number line where each time through the loop increments or decrements some (deterministic but hard to calculate) amount. After N steps are we within certain bounds?
On Sat, Aug 08, 2020 at 01:28:59AM -0400, David Mertz wrote:
On Sat, Aug 8, 2020, 1:12 AM Steven D'Aprano
Static languages often check what bounds they can at compile time, and optionally insert bound checking runtime code for ambiguous places.
Yep. That's an assert, or its moral equivalent.
No, asserts are not checked at compile time. Assertions are purely runtime checks.

But bounds checking can be done (at least in part) at compile time, depending on how smart your compiler is. (In the case of Python, that would be entirely delegated to a third-party checker.) If a static checker knows that x is a float, and that n is an int between 0 and 100, it shouldn't be hard to tell that this is a bounds error:

x: float
n: int[0:100]
n = int(x)

but this isn't:

n = int(x) % 5

But there is a big gap between what a human can reason about code and what a tool can do in practice (as opposed to in theory), and I don't know what the state of the art in range checking is, or how sophisticated it may be. Compared to type systems, compile-time range checking has been neglected for decades and (as far as I can tell) the state of the art is far more primitive. But I could be wrong and this could be one of the simplest things that they can handle for all I know :-)

We should remember that the requirement for static checking isn't to detect at compile time that something *will* be a runtime error, but that it *could* be a runtime error, so that you can fix your code to avoid the possibility. Just as type checking doesn't tell you that you certainly will get a type error at runtime, only that you could.

I suggest this proposed feature be deferred unless people in the MyPy and other static checking projects express interest. It may be that nobody in the MyPy and other static checker projects has the knowledge, time or interest in pursuing this, in which case supporting this would be a waste of time.
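To spell out why the second assignment is within bounds: Python's `%` with a positive modulus always lands in range regardless of the sign of the left operand, so a checker only needs that one language fact.

# Why a checker could accept n = int(x) % 5 when n is declared int[0:100]:
# the result of % with a positive modulus is always in [0, 5).
assert 7 % 5 == 2
assert (-7) % 5 == 3
assert all(0 <= k % 5 < 5 for k in range(-1000, 1000))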
Here's a deterministic program using the hypothetical new feature.
def plusone(i: int[1:1_000_000_000]):
    return i + 1
random.seed(42)
for n in range(1_000_000):
    random.randint(1, 1_000_000_001)
Is this program type safe? Tell me by static analysis of Mersenne Twister.
I assume there was supposed to be a call to plusone there :-) randint will surely be declared as returning an int, since we can't be more specific than that without getting into serious levels of automated correctness checking. Any checker capable of doing bounds checks would know that the range of possible ints is unbounded in both directions, and therefore an int does not fit into the range [1:10**9]. Hence that will be a static bounds check failure: the checker detects that there is a chance that the input to plusone could be out of range.
Or if you want to special case the arguments to randint,
Of course not.
well, lots of things. Let's say a "random" walk on the integer number line where each time through the loop increments or decrements some (deterministic but hard to calculate) amount. After N steps are we within certain bounds?
Let's be less vague:

def step() -> int[-3:4]:
    pass

n: int[-100:101] = 0
for i in range(N):
    n += step()

You and I can reason that after N steps, the maximum possible value of n is 3*N, which could be greater than 100 unless N was provably less than 34, which it may not be. And so the bounds check fails.

I know very little about the state of the art of bounds checking, but I would imagine that this would probably be beyond the capability of a simple checker, and maybe a sophisticated one too, but not the most sophisticated formal correctness proof checkers.

So what? The value of static checking is not diminished by the problems not found, since they wouldn't be found if you had no static checking in the first place. The value of static checking is in the problems which are found.

Whether or not MyPy etc are capable of such range checking, or even simpler checks, is a question best asked of them. If they have any interest in range checking, adding subscripting to `int` would be easy. Until then, YAGNI.

-- Steven
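A quick arithmetic check of the bound being reasoned about here, assuming each step adds at most 3:

# Worst case for the walk above: every step() returns the maximum, +3.
# n stays inside int[-100:101] (i.e. n <= 100) only while 3*N <= 100.
MAX_STEP = 3
N = 33
assert MAX_STEP * N <= 100       # 99: still in bounds
assert MAX_STEP * (N + 1) > 100  # 102: so a checker must flag N >= 34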
On Sat, Aug 8, 2020, 7:40 PM Steven D'Aprano
Any checker capable of doing bounds checks would know that the range of possible ints is unbounded in both directions, and therefore an int does not fit into the range [1:10**9]. Hence that will be a static bounds check failure:
Just so you can win a bet at the library or the bar, the largest Python int is 2**(2**63)-1, and the smallest -2**(2**63). Ints have a struct member of bit_length that is a fixed size in C. :-)

A computer capable of storing that number cannot, of course, fit in the physical universe.

Let's be less vague:
def step() -> int[-3:4]:
    pass
n: int[-100:101] = 0
for i in range(N):
    n += step()
You and I can reason that after N steps, the maximum possible value of n is 3*N, which could be greater than 100 unless N was provably less than 34, which it may not be. And so the bounds check fails.
Yes, this is absolutely doable! I guess my argument in several posts is that this simple level of analysis of "possible bounds violation" is rarely useful (at least in a Python context[*]). Vastly more complicated formal proofs might be useful... But, y'know, way more work for tools.

[*] For languages with bounded data types, this is more compelling. If I think a variable will *definitely* fit in a uint8, having the static tool tell me it might not is powerful.
On Sat, Aug 08, 2020 at 08:13:53PM -0400, David Mertz wrote:
Yes, this is absolutely doable! I guess my argument in several posts is that this simple level of analysis of "possible bounds violation" is rarely useful (at least in a Python context[*]). Vastly more complicated formal proofs might be useful... But, y'know, way more work for tools.
[*] For languages with bounded data types, this is more compelling. If I think a variable will *definitely* fit in a uint8, having the static tool tell me it might not is powerful.
uint8 = int[0:256]

So if it's useful to know that something might violate the type uint8, surely it is just as useful to know that it might violate the range int[0:256].

For what it's worth, I have often had IndexErrors from indexing into a list or string that were a bugger to debug. But it is one thing to have a static checker that can tell me if i might exceed the bounds 0...256 and another that can tell me if i might exceed the bounds 0...N where N is not known until runtime. I suppose this is why bounds checking so often gets turned into runtime checks, but Python already has that: you get an IndexError, not a seg fault.

-- Steven
On Sun, Aug 9, 2020, 12:07 AM Steven D'Aprano
[*] For languages with bounded data types, this is more compelling. If I think a variable will *definitely* fit in a uint8, having the static tool tell me it might not is powerful.
uint8 = int[0:256]
So if it's useful to know that something might violate the type uint8, surely it is just as useful to know that it might violate the range int[0:256].
These seem like VERY different concerns. There's nothing all that special about 255 per se, most of the time. My actual expectation might be that I'll store a Natural number roughly under 50, for example. But 51 isn't per se an error. Maybe this is the usage of a limited, but somewhat elastic resource; using 256 things might be terrible performance, but it's not logically wrong.

With a uint8 in a language that has that, we get suddenly different behavior with 255+1, versus 254+1. Either it wraps around to zero in C11, or it raises an exception in a guarded language.
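A quick way to see the wrap-around being described, simulating C's unsigned 8-bit arithmetic with a mask (just a sketch of the behavior, not any particular language's uint8):

# C-style uint8 arithmetic keeps only the low 8 bits.
assert (254 + 1) & 0xFF == 255  # still fits
assert (255 + 1) & 0xFF == 0    # wraps around to zero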
On 08.08.20 05:48, David Mertz wrote:
On Fri, Aug 7, 2020, 6:03 PM Paul Moore <p.f.moore@gmail.com> wrote:
> x: int[0:]      # any ints greater than or equal to zero would match, others would fail
> x: int[:101]    # any ints less than 101 match
> x: int[0:101:2] # even ints less than 101
I suspect the biggest issue with this is that it's likely to be extremely hard (given the dynamic nature of Python) to check such type assertions statically.
Yes, it's hard in the sense that it would require solving the halting problem.
How is it any more difficult than already existing type checks? The proposal talks about type hints for static type analysis and hence implies using literals for the slices. There's no dynamic nature to it. You can define `int[x]` to be a subtype of `int[y]` if the numbers defined by `x` are a subset of those defined by `y`.

Referring to one of the examples including `random.randint`:

def foo(x: int[0:11]):
    pass

foo(random.randint(0, 10))

Regarding static type analysis, the only information about `random.randint` is that it returns an `int` and hence this should be an error. If you want to use it that way, you'd have to manually cast the value:

foo(typing.cast(int[0:11], random.randint(0, 10)))

This of course raises questions about the usefulness of the feature.

By the way, I would find it more intuitive if the `stop` boundary was included in the interval, i.e. `int[0:10]` means all the integer numbers from 0 to 10 (including 10). I don't think that conflicts with the current notion of slices since they always indicate a position in the sequence, not the values directly (then `int[0:10]` would mean "the first 10 integer numbers", but where do they even start?).
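A small sketch of that subset rule for literal, half-open, step-free bounds; the `is_subrange` helper and the use of `slice` objects are only for illustration, with None standing for an unbounded side:

def is_subrange(x: slice, y: slice) -> bool:
    # int[x] would be a subtype of int[y] iff every value allowed by x
    # is also allowed by y.
    lo_ok = y.start is None or (x.start is not None and x.start >= y.start)
    hi_ok = y.stop is None or (x.stop is not None and x.stop <= y.stop)
    return lo_ok and hi_ok

assert is_subrange(slice(0, 11), slice(0, 101))        # int[0:11] <: int[0:101]
assert not is_subrange(slice(-90, 91), slice(0, 361))  # int[-90:91] allows negatives
assert is_subrange(slice(0, 11), slice(None, None))    # everything <: plain int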
On Sat, Aug 8, 2020 at 16:41 Dominik Vilsmeier <dominik.vilsmeier@gmx.de> wrote:
On 08.08.20 05:48, David Mertz wrote:
On Fri, Aug 7, 2020, 6:03 PM Paul Moore <p.f.moore@gmail.com> wrote:
x: int[0:]      # any ints greater than or equal to zero would match, others would fail
x: int[:101]    # any ints less than 101 match
x: int[0:101:2] # even ints less than 101
I suspect the biggest issue with this is that it's likely to be extremely hard (given the dynamic nature of Python) to check such type assertions statically.
Yes, it's hard in the sense that it would require solving the halting problem.
How is it any more difficult than already existing type checks? The proposal talks about type hints for static type analysis and hence implies using literals for the slices. There's no dynamic nature to it. You can define `int[x]` to be a subtype of `int[y]` if the numbers defined by `x` are a subset of those defined by `y`.
Referring to one of the examples including `random.randint`:
def foo(x: int[0:11]):
    pass
foo(random.randint(0, 10))
Regarding static type analysis, the only information about `random.randint` is that it returns an `int` and hence this should be an error. If you want to use it that way, you'd have to manually cast the value:
foo(typing.cast(int[0:11], random.randint(0, 10)))
This of course raises questions about the usefulness of the feature.
By the way, I would find it more intuitive if the `stop` boundary was included in the interval, i.e. `int[0:10]` means all the integer numbers from 0 to 10 (including 10). I don't think that conflicts with the current notion of slices since they always indicate a position in the sequence, not the values directly (then `int[0:10]` would mean "the first 10 integer numbers", but where do they even start?).
This of course raises questions about the usefulness of the feature.
I feel like this discussion boils down to one argument being that the checking is doable as long as you keep it to something with very limited usefulness (static literal ranges), and the other being that in order to make it useful you'd need dependent typing and a sophisticated theorem prover.

I think about dependent typing when I look at TypeVar and Literal, but for it to work, you'd have to be able to do gnarly things like add and compare two TypeVars according to the operations of their literal types, like in this case:

A = TypeVar('A', bound=int)
B = TypeVar('B', bound=int)

def randint(a: A, b: B) -> int[A:Literal[1]+B]:  # ?
    ...

And randint(1, 1_000_000_001) means B = Literal[1_000_000_001], but plusone's type int[1:1_000_000_000] requires B < Literal[1_000_000_000], which fails.

This kind of thing is so powerful, and I would love to see tooling capable of it in the python ecosystem. I believe folks who say it's very hard to implement correctly, but I don't know that that's a good reason not to make the proposed range syntax legal for some primitive types like int. Tools like mypy can choose not to handle int[A:B] until they are ready, or forever, but someone else can still work on it.

Are we just talking about the language features here, or does this list include ideas for tooling like mypy?

- Mike Smith
On Sat, Aug 8, 2020 at 5:51 PM Michael Smith <michael@smith-li.com> wrote:
This kind of thing is so powerful, and I would love to see tooling capable of it in the python ecosystem. I believe folks who say it's very hard to implement correctly, but I don't know that that's a good reason not to make the proposed range syntax legal for some primitive types like int. Tools like mypy can choose not to handle int[A:B] until they are ready, or forever, but someone else can still work on it.
I agree.

A total, complete solution that successfully catches everything might well be out of reach. But that doesn't seem like a reason to not catch what can be caught.

On Sat, Aug 8, 2020 at 5:51 PM Michael Smith <michael@smith-li.com> wrote:
Are we just talking about the language features here, or does this list include ideas for tooling like mypy?
Others will surely know better than me, but it seems like in order to add the ability to use the [ ] operator on the int (and perhaps float) type, and to be able to "get at" the arguments that were provided to them, the language would first have to change to support that.

But on the other hand there is no stopping someone from writing a library that writes its own Int and Float type hints that supports all of this, and putting it out in the universe to see what traction it gets.

I know for a fact I am not skilled enough to do this, unfortunately, so I'd have to leave it to someone else.

On Sat, Aug 8, 2020 at 4:42 PM Dominik Vilsmeier <dominik.vilsmeier@gmx.de> wrote:
By the way, I would find it more intuitive if the `stop` boundary was included in the interval, i.e. `int[0:10]` means all the integer numbers from 0 to 10 (including 10). I don't think that conflicts with the current notion of slices since they always indicate a position in the sequence, not the values directly (then `int[0:10]` would mean "the first 10 integer numbers", but where do they even start?).
It's always interesting how people's intuitions can differ. Making the stop value inclusive by default is a little jarring to me since it is so inconsistent with sequence indexing elsewhere. But I do see your point that in reality this is NOT slicing positions in a sequence (since it has no beginning and end). I don't think it would be a terrible thing for both start and stop to be inclusive by default.

But regardless of whether the stop value is inclusive or not, there are certainly going to be instances when someone will want inclusivity and others when they want exclusivity, and this applies to both the start and the stop value. In other words, any syntax that did not provide the ability to state all of these number lines would seem pretty wanting:

0 < x < ∞        # all positive integers
-∞ < x < 0       # all negative integers
0.0 < x < ∞      # all positive floats
-∞ < x < 0.0     # all negative floats
0 <= x < ∞       # all positive integers, zero included
-∞ < x <= 0      # all negative integers, zero included
0.0 <= x < ∞     # all positive floats, zero included
-∞ < x <= 0.0    # all negative floats, zero included
0 <= x <= 10     # some inclusive int interval
0 < x < 10       # some exclusive int interval
0 < x <= 10      # some int interval, combined boundaries
0 <= x < 10      # some int interval, combined boundaries
0.0 <= x <= 10.0 # some inclusive float interval
0.0 < x < 10.0   # some exclusive float interval
0.0 < x <= 10.0  # some float interval, combined boundaries
0.0 <= x < 10.0  # some float interval, combined boundaries

And versions of all of these with step arguments.

A way around the inclusive vs exclusive problem might be to define some sort of indicator, or function, to modify the boundaries. Here's several ideas of how to spell the number line 0 < x <= 10:

x: int[excv(0):incv(10)]  # where incv and excv stand for "inclusive" and "exclusive"
x: int[@0:10]  # where the @ is read "almost" or "about" or "around" -- the idea is all boundaries are inclusive by default and the @ makes a boundary exclusive
x: int[(0,"excv"):(10,"incv")]  # here we just pass tuples to the start and stop slice arguments, and the text indicates whether each boundary is inclusive (incv) or exclusive (excv)

I think the third is my favorite - it reads pretty nicely. Although it would be nicer if it didn't need the parentheses. The second one is interesting though.

But of course regardless of which of these spellings (or some other) is preferred, the default behavior would still need to be defined.

---
Ricky.

"I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler
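A rough runtime sketch of what the first spelling's markers could do; the names `incv`, `excv` and `Bound` are invented purely for illustration:

from dataclasses import dataclass

@dataclass(frozen=True)
class Bound:
    value: float
    inclusive: bool

def incv(value):
    return Bound(value, inclusive=True)

def excv(value):
    return Bound(value, inclusive=False)

def in_interval(x, lo: Bound, hi: Bound) -> bool:
    # Apply each boundary as strict or non-strict depending on its marker.
    above = x >= lo.value if lo.inclusive else x > lo.value
    below = x <= hi.value if hi.inclusive else x < hi.value
    return above and below

# The number line 0 < x <= 10 from the example:
assert in_interval(10, excv(0), incv(10))
assert not in_interval(0, excv(0), incv(10))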
This thread seems light on real use cases. I know there are people eager to have integer generics, since they are essential for typing numpy, pandas, tensorflow and friends. And so there are good proposals floating around for this on typing-sig; I expect implementations within a year.

But integer ranges? I guess they would be useful to catch array indexing mistakes, but I don't believe IndexError is the bane of anyone's existence. And float ranges? I wouldn't want to touch those with a 10-foot pole.

What am I missing?

On Sat, Aug 8, 2020 at 15:28 Ricky Teachey <ricky@teachey.org> wrote:
On Sat, Aug 8, 2020 at 5:51 PM Michael Smith <michael@smith-li.com> wrote:
This kind of thing is so powerful, and I would love to see tooling capable of it in the python ecosystem. I believe folks who say it's very hard to implement correctly, but I don't know that that's a good reason not to make the proposed range syntax legal for some primitive types like int. Tools like mypy can choose not to handle int[A:B] until they are ready, or forever, but someone else can still work on it.
I agree.
A total, complete solution that successfully catches everything might well be out of reach. But that doesn't seem like a reason to not catch what can be caught.
On Sat, Aug 8, 2020 at 5:51 PM Michael Smith <michael@smith-li.com> wrote:
Are we just talking about the language features here, or does this list include ideas for tooling like mypy?
Others will surely know better than me, but it seems like in order to add the ability to use the [ ] operator on the int (and perhaps float) type, and to be able to "get at" the arguments that were provided to them, the language would first have to change to support that.
But on the other hand there is no stopping someone from writing a library that writes its own Int and Float type hints that supports all of this, and putting it out in the universe to see what traction it gets.
I know for a fact I am not skilled enough to do this, unfortunately, so I'd have to leave it to someone else.
On Sat, Aug 8, 2020 at 4:42 PM Dominik Vilsmeier <dominik.vilsmeier@gmx.de> wrote:
By the way, I would find it more intuitive if the `stop` boundary was included in the interval, i.e. `int[0:10]` means all the integer numbers from 0 to 10 (including 10). I don't think that conflicts with the current notion of slices since they always indicate a position in the sequence, not the values directly (then `int[0:10]` would mean "the first 10 integer numbers", but where do they even start?).
It's always interesting how people's intuitions can differ. Making the stop value inclusive by default is a little jarring to me since it is so inconsistent with sequence indexing elsewhere. But I do see your point that in reality this is NOT slicing positions in a sequence (since it has no beginning and end). I don't think it would be a terrible thing for both start and stop to be inclusive by default.
But regardless of whether the stop value is inclusive or not, there are certainly going to be instances when someone will want inclusivity and others when they want exclusivity, and this applies to both the start and the stop value.
In other words, any syntax that did not provide the ability to state all of these number lines would seem pretty wanting:
0 < x < ∞        # all positive integers
-∞ < x < 0       # all negative integers
0.0 < x < ∞      # all positive floats
-∞ < x < 0.0     # all negative floats
0 <= x < ∞       # all positive integers, zero included
-∞ < x <= 0      # all negative integers, zero included
0.0 <= x < ∞     # all positive floats, zero included
-∞ < x <= 0.0    # all negative floats, zero included
0 <= x <= 10     # some inclusive int interval
0 < x < 10       # some exclusive int interval
0 < x <= 10      # some int interval, combined boundaries
0 <= x < 10      # some int interval, combined boundaries
0.0 <= x <= 10.0 # some inclusive float interval
0.0 < x < 10.0   # some exclusive float interval
0.0 < x <= 10.0  # some float interval, combined boundaries
0.0 <= x < 10.0  # some float interval, combined boundaries
And versions of all of these with step arguments.
A way around the inclusive vs exclusive problem might be to define some sort of indicator, or function, to modify the boundaries.
Here's several ideas of how to spell the number line 0 < x <=10:
x: int[excv(0):incv(10)]  # where incv and excv stand for "inclusive" and "exclusive"
x: int[@0:10]  # where the @ is read "almost" or "about" or "around" -- the idea is all boundaries are inclusive by default and the @ makes a boundary exclusive
x: int[(0,"excv"):(10,"incv")]  # here we just pass tuples to the start and stop slice arguments, and the text indicates whether each boundary is inclusive (incv) or exclusive (excv)
I think the third is my favorite - it reads pretty nicely. Although it would be nicer if it didn't need the parentheses. The second one is interesting though.
But of course regardless of which of these spellings (or some other) is preferred, the default behavior would still need to be defined.
--- Ricky.
"I've never met a Kentucky man who wasn't either thinking about going home or actually going home." - Happy Chandler
--
--Guido (mobile)
On 9/08/20 10:43 am, Guido van Rossum wrote:
But integer ranges? I guess they would be useful to catch array indexing mistakes,
I'm not sure they would. For an index error to be caught at compile time, the thing being indexed has to have a size known to the compiler. This was always the case in Pascal, and sometimes is in C/C++/C#/Java, but almost never is in Python. -- Greg
On Sat, Aug 8, 2020 at 5:49 PM Michael Smith <michael@smith-li.com> wrote:
This kind of thing is so powerful, and I would love to see tooling capable of it in the python ecosystem. I believe folks who say it's very hard to implement correctly, but I don't know that that's a good reason not to make the proposed range syntax legal for some primitive types like int. Tools like mypy can choose not to handle int[A:B] until they are ready, or forever, but someone else can still work on it.
As Steven notes, there *are* some compilers in static languages that do a limited amount of this. But the cases where it is *provable* that some bounds are not exceeded are very limited, or take a great deal of work to construct the proof. I don't feel like Mypy or similar tools are going to stray quite there. And in those static languages, a great deal of the time, they must give up and just insert assertions to raise exceptions if the bounds are violated rather than *proving* the constraint.

I slightly regret using the randint() example because you and I can so very easily recognize that `random.randint(0, 1_000_000_001)` MIGHT return a result of a billion-and-one. That was just to have something short. But a great many computations, even ones that mathematically HAVE bounds, are not easy to demonstrate the bounds. Let me try again, still slightly hand-wavey.

def final_number(seed: int):
    "Something mathy with primes, modulo arithmetic, factorization, etc"
    x = seed
    while some_condition(x):
        while other_condition(x):
            if my_predicate(x):
                x += 1
            elif other_predicate(x):
                x += 2
            else:
                x -= 1
    return x

def my_fun(i: int[1:1_000_000]):
    pass

my_fun(final_number(42))

Depending on what the called functions do, this is the sort of thing where number theorists might prove bounds. But the bounds might depend in weird, chaotic ways on the seed. For some it will fly off to infinity or negative infinity. For others it bounces around in a small cycle. For others it bounces around for billions of times through the loop within small bounds until flying off. It's really NOT that hard to write functions like that, even without really realizing their oddness.

Yours, David...

P.S. Compilers can do shockingly clever things. LLVM has an optimization that took me by surprise. I was playing around with it via Numba a few years ago. Actually, for teaching material; my intention was just to say "Look, this is way faster than plain Python." But here's the case I stumbled on:

Folks all probably know the story of Gauss in elementary school in which (at least in anecdote) the teacher assigned the students to add all the integers from 1 to 100 so she could get some coffee or whatever. Gauss had the solution in a couple seconds to her annoyance. Because sum(range(1, N+1)) == (N*(N+1))/2 by some straightforward reasoning that most 10 year olds don't see (and in the story, neither did the teacher).

Anyway, my silly example for my training students (i.e. working programmers and scientists) was loop to 100, add latest, with and without Numba. Then do it up to a billion. My intention was "Numba is lots faster, but has a minor JIT overhead" ... my discovery was that LLVM figures out Gauss' simplification and does it in constant time no matter the N. After that I looked at the LLVM bytecode to see, "Yup, it does."

The optimizer is pretty smart about variations in writing the code in some slightly different ways, but I don't know exactly what it would take to fool it into missing the optimization. I asked one of the LLVM developers how I could find out what optimizations are there. Their answer was "Become a core developer and study the source code of LLVM." Apparently the Gauss thing isn't actually documented anywhere outside the code. :-)

--
The dead increasingly dominate and strangle both the living and the not-yet born. Vampiric capital and undead corporate persons abuse the lives and control the thoughts of homo faber. Ideas, once born, become abortifacients against new conceptions.
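A quick check of the closed form in the Gauss anecdote:

# sum(range(1, N+1)) equals N*(N+1)//2 -- the simplification LLVM finds.
N = 100
assert sum(range(1, N + 1)) == N * (N + 1) // 2 == 5050

N = 1_000_000_000
assert N * (N + 1) // 2 == 500_000_000_500_000_000  # constant time, no loop needed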
On 2020-08-08 at 18:53:36 -0400, David Mertz <mertz@gnosis.cx> wrote:
... my discovery was that "LLVM figures out Gauss' simplification and does it in constant time no matter the N. After that I looked at the LLVM bytecode to see, "Yup, it does." The optimizer is pretty smart about variations in writing the code in some slightly different ways, but I don't know exactly what it would take to fool it into missing the optimization.
I was trying to learn something about 80X86 (where X might have been the empty string) assembly language by examining the output from Microsoft's C compiler. It made the same optimization, thus thwarting that particular effort, and that was probably 35 years ago now.

For small N, it's really just constant folding, loop unrolling, and peephole optimizations; these days, "small N" might be a billion or 2**32. The point isn't that it's not surprising, of course, but that it's not something new.
On Fri, Aug 07, 2020 at 05:44:31PM -0400, Ricky Teachey wrote:
Would it make good semantic sense- and be useful- to specify valid numerical ranges using slices and type-hint syntax? My suggestion would be to, at minimum, provide this functionality for int and float.
We know that syntactically we can write an annotation like `int[1:10]`, even if it is a runtime TypeError. The question would be to ask mypy, and maybe some of the other type checkers, whether they are capable of and interested in doing static bounds checking. Unless they are interested in the feature, having ints support it would be a waste of time.

For compatibility with slicing and range(), we would surely want int ranges to be half open:

int[1:100]  # 1-99 inclusive, 100 excluded

but for floats, half-open intervals are a real pain, as you suggest. How do I specify an upper bound of exactly math.pi, say? A runtime check is easy:

x <= math.pi

but specifying it as an open interval requires me to know that 3.1415926535897936 is the next float greater than math.pi.

-- Steven
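As a side note, Python 3.9's `math.nextafter` can compute that neighbouring float directly:

import math
import sys

# math.nextafter was added in Python 3.9; it returns the next representable
# float after math.pi in the direction of +infinity.
if sys.version_info >= (3, 9):
    print(math.nextafter(math.pi, math.inf))  # 3.1415926535897936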
participants (9)

- 2QdxY4RzWzUUiLuE@potatochowder.com
- David Mertz
- Dominik Vilsmeier
- Greg Ewing
- Guido van Rossum
- Michael Smith
- Paul Moore
- Ricky Teachey
- Steven D'Aprano