Re: [Numpy-discussion] making "low" optional in numpy.randint
Hello all,
I have a PR open here <https://github.com/numpy/numpy/pull/7151> that makes "low" an optional parameter in numpy.randint and introduces new behavior into the API as follows:
1) `low == None` and `high == None`
Numbers are generated over the range `[lowbnd, highbnd)`, where `lowbnd = np.iinfo(dtype).min` and `highbnd = np.iinfo(dtype).max`, where `dtype` is the provided integral type.
2) `low != None` and `high == None`
If `low >= 0`, numbers are *still* generated over the range `[0, low)`, but if `low < 0`, numbers are generated over the range `[low, highbnd)`, where `highbnd` is defined as above.
3) `low == None` and `high != None`
Numbers are generated over the range `[lowbnd, high)`, where `lowbnd` is defined as above.
The primary motivation was the second case, as it is more convenient to specify a 'dtype' by itself when generating such numbers, in a similar vein to numpy.empty, except with initialized values.
Looking forward to your feedback!
Greg
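A minimal sketch of how the three proposed cases could resolve to an ordinary half-open `[lo, hi)` pair, assuming the rules exactly as written above. `resolve_bounds` and `proposed_randint` are hypothetical names, not NumPy API (the PR was closed later in this thread). Note that, as written, case 1 uses `np.iinfo(dtype).max` as an exclusive bound, so that value itself is never drawn.

```python
import numpy as np


def resolve_bounds(low=None, high=None, dtype=np.int64):
    """Map the proposal's optional low/high to a half-open range [lo, hi).

    Hypothetical helper written from the three cases above; not NumPy API.
    """
    info = np.iinfo(dtype)
    lowbnd, highbnd = info.min, info.max  # as defined in the proposal
    if low is None and high is None:      # case 1: whole dtype range
        return lowbnd, highbnd
    if high is None:                      # case 2: only low given
        if low >= 0:
            return 0, low                 # unchanged: randint(n) draws from [0, n)
        return low, highbnd               # negative low: [low, highbnd)
    if low is None:                       # case 3: only high given
        return lowbnd, high
    return low, high                      # both given: unchanged


def proposed_randint(low=None, high=None, size=None, dtype=np.int64):
    """Hypothetical wrapper: resolve the bounds, then call np.random.randint."""
    lo, hi = resolve_bounds(low, high, dtype)
    return np.random.randint(lo, hi, size=size, dtype=dtype)


print(resolve_bounds(dtype=np.uint8))         # (0, 255)
print(resolve_bounds(5))                      # (0, 5)
print(resolve_bounds(-5, dtype=np.int8))      # (-5, 127)
print(resolve_bounds(high=3, dtype=np.int8))  # (-128, 3)
```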
Behavior of random integer generation:
Python   randint         [a,b]
MATLAB   randi           [a,b]
Mma      RandomInteger   [a,b]
Haskell  randomR         [a,b]
GAUSS    rndi            [a,b]
Maple    rand            [a,b]
In short, NumPy's `randint` is non-standard (and, I would add, non-intuitive). Presumably this was due to relying on a float draw from [0,1) along with the use of floor.
The divergence in behavior from the (later) Python function of the same name is particularly unfortunate.
So I suggest further work on this function is not called for, and use of `random_integers` should be encouraged. Probably NumPy's `randint` should be deprecated.
If there is any playing with the interface, I think Mma provides a pretty good model. If I were designing the interface, I would always require a tuple argument (for the inclusive range), with possible `None` values to imply datatype extreme values. Proposed name (after `randint` deprecation): `randints`.
Cheers, Alan Isaac
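One way the inclusive-tuple interface suggested above might look, as a sketch: `randints` is the hypothetical name from the message, the `(low, high)` tuple is inclusive on both ends, and `None` maps to the dtype's extreme value. It simply delegates to `np.random.randint` with `high + 1`; none of this is NumPy API.

```python
import numpy as np


def randints(bounds, size=None, dtype=np.int64):
    """Hypothetical inclusive-range API; not part of NumPy.

    bounds is a (low, high) tuple with both ends inclusive; None stands for
    the extreme value representable in dtype.
    """
    low, high = bounds
    info = np.iinfo(dtype)
    low = info.min if low is None else low
    high = info.max if high is None else high
    # np.random.randint excludes its upper bound, so shift it by one.
    return np.random.randint(low, high + 1, size=size, dtype=dtype)


rolls = randints((1, 6), size=10)                        # die rolls, 1..6
octets = randints((None, None), size=4, dtype=np.uint8)  # any value 0..255
```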
On Wed, Feb 17, 2016 at 4:40 PM, Alan Isaac <alan.isaac@gmail.com> wrote:
Behavior of random integer generation:
Python   randint         [a,b]
MATLAB   randi           [a,b]
Mma      RandomInteger   [a,b]
Haskell  randomR         [a,b]
GAUSS    rndi            [a,b]
Maple    rand            [a,b]
In short, NumPy's `randint` is non-standard (and, I would add, non-intuitive). Presumably this was due to relying on a float draw from [0,1) along with the use of floor.
No, never was. It is implemented this way because Python prefers semi-open integer intervals, which play most nicely with 0-based indexing. Not sure about all of those systems, but at least some use 1-based indexing, so closed intervals do make sense there. The Python stdlib's random.randint() closed interval is considered a mistake by python-dev, which led to the implementation of, and preference for, random.randrange() instead.
The divergence in behavior from the (later) Python function of the same name is particularly unfortunate.
Indeed, but unfortunately, this mistake dates way back to Numeric times, and easing the migration to numpy was a priority in the heady days of numpy 1.0.
So I suggest further work on this function is not called for, and use of `random_integers` should be encouraged. Probably NumPy's `randint` should be deprecated.
Not while I'm here. Instead, `random_integers()` is discouraged and perhaps might eventually be deprecated. -- Robert Kern
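For reference, a minimal sketch of the interval conventions being compared, using only calls that exist in the stdlib `random` module and `numpy.random`. It shows the `+ 1` needed to get a closed range from a half-open function, which is also how a `random_integers(a, b)` call translates to `randint`.

```python
import random

import numpy as np

a, b = 1, 6

# Python stdlib: randint(a, b) is closed; randrange(a, b + 1) covers the same values.
x = random.randint(a, b)          # a <= x <= b
y = random.randrange(a, b + 1)    # a <= y <= b

# NumPy: randint is half-open like randrange, so pass b + 1 for a closed range.
z = np.random.randint(a, b + 1)   # a <= z <= b

# The closed-interval np.random.random_integers(a, b) (discouraged, and
# deprecated as noted in this thread) corresponds to np.random.randint(a, b + 1).
draws = np.random.randint(a, b + 1, size=10_000)
```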
Actually, it has already been deprecated because I did it myself. :) On Wed, Feb 17, 2016 at 4:46 PM, Robert Kern <robert.kern@gmail.com> wrote:
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
Perhaps, but we are not coding in Haskell. We are coding in Python, and the standard is that the endpoint is excluded, which renders your point moot I'm afraid. On Wed, Feb 17, 2016 at 5:10 PM, Alan Isaac <alan.isaac@gmail.com> wrote:
On 2/17/2016 11:46 AM, Robert Kern wrote:
some at least are 1-based indexing, so closed intervals do make sense.
Haskell is 0-indexed. And quite carefully thought out, imo.
Cheers, Alan
On 2/17/2016 12:28 PM, G Young wrote:
Perhaps, but we are not coding in Haskell. We are coding in Python, and the standard is that the endpoint is excluded, which renders your point moot I'm afraid.
I am not sure what "standard" you are talking about. I thought we were talking about the user interface. Nobody is proposing changing the behavior of `range`. That is an entirely separate question. I'm not trying to change any minds, but let's not rely on spurious arguments. Cheers, Alan
On Wed, Feb 17, 2016 at 8:30 PM, Alan Isaac <alan.isaac@gmail.com> wrote:
On 2/17/2016 12:28 PM, G Young wrote:
Perhaps, but we are not coding in Haskell. We are coding in Python, and the standard is that the endpoint is excluded, which renders your point moot I'm afraid.
I am not sure what "standard" you are talking about. I thought we were talking about the user interface.
It is a persistent and consistent convention (i.e. "standard") across Python APIs that deal with integer ranges (range(), slice(), random.randrange(), ...), particularly those that end up related to indexing; e.g. `x[np.random.randint(0, len(x))]` to pull a random sample from an array. random.randint() was the one big exception, and it was considered a mistake for that very reason, soft-deprecated in favor of random.randrange(). -- Robert Kern
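A small sketch of the indexing argument: with a half-open range the draw is usable as an index as-is, while a closed-interval function needs `len(x) - 1`. The array `x` is an arbitrary example.

```python
import random

import numpy as np

x = np.array([10, 20, 30, 40, 50])

# Half-open [0, len(x)): every draw is already a valid index.
i = np.random.randint(0, len(x))
one = x[i]

# Vectorized: draw many indices at once and gather.
idx = np.random.randint(0, len(x), size=100)
many = x[idx]

# A closed-interval function such as the stdlib's random.randint
# needs the explicit "- 1" to stay in bounds.
j = random.randint(0, len(x) - 1)
also_one = x[j]
```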
On 2/17/2016 3:42 PM, Robert Kern wrote:
random.randint() was the one big exception, and it was considered a mistake for that very reason, soft-deprecated in favor of random.randrange().
randrange also has its detractors: https://code.activestate.com/lists/python-dev/138358/ and following. I think if we start citing persistent conventions, the persistent convention across *many* languages that the bounds provided for a random integer range are inclusive also counts for something, especially when the names are essentially shared. But again, I am just trying to be clear about what is at issue, not push for a change. I think citing non-existent standards is not helpful. I think the discrepancy between the Python standard library and numpy for a function going by a common name is harmful. (But then, I teach.) fwiw, Alan
Also fwiw, I think the 0-based, half-open interval is one of the best features of Python indexing and yes, I do use random integers to index into my arrays and would not appreciate having to litter my code with "-1" everywhere. On Thu, Feb 18, 2016 at 10:29 AM, Alan Isaac <alan.isaac@gmail.com> wrote:
Your statement is a little self-contradictory, but in any case, you shouldn't worry about random_integers getting removed from the code-base. However, it has been deprecated in favor of randint. On Wed, Feb 17, 2016 at 11:48 PM, Juan Nunez-Iglesias <jni.soma@gmail.com> wrote:
LOL "random integers" != "random_integers". =D On Thu, Feb 18, 2016 at 10:52 AM, G Young <gfyoung17@gmail.com> wrote:
He was talking consistently about "random integers" not "random_integers()". :-) On Wednesday, 17 February 2016, G Young <gfyoung17@gmail.com> wrote:
-- Robert Kern
On 2/17/2016 6:48 PM, Juan Nunez-Iglesias wrote:
Also fwiw, I think the 0-based, half-open interval is one of the best features of Python indexing and yes, I do use random integers to index into my arrays and would not appreciate having to litter my code with "-1" everywhere.
http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.random.choice.html fwiw, Alan Isaac
Notice the limitation "1D array-like". On Thu, Feb 18, 2016 at 10:59 AM, Alan Isaac <alan.isaac@gmail.com> wrote:
On 2/17/2016 7:01 PM, Juan Nunez-Iglesias wrote:
Notice the limitation "1D array-like".
http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.random.choi... "If an int, the random sample is generated as if a was np.arange(n)" hth, Alan Isaac
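A sketch of the `np.random.choice` route mentioned above, assuming the legacy `np.random` functions: an integer argument behaves like `np.arange(n)`, and a 1-D array is sampled element-wise.

```python
import numpy as np

n = 5
# An int argument samples from np.arange(n), i.e. the same [0, n) as randint(0, n).
k = np.random.choice(n, size=1000)

# With a 1-D array, choice returns elements directly, with no index arithmetic.
x = np.array([10, 20, 30, 40, 50])
picked = np.random.choice(x, size=3)

# Sampling without replacement is also available.
distinct = np.random.choice(x, size=3, replace=False)
```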
Ah! Touché! =) My last and admittedly weak defense is that I've been writing numpy since before 1.7. =) On Thu, Feb 18, 2016 at 11:08 AM, Alan Isaac <alan.isaac@gmail.com> wrote:
On Wed, Feb 17, 2016 at 7:17 PM, Juan Nunez-Iglesias <jni.soma@gmail.com> wrote:
Ah! Touché! =) My last and admittedly weak defense is that I've been writing numpy since before 1.7. =)
On Thu, Feb 18, 2016 at 11:08 AM, Alan Isaac <alan.isaac@gmail.com> wrote:
On 2/17/2016 7:01 PM, Juan Nunez-Iglesias wrote:
Notice the limitation "1D array-like".
http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.random.choi... "If an int, the random sample is generated as if a was np.arange(n)"
(Un)related aside: my R doc quote about "may lead to undesired behavior" refers to this. IIRC, R's `sample` was the inspiration for this function, but numpy distinguishes scalars from one-element (1D) arrays:
for i in range(3, 10):
    print(np.random.choice(np.arange(10)[i:]))
Josef
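A small sketch of the distinction in the aside, using the final iteration of the loop (a one-element slice). The R behavior it contrasts with is documented in R's `?sample`: a single numeric `x >= 1` is treated as `1:x`.

```python
import numpy as np

# The last slice in the loop is a one-element array.
last = np.arange(10)[9:]   # array([9])

# choice treats it as a population, so every draw is 9.
from_array = [int(np.random.choice(last)) for _ in range(20)]

# A scalar, by contrast, is read as np.arange(n).
from_int = np.random.choice(9, size=20)   # values in 0..8
```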
Joe: fair enough. A separate function seems more reasonable. Perhaps it was a wording thing, but you kept saying "wrapper," which is not the same as a separate function.
Josef: I don't think we are making people think more. They're all keyword arguments, so if you don't want to think about them, then you leave them as the defaults, and everyone is happy. The 'dtype' keyword was needed by someone who wanted to generate a large array of uint8 random integers and could not just call 'astype' due to memory constraints. I would suggest you read this issue here <https://github.com/numpy/numpy/issues/6790> and the PRs that followed so that you have a better understanding as to why this 'weird' behavior was chosen.
On Wed, Feb 17, 2016 at 8:30 PM, Alan Isaac <alan.isaac@gmail.com> wrote:
On Wed, Feb 17, 2016 at 8:43 PM, G Young <gfyoung17@gmail.com> wrote:
Josef: I don't think we are making people think more. They're all keyword arguments, so if you don't want to think about them, then you leave them as the defaults, and everyone is happy.
I believe that Josef has the code's reader in mind, not the code's writer. As a reader of other people's code (and I count 6-months-ago-me as one such "other people"), I am sure to eventually encounter all of the different variants, so I will need to know all of them. -- Robert Kern
I sense that this issue is now becoming more of "randint has become too complicated." I suppose we could always add more functions that present simpler interfaces, though if you really do want simple, there's always Python's random library you can use.
On Wed, Feb 17, 2016 at 8:48 PM, Robert Kern <robert.kern@gmail.com> wrote:
On Wed, Feb 17, 2016 at 3:58 PM, G Young <gfyoung17@gmail.com> wrote:
I sense that this issue is now becoming more of "randint has become too complicated" I suppose we could always "add" more functions that present simpler interfaces, though if you really do want simple, there's always Python's random library you can use.
On Wed, Feb 17, 2016 at 8:48 PM, Robert Kern <robert.kern@gmail.com> wrote:
On Wed, Feb 17, 2016 at 8:43 PM, G Young <gfyoung17@gmail.com> wrote:
Josef: I don't think we are making people think more. They're all keyword arguments, so if you don't want to think about them, then you leave them as the defaults, and everyone is happy.
I believe that Josef has the code's reader in mind, not the code's writer. As a reader of other people's code (and I count 6-months-ago-me as one such "other people"), I am sure to eventually encounter all of the different variants, so I will need to know all of them.
I have mostly the users in mind (i.e. me). I like simple patterns where I don't have to stare at a docstring for five minutes to understand it, or pull it up again each time I use it.
dtype for storage is different from dtype as a distribution parameter.
---
Aside, since I just read this: https://news.ycombinator.com/item?id=11112763
What to avoid: you save a few keystrokes and spend months trying to figure out what's going on (exaggerated).
"*Note* that this convenience feature may lead to undesired behaviour when ..." (from the R docs)
Josef
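A sketch of the two dtype roles, assuming NumPy 1.11 or later (where `randint` gained `dtype=`). The memory case is the `uint8` scenario described earlier in the thread; array sizes are arbitrary.

```python
import numpy as np

n = 1_000_000

# Storage role, memory angle: drawing in the default integer type and then
# casting materializes the wide array first (8 bytes per element on most
# 64-bit platforms), plus the uint8 copy.
wide = np.random.randint(0, 256, size=n)
narrow = wide.astype(np.uint8)

# With dtype= the values go straight into a uint8 array (1 byte per element).
direct = np.random.randint(0, 256, size=n, dtype=np.uint8)

# In the existing API dtype is storage only: the bounds alone set the support,
# so both of these draws come from [0, 10).
a8 = np.random.randint(0, 10, size=1000, dtype=np.int8)
a64 = np.random.randint(0, 10, size=1000, dtype=np.int64)

print(wide.nbytes, narrow.nbytes, direct.nbytes)
```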
On Mi, 2016-02-17 at 20:48 +0000, Robert Kern wrote:
On Wed, Feb 17, 2016 at 8:43 PM, G Young <gfyoung17@gmail.com> wrote:
Josef: I don't think we are making people think more. They're all keyword arguments, so if you don't want to think about them, then you leave them as the defaults, and everyone is happy.
I believe that Josef has the code's reader in mind, not the code's writer. As a reader of other people's code (and I count 6-months-ago -me as one such "other people"), I am sure to eventually encounter all of the different variants, so I will need to know all of them.
Completely agree. Greg, if you need more than a few minutes to explain it in this case, there seems little point. It seems to me even the worst cases of your examples would be covered by writing code like:
np.random.randint(np.iinfo(np.uint8).min, 10, dtype=np.uint8)
And *everyone* will immediately know what is meant, with just minor extra effort for writing it. We should keep the analogy to "range" as much as possible; anything going far beyond that can be confusing. At first sight I am not convinced that there is a serious convenience gain by doing magic here, but this is a simple case:
"Explicit is better than implicit"
since writing the explicit code is easy. It might also create weird bugs if the completely unexpected happens (most users would probably not even realize it existed) and you get huge numbers because you happened to have a `low=0` in there. Especially your point 2) seems confusing. As for 3), if I see `np.random.randint(high=3)` I think I would assume [0, 3)....
Additionally, I am not sure the maximum int range is such a common need anyway?
- Sebastian
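A sketch of the explicit style, built only from `np.iinfo` and `np.random.randint`. Because `high` is exclusive, covering a dtype's full range means passing `max + 1`; the concrete bounds here are illustrative.

```python
import numpy as np

info = np.iinfo(np.uint8)

# The explicit spelling from the message: from the dtype minimum up to (not including) 10.
a = np.random.randint(info.min, 10, size=1000, dtype=np.uint8)

# The whole representable range: high is exclusive, so pass max + 1.
# iinfo.min/max are plain Python ints, so max + 1 does not overflow.
b = np.random.randint(info.min, info.max + 1, size=1000, dtype=np.uint8)

# The same pattern for a signed type, e.g. "anything from -5 upward" in int8.
i8 = np.iinfo(np.int8)
c = np.random.randint(-5, i8.max + 1, size=1000, dtype=np.int8)
```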
On Mi, 2016-02-17 at 22:10 +0100, Sebastian Berg wrote:
On Mi, 2016-02-17 at 20:48 +0000, Robert Kern wrote:
On Wed, Feb 17, 2016 at 8:43 PM, G Young <gfyoung17@gmail.com> wrote:
Josef: I don't think we are making people think more. They're all keyword arguments, so if you don't want to think about them, then you leave them as the defaults, and everyone is happy.
I believe that Josef has the code's reader in mind, not the code's writer. As a reader of other people's code (and I count 6-months -ago -me as one such "other people"), I am sure to eventually encounter all of the different variants, so I will need to know all of them.
Completely agree. Greg, if you need more than a few minutes to explain it in this case, there seems little point. It seems to me even the worst cases of your examples would be covered by writing code like:
np.random.randint(np.iinfo(np.uint8).min, 10, dtype=np.uint8)
And *everyone* will immediately know what is meant with just minor extra effort for writing it. We should keep the analogy to "range" as much as possible. Anything going far beyond that, can be confusing. On first sight I am not convinced that there is a serious convenience gain by doing magic here, but this is a simple case:
"Explicit is better than implicit"
since writing the explicit code is easy. It might also create weird bugs if the completely unexpected (most users would probably not even realize it existed) happens and you get huge numbers because you happened to have a `low=0` in there. Especially your point 2) seems confusing. As for 3) if I see `np.random.randint(high=3)` I think I would assume [0, 3)....
OK, that was silly; that is what happens, of course. So it is explicit in the sense that you have to pass in at least one `None` explicitly. But I am still not sure that the added convenience is big and easy to understand [1]. If it were always lowest for low and highest for high, I would remember it, but it seems more complex (though None does also look a bit like "default", and "default" is 0 for low).
- Sebastian
[1] As in the trade-off between added complexity vs. added convenience.
"Explicit is better than implicit" - can't argue with that. It doesn't seem like the PR has gained much traction, so I'll close it. On Wed, Feb 17, 2016 at 9:27 PM, Sebastian Berg <sebastian@sipsolutions.net> wrote:
On Mi, 2016-02-17 at 21:53 +0000, G Young wrote:
"Explicit is better than implicit" - can't argue with that. It doesn't seem like the PR has gained much traction, so I'll close it.
Thanks for the effort though! Sometimes we get a bit carried away with doing fancy stuff, and I guess the idea is likely a bit too fancy for wide application. - Sebastian
On Wed, Feb 17, 2016 at 9:27 PM, Sebastian Berg < sebastian@sipsolutions.net> wrote:
On Mi, 2016-02-17 at 20:48 +0000, Robert Kern wrote:
On Wed, Feb 17, 2016 at 8:43 PM, G Young <gfyoung17@gmail.com> wrote:
Josef: I don't think we are making people think more. They're all keyword arguments, so if you don't want to think about them,
you leave them as the defaults, and everyone is happy.
I believe that Josef has the code's reader in mind, not the code's writer. As a reader of other people's code (and I count 6 -months -ago -me as one such "other people"), I am sure to eventually encounter all of the different variants, so I will need to know all of
Completely agree. Greg, if you need more then a few minutes to explain it in this case, there seems little point. It seems to me even
worst cases of your examples would be covered by writing code
On Mi, 2016-02-17 at 22:10 +0100, Sebastian Berg wrote: then them. the like:
np.random.randint(np.iinfo(np.uint8).min, 10, dtype=np.uint8)
And *everyone* will immediately know what is meant with just
minor
extra effort for writing it. We should keep the analogy to "range" as much as possible. Anything going far beyond that, can be confusing. On first sight I am not convinced that there is a serious convenience gain by doing magic here, but this is a simple case:
"Explicit is better then implicit"
since writing the explicit code is easy. It might also create weird bugs if the completely unexpected (most users would probably not even realize it existed) happens and you get huge numbers because you happened to have a `low=0` in there. Especially your point 2) seems confusing. As for 3) if I see `np.random.randint(high=3)` I think I would assume [0, 3)....
OK, that was silly; that is what happens, of course. So it is explicit in the sense that you have to pass in at least one `None` explicitly.
But I am still not sure that the added convenience is big and easy to understand [1]. If it was always lowest for low and highest for high, I would get it, but it seems more complex (though None does also look a bit like "default", and "default" is 0 for low).
- Sebastian
[1] As in the trade-off between added complexity vs. added convenience.
Additionally, I am not sure the maximum int range is such a common need anyway?
- Sebastian
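Sebastian's point is that every proposed case can already be spelled out with `np.iinfo` and the existing `randint`. A minimal sketch, assuming NumPy 1.11 or later (where `randint` accepts `dtype`) and keeping the proposal's half-open convention, so the dtype's maximum itself is never drawn; `np.int8`, the bound values and the sample size are arbitrary illustrative choices:

```python
import numpy as np

# np.int8 is an arbitrary choice; any NumPy integer type works the same way.
dtype = np.int8
info = np.iinfo(dtype)  # for int8: info.min == -128, info.max == 127

# Proposed case 1, [min, max), written out with the existing API.
full = np.random.randint(info.min, info.max, size=1000, dtype=dtype)

# Proposed case 2 with a negative low, [low, max).
from_low = np.random.randint(-5, info.max, size=1000, dtype=dtype)

# Proposed case 3, [min, high).
up_to_high = np.random.randint(info.min, 3, size=1000, dtype=dtype)
```

Each call states both bounds, so a reader never has to remember which `None` means what.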
-- Robert Kern
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
On Wed, Feb 17, 2016 at 10:01 AM, G Young <gfyoung17@gmail.com> wrote:
Hello all,
I have a PR open here <https://github.com/numpy/numpy/pull/7151> that makes "low" an optional parameter in numpy.randint and introduces new behavior into the API as follows:
1) `low == None` and `high == None`
Numbers are generated over the range `[lowbnd, highbnd)`, where `lowbnd = np.iinfo(dtype).min`, and `highbnd = np.iinfo(dtype).max`, where `dtype` is the provided integral type.
2) `low != None` and `high == None`
If `low >= 0`, numbers are *still* generated over the range `[0, low)`, but if `low` < 0, numbers are generated over the range `[low, highbnd)`, where `highbnd` is defined as above.
3) `low == None` and `high != None`
Numbers are generated over the range `[lowbnd, high)`, where `lowbnd` is defined as above.
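The three cases can be sketched as a small bound-resolution step in front of the existing `randint`. This is a minimal sketch of the rules as described in the message, not the PR's code; `proposed_bounds` and `proposed_randint` are hypothetical names, and the `np.int64` default dtype is an assumption:

```python
import numpy as np

def proposed_bounds(low=None, high=None, dtype=np.int64):
    """Resolve (low, high) into a half-open [lo, hi) range per the proposal.

    Hypothetical helper for illustration; not part of NumPy.
    """
    info = np.iinfo(dtype)
    if low is None and high is None:   # case 1: whole dtype range
        return info.min, info.max
    if high is None:                   # case 2: depends on the sign of low
        if low >= 0:
            return 0, low              # legacy behaviour: [0, low)
        return low, info.max           # new behaviour: [low, max)
    if low is None:                    # case 3: open at the bottom
        return info.min, high
    return low, high                   # both given: unchanged

def proposed_randint(low=None, high=None, size=None, dtype=np.int64):
    """Hypothetical wrapper drawing from the range chosen by proposed_bounds."""
    lo, hi = proposed_bounds(low, high, dtype)
    return np.random.randint(lo, hi, size=size, dtype=dtype)
```

With `dtype=np.int8`, `proposed_bounds(5, None, np.int8)` resolves to `(0, 5)` while `proposed_bounds(-5, None, np.int8)` resolves to `(-5, 127)`: the sign of `low` switches case 2 between two unrelated ranges.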
My impression (*) is that this will be confusing, and uses a default that I never ever needed. Maybe a better way would be to use low=-np.inf and high=np.inf where inf would be interpreted as the smallest and largest representable number. And leave the defaults unchanged. (*) I didn't try to understand how it works for various cases. Josef
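Josef's alternative keeps the defaults and treats infinite bounds as sentinels for the dtype limits. A minimal sketch of that mapping, assuming the upper bound stays exclusive as in the rest of `randint` (the message does not say whether "largest representable number" is meant inclusively); `resolve_inf_bounds` is a hypothetical helper, since NumPy's `randint` itself does not accept infinite bounds:

```python
import numpy as np

def resolve_inf_bounds(low, high, dtype=np.int64):
    """Map -np.inf / np.inf to the dtype's smallest / largest value.

    Hypothetical helper illustrating the sentinel idea; any other value,
    including None, is passed through untouched.
    """
    info = np.iinfo(dtype)
    if low == -np.inf:
        low = info.min
    if high == np.inf:
        high = info.max   # still used as an exclusive upper bound
    return low, high
```

For example, `np.random.randint(*resolve_inf_bounds(-np.inf, 10, np.int8), dtype=np.int8)` draws from `[-128, 10)`, while a plain `np.random.randint(5)` keeps its `[0, 5)` meaning.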
The primary motivation was the second case, as it is more convenient to specify a 'dtype' by itself when generating such numbers in a similar vein to numpy.empty, except with initialized values.
Looking forward to your feedback!
Greg
On Wed, Feb 17, 2016 at 1:37 PM, <josef.pktd@gmail.com> wrote:
My impression (*) is that this will be confusing, and uses a default that I never ever needed.
Maybe a better way would be to use low=-np.inf and high=np.inf where inf would be interpreted as the smallest and largest representable number. And leave the defaults unchanged.
(*) I didn't try to understand how it works for various cases.
Josef
As I mentioned on the PR discussion, the thing that bothers me is the inconsistency between the new and the old functionality, specifically in #2. If `high` is `None`, the behavior is completely different depending on the value of `low`. Using `np.inf` instead of `None` may fix that, although I think that the author's idea was to avoid having to type the bounds in the `None`/`+/-np.inf` cases. I think that a better option is to have a separate wrapper to `randint` that implements this behavior in a consistent manner and leaves the current function consistent as well. -Joe
Yes, you are correct in explaining my intentions. However, as I also mentioned in the PR discussion, I did not quite understand how your wrapper idea would make things any more comprehensive at the cost of additional overhead and complexity. What do you mean by making the functions "consistent" (i.e. outline the behavior *exactly* depending on the inputs)? As I've explained before, and I will state it again, the different behavior for the high=None and low != None case is due to backwards compatibility.
On Wed, Feb 17, 2016 at 6:52 PM, Joseph Fox-Rabinovitz <jfoxrabinovitz@gmail.com> wrote:
As I mentioned on the PR discussion, the thing that bothers me is the inconsistency between the new and the old functionality, specifically in #2. If `high` is `None`, the behavior is completely different depending on the value of `low`. Using `np.inf` instead of `None` may fix that, although I think that the author's idea was to avoid having to type the bounds in the `None`/`+/-np.inf` cases. I think that a better option is to have a separate wrapper to `randint` that implements this behavior in a consistent manner and leaves the current function consistent as well.
-Joe
My point is that you are proposing to make the overall API have counter-intuitive behavior for the sake of adding a new feature. It is worth a little bit of overhead to have two functions that behave exactly as expected. Josef's footnote is a good example of how people will feel about having to figure out (not to mention remember) the different use cases. I think it is better to keep the current API and just add a "bounded_randint" function for which an input of `None` always means "limit of that bound, no exceptions".
-Joe
On Wed, Feb 17, 2016 at 2:09 PM, G Young <gfyoung17@gmail.com> wrote:
Yes, you are correct in explaining my intentions. However, as I also mentioned in the PR discussion, I did not quite understand how your wrapper idea would make things any more comprehensive at the cost of additional overhead and complexity. What do you mean by making the functions "consistent" (i.e. outline the behavior exactly depending on the inputs)? As I've explained before, and I will state it again, the different behavior for the high=None and low != None case is due to backwards compatibility.
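Joe's `bounded_randint` idea, where `None` always means "the limit of that bound", fits in a few lines. A minimal sketch: the name comes from Joe's message, but the signature, the `np.int64` default and the exclusive upper bound are assumptions chosen to mirror `np.random.randint`:

```python
import numpy as np

def bounded_randint(low=None, high=None, size=None, dtype=np.int64):
    """Hypothetical separate function: None always means the dtype limit.

    There are no special cases, and np.random.randint itself is unchanged.
    """
    info = np.iinfo(dtype)
    lo = info.min if low is None else low
    hi = info.max if high is None else high   # upper bound stays exclusive
    return np.random.randint(lo, hi, size=size, dtype=dtype)
```

Here `bounded_randint(5, dtype=np.int8)` always means `[5, 127)` and `bounded_randint(-5, dtype=np.int8)` means `[-5, 127)`, whatever the sign, while `np.random.randint(5)` keeps its existing `[0, 5)` meaning.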
On Wed, Feb 17, 2016 at 2:09 PM, G Young <gfyoung17@gmail.com> wrote:
Yes, you are correct in explaining my intentions. However, as I also mentioned in the PR discussion, I did not quite understand how your wrapper idea would make things any more comprehensive at the cost of additional overhead and complexity. What do you mean by making the functions "consistent" (i.e. outline the behavior *exactly* depending on the inputs)? As I've explained before, and I will state it again, the different behavior for the high=None and low != None case is due to backwards compatibility.
One problem is that if there is only one positional argument, then I can still figure out that it might have different meanings. If there are two keywords, then I would assume standard Python argument interpretation applies. If I want to save on typing, then I think it should be for a more "standard" case. (I also never sample all real numbers, at least not uniformly.) Josef
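For reference, the existing API already departs from standard argument reading in this spot: a lone `low`, whether passed positionally or by keyword, is used as the upper bound, so `np.random.randint(low, high=None)` draws from `[0, low)`. A quick check against the current function (the value 5 and the sample size are arbitrary):

```python
import numpy as np

positional = np.random.randint(5, size=1000)
keyword = np.random.randint(low=5, size=1000)  # `low` still acts as the upper bound

# Both calls draw from [0, 5), even though the keyword spelling reads like "at least 5".
ranges = [(int(a.min()), int(a.max())) for a in (positional, keyword)]
```

Case 2 of the proposal keeps this reading for `low >= 0` and adds a second, different reading for `low < 0`.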
On Wed, Feb 17, 2016 at 2:20 PM, <josef.pktd@gmail.com> wrote:
One problem is that if there is only one positional argument, then I can still figure out that it might have different meanings. If there are two keywords, then I would assume standard Python argument interpretation applies.
If I want to save on typing, then I think it should be for a more "standard" case. (I also never sample all real numbers, at least not uniformly.)
One more thing I don't like: So far all distributions are "theoretical" distributions where the distribution depends on the provided shape, location and scale parameters. There is a limitation in how they are represented as numbers/dtype and what range is possible. However, that is not relevant for most use cases. In this case you are promoting `dtype` from a memory or storage parameter to an actual shape (or loc and scale) parameter. That's "weird", and even more so if this would be the default behavior. There is no proper uniform distribution on all integers. So, this forces users to think about implementation details like dtype, when I just want a random sample of a probability distribution. Josef
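Josef's point that `dtype` would turn into a scale parameter can be made concrete: under proposed case 1, the spread of the draws is fixed entirely by the integer type. A small sketch, assuming the proposal's half-open `[min, max)` range and using the standard variance of a discrete uniform distribution over n consecutive integers, (n**2 - 1) / 12; `full_range_std` is a hypothetical helper:

```python
import math
import numpy as np

def full_range_std(dtype):
    """Standard deviation of a uniform integer draw over [min, max) of dtype.

    For n equally likely consecutive integers the variance is (n**2 - 1) / 12.
    Python ints are used so that int64 limits do not overflow.
    """
    info = np.iinfo(dtype)
    n = info.max - info.min  # number of values in the half-open range
    return math.sqrt((n * n - 1) / 12)

# The same call with a different dtype gives a distribution of a different scale.
spreads = {np.dtype(dt).name: full_range_std(dt)
           for dt in (np.int8, np.int16, np.int32, np.int64)}
```

For `int8` this gives about 73.6, while for `int64` it is on the order of 5e18, so changing only the storage type changes the distribution being sampled.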
participants (7)
- Alan Isaac
- G Young
- josef.pktd@gmail.com
- Joseph Fox-Rabinovitz
- Juan Nunez-Iglesias
- Robert Kern
- Sebastian Berg