I just noticed that 1.7 is scheduled to add a random.choice function. I wonder if the best structure has been chosen. Specifically, it does not provide for array flattening, and it does not provide for subarray choice.
Back in 2006 (?) Robert Kern suggested something like the below (forgive any mistranslation). I think the included functionality does belong in a choice function. I do like the provision for repeated sampling in the current proposal, however.
Cheers, Alan Isaac
    import numpy as np

    def choice(x, axis=None):
        """Select an element or subarray uniformly randomly.

        If axis is None, then a single element is chosen from the entire
        array. Otherwise, a subarray is chosen from the given axis.
        """
        x = np.asarray(x)
        if axis is None:
            # Pick one element from the flattened array.
            length = np.multiply.reduce(x.shape)
            n = np.random.randint(length)
            return x.flat[n]
        else:
            # Pick one index along `axis`; keep full slices elsewhere.
            n = np.random.randint(x.shape[axis])
            idx = [slice(None)] * x.ndim
            idx[axis] = n
            return x[tuple(idx)]
On Fri, Nov 9, 2012 at 2:17 PM, Alan G Isaac alan.isaac@gmail.com wrote:
I just noticed that 1.7 is scheduled to add a random.choice function. I wonder if the best structure has been chosen. Specifically, it does not provide for array flattening, and it does not provide for subarray choice.
I think in terms of the function currently in numpy master: http://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.random.choice....
Couldn't you write flattening as np.random.choice(a.ravel(), ...) and subarray choice as np.take(a, np.random.choice(a.shape[ax], ...), axis=ax)?
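A minimal sketch of those two workarounds, assuming a NumPy whose np.random.choice accepts a 1-d array (or an integer) plus a size argument. The array `a`, the axis `ax` and the sample counts are made up for illustration.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
ax = 1  # axis along which to choose subarrays (illustrative)

# Flattening: sample individual elements from the raveled array.
elements = np.random.choice(a.ravel(), size=5)

# Subarray choice: sample indices along `ax`, then gather those slices.
idx = np.random.choice(a.shape[ax], size=2)
columns = np.take(a, idx, axis=ax)

print(elements)
print(columns)
```

Here `columns` has shape (3, 2): two randomly chosen columns of `a`, possibly repeated, since sampling is with replacement by default.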
That said, since it (claims to) only work on 1-d arrays right now, we could always add either or both of these features later without breaking compatibility. So I don't think there's any urgent need to fix this before releasing.
(If you're worried though then you might want to double-check that the np.random.choice in 1.7 actually *does* give an error if the input array is not 1-d.)
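One way to do that check is simply to call the function with a 2-d array and see whether it raises. A sketch, assuming the rejection surfaces as a ValueError (which is what NumPy's legacy np.random.choice raises today); the helper name `rejects_nd_input` is hypothetical.

```python
import numpy as np

def rejects_nd_input(func):
    """Hypothetical helper: True if func refuses a 2-d array with ValueError."""
    try:
        func(np.arange(6).reshape(2, 3))
    except ValueError:
        return True
    return False

print(rejects_nd_input(np.random.choice))
```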
-n
On 11/9/2012 12:21 PM, Nathaniel Smith wrote:
you might want to double-check that the np.random.choice in 1.7 actually *does* give an error if the input array is not 1-d
Any idea where I can look at the code? I browsed GitHub after failing to find a productive search string, but couldn't find it there either.
Which reminds me: it would be nice if the docs linked to the source.
Thanks, Alan
Hey,
On Mon, 2012-11-12 at 08:48 -0500, Alan G Isaac wrote:
Any idea where I can look at the code? I browsed GitHub after failing to find a productive search string, but couldn't find it there either.
It's here: https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L9...
It sounds like it should be pretty simple to add axis=None, which would change the current behavior very little, though it would no longer give an error for non-1-d arrays.
Regards,
Sebastian
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
On 11/12/2012 8:59 AM, Sebastian Berg wrote:
https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L9...
It sounds like it should be pretty simple to add axis=None, which would change the current behavior very little, though it would no longer give an error for non-1-d arrays.
I believe Nathaniel suggested *retaining* this error until two things were supported: flattening ndarrays, and choice of subarrays for ndarrays. If I understand you, you are suggesting the first is easily supported. Or are you also suggesting the second is easily supported?
Alan Isaac
PS: I'll repost the code (or something similar) that Robert Kern posted when this function was discussed in 2006. However, it did not support multiple samples.

    import numpy as np

    def choice(x, axis=None):
        x = np.asarray(x)
        if axis is None:
            # Pick one element from the flattened array.
            length = np.multiply.reduce(x.shape)
            n = np.random.randint(length)
            return x.flat[n]
        else:
            # Pick one index along `axis`; keep full slices elsewhere.
            n = np.random.randint(x.shape[axis])
            idx = [slice(None)] * x.ndim
            idx[axis] = n
            return x[tuple(idx)]
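Since that version draws only one sample, here is a hedged sketch of how it might be extended with a size argument in the spirit of the current proposal. The name `choice_nd` and its signature are hypothetical, not NumPy API, and it only samples with replacement.

```python
import numpy as np

def choice_nd(x, size=None, axis=None):
    """Hypothetical: draw elements (axis=None) or subarrays along `axis`.

    size=None returns a single element/subarray, like Python's
    random.choice; an integer size returns that many samples.
    Sampling is with replacement.
    """
    x = np.asarray(x)
    if axis is None:
        flat = x.ravel()
        n = np.random.randint(flat.size, size=size)
        return flat[n]
    n = np.random.randint(x.shape[axis], size=size)
    # np.take with a scalar index drops `axis`; with an index array
    # it keeps `axis`, now of length `size`.
    return np.take(x, n, axis=axis)
```

For a 3x4 array, `choice_nd(x, axis=0)` returns one row of shape (4,), while `choice_nd(x, size=2, axis=1)` returns two columns stacked into shape (3, 2).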
On Mon, Nov 12, 2012 at 3:50 PM, Alan G Isaac alan.isaac@gmail.com wrote:
I believe Nathaniel suggested *retaining* this error until two things were supported: flattening ndarrays, and choice of subarrays for ndarrays.
Just to be clear, I don't really have an opinion on whether those things should be supported, or what the right API should be; I haven't really thought about it. Maybe others on the list have opinions. I was just saying that we have plenty of time to decide about these things, it's not an emergency that should hold up the 1.7 release (or risk locking in a bad API).
-n
On 11/12/2012 10:00 AM, Nathaniel Smith wrote:
I don't really have an opinion on whether those things should be supported, or what the right API should be; I haven't really thought about it. Maybe others on the list have opinions. I was just saying that we have plenty of time to decide about these things
OK. For now I've opened an issue for this: https://github.com/numpy/numpy/issues/2724 I assume that's the right place to accumulate comments on the proposal.
Alan
In a comment on the issue https://github.com/numpy/numpy/issues/2724 Sebastian notes: "it could also be reasonable to have size=None as default and have it return a scalar/the given axes removed in that case. That would be a real change in functionality unfortunately, but it would make sense for similarity to import random; random.choice mostly."
If this is under serious consideration, then perhaps random.choice should not be in 1.7 unless a decision can be made quickly about the default for `size`. (Allowing an axis argument can, however, be postponed.)
I am inclined to think that Sebastian's suggestion is correct.
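For comparison, later NumPy releases did add an axis argument, though only on the newer Generator interface (np.random.default_rng), where axis defaults to 0 rather than flattening. A small sketch of that API, assuming NumPy 1.17 or newer:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
a = np.arange(12).reshape(3, 4)

rows = rng.choice(a, size=2)          # axis=0 by default: two rows
cols = rng.choice(a, size=2, axis=1)  # two columns
one = rng.choice(a.ravel())           # flatten explicitly for one element

print(rows.shape, cols.shape, one)
```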
Alan Isaac
On Mon, Nov 12, 2012 at 5:31 PM, Alan G Isaac alan.isaac@gmail.com wrote:
In a comment on the issue https://github.com/numpy/numpy/issues/2724 Sebastian notes: "it could also be reasonable to have size=None as default and have it return a scalar/the given axes removed in that case. That would be a real change in functionality unfortunately, but it would make sense for similarity to import random; random.choice mostly."
If this is under serious consideration, then perhaps random.choice should not be in 1.7 unless a decision can be made quickly about the default for `size`. (Allowing an axis argument can, however, be postponed.)
I am inclined to think that Sebastian's suggestion is correct.
For anyone else trying to follow, here's the current function: http://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.random.choice....
I'm afraid I don't understand what Sebastian is suggesting should happen by default. size=None doesn't have any intuitive meaning to me, and I don't know what "a scalar/the given axes removed" means.
-n
On Mon, 2012-11-12 at 17:52 +0100, Nathaniel Smith wrote:
I'm afraid I don't understand what Sebastian is suggesting should happen by default. size=None doesn't have any intuitive meaning to me, and I don't know what "a scalar/the given axes removed" means.
None is a little awkward, I agree (but I don't think there is anything better), but basically what I meant is this:

    >>> random.choice([1, 1])
    1
    >>> np.random.choice([1, 2])
    array([1])  # it's 1-D, not 0-D.
So instead of taking a sequence of length 1, take an element as default.
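The difference can be seen side by side. This sketch assumes a NumPy release that already includes the size=None default discussed here; in the development version under discussion, the call without size returned a length-1 array instead.

```python
import random
import numpy as np

py_pick = random.choice([1, 2])            # Python: a bare element
np_pick = np.random.choice([1, 2])         # size=None: a scalar, not an array
np_one = np.random.choice([1, 2], size=1)  # explicit size: 1-d array of length 1

print(type(py_pick), np.ndim(np_pick), np_one.shape)
```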
On 11/12/2012 12:16 PM, Sebastian Berg wrote:
So instead of taking a sequence of length 1, take an element as default.
Sebastian has proposed that np.random.choice return a single *element* by default, not a 1d array of length 1. He proposes to associate this with a default value of `size=None`.
The motivation: it is more natural, and in particular, it would behave more like Python's random.choice by default.
This decision should be made before this function is part of a release.
Cheers, Alan
On Mon, Nov 12, 2012 at 11:34 PM, Alan G Isaac alan.isaac@gmail.com wrote:
Sebastian has proposed that np.random.choice return a single *element* by default, not a 1d array of length 1. He proposes to associate this with a default value of `size=None`.
The motivation: it is more natural, and in particular, it would behave more like Python's random.choice by default.
This decision should be made before this function is part of a release.
I see, so right now we have

    >>> np.random.choice([1, 2, 3])
    array([2])

but you're suggesting

    >>> np.random.choice([1, 2, 3])
    2
    >>> np.random.choice([1, 2, 3], size=1)
    array([2])
That does seem like an obvious improvement to me, since all the other random functions work that way, e.g.:
    In [2]: np.random.normal()
    Out[2]: -0.8752867990041713

    In [4]: np.random.normal(size=1)
    Out[4]: array([ 1.92803487])
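That convention can be checked across several functions at once: with no size you get a scalar, and size=1 gives a 1-d array of length 1. A quick sketch, assuming a NumPy that includes the size=None change to choice; the `samplers` dictionary is just illustrative scaffolding.

```python
import numpy as np

samplers = {
    "normal": lambda size=None: np.random.normal(size=size),
    "uniform": lambda size=None: np.random.uniform(size=size),
    "choice": lambda size=None: np.random.choice([1, 2, 3], size=size),
}

# For each function: (ndim with no size, shape with size=1)
shapes = {name: (np.ndim(f()), np.shape(f(size=1))) for name, f in samplers.items()}
print(shapes)
```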
Want to make a pull request?
-n
On 11/12/2012 5:46 PM, Nathaniel Smith wrote:
Want to make a pull request?
Well, I'd be happy to help Sebastian change the code, but I'm not a git user.
And I'd have some questions. E.g., with `size=None`, couldn't we just call Python's random.choice? And for sampling without replacement, wouldn't it be faster to just call Python's random.sample (rather than implement this as currently done)?
Alan
On Mon, 2012-11-12 at 18:36 -0500, Alan G Isaac wrote:
Well, I'd be happy to help Sebastian change the code, but I'm not a git user.
I have created a pull request, but it still needs tests. If you like, it would be very nice if you could check it and maybe also write some tests. Git is relatively simple (and likely worth learning), but even otherwise, posting code would be nice.
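Tests for the new default could be as simple as the following sketch. The test function names are hypothetical, not taken from NumPy's test suite; they assume the behaviour proposed in this thread (size=None gives an element, size=1 gives a length-1 array) and use plain assert statements that a runner such as pytest would collect.

```python
import numpy as np

def test_choice_default_returns_element():
    x = np.random.choice([1, 2, 3])
    assert not isinstance(x, np.ndarray)
    assert x in (1, 2, 3)

def test_choice_size_one_returns_1d():
    x = np.random.choice([1, 2, 3], size=1)
    assert x.shape == (1,)

def test_choice_rejects_2d_input():
    try:
        np.random.choice(np.ones((2, 2)))
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for 2-d input")

def test_choice_without_replacement_is_unique():
    # Drawing all 10 values without replacement must give a permutation.
    x = np.random.choice(10, size=10, replace=False)
    assert sorted(x.tolist()) == list(range(10))
```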
And I'd have some questions. E.g., with `size=None`, couldn't we just call Python's random.choice? And for sampling without replacement, wouldn't it be faster to just call Python's random.sample (rather than implement this as currently done)?
I don't think the difference would be really noticeable. But even if it were, I doubt it's worth special-casing: if someone cares a lot about speed, they probably should not be using this function to get single values anyway. Also, for random numbers specifically, it would be a bad idea, since you would be using a different random number generator with a different seed!
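The seeding point is easy to demonstrate: seeding NumPy does not touch the state of Python's random module, so delegating to random.choice or random.sample would make results unreproducible under np.random.seed. A sketch (NumPy's own sampling without replacement is spelled replace=False):

```python
import random
import numpy as np

# Seeding NumPy's global generator makes np.random.choice reproducible...
np.random.seed(42)
first = np.random.choice(10, size=3, replace=False)
np.random.seed(42)
second = np.random.choice(10, size=3, replace=False)

# ...but it leaves the state of Python's `random` module untouched.
state_before = random.getstate()
np.random.seed(42)
state_after = random.getstate()

print(first, second, state_before == state_after)
```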
Regards,
Sebastian
On 11/12/2012 8:18 PM, Sebastian Berg wrote:
I have created a pull request
This is still a bit different from what I thought you intended. With `size=None` we don't get an element, but rather a 0d array.
I thought the idea was to return an element in this case?
Alan
On Mon, 2012-11-12 at 22:44 -0500, Alan G Isaac wrote:
This is still a bit different from what I thought you intended. With `size=None` we don't get an element, but rather a 0d array.
You are right, it should not be a 0d array. I overlooked that passing tuple() as the size does not give the same result as None, at least for the random functions.
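The distinction is between size=None, which yields a plain element, and size=() (an empty shape tuple), which yields a 0-d array. A sketch, assuming a NumPy release that includes the fix:

```python
import numpy as np

elem = np.random.choice([1, 2, 3])             # size=None: an element (NumPy scalar)
zero_d = np.random.choice([1, 2, 3], size=())  # size=(): a 0-d ndarray

print(type(elem).__name__, type(zero_d).__name__, zero_d.shape)

# Indexing a 0-d array with () extracts the element.
print(type(zero_d[()]).__name__)
```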
On Mon, Nov 12, 2012 at 2:48 PM, Alan G Isaac alan.isaac@gmail.com wrote:
On 11/9/2012 12:21 PM, Nathaniel Smith wrote:
you might want to double-check that the np.random.choice in 1.7 actually *does* give an error if the input array is not 1-d
Any idea where I can look at the code? I browsed GitHub after failing to find a productive search string, but couldn't find it there either.
Looks like it's in numpy/random/mtrand/mtrand.pyx (Cython code).
I was actually thinking you would check by just trying it, since that's an easier and more reliable way to determine what code actually does than reading it :-). (Or even better, writing a test?)
Which reminds me: it would be nice if the docs linked to the source.
True, though it's a difficult problem for code like this that goes Cython file -> C -> .so file -> Python. I'm not sure Cython actually preserves any metadata that would let us look at the np.random.choice object in the interpreter and map that back to a line of a source file.
-n