Re: [Python-ideas] [Python-Dev] About adding a new iterator method called "shuffled"

You can already write:

    sorted(s, key=lambda x: random())

But nobody does that. So you have a good indication that the proposed method isn't needed.

Raymond
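Raymond's one-liner works because sorted() computes the key function once per element, so every item gets an independent random sort key and the resulting order is a random permutation (assuming no key collisions, which are vanishingly rare). A minimal runnable sketch, with illustrative variable names:

```python
import random

s = list(range(10))
# Each element gets a fresh random() key, computed exactly once,
# so sorting by it yields a random permutation of s.
t = sorted(s, key=lambda x: random.random())

assert sorted(t) == s  # same elements, (almost certainly) new order
```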

Raymond Hettinger <python@...> writes:
On the other hand, sorting is O(n log n), which is probably sub-optimal for shuffling (I don't know how shuffle() is internally implemented, but ISTM that it shouldn't take more than O(n)). Note that I'm not supporting the original proposal: shuffle() is not used enough to warrant such a shortcut.

Regards

Antoine.

    from random import shuffle
    shuffle(s)

I think it's convenient enough as is.

Brandon

On Tue, Mar 24, 2009 at 6:29 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
I assumed that the OP was suggesting something of the form:

    def shuffled(L):
        while L:
            i = random.randrange(len(L))
            yield L[i]
            L.pop(i)

fixed up somehow so that it's only O(1) to yield each element; in effect, an itertools version of random.sample. I could see uses for this in cases where you only want a few randomly chosen elements from a large list, but don't necessarily know in advance how many elements you need.

Mark
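Mark's use case, drawing a number of elements you don't know in advance, can be sketched with itertools.islice. Note this is only a sketch of the idea: it copies the input so the caller's list is left alone, and list.pop at an arbitrary index is O(n), so it is not the O(1)-per-element version he asks for:

```python
import random
from itertools import islice

def shuffled(L):
    L = list(L)  # copy, so the caller's sequence isn't mutated
    while L:
        # Pop a uniformly chosen remaining element. pop() at an
        # arbitrary index is O(n), hence this is only a sketch.
        yield L.pop(random.randrange(len(L)))

# Take just three random elements from a large range lazily.
first_three = list(islice(shuffled(range(1000)), 3))
```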

Mark Dickinson wrote:
Like this, for example:

    def shuffled(L):
        L = list(L)  # make a copy, so we don't mutate the argument
        while L:
            i = random.randrange(len(L))
            yield L[i]
            L[i] = L[-1]
            L.pop()

Or this:

    def shuffled(L):
        # use a dict to store modified values so we don't
        # have to mutate the argument
        D = {}
        for j in xrange(len(L)-1, -1, -1):
            i = random.randrange(j+1)
            yield D.get(i, L[i])
            D[i] = D.get(j, L[j])

The second is a bit slower but avoids copying the whole list up front, which should be better for the kind of uses you mention. And yes, I think it is necessary that it doesn't modify its argument.
- Jacob
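Jacob's dict-based variant is, in effect, a lazy backwards Fisher-Yates run over a "virtual" copy of the list: the dict records only the slots a real shuffle would have overwritten. A Python 3 rendering (xrange becomes range) that also exercises the no-mutation property:

```python
import random

def shuffled(L):
    # D overlays L: D[i] holds the value that an in-place
    # Fisher-Yates would have swapped into slot i, so L itself
    # is never touched.
    D = {}
    for j in range(len(L) - 1, -1, -1):
        i = random.randrange(j + 1)
        yield D.get(i, L[i])
        D[i] = D.get(j, L[j])

data = [10, 20, 30, 40, 50]
out = list(shuffled(data))

assert sorted(out) == sorted(data)   # out is a permutation of data
assert data == [10, 20, 30, 40, 50]  # and data is unmodified
```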

On Wed, 25 Mar 2009 03:03:31 am Raymond Hettinger wrote:
That's nice -- not as readable as random.shuffle(s), but still nice. And fast too: on my PC, it is about twice as fast as random.shuffle() for "reasonable"-sized lists (tested up to one million items).

I don't think randomly shuffling a list is anywhere near common enough a task that it should be a built-in, so -1 on the OP's request. But since we're on the topic, I wonder whether the random.shuffle() implementation should use Raymond's idiom rather than the current Fisher-Yates shuffle? Fisher-Yates is O(N), while sorting is O(N*log N); but the constant factors make Fisher-Yates significantly slower in practice.

In addition, the F-Y shuffle is limited by the period of the random number generator: given a period P, it can fully randomize only lists of length n where n! < P. For longer lists, some permutations are unreachable. In the current implementation of random, n equals 2080. I *think* Raymond's idiom suffers from the same limitation -- it's hard to imagine that it doesn't -- but can anyone confirm this?

(In any case, if you're shuffling lists with more than 2080 items, and you care about the statistical properties of the result (as opposed to just "make it somewhat mixed up"), then the current implementation isn't good enough and you'll need your own shuffle routine.)

Are there any advantages to the current F-Y implementation? It seems to me that Raymond's idiom is no worse statistically, and significantly faster in practice, so it should be the preferred implementation. Thoughts?

-- Steven D'Aprano
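For reference, a minimal in-place Fisher-Yates, the algorithm random.shuffle() implements (this is a sketch of the algorithm, not CPython's exact code):

```python
import random

def fisher_yates(L):
    # Walk backwards, swapping each slot with a uniformly chosen
    # slot at or before it: O(n) swaps in total, and each
    # permutation is equally likely (up to the limits of the
    # underlying generator).
    for i in range(len(L) - 1, 0, -1):
        j = random.randrange(i + 1)  # 0 <= j <= i
        L[i], L[j] = L[j], L[i]

L = list(range(10))
fisher_yates(L)
assert sorted(L) == list(range(10))  # in-place permutation of 0..9
```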

On Wed, 25 Mar 2009 07:20:00 am Steven D'Aprano wrote:
Ah crap. Ignore the above: I made an embarrassing error in my test (neglected to actually call random inside the lambda) and so my timings were completely wrong. The current random.shuffle() is marginally faster even for small lists (500 items), so I withdraw my suggestion that it be replaced.

-- Steven D'Aprano

Steven D'Aprano wrote:
In addition, the F-Y shuffle is limited by the period of the random number generator:
*All* shuffling algorithms are limited by that. Think about it: a shuffling algorithm is a function from a random number to a permutation. There's no way you can get more permutations out than there are random numbers to put in.

-- Greg

Greg Ewing <greg.ewing@...> writes:
The period of the generator should be (much) larger than the number of possible random numbers, because of the generator's internal state. (I'm not sure I understood your sentence as you meant it though)

Antoine Pitrou wrote:
The period of the generator should be (much) larger than the number of possible random numbers, because of the generator's internal state.
Hm, yes, I should have said a function from an RNG state to a permutation. The initial state of the RNG completely determines the permutation generated, so there can't be more permutations than states.

-- Greg
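The 2080 figure Steven quotes can be checked directly against the Mersenne Twister's period of 2**19937 - 1 (the generator behind Python's random module): 2080 is the largest n whose n! still fits below the period.

```python
# Find the largest n such that n! fits within the Mersenne Twister
# period 2**19937 - 1. Beyond this length, some permutations of an
# n-element list are unreachable from any single generator state.
period = 2**19937 - 1
n, fact = 1, 1
while fact * (n + 1) <= period:
    n += 1
    fact *= n

print(n)  # -> 2080
```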

participants (7)

- Antoine Pitrou
- Brandon Mintern
- Greg Ewing
- Jacob Holm
- Mark Dickinson
- Raymond Hettinger
- Steven D'Aprano