[Python-ideas] About adding a new iterator methodcalled "shuffled"

Steven D'Aprano steve at pearwood.info
Tue Mar 24 21:20:00 CET 2009


On Wed, 25 Mar 2009 03:03:31 am Raymond Hettinger wrote:
> > On Tue, Mar 24, 2009, Roy Hyunjin Han wrote:
> >> I know that Python has iterator methods called "sorted" and
> >> "reversed" and these are handy shortcuts.
> >>
> >> Why not add a new iterator method called "shuffled"?
>
> You can already write:
>
>        sorted(s, key=lambda x: random())
>
> But nobody does that.  So you have a good 
> indication that the proposed method isn't needed.

That's nice -- not as readable as random.shuffle(s) but still nice. And 
fast too: on my PC, it is about twice as fast as random.shuffle() 
for "reasonable" sized lists (tested up to one million items).

I don't think randomly shuffling a list is anywhere near common enough a 
task that it should be a built-in, so -1 on the OP's request, but since 
we're on the topic, I wonder whether the random.shuffle() 
implementation should use Raymond's idiom rather than the current 
Fisher-Yates shuffle?

The advantage of F-Y is that it is O(N) instead of O(N*log N) for 
sorting, but the constant factor makes it actually significantly slower 
in practice.

In addition, the F-Y shuffle is limited by the period of the random 
number generator: given a period P, it can randomize lists of length n 
where n! < P. For lists larger than n items, some permutations are 
unreachable. In the current implementation of random, n equals 2080. I 
*think* Raymond's idiom suffers from the same limitation, it's hard to 
imagine that it doesn't, but can anyone confirm this?

(In any case, if you're shuffling lists with more than 2080 items, and 
you care about the statistical properties of the result (as opposed to 
just "make it somewhat mixed up"), then the current implementation 
isn't good enough and you'll need to use your own shuffle routine.)

Are there any advantages of the current F-Y implementation? It seems to 
me that Raymond's idiom is no worse statistically, and significantly 
faster in practice, so it should be the preferred implementation. 
Thoughts?



-- 
Steven D'Aprano



More information about the Python-ideas mailing list