[Python-ideas] About adding a new iterator methodcalled "shuffled"
Steven D'Aprano
steve at pearwood.info
Tue Mar 24 21:20:00 CET 2009
On Wed, 25 Mar 2009 03:03:31 am Raymond Hettinger wrote:
> > On Tue, Mar 24, 2009, Roy Hyunjin Han wrote:
> >> I know that Python has iterator methods called "sorted" and
> >> "reversed" and these are handy shortcuts.
> >>
> >> Why not add a new iterator method called "shuffled"?
>
> You can already write:
>
> sorted(s, key=lambda x: random())
>
> But nobody does that. So you have a good
> indication that the proposed method isn't needed.
That's nice -- not as readable as random.shuffle(s) but still nice. And
fast too: on my PC, it is about twice as fast as random.shuffle()
for "reasonable" sized lists (tested up to one million items).
I don't think randomly shuffling a list is anywhere near common enough a
task that it should be a built-in, so -1 on the OP's request, but since
we're on the topic, I wonder whether the random.shuffle()
implementation should use Raymond's idiom rather than the current
Fisher-Yates shuffle?
The advantage of F-Y is that it is O(N) instead of O(N*log N) for
sorting, but the constant factor makes it actually significantly slower
in practice.
In addition, the F-Y shuffle is limited by the period of the random
number generator: given a period P, it can randomize lists of length n
where n! < P. For lists larger than n items, some permutations are
unreachable. In the current implementation of random, n equals 2080. I
*think* Raymond's idiom suffers from the same limitation, it's hard to
imagine that it doesn't, but can anyone confirm this?
(In any case, if you're shuffling lists with more than 2080 items, and
you care about the statistical properties of the result (as opposed to
just "make it somewhat mixed up"), then the current implementation
isn't good enough and you'll need to use your own shuffle routine.)
Are there any advantages of the current F-Y implementation? It seems to
me that Raymond's idiom is no worse statistically, and significantly
faster in practice, so it should be the preferred implementation.
Thoughts?
--
Steven D'Aprano
More information about the Python-ideas
mailing list