selecting a random item from a set
Alexander Schmolck
a.schmolck at gmx.net
Wed Jan 7 05:23:02 CET 2004
[sorry for the late reply the server here had been down a couple of days and
then I was busy]
"Tim Peters" <tim.one at comcast.net> writes:
> [Tim, asks which algorithms require random selection, as opposed
> to arbitrary selection, from a set]
>
> [Alexander Schmolck]
> > Feature selection schemes, for example?
>
> Eh? That's like "numerical methods" <wink/frown>. IOW, too vague.
Well, pretty much any scheme where the feature space is to large to explore
exhaustively, IOW from forward-backward feature selection to markov chain
monte carlo. If that's still to vague, I'd be happy to provide more detail.
> You can do it in O(1) space now at essentially full C speed via, e.g.,
>
> i = random.randrange(len(some_set))
> random_elt = itertools.islice(some_set, i, None).next()
> some_set.remove(random_elt)
>
> That tricks it into finding the starting point with a C loop, and then the
> iterator is discarded after extracting its first element.
Yes, this is a much better (more readable and efficient)
> But (of course) that still takes average time proportional to the # of
> elements in the set. Well, that wishes away a detail: this can actually be
> arbitrarily expensive, as there's no bound on how sparse a Python dict can
> get, and the C dict iteration code has to visit every slot, occupied or not,
> to find "the next" occupied slot.
Sure.
> If the sets are small, you're not going to beat random.choice(tuple(s)).
Yep, I had a vague feeling this might be the case and because it's also simple
and clear, so that's in fact what I used (almost -- I did list(s), which ought
to be only marginally slower I would think).
> If the sets are large, low-level optimization of an approach with the wrong
> O() behavior is just a monumentally bad idea.
Unlikely, for the current application they should be < 100 elements. So I
guess I shall stick to the tuple-cast for the time being.
Thanks -- this has been quite informative (and the other suggestions you made
might still come in handy at some point).
'as
More information about the Python-list
mailing list