[Python-ideas] random.choice on non-sequence

Tim Peters tim.peters at gmail.com
Wed Apr 13 01:08:47 EDT 2016


[Chris Barker - NOAA Federal <chris.barker at noaa.gov>]
> ...
> Anyway, the hash table has gaps, but wouldn't it be sufficiently
> random to pick a random index, and if it's a gap, pick another one?

That would be a correct implementation, but ...

> I suppose in theory, this could be in infinite process, but in practice,
> it would be O(1) with a constant of two or three... Better than
> iterating through on average half of the keys.

There's an upper limit on how dense a CPython dict or set can become
(the load factor doesn't exceed 2/3), but no lower limit.  For
example, it's easy to end up with a dict holding a single entry hiding
among millions of empty slots (dicts are never resized on key
deletion, only on key insertion).

> ...
> BTW, isn't it impossible to randomly select from an infinite iterable anyway?

Of course, but it is possible to do uniform random selection, in one
pass using constant space and in linear time, from an iterable whose
length isn't known in advance (simple case of "reservoir sampling").


More information about the Python-ideas mailing list