[Python-ideas] random.choice on non-sequence

M.-A. Lemburg mal at egenix.com
Wed Apr 13 12:33:12 EDT 2016

On 13.04.2016 07:08, Tim Peters wrote:
> [Chris Barker - NOAA Federal <chris.barker at noaa.gov>]
>> ...
>> Anyway, the hash table has gaps, but wouldn't it be sufficiently
>> random to pick a random index, and if it's a gap, pick another one?
> That would be a correct implementation, but ...
>> I suppose in theory, this could be in infinite process, but in practice,
>> it would be O(1) with a constant of two or three... Better than
>> iterating through on average half of the keys.
> There's an upper limit on how dense a CPython dict or set can become
> (the load factor doesn't exceed 2/3), but no lower limit.  For
> example, it's easy to end up with a dict holding a single entry hiding
> among millions of empty slots (dicts are never resized on key
> deletion, only on key insertion).

Converting a set to a list is O(N) (with N being the number
of slots allocated for the set, with N > n, the number of used keys),
so any gap skipping logic won't be better in performance than doing:

import random

my_set = {1, 2, 3, 4}
l = list(my_set)
selection = [random.choice(l) for x in l]

print (selection)

You'd only have a memory advantage, AFAICT.

>> ...
>> BTW, isn't it impossible to randomly select from an infinite iterable anyway?
> Of course, but it is possible to do uniform random selection, in one
> pass using constant space and in linear time, from an iterable whose
> length isn't known in advance (simple case of "reservoir sampling").

Marc-Andre Lemburg

Professional Python Services directly from the Experts (#1, Apr 13 2016)
>>> Python Projects, Coaching and Consulting ...  http://www.egenix.com/
>>> Python Database Interfaces ...           http://products.egenix.com/
>>> Plone/Zope Database Interfaces ...           http://zope.egenix.com/

::: We implement business ideas - efficiently in both time and costs :::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611

More information about the Python-ideas mailing list