[Python-ideas] random.choice on non-sequence
M.-A. Lemburg
mal at egenix.com
Wed Apr 13 12:33:12 EDT 2016
On 13.04.2016 07:08, Tim Peters wrote:
> [Chris Barker - NOAA Federal <chris.barker at noaa.gov>]
>> ...
>> Anyway, the hash table has gaps, but wouldn't it be sufficiently
>> random to pick a random index, and if it's a gap, pick another one?
>
> That would be a correct implementation, but ...
>
>> I suppose in theory, this could be in infinite process, but in practice,
>> it would be O(1) with a constant of two or three... Better than
>> iterating through on average half of the keys.
>
> There's an upper limit on how dense a CPython dict or set can become
> (the load factor doesn't exceed 2/3), but no lower limit. For
> example, it's easy to end up with a dict holding a single entry hiding
> among millions of empty slots (dicts are never resized on key
> deletion, only on key insertion).
Converting a set to a list is O(N) (with N being the number
of slots allocated for the set, with N > n, the number of used keys),
so any gap skipping logic won't be better in performance than doing:
import random
my_set = {1, 2, 3, 4}
l = list(my_set)
selection = [random.choice(l) for x in l]
print (selection)
You'd only have a memory advantage, AFAICT.
>> ...
>> BTW, isn't it impossible to randomly select from an infinite iterable anyway?
>
> Of course, but it is possible to do uniform random selection, in one
> pass using constant space and in linear time, from an iterable whose
> length isn't known in advance (simple case of "reservoir sampling").
--
Marc-Andre Lemburg
eGenix.com
Professional Python Services directly from the Experts (#1, Apr 13 2016)
>>> Python Projects, Coaching and Consulting ... http://www.egenix.com/
>>> Python Database Interfaces ... http://products.egenix.com/
>>> Plone/Zope Database Interfaces ... http://zope.egenix.com/
________________________________________________________________________
::: We implement business ideas - efficiently in both time and costs :::
eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/
http://www.malemburg.com/
More information about the Python-ideas
mailing list