On 13.04.2016 07:08, Tim Peters wrote:
[Chris Barker - NOAA Federal
] ... Anyway, the hash table has gaps, but wouldn't it be sufficiently random to pick a random index, and if it's a gap, pick another one?
That would be a correct implementation, but ...
I suppose in theory, this could be in infinite process, but in practice, it would be O(1) with a constant of two or three... Better than iterating through on average half of the keys.
There's an upper limit on how dense a CPython dict or set can become (the load factor doesn't exceed 2/3), but no lower limit. For example, it's easy to end up with a dict holding a single entry hiding among millions of empty slots (dicts are never resized on key deletion, only on key insertion).
Converting a set to a list is O(N) (with N being the number of slots allocated for the set, with N > n, the number of used keys), so any gap skipping logic won't be better in performance than doing: import random my_set = {1, 2, 3, 4} l = list(my_set) selection = [random.choice(l) for x in l] print (selection) You'd only have a memory advantage, AFAICT.
... BTW, isn't it impossible to randomly select from an infinite iterable anyway?
Of course, but it is possible to do uniform random selection, in one pass using constant space and in linear time, from an iterable whose length isn't known in advance (simple case of "reservoir sampling").
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Apr 13 2016)
Python Projects, Coaching and Consulting ... http://www.egenix.com/ Python Database Interfaces ... http://products.egenix.com/ Plone/Zope Database Interfaces ... http://zope.egenix.com/
::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/