an element from a set
Bryan
bryanjugglercryptographer at yahoo.com
Mon May 17 06:42:43 EDT 2010
Carl Banks wrote:
[...]
> Random element from a set is such a natural idea.
>
> Even if we were to modify the set type at the C level to support it, I
> can't think of an easy way to get a random element without selection
> bias. For instance, if you start from a random hash code, some
> elements are more likely to be selected than others, depending on
> distance between entries in the hash table. You'd probably have to
> choose an number in range(len(s)) and count off that many, but then
> might as well have just converted it to a list or used an iterator. A
> little disappointing, actually.
>
> Probably the "right" way is to use a binary-tree-based set.
There's a reasonably straightforward solution supporting O(1) time
random choice that also preserves the O(1) average-case time of set's
add(), remove(), and 'in'. The trick is to keep each element in both a
list and a hash table. Implementing Python's entire set interface is a
bit of project, so the code below just supports enough for a demo.
-Bryan Olson
# --------
from random import choice
class SetWithRandom:
def __init__(self, *args):
self.lst = list(*args)
self.dct = dict((elem, i) for (i, elem) in
enumerate(self.lst))
def audit(self):
# Test consistency of dict and list.
assert len(self.lst) == len(self.dct)
for elem in self.dct:
assert self.lst[self.dct[elem]] == elem
def random(self):
return choice(self.lst)
def add(self, elem):
if elem not in self.dct:
self.dct[elem] = len(self.lst)
self.lst.append(elem)
def remove(self, elem):
index = self.dct[elem]
# Move the last item into vacated index.
mover = self.lst[-1]
self.lst[index] = mover
self.dct[mover] = index
del self.dct[elem]
self.lst.pop()
def __len__(self):
return len(self.lst)
def __contains__(self, elem):
return elem in self.dct
def __iter__(self):
return self.dct.__iter__()
# ---------
# Test against Python's built-in set.
for i in range(0, 100, 20):
trange = list(range(1000, 1000 + i))
pyset = set(trange)
testset = SetWithRandom(pyset)
testset.audit()
assert pyset == set(testset)
rands = set(testset.random() for _ in trange)
assert len(rands) >= i / 4
while pyset:
randx = testset.random()
pyset.remove(randx)
testset.remove(randx)
testset.audit()
for _ in range(choice([0, 0, 1, 2])):
randx = choice(trange)
pyset.add(randx)
testset.add(randx)
assert pyset == set(testset)
assert not testset
More information about the Python-list
mailing list