fastest method to choose a random element

Neil Cerutti mr.cerutti at gmail.com
Fri Jan 4 21:47:11 CET 2008


On Jan 4, 2008 2:55 PM, <caca at mailinator.com> wrote:

>  Hello,
>  This is a question for the best method (in terms of performance
> only) to choose a random element from a list among those that satisfy
> a certain property.


I would automatically use random.choice(filter(pred_func, a_list)). You just
have to catch the possible IndexError.

 This is the setting: I need to pick from a list a random element
> that satisfies a given property. All or none of the elements may have
> the property. Most of the time, many of the elements will satisfy the
> property, and the property is a bit expensive to evaluate. Chance of
> having the property are uniform among elements.
>
>  A simple approach is:
>
> import random
> def random_pick(a_list,property):
>    '''Returns a random element from a list that has the property
>
>    Returns None if no element  has the property
>    '''
>    random.shuffle(a_list)
>    for i in a_list:
>        if property(i): return i


I'm pretty sure you don't want to use a destructive random_pick function.
You'll have to shuffle a copy instead to avoid that problem.


>
> but that requires to shuffle the list every time.
>
>  A second approach, that works if we know that at least one element of
> the list has the property, is:
>
> import random
> def random_pick(a_list,property):
>    '''Returns a random element from a list that has the property
>
>    Loops forever if no element has the property
>    '''
>    while 1:
>        i=random.choice(a_list)
>        if property(i): return i
>
> which is more efficient (on average) if many elements of the list have
> the property and less efficient if only few elements of the list has
> the property (and goes crazy if no element has the property)


That's awful. Here's another linear time idea, returning the nearest element
that satisfies the predicate.

offset = random.randrange(len(a_list))
for n in xrange(len(a_list)):
    ix = (offset + n) % len(a_list)
    if predicate(a_list[ix]):
        return a_list[ix]
raise ValueError('no element has the property')

The possible problem is that large strings of elements in a row that don't
match the predicate greatly increase the odds of getting the following
element that *does* match the predicate. Worst case is two predicate matched
elements in a row, surrounded by a bunch of non-matched elements.


> Yet another one:
>
> import random
> def random_pick(a_list,property):
>    '''Returns a random element from a list that has the property
>    '''
>    b_list=[x for  x in a_list if property(x)]
>    try:
>        return random.choice(b_list)
>    finally: return None
>
> but this one checks the property on all the elements, which is no
> good.


Is it really expensive to check the property? That would mitigate against
the filter solution and for the other one I posted.

This seems to be a case of trying to solve a data problem functionally. It'd
be better to store your data differently if this will be a frequent
operation and you simply can't afford to call the predicate on all the
elements.

Incidentally, try not to shadow builtin names like 'property'.


-- 
Neil Cerutti
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-list/attachments/20080104/d35f428e/attachment.html>


More information about the Python-list mailing list