[Tutor] fast sampling with replacement

Andrew Fithian afith13 at gmail.com
Sun Feb 21 05:47:29 CET 2010


On Sat, Feb 20, 2010 at 11:55 AM, Luke Paireepinart
<rabidpoobear at gmail.com> wrote:

>
>
> On Sat, Feb 20, 2010 at 1:50 PM, Kent Johnson <kent37 at tds.net> wrote:
>
>> On Sat, Feb 20, 2010 at 11:22 AM, Andrew Fithian <afith13 at gmail.com>
>> wrote:
>> > Can you help me speed it up even more?
>> > import random
>> > def sample_with_replacement(list):
>> >     l = len(list) # the sample needs to be as long as list
>> >     r = xrange(l)
>> >     _random = random.random
>> >     return [list[int(_random()*l)] for i in r]
>>
>> You don't have to assign to r, just call xrange() in the list comp.
>> You can cache int() as you do with random.random()
>> Did you try random.randint(0, l - 1) instead of int(_random()*l)?
>> You shouldn't call your parameter 'list', it hides the builtin list
>> and makes the code confusing.
>>
>> You might want to ask this on comp.lang.python, many more optimization
>> gurus there.
>>
> Also, the function is rather short, so it may help to just inline it,
> especially with Kent's modifications: it would basically boil down to a list
> comprehension (unless you keep the local refs to the functions). I hear the
> function call overhead is rather high, though that depends on your usage: if
> your lists are huge and you don't call the function that much, it might not
> matter.
>

The code takes a list of length n, randomly samples n items from it with
replacement, and returns the sample.
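A minimal sketch of the function with Kent's suggestions applied, assuming Python 3 (so `range` replaces the Python 2 `xrange` used in this thread). The parameter name `population` is a hypothetical rename to avoid hiding the builtin `list`. It uses `random.randrange(n)`, which excludes its upper bound, so no `int()` call is needed and there is no off-by-one risk. The second function swaps in a different technique not discussed in the thread: the standard library's `random.choices` (Python 3.6+), which draws `k` items with replacement in a single call.

```python
import random

def sample_with_replacement(population):
    """Return len(population) items drawn uniformly, with replacement."""
    n = len(population)
    # Local reference: avoids a global plus attribute lookup on every item.
    randrange = random.randrange
    # randrange(n) returns an int in [0, n); randint(0, n) would include n
    # itself and raise IndexError when it came up.
    return [population[randrange(n)] for _ in range(n)]

def sample_with_replacement_choices(population):
    # random.choices draws k items with replacement in one library call.
    return random.choices(population, k=len(population))

if __name__ == "__main__":
    random.seed(0)
    data = ["a", "b", "c", "d"]
    print(sample_with_replacement(data))
    print(sample_with_replacement_choices(data))
```

Both functions return a new list of the same length as the input; which one is faster is worth measuring on your own data.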

I'm going to try the suggestion to inline the code before I make any of the
other (good) suggested changes to the implementation. This function is
called thousands of times per execution; if the "function call overhead" is
as high as you say, that sounds like the place to start optimizing. Thanks,
guys.
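One way to check whether inlining pays off is to time both forms with the standard `timeit` module. This is a sketch assuming Python 3; `data`, `run_with_function`, and `run_inlined` are hypothetical names for illustration, and the timings it prints will depend on the machine and on the list length.

```python
import random
import timeit

data = list(range(100))  # hypothetical input list for the benchmark

def sample_with_replacement(population):
    n = len(population)
    randrange = random.randrange
    return [population[randrange(n)] for _ in range(n)]

def run_with_function(calls=1000):
    # Pays one extra Python function call per sample drawn.
    sample = []
    for _call in range(calls):
        sample = sample_with_replacement(data)
    return sample

def run_inlined(calls=1000):
    # Same work with the list comprehension written at the call site;
    # the local names are bound once, outside the loop.
    n = len(data)
    randrange = random.randrange
    sample = []
    for _call in range(calls):
        sample = [data[randrange(n)] for _ in range(n)]
    return sample

if __name__ == "__main__":
    for fn in (run_with_function, run_inlined):
        seconds = timeit.timeit(fn, number=5)
        print(f"{fn.__name__}: {seconds:.3f} s")
```

The extra call is paid once per sample, while `randrange` runs n times inside each sample, so inlining should matter most when the lists are short, which matches Luke's caveat about huge lists.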
More information about the Tutor mailing list