random.randint() slow, esp in Python 3

Sat Sep 24 02:55:18 EDT 2011

Chris Angelico wrote:

> On Fri, Sep 23, 2011 at 4:14 AM, Steven D'Aprano
> <steve+comp.lang.python at pearwood.info> wrote:
>> What makes you think it's in C? I don't have Python 3.3a, but in 3.2 the
>> random module is mostly Python. There is an import of _random, which
>> presumably is in C, but it doesn't have a randint method:
> 
> True. It seems to be defined in cpython/lib/random.py as a reference
> to randrange, which does a pile of error checking and calls
> _randbelow... which does a whole lot of work as well as calling
> random(). Guess I should have checked the code before asking!
> 
> There's probably good reasons for using randint(), 

If you want unbiased, random (or at least pseudo-random) integers chosen
from an uniform distribution with proper error checking, you should use
randint or randrange.

> but if you just 
> want a pile of more-or-less random integers, int(random.random()*top)
> is the best option.

"I only need random-ish ints, but I need 'em yesterday!!!"

If you want biased, not-quite-uniformly distributed integers with no error
checking, then the above will save you a few nanoseconds per call. So if
your application needs a million such ints, you might save a full five
seconds or so. Sounds like premature optimization to me, but what do I
know? If you have an application where the quality of randomness is
unimportant and generating random ints is a genuine performance bottleneck,
then go right ahead.

>> I'm not seeing any significant difference in speed between 2.6 and 3.2:
>>
>> [steve at sylar ~]$ python2.6 -m timeit -s "from random import
>> randint" "randint(0, 1000000)"
>> 100000 loops, best of 3: 4.29 usec per loop
>>
>> [steve at sylar ~]$ python3.2 -m timeit -s "from random import
>> randint" "randint(0, 1000000)"
>> 100000 loops, best of 3: 4.98 usec per loop
> 
> That might be getting lost in the noise. Try the list comp that I had 
> above and see if you can see a difference - or anything else that
> calls randint that many times.

If you want to measure the speed of randint, you should measure the speed of
randint. That's what the timeit call does: it calls randint 100000 times.

You can't measure the speed of:

[random.randint(0,sz*10-1) for i in range(sz)]

and draw conclusions about randint alone. Roughly speaking, you are
measuring the time taken to:

lookup range
lookup sz, twice
call range, possibly generating a massive list
iterate over the list/range object
lookup random
lookup random.randint
multiply sz by 10 and subtract 1
build a list of the results, repeatedly resizing it as you go

(in no particular order)

There is MUCH more noise in your suggestion, not my version. But for what
it's worth, I get these results:

[steve at sylar ~]$ python3.2 -m timeit -s "import random" -s "sz =
1000000" "[random.randint(0,sz*10-1) for i in range(sz)]"
10 loops, best of 3: 6.09 sec per loop
[steve at sylar ~]$ python2.6 -m timeit -s "import random" -s "sz =
1000000" "[random.randint(0,sz*10-1) for i in range(sz)]"
10 loops, best of 3: 4.67 sec per loop

So Python 3.2 is about 30% slower for the whole mess. (I would, however,
give little confidence in those results: at 4-6 seconds per loop, I expect
that OS-level noise will be significant.)

For what (little) it's worth, it seems that Python 3.2 should be a bit
faster than 2.6, at least if you consider the Pystone benchmark to mean
anything.

http://www.levigross.com/post/2340736877/pystone-benchmark-on-2-6-2-7-3-2

> Performance-testing with a heapsort (and by the way, it's
> _ridiculously_ slower implementing it in Python instead of just
> calling a.sort(), but we all knew that already!) shows a similar
> difference in performance.

You do know that Python's heap implementation is written in C? Don't be
fooled by the heapq module being in Python. About 2/3rds of the way down,
there's a sneaky little trick:

# If available, use C implementation
try:
    from _heapq import *
except ImportError:
    pass

-- 
Steven