performance of tight loop
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Mon Dec 13 22:29:38 EST 2010
On Mon, 13 Dec 2010 18:50:38 -0800, gry wrote:
> [python-2.4.3, rh CentOS release 5.5 linux, 24 xeon cpu's, 24GB ram] I
> have a little data generator that I'd like to go faster... any
> suggestions?
> maxint is usually 9223372036854775808(max 64bit int), but could
> occasionally be 99.
> width is usually 500 or 1600, rows ~ 5000.
>
> from random import randint
>
> def row(i, wd, mx):
>     first = ['%d' % i]
>     rest = ['%d' % randint(1, mx) for i in range(wd - 1)]
>     return first + rest
> ...
>     while True:
>         print "copy %s from stdin direct delimiter ',';" % table_name
>         for i in range(i,i+rows):
>             print ','.join(row(i, width, maxint))
>         print '\.'
This isn't entirely clear to me. Why is the while loop indented? I assume
it's part of some other function that you haven't shown us, rather than
part of the function row().
Assuming this, I would say that the overhead of I/O (the print commands)
will likely be tens or hundreds of times greater than the overhead of the
loop, so you're unlikely to see much appreciable benefit. You might shave
a few seconds off something that runs for many minutes. I don't see the
point, really.
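Rather than guess where the time goes, it can be measured. The following is a minimal sketch, assuming Python 3 (the original post targets Python 2.4); the helper names make_rows and time_generation_and_output are hypothetical, and the in-memory buffer is only a stand-in for stdout, so a real terminal or pipe may behave quite differently.

```python
import io
import time
from random import randint

def make_rows(start, rows, width, maxint):
    """Build the comma-joined lines without writing them anywhere."""
    return [
        ','.join([str(i)] + [str(randint(1, maxint)) for _ in range(width - 1)])
        for i in range(start, start + rows)
    ]

def time_generation_and_output(rows=200, width=500, maxint=2**63, out=None):
    """Return (seconds spent generating, seconds spent writing)."""
    if out is None:
        out = io.StringIO()  # assumption: stand-in for stdout
    t0 = time.perf_counter()
    lines = make_rows(0, rows, width, maxint)
    t1 = time.perf_counter()
    for line in lines:
        out.write(line + '\n')
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1

if __name__ == '__main__':
    gen, write = time_generation_and_output()
    print('generate: %.4fs  write: %.4fs' % (gen, write))
```

Passing a real file object as out (for example sys.stdout redirected to the actual destination) gives a more realistic picture of the split between generating and writing.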
If the print statements are informative rather than necessary, I would
print every tenth (say) line rather than every line. That should save
*lots* of time.
Replacing "while True" with "while 1" may save a tiny bit of overhead.
Whether it is significant or not is another thing.
Replacing range with xrange should also make a difference, especially if
rows is a large number.
Moving the code from row() inline, replacing string interpolation with
calls to str(), may also help. Making local variables of any globals may
also help a tiny bit. But as I said, you're shaving microseconds of
overhead and spending milliseconds printing, so the difference will be tiny.
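Whether str() actually beats string interpolation is easy to check with timeit. This is a sketch only, assuming Python 3; fmt_percent and fmt_str are hypothetical names, and the numbers will vary by interpreter version and machine.

```python
import timeit
from random import randint

def fmt_percent(values):
    """Format each integer with %-interpolation, as in the original row()."""
    return ['%d' % v for v in values]

def fmt_str(values):
    """Format each integer with str(), as suggested above."""
    return [str(v) for v in values]

# Sample data in the same range as the usual maxint from the original post.
values = [randint(1, 2**63) for _ in range(1000)]

if __name__ == '__main__':
    for func in (fmt_percent, fmt_str):
        # Take the best of three runs to reduce noise from other processes.
        best = min(timeit.repeat(lambda: func(values), number=200, repeat=3))
        print('%-12s %.4fs' % (func.__name__, best))
```

Both functions produce identical output for positive integers, so any difference in the timings is purely formatting overhead.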
But for what it's worth, I'd try this:
# Avoid globals in favour of locals.
from random import randint
_maxint = maxint
loop = xrange(i, i+rows)  # Where does i come from?
inner_loop = xrange(width)  # Note 1 more than before.
while 1:
    print "copy %s from stdin direct delimiter ',';" % table_name
    for i in loop:
        row = [str(randint(1, _maxint)) for _ in inner_loop]
        row[0] = str(i)  # replace in place
        print ','.join(row)
    print '\.'
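For anyone who wants to run this on a current interpreter, here is a rough Python 3 adaptation of the same idea. It is a sketch under assumptions, not a drop-in replacement: write_copy_block is a hypothetical name, the starting row id is passed in explicitly (since it is unclear where i comes from), each block continues from where the previous one ended, and the demo is bounded rather than looping forever.

```python
import io
from random import randint

def write_copy_block(out, table_name, start, rows, width, maxint):
    """Write one COPY block of `rows` lines and return the next start id."""
    write = out.write          # local alias avoids repeated attribute lookups
    inner_loop = range(width)  # lazy in Python 3, like xrange in Python 2
    write("copy %s from stdin direct delimiter ',';\n" % table_name)
    for i in range(start, start + rows):
        row = [str(randint(1, maxint)) for _ in inner_loop]
        row[0] = str(i)        # replace the first field in place
        write(','.join(row) + '\n')
    write('\\.\n')
    return start + rows

if __name__ == '__main__':
    buf = io.StringIO()        # in-memory stand-in; pass sys.stdout to stream
    next_id = 0
    for _ in range(2):         # bounded here; the original loops forever
        next_id = write_copy_block(buf, 'demo_table', next_id, 3, 5, 99)
    print(buf.getvalue(), end='')
```

Writing through a single out.write call per line keeps the per-row overhead low, and passing the output object as a parameter makes it easy to time the function against different destinations.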
Hope it helps.
--
Steven