speed problems

Jeff Epler jepler at unpythonic.net
Thu Jun 3 11:03:56 EDT 2004


In addition to the items Steve Lamb noted, I have a few suggestions:

Place the whole script in a function and call it.  This will give you an
immediate speedup of some percent, because lookup of names that are
local to a function is faster than looking up names at the module level.
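
As a rough sketch of what I mean (the names count_infected and main, and the
sample data, are made up for illustration and are not from your script):

```python
def count_infected(lines):
    """Count lines containing the marker; every name used in the loop is a local."""
    total = 0
    for line in lines:
        if "INFECTED" in line:
            total += 1
    return total


def main():
    # Hypothetical sample data standing in for the real log file.
    sample = ["clean", "INFECTED (Foo)", "INFECTED (Bar, Baz)"]
    print(count_infected(sample))


if __name__ == "__main__":
    main()
```

The point is only that the loop body lives inside a function; how much this
helps will depend on how many name lookups the loop does.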

>     for line in lf.readlines():
Unless the bz2 or gzip modules don't support it, you should write
      for line in lf:
instead.  This avoids building a list of every line up front, so it is
likely to reduce memory consumption, and may improve the program speed too.
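
Here is a small self-contained sketch of line-by-line iteration over a
gzip file.  It assumes a current Python 3, where gzip.open accepts the
text modes "rt" and "wt"; the temporary file and the count_lines helper are
only there so the example can run on its own.

```python
import gzip
import os
import tempfile


def count_lines(path):
    """Read the compressed file lazily, one line at a time."""
    n = 0
    with gzip.open(path, "rt") as lf:
        for line in lf:  # no readlines(): the full list of lines is never built
            n += 1
    return n


# Write a tiny hypothetical log so the sketch does not depend on real data.
fd, demo_path = tempfile.mkstemp(suffix=".gz")
os.close(fd)
with gzip.open(demo_path, "wt") as f:
    f.write("one\ntwo\nthree\n")

line_count = count_lines(demo_path)
os.remove(demo_path)
```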

>       if string.count( line, "INFECTED" ):
>         vname = re.compile( "INFECTED \((.*)\)" ).search( line ).group(1)

Unless you arrived at this two-step process through profiling, it's
probably better to compile the pattern once, outside the loop (called
infected_rx here), and write
    m = infected_rx.search(line)
    if m:
        vname = m.group(1)
        ...
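
Spelled out as a runnable sketch, using the pattern from your quoted code
(the helper name virus_field is my own invention for the example):

```python
import re

# Compiled once, outside the loop; the pattern is the one from the quoted code.
infected_rx = re.compile(r"INFECTED \((.*)\)")


def virus_field(line):
    """Return the text inside 'INFECTED (...)', or None if there is no match."""
    m = infected_rx.search(line)
    if m:
        return m.group(1)
    return None
```

A single search both tests for the marker and extracts the name, so the
separate string.count call is no longer needed.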

>         if string.count( vname, ", " ):
>           for vnam in string.split( vname, ", " ):
[...]
>         else:

If there are no ", " in vname, the split will produce a single item, so
the count test and the else branch are unnecessary.
Also, there's no reason to use the "string" module anymore, as
opposed to string methods.  Finally, splitting on single characters is
likely to be optimized, but I'm not sure.

I'd just use
    for vnam in vname.split(","):
        vnam = vnam.strip()
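
To illustrate that the single-name case needs no special handling, here is
a minimal sketch (split_names is a hypothetical helper, not something in
your script):

```python
def split_names(vname):
    """Split on commas and trim surrounding whitespace from each name."""
    return [vnam.strip() for vnam in vname.split(",")]
```

With this, split_names("Foo, Bar") gives two names and split_names("Foo")
gives a one-item list, so the same loop covers both cases.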

>             if vnam not in virstat:
>               virstat[vnam] = 1
>             else:
>               virstat[vnam] += 1

You have several alternatives here:
    try:
        virstat[vnam] += 1
    except KeyError:
        virstat[vnam] = 1
or
    virstat[vnam] = virstat.get(vnam, 0) + 1
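
Both forms should give the same counts.  A quick sketch to check that, with
made-up function names and sample data:

```python
def count_with_try(names):
    """Count occurrences, handling the first sighting via KeyError."""
    virstat = {}
    for vnam in names:
        try:
            virstat[vnam] += 1
        except KeyError:
            virstat[vnam] = 1
    return virstat


def count_with_get(names):
    """Count occurrences, using dict.get with a default of 0."""
    virstat = {}
    for vnam in names:
        virstat[vnam] = virstat.get(vnam, 0) + 1
    return virstat


# Hypothetical sample data.
sample = ["Foo", "Bar", "Foo"]
```

Which one is faster probably depends on how often a name is new versus
already present, so it is worth timing on your own data.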

Jeff