python performance

Sun Sep 15 17:56:48 EDT 2002

"Padraig Brady" <padraig at linux.ie> wrote in message
news:3D84E66F.6020701 at linux.ie...
> I was wondering about the performance characteristics
> of python and ran a simple test. The 2 programs
> below are functionally equivalent and just read
> in fields into a list. file.fields contains
> 720 fields in each of 405 lines of the form of
> repeating: oneoneone twotwo "thre e three"
>
> The time to run is shown above each program
> from which I've inferred the following:
>
> 1. The function call version is (6.3%) faster because
>     the cumulative cost of parsing the simpler expressions
>     and function call overhead is smaller than parsing the
>     1 single complex expression?

Calling a function adds overhead.  Running code inside a function
reduces the overhead of name lookup, because local variables are
accessed faster than module-level globals.  See the end for a third
version to test.
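
The trade-off between call overhead and faster local access can be measured directly. The following is a minimal sketch, assuming Python 3 and CPython; the helper names (run_module_level, run_in_function, best_time) are hypothetical and not part of the original post. The module-level case is simulated by running compiled code with exec() in a dictionary namespace, so its names are looked up in a dictionary rather than in fast local slots.

```python
import time

# Source that behaves like top-level script code: "total" and "i" are
# stored in, and read from, the namespace dictionary.
MODULE_LEVEL_SRC = """
total = 0
for i in range(n):
    total += i
"""
MODULE_LEVEL_CODE = compile(MODULE_LEVEL_SRC, "<module-level>", "exec")


def run_module_level(n):
    # Execute the precompiled loop with a dict acting as module globals.
    namespace = {"n": n}
    exec(MODULE_LEVEL_CODE, namespace)
    return namespace["total"]


def run_in_function(n):
    # The same loop, but "total" and "i" are function locals.
    total = 0
    for i in range(n):
        total += i
    return total


def best_time(func, n, repeat=5):
    # Return the fastest of several runs to reduce timing noise.
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        func(n)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    n = 1_000_000
    print("module level: %.4fs" % best_time(run_module_level, n))
    print("in function:  %.4fs" % best_time(run_in_function, n))
```

Both versions compute the same sum, so any difference in the reported times comes from how the names are accessed, not from the work done. Actual numbers will vary by interpreter version and machine.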

> Or the function is parsed
>     only once and doesn't have to be reparsed. This would
>     suggest that top level code is parsed for each iteration?

No.  All code is parsed only once.  See above and below.

> 2. Anyway I thought that parsing effects would be removed by doing the
>     parsing only once, i.e. compiling the code to .pyc (I used
>     py_compile.compile()). However this makes no difference at all?
>     Surely compiling is not just for code obfuscation.

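No.  The one-time parsing of 10 lines of code is extremely fast.
Compiling to .pyc only skips that one-time step, so it cannot change
the running time of the loop.  If the code were parsed over again for
each line of input, then you would notice the parsing time.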
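
To see what repeated parsing would cost, one can compare compiling a snippet once against recompiling it for every input line. This is only an illustrative sketch, assuming Python 3; SOURCE, compile_once and compile_every_time are hypothetical names, and the snippet being compiled is a stand-in rather than the program from the original post.

```python
import time

# A tiny stand-in for "the program": split a line and upper-case each word.
SOURCE = """
fields = [word.upper() for word in line.split()]
"""


def compile_once(lines):
    # Parse and compile a single time, then run the code object per line.
    code = compile(SOURCE, "<snippet>", "exec")
    results = []
    for line in lines:
        namespace = {"line": line}
        exec(code, namespace)
        results.append(namespace["fields"])
    return results


def compile_every_time(lines):
    # Re-parse and recompile the same source for every input line.
    results = []
    for line in lines:
        code = compile(SOURCE, "<snippet>", "exec")
        namespace = {"line": line}
        exec(code, namespace)
        results.append(namespace["fields"])
    return results


def elapsed(func, lines):
    # Wall-clock time for one call of func(lines).
    start = time.perf_counter()
    func(lines)
    return time.perf_counter() - start


if __name__ == "__main__":
    lines = ["one two three"] * 20000
    print("compile once:       %.4fs" % elapsed(compile_once, lines))
    print("compile every time: %.4fs" % elapsed(compile_every_time, lines))
```

The two functions return identical results; only the second pays the parsing cost on each iteration, which is the situation the original poster suspected but which does not occur in normal execution.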

> Note I did do the test several times and averaged the results.
> ----------------------------
> 2.514s
> ----------------------------
> #!/usr/bin/env python2.2
> import re
>
> reFieldFinder = re.compile('[^ "]+|"[^"]+"') #unquoted|quoted
> def getFields(line):
>      fields = reFieldFinder.findall(line)
>      return [field.replace('"', '') for field in fields]
>
> for line in open("file.fields").readlines():
>      listLine = getFields(line[:-1])
>
> ----------------------------
> 2.672s
> ----------------------------
> #!/usr/bin/env python2.2
> import re
>
> reFieldFinder = re.compile('[^ "]+|"[^"]+"') #unquoted|quoted
> for line in open("file.fields").readlines():
>      listLine = [field.replace('"', '') for field in
> reFieldFinder.findall(line[:-1])]

Try one more test, which runs at function speed without the repeated
getFields() calls:

import re

def mytest():
    reFieldFinder = re.compile('[^ "]+|"[^"]+"') #unquoted|quoted
    for line in open("file.fields").readlines():
        listLine = [field.replace('"', '') for field in
                    reFieldFinder.findall(line[:-1])]

mytest()
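
For anyone who wants to compare all three variants side by side without the original data file, here is a hedged sketch assuming Python 3. The input is synthetic: LINES is built by repeating the sample record from the question to get 405 lines of 720 fields, which is an assumption about the layout of file.fields. The function names are hypothetical, and the top-level variant is simulated with exec() on precompiled code.

```python
import re
import time

FIELD_RE = re.compile(r'[^ "]+|"[^"]+"')  # unquoted | quoted

# Synthetic stand-in for file.fields (assumed layout, not the real file).
SAMPLE = 'oneoneone twotwo "thre e three"'
LINES = [" ".join([SAMPLE] * 240) + "\n" for _ in range(405)]


def get_fields(line):
    # Find the fields, then strip the quotes from quoted ones.
    return [field.replace('"', '') for field in FIELD_RE.findall(line)]


def with_helper_call(lines):
    # Variant 1: one helper-function call per line.
    result = None
    for line in lines:
        result = get_fields(line[:-1])
    return result


# Variant 2: the loop written as top-level code, compiled once.
TOP_LEVEL_CODE = compile('''
for line in lines:
    result = [field.replace('"', '') for field in finder.findall(line[:-1])]
''', "<top-level>", "exec")


def at_top_level(lines):
    namespace = {"lines": lines, "finder": FIELD_RE}
    exec(TOP_LEVEL_CODE, namespace)
    return namespace["result"]


def all_in_function(lines):
    # Variant 3: the whole loop inside a function, no per-line helper call.
    finder = FIELD_RE
    result = None
    for line in lines:
        result = [field.replace('"', '') for field in finder.findall(line[:-1])]
    return result


def best_time(func, lines, repeat=3):
    # Fastest of a few runs, to reduce timing noise.
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        func(lines)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    for func in (with_helper_call, at_top_level, all_in_function):
        print("%-18s %.4fs" % (func.__name__, best_time(func, LINES)))
```

All three variants return the same field list for the last line, so they are functionally equivalent and the timings can be compared directly. Which one comes out fastest depends on the interpreter version, so the printed numbers should be treated as a local measurement only.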

Terry J. Reedy