Hi list,

I updated my script to use the 'profile' module instead of 'hotshot' since 'profile' displays calls to C functions and builtins. Before actually profiling, I wanted to check that cElementTree was actually faster on the local branching operation by performing it three times (only on the first 100 revisions of BZR instead of the more than 700 at the time of writing):

ogrisel@localhost:~/Developments/lxml-testcase/grisel/profiling $ python profile_bzr.py -a timeit -x lxml
Timing against /tmp/bzr.dev-lxml/bzrlib/__init__.pyc
branching took 2.34565591812

ogrisel@localhost:~/Developments/lxml-testcase/grisel/profiling $ python profile_bzr.py -a timeit -x cetree
Timing against /tmp/bzr.dev/bzrlib/__init__.pyc
branching took 1.86508107185

So cElementTree is faster on the local branching operation. Now here is the profile of the same operation in both cases:

ogrisel@localhost:~/Developments/lxml-testcase/grisel/profiling $ python profile_bzr.py -a profile -x lxml
Profiling against /tmp/bzr.dev-lxml/bzrlib/__init__.pyc
Added 230 texts.
Added 100 inventories.
Added 100 revisions.

Thu Jun 23 23:42:22 2005    /tmp/bzr_data.profile

         385004 function calls (374095 primitive calls) in 5.700 CPU seconds

   Ordered by: internal time, call count
   List reduced from 429 to 20 due to restriction <20>

    ncalls   tottime  percall  cumtime  percall filename:lineno(function)
     64844     0.860    0.000    0.860    0.000 :0(get)
       430     0.520    0.001    0.520    0.001 :0(compress)
      8992     0.430    0.000    1.310    0.000 /tmp/bzr.dev-lxml/bzrlib/inventory.py:189(from_element)
     21611     0.390    0.000    0.600    0.000 /usr/lib/python2.4/posixpath.py:56(join)
17627/9095     0.230    0.000    0.470    0.000 /tmp/bzr.dev-lxml/bzrlib/inventory.py:317(iter_entries)
      9009     0.190    0.000    0.770    0.000 /tmp/bzr.dev-lxml/bzrlib/store.py:132(__contains__)
     10165     0.170    0.000    0.490    0.000 /tmp/bzr.dev-lxml/bzrlib/store.py:69(_path)
       204     0.160    0.001    0.160    0.001 :0(parse)
     18878     0.160    0.000    0.160    0.000 :0(access)
     22295     0.130    0.000    0.130    0.000 :0(startswith)
         1     0.130    0.130    4.730    4.730 /tmp/bzr.dev-lxml/bzrlib/branch.py:727(update_revisions)
 3601/1238     0.120    0.000    0.540    0.000 /tmp/bzr.dev-lxml/bzrlib/changeset.py:682(get_new_path)
      9977     0.110    0.000    0.230    0.000 /usr/lib/python2.4/posixpath.py:74(split)
       430     0.110    0.000    0.110    0.000 :0(flush)
       103     0.110    0.001    1.540    0.015 /tmp/bzr.dev-lxml/bzrlib/inventory.py:487(from_element)
      1218     0.110    0.000    0.110    0.000 :0(decompress)
      9090     0.100    0.000    0.120    0.000 /tmp/bzr.dev-lxml/bzrlib/inventory.py:408(add)
      3184     0.090    0.000    0.090    0.000 :0(seek)
      1836     0.080    0.000    0.480    0.000 /usr/lib/python2.4/gzip.py:242(_read)
     22160     0.080    0.000    0.080    0.000 :0(endswith)

ogrisel@localhost:~/Developments/lxml-testcase/grisel/profiling $ python profile_bzr.py -a profile -x cetree
Profiling against /tmp/bzr.dev/bzrlib/__init__.pyc
Added 230 texts.
Added 100 inventories.
Added 100 revisions.
Thu Jun 23 23:42:44 2005    /tmp/bzr_data.profile

         404062 function calls (393056 primitive calls) in 5.120 CPU seconds

   Ordered by: internal time, call count
   List reduced from 431 to 20 due to restriction <20>

    ncalls   tottime  percall  cumtime  percall filename:lineno(function)
       430     0.560    0.001    0.560    0.001 :0(compress)
     21611     0.460    0.000    0.650    0.000 /usr/lib/python2.4/posixpath.py:56(join)
      8992     0.350    0.000    0.670    0.000 /tmp/bzr.dev/bzrlib/inventory.py:189(from_element)
     64642     0.310    0.000    0.310    0.000 :0(get)
         1     0.250    0.250    3.990    3.990 /tmp/bzr.dev/bzrlib/branch.py:727(update_revisions)
      9977     0.220    0.000    0.310    0.000 /usr/lib/python2.4/posixpath.py:74(split)
     18878     0.200    0.000    0.200    0.000 :0(access)
17627/9095     0.170    0.000    0.480    0.000 /tmp/bzr.dev/bzrlib/inventory.py:317(iter_entries)
       204     0.160    0.001    0.310    0.002 :0(_parse)
      9009     0.140    0.000    0.650    0.000 /tmp/bzr.dev/bzrlib/store.py:132(__contains__)
 3601/1238     0.130    0.000    0.680    0.001 /tmp/bzr.dev/bzrlib/changeset.py:682(get_new_path)
     22295     0.120    0.000    0.120    0.000 :0(startswith)
     10165     0.110    0.000    0.380    0.000 /tmp/bzr.dev/bzrlib/store.py:69(_path)
      9090     0.100    0.000    0.120    0.000 /tmp/bzr.dev/bzrlib/inventory.py:408(add)
      2639     0.090    0.000    0.450    0.000 /usr/lib/python2.4/gzip.py:242(_read)
       430     0.080    0.000    0.080    0.000 :0(flush)
     15767     0.080    0.000    0.080    0.000 :0(rstrip)
       103     0.080    0.001    0.880    0.009 /tmp/bzr.dev/bzrlib/inventory.py:487(from_element)
      1617     0.080    0.000    0.080    0.000 :0(decompress)
     22160     0.070    0.000    0.070    0.000 :0(endswith)

The main difference is the :0(get) time, which is much higher in the lxml case.

C functions are tagged :0(function_name) and aggregated by name. This is annoying since, for instance, lxml/_elementpath.py:167(_compile) calls a C function named 'get' whose stats get aggregated with the stats of all the other 'get' C functions of the Python standard library. One has to use the print_callers method of the pstats.Stats object to list all the different contributions:

:0(get)
    /tmp/bzr.dev-lxml/bzrlib/changeset.py:809(longest_to_shortest)(194)      0.000
    /tmp/bzr.dev-lxml/bzrlib/changeset.py:829(rename_to_temp_delete)(98)     0.000
    /tmp/bzr.dev-lxml/bzrlib/changeset.py:870(rename_to_new_create)(98)      0.100
    /tmp/bzr.dev-lxml/bzrlib/changeset.py:1133(get_inventory_change)(98)     0.070
    /tmp/bzr.dev-lxml/bzrlib/changeset.py:1366(make_basic_entry)(198)        0.160
    /tmp/bzr.dev-lxml/bzrlib/changeset.py:1516(get_path)(100)                0.010
    /tmp/bzr.dev-lxml/bzrlib/inventory.py:189(from_element)(62944)           1.430
    /tmp/bzr.dev-lxml/bzrlib/revision.py:159(unpack_revision)(808)           0.040
    /usr/lib/python2.4/encodings/__init__.py:69(search_function)(3)          0.000
    /usr/lib/python2.4/site-packages/lxml/_elementpath.py:167(_compile)(202) 0.010
    /usr/lib/python2.4/sre.py:213(_compile)(100)                             0.000
    /usr/lib/python2.4/sre_parse.py:225(_class_escape)(1)                    0.000

This :0(get) time is the main difference between the two profiles. The parsing times :0(_parse) and :0(parse) are the same. So to summarize, this study tells us that to improve lxml over cElementTree on this particular case, one should first focus on optimizing the following piece of code (or the related code that uses it in lxml) in _elementpath.py:

"""
_cache = {}

##
# (Internal) Compile path.

def _compile(path):
    p = _cache.get(path)
    if p is not None:
        return p
    p = Path(path)
    if len(_cache) >= 100:
        _cache.clear()
    _cache[path] = p
    return p
"""

Best,

-- Olivier
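For readers who want to reproduce this kind of breakdown, here is a minimal sketch (with a placeholder function standing in for the actual profile_bzr.py machinery) of how a profile can be collected with the 'profile' module and the aggregated :0(get) entry split back out per caller with pstats:

import profile
import pstats

def branch_locally():
    # placeholder for the bzr local-branching operation being measured
    pass

# run the operation under the profiler and dump the raw data to a file
profile.run('branch_locally()', '/tmp/bzr_data.profile')

stats = pstats.Stats('/tmp/bzr_data.profile')
stats.sort_stats('time', 'calls')   # internal time, then call count
stats.print_stats(20)               # top 20 entries, as in the listings above

# C functions are aggregated by name, so list the :0(get) callers separately
stats.print_callers(':0\\(get\\)')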
Olivier Grisel wrote:
I updated my script to use the 'profile' module instead of 'hotshot' since 'profile' displays calls to C functions and builtins.
Cool, thanks again for doing all this work! [snip]
The main difference is the :0(get) time that is much higher in the lxml case.
C functions are tagged :0(function_name) and aggregated by name. This is annoying since, for instance, lxml/_elementpath.py:167(_compile) calls a C function named 'get' whose stats get aggregated with the stats of all the other 'get' C functions of the Python standard library. One has to use the print_callers method of the pstats.Stats object to list all the different contributions:
:0(get)
/tmp/bzr.dev-lxml/bzrlib/changeset.py:809(longest_to_shortest)(194) 0.000
/tmp/bzr.dev-lxml/bzrlib/changeset.py:829(rename_to_temp_delete)(98) 0.000
/tmp/bzr.dev-lxml/bzrlib/changeset.py:870(rename_to_new_create)(98) 0.100
/tmp/bzr.dev-lxml/bzrlib/changeset.py:1133(get_inventory_change)(98) 0.070
/tmp/bzr.dev-lxml/bzrlib/changeset.py:1366(make_basic_entry)(198) 0.160
/tmp/bzr.dev-lxml/bzrlib/changeset.py:1516(get_path)(100) 0.010
/tmp/bzr.dev-lxml/bzrlib/inventory.py:189(from_element)(62944) 1.430
/tmp/bzr.dev-lxml/bzrlib/revision.py:159(unpack_revision)(808) 0.040
/usr/lib/python2.4/encodings/__init__.py:69(search_function)(3) 0.000
/usr/lib/python2.4/site-packages/lxml/_elementpath.py:167(_compile)(202) 0.010
/usr/lib/python2.4/sre.py:213(_compile)(100) 0.000
/usr/lib/python2.4/sre_parse.py:225(_class_escape)(1) 0.000
This :0(get) time is the main difference between the two profiles. The parsing times :0(_parse) and :0(parse) are the same. So to summarize, this study tells us that to improve lxml over cElementTree on this particular case, one should first focus on optimizing the following piece of code (or the related code that uses it in lxml) in _elementpath.py:

"""
_cache = {}

##
# (Internal) Compile path.

def _compile(path):
    p = _cache.get(path)
    if p is not None:
        return p
    p = Path(path)
    if len(_cache) >= 100:
        _cache.clear()
    _cache[path] = p
    return p
"""
I'm a bit surprised by this -- this code should be the same as in cElementTree, as it uses the same implementation of find(): I took _elementpath.py from ElementTree, and cElementTree uses the same implementation. Additionally, this code does not appear to be calling into any ElementTree-specific portions at all. Why did you focus on this _compile() and not look at what from_element() is doing, for instance?

I'd like to know what kind of get call that is (a get() in lxml.etree as opposed to a get() on a Python dictionary). The only get() I can think of is one that gets an attribute from an element. If get() calls dominate, then optimizing that might have a good effect. It may, however, be that certain calls are hidden from the profiler. A good example of this is access to the 'tag' attribute -- I'd expect an _elementpath find() operation to call that a lot.

Note again that I expect most find() calls are relatively easy to change into an .xpath() call, which may be the most effective way to bring the performance of the lxml version and the cElementTree version closer together.

Regards,

Martijn
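To make the suggested find()-to-xpath() rewrite concrete, here is a minimal sketch (the element names are made up, not bzrlib's); note that xpath() returns a list of matches, while find() returns only the first one:

from lxml import etree

root = etree.XML('<inventory><entry name="a"/><entry name="b"/></inventory>')

# ElementPath-based lookup, which goes through the Python code in _elementpath.py
first = root.find('entry')

# roughly equivalent lookup through libxml2's C-level XPath engine
matches = root.xpath('entry')
if matches:
    first = matches[0]
else:
    first = None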
Hey,

Experimentally I've just checked in a possibly slightly faster implementation of .get() into lxml. You might want to run your profiling again. I don't think it'll help enormously much, but it should help some as I avoid some object creation.

I expect getting the speed of .get() near that of cElementTree may actually be quite a hard task. cElementTree has the benefit of being able to store element and attribute names in Clark notation ('{http://www.foo.nl}bar') directly, and can then use very fast Python dictionary access to get out the attribute value. lxml has to decode this first. Additionally, in cElementTree the attribute value is already in the right format (a Python unicode string or plain ASCII string), while lxml has to make a translation from UTF-8 to unicode or ASCII Python strings. On the string translation front there is some hope that future versions of libxml2 may allow at least some caching of pre-translated strings, but it'll be pretty hard overall to beat cElementTree on pure tree access performance. The only chance is by cheating and using the pure-C XPath. I have some hope that this might sometimes beat cElementTree, though only for relatively complicated and thus relatively rare XPath expressions.

Where lxml does compete is parsing speed (it's about equivalent to cElementTree's), and I imagine lxml is also pretty good at serialization speed. Where lxml really shines is in all the XML features it supports, with still very competitive performance. After all, comparing with cElementTree is comparing to possibly the fastest XML processing library in Python.

Regards,

Martijn
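To make the Clark notation point concrete, a minimal sketch (the namespace and attribute names are made up): both libraries expose a namespaced attribute under its '{uri}localname' key, and the cost difference described above lies entirely in how that key and its value are stored and decoded internally.

from lxml import etree

root = etree.XML(
    '<doc xmlns:foo="http://www.foo.nl"><item foo:bar="42"/></doc>')
item = root[0]

# the attribute is addressed by its Clark-notation key in both libraries;
# cElementTree can serve this from a plain dictionary lookup, while lxml
# has to translate names and values coming from libxml2 first
value = item.get('{http://www.foo.nl}bar')   # -> '42'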
Martijn Faassen wrote:
Hey,
Hey :)
Experimentally I've just checked in a possibly slightly faster implementation of .get() into lxml. You might want to run your profiling again. I don't think it'll help enormously much, but it should help some as I avoid some object creation.
I just checked it out and reran the timeit on the local branch: it now takes around 2.2s instead of 2.34s (and 1.86s for cetree).
I expect getting the speed of .get() near that of cElementTree may actually be quite a hard task. cElementTree has the benefit of being able to store element and attribute names in Clark notation ('{http://www.foo.nl}bar') directly, and can then use very fast Python dictionary access to get out the attribute value. lxml has to decode this first.
Here are the details of the callers of the :0(get):

In [1]: import pstats

In [2]: s = pstats.Stats('bzr_lxml.prof')

In [3]: s.print_callers(':0\(get\)')
   Random listing order was used
   List reduced from 429 to 1 due to restriction <':0\\(get\\)'>

Function was called by...
:0(get)
    /tmp/bzr.dev-lxml/bzrlib/changeset.py:809(longest_to_shortest)(194)      0.000
    /tmp/bzr.dev-lxml/bzrlib/changeset.py:829(rename_to_temp_delete)(98)     0.000
    /tmp/bzr.dev-lxml/bzrlib/changeset.py:870(rename_to_new_create)(98)      0.070
    /tmp/bzr.dev-lxml/bzrlib/changeset.py:1133(get_inventory_change)(98)     0.040
    /tmp/bzr.dev-lxml/bzrlib/changeset.py:1366(make_basic_entry)(198)        0.150
    /tmp/bzr.dev-lxml/bzrlib/changeset.py:1516(get_path)(100)                0.000
    /tmp/bzr.dev-lxml/bzrlib/inventory.py:189(from_element)(62944)           1.290
    /tmp/bzr.dev-lxml/bzrlib/revision.py:159(unpack_revision)(808)           0.020
    /usr/lib/python2.4/encodings/__init__.py:69(search_function)(3)          0.000
    /usr/lib/python2.4/site-packages/lxml/_elementpath.py:167(_compile)(202) 0.000
    /usr/lib/python2.4/sre.py:213(_compile)(100)                             0.000
    /usr/lib/python2.4/sre_parse.py:225(_class_escape)(1)                    0.000
Out[3]: <pstats.Stats instance at 0xb7ccc7ac>

In [4]: s2 = pstats.Stats('bzr_cetree.prof')

In [5]: s2.print_callers(':0\(get\)')
   Random listing order was used
   List reduced from 431 to 1 due to restriction <':0\\(get\\)'>

Function was called by...
:0(get)
    /tmp/bzr.dev/bzrlib/changeset.py:809(longest_to_shortest)(194)  0.000
    /tmp/bzr.dev/bzrlib/changeset.py:829(rename_to_temp_delete)(98) 0.000
    /tmp/bzr.dev/bzrlib/changeset.py:870(rename_to_new_create)(98)  0.080
    /tmp/bzr.dev/bzrlib/changeset.py:1133(get_inventory_change)(98) 0.060
    /tmp/bzr.dev/bzrlib/changeset.py:1366(make_basic_entry)(198)    0.160
    /tmp/bzr.dev/bzrlib/changeset.py:1516(get_path)(100)            0.000
    /tmp/bzr.dev/bzrlib/inventory.py:189(from_element)(62944)       0.720
    /tmp/bzr.dev/bzrlib/revision.py:159(unpack_revision)(808)       0.030
    /usr/lib/python2.4/encodings/__init__.py:69(search_function)(3) 0.000
    /usr/lib/python2.4/sre.py:213(_compile)(100)                    0.000
    /usr/lib/python2.4/sre_parse.py:225(_class_escape)(1)           0.000

So the main difference actually comes from the from_element call (/tmp/bzr.dev-lxml/bzrlib/inventory.py:189), which is basically a bunch of 6 calls to elt.get (and not the dictionary get you were wondering about). So unless libxml2 implements some encoding cache, cElementTree will remain unbeatable on this kind of operation. But still, 2.2s against 1.84s is not that bad :)
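For illustration, a from_element-style deserializer boils down to something like the following (the attribute names are hypothetical, not bzrlib's actual code). With nearly 9000 from_element calls in the profile above and about half a dozen get() calls each, the per-call overhead of elt.get() is exactly what shows up as :0(get):

def entry_from_element(elt):
    # hypothetical sketch: every attribute lookup below is one elt.get() call,
    # so deserializing thousands of inventory entries issues tens of thousands
    # of get() calls in total
    file_id = elt.get('file_id')
    name = elt.get('name')
    parent_id = elt.get('parent_id')
    kind = elt.get('kind')
    text_sha1 = elt.get('text_sha1')
    text_size = elt.get('text_size')
    return (file_id, name, parent_id, kind, text_sha1, text_size)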
Where lxml does compete is parsing speed (it's about equivalent to cElementTree's), and I imagine lxml is also pretty good at serialization speed.
Yes, lxml is actually faster on pure parsing (the python.org RDF file, with timeit):

lxml_parse(fn())     0.00348305702209
lxml_parse(fs_f())   0.00358319282532
lxml_parse(cs_f())   0.00343585014343
cetree_parse(fn())   0.00477194786072
cetree_parse(fs_f()) 0.00471210479736
cetree_parse(cs_f()) 0.00483894348145

(fn is parsing from a filename, fs_f from a file object for a file on disk, and cs_f from a cStringIO file handle).
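For reference, a minimal sketch of how such a timeit comparison can be set up (the helper names fn, fs_f and cs_f mirror the description above; the RDF file name is illustrative):

import timeit

setup = '''
import cStringIO
import cElementTree
from lxml import etree

DATA_FILE = 'python-org.rdf'          # illustrative file name
data = open(DATA_FILE, 'rb').read()

def fn():                             # parse from a file name
    return DATA_FILE

def fs_f():                           # parse from a real file object
    return open(DATA_FILE, 'rb')

def cs_f():                           # parse from an in-memory file
    return cStringIO.StringIO(data)

lxml_parse = etree.parse
cetree_parse = cElementTree.parse
'''

for stmt in ('lxml_parse(fn())', 'lxml_parse(fs_f())', 'lxml_parse(cs_f())',
             'cetree_parse(fn())', 'cetree_parse(fs_f())', 'cetree_parse(cs_f())'):
    timer = timeit.Timer(stmt, setup)
    print stmt, min(timer.repeat(3, 10)) / 10   # seconds per parse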
Where lxml really shines is in all the XML features it supports, with still very competitive performance. After all, comparing with cElementTree is comparing to possibly the fastest XML processing library in Python.
True. So, what's happening at EuroPython on the XML side? Do you plan to publish your slides on lxml somewhere?

-- Olivier
Hi Olivier (and those listening in),

Back from EuroPython again!

Olivier Grisel wrote:
Martijn Faassen wrote:
Experimentally I've just checked in a possibly slightly faster implementation of .get() into lxml. You might want to run your profiling again. I don't think it'll help enormously much, but it should help some as I avoid some object creation.
I just checked it out and reran the timeit on the local branch: it now takes around 2.2s instead of 2.34s (and 1.86s for cetree).
Okay, no surprises there. A bit faster, which is good, though at the cost of a bit more redundant code. I can refactor that to be a bit less redundant later. [snip]
So the main difference actually comes from the from_element call, which is basically a bunch of 6 calls to elt.get (and not the dictionary get you were wondering about). So unless libxml2 implements some encoding cache, cElementTree will remain unbeatable on this kind of operation. But still, 2.2s against 1.84s is not that bad :)
Yup, not bad. Thanks for all the measurements! I do want to explore the encoding cache option, though it may require some changes in libxml2. I need to bring this up with Daniel Veillard again sometime soon.
Where lxml does compete is parsing speed (it's about equivalent to cElementTree's), and I imagine lxml is also pretty good at serialization speed.
Yes, lxml is actually faster on pure parsing (the python.org RDF file, with timeit):

lxml_parse(fn())     0.00348305702209
lxml_parse(fs_f())   0.00358319282532
lxml_parse(cs_f())   0.00343585014343
cetree_parse(fn())   0.00477194786072
cetree_parse(fs_f()) 0.00471210479736
cetree_parse(cs_f()) 0.00483894348145

(fn is parsing from a filename, fs_f from a file object for a file on disk, and cs_f from a cStringIO file handle).
Cool! [snip]
So, what's happening at EuroPython on the XML side? Do you plan to publish your slides on lxml somewhere?
I just uploaded my (fairly boring) magicpoint slides today. You can find them here (at the bottom): http://www.python-in-business.org/ep2005/talk.chtml?talk=2386&track=692 I had a very good chat with Tim Parkin of Pollenation at EuroPython, and I am looking forward to some contribution to lxml from that direction. Regards, Martijn