[C++-sig] [pygccxml] caching fails due to pickling recursion limit

Roman Yakovenko roman.yakovenko at gmail.com
Sat Jun 30 07:45:57 CEST 2007


On 6/29/07, Kirill Lapshin <kir at lapshin.net> wrote:
> Roman Yakovenko wrote:
> > May I propose using another format for the cache: gccxml-generated files.
> > I am serious. I am not kidding. The last released version (0.9) has many
> > performance improvements, one of them in parsing XML files. pygccxml
> > now uses the cElementTree iterparse functionality as its XML parser when
> > it is available. You can find the benchmarks here:
> > http://effbot.org/zone/celementtree.htm .
>
> Thanks for a prompt response.
>
> I gave it a try. So far it looks promising; a few caveats though:
>
> 1. cElementTree on Windows is installed to the root of site-packages, so
> I had to modify your import line, i.e. replace:
>
>     import xml.etree.cElementTree as ElementTree
>
> with
>
>     try:
>         import xml.etree.cElementTree as ElementTree
>     except ImportError:
>         # standalone package installed at the root of site-packages
>         import cElementTree as ElementTree

I will fix this.
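
For readers unfamiliar with the iterparse style of parsing mentioned above, here is a minimal sketch. It is not pygccxml's actual scanner: the sample document is made up to resemble gccxml output, and `count_tags` is a hypothetical helper. It uses `xml.etree.ElementTree`, since the separate cElementTree module only exists on older Pythons.

```python
import io
import xml.etree.ElementTree as ElementTree

# Made-up sample that only resembles gccxml output (assumption).
SAMPLE = b"""<GCC_XML>
  <Namespace id="_1" name="::"/>
  <Function id="_2" name="foo" context="_1"/>
  <Class id="_3" name="bar" context="_1"/>
</GCC_XML>"""


def count_tags(source):
    """Stream over the XML and count elements per tag name."""
    counts = {}
    # iterparse yields each element as soon as its end tag is seen,
    # so the whole tree never has to be built up front.
    for event, elem in ElementTree.iterparse(source, events=("end",)):
        counts[elem.tag] = counts.get(elem.tag, 0) + 1
        elem.clear()  # drop attributes and children we no longer need
    return counts


counts = count_tags(io.BytesIO(SAMPLE))
```

Calling `elem.clear()` after each element is what keeps memory use low on large gccxml files.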

> 2. cElementTree helps, but not by a wide margin -- parsing used to take
> about 8.1 sec and with cElementTree it takes 5.7 sec, which is nowhere
> near the speeds claimed on the cElementTree page, but I guess most of the
> time is spent not in xml parsing, but in reader/scanner/whatever.
>
> 3. Overall time with the xml cache is comparable to the pickle cache
> (still about 10% slower though); however, cold startup (when no cache is
> there) is about 15% faster.

This is because I don't have to save the cache to disk.

> 4. xml as a cache is not as robust as the old-style cache, meaning that
> there is no logic to refresh the xml file when it gets outdated. Granted,
> in an ideal world pyplusplus shouldn't do it in the first place, it is more
> of a build system responsibility, but many of us are stuck with less than
> ideal build systems that can't automatically scan .hpp file dependency trees.
>
> I would migrate to the xml cache in a heartbeat, given our problems with
> pickle, but build system deficiencies prohibit this move at the moment.
>
> Are there any plans to invalidate the xml cache file automatically whenever
> the source file, or any file included by the source file, is modified? I can
> try to hack something myself, but if you are planning to work on it
> anyway, I may wait for a proper solution.


I understand. I think I can add this functionality and then start
deprecating the pickle-based cache.
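
A rough sketch of what such a check could look like, comparing file modification times. This is only an illustration, not what pygccxml does today: `xml_cache_is_stale` is a hypothetical name, and I assume the caller already knows the list of dependencies (the source file plus the headers it includes).

```python
import os


def xml_cache_is_stale(xml_path, dependencies):
    """Return True if the cached xml file must be regenerated.

    xml_path     -- path to the gccxml-generated file used as cache
    dependencies -- source file and included headers (assumed to be
                    supplied by the caller)
    """
    if not os.path.exists(xml_path):
        return True  # no cache yet
    cache_mtime = os.path.getmtime(xml_path)
    for dep in dependencies:
        # A missing dependency or one modified after the cache was
        # written both mean the cache can no longer be trusted.
        if not os.path.exists(dep) or os.path.getmtime(dep) > cache_mtime:
            return True
    return False
```

Modification times are cheap to check but can be fooled by clock skew or restored backups; comparing file checksums would be more robust at the cost of reading every header.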

> Performance stats for various scenarios (listing biggest offenders only):
>
> Note: I'm using py++/pygccxml 0.9 amended a bit to add more performance
> logging and to use cElementTree as described above.
>
> 1. xml using cache
>     parsing xml 5.6 sec
>     relinking declared types...
>      parsing files - done (7.5 sec)
>      setting declarations defaults - done (3.1 sec)
>     preparing data structures for query optimizer (4.7 sec)
>     --- total 26 sec
>     --- total (logging off) 23 sec
>
> 2. xml not using cache (cache file has been deleted)
>     creating xml 11.2 sec
>     parsing xml 5.6 sec
>     relinking declared types...
>      parsing files - done (18.6 sec)
>      setting declarations defaults - done (3.1 sec)
>     preparing data structures for query optimizer (4.7 sec)
>     --- total 39 sec
>     --- total (logging off) 34 sec
>
>     note: times do not add up! The "parsing files" time looks suspicious;
> it probably includes parsing xml.
>
> 3. pickle using cache
>     parsing source file 3.1 sec
>     relinking declared types...
>      parsing files - done (5 sec)
>      setting declarations defaults - done (2.7 sec)
>     preparing data structures for query optimizer (4.5 sec)
>     --- total 25 sec
>     --- total (logging off) 21 sec
>
> 4. pickle not using cache (cache file has been deleted)
>     parsing source file 22.3 sec
>     relinking declared types...
>      parsing files - done (24.24 sec)
>      setting declarations defaults - done (3 sec)
>     preparing data structures for query optimizer (4.7 sec)
>     --- total 44 sec
>     --- total (logging off) 40 sec

It would be helpful if you could run your scripts under a profiler, so we
can see the bottlenecks.
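
Something along these lines should be enough, using the standard cProfile and pstats modules. `generate_bindings` is just a placeholder for whatever function drives your py++ script, and `profile_call` is a hypothetical helper name.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time and show the 20 most expensive entries.
    stats.sort_stats("cumulative").print_stats(20)
    return result, stream.getvalue()


def generate_bindings():
    # Placeholder workload (assumption); replace with the real entry point.
    return sum(i * i for i in range(10000))


result, report = profile_call(generate_bindings)
print(report)
```

Alternatively, `python -m cProfile -s cumulative your_script.py` gives the same kind of report without touching the script.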

-- 
Roman Yakovenko
C++ Python language binding
http://www.language-binding.net/


