[C++-sig] [pygccxml] caching fails due to pickling recursion limit
Kirill Lapshin
kir at lapshin.net
Fri Jun 29 11:19:27 CEST 2007
Roman Yakovenko wrote:
> May I propose to use another format for cache - gccxml generated files.
> I am serious. I am not kidding. Last released version(0.9) has many
> performance improvements, one of them is parsing XML files. pygccxml
> now uses cElementTree iterparse functionality as XML parser when it
> available. You can find here the benchmarks
> http://effbot.org/zone/celementtree.htm .
Thanks for a prompt response.
I gave it a try. So far it looks promising, with a few caveats though:
1. cElementTree on Windows is installed to the root of site-packages, so
I had to modify your import line, i.e. replace:
    import xml.etree.cElementTree as ElementTree
with
    try:
        import xml.etree.cElementTree as ElementTree
    except ImportError:
        import cElementTree as ElementTree
2. cElementTree helps, but not by a wide margin -- parsing used to take
about 8.1 sec and with cElementTree it takes 5.7 sec, which is nowhere
near the speeds claimed on the cElementTree page, but I guess most of the
time is spent not in xml parsing, but in reader/scanner/whatever.
3. Overall time with xml cache is comparable to pickle cache (still
about 10% slower though), however cold startup (when no cache is there)
is about 15% faster.
4. xml as cache is not as robust as the old style cache, meaning that there
is no logic to refresh the xml file when it gets outdated. Granted, in an
ideal world pyplusplus shouldn't do it in the first place, it is more of a
build system responsibility, but many of us are stuck with less than ideal
build systems that can't automatically scan .hpp file dependency trees.
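On caveat 2, one way to test the guess that raw XML parsing is not the bottleneck is to time a bare iterparse pass over the gccxml output, with no pygccxml processing at all. A minimal sketch: the function name time_raw_iterparse is made up for illustration and is not part of pygccxml, and it uses the standard library xml.etree.ElementTree rather than the separately installed cElementTree.

```python
import time
import xml.etree.ElementTree as ElementTree


def time_raw_iterparse(xml_path):
    """Walk every element of xml_path and return (seconds, element_count).

    Hypothetical helper: it does nothing but parse, so the elapsed time
    is a lower bound on what any reader built on iterparse can achieve.
    """
    start = time.perf_counter()
    count = 0
    for _event, elem in ElementTree.iterparse(xml_path, events=("end",)):
        count += 1
        elem.clear()  # drop attributes and children we no longer need
    return time.perf_counter() - start, count
```

If the number this returns is much smaller than the 5.7 sec reported for "parsing xml", the remainder is being spent outside the XML parser itself.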
I would migrate to xml cache in a heartbeat, given our problems with
pickle, but build system deficiencies prohibit this move at the moment.
Are there any plans to invalidate the xml cache file automatically whenever
the source file, or any file included by the source file, is modified? I can
try to hack something myself, but if you are planning to work on it
anyway, I may wait for a proper solution.
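The kind of hack I have in mind could look roughly like the sketch below. It is not pygccxml API: the function names are hypothetical, and it assumes the gccxml output lists every file it read as a <File name="..."> element, so the cached xml can serve as its own dependency list. Entries whose name starts with "<" (such as built-in pseudo files) are assumed not to be real paths and are skipped.

```python
import os
import xml.etree.ElementTree as ElementTree


def files_listed_in_xml(xml_path):
    """Return the 'name' attribute of every <File> element in xml_path."""
    names = []
    for _event, elem in ElementTree.iterparse(xml_path, events=("start",)):
        if elem.tag == "File" and "name" in elem.attrib:
            names.append(elem.attrib["name"])
    return names


def xml_cache_is_stale(xml_path, source_path):
    """Hypothetical check: True if the xml cache should be regenerated."""
    if not os.path.exists(xml_path):
        return True
    cache_mtime = os.path.getmtime(xml_path)
    for dep in [source_path] + files_listed_in_xml(xml_path):
        if dep.startswith("<"):
            continue  # assumed pseudo file, not a path on disk
        # A dependency that vanished or changed after the cache was
        # written means the cache can no longer be trusted.
        if not os.path.exists(dep) or os.path.getmtime(dep) > cache_mtime:
            return True
    return False
```

A caller would regenerate the xml with gccxml whenever xml_cache_is_stale returns True and reuse the file otherwise. Comparing modification times is the same heuristic make uses, so it shares the usual weakness with clock skew and files restored with old timestamps.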
Performance stats for various scenarios (listing biggest offenders only):
Note: I'm using py++/pygccxml 0.9 amended a bit to add more performance
logging and to use cElementTree as described above.
1. xml using cache
   parsing xml 5.6 sec
   relinking declared types...
   parsing files - done (7.5 sec)
   setting declarations defaults - done (3.1 sec)
   preparing data structures for query optimizer (4.7 sec)
   --- total 26 sec
   --- total (logging off) 23 sec
2. xml not using cache (cache file has been deleted)
   creating xml 11.2 sec
   parsing xml 5.6 sec
   relinking declared types...
   parsing files - done (18.6 sec)
   setting declarations defaults - done (3.1 sec)
   preparing data structures for query optimizer (4.7 sec)
   --- total 39 sec
   --- total (logging off) 34 sec
   note: times do not add up! parsing files time looks suspicious.
   probably it includes parsing xml
3. pickle using cache
   parsing source file 3.1 sec
   relinking declared types...
   parsing files - done (5 sec)
   setting declarations defaults - done (2.7 sec)
   preparing data structures for query optimizer (4.5 sec)
   --- total 25 sec
   --- total (logging off) 21 sec
4. pickle not using cache (cache file has been deleted)
   parsing source file 22.3 sec
   relinking declared types...
   parsing files - done (24.24 sec)
   setting declarations defaults - done (3 sec)
   preparing data structures for query optimizer (4.7 sec)
   --- total 44 sec
   --- total (logging off) 40 sec
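For reference, the percentages quoted in caveat 3 come from the "logging off" totals, and the suspicious "parsing files" figure in scenario 2 is at least consistent with it including both xml steps. The arithmetic, as a small sketch using only the numbers listed above (the variable and function names are just for illustration):

```python
# "total (logging off)" in seconds, copied from the four scenarios.
totals = {
    ("xml", "warm"): 23,
    ("xml", "cold"): 34,
    ("pickle", "warm"): 21,
    ("pickle", "cold"): 40,
}


def percent_diff(a, b):
    """How much larger a is than b, as a percentage of b."""
    return 100.0 * (a - b) / b


# Warm run: xml cache versus pickle cache, (23 - 21) / 21, about 9.5%.
warm_slowdown = percent_diff(totals[("xml", "warm")], totals[("pickle", "warm")])

# Cold run: xml versus pickle, (40 - 34) / 40, exactly 15%.
cold_speedup = -percent_diff(totals[("xml", "cold")], totals[("pickle", "cold")])

# Scenario 2: do the two xml steps fit inside "parsing files - done"?
inner = 11.2 + 5.6  # creating xml + parsing xml
outer = 18.6        # parsing files - done
unaccounted = round(outer - inner, 1)  # 1.8 sec left for everything else

print("xml warm is %.1f%% slower than pickle" % warm_slowdown)
print("xml cold is %.1f%% faster than pickle" % cold_speedup)
print("parsing files minus xml steps: %.1f sec" % unaccounted)
```

With logging on, the same comparison gives 4% (26 vs 25 sec) and about 11% (39 vs 44 sec), so the logging overhead itself shifts the picture somewhat.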
Kirill