[C++-sig] [pygccxml] caching fails due to pickling recursion limit

Roman Yakovenko roman.yakovenko at gmail.com
Thu Jun 28 20:28:44 CEST 2007


On 6/28/07, Kirill Lapshin <kir at lapshin.net> wrote:
> Hi Roman, everybody,

Good evening

> We are using py++/pygccxml on a moderately sized project, and recently
> py++ started to fail silently on us. It runs gccxml and then silently quits.

I'd like to fix this bug.

> It turned out the offending piece of code was pickling the cache. Something
> fails in cPickle and terminates the whole process. Most likely it is a
> stack overflow, because if cPickle is replaced with pickle, we get a
> recursion limit runtime error. One can raise the recursion limit, and pickle
> will then save the cache just fine, but really slowly (not surprising at all).
>
> At the moment we've worked around this problem by disabling cache,
> however it would be nice to fix it properly.

May I propose using another format for the cache: GCC-XML generated files.
I am serious. I am not kidding. The last released version (0.9) has many
performance improvements; one of them is in parsing XML files. pygccxml
now uses cElementTree's iterparse functionality as its XML parser when it
is available. You can find the benchmarks here:
http://effbot.org/zone/celementtree.htm .
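To illustrate why iterparse suits large GCC-XML files: it yields each element as soon as its closing tag has been read, so the file is processed incrementally instead of being loaded as one big tree first. The sketch below uses Python's standard `xml.etree.ElementTree` (in Python 3 the cElementTree accelerator was folded into this module). The XML fragment is hand-written in the style of GCC-XML output, not real gccxml output, and `collect_declarations` is a hypothetical helper, not pygccxml's parser.

```python
import io
import xml.etree.ElementTree as ET  # C-accelerated when available (former cElementTree)

# Hand-written fragment in the style of GCC-XML output (assumption: real
# gccxml files carry many more element types and attributes).
SAMPLE = b"""<?xml version="1.0"?>
<GCC_XML>
  <Namespace id="_1" name="::" members="_2 _3"/>
  <Class id="_2" name="A" context="_1"/>
  <Class id="_3" name="B" context="_1"/>
</GCC_XML>
"""


def collect_declarations(source):
    """Stream `source` with iterparse and return (tag, id, name) per declaration."""
    found = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "GCC_XML":
            continue  # the root element closes last and is not a declaration
        found.append((elem.tag, elem.get("id"), elem.get("name")))
        elem.clear()  # release the element's data once it has been recorded
    return found


if __name__ == "__main__":
    for decl in collect_declarations(io.BytesIO(SAMPLE)):
        print(decl)
```

Clearing each element after use is what keeps memory flat on large inputs; the root still holds the emptied child elements, which is cheap compared to their full contents.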

> After reading a bit about pickling, it looks like (c)pickle doesn't work
> very well with recursive structures, which declarations obviously are.
>
> I just want to list some ideas I have on how to tackle this problem, and
> I would love to get some comments on their feasibility, or better yet
> alternative approaches.

Right now I think that the caching functionality was a mistake and that it
would have been better to use GCC-XML generated files in this role. The cache
classes will stay for backward compatibility.

> 1. A simple one: just set all parents to None before pickling and rebuild
> parents upon loading. It should be relatively simple; I'll try it first
> thing. Hopefully that would kill a fair amount of recursion, but not all
> of it (e.g. class A refers to class B and B to A).
>
> 2. Add a unique id to each declaration, have a global store that saves
> declarations by id, and define __getstate__ and __setstate__ for each
> declaration so that they serialize ids rather than declaration
> instances. Then we just pickle the global store. In reality, a somewhat
> more complex approach is needed to tackle recursive references.
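Idea 1 above can be sketched as follows. `Decl` is a hypothetical stand-in with a `parent` back-reference and a `declarations` list of children; it is not pygccxml's declaration class. `__getstate__` drops the parent link on the way out, and `load` walks the tree afterwards to restore it. Whether this removes enough recursion in practice depends on the other cross-references between declarations, as the quoted message points out.

```python
import pickle


class Decl:
    """Hypothetical stand-in for a declaration; not pygccxml's real class."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.declarations = []
        if parent is not None:
            parent.declarations.append(self)

    def __getstate__(self):
        # Pickle a copy of the attributes with the back-reference removed,
        # so the original object keeps its parent.
        state = self.__dict__.copy()
        state["parent"] = None
        return state


def rebuild_parents(root):
    """Restore parent links after loading (iterative, so no deep recursion)."""
    root.parent = None
    stack = [root]
    while stack:
        decl = stack.pop()
        for child in decl.declarations:
            child.parent = decl
            stack.append(child)


def dump(root):
    return pickle.dumps(root, protocol=pickle.HIGHEST_PROTOCOL)


def load(data):
    root = pickle.loads(data)
    rebuild_parents(root)
    return root
```

Without a `__setstate__`, pickle restores the returned dict into the new object's `__dict__`, so only the dump side needs customizing.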
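Idea 2 above can be sketched in a similar spirit. `RefDecl`, its `uid` and `refs` attributes, and the `dump_store`/`load_store` helpers are hypothetical names. References are written out as ids, so pickling one declaration never descends into another. Cyclic references (A refers to B, B refers to A) are resolved in a second pass after the whole store has been loaded. The store is passed in explicitly rather than kept global, to keep the sketch self-contained.

```python
import itertools
import pickle


class RefDecl:
    """Hypothetical declaration whose `refs` may point anywhere, cycles included."""

    _next_uid = itertools.count()

    def __init__(self, name):
        self.uid = next(RefDecl._next_uid)
        self.name = name
        self.refs = []

    def __getstate__(self):
        # References become plain integers: no recursion into other RefDecls.
        return {"uid": self.uid, "name": self.name,
                "ref_uids": [r.uid for r in self.refs]}

    def __setstate__(self, state):
        self.uid = state["uid"]
        self.name = state["name"]
        self.refs = []
        self._ref_uids = state["ref_uids"]  # resolved later by load_store()


def dump_store(store):
    """Pickle a flat {uid: RefDecl} mapping; nesting depth stays constant."""
    return pickle.dumps(store, protocol=pickle.HIGHEST_PROTOCOL)


def load_store(data):
    store = pickle.loads(data)
    # Second pass: turn stored uids back into object references.
    for decl in store.values():
        decl.refs = [store[uid] for uid in decl._ref_uids]
        del decl._ref_uids
    return store
```

Two simplifications to keep in mind: the store must contain every referenced declaration (otherwise `load_store` raises `KeyError`), and the uid counter is not persisted, so declarations created after loading could reuse existing uids.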

Can you try the approach I propose? If it does not work for you, I am
sure we will find some other solution.

Relevant API docs:
module_builder_t.__init__
http://language-binding.net/pyplusplus/documentation/apidocs/pyplusplus.module_builder.builder.module_builder_t-class.html#__init__

file_configuration_t class doc:
http://language-binding.net/pyplusplus/documentation/apidocs/pygccxml.parser.project_reader.file_configuration_t-class.html

A few convenience functions for constructing file_configuration_t instances:
http://language-binding.net/pyplusplus/documentation/apidocs/pygccxml.parser.project_reader-module.html#create_gccxml_fc
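When GCC-XML output files play the role of the cache, the central decision is whether a stored XML file can be reused or must be regenerated. A minimal sketch, assuming that only the header's own modification time matters (real dependency tracking would also have to consider every header it includes). `xml_is_fresh`, `ensure_xml` and the `run_gccxml` callback are hypothetical names, not pygccxml API; the file_configuration_t documentation linked above describes what pygccxml actually provides.

```python
import os


def xml_is_fresh(header_path, xml_path):
    """Hypothetical helper: True if xml_path exists and is not older than header_path.

    Simplification: only the header's own timestamp is checked, not the
    headers it #includes.
    """
    if not os.path.exists(xml_path):
        return False
    return os.path.getmtime(xml_path) >= os.path.getmtime(header_path)


def ensure_xml(header_path, xml_path, run_gccxml):
    """Return xml_path, first calling the (hypothetical) run_gccxml callback if stale."""
    if not xml_is_fresh(header_path, xml_path):
        run_gccxml(header_path, xml_path)
    return xml_path
```

Passing the generator in as a callback keeps the freshness logic independent of how gccxml is actually invoked.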

HTH

-- 
Roman Yakovenko
C++ Python language binding
http://www.language-binding.net/


