[C++-sig] Patches and complete pyste replacement prototype for pyplusplus

Allen Bierbaum abierbaum at gmail.com
Tue Feb 28 05:40:20 CET 2006


On 2/27/06, Matthias Baas <baas at ira.uka.de> wrote:
> In the following I have tried to answer Allen's and Roman's postings at the
> same time (as they both refer to the same mail), quoting both of their
> mails....
>
>
> ["Module" vs "Pipeline" vs "ModuleBuilder" vs "module_builder_t"]
>
> Allen Bierbaum wrote:
> > I debated for quite a while what to call this object.  It corresponds
> > roughly to a builder for what pyplusplus calls a module, so I went
> > with that.  Another reason I chose this name is that from the user
> > perspective what they are trying to build is a python module and this
> > is the tool they are using to build it.  I am definitely willing to
> > rename this and thinking about it now it may be better to call it
> > "ModuleBuilder" or something along those lines.
>
> Roman Yakovenko wrote:
> > I would like to call it generator_t, but module_builder_t is also okay.
>
> I'm fine with "Module Builder" as well. As to the exact spelling, I'd
> vote for "ModuleBuilder" for two reasons:

In my codebase I have changed it to ModuleBuilder for now.

> Allen Bierbaum wrote:
> >> In Allen's version, the user always explicitly creates an instance of
> >> that class himself, in my version this instance is created internally
> >> and each method is also available as function which internally calls the
> >> corresponding method of the global instance (if desired the user could
> >> also create an instance himself).
> >
> > This is definitely one area where our efforts diverged.  I really took
> > hold of the idea early on to use an object oriented API throughout
> > because:
> > - I am assuming that people using the tool know python
> > - I want the ability to have binding generation scripts instantiate
> > multiple separate builders
> > - It seemed to be a good idea conceptually to deal with objects throughout
> >
> > I could definitely add a similar interface of global methods that
> > automatically call through to a single global instance, but it
> > wouldn't work for my bindings so I didn't spend too much time on it.
> > If this is a required capability I could easily add it.
>
> Roman Yakovenko wrote:
> > I think an object-oriented interface is fine. I will not let global variables
> > come into pyplusplus without good reason. Moreover, I think that within the same
> > script it should be possible to use more than one module_builder_t (I
> > like this name).
>
> I introduced the global functions simply as an abbreviation so that I
> could be more concise in my script. My point is not to forbid explicit
> usage of the module builder class. Even in my version of the API you
> could have used the builder class explicitly as Allen did in his version.
> Well, having global functions and an internal global builder class is a
> rather minor feature, I could also live without. My argument is just
> that introducing the global functions has absolutely no effect on those
> people who instantiate the builder themselves, whereas leaving the
> functions out affects those people who would prefer to use them. So why
> not let the user decide for themselves?

Agreed.  How about we get the object-based interface finalized first
and then we come back and introduce an optional way to use global
methods later.
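
A minimal sketch of how optional global functions could sit on top of
the object-based interface. Every name here (ModuleBuilder,
default_builder, the module-level parse) is hypothetical and is not
the pyplusplus API; the point is only that explicitly created builders
are unaffected by the shared one.

```python
# Hypothetical sketch; none of these names are the real pyplusplus API.
class ModuleBuilder:
    """Toy stand-in for the builder object discussed in this thread."""

    def __init__(self, headers=()):
        self.headers = list(headers)
        self.parsed = False

    def parse(self):
        # A real builder would run the pygccxml parser here.
        self.parsed = True
        return self


_default_builder = None


def default_builder():
    """Create the shared builder lazily, on first use."""
    global _default_builder
    if _default_builder is None:
        _default_builder = ModuleBuilder()
    return _default_builder


def parse():
    """Module-level shortcut that forwards to the shared builder."""
    return default_builder().parse()


# Explicit instances stay fully independent of the shared one.
mb1 = ModuleBuilder(["a.h"])
mb2 = ModuleBuilder(["b.h"])
mb1.parse()
parse()
```

Scripts that need several builders keep creating them explicitly;
scripts that want brevity call the module-level functions instead.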

> Roman Yakovenko wrote:
> >> In Allen's version there are three main "control methods": parse(),
> >> createCreators(), createModule(). In my version, I have the three
> >> methods parse(), codeCreators() and writeFiles() which serve the same
> >> purpose (as said above, these methods are also available as functions).
> >> In both versions, the second step (creating the code creators) is done
> >> internally if it wasn't done explicitly by the user (in my version I
> >> also applied that rule for the parse() step, but I admit that probably
> >> everyone has to do that step manually anyway (but it's a nice feature
> >> for a "Hello World" example :) ).
> >
> > I can think of another approach: properties
> > For example, class module_builder_t will have 4 properties:
> > - parser configuration
> >    keeps all data to configure pygccxml parser
> > - code creators factory configuration
> >    keeps all data to configure module_creator.creator_t class
> > - declarations
> >    returns declarations
> >    within this property, files will be parsed, only once, and
> > declaration tree will be returned.
> > - module_creator
> >    returns module_t code creator that has been created by the creator_t class
> >
> > This way the user doesn't need to think about "parsing" and "code creators
> > factory", but rather: I have a set of declarations, let's do some adaptation.
>
> I have to admit that I caught myself forgetting to call parse() before
> trying to access the declaration tree when I was setting up a simple
> pyplusplus example. But on the other hand, triggering such a "big"
> operation like parsing the headers just by accessing an attribute sounds
> unusual. But then, you didn't say what the attribute access would look
> like. The parse() step could really be done internally once the user
> calls any of the Class(), Method(), etc. methods which is basically what
> you were proposing. I think this is not such a bad idea at all, I'm in
> favor of trying it out. :)

I like Matthias's method names a little better here.  I don't really
see the attraction of using properties to trigger such side-effects,
but I definitely don't use properties anywhere near as much as Roman
does in his code. :)

For now can we try using modified names based on Matthias's ideas and
making everything happen automatically if someone skips a step?  I
have updated my code base to do this for now and then we can refine
from there.
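
A rough sketch of "everything happens automatically if someone skips a
step", using the method names from Matthias's version (parse,
codeCreators, writeFiles). The bodies are placeholders and the log
attribute is an assumption added only to make the order of steps
visible; nothing here is the actual implementation.

```python
# Hypothetical sketch: each step runs the previous one if it was skipped.
class ModuleBuilder:
    def __init__(self):
        self.log = []          # records which steps actually ran
        self._decls = None
        self._creators = None

    def parse(self):
        if self._decls is None:
            self.log.append("parse")
            self._decls = ["decl"]  # placeholder for the declaration tree
        return self._decls

    def codeCreators(self):
        if self._creators is None:
            decls = self.parse()    # implicit parse if the user skipped it
            self.log.append("codeCreators")
            self._creators = ["creator for %s" % d for d in decls]
        return self._creators

    def writeFiles(self):
        creators = self.codeCreators()  # implicit creator step
        self.log.append("writeFiles")
        return creators


mb = ModuleBuilder()
mb.writeFiles()  # the user skipped parse() and codeCreators()
```

Calling the steps explicitly gives the same order, and no step runs
twice because each result is cached.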

> Allen Bierbaum wrote:
> >> In both versions, there are methods Class, Method, Function, etc. to
> >> select one or more particular declarations that can then be decorated to
> >> customize the final bindings. In Allen's version, these functions either
> >> return a DeclWrapper or MultiDeclWrapper object (depending on whether
> >> the selection contains one or more declarations). In my version, the
> >> return value is an IDecl object (that always acts like a MultiDeclWrapper).
> >> Decorating the declarations also looks almost the same in both versions.
> >
> > I thought about doing this similar to Matthias, but I decided that I
> > wanted an easy ability to detect user errors and give good warnings.
> > What I found was that by splitting this in two I could have a separate
> > interface for MultiDeclWrapper (the case where multiple declarations
> > are wrapped) and only allow methods that made sense for multiple
> > declarations.  Similarly this interface can modify the way the methods
> > operate to make them take into account that they are wrapping multiple
> > declarations.   If I made everything wrap multiple declarations then I
> > would have to add test/handling code in each method to check whether
> > the method was valid.
> >
> > I am not too hung up on this though as it was more an implementation
> > detail than anything else.
>
> I agree that the decision whether there should be two declaration
> wrapper classes or only one is really just an implementation detail.
> I suppose the question rather is what interface we would like to have on
> that declaration wrapper(s) and whether the interface should depend on
> a) the number of contained declarations and b) on the type of the
> contained declaration(s).
> Our implementations agreed in that they did not base the interface on
> the declaration type (which means there should already be test/handling
> code in each method). I also didn't base the interface on the number of
> contained declarations because I thought whenever I call a method on a
> MultiDecl object I could just as well iterate over the contained
> declarations and call that method on each of them individually. And
> that's basically what I'm doing, relieving the user from having to write
> that loop himself.

There were some methods for which I didn't think this was a good
thing to do.  For example, add_method and rename.  I didn't think it
made much sense to rename multiple declarations to the same name or to
add the same method to multiple found classes in a namespace.

This probably makes more sense with your API though since you can
query declarations across parents.  If we support that feature then I
would revisit my conclusions about what should be allowed to be called
on a multi-declaration.
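
To illustrate the split being described, here is a small sketch in
which the multi-declaration wrapper forwards ignore() to every wrapped
declaration but refuses rename(). The class names follow the ones used
in this thread; the Decl class, the attributes and the choice of
TypeError are assumptions made for the example.

```python
# Hypothetical sketch of a single-decl and a multi-decl wrapper.
class Decl:
    def __init__(self, name):
        self.name = name
        self.ignored = False
        self.alias = None


class DeclWrapper:
    """Wraps exactly one declaration; every decoration is allowed."""

    def __init__(self, decl):
        self.decl = decl

    def ignore(self):
        self.decl.ignored = True
        return self

    def rename(self, new_name):
        self.decl.alias = new_name
        return self


class MultiDeclWrapper:
    """Wraps several declarations; only decorations that make sense
    for a group are forwarded."""

    def __init__(self, decls):
        self.wrappers = [DeclWrapper(d) for d in decls]

    def ignore(self):
        for wrapper in self.wrappers:
            wrapper.ignore()
        return self

    def rename(self, new_name):
        # Renaming a group would give every member the same name.
        raise TypeError(
            "rename() would give %d declarations the same name"
            % len(self.wrappers)
        )


group = MultiDeclWrapper([Decl("class1"), Decl("class2")])
group.ignore()
```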

> Roman Yakovenko wrote:
> > mb = module_builder_t( ... )
> > mb.class_ = mb.class_group
> >
> > You replace function that return 1 class with function that returns
> > many classes. Your code will work without changes.
>
> If the basic idea behind this can be rephrased as "let the user
> customize the API", then I think I can agree, but I'd do it the other
> way. Instead of replacing methods by new methods I would just allow to
> set options that alter the semantics of the methods a little bit. For
> example, you could provide new default values for arguments (like the
> recursive flag mentioned somewhere below) or you could enable/disable
> the automatic assertion feature that I've mentioned in an earlier mail.
> If I want to reference several classes at once I could disable automatic
> assertion for class queries. Whereas if I want to be sure to get exactly
> the class I have specified I enable automatic assertion with a count of 1.

Agreed.  I would rather make the behavior explicit than make it so you
have to use some python magic to make it behave differently with
exactly the same code.

Both the recursive flag and the assertion on number found sound like
good solutions to me.
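
As a sketch of the assertion on the number of declarations found, the
following helper matches names against a regular expression and
optionally checks the count. The function name select and the
expected_count parameter are made up for this example and are not
taken from either prototype.

```python
import re


def select(decl_names, pattern, expected_count=None):
    """Return the names fully matching `pattern`.

    If `expected_count` is given, raise LookupError unless exactly
    that many names matched.
    """
    found = [name for name in decl_names if re.fullmatch(pattern, name)]
    if expected_count is not None and len(found) != expected_count:
        raise LookupError(
            "%r matched %d declarations, expected %d"
            % (pattern, len(found), expected_count)
        )
    return found


names = ["class1", "class2", "helper"]
both = select(names, r"class.*")                   # no assertion requested
one = select(names, r"class1", expected_count=1)   # must match exactly once
```

With the assertion enabled, a pattern that unexpectedly matches several
classes fails loudly instead of silently decorating all of them.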

> Roman Yakovenko wrote:
> > Also we can not join between decl_wrapper and multi_decl_wrapper.
> > Every declaration
> > has set of unique properties like parent or location. Those properties
> > will not be in interface of multi_decl_wrapper.
>
> As mentioned above neither Allen's nor my API bases the declaration
> interface on the *types* of the contained declaration. So currently, you
> don't have that anyway (but this hasn't been a problem for me, and
> obviously neither for Allen as the main purpose of the DeclWrapper class
> is to *decorate* the declarations, the selection has been done earlier).

Agreed.  I don't see any reason to segregate the declarations based
on "types".  One of the most powerful aspects of this interface is
that everything is treated identically.

> Allen Bierbaum wrote:
> > There is one area here though where I am a little worried.  Namely I
> > find the way I query only the children of a declaration to be a little
> > more structured.
> >
> > For example with my method the user would always go about building up
> > their module based on the name hierarchy of the module:
> >
> > ns = mod.Namespace("test_ns")
> > class1 = ns.Class("class1")
> > class1_method1 = class1.Method("method1")
> > class2 = ns.Class("class2")
> > class2_method1 = class2.Method("method1")
> >
> > In Matthias's API I believe you could do something where you could ask
> > for all methods named "method1" across the entire decl tree.
>
> Right, you *could*, but you don't *have to*. The above code would work
> in my version just as well with the same semantics, i.e. class1_method1
> would only contain the method of class1 and not the one from class2
> because you called Method() on a previous selection of exactly one
> class. Only if you would call Method() on the main namespace (which by
> default also contains all children nodes) or on a class selection that
> references both classes would you get the "method1" methods from both
> classes.

This "all children nodes" or include children interface in your API
was one of the biggest things that confused me.  Can you describe how
it is meant to work or perhaps another similar question: do you think
that using a recursive flag as suggested below could allow the same
functionality without having to say include_children?  Instead we
would just be telling the search methods to recurse deeply into all
children which I think gives the same functionality if I understand
your goal here.

>
> Suppose I modify the above code and add a line like this (assuming your
> version of the API):
>
> classes = ns.Class("class.*")
>
> This would already address both classes and return a MultiDeclWrapper
> object in the above case. This means, I couldn't call Method() on them
> to further refine my query. But if the library only had a class1 class
> but no class2 class, the above call would return a DeclWrapper object
> and I would get a different interface where Method() is available. In my
> version I wanted to prevent such cases as I consider this to be somewhat
> inconsistent (you cannot tell what interface the returned object has
> just by looking at the above line. You can only answer that question by
> knowing the contents of the headers that were parsed).

And in mine I was trying to avoid this by making it an error to call
Method() on a MultiDeclWrapper because I thought that would imply the
user didn't find the correct thing, namely a single class.  I am
beginning to see that your interface could be more powerful though and
that maybe I should allow things like this.
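
A small sketch of the alternative, where a selection always behaves
like a collection and Method() can be called on it no matter how many
classes it holds. The Selection and Decl classes are hypothetical
stand-ins, not code from either prototype.

```python
import re


class Decl:
    def __init__(self, kind, name, children=()):
        self.kind = kind
        self.name = name
        self.children = list(children)


class Selection:
    """Always acts like a collection, whether it holds 0, 1 or many decls."""

    def __init__(self, decls):
        self.decls = list(decls)

    def _children(self, kind, pattern):
        # Search only the immediate children of every selected decl.
        return Selection(
            child
            for decl in self.decls
            for child in decl.children
            if child.kind == kind and re.fullmatch(pattern, child.name)
        )

    def Class(self, pattern):
        return self._children("class", pattern)

    def Method(self, pattern):
        return self._children("method", pattern)

    def __len__(self):
        return len(self.decls)


ns = Decl("namespace", "test_ns", [
    Decl("class", "class1", [Decl("method", "method1")]),
    Decl("class", "class2", [Decl("method", "method1")]),
])
root = Selection([ns])
one = root.Class("class1").Method("method1")    # method of class1 only
both = root.Class("class.*").Method("method1")  # methods of both classes
```

The returned object has the same interface in both cases, so the
script does not depend on how many classes the headers happen to
contain.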

> The bottom line is that my main argument for my approach would be the
> same as above. Together with auto assertions my approach doesn't affect
> the way you use your API whereas limiting the flexibility affects the
> way I was creating my wrappers. So again, why not let the users
> decide for themselves which approach suits them best?

I am all for letting the users have the most power possible.  So yes,
I am convinced that your method has merit and should be supported. 
You have convinced me.  At the same time though I want to make sure
that doing simple things is still simple, i.e. by default I would
suggest that searches don't go beyond the immediate children of the
declaration.

> Roman Yakovenko wrote:
> > But sometimes it should be possible to say something
> > like this: give me all declarations whose names start with QXml or QDom.
>
> That's already possible in both versions by using a regular expression
> (such as QDom.*) on the name.

Agreed.  This is definitely supported by both APIs although Matthias's
recursive searching allows you to do it on a wider range of decls at
the same time.

> Allen Bierbaum wrote:
> >> Then I ignore all ()-operators that return a reference to a float or
> >> double by the following line:
> >>
> >> Method("operator()", retval=["float &", "double &"]).ignore()
> >>
> >> Again, this addresses several classes and several methods at once. There
> >> are four filters (and three filter types) involved in this query:
> >
> > This is the one I am not so sure about.  I like the idea of being able
> > to do this but I am not convinced that it should be default behavior
> > to search across the entire declaration tree.
>
> Note however, that by using the global Method() function I more or less
> explicitly stated that I really wanted to search the entire declaration
> tree. If I had wanted to restrict the query to a particular
> class I would have written:
>
> Class("Foo").Method("operator()", retval=["float &", "double &"]).ignore()
>
> > Maybe something like this instead:
> >
> > ns = mod.Namespace("test_ns")
> > ns.Method("operator()", retval=["float &", "double &"], recursive=True).ignore()
> >
> > (notice the explicit request to recursively search).
>
> Well, I could argue that calling Method() on a namespace and explicitly
> setting recursive to True is sort of redundant. ;)

Not quite.  The way I would look at calling Method() on a namespace is
that you want to find the methods that match that pattern as immediate
children of the namespace.  So for example if you search for methods
of the name: "operator*()" you would expect to get the namespace-wide
multiplication operators and not the operators found inside contained classes
(or nested classes inside the contained classes and so on).

This is one of those cases where I think the simple thing that
non-power users would expect is that they would just find the
immediate local methods and not do a full search across everything.

> But apart from that
> I'm fine with it. (Could we also agree on making the default value for
> recursive customizable? Then it almost feels like home... :)

I can agree to that. :)

In fact the more I think about it the more I like the idea of making
it possible to customize the default behavior.  This would allow users
to kind of dial in their desired behavior and then start using it.  As
long as we make the starting defaults something that works well for
new users I think we should be good.
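
One possible shape for a customizable default, sketched with a
hypothetical Node class: searches look only at immediate children
unless recursive is set, and the default itself is a class attribute
the user can change once at the top of a script. None of these names
come from pyplusplus.

```python
# Hypothetical sketch of a search with a user-adjustable default.
class Node:
    default_recursive = False  # conservative starting default

    def __init__(self, name, kind, children=()):
        self.name = name
        self.kind = kind
        self.children = list(children)

    def find(self, kind, name, recursive=None):
        """Find children of the given kind and name.

        `recursive=None` means "use the current default".
        """
        if recursive is None:
            recursive = Node.default_recursive
        hits = []
        for child in self.children:
            if child.kind == kind and child.name == name:
                hits.append(child)
            if recursive:
                hits.extend(child.find(kind, name, recursive=True))
        return hits


ns = Node("test_ns", "namespace", [
    Node("operator*", "method"),
    Node("class1", "class", [Node("operator*", "method")]),
])
local_only = ns.find("method", "operator*")                  # 1 match
everything = ns.find("method", "operator*", recursive=True)  # 2 matches
```

A user who prefers tree-wide searches would set
Node.default_recursive = True once and keep the rest of the script
unchanged.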


> Allen Bierbaum wrote:
> > In my personal opinion (and I am highly biased) I would summarize the
> > comparison by saying that the prototype I put together may be further
> > ahead on features in general but could definitely be helped out with
> > more expressiveness of queries.
>
> I agree with that summary. :)
>
> > If we could come to some agreement
> > about how queries should work across the decl tree I would like to
> > extend my API proposal with the expressiveness of yours.  I could
> > build upon many of the ideas from your implementation and I am already
> > thinking of places in my wrapper scripts where doing so would help
> > simplify my life quite a bit. :)
> >
> > Do you think it would be a good idea for me to refine my prototype
> > with your query system or should we start over with a new code base
> > merging the best ideas?
>
> As our APIs are close enough I don't think we have to start over from
> scratch again. Feel free to take anything you need from my version and
> post any updates as soon as you have them finished so that I can test it
> and maybe even add some stuff. In the meantime, I'll refrain from doing
> more changes to my version.

Okay.  I have made some modifications to my version.  I would like to
get it into a public VCS as soon as possible (see below) so we can
refine further.

> Personally, I think the following items have to be sorted out as quickly
> as possible:
>
> - Where is the main version of the "experimental" API kept? Ideally,
> this should be a cvs/subversion repository that we can all access. I
> guess the only repository that is already there is the pyplusplus
> repository itself. But this would mean Roman would have to reserve an
> area in his repository and give us write access to it. Alternatively,
> I'm fine with keeping the main sources in Allen's hands and sending him
> patches whenever someone actually does changes to the code (I'd
> recommend to announce such attempts here so that everyone knows what
> everyone else is up to. Maybe this is really the time to start using the
> wiki).

IMHO, best option is to see if Roman is willing to open up his CVS on
sf.net.  If that doesn't work out I can setup a subversion repository
on one of my servers to get us going.  Just let me know what you want
to do.

I would suggest starting to use the wiki to track the outstanding
issues to discuss and the on-going discussion about them.  (Personally
I think this discussion should be taken off the mailing list to cut
down on traffic.  Once we have a more refined prototype then we can
come back and ask for comments.)


> - What is the internal "decoration" API of pyplusplus? Does the patch
> from Allen already contain everything that is needed? Was this part of
> the patch accepted and applied to cvs? Where is this API documented?

Assuming Roman accepted everything as-is, it contains most but not
everything.  It was designed to be extensible though so we can add new
capabilities as needed.

> - What are the guidelines for writing doc strings and which tool will be
> used to create reference documentation? (I think pyplusplus itself is
> also in dire need of doc strings and now that I keep looking at the

Agreed.

> sources I could just as well provide some doc strings myself. But for
> this, I need to know what guidelines I have to follow (should it be
> plain text or is it ok to add some markup for a specific tool? And if
> so, which tool? epydoc? doxygen? etc))

I agree that this needs to be decided.  If we want to keep with
Roman's use of restructured text, epydoc has some nice abilities to
extract this from the doc strings.    (note: I don't know this format
yet but I am willing to learn)

-Allen

> - Matthias -
>
> _______________________________________________
> C++-sig mailing list
> C++-sig at python.org
> http://mail.python.org/mailman/listinfo/c++-sig
>


