[C++-sig] Pyste: your opinion about some changes

Sun Jul 13 00:06:21 CEST 2003

Hi everyone!

Prabhu and I have had some discussions on IRC about Pyste and came up
with some ideas on how to fix The Order Bug and improve the workings of
Pyste in general. We would like to know your opinion on them.

The Order problem is this: class hierarchies must be exported in order,
from the base class to the most-derived class. The naive approach (parse
all header files, look up all the bases in the header files and order
the classes) occupies too much memory, making it prohibitive when there
are many classes.
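For illustration, here is a minimal Python sketch of the last step of the naive approach. It assumes the expensive part is already done, i.e. every exported class has been mapped to its direct bases (the hypothetical `bases` dictionary); `order_classes` is a made-up name, not part of Pyste. A depth-first walk that emits each base before the class deriving from it yields a valid export order:

```python
def order_classes(bases):
    """Return class names so that every base precedes its derived classes.

    bases: hypothetical dict mapping each class name to the list of its
    direct base class names, as found by parsing the headers.
    """
    ordered = []
    visited = set()

    def visit(name):
        if name in visited:
            return
        visited.add(name)
        # Emit every base first (C++ inheritance cannot be cyclic).
        for base in bases.get(name, []):
            visit(base)
        ordered.append(name)

    for name in bases:
        visit(name)
    return ordered


print(order_classes({"MoreDerived": ["Derived"], "Derived": ["Base"], "Base": []}))
# prints ['Base', 'Derived', 'MoreDerived']
```

The traversal itself is cheap; the memory problem comes from having to hold the parsed declarations of every header at once in order to build `bases`.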

The first idea was based on David's suggestion of using pickle/shelve
to hold the declarations in a form that could easily be swapped in and
out of disk as needed, thus solving the memory problem. We implemented
this, and it works correctly, but it is too slow, sometimes prohibitively
so. Plus, Prabhu noted a flaw that Pyste has always had: it always
generates the *entire* wrapper code; there's no support for generating
only the wrapper code for a single pyste file. So when he changed
something in a Pyste file, like excluding a function, *all* the wrapper
code would be generated again, taking another 10 minutes on his machine.
Of course, a more incremental approach is needed.

We think a viable solution is to hand the responsibility of ordering
the classes over to the users. Ideally, one could do this in a Pyste
file:

    Class('Derived', ...)
    Class('Base', ...)

And Pyste would first instantiate Base, and then Derived. While this is
a nice feature, it brings some complications:

- Given Derived, we must know what its bases are, and we can only find
that out by calling gccxml.
- Given Derived, we must know if Base was exported, so we can put 
"bases<Base>" inside the class_. If the user just exported Derived, 
"bases<Base>" should not be generated.

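The second point can be sketched as follows. Assuming gccxml has already reported the direct bases of a class, and that the names of all classes exported by every Pyste file are known, only the exported bases go into `bases<...>`. The function name and the simplified output (unqualified names, no other `class_` arguments) are illustrative assumptions, not Pyste's actual code generator:

```python
def class_declaration(name, bases, exported):
    """Opening of a Boost.Python class_ declaration for `name`.

    bases:    direct C++ bases of `name` (as gccxml would report them).
    exported: names of all classes exported by any Pyste file.
    """
    # A base that is not being exported must not appear in bases<...>.
    exported_bases = [b for b in bases if b in exported]
    template_args = [name]
    if exported_bases:
        template_args.append("bases<%s>" % ", ".join(exported_bases))
    # The spaces inside class_< ... > avoid a ">>" token in C++98.
    return 'class_< %s >("%s")' % (", ".join(template_args), name)


print(class_declaration("Derived", ["Base"], {"Base", "Derived"}))
# prints class_< Derived, bases<Base> >("Derived")
print(class_declaration("Derived", ["Base"], {"Derived"}))
# prints class_< Derived >("Derived")
```
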
We decided to drop this feature: while nice, it is not strictly
necessary, since when exporting hierarchies it is natural to export the
bases first. So, the user must write:

    Class('Base', ...)
    Class('Derived', ...)

just as he would if he were writing the wrapper by hand. How do you
feel about this?

For Pyste to generate the correct code for a given class, it must know
all the other classes that are also being exported, as explained above.
So we must somehow pass all pyste files to Pyste before it can produce
code for any class. Today this is done on the command line:

pyste --module=foo foo1.pyste foo2.pyste

This generates a file named foo.cpp, with all the wrapper code in it.
While convenient, this is impractical, since compile times can get very
high for large libraries. That is why --multiple was created:

pyste --multiple --module=foo foo1.pyste foo2.pyste

This generates three files: _main.cpp, _foo1.cpp and _foo2.cpp.
Compiling and linking them together gives the same result as without
the --multiple flag.
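A minimal sketch of how the pieces produced by --multiple could fit together. It assumes each _<name>.cpp defines a registration function (the `Export_<name>` naming and the helper functions below are hypothetical) that _main.cpp declares and calls from the module's single `BOOST_PYTHON_MODULE` entry point; that is why linking the files together behaves like the single foo.cpp:

```python
import os


def stem(path):
    """'dir/foo1.pyste' -> 'foo1'."""
    return os.path.splitext(os.path.basename(path))[0]


def multiple_outputs(pyste_files):
    """Files written by --multiple: _main.cpp plus one _<stem>.cpp per Pyste file."""
    return ["_main.cpp"] + ["_%s.cpp" % stem(p) for p in pyste_files]


def main_cpp(module, pyste_files):
    """Source of _main.cpp: the module entry point calls the (hypothetical)
    Export_<stem>() function defined in each _<stem>.cpp."""
    names = [stem(p) for p in pyste_files]
    lines = ["#include <boost/python.hpp>", ""]
    lines += ["void Export_%s();" % n for n in names]
    lines += ["", "BOOST_PYTHON_MODULE(%s)" % module, "{"]
    lines += ["    Export_%s();" % n for n in names]
    lines.append("}")
    return "\n".join(lines) + "\n"


print(multiple_outputs(["foo1.pyste", "foo2.pyste"]))
# prints ['_main.cpp', '_foo1.cpp', '_foo2.cpp']
print(main_cpp("foo", ["foo1.pyste", "foo2.pyste"]))
```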

Now, to solve the incremental generation problem, we came up with two
approaches, and we would like to hear the opinion of everyone on the
list who is interested. Both of them aim to improve --multiple, and
should be faster than the current system.

1. One approach is to add an option like --wrap-only <pyste file>, which
would generate just the code related to that pyste file, and another,
--main-only, which would generate just the _main.cpp file:

pyste --wrap-only foo1.pyste --multiple --module=foo foo1.pyste foo2.pyste

Would generate just _foo1.cpp. Notice that we still pass all pyste files
on the command line because, as explained before, Pyste must know all
the classes that are being exported to be able to generate correct code.

pyste --main-only --multiple --module=foo foo1.pyste foo2.pyste

Would generate just _main.cpp.

The advantage of this approach is that it basically extends the
current workings of Pyste, i.e., users won't have to change anything to
keep using it. The disadvantage is that it looks weird, since you have
to pass all pyste files even though you may be interested in generating
code for just one.
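Approach 1 could be sketched like this. All Pyste files are still read, so the set of exported classes is complete, but only the requested output is written. `plan`, the `declarations` dictionary (each Pyste file mapped to the classes it exports) and the keyword arguments mirroring the proposed --wrap-only and --main-only options are assumptions for illustration:

```python
import os


def stem(path):
    return os.path.splitext(os.path.basename(path))[0]


def plan(declarations, wrap_only=None, main_only=False):
    """Return (exported, outputs) for one Pyste run.

    declarations: hypothetical dict mapping every Pyste file passed on the
                  command line to the classes it exports.
    wrap_only:    Pyste file named by --wrap-only, or None.
    main_only:    True if --main-only was given.
    """
    # Every file contributes to the exported set, whatever is being written,
    # because bases<...> depends on classes declared in other files.
    exported = set()
    for classes in declarations.values():
        exported.update(classes)

    if main_only:
        outputs = ["_main.cpp"]
    elif wrap_only is not None:
        # Assumption: the --wrap-only file must also be in the file list.
        if wrap_only not in declarations:
            raise ValueError("%s must also be passed on the command line" % wrap_only)
        outputs = ["_%s.cpp" % stem(wrap_only)]
    else:
        # Plain --multiple: everything is regenerated, as today.
        outputs = ["_main.cpp"] + ["_%s.cpp" % stem(p) for p in declarations]
    return exported, outputs


decls = {"foo1.pyste": ["Base"], "foo2.pyste": ["Derived"]}
print(plan(decls, wrap_only="foo1.pyste")[1])  # prints ['_foo1.cpp']
```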

2. Another approach is to make the dependencies explicit in the pyste
file by using another function, Import. This would make it clear that,
to export a given pyste file, the Imported pyste file must be taken into
account. Going back to the Base/Derived example, we would have either:

all.pyste:

Class('Base', ...)
Class('Derived', ...)

or:

base.pyste:

Class('Base', ...)

derived.pyste:

Import('base.pyste')
Class('Derived', ...)

That way, the dependencies between the files are explicit, and the user
is no longer required to pass all the files on the command line:

pyste --multiple --module=foo derived.pyste

would generate _derived.cpp, and:

pyste --multiple --module=foo base.pyste

would generate _base.cpp. To generate _main.cpp, the user would have to 
call:

pyste --main-only --multiple --module=foo base.pyste derived.pyste

The advantages of this method are that the dependencies between the
files are explicit in the pyste files themselves, and that it feels more
natural than passing all pyste files on the command line in order to
generate code for only one of them. The disadvantages are that it
complicates the pyste files a little and changes the way Pyste
currently works.
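The Import mechanism of approach 2 could work roughly as below, sketched under the assumption that a Pyste file is executed as Python with `Class` and `Import` in scope. The stand-in `Class` and `Import` here only record names; they, `resolve` and the `read` callback are hypothetical. Classes declared in the file itself go into its own _<name>.cpp, while classes from Imported files (followed transitively, each file run once) only widen the set used to decide `bases<...>`:

```python
def resolve(path, read):
    """Return (own, exported) for the Pyste file `path`.

    read:     callable returning the text of a Pyste file given its name.
    own:      classes declared in `path` itself (wrapped in its _<name>.cpp).
    exported: classes declared in `path` or in anything it Imports,
              directly or indirectly.
    """
    exported = []
    loaded = set()

    def run(name):
        if name in loaded:  # each file is executed at most once
            return []
        loaded.add(name)
        declared = []

        def Class(cpp_name, *args):  # stand-in: just record the name
            declared.append(cpp_name)
            exported.append(cpp_name)

        def Import(other):  # stand-in: load the other file's declarations
            run(other)

        exec(read(name), {"Class": Class, "Import": Import})
        return declared

    own = run(path)
    return own, exported


sources = {
    "base.pyste": "Class('Base', 'base.h')\n",
    "derived.pyste": "Import('base.pyste')\nClass('Derived', 'derived.h')\n",
}
print(resolve("derived.pyste", sources.__getitem__))
# prints (['Derived'], ['Base', 'Derived'])
```

Because the dependency information lives in the files, `pyste --multiple --module=foo derived.pyste` has everything it needs without base.pyste being listed on the command line.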

Whew, that's all. Opinions, anyone?

Regards,
Nicodemus.