Dear all,

Following today's blog post [1] about wrapping C++ libraries, I would like to take the opportunity to get some opinions from the PyPy community on a related topic. I hope this mailing list is the appropriate place for this sort of discussion.

My name is Gertjan van Zwieten and I have been a Python enthusiast for a long time. I used it extensively in the context of my PhD project, which was related to Finite Element modelling, and I have recently taken steps, together with a friend of mine, towards continuing on this path commercially [2]. We intend to design our own Finite Element framework, which we aim to make open source, written largely in Python. It is the 'largely' that I would like to discuss here.

I have always argued to people that, yes, it is possible to do computationally intensive things in an interpreted language, as long as the majority of the work is done by compiled components. For me the obvious example of that is Numpy, which provides a rather specific data type and a set of vectorized operations, as well as an interface to optimized linear algebra libraries. This leaves Python as a convenient and very powerful glue language to connect these optimized components into something purposeful. With that conceptualization in mind, my preferred route for optimizing code was to identify critical components of a generic nature and re-implement them as plain C modules using the Python API.
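
To make concrete what I mean by that, here is a toy sketch (purely illustrative, with hypothetical names, not from our actual code): the vectorized Numpy call pushes the loop down into compiled code, whereas the pure-Python loop is exactly the kind of thing I would traditionally rewrite as a C extension.

    import numpy as np

    def weighted_sum_python(values, weights):
        # Pure-Python loop: every iteration runs in the interpreter.
        total = 0.0
        for v, w in zip(values, weights):
            total += v * w
        return total

    def weighted_sum_numpy(values, weights):
        # Vectorized form: the loop happens inside Numpy's compiled code.
        return np.dot(values, weights)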

So that is how I used to look at things: nice and clear, until PyPy started throwing stones in the water.

I have not yet used PyPy in any actual application, mainly (if not only) because of the lack of Numpy support. But I can see it holds great promise, especially for the computational community (I am also eagerly following the transactional memory developments), and I anticipate making the switch after it matures a little further. However, that does leave me in a situation where, currently, I am no longer sure what, if anything, to implement in C, and in what form.

Right now I am prototyping our framework using only Numpy and pure Python. I avoid premature optimization as much as possible, and in fact at times find myself sacrificing speed for elegance, arguing that I will be able to bring back efficiency later on in a deeper layer. For example, rather than adopting a numbering scheme for the many computational elements, as is common in Finite Element implementations, I have moved to a more object-oriented approach that is more flexible and allows for more error checking, but that forces me to put in dictionaries what used to be Numpy arrays. Obviously dictionaries were not meant to serve as numeric data structures, and this is exactly the kind of component that I would eventually try to implement efficiently in C.
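
As a rough sketch of what I mean (with hypothetical names, simplified far beyond the real thing), the dictionary keyed by element objects takes the place of what would classically be a contiguous Numpy array indexed by element number:

    import numpy as np

    n_elems = 1000

    # Classical numbering scheme: an element is just an index into one big array.
    data_by_number = np.zeros((n_elems, 4))        # say, four values per element
    data_by_number[17] = [1.0, 2.0, 3.0, 4.0]

    # Object-oriented scheme: elements are objects, and their data sits in a dict.
    class Element(object):
        def __init__(self, nodes):
            self.nodes = nodes                     # room for checks and behaviour

    elements = [Element(nodes=(i, i + 1)) for i in range(n_elems)]
    data_by_element = dict((elem, np.zeros(4)) for elem in elements)
    data_by_element[elements[17]] = np.array([1.0, 2.0, 3.0, 4.0])

The flexibility and error checking are nice, but it is obviously this lookup-heavy structure that I would want to make fast again later.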

With PyPy on the horizon, I am not so sure anymore. For one, I am not sure whether PyPy will ever be able to use such a C module, or use it efficiently; I could understand if support exists merely for the sake of compatibility. I am also not certain whether I should forget about C altogether and rely on the JIT to compile the Python for loops that I have always tried to avoid. Would you go so far as to say that there will be no more reason for low-level programming whatsoever? Or would you advise writing the component in RPython and using the translator to compile it? With my poor overview of these things, there are very few arguments that I can formulate in favour of or against any of these options.

Regarding today's blog post, I have the feeling that this approach is meant more for wrapping existing C++ libraries than for starting new ones; is that correct? Or, if it is in fact an option to consider for new code, will it also work in CPython? That would make the transition a bit easier, obviously.

I am very interested to hear your views on this topic of optimization.

Best regards and thanks a lot for working on PyPy!

Gertjan

[1] http://morepypy.blogspot.com/2011/08/wrapping-c-libraries-with-reflection.html
[2] http://hvzengineering.nl