[summerofcode] First release of Shed Skin, a Python-to-C++ compiler.

Mark Dufour mark.dufour at gmail.com
Sun Sep 11 00:36:41 CEST 2005


After nine months of hard work, I am proud to introduce my baby to the
world: an experimental Python-to-C++ compiler. It can convert many
Python programs into optimized C++ code, without any user intervention
such as adding type declarations. It uses rather advanced static type
inference techniques to deduce type information by itself. In
addition, it determines whether deduced types may be parameterized,
and if so, it generates corresponding C++ generics. Based on deduced
type information, it also attempts to convert heap allocation into
stack and static preallocation (falling back to libgc in case this
fails.)

The compiler was motivated by the belief that in many cases it should
be possible to automatically deduce C++ versions of Python programs,
enabling users to enjoy both the productivity of Python and the
efficiency of C++. It works best for Python programs written in a
relatively static C++-style, in essence enabling users to specify C++
programs at a higher level.

At the moment the compiler correctly handles 124 unit tests, six of
which are serious programs of between 100 and 200 lines:

  -an othello player
  -two satisfiability solvers
  -a japanese puzzle solver
  -a sudoku solver
  -a neural network simulator

Unfortunately I am just a single person, and much work remains to be
done. At the moment, there are several limitations to the type of
Python programs that the compiler accepts. Even so, there is enough of
Python left to be able to remain highly productive in many cases.
However, for most larger programs, there are probably some minor
problems that need to be fixed first, and some external dependencies
to be implemented/bridged in C++.

With this initial release, I hope to attract other people to help me
locate remaining problems, help implement external dependencies, and
in the end hopefully even to contribute to the compiler itself. I
would be very happy to receive small programs that the compiler does
or should be able to handle. If you are a C++ template wizard, and you
would be interested in working on the C++ implementation of builtin
types, I would also love to get in contact with you. Actually, I'd
like to talk to anyone even slightly interested in the compiler, as
this would be highly motivating to me.

The source code is available at the following site. Please check the
README for simple installation/usage instructions. Let me know if you
would like to create ebuild/debian packages.

Sourceforge site: http://shedskin.sourceforge.net
Shed Skin blog: http://shed-skin.blogspot.com

Should you reply to this mail, please also reply to me directly. Thanks!


Credits

Parts of the compiler have been sponsored by Google, via its Summer of
Code program. I am very grateful to them for keeping me motivated
during a difficult period. I am also grateful to the Python Software
Foundation for chosing my project for the Summer of Code. Finally, I
would like to thank my university advisor Koen Langendoen for guiding
this project.


Details

The following describes in a bit more detail various aspects of the
compiler. Before seriously using the compiler, please make sure to
understand especially its limitations.

Main Features

  -very precise, efficient static type inference (iterative object
contour splitting, where each iteration performs the cartesian product
algorithm)
  -stack and static pre-allocation (libgc is used as a fall-back)
  -support for list comprehensions, tuple assignments, anonymous funcs
  -generation of arbitrarily complex class and function templates
(even member templates, or generic, nested list comprehensions)
  -binary tuples are internally analyzed 
  -some understanding of inheritance (e.g. list(dict/list) becomes
list<pyiter<A>>)
  -hierarchical project support: generation of corresponding C++
hierarchy, including (nested) Makefiles; C++ namespaces
  -annotation of source code with deduced types
  -builtin classes, functions (enumerate, sum, min, max, range, zip..)
  -polymorphic inline caches or virtual vars/calls (not well tested)
  -always unbox scalars (compiler bails out with error if scalars are
mixed with pointer types)
  -full source code available under the MIT license

Main Limitations/TODO's

  -Windows support (I don't have Windows, sorry)
  -reflection (getattr, hasattr), dynamic inheritance, eval, ..  
  -mixing scalars with pointer types (e.g. int and None in a single variable) 
  -mixing unrelated types in single container instance variable other
than tuple-2
  -holding different types of objects in tuples with length >2;
builtin 'zip' can only take 2 arguments.
  -exceptions, generators, nested functions, operator overloading
  -recursive types (e.g. a = []; a.append(a))
  -expect some problems when mixing floats and ints together
  -varargs (*x) are not very well supported; keyword args are not supported yet
  -arbitrary-size arithmetic
  -possible non-termination ('recursive customization', have not
encountered it yet)
  -profiling will be required for scaling to very large programs
  -combining binary-type tuples with single-type tuples (e.g. (1,1.0)+(2,))
  -unboxing of small tuples (should form a nice speedup)
  -foreign code has to be modeled and implemented/bridged in C++ 
  -some builtins are not implemented yet, e.g. 'reduce' and 'map'


More information about the summerofcode mailing list