SPy: a new companion static language for Python; thoughts?
Hello,

Do any of you follow SPy (https://github.com/spylang/spy), the new statically compiled variant of Python created by Antonio Cuni? I have been looking into it recently and became convinced it could have a significant impact on the Python ecosystem, including for numerical computing and libraries like NumPy.

I wrote up my thoughts in a post entitled "SPy: a vision for its impact on the Python community": https://fluiddyn.pages.heptapod.net/spy-book/impact-on-python/

I would be very interested to hear what this community thinks, both about SPy itself and about the ideas in the post.

Best,
Pierre Augier (CNRS researcher in fluid mechanics, Grenoble)
Certainly a means of fixing the "two language problem", if it works, would be a great thing. However, the SPy documentation I saw doesn't address what I see as some of the main issues that lead to this problem.

The fact that Python doesn't run like C (or Rust, or Fortran) is only partly to do with dynamic typing. It also has to do with the ability for the programmer to reason well about aspects of the algorithm like memory layouts (including things like row-major vs. column-major ordering and how many levels of pointer indirection exist in an object), cache "friendliness", the cost of operations, etc. Most Python code (at least, non-"toy" Python code, as opposed to people using Python sort of like BASIC was used in the early days of computing) is built heavily on high-level, vectorized constructs that seem to actively fight reasoning about these sorts of algorithmic details, which end up accounting for the majority of performance optimizations in the "fast languages".

I do recall that NumPy has functionality to do things like specifying or querying the memory layout of an array, but I'd be confident in saying that few Python programmers, in fact even few NumPy programmers, ever learn about these, and even fewer use them. For a language like SPy that aims to close the gap between Python and compiled languages, these features need to be prominent, "first class" aspects of the language that are routinely covered in tutorials.

It's not JUST numerical algorithms that this applies to. For instance, in Python code it's very common to pass what are essentially structs as dictionaries, where the field names are keys. To make them C-like in performance (and memory usage), these should decay to actual structs, which of course requires that queries/"indices" to these dicts are known at compile time in all instances, i.e. are not the values of variables. As I understand, Python already tries to do this somewhat with "qstr", but whether or not it does this is not something a typical Python programmer is given the tools to anticipate, let alone control.

Also, in what is kind of the flip side to this, since many already existing Python programmers DO think in terms of high-level, vectorized constructs, having sensible default inferences for the VM in terms of how to map them to decently fast machine code is important for programmers who do NOT want to need to reason about or specify the algorithm details. This includes things like how to minimize the overhead of bounds checks on containers when using iterators or assigning to slices. Some of this probably requires passing high-level aspects of the code through the syntax tree to the machine code compiler, in order to tell it when data access patterns are such that it should emit vector instructions or elide bounds checks.

None of this has to do directly with static vs. dynamic typing. It's true that dynamic typing is one of the main aspects of Python that forces its interpreter to be complex and slow, which in turn is why higher-level, vectorized code is often a must in order to offload as much of the heavy lifting as possible to C code paths within the interpreter itself or C libraries like NumPy. But JUST static typing doesn't magically solve all these issues by itself, and I don't see much in the roadmap about how SPy attempts to address them.
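To make the row-major vs. column-major point concrete, here is a minimal pure-Python sketch. It is not SPy code and does not use NumPy; the helper names `strides`, `flat_index` and `traversal_offsets` are hypothetical and exist only for this illustration. It shows that the same row-by-row nested loop touches memory contiguously under one layout and with jumps under the other, which is the kind of detail NumPy exposes through `ndarray.strides` and `ndarray.flags`.

```python
def strides(shape, order="C"):
    """Element strides of a 2-D array stored in one flat buffer.

    "C" is row-major (last index varies fastest),
    "F" is column-major (first index varies fastest).
    """
    rows, cols = shape
    if order == "C":
        return (cols, 1)
    if order == "F":
        return (1, rows)
    raise ValueError("order must be 'C' or 'F'")


def flat_index(i, j, st):
    """Offset of element (i, j) in the flat buffer, given strides st."""
    return i * st[0] + j * st[1]


def traversal_offsets(shape, st):
    """Buffer offsets visited by a row-by-row nested loop."""
    rows, cols = shape
    return [flat_index(i, j, st) for i in range(rows) for j in range(cols)]


shape = (2, 3)
c_offsets = traversal_offsets(shape, strides(shape, "C"))
f_offsets = traversal_offsets(shape, strides(shape, "F"))

print(c_offsets)  # contiguous walk through the buffer
print(f_offsets)  # strided walk: consecutive accesses jump by `rows` elements
```

With the row-major layout the loop walks the buffer in order; with the column-major layout the same loop jumps back and forth, which on real hardware usually costs cache misses.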
Hi!

rosko37@gmail.com wrote:
> Certainly a means of fixing the "two language problem", if it works, would be a great thing. However, the SPy documentation I saw doesn't address what I see as some of the main issues that lead to this problem.
>
> The fact that Python doesn't run like C (or Rust, or Fortran) is only partly to do with dynamic typing. It also has to do with the ability for the programmer to reason well about aspects of the algorithm like memory layouts (including things like row-major vs. column-major ordering and how many levels of pointer indirection exist in an object), cache "friendliness", the cost of operations, etc.
All of this is totally in scope for SPy. The idea is to provide low-level mechanisms to write this kind of code (which are then translated to the equivalent C), on top of which to build higher-level zero-cost abstractions that hide these details from the final user. A good example of this pattern in action is the implementation of `list`, which is written in SPy itself; here you can see that the underlying data is basically a C array, but it then uses SPy-specific features, e.g. to statically dispatch `__getitem__` to the "index version" or to the "slice version": https://github.com/spylang/spy/blob/1c6ad6bdb59541c44b43a55830b2ff6b8c03427c... These low-level features are accessed through the `unsafe` module. Currently you can use unsafe features "freely", meaning that the end user is potentially able to shoot themselves in the foot, but the long-term plan is that "unsafe" features can be accessed only in specific areas of code clearly labeled as such. This is similar to what already happens in Python, where the "unsafe" features are "clearly labeled" as C code :).
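For readers who want a picture of the "index version" vs. "slice version" split, here is a rough sketch in plain Python, not SPy. The class `IntList` and its method names are hypothetical and are not taken from the SPy standard library. In ordinary Python the choice between the two versions has to be made at runtime with an `isinstance` check on every call; the point made above is that SPy can make the same choice statically, so the check disappears from the compiled code.

```python
from array import array


class IntList:
    """Toy list of C ints backed by one contiguous buffer."""

    def __init__(self, values):
        # array("i", ...) stores the values as packed C ints, not Python objects
        self._data = array("i", values)

    def _getitem_index(self, i):
        """The "index version": returns a single element."""
        return self._data[i]

    def _getitem_slice(self, s):
        """The "slice version": returns a new IntList."""
        return IntList(self._data[s])

    def __getitem__(self, key):
        # Runtime dispatch: plain Python must inspect the key on every call.
        if isinstance(key, slice):
            return self._getitem_slice(key)
        return self._getitem_index(key)

    def __len__(self):
        return len(self._data)


xs = IntList([10, 20, 30, 40])
first = xs[0]     # goes through _getitem_index
middle = xs[1:3]  # goes through _getitem_slice
print(first, list(middle._data))
```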
> It's not JUST numerical algorithms that this applies to. For instance, in Python code it's very common to pass what are essentially structs as dictionaries, where the field names are keys. To make them C-like in performance (and memory usage), these should decay to actual structs, which of course requires that queries/"indices" to these dicts are known at compile time in all instances, i.e. are not the values of variables.
Again, this is something which is fully supported by SPy. See e.g. the implementation of `tuple` in `stdlib/_tuple.spy`: https://github.com/spylang/spy/blob/1c6ad6bdb59541c44b43a55830b2ff6b8c03427c... Tuples are translated into the equivalent C struct. E.g. `tuple[i32, str]` is translated into `struct _tup { int32_t _item0; spy_str *_item1; }`. Then there is "SPy blue magic" to translate `__getitem__(0)` into a direct `._item0` field access. It will totally be possible to do the same for dicts whose set of keys is known at compile time.
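As an illustration only, the struct layout described above can be mimicked in plain Python with the standard `ctypes` module. This is not how SPy implements it: the class name `Tup`, the lookup table `FIELD_FOR_INDEX` and the helper `getitem` are hypothetical, and `c_char_p` is just a stand-in for `spy_str *`. The field names `_item0` and `_item1` follow the C struct quoted above.

```python
import ctypes


class Tup(ctypes.Structure):
    """Rough analogue of `struct _tup { int32_t _item0; spy_str *_item1; }`."""

    _fields_ = [
        ("_item0", ctypes.c_int32),
        ("_item1", ctypes.c_char_p),  # assumption: stands in for `spy_str *`
    ]


# A constant index maps to a fixed field name. In this sketch the lookup
# happens at runtime; the message above describes SPy resolving it at
# compile time, leaving only a direct field access.
FIELD_FOR_INDEX = ("_item0", "_item1")


def getitem(tup, index):
    """Emulate `tup[index]` as a field access on the struct."""
    return getattr(tup, FIELD_FOR_INDEX[index])


t = Tup(42, b"hello")
print(getitem(t, 0), getitem(t, 1))
```

The same idea would apply to a dict with a fixed set of keys: each key would name one struct field instead of each integer index.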
participants (3): Antonio Cuni, PIERRE AUGIER, rosko37@gmail.com