Mailman 3 Multiple dispatch transpilation - Python-ideas

16 Mar 2021

      Hi fellow Pythonistas!

A colleague of mine recently got me involved in the Julia language and 
while I was really impressed by the performance, the complete lack of 
OOP and thus bad discoverability (not to mention all their unicode 
function names) for me leaves a lot to be desired.

TL;DR:

Julia uses a type-based multi-method system (multiple dispatch), that 
because of type-stability allows JIT compilation of very efficient code. 
Since Pythons single-dispatch classes are a subset of that system, it 
should in principle be possible to transform the code from 
|Class.method|to |method(class:type)|, *before compilation*essentially 
allowing the same optimizations. For the other arguments could utilize 
the existing type annotations. The idea would be to allow this via 
decorators etc for code where it makes sense (e.g. high volume number 
crunching), while keeping the rest of the language free from JIT 
compiler issues and other nuisances.

Very long version:

Julia is a high level dynamically typed language, that often approaches 
or matches C performance. This is due to type-based multiple dispatch 
and type-stability through specialization. However, the coolest part is 
that the type system is set-theoretic, which makes it unnecessary to 
repeat functions for related types. This page 
<https://ucidatascienceinitiative.github.io/IntroToJulia/Html/WhyJulia>and 
this blog post 
<http://www.stochasticlifestyle.com/like-julia-scales-productive-insights-julia-developer/>give 
a good explanations of how this works. *However*, since with their 
multiple dispatch system everything can be written without using 
classes, they have opted to not include any OOP. This leads to a very 
high amount of functions that are accessible in the global namespace and 
hinders one to explore the packages in the usual, playful way that 
probably all of us are used to. So the usual resort is to have to read 
documentation. What a shame ;)

Since all of Julia needs to be JIT compiled, importing packages usually 
takes a lot of time (it does some pre-compilation on first import, which 
takes a lot longer, but depending on the system its still quite a 
while). A quick search for "time-to-first-plot" or "latency" will be 
plenty to show that this is quite a nuisance. I believe that its 
perfectly fine for some code to be interpreted, as the python community 
has frequently shown, so that might not be the best approach. Only 
performance-critical code needs to be fast. In our world this is usually 
done by offloading the quick stuff into C, but in that case it gets 
really hard to expand those libraries (e.g. numpy has its own dispatch 
system, so pypy cannot help it as much as it could). Also extending 
numerical types for arrays is basically impossible.

So how can we try to retain usability and still gain the performance for 
the code that needs it? In general I'd argue that multiple dispatch and 
JIT compilation is fine, as our performance critical code usually runs 
for very long times, so the added overhead is minimal. If the code 
doesn't take a large amount of time, you probably don't need to optimize 
it anyway. I'd argue that one way to get there would be to "transpile" 
our Python code by inserting a step before compilation that utilizes the 
type annotations that already exist and then pulls typed versions out of 
the classes. So e.g. |Class.method(x:int)|would be transformed into the 
typed version |method(class:type, x:int)|, from where on multiple 
dispatch could be run.

This would have the further advantage, that in Julia it can be very hard 
to see, which attributes a type should have and which methods 
necessarily have to be defined in order to guarantee library 
compatibility. In OOP, we can easily achieve that by defining the type 
as an abstract base class.

A lot of the statements about Julia here come from Jeff Bezansons (one 
of the Julia main devs PhD thesis) 
<https://dspace.mit.edu/handle/1721.1/99811>. It's a quick read and I 
would in general recommend reading it.

On the one hand, in general I think that julia is more compiler-friendly 
than it is user or tooling-friendly (since methods for your type can be 
defined outside of the scope of the class, its very hard for your IDE to 
tell you if that function you just misspelled doesn't exist). On the 
other hand, its quite nice to be able to create a custom array and just 
push it through all the numerical libraries in a very efficient way, 
which works because you have defined all the required methods. I think 
this way of speeding up things would be cooler than enabling static 
typing which many people are asking for, as we still retain the 
flexibility (or might even increase it in the case of arrays), while 
greatly speeding up code.

Since I only understand all these things at a high level, I'm sure that 
I've greatly simplified or misrepresented something. The real hope here 
is that someone smarter than me is inspired by that idea and manages to 
build something cool. I'd by happy to help in any way I can though.

Thanks for reading this far, you are a good person :*

Cheers,

Marvin

Multiple dispatch transpilation

Marvin van Aalst

Todd

Greg Ewing

David Mertz

tags

participants (4)