Multiple dispatch transpilation
Hi fellow Pythonistas!

A colleague of mine recently got me involved in the Julia language, and while I was really impressed by the performance, the complete lack of OOP and thus poor discoverability (not to mention all the Unicode function names) leaves a lot to be desired for me.

TL;DR: Julia uses a type-based multi-method system (multiple dispatch) that, thanks to type stability, allows JIT compilation of very efficient code. Since Python's single-dispatch classes are a subset of that system, it should in principle be possible to transform code from Class.method to method(class: type) *before compilation*, essentially allowing the same optimizations. For the other arguments, the existing type annotations could be used. The idea would be to allow this via decorators etc. for code where it makes sense (e.g. high-volume number crunching), while keeping the rest of the language free from JIT-compiler issues and other nuisances.

Very long version: Julia is a high-level, dynamically typed language that often approaches or matches C performance. This is due to type-based multiple dispatch and type stability through specialization. However, the coolest part is that the type system is set-theoretic, which makes it unnecessary to repeat functions for related types. This page <https://ucidatascienceinitiative.github.io/IntroToJulia/Html/WhyJulia> and this blog post <http://www.stochasticlifestyle.com/like-julia-scales-productive-insights-julia-developer/> give a good explanation of how this works. *However*, since with their multiple dispatch system everything can be written without classes, they have opted not to include any OOP. This leads to a very large number of functions being accessible in the global namespace and hinders exploring packages in the usual, playful way that probably all of us are used to. So the usual resort is to read documentation. What a shame ;)

Since all of Julia needs to be JIT compiled, importing packages usually takes a lot of time (it does some precompilation on first import, which takes even longer, but depending on the system it's still quite a while afterwards). A quick search for "time-to-first-plot" or "latency" will be plenty to show that this is quite a nuisance. I believe it's perfectly fine for some code to be interpreted, as the Python community has frequently shown, so JIT-compiling everything might not be the best approach. Only performance-critical code needs to be fast. In our world this is usually done by offloading the hot parts into C, but then it gets really hard to extend those libraries (e.g. NumPy has its own dispatch system, so PyPy cannot help it as much as it could). Also, extending numerical types for arrays is basically impossible.

So how can we retain usability and still gain the performance for the code that needs it? In general I'd argue that multiple dispatch and JIT compilation are fine, as our performance-critical code usually runs for a very long time, so the added overhead is minimal. If the code doesn't take a large amount of time, you probably don't need to optimize it anyway. One way to get there would be to "transpile" our Python code by inserting a step before compilation that uses the type annotations that already exist and pulls typed versions out of the classes. So e.g. Class.method(x: int) would be transformed into the typed version method(class: type, x: int), from where on multiple dispatch could be run.
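To make that a bit more concrete, here is a rough sketch of the kind of rewrite I have in mind. I'm using `functools.singledispatch` as a stand-in, since it already dispatches on the type of the first argument (so it's only an approximation of full multiple dispatch), and the Point class is just made up for illustration:

    from functools import singledispatch

    # What we write today:
    class Point:
        def __init__(self, x: float, y: float):
            self.x = x
            self.y = y

        def scale(self, factor: float) -> "Point":
            return Point(self.x * factor, self.y * factor)

    # What the hypothetical pre-compilation step would produce:
    # a free function that dispatches on the type of its first argument.
    @singledispatch
    def scale(obj, factor):
        raise NotImplementedError(f"scale not defined for {type(obj).__name__}")

    @scale.register
    def _(obj: Point, factor: float) -> "Point":
        return Point(obj.x * factor, obj.y * factor)

    # Both spellings now mean the same thing:
    p = Point(1.0, 2.0)
    assert p.scale(3.0).x == scale(p, 3.0).x

The real version would of course have to look at all argument types, not just the first, and do the rewrite automatically rather than by hand.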
A further advantage: in Julia it can be very hard to see which attributes a type should have and which methods necessarily have to be defined in order to guarantee library compatibility. In OOP we can easily make that explicit by defining the type as an abstract base class (see the small sketch after my signature).

A lot of the statements about Julia here come from the PhD thesis of Jeff Bezanson (one of the main Julia devs) <https://dspace.mit.edu/handle/1721.1/99811>. It's a quick read and I would generally recommend it.

On the one hand, I think Julia is more compiler-friendly than it is user- or tooling-friendly (since methods for your type can be defined outside the scope of the class, it's very hard for your IDE to tell you that the function you just misspelled doesn't exist). On the other hand, it's quite nice to be able to create a custom array and just push it through all the numerical libraries in a very efficient way, which works because you have defined all the required methods. I think this way of speeding things up would be cooler than enabling static typing, which many people are asking for, since we would retain the flexibility (or might even increase it, in the case of arrays) while greatly speeding up code.

Since I only understand all of this at a high level, I'm sure I've greatly simplified or misrepresented something. The real hope here is that someone smarter than me is inspired by the idea and manages to build something cool. I'd be happy to help in any way I can, though.

Thanks for reading this far, you are a good person :*

Cheers,
Marvin
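P.S. For the abstract base class point above, a minimal sketch of what I mean by making the required interface explicit (the names are invented purely for illustration):

    from abc import ABC, abstractmethod

    class AbstractArray(ABC):
        """Anything that wants to flow through our numerical code
        must provide at least these methods."""

        @abstractmethod
        def __getitem__(self, index): ...

        @abstractmethod
        def __len__(self) -> int: ...

    class BrokenArray(AbstractArray):
        def __getitem__(self, index):
            return 0
        # __len__ is missing

    try:
        BrokenArray()
    except TypeError as err:
        print(err)  # can't instantiate abstract class BrokenArray ...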
Numba already provides as-needed JIT compilation and type annotations. I am not sure it supports multiple dispatch, but if that were useful it could be added with decorators. I am not sure integrating this into the language itself would help much.
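For instance, a minimal sketch of the opt-in style Numba already offers (assuming numba and numpy are installed; the toy function is just a stand-in):

    import numpy as np
    from numba import njit

    @njit  # compiled on first call, specialised for the argument types it sees
    def total(a):
        s = 0.0
        for i in range(a.shape[0]):
            s += a[i]
        return s

    x = np.arange(1_000_000, dtype=np.float64)
    print(total(x))  # first call compiles; later calls run the native code

    # An explicit signature can also be given, e.g. njit("float64(float64[:])"),
    # if you want compilation to happen eagerly for a known type.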
On 17/03/21 2:54 am, Marvin van Aalst wrote:
it should in principle be possible to transform code from Class.method to method(class: type) *before compilation*, essentially allowing the same optimizations.
I don't think this transformation would help with anything. If the static analysis can work out what class of object a name refers to, and what methods and attributes it has, it can generate code that is just as efficient. -- Greg
I quite like multiple dispatch. It's odd to claim it's not OOP, since a relatively old object-oriented system, the Common Lisp Object System (CLOS), uses it as its fundamental construct.

A long time ago I wrote about and created an example library for multimethods. Lots of other folks have done so as well. Mine was old enough to precede decorators, which I think are the best way to express it in modern Python. Guido demonstrated a simple system when decorators were pretty new.

While I would like an implementation added to `functools`, I think it's exceptionally unlikely it would have anything to do with speeding up Python. It's a nice way to express certain programs, but not any slower or faster than other styles.
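For illustration, a toy decorator-based version of the idea (greatly simplified: it dispatches on exact positional argument types only and ignores inheritance, keyword arguments, ambiguity resolution, and caching):

    _registry = {}

    def multimethod(*types):
        """Register one implementation of a function per tuple of argument types."""
        def register(func):
            impls = _registry.setdefault(func.__name__, {})
            impls[types] = func

            def dispatcher(*args):
                impl = impls.get(tuple(type(a) for a in args))
                if impl is None:
                    raise TypeError(
                        f"no {func.__name__} implementation for "
                        f"{tuple(type(a).__name__ for a in args)}")
                return impl(*args)
            return dispatcher
        return register

    @multimethod(int, int)
    def combine(a, b):
        return a + b

    @multimethod(str, str)
    def combine(a, b):
        return a + " " + b

    print(combine(1, 2))            # 3
    print(combine("spam", "eggs"))  # spam eggs

`functools.singledispatch` already covers the single-argument case in the standard library; a multi-argument version would look roughly like this, plus a lot of care about subclassing and ambiguity.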
participants (4)
- David Mertz
- Greg Ewing
- Marvin van Aalst
- Todd