Tensor type checking - updates
(Moving this thread here from the other mailing list.) Thanks for the update, Alfonso. Cool to know there's active work on this in Pyre - and that proper support for variadics is close to becoming a reality, if Mark is planning to submit a PEP. In that sense, I believe that there are many reasons to think that is not a
good idea to create yet another type checker for Python. In the particular case of Deepmind, my humble suggestion would be to contribute to Mypy or Pyre, at least for the part related with the type system.
Right. Initially I was thinking it might be better to have a separate tool for shape checking so that it might be easier to plug into existing infrastructure for folks who have already committed to a particular type checker (e.g. we use pytype, so if we implemented support in Mypy we'd have to run both, and that seemed like it might have some complications) - but on reflection I agree it would be better to contribute to an existing checker a) to make use of the existing framework that checker would provide and b) yeah, to avoid proliferation of tools. (Also, to be clear: we don't have any official work on this at DeepMind; just a few of us interested and playing around in our free time.) for example arithmetic on types, which actually is something that we are
currently working on, in case that anyone is interested.
So it sounds like Pyre already has a pretty developed roadmap for this - super cool :) In that case, maybe the question is: what are you working on already, and what's left to be done - that is, what external contributions would be most useful? On Mon, 15 Jun 2020 at 11:39, <proyect.hd@gmail.com> wrote:
Hi everyone,
Thanks once again for bringing attention to this important topic.
I think that we all agree that the main step in this direction is the introduction of variadics, what has been mentioned several times 1 <https://docs.google.com/document/d/1vpMse4c6DrWH5rq2tQSx3qwP_m_0lyn-Ij4WHqQq...> , 2 <https://github.com/python/typing/issues/513>, 3 <https://github.com/pytorch/pytorch/issues/26889>.
Variadic support is more mature than it seems. In the case of Pyre, from already one year ago we have support for variadics. The first official proposal that I recall was at last year Python Typing Summit (here <https://github.com/facebook/pyre-check/blob/master/docs/Variadic_Type_Variab...>). The syntax is aligned with the proposal that Guido has shared. However, iirc the initial syntax relied to much on Concatenate/Expand making it verbose and ambiguous when there are 2 variadics. For that purpose, the current syntax relies on capture groups "[]" for manually specifying the part of the type that correspond to the variadic, and only requires Concatenate for concatenating types and variadics. More about the final syntax can be seen here (here <https://docs.google.com/document/d/13TwsqnLgEgLH-zQMYTs4xgMTg9NHCl9qSPcTdtGi...>). Although I don't want this to become a PEP, the way the syntax works with the proposed example is:
tf.Tensor[Batch, Time, [64, 64, 3]]
Special cases could be considered to make it more ergonomic when there is only one variadic at the end, in general having an unambiguous syntax is a must.
Regarding maturity, afaik Mark Mendoza had informal conversations with the community regarding the proposed syntax and got positive feedback, and he currently plans to submit a PEP, once the Parameter Specification PEP gets merged (here <https://www.python.org/dev/peps/pep-0612/>).
Hence, if we assume that we are not so far from agreeing on a final syntax, then the next question is about having actual support for it. As I mentioned, Pyre already supports it so it could be used for testing ideas and giving developers the opportunity of writing code stubs before other type checkers get support. In that sense, I believe that there are many reasons to think that is not a good idea to create yet another type checker for Python. In the particular case of Deepmind, my humble suggestion would be to contribute to Mypy or Pyre, at least for the part related with the type system. Regarding Teddy's work, afaik Teddy was trying to contribute to Mypy itself so perhaps his checker is more a proof of concept. I hope that he will tell us more about it.
About static shape checkers that rely on abstract interpretation, for sure there is room for them, but it will be better if they rely on the more advanced type specifications that variadics will introduce. After all, variadics at this point will only be useful for some use cases, since for many tensor operations other functionalities are need, for example arithmetic on types, which actually is something that we are currently working on, in case that anyone is interested.
Finally, if there are many teams working in this direction I would suggest to organise some online meetings, inspired in what they do at MLIR (here <https://docs.google.com/document/d/1y_9f1AbfgcoVdJh4_aM6-BaSHvrHl8zuA5G4jv_9...> )
Best, Alfonso.
PS: I would propose to keep future discussion in Python Typing mailing list. Many people that would be interested in this sort of discussions follow that list.
On Mon, 15 Jun 2020 at 11:36, Matthew Rahtz <mrahtz@google.com> wrote:
Thanks for the links, Guido! That first one in particular has some interesting ideas I hadn't seen before. I'll incorporate them as options into the doc we're preparing.
Adam -
IMO the biggest bottleneck to porting my tool, or implementing any other
one, is a good abstract interpreter for Python that can handle programs that are incomplete, have syntax errors, etc
Interesting. I'm guessing you're saying this with a view to having good support in e.g. IDEs? Sergei, am I right in thinking you have some experience with this? I guess PyCharm must also have some internal solution to this; I'm not sure how hard it would be to use it as a standalone tool...
I also wanted to add that in my experience implementing an interpreter in
Python will likely make it more difficult to write something good in the long term, because we'll miss out on all the nice things like ADTs with pattern matching, exhaustiveness checks, etc.
Also interesting. Yes, let's keep this in mind.
On Fri, 12 Jun 2020 at 17:24, Guido van Rossum <guido@python.org> wrote:
Here are some links to docs with well thought-out ideas for additions to the standard Python (static) type system as described by PEP 484 to support numpy(-ish) array types and shapes:
Here's a doc I've held on to (mostly written by Ivan Levkivskiy) with a list of proposals that made the rounds at PyCon 2019 (and even before).
https://paper.dropbox.com/doc/Type-system-improvements--A1zE~KliBDY3oh4Bf2bU...
Also:
https://paper.dropbox.com/doc/Static-typing-of-Python-numeric-stack-summary-...
And check out the typing summit schedule:
https://paper.dropbox.com/doc/Typing-Summit-Schedule--A1ylce3sozRREf~6CEi0Ck...
On Fri, Jun 12, 2020 at 9:19 AM 'Adam Paszke' via Python shape checkers < python-shape-checkers@googlegroups.com> wrote:
Hi everyone,
Thanks a lot for starting the list Matthew! It'll be great for all of us to get together.
Since you've asked about the stage of my project, the Swift prototype is working pretty nicely, but I didn't get to porting it to Python (except for a *very early* prototype in Haskell). However, if there is sufficient interest in such an effort, then I'm happy to reprioritize and I'm pretty sure that I could spend even up to 50% of my time on that.
IMO the biggest bottleneck to porting my tool, or implementing any other one, is a good abstract interpreter for Python that can handle programs that are incomplete, have syntax errors, etc. That's an effort we could definitely share, and in the future we can even prototype multiple different checkers on top of this common infrastructure. In Swift this was easy, because I could write a very simple interpreter for SIL (Swift Intermediate Representation), which was already quite low-level. Mirroring Python's semantics faithfully will be much more difficult as even "trivial" operations like attribute lookups can get extremely complicated.
I also wanted to add that in my experience implementing an interpreter in Python will likely make it more difficult to write something good in the long term, because we'll miss out on all the nice things like ADTs with pattern matching, exhaustiveness checks, etc. Haskell/OCaml make those things a breeze. I do understand that they might make onboarding a little slower, but I wouldn't want to rule them as an option out just yet, as it makes a huge difference.
Also, it's super cool to see that other people are getting to implementing more static checkers too. I'll have to read Teddy's thesis soon to better understand how it relates to what I did.
Best, Adam
On Fri, Jun 12, 2020 at 1:47 PM 'mrahtz' via Python shape checkers < python-shape-checkers@googlegroups.com> wrote:
Hey everyone!
First off - welcome to the group! There's been scattered interest in shape checking for some time, so in coming all together in one place here rather than in scattered email threads and GitHub issues and Slack channels I'm hoping we can push this through to something suitable for widespread use.
To summarise the current state of shape annotation and checking, there are three categories of things to care about:
- Defining the syntax for how code should be annotated with shapes - Runtime shape checkers - Static shape checkers
*Syntax*
Stephan made a great start with Ideas for array shape typing in Python <https://docs.google.com/document/d/1vpMse4c6DrWH5rq2tQSx3qwP_m_0lyn-Ij4WHqQq...>. A group of us at DeepMind have been working on a followup which goes into more detail which we should be able to share soon once it's been cleaned up.
There's also the syntax that tsalib <https://github.com/ofnote/tsalib> uses, though only allows annotation of shapes (without specification of e.g. whether something is a tf.Tensor or a np.ndarray, and without any info on data type).
*Runtime shape checkers*
Some existing options here include ShapeGuard <https://github.com/Qwlouse/shapeguard> (which is what most folks at DeepMind are using at the moment) and tsanley <https://github.com/ofnote/tsanley>. We also have an internal prototype at DeepMind that can check annotations like x: tf.Tensor[Batch, Time, 64, 64, 3] which we're still working on.
*Static shape checkers*
This is where things get interesting. A static shape checker is probably what it's going to take for us to be able to say the problem of shape checking is 'solved'.
One bottleneck here is support for variadic generics in existing static type checkers. Pyre apparently has prototype support for this in its ListVariadic <https://github.com/facebook/pyre-check/blob/master/examples/pytorch/stubs/_t...> type (see also this PyTorch example <https://github.com/facebook/pyre-check/blob/e16cfea51df9466bc84841065fb31497...>) but as far as I know neither pytype or mypy support this yet.
In the meantime, Teddy Liu has developed a toy static checker <https://github.com/theodoretliu/thesis> for his bachelor's thesis. It's written in OCaml, but reckons it should be portable to Python if necessary.
Outside of the Python world, Adam Paszke has made a static checker for Swift for TensorFlow <https://github.com/google-research/swift-tfp>. He's also interested in developing something similar for Python.
*Next steps*
I think the general direction of next steps should be to continue developing a static checker while simultaneously trying to work out the details of what a full syntax would look like (based on what we find to work well in practice with the static checker).
In particular, I'm wondering whether it would be worth porting Teddy's checker to Python (which I'm assuming more people will be comfortable with than OCaml) or whether we should join Adam in developing something from scratch. Adam, how are things going on your end?
I'll be able to work on this full-time for two weeks from the 22nd as part of a DeepMind hackathon, during which my plan is to finish the draft syntax proposal doc and polish an internal prototype we have for a runtime checker based on that syntax - but I'm also open to other ideas, if there are other higher-leverage things to do.
Cheers! Matthew
-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?) <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>*
At Pyre we think that it could be a good idea to organise a VC where we can share some first hand information about our roadmap. In particular, we’d like to present the work that has been done on variadics and on integer generics, what use cases do we think that we will be able to cover, and ultimately try to find points where we could collaborate. Although I understand that somehow you are limited to other checker like Mypy or Pytype, the ultimate of goal our work is to propose changes that can be merged into other Python checkers later on. Moreover, if someone from Mypy can attend it would be a great opportunity to discuss further changes in the type system (beyond variadics) and write down our proposals, so that it can be helpful for developers behind tensor libraries to understand where are we heading. Regarding custom shape checking tools, probably once the type system work is over the discussion on custom abstract interpreters will follow naturally. So please let me know if you would be interested in participating in a VC, and if so what time slots would work for you. Keep in mind that we should try to integrate both CET and US West Cost time zones. Finally, if you have any preference regarding the VC platform also let me know. Best, Alfonso.
Oop, sorry for the slow reply - for some reason my mailing list settings had defaulted to no emails. A VC sounds like a great idea. Any of the following slots would work for me: - Monday 22nd, any time between 8am and 1pm San Fransico time (4pm to 9pm London time) - Tuesday 23rd, ditto - Wednesday 24th, ditto - Friday 26th, ditto - Monday 29th, ditto - Tuesday 30th, ditto - Wednesday 1st, ditto - Friday 3rd, ditto I was in contact with Jukka Lehtosalo last week, so I'll ask him if he'd be interested in coming along too. My preference would be for Google Meet, but I'm easy. Cheers, Matthew
I think Tuesday the 30th at 9:30 AM SF time works for everyone on our end, so let's pencil in that time? (Pending response from Guido/Jukka/Ivan)
Tuesday the 30th at 9:30 AM SF time works for us too, sounds good. On Tue, 23 Jun 2020 at 00:05, Mark Mendoza <mendoza.mark.a@gmail.com> wrote:
I think Tuesday the 30th at 9:30 AM SF time works for everyone on our end, so let's pencil in that time? (Pending response from Guido/Jukka/Ivan) _______________________________________________ Typing-sig mailing list -- typing-sig@python.org To unsubscribe send an email to typing-sig-leave@python.org https://mail.python.org/mailman3/lists/typing-sig.python.org/ Member address: mrahtz@google.com
Tuesday June 30 at 9:30am SF time works for me. On Tue, Jun 23, 2020 at 12:05 AM Mark Mendoza <mendoza.mark.a@gmail.com> wrote:
I think Tuesday the 30th at 9:30 AM SF time works for everyone on our end, so let's pencil in that time? (Pending response from Guido/Jukka/Ivan) _______________________________________________ Typing-sig mailing list -- typing-sig@python.org To unsubscribe send an email to typing-sig-leave@python.org https://mail.python.org/mailman3/lists/typing-sig.python.org/ Member address: jlehtosalo@gmail.com
OK, so let's meet at meet.google.com/abv-diox-uwa at 9:30am SF time. Let's aim for about an hour. I would propose the following rough agenda: - Introductions: 10 minutes - Alfonso/Mark present work on variadics and integer generics: 20 minutes - Matthew/Jörg present work on tentative specification proposal: 20 minutes - Discussion of next steps: 10 minutes In the meantime, here's a doc summarising some of our thoughts so far: https://docs.google.com/document/d/1But-hjet8-djv519HEKvBN6Ik2lW3yu0ojZo6pG9... Cheers, Matthew On Tue, 23 Jun 2020 at 00:05, Mark Mendoza <mendoza.mark.a@gmail.com> wrote:
I think Tuesday the 30th at 9:30 AM SF time works for everyone on our end, so let's pencil in that time? (Pending response from Guido/Jukka/Ivan) _______________________________________________ Typing-sig mailing list -- typing-sig@python.org To unsubscribe send an email to typing-sig-leave@python.org https://mail.python.org/mailman3/lists/typing-sig.python.org/ Member address: mrahtz@google.com
Hey, thanks everyone for coming! As suggested by Alfonso, we've created a doc with some notes from the meeting: Tensor Typing Open Design Meetings <https://docs.google.com/document/d/1oaG0V2ZE5BRDjd9N-Tr1N0IKGwZQcraIlZ0N8ayq...> - Recording: https://drive.google.com/file/d/1wwlzlSOqVYCqHedS4D8zi9EwcUMkq3sd/view?usp=s... - Slides: Tensor Typing Pyre <https://docs.google.com/presentation/d/1cLEcrePqNh4cnZKp80rxx9mq-a1NTFPL2O3z...> , Shapes as types <https://docs.google.com/presentation/d/1THS9hZIyCklJXNDgWj6-u7Paqo9loxNXu5-Y...> <https://docs.google.com/presentation/d/1cLEcrePqNh4cnZKp80rxx9mq-a1NTFPL2O3z...> Also as suggested (I can't remember by who, but thank you, whoever it was!), here's the skeleton of a GitHub wiki we can use to keep track of the kinds of situations we want to be able to deal with: https://github.com/mrahtz/tensor-typing-exploration/wiki it's barebones right now, but I'll start filling in some of the examples Jorg and I have collected over the past weeks. It should be public-writeable, so please do start filling in examples too! As for the next gathering - I would tentatively propose meeting monthly, with our next date on Monday the 3rd of August? Cheers, Matthew On Mon, 29 Jun 2020 at 15:36, Matthew Rahtz <mrahtz@google.com> wrote:
OK, so let's meet at meet.google.com/abv-diox-uwa at 9:30am SF time.
Let's aim for about an hour. I would propose the following rough agenda:
- Introductions: 10 minutes - Alfonso/Mark present work on variadics and integer generics: 20 minutes - Matthew/Jörg present work on tentative specification proposal: 20 minutes - Discussion of next steps: 10 minutes
In the meantime, here's a doc summarising some of our thoughts so far: https://docs.google.com/document/d/1But-hjet8-djv519HEKvBN6Ik2lW3yu0ojZo6pG9...
Cheers, Matthew
On Tue, 23 Jun 2020 at 00:05, Mark Mendoza <mendoza.mark.a@gmail.com> wrote:
I think Tuesday the 30th at 9:30 AM SF time works for everyone on our end, so let's pencil in that time? (Pending response from Guido/Jukka/Ivan) _______________________________________________ Typing-sig mailing list -- typing-sig@python.org To unsubscribe send an email to typing-sig-leave@python.org https://mail.python.org/mailman3/lists/typing-sig.python.org/ Member address: mrahtz@google.com
participants (4)
-
Alfonso L. Castaño
-
Jukka Lehtosalo
-
Mark Mendoza
-
Matthew Rahtz