
Hello, I'm working on a presentation about PyPy for a high performance computing community. I've developed a good deal of content, except for a high-level overview of how PyPy actually works. Several days of watching YouTube, searching Stack Overflow, and reading the docs haven't clarified much. I was hoping to get in contact with a developer, if it's not too much trouble.

Trevor Clack
HPCMP PET Computational Scientist
Mobile 949.412.9902
Trevor.Clack@gdit.com
3909 Halls Ferry Road
Vicksburg, MS 39180
www.gdit.com

On 9/30/20 2:15 AM, Clack, Trevor wrote:
Hi and welcome. Could you be a little more specific: do you mean "how the RPython toolchain can compile a Python interpreter", "how is PyPy fast?", "what optimizations can the JIT do", "how does a tracing JIT work", "why doesn't PyPy use refcount semantics", or something else?
Matti

Hi Matti, thank you for your response. It will be proprietary in the sense that it will belong to the company as part of a training presentation, but we will not be making money from it. These training presentations go into a growing archive on high performance computing, ranging from instructions on how to parallelize code on GPUs to specific tools that help code run faster (e.g. Cython, Numba, and hopefully PyPy). The training won't contain anything novel. It's intended as a general introduction to the tool and when to use it (as you know, it's slower than CPython when calling C extension modules and when performing very short, simple tasks). Just that sort of thing.

Here is what I'm specifically looking for: a high-level overview of how PyPy works, from source code through to the end of execution. I'd like something analogous to this description and diagram of CPython: the CPython interpreter (written in C) parses the source and creates bytecode, and this bytecode is then executed by the Python virtual machine (also written in C). [diagram attached]

So I'm not going into details like garbage collection, the GIL, or even the details of the compiler. Likewise, I'd like something that doesn't go too deeply into the inner workings of PyPy but gives a good idea of how the code is handled at each stage of execution. I've found several diagrams and charts, but they either don't show the whole picture or are too detailed for me to make sense of. Here are some of the diagrams and flowcharts I've found: [four diagrams attached]

For PyPy, I'm uncertain, for example, whether source code is converted into analogous RPython that is then converted to C, or whether it is converted to bytecode and then to C.
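A note on the question above: user programs are never converted into RPython or C. PyPy's interpreter is itself written in RPython, and the RPython toolchain translates that interpreter to C (which is then compiled into a native executable) once, when PyPy is built. At run time PyPy, like CPython, parses the source and compiles it to bytecode, which its interpreter loop then executes. A minimal sketch of that front-end stage, using only the built-in `compile` and `exec` and the standard `dis` module (available under both CPython and PyPy), follows. The source string is just an example, and because opcode names differ between Python versions, only `FOR_ITER` is assumed to appear.

```python
import dis

# Example program; any Python source would do.
source = """
total = 0
for i in range(10):
    total += i
"""

# Stage 1: parse the source text and compile it into a code object (bytecode).
code = compile(source, "<example>", "exec")

# Stage 2: list the bytecode instructions the interpreter loop will run.
opnames = [instr.opname for instr in dis.get_instructions(code)]
print(opnames)  # exact opcode names depend on the Python version

# Stage 3: execute the bytecode in the interpreter.
namespace = {}
exec(code, namespace)
print(namespace["total"])  # → 45
```

Under PyPy, the JIT only enters the picture during stage 3, when a loop in the running bytecode becomes hot.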
I'm not certain where the JIT kicks in. I know it's provided by RPython and that it kicks in on hot loops (>1031 iterations). I'm also uncertain where the code returns to being interpreted when a guard fails. It would probably be quickest to chat on Zoom or another platform where we can share screens. I also think a good high-level overview would make a good blog post for PyPy or could be included in the introductory documentation.

________________________________
From: Matti Picus <matti.picus@gmail.com>
Sent: Wednesday, September 30, 2020 12:42 AM
To: Clack, Trevor <Trevor.Clack@gdit.com>; pypy-dev@python.org <pypy-dev@python.org>
Subject: Re: [pypy-dev] Short discussion about pypy overview

On 9/30/20 6:52 PM, Clack, Trevor wrote:
Top posting to avoid too much scrolling.

PyPy has two parts that work together: a bytecode interpreter like CPython's, and a JIT. The difference is that the interpreter has an additional layer. All functions and loops (for simplicity, let's stick with those) are traced, and the tracer knows the arguments provided to those functions and loops. After the tracer sees ~1000 calls to the same code with the same types, it replaces that piece of code with code specialized for those argument types. This is typical of a tracing JIT. It also adds guards, so that if the assumptions are broken, the old interpreted code is run instead. Thus only specialized code is turned into assembler.

Some other JIT frameworks, like Numba, produce LLVM IR rather than the assembler that PyPy produces, and then must call a compiler to finish the job. PyPy's RPython knows how to emit assembler directly.

I am sure my description is not 100% accurate; others are invited to correct me. We can continue the discussion on Zoom. Would 11:00 UTC tomorrow be suitable?

Matti
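The hot-loop, guard, and fallback cycle described above can be illustrated with a toy model in plain Python. This is a hypothetical sketch, not PyPy's implementation: `run_loop`, `compile_trace`, `GuardFailed`, and the tiny `threshold` are invented for illustration, and a Python closure stands in for the machine code a real JIT would emit. The loop runs on a generic "interpreter" path and counts iterations. Once the count reaches the threshold, it builds a version specialized to the operand type seen at that moment, protected by a type guard. When the guard fails (here, a float arrives after a run of ints), execution falls back to the generic path. In PyPy itself, the counterpart of `threshold` is the tunable `threshold` JIT parameter (e.g. `pypy --jit threshold=N`). A failing guard resumes interpretation at the point of failure, and a guard that fails often gets its own compiled "bridge". This sketch simply discards the specialized version instead.

```python
class GuardFailed(Exception):
    """Raised when an assumption baked into the specialized code is broken."""


def interpret_step(total, x):
    # Generic path: works for any operand types, the way the bytecode
    # interpreter dispatches an add on whatever objects it is handed.
    return total + x


def compile_trace(observed_type):
    # Stand-in for machine code generated from a recorded trace. It is only
    # valid for the operand type that was seen when the loop became hot.
    def trace(total, x):
        # Guard: check the assumption before running the specialized code.
        if type(total) is not observed_type or type(x) is not observed_type:
            raise GuardFailed(type(x).__name__)
        return total + x  # a real JIT would emit a machine-level add here
    return trace


def run_loop(values, threshold=3):
    """Sum `values`, switching to specialized code once the loop is hot."""
    total, hot_count, trace, log = 0, 0, None, []
    for x in values:
        if trace is not None:
            try:
                total = trace(total, x)
                log.append("jit")
                continue
            except GuardFailed:
                # Assumption broken: redo this iteration on the generic path.
                trace, hot_count = None, 0
                log.append("guard failed")
        total = interpret_step(total, x)
        log.append("interp")
        hot_count += 1
        if hot_count >= threshold:
            # The loop is hot: specialize for the types seen right now.
            trace = compile_trace(type(x))
    return total, log


total, log = run_loop([1, 2, 3, 4, 5, 1.5, 6])
print(total)  # → 22.5
print(log)
# → ['interp', 'interp', 'interp', 'jit', 'jit', 'guard failed', 'interp', 'interp']
```

The guard runs before any state is changed, which is what makes it safe to redo the failing iteration on the generic path.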

participants (2)
- Clack, Trevor
- Matti Picus