Hi all,

I would like to contribute a RISC-V 64 JIT backend for RPython. I made some progress at the end of 2023.

## Status

My prototype can pass the test cases below:

* test_runner.py
* test_basic.py and almost all test_ajit.py-related tests (except test_rvmprof.py)
* test_zrpy_gc_boehm.py

I am still working on test_zrpy_gc.py (it passes if I disable malloc inlining).

I haven't done a full translation yet.

## Logistics

How would you like to review the patches? I have roughly 73 pending commits. Each commit has a specific reason for change and corresponding test cases (where applicable).

Is it better to send one GitHub pull request containing all of them, or do you prefer one commit per pull request?

Thank you.

Regards,
Logan
Hi,

I forgot to include the link in my previous email. If you want to take a look at my prototype, you can find it here: https://github.com/loganchien/pypy/tree/rv64

Thanks.

Regards,
Logan

On Sun, Jan 7, 2024 at 5:18 PM Logan Chien <tzuhsiang.chien@gmail.com> wrote:
Hi Logan,

Very cool that you are interested in that! It's often useful to hang out on IRC, where you can ask questions directly.

I have not taken a look at all yet, but can you tell me what kind of setup one needs for testing it? Are you using real hardware or emulation?

The approach of starting with the tests and getting translation done later is very much what we have done in the past.

Best,
Maciej

On Mon, 8 Jan 2024 at 09:42, Logan Chien <tzuhsiang.chien@gmail.com> wrote:
_______________________________________________
pypy-dev mailing list -- pypy-dev@python.org
To unsubscribe send an email to pypy-dev-leave@python.org
https://mail.python.org/mailman3/lists/pypy-dev.python.org/
Member address: fijall@gmail.com
On 8/1/24 10:03, Maciej Fijalkowski wrote:
Exciting, thanks!

I find IRC too temporary: it is hard to search through. This can be both an advantage and a disadvantage. Since we have moved development efforts to GitHub, maybe we could try out the GitHub discussions platform; I opened it up at https://github.com/orgs/pypy/discussions. Of course you are welcome to use IRC if you are comfortable with it.

In addition to Maciej's questions: is there only one compilation target, or would the backend need to know about the different ISA extensions?

As for patch review and merging: we have a history of long-lived branches in PyPy. Two examples: the windows 64-bit branch was only merged once it was quite ready, and led to only small breakage of the main branches. I recently merged the hpy0.9 branch too early, and the failing tests masked some other py3.9 failures until I got it under control, so it would have been better to hold off until it was more completely finished. Something to think about.

Matti
Hi,
> I have not taken any looks at all, but can you tell me what kind of setup does one need for testing it? Are you using real hardware or emulation?
Currently, I use qemu-user-static + schroot + a Debian riscv64 root filesystem on an x86-64 host. I have a HiFive Unmatched board as well, but I haven't figured out how to install Debian riscv64 on it yet. Also, given its price tag, I think I will only run benchmarks on it instead of doing toolchain translation on it (though it should have enough DRAM).
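Whether running under qemu-user emulation or on real hardware, the ISA string that the Linux kernel reports in /proc/cpuinfo shows which extensions the environment provides. A minimal sketch of reading it (a hypothetical helper for illustration, not part of the backend; it assumes the riscv64 `isa : rv64imafdc` line format):

```python
def parse_riscv_isa(cpuinfo_text):
    """Extract (base, extensions) from the 'isa' line of /proc/cpuinfo.

    Assumes the Linux riscv64 format, e.g. 'isa : rv64imafdc'.
    Multi-letter extensions (e.g. '_zicsr') are underscore-separated
    and ignored here for simplicity.
    """
    for line in cpuinfo_text.splitlines():
        if line.lower().startswith("isa"):
            isa = line.split(":", 1)[1].strip().lower()
            # "rv64"/"rv32" prefix, then one letter per base extension
            base, exts = isa[:4], isa[4:].split("_", 1)[0]
            return base, set(exts)
    return None, set()

# On a real system (or inside the qemu-user chroot) one would read the file:
#   base, exts = parse_riscv_isa(open("/proc/cpuinfo").read())
sample = "processor\t: 0\nisa\t\t: rv64imafdc\n"
base, exts = parse_riscv_isa(sample)
print(base, sorted(exts))
```

Something along these lines could also serve option (a) discussed below, detecting extensions such as V at startup rather than at translation time.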
> is there only one compilation target or would the backend need to know about the different ISA extensions?
Currently, I only target RV64IMAD:

* I - base instruction set
* M - integer multiplication
* A - atomics (used by call_release_gil)
* D - double-precision floating-point arithmetic

I don't use the C (compressed) extension for now because it may complicate the branch offset calculation and register allocation.

I plan to support the V (vector) extension after I finish the basic JIT support. But there are some unknowns. I am not sure whether (a) I want to detect the availability of the V extension dynamically (thus sharing the same pypy executable) or (b) build different executables for different combinations of extensions. Also, I don't have a development board that supports the V extension; I am searching for one.

Another, more remote, goal is to support RV32IMAF (singlefloats) or RV32IMAD. In RISC-V, the 32-bit and 64-bit ISAs are quite similar. The only differences are LW/SW (32-bit) vs. LD/SD (64-bit) and some special instructions for 64-bit (e.g. ADDW). I isolated many of them into load_int/store_int helper functions so that it will be easy to swap implementations. However, I am not sure whether we would have to change the object alignment in `malloc_nursery*` (to ensure we align to multiples of `double`). Also, I am not sure whether it is common for RV32 cores to include the D extension. But, anyway, RV32 will be a lower priority for me because I will have to figure out how to build an RV32 root filesystem first (p.s. Debian doesn't officially support RV32 as of writing).
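To illustrate why the load_int/store_int indirection is cheap: in the base ISA, the 32-bit and 64-bit loads share the same format and opcode and differ only in the funct3 field. A rough sketch of the fixed 32-bit instruction encodings involved (illustrative helpers only, not the actual backend assembler):

```python
def encode_r_type(opcode, funct3, funct7, rd, rs1, rs2):
    # R-type layout: funct7 | rs2 | rs1 | funct3 | rd | opcode
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode

def encode_i_type(opcode, funct3, rd, rs1, imm):
    # I-type layout: imm[11:0] | rs1 | funct3 | rd | opcode
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode

LOAD = 0b0000011  # opcode shared by LB/LH/LW/LD

add_x1_x2_x3 = encode_r_type(0b0110011, 0b000, 0b0000000, 1, 2, 3)  # 0x003100b3
ld_x1_8_x2 = encode_i_type(LOAD, 0b011, 1, 2, 8)  # 64-bit load (RV64 only)
lw_x1_8_x2 = encode_i_type(LOAD, 0b010, 1, 2, 8)  # 32-bit load

# The two loads differ only in funct3 (bit 12), so a load_int helper
# can pick one or the other without touching the rest of the assembler.
assert ld_x1_8_x2 ^ lw_x1_8_x2 == 1 << 12
```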
> As for patch review and merging: we have a history of long-lived branches in PyPy. Two examples: ..., so it would have been better to hold off until it was more completely finished. Something to think about.
Sure. This approach works for me too.

Regards,
Logan

On Mon, Jan 8, 2024 at 8:21 AM Matti Picus <matti.picus@gmail.com> wrote:
Hi Logan,

On Tue, 9 Jan 2024 at 04:01, Logan Chien <tzuhsiang.chien@gmail.com> wrote:
Cool! Here are a few thoughts I had when I looked at some early RISC-V documents long ago (warning: they may be outdated).

Yes, not using the "compressed" extension is probably a good approach. That looks like something a compiler might do, but it's quite a bit of work implementation-wise, and it's unclear whether it would even help here.

About the V extension, I'm not sure it would be helpful; do you plan to use it in the same way as our x86-64 vector extension support? As far as I know, that has been experimental all along and isn't normally enabled in a standard PyPy. (I may be wrong about that.)

Singlefloats: we don't do any arithmetic on singlefloats in the JIT, but it has got a few instructions to pack/unpack double floats into single floats, or to call a C-compiled function with singlefloat arguments. That part is not optional, though I admit I don't know how a C compiler compiles these operations if singlefloats are not supported by the hardware. But as usual, you can just write a tiny C program and see.

I agree that RV32 can be a more remote goal for now. It should simplify a lot of stuff if you can just assume a 64-bit environment. Plus all the other points you mention: the hardware may not support doubles, and may not be supported by Debian...

A bientôt,

Armin Rigo
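In the spirit of "just try it and see", the pack/unpack semantics the backend has to reproduce can be observed directly with Python's struct module. This is a sketch of the conversion behavior only (the names below are illustrative, not the JIT's actual operations):

```python
import struct

def double_to_single_bits(x):
    # Round a Python float (an IEEE 754 double) to single precision and
    # return the raw 32-bit pattern, as a double-to-singlefloat cast would.
    return struct.unpack("<I", struct.pack("<f", x))[0]

def single_bits_to_double(bits):
    # Widen the 32-bit pattern back to a double; this direction is exact.
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# 2.5 is exactly representable in single precision...
assert single_bits_to_double(double_to_single_bits(2.5)) == 2.5
# ...but 0.1 is not: the round trip through single precision loses bits.
assert single_bits_to_double(double_to_single_bits(0.1)) != 0.1
```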
Hi Armin,
> About the V extension, I'm not sure it would be helpful; do you plan to use it in the same way as our x86-64 vector extension support? As far as I know this has been experimental all along and isn't normally enabled in a standard PyPy. (I may be wrong about that.)
Well, if the vector extension is not enabled by default even for the x86-64 backend, then I will have to conduct more surveys, planning, and design work. I haven't read the vectorization code yet.

Anyway, I will finish the basic JIT first.

Regards,
Logan

On Tue, Jan 9, 2024 at 2:22 AM Armin Rigo <armin.rigo@gmail.com> wrote:
Hi Logan,

As far as I remember (and neither Armin nor I have done any major PyPy development recently), the vectorization never really got to the point where it was worth it. In theory, having vectorized operations (e.g. on numpy arrays) compile to vectorized CPU instructions would be glorious, but in practice it never worked well enough for us to enable it by default.

Best,
Maciej

On Wed, 10 Jan 2024 at 08:39, Logan Chien <tzuhsiang.chien@gmail.com> wrote:
Hi Maciej,

Thank you for the information. Let me conduct more surveys.

Regards,
Logan

On Thu, Jan 11, 2024 at 2:44 AM Maciej Fijalkowski <fijall@gmail.com> wrote:
Hi,

I have good news: the RISC-V backend now passes as many unit tests as the AArch64 backend. I got vmprof and codemap working this weekend. I also completed a full translation and got a working pypy executable.

I have two questions now:

1. Are there other test suites that I can check for correctness?
2. How do we measure the performance? Do we have a command line that can run all benchmarks?

Thank you in advance.

Regards,
Logan

p.s. All changes are at: https://github.com/loganchien/pypy/tree/rv64

On Mon, Jan 15, 2024 at 8:54 PM Logan Chien <tzuhsiang.chien@gmail.com> wrote:
On 16/1/24 07:02, Logan Chien wrote:
Very cool.

1: Eventually we would want a buildbot worker [0] using either actual hardware or qemu. Using qemu might be too slow to be practical. It could be based off the aarch64 docker file [1] as a template. For now, you could manually follow the different steps in a pypy-c-jit-<platform> buildbot run [2]: click on the stdio link for each step to follow the workflow. The "app-level -A", "extra tests", and "lib-python" steps would give an indication of how compatible the rpython code is, and the "pypyjit" tests would give an indication of how well the JIT code generation follows the other platforms. A deeper compliance test would be to run the binary in the workflows from the binary-testing repo [6] against some common python libraries.

2: These [3] are the benchmarks that feed speed.pypy.org. They are run by a buildbot worker [4]. The step of interest is step 9, where the command line is the top line of [5]. This generates a json results file and also some textual output. You will want an additional run with cpython for a baseline.

Matti

[0] https://foss.heptapod.net/pypy/buildbot/-/blob/branch/default/README_BUILDSL...
[1] https://foss.heptapod.net/pypy/buildbot/-/blob/branch/default/docker/Dockerf...
[2] https://buildbot.pypy.org/builders/pypy-c-jit-linux-x86-64/builds/9219 (for example)
[3] https://foss.heptapod.net/pypy/benchmarks
[4] https://buildbot.pypy.org/builders/jit-benchmark-linux-x86-64/builds/4056
[5] https://buildbot.pypy.org/builders/jit-benchmark-linux-x86-64/builds/4056/st...
[6] https://github.com/pypy/binary-testing
Hi Matti,

Thank you for your information. I will try these this weekend.

Regards, Logan

On Tue, Jan 16, 2024, 12:52 AM Matti Picus <matti.picus@gmail.com> wrote:
_______________________________________________
pypy-dev mailing list -- pypy-dev@python.org
To unsubscribe send an email to pypy-dev-leave@python.org
https://mail.python.org/mailman3/lists/pypy-dev.python.org/
Member address: tzuhsiang.chien@gmail.com
Hi Logan

Additionally to what Matti says, there are random fuzzing tests like test_ll_random.py in jit/backend/test. Run those for longer than the default (e.g. a whole night) to see if they find issues.

Best,
Maciej Fijalkowski

On Tue, 16 Jan 2024 at 07:02, Logan Chien <tzuhsiang.chien@gmail.com> wrote:
On Mon, Jan 15, 2024 at 8:54 PM Logan Chien <tzuhsiang.chien@gmail.com> wrote:
Hi Maciej,
Thank you for your information. Let me conduct more surveys. Thanks.
Regards, Logan
On Thu, Jan 11, 2024 at 2:44 AM Maciej Fijalkowski <fijall@gmail.com> wrote:
Hi Logan
As far as I remember (and neither Armin nor I did any major pypy development recently), the vectorization was never really something we got to work to the point where it was worth it. In theory, having vectorized operations like numpy arrays to compile to vectorized CPU instructions would be glorious, but in practice it never worked well enough for us to enable it by default.
Best, Maciej
On Wed, 10 Jan 2024 at 08:39, Logan Chien <tzuhsiang.chien@gmail.com> wrote:
Hi Armin,
About the V extension, I'm not sure it would be helpful; do you plan to use it in the same way as our x86-64 vector extension support? As far as I know this has been experimental all along and isn't normally enabled in a standard PyPy. (I may be wrong about that.)
Well, if the vector extension is not enabled by default even for x86-64 backend, then I will have to conduct more survey, planning, and designing. I haven't read the vectorization code yet.
Anyway, I will finish the basic JIT first.
Regards, Logan
On Tue, Jan 9, 2024 at 2:22 AM Armin Rigo <armin.rigo@gmail.com> wrote:
Hi Logan,
On Tue, 9 Jan 2024 at 04:01, Logan Chien <tzuhsiang.chien@gmail.com> wrote:
Currently, I only target RV64 IMAD:
I - Base instruction set
M - Integer multiplication
A - Atomic (used by call_release_gil)
D - Double precision floating point arithmetic
I don't use the C (compress) extension for now because it may complicate the branch offset calculation and register allocation.
I plan to support the V (vector) extension after I finish the basic JIT support. But there are some unknowns. I am not sure whether (a) I want to detect the availability of the V extension dynamically (thus sharing the same pypy executable) or (b) build different executables for different combinations of extensions. Also, I don't have a development board that supports the V extension. I am searching for one.
Another remote goal is to support RV32IMAF (singlefloats) or RV32IMAD. In RISC-V, 32-bit and 64-bit ISAs are quite similar. The only difference is on LW/SW (32-bit) vs. LD/SD (64-bit) and some special instructions for 64-bit (e.g. ADDW). I isolated many of them into load_int/store_int helper functions so that it will be easy to swap implementations. However, I am not sure if we have to change the object alignment in `malloc_nursery*` (to ensure we align to multiples of `double`). Also, I am not sure whether it is common for RV32 cores to include the D extension. But, anyway, RV32 will be a lower priority for me because I will have to figure out how to build a RV32 root filesystem first (p.s. Debian doesn't (officially) support RV32 as of writing).
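The load_int/store_int split described above can be sketched roughly as follows. This is an illustrative sketch only: the `Assembler` class, the recorded mnemonic tuples, and the method names are hypothetical stand-ins, not the actual RPython backend API.

```python
# Sketch: isolating the only XLEN-dependent memory accesses behind
# load_int/store_int helpers, so the rest of the backend is identical
# for RV64 and a future RV32 port. Names are made up for illustration.

class Assembler(object):
    def __init__(self, xlen=64):
        self.xlen = xlen   # 64 for RV64; 32 for a hypothetical RV32 port
        self.insns = []    # recorded instructions, for illustration only

    def load_int(self, dst, base, offset):
        # LD on RV64 vs. LW on RV32: callers never need to know which.
        self.insns.append(('LD' if self.xlen == 64 else 'LW', dst, base, offset))

    def store_int(self, src, base, offset):
        # SD on RV64 vs. SW on RV32.
        self.insns.append(('SD' if self.xlen == 64 else 'SW', src, base, offset))

rv64 = Assembler(xlen=64)
rv64.load_int('a0', 'sp', 8)
rv32 = Assembler(xlen=32)
rv32.load_int('a0', 'sp', 8)
```

With this shape, swapping the word size changes only the helpers, which is the easy part; as noted above, object alignment in `malloc_nursery*` is a separate question.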
Cool! Here are a few thoughts I had when I looked at some RISC-V early documents long ago (warning, it may be outdated):
Yes, not using the "compress" extension is probably a good approach. That looks like something a compiler might do, but it's quite a bit of work both implementation-wise, and it's unclear if it would help anyway here.
About the V extension, I'm not sure it would be helpful; do you plan to use it in the same way as our x86-64 vector extension support? As far as I know this has been experimental all along and isn't normally enabled in a standard PyPy. (I may be wrong about that.)
Singlefloats: we don't do any arithmetic on singlefloats with the JIT, but it has got a few instructions to pack/unpack double floats into single floats or to call a C-compiled function with singlefloat arguments. That's not optional, though I admit I don't know how a C compiler compiles these operations if floats are not supported by the hardware. But as usual, you can just write a tiny C program and see.
I agree that RV32 can be a more remote goal for now. It should simplify a lot of stuff if you can just assume a 64-bit environment. Plus all the other points you mention: the hardware may not support doubles, and may not be supported by Debian...
A bientôt,
Armin Rigo
Hi Maciej

Thank you for the information. It sounds like a good idea to run this before I go to sleep.

Other updates: I didn't make much progress last weekend. I spent it revising the code generator for integer immediate loads (replacing up-to-eight-instruction sequences with pc-relative load instructions). I will try them in the upcoming weekends.

Regards, Logan

On Sun, Jan 21, 2024 at 10:36 PM Maciej Fijalkowski <fijall@gmail.com> wrote:
Hi,

I have one question regarding pypy/module/_rawffi/alt/test/test_funcptr.py. In test_getaddressindll, the test uses `sys.maxint*2 - 1` as the mask (on Linux):

```
def test_getaddressindll(self):
    import sys
    from _rawffi.alt import CDLL
    libm = CDLL(self.libm_name)
    pow_addr = libm.getaddressindll('pow')
    fff = sys.maxint*2-1  ### WHY??
    if sys.platform == 'win32' or sys.platform == 'darwin':
        fff = sys.maxint*2+1
    assert pow_addr == self.pow_addr & fff
```

But on Linux (both x86_64 and riscv), `sys.maxint*2 - 1` is 0xffffffff_fffffffd (or 0b1111_...._1111_1101). Why does the mask end with 0xd? It is a little weird because if the intention is to ensure the address is aligned to a multiple of 4, the mask should end with 0xc.

It is causing a problem for the RISC-V backend because in RISC-V a function address can be a multiple of 2.

# Other status updates

This week I ran the test suites (see results below). I fixed one error related to large frame slot offsets (caught by test_tarfile).

* Test Suite: app-level (-A) test
  * test_getsetsockopt_zero -- It looks like an uninitialized-buffer error (the second byte changes between runs) (untriaged)
  * test_half_conversions -- No idea (untriaged)
  * test_floorceiltrunc -- It looks like RISC-V floor/ceil/trunc don't preserve the sign bit for nan inputs.
  * test__mercurial -- It looks like I have to install git and rebuild pypy from scratch.
* Test Suite: -D tests
  * All pass.
* Test Suite: extra tests
  * test_connection_del -- OperationalError: Could not open database (untriaged)
* Test Suite: lib-python test
  * test_ssl -- (untriaged)
  * test_tokenize -- (untriaged)
  * test_zipfile64 -- It looks like heap corruption (maybe related to a bad malloc_nursery fast path) (untriaged)
* Test Suite: pypyjit tests
  * test_jitlogparser -- (untriaged)
  * test_micronumpy -- (untriaged)

I also ran test_ll_random.py with `--repeat=20000 --random-seed=1234` and all tests are passing.
Regards, Logan

On Mon, Jan 22, 2024 at 10:19 PM Logan Chien <tzuhsiang.chien@gmail.com> wrote:
Hi Logan,

I wanted to start by stressing that this is really impressive progress. We've never had anyone get this far with a new backend without serious support from the core JIT devs, so thanks a lot for your work. I'm really excited about this!

A general suggestion about the test failures: for the failing tests, it's worth checking whether they also fail when you pass `--jit off` to the pypy binary. If they fail the same way, you can rule out your backend as the cause of the failure.

I put comments about specific cases inline below.

On 1/29/24 04:26, Logan Chien wrote:
Hi,
I have one question regarding pypy/module/_rawffi/alt/test/test_funcptr.py. In test_getaddressindll, the test uses `sys.maxint*2 - 1` as the mask (on linux):
```
def test_getaddressindll(self):
    import sys
    from _rawffi.alt import CDLL
    libm = CDLL(self.libm_name)
    pow_addr = libm.getaddressindll('pow')
    fff = sys.maxint*2-1  ### WHY??
    if sys.platform == 'win32' or sys.platform == 'darwin':
        fff = sys.maxint*2+1
    assert pow_addr == self.pow_addr & fff
```
But on Linux (both x86_64 and riscv), `sys.maxint*2 - 1` is 0xffffffff_fffffffd (or 0b1111_...._1111_1101). Why does the mask end with 0xd? It is a little weird because if the intention is to ensure the address is aligned to multiple of 4, the mask should end with 0xc.
It is causing a problem for the RISC-V backend because in RISC-V the function address can be multiple of 2.
This test looks just wrong, in my opinion. Given that the variable name is `fff`, I think it was just meant as a check "does it roughly look like a pointer". So I think somebody just forgot that sys.maxint is not a power of 2 (and then things failed on win32 and darwin and somebody fixed it with the extra if). You can change the test to always use sys.maxint*2+1.
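The arithmetic behind the two masks is easy to check directly (a quick sketch, assuming a 64-bit build where sys.maxint is 2**63 - 1):

```python
# sys.maxint on a 64-bit Linux build is 2**63 - 1; compare both masks.
maxint = 2**63 - 1

mask_linux = maxint * 2 - 1    # the "### WHY??" mask
mask_other = maxint * 2 + 1    # the win32/darwin mask

assert mask_linux == 0xfffffffffffffffd  # low nibble 0b1101: clears bit 1
assert mask_other == 0xffffffffffffffff  # all ones: a no-op mask

# An address that is 2 mod 4 (legal for a RISC-V function) is mangled
# by the linux mask but preserved by the all-ones mask:
addr = 0x400002
assert addr & mask_linux != addr
assert addr & mask_other == addr
```

This confirms the diagnosis above: `sys.maxint*2 - 1` clears bit 1, so it only "worked" on platforms where function addresses happen to be 4-byte aligned.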
I also ran test_ll_random.py with `--repeat=20000 --random-seed=1234` and all tests are passing.
How long does that take, in wall clock time? I think for the other backends we kept it running for a bunch of days after the last crash occurred.

Cheers,
CF
On 1/29/24 09:27, CF Bolz-Tereick via pypy-dev wrote:
I went ahead and just did that change on the main branch. Cheers, CF
Hi CF,

Thank you for your reply.
I also ran test_ll_random.py with `--repeat=20000 --random-seed=1234` and all tests are passing.
How long does that take, in wall clock time? I think for the other backends we kept it running for a bunch of days after the last crash occurred.
It took only ~6 hrs (wall clock). If it takes a bunch of days on other architectures, I guess I must multiply `--repeat` by 40 times or run it on real hardware. I'll try it again after I clear other bugs.

Regards, Logan

On Mon, Jan 29, 2024 at 12:37 AM CF Bolz-Tereick via pypy-dev <pypy-dev@python.org> wrote:
Hi all,

I wonder if there are any tricks that can be used to debug memory corruption?

I am debugging test_tokenize and test_zipfile64 (from lib_python_tests.py). If I run test_zipfile64, I sometimes see this error backtrace:

```
testMoreThan64kFilesAppend (test.test_zipfile64.OtherTests) ...
RPython traceback:
  File "rpython_jit_metainterp_8.c", line 35597, in CacheEntry_read
  File "rpython_rtyper_lltypesystem.c", line 26925, in ll_dict_getitem__dicttablePtr_objectPtr
memory corruption: bad size for object in the nursery
```

It is definitely related to the RISC-V JIT backend which I am working on. I tried a RISC-V build with `-O2` and both test_tokenize and test_zipfile64 passed without problems. But I can't further reduce it to one of the following cases:

1. A bad JIT opcode implementation that results in out-of-bound writes (thus corrupting the heap data structures).
2. A bad gcmap calculation (thus an object is freed too early or a reference is not relocated properly).
3. A bad malloc* opcode implementation that corrupts the heap.
4. Something else.

For (3), I made two attempts:

a. I tried to skip all "fast paths" and only call malloc_slowpath (fixed size, str, unicode, array). But this attempt didn't help.

b. I tried to build a JIT'ed PyPy targetstandalone.py with `--gc=boehm`, but, unfortunately, the generated C source code doesn't compile (with the error message below).

```
pypy_module_cpyext.c: In function 'pypy_g_W_PyCTypeObject__cpyext_attach_pyobj':
pypy_module_cpyext.c:125333:9: warning: implicit declaration of function 'OP_GC_RAWREFCOUNT_CREATE_LINK_PYOBJ'; did you mean 'OP_GC_RAWREFCOUNT_CREATE_LINK_PYPY'? [-Wimplicit-function-declaration]
 125333 |   OP_GC_RAWREFCOUNT_CREATE_LINK_PYOBJ(l_v451927, l_v451928, /* nothing */);
         |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         |   OP_GC_RAWREFCOUNT_CREATE_LINK_PYPY
pypy_module_cpyext.c:125333:80: error: expected expression before ')' token
 125333 |   OP_GC_RAWREFCOUNT_CREATE_LINK_PYOBJ(l_v451927, l_v451928, /* nothing */);
         |                                                                          ^
make: *** [Makefile:762: pypy_module_cpyext.o] Error 1
make: *** Waiting for unfinished jobs....
```

So here comes my question: do we have some way to log the allocation/marking/relocation/deallocation in the GC? Any other suggestions are much appreciated. Thank you.

Regards, Logan

On Mon, Jan 29, 2024 at 6:51 PM Logan Chien <tzuhsiang.chien@gmail.com> wrote:
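For case (3), the suspect pattern is the JIT-emitted nursery fast path: a bump-pointer allocation with a slow-path fallback. Schematically, it looks like the pure-Python sketch below (made-up names, offsets instead of real pointers; just to show the shape of the check the backend emits):

```python
# Schematic bump-pointer nursery allocation with a slow-path fallback.
# A wrong size computation or a missed overflow check in the emitted
# assembly for this pattern is one way to get "bad size for object in
# the nursery" later, when the GC walks the nursery.

class Nursery(object):
    def __init__(self, size):
        self.free = 0            # nursery_free (an offset, for simplicity)
        self.top = size          # nursery_top
        self.slowpath_calls = 0

    def malloc_slowpath(self, nbytes):
        # Real code would run a minor collection and retry; here we
        # just pretend the nursery is empty again.
        self.slowpath_calls += 1
        self.free = 0
        return self.bump(nbytes)

    def bump(self, nbytes):
        result = self.free
        new_free = result + nbytes
        if new_free > self.top:      # the fast-path overflow check
            return self.malloc_slowpath(nbytes)
        self.free = new_free         # bugs here corrupt the next object
        return result

n = Nursery(64)
a = n.bump(48)   # fits: fast path
b = n.bump(32)   # overflows: takes the slow path
```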
Hi Logan,

On Fri, 16 Feb 2024 at 07:46, Logan Chien <tzuhsiang.chien@gmail.com> wrote:
```
pypy_module_cpyext.c:125333:80: error: expected expression before ')' token
 125333 |   OP_GC_RAWREFCOUNT_CREATE_LINK_PYOBJ(l_v451927, l_v451928, /* nothing */);
```
Ah, I guess that we are missing a dependency. To compile with Boehm, you need to avoid this particular GC-related feature that is used by the cpyext module. Try to translate with the option ``--withoutmod-cpyext``.

Does the equivalent of `pypysrc\rpython\jit\backend\x86\test\test_zrpy_gc.py` pass on your backend? I guess it does, and so does a long-running `test_zll_stress_*.py`---but maybe try to run `test_zll_stress_*.py` for even longer; it can often eventually find bugs if they are really in the JIT backend.

If it doesn't help, then "congratulations", you are in the land of debugging the very rare crash with gdb. For this case, it's a matter of going back in time from the crash. If the crash is nicely reproducible, then you have a chance of doing this by setting the right breakpoints (hardware breakpoints on memory changes, for example), restarting the program, and repeating. If it is too random, then that doesn't work; maybe look for what reverse debuggers are available nowadays. Last I looked, on x86-64, gdb had a built-in but useless one (it only goes backward a little bit), but there was lldb which worked, and udb was still called undodb---but my guess would be that none of that works on RISC-V.

If all else fails, I remember once hacking around to dump a huge amount of data (at least every single memory write into GC structures) (but that was outside the JIT; logging from generated assembly is made harder by the fact that the log calls must not cause the generated code to change apart from the calls). It would let me know exactly what happened---that was for one bug that took me 10 days of hard work, my personal best :-/

A bientôt,

Armin
Hi Armin,

Thank you for your reply.
Try to translate with the option ``--withoutmod-cpyext``.
This option fixes the error, but now I encounter another error message (in pypy_module_sys.c):

```
pypy_module_sys.c: In function 'pypy_g_setrecursionlimit':
pypy_module_sys.c:2890:9: warning: implicit declaration of function 'OP_GC_INCREASE_ROOT_STACK_DEPTH' [-Wimplicit-function-declaration]
 2890 |   OP_GC_INCREASE_ROOT_STACK_DEPTH(l_v498959, /* nothing */);
      |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
pypy_module_sys.c:2890:65: error: expected expression before ')' token
 2890 |   OP_GC_INCREASE_ROOT_STACK_DEPTH(l_v498959, /* nothing */);
      |                                                            ^
make: *** [Makefile:641: pypy_module_sys.o] Error 1
```
Does the equivalent of `pypysrc\rpython\jit\backend\x86\test\test_zrpy_gc.py` pass on your backend? I guess it does, and so does a long-running `test_zll_stress_*.py`---but maybe try to run `test_zll_stress_*.py` for even longer, it can often eventually find bugs if they are really in the JIT backend.
Yes, test_zrpy_gc.py is passing. I tried `test_zll_stress_*.py` and it passed too. Then I increased `total_iterations` to 10000, decreased `pieces` to 1, and it still passed.

Just to be sure, is the following command correct?

```
python2.7 ./pytest.py rpython/jit/backend/test/test_zll_stress_0.py -s -v
```

If everything is correct, I feel that I have to debug this the hard way. Let's see whether I can find more leads.

Thank you.

Regards,
Logan

On Fri, Feb 16, 2024 at 1:38 AM Armin Rigo <armin.rigo@gmail.com> wrote:
Hi Logan,

On Fri, 16 Feb 2024 at 07:46, Logan Chien <tzuhsiang.chien@gmail.com> wrote:

```
pypy_module_cpyext.c:125333:80: error: expected expression before ')' token
125333 | OP_GC_RAWREFCOUNT_CREATE_LINK_PYOBJ(l_v451927, l_v451928, /* nothing */);
```
Hi Logan, On Mon, 19 Feb 2024 at 05:02, Logan Chien <tzuhsiang.chien@gmail.com> wrote:
2890 | OP_GC_INCREASE_ROOT_STACK_DEPTH(l_v498959, /* nothing */);
Ah, yet another missing macro. This should just be #defined to do nothing with Boehm, maybe in rpython/translator/c/src/mem.h in the section "dummy version of these operations, e.g. with Boehm".
Just to be sure, is the following command correct?
python2.7 ./pytest.py rpython/jit/backend/test/test_zll_stress_0.py -s -v
Yes, that's correct. A bientôt, Armin
Hi Armin, Thank you for the reply.
This should just be #defined to do nothing with Boehm, maybe in rpython/translator/c/src/mem.h
With this change and a few RISC-V backend fixes (related to self.cpu.vtable_offset), I can build and run a JIT+BoehmGC PyPy. This configuration (JIT+BoehmGC) can pass test_tokenize and test_zipfile64 (from lib_python_tests.py).

Thus, my next step will focus on the differences between JIT+BoehmGC and JIT+IncminimarkGC. I think the differences are:

1. call_malloc_nursery_* (fast/slow paths)
2. shadow stack push/pop updates
3. gcmap
4. write barriers (fast/slow paths)
5. realloc_frame

Regards,
Logan

On Sun, Feb 18, 2024 at 11:06 PM Armin Rigo <armin.rigo@gmail.com> wrote:
Hi Logan, On Tue, 20 Feb 2024 at 05:08, Logan Chien <tzuhsiang.chien@gmail.com> wrote:
This should just be #defined to do nothing with Boehm, maybe in rpython/translator/c/src/mem.h
With this change and a few RISC-V backend fixes (related to self.cpu.vtable_offset), I can build and run a JIT+BoehmGC PyPy.
Cool! I also got a pull request merged into the main branch with this change, and it does indeed fix boehm builds.
This configuration (JIT+BoehmGC) can pass test_tokenize and test_zipfile64 (from lib_python_tests.py).
Thus, my next step will focus on the differences between JIT+BoehmGC and JIT+IncminimarkGC.
A problem that came up a lot in other backends involves a specific input instruction for which the backend emits code using specific registers. When you run into the bad case, the emitted code overwrites a register *before* a later instruction reads it, wrongly assuming it still holds its old value. It is entirely dependent on register allocation, and if you run with Boehm then the sequence of instructions is slightly different, which might be why the bug doesn't show up there.

If you get two failures with incminimark and none with Boehm, then it sounds more likely that the cause involves one of the incminimark-only constructs---but it's also possible that the bug is somewhere unrelated and it's purely bad luck...

Armin
Hi Armin,

Thank you for the reply.

Luckily, I found the bug. It was in my write barrier card-marking implementation: I misunderstood what the AArch64 MVN instruction does when I was porting the code. After fixing it, I can pass both test cases (test_zipfile64 and test_tokenize).

Now I am looking into test_json. Earlier I thought it was an XFAIL because the -O2 build was failing too, but after adding `@settings(suppress_health_check=[HealthCheck.too_slow])` to `test_json.test_roundtrip`, I could run it in a reasonable time.

However, the same test was extremely slow with the `-Ojit` build. According to `PYPYLOG=jit:log.txt`, the JIT compiler kept building the same (or similar) bridge. Statistics showed that the RISC-V JIT compiled more than 3000 bridges (by the time I interrupted it with Ctrl-C), whereas the x86 JIT build compiled only 900 bridges (when it completed). I will try to figure out the failing guard op first.

Regards,
Logan

On Mon, Feb 19, 2024 at 10:05 PM Armin Rigo <armin.rigo@gmail.com> wrote:
Hi all,

I am looking into the last failing case, "TestMicroNumPy::()::test_reduce_logical_and", but I don't quite understand what this test means. The test case fails with:

```
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Loops don't match
=================
loop id = None
('operation mismatch',)
<could not determine information>
Ignore ops: []
Got:
===== HERE =====
guard_not_invalidated(descr=<Guard0x4005356a88>)
i39 = int_and(i35, 7)
i40 = int_is_zero(i39)
guard_true(i40, descr=<Guard0x40053a0e30>)
f41 = raw_load_f(i9, i35, descr=<ArrayF 8>)
i43 = float_ne(f41, 0.000000)
guard_true(i43, descr=<Guard0x40053a0e60>)
i45 = int_add(i28, 1)
i47 = int_add(i35, 8)
i48 = int_ge(i45, i36)
guard_false(i48, descr=<Guard0x40053a0e90>)
jump(p29, i45, p2, i47, p4, p6, i9, i36, descr=TargetToken(274965300512))
Expected:
i10096 = int_and(i29, 7)
i10097 = int_is_zero(i10096)
guard_true(i10097, descr=...)
guard_not_invalidated(descr=...)
f31 = raw_load_f(i9, i29, descr=<ArrayF 8>)
i32 = float_ne(f31, 0.000000)
guard_true(i32, descr=...)
i36 = int_add(i24, 1)
i37 = int_add(i29, 8)
i38 = int_ge(i36, i30)
guard_false(i38, descr=...)
jump(..., descr=...)
```

IIUC, the difference is that guard_not_invalidated is at a different location. But I don't understand why the backend can affect the logs under the 'jit-log-opt-' tag. Also, I found that reduce_logical_and (failing) and reduce_logical_xor (passing) are very different. Is there more information on the details of this test? Any ideas for debugging this test case are very welcome! Thanks.

Regards,
Logan

p.s. I have covered almost all the test cases. Except for the one described above, the other test cases are either passing, classified as XFAIL (not supportable), or related to the environment (e.g. schroot/qemu). I will try to run it on the real board this weekend.

On Wed, Feb 21, 2024 at 9:57 PM Logan Chien <tzuhsiang.chien@gmail.com> wrote:
Hi Logan, On Thu, 29 Feb 2024 at 08:37, Logan Chien <tzuhsiang.chien@gmail.com> wrote:
IIUC, the difference is that guard_not_invalidated is at a different location.
But I don't understand why the backend can affect the logs in the 'jit-log-opt-' tag.
There are a few ways to influence the front-end: for example, the "support_*" class-level flags. Maybe the front-end did either do or skip a specific optimization when compared with x86, and it results only in a 'guard_not_invalidated' being present or not (and then once it is emitted, it's not emitted again a few instructions below). Or there are some other reasons. But as a general rule, we can mostly ignore the position of 'guard_not_invalidated'. It should have no effect except in corner cases.
Also, I found that reduce_logical_and (failed) and reduce_logical_xor (passed) are very different.
Is there more information on the details of this test? Any ideas to debug this test case are very welcomed! Thanks.
A possibility is that some of these tests are flaky in the sense of passing half by chance on x86---I vaguely remember having some troubles sometimes. It's sometimes hard to write tests without testing too many details. Others may have better comments about them. Generally, it's OK to look at what you got and compare it with what the test expects. If you can come up with a reason for why what you got is correct too, and free of real performance issues, then that's good enough. A bientôt, Armin
Hi Armin, Thank you for the reply. I'll check (1) the config, (2) the frontend code that emits guard_not_invalidated, and (3) the actual performance on HW this weekend. Regards, Logan On Thu, Feb 29, 2024 at 4:45 AM Armin Rigo <armin.rigo@gmail.com> wrote:
Hi all,

It has been a while since my last email. I was busy until recently but would like to restart the work now.

First of all, good news! All tests are passing or have reasonable workarounds (see below). I also ran the benchmarks on a SiFive Unmatched board (with Ubuntu 24.04) and got the results w.r.t. CPython 2.7 (see the attached JSON file). *Thus, I would like to send out the Pull Request to merge the work soon. What do you think?*

Back to the test results, these were the failures I mentioned earlier:

* Test Suite: app-level (-A) tests
  * test_getsetsockopt_zero -- Seems to be QEMU-specific; it isn't reproducible on SiFive Unmatched + Ubuntu 24.04.
  * test_half_conversions -- sNaN is canonicalized when RISC-V converts F64->F32. I split it into two cases: (1) F16->F64 (passing) and (2) F16->F64->F32 (skipped).
  * test_floorceiltrunc -- RISC-V floor/ceil/trunc don't preserve the sign bit for NaN inputs.
  * test__mercurial -- Fixed after installing the git command.
* Test Suite: -D tests
  * All pass.
* Test Suite: extra tests
  * test_connection_del -- Fixed.
* Test Suite: lib-python tests
  * test_ssl -- libssl internal error when TLSv1 is requested. I will send out another Pull Request for this.
  * test_tokenize -- Caused by a bug in the RISC-V backend card-marking code generator; fixed now.
  * test_zipfile64 -- Caused by a bug in the RISC-V backend card-marking code generator; fixed now.
* Test Suite: pypyjit tests
  * test_jitlogparser -- Caused by a bug in the RISC-V backend card-marking code generator; fixed now.
  * test_micronumpy -- Looked like a benign error (the order is slightly different). Since the performance was fine, I wrote a special case for RISC-V.

The following were caused by slow CPU speed vs. wall clock time:

* Test Suite: lib-python tests
  * test_json.py: test_roundtrip
  * test_textio.py: test_readline
  * test_unicode.py: test_index, test_rfind, test_rindex

These can be fixed by adding:

```
from hypothesis import settings, HealthCheck

@settings(suppress_health_check=[HealthCheck.too_slow])
```

I feel we can just ignore these for now.

This concludes the work to debug all failing tests. The full list of Git commits can be found here: https://github.com/pypy/pypy/compare/main...loganchien:pypy:rv64

Please let me know what you think. Thank you.

Regards,
Logan

On Fri, Mar 1, 2024 at 11:52 AM Logan Chien <tzuhsiang.chien@gmail.com> wrote:
participants (5)

- Armin Rigo
- CF Bolz-Tereick
- Logan Chien
- Maciej Fijalkowski
- Matti Picus