[pypy-dev] Re: Contribute a RISC-V 64 JIT backend

Aug. 11, 2024

Hi all,

It has been a while since my last email.  I was busy until recently, but I
would like to restart the work now.

First of all, good news!  All tests are now either passing or have
reasonable workarounds (see below).  I also ran the benchmarks on a SiFive
Unmatched board (with Ubuntu 24.04) and collected the results relative to
CPython 2.7 (see the attached JSON file).

*Thus, I would like to send out the Pull Request to merge the work soon.
What do you think?*

Back to the test results:

These were the failures I mentioned earlier:

* Test Suite: app-level (-A) tests
  * test_getsetsockopt_zero -- This seems to be QEMU-specific.  It isn't
    reproducible on SiFive Unmatched + Ubuntu 24.04.
  * test_half_conversions -- An sNaN is canonicalized when RISC-V converts
    F64->F32.  I split the test into two cases: (1) F16->F64 (passing) and
    (2) F16->F64->F32 (skipped).
  * test_floorceiltrunc -- RISC-V floor/ceil/trunc don't preserve the sign
    bit for NaN inputs.
  * test__mercurial -- This was fixed after installing the git command.
* Test Suite: -D tests
  * All pass.
* Test Suite: extra tests
  * test_connection_del -- Fixed
* Test Suite: lib-python tests
  * test_ssl -- libssl reports an internal error when TLSv1 is requested.
    I will send out another Pull Request for this.
  * test_tokenize -- This was caused by a bug in the RISC-V backend's card
    marking code generator.  This is fixed now.
  * test_zipfile64 -- This was caused by the same card marking bug in the
    RISC-V backend.  This is fixed now.
* Test Suite: pypyjit tests
  * test_jitlogparser -- This was also caused by the card marking bug in
    the RISC-V backend.  This is fixed now.
  * test_micronumpy -- This looked like a benign failure (the order of
    operations in the trace is slightly different).  Since the performance
    was fine, I wrote a special case for RISC-V.
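To illustrate the NaN issue behind test_half_conversions, here is a small bit-level model of a float64->float32 conversion of a NaN under two policies: payload propagation, roughly as x86 SSE does it (quiet the NaN, keep the sign and the top fraction bits), and the RISC-V rule that a NaN result is replaced by the canonical NaN 0x7fc00000.  This is a plain Python sketch for illustration, not code from the backend, and the function names are hypothetical.  Note that the canonical NaN has a clear sign bit, which is consistent with the floor/ceil/trunc sign-bit observation.

```python
F64_FRAC_MASK = (1 << 52) - 1
F64_QUIET_BIT = 1 << 51          # MSB of the float64 fraction
F32_QUIET_BIT = 1 << 22          # MSB of the float32 fraction
RISCV_CANONICAL_NAN32 = 0x7FC00000  # positive sign, only the quiet bit set


def is_nan64(bits):
    """True if the 64-bit pattern encodes a NaN (all-ones exponent, nonzero fraction)."""
    return ((bits >> 52) & 0x7FF) == 0x7FF and (bits & F64_FRAC_MASK) != 0


def is_signaling_nan64(bits):
    """A signaling NaN is a NaN whose quiet bit is clear."""
    return is_nan64(bits) and not (bits & F64_QUIET_BIT)


def x86_style_nan_f64_to_f32(bits):
    """Payload-propagating model: quiet the NaN, keep sign and top 23 fraction bits."""
    if not is_nan64(bits):
        raise ValueError("expected a NaN bit pattern")
    sign = (bits >> 63) & 1
    frac32 = ((bits & F64_FRAC_MASK) >> 29) | F32_QUIET_BIT
    return (sign << 31) | (0xFF << 23) | frac32


def riscv_style_nan_f64_to_f32(bits):
    """RISC-V model: any NaN result becomes the canonical NaN (sign and payload lost)."""
    if not is_nan64(bits):
        raise ValueError("expected a NaN bit pattern")
    return RISCV_CANONICAL_NAN32


snan = 0xFFF4000000000000  # negative sNaN with a payload bit set
print(hex(x86_style_nan_f64_to_f32(snan)))    # 0xffe00000
print(hex(riscv_style_nan_f64_to_f32(snan)))  # 0x7fc00000
```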
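For readers unfamiliar with card marking: the GC tracks writes into large arrays per fixed-size "card" of items instead of per object, so the JIT backend has to emit the index-to-card arithmetic inline for array stores.  The sketch below models that arithmetic in plain Python, assuming a hypothetical layout of one bit per 128-item card (CARD_SHIFT = 7) in a simple bitmap.  PyPy's GC uses its own layout, which this does not reproduce; the class and its names are illustrative only.

```python
CARD_SHIFT = 7  # hypothetical: 2**7 = 128 array items per card


class CardTable:
    """Toy card table: one dirty bit per card of 128 array items."""

    def __init__(self, length):
        ncards = (length + (1 << CARD_SHIFT) - 1) >> CARD_SHIFT
        self.bits = bytearray((ncards + 7) >> 3)  # 8 card bits per byte

    def mark(self, index):
        # What a write barrier would do on a store to array[index].
        card = index >> CARD_SHIFT
        self.bits[card >> 3] |= 1 << (card & 7)

    def is_marked(self, index):
        card = index >> CARD_SHIFT
        return bool(self.bits[card >> 3] & (1 << (card & 7)))

    def dirty_cards(self):
        # Card numbers the GC would rescan instead of the whole array.
        return [c for c in range(len(self.bits) * 8)
                if self.bits[c >> 3] & (1 << (c & 7))]


t = CardTable(10_000)
for i in (0, 127, 128, 9999):
    t.mark(i)
print(t.dirty_cards())  # [0, 1, 78]
```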

The following were caused by the slow CPU tripping wall-clock time checks:

* Test Suite: lib-python test
  * test_json.py: test_roundtrip
  * test_textio.py: test_readline
  * test_unicode.py: test_index, test_rfind, test_rindex

These can be fixed by adding:

```
from hypothesis import settings, HealthCheck

# Applied as a decorator to each affected @given test.
@settings(suppress_health_check=[HealthCheck.too_slow])
```
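For context, a complete decorated test might look like the sketch below.  The test name and the UTF-8 round-trip property are hypothetical stand-ins, not the actual tests in test_json.py, test_textio.py, or test_unicode.py; only the `@settings(suppress_health_check=[HealthCheck.too_slow])` line comes from the snippet above.

```python
from hypothesis import HealthCheck, given, settings
from hypothesis import strategies as st


# Hypothetical test used only to show where the decorator goes.
@settings(suppress_health_check=[HealthCheck.too_slow])
@given(st.text())
def test_utf8_roundtrip(s):
    # On a slow board, generating examples can take long enough to trip
    # the too_slow health check; suppressing it lets the property run.
    assert s.encode("utf-8").decode("utf-8") == s
```

Because the function is decorated with `@given`, calling `test_utf8_roundtrip()` with no arguments runs it over many generated strings.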

I feel we can just ignore these for now.

This concludes the work to debug all failing tests.

The full list of Git commits can be found here:
https://github.com/pypy/pypy/compare/main...loganchien:pypy:rv64

Please let me know what you think.  Thank you.

Regards,
Logan

On Fri, Mar 1, 2024 at 11:52 AM Logan Chien <tzuhsiang.chien@gmail.com>
wrote:
...
Hi Armin,

Thank you for the reply.  I'll check (1) the config, (2) the frontend code
that emits guard_not_invalidated, and (3) the actual performance on
hardware this weekend.

Regards,
Logan
On Thu, Feb 29, 2024 at 4:45 AM Armin Rigo <armin.rigo@gmail.com> wrote:
...
Hi Logan,
On Thu, 29 Feb 2024 at 08:37, Logan Chien <tzuhsiang.chien@gmail.com>
wrote:
...
IIUC, the difference is that guard_not_invalidated is at a different
location.  But I don't understand how the backend can affect the logs
under the 'jit-log-opt-' tag.
There are a few ways to influence the front-end: for example, the
"support_*" class-level flags.  Maybe the front-end either did or skipped
a specific optimization compared with x86, and the only visible result is
whether a 'guard_not_invalidated' is present or not (and once it is
emitted, it is not emitted again a few instructions below).  Or there are
other reasons.  But as a general rule, we can mostly ignore the position
of 'guard_not_invalidated'.  It should have no effect except in corner
cases.
...
Also, I found that reduce_logical_and (failed) and reduce_logical_xor
(passed) are very different.  Is there more information on the details of
this test?  Any ideas for debugging this test case are very welcome!
Thanks.

A possibility is that some of these tests are flaky in the sense of
passing half by chance on x86---I vaguely remember having some
troubles sometimes.  It's sometimes hard to write tests without
testing too many details.  Others may have better comments about them.
Generally, it's OK to look at what you got and compare it with what
the test expects.  If you can come up with a reason for why what you
got is correct too, and free of real performance issues, then that's
good enough.
See you soon,
Armin