Hi all,

It has been a while since my last email. I was busy until recently, but I would like to restart the work now.

First of all, good news! All tests are now passing or have reasonable workarounds (see below). I also ran the benchmarks on a SiFive Unmatched board (with Ubuntu 24.04) and got the results w.r.t. CPython 2.7 (see the attached JSON file).

*Thus, I would like to send out the Pull Request to merge the work soon. What do you think?*

Back to the test results. These were the failures I mentioned earlier:

* Test Suite: app-level (-A) test
  * test_getsetsockopt_zero -- It seems to be QEMU-specific. It isn't reproducible on SiFive Unmatched + Ubuntu 24.04.
  * test_half_conversions -- sNaN is canonicalized when RISC-V converts F64->F32. I split it into two cases: (1) F16->F64 (passing) and (2) F16->F64->F32 (skipped).
  * test_floorceiltrunc -- RISC-V floor/ceil/trunc don't preserve the sign bit for NaN inputs.
  * test__mercurial -- This was fixed after installing the git command.
* Test Suite: -D tests
  * All pass.
* Test Suite: extra tests
  * test_connection_del -- Fixed.
* Test Suite: lib-python test
  * test_ssl -- libssl internal error when TLSv1 was requested. I will send out another Pull Request for this.
  * test_tokenize -- This was caused by a bug in the RISC-V backend card marking code generator. This is fixed now.
  * test_zipfile64 -- This was caused by a bug in the RISC-V backend card marking code generator. This is fixed now.
* Test Suite: pypyjit tests
  * test_jitlogparser -- This was caused by a bug in the RISC-V backend card marking code generator. This is fixed now.
  * test_micronumpy -- This looked like a benign error (the order is slightly different). Since the performance was fine, I wrote a special case for RISC-V.

The following failures were caused by the slow CPU speed relative to wall clock time:

* Test Suite: lib-python test
  * test_json.py: test_roundtrip
  * test_textio.py: test_readline
  * test_unicode.py: test_index, test_rfind, test_rindex

These can be fixed by adding:

```
from hypothesis import settings, HealthCheck

@settings(suppress_health_check=[HealthCheck.too_slow])
```

I feel we can just ignore these for now.

This concludes the work to debug all failing tests. The full list of Git commits can be found here:

https://github.com/pypy/pypy/compare/main...loganchien:pypy:rv64

Please let me know what you think. Thank you.

Regards,
Logan

On Fri, Mar 1, 2024 at 11:52 AM Logan Chien <tzuhsiang.chien@gmail.com> wrote:
Hi Armin,
Thank you for the reply. I'll check (1) the config, (2) the frontend code that emits guard_not_invalidated, and (3) the actual performance on HW this weekend.
Regards, Logan
On Thu, Feb 29, 2024 at 4:45 AM Armin Rigo <armin.rigo@gmail.com> wrote:
Hi Logan,
On Thu, 29 Feb 2024 at 08:37, Logan Chien <tzuhsiang.chien@gmail.com> wrote:
IIUC, the difference is that guard_not_invalidated is at a different location.
But I don't understand why the backend can affect the logs in the 'jit-log-opt-' tag.
There are a few ways for the backend to influence the front-end: for example, the "support_*" class-level flags. Maybe the front-end either performed or skipped a specific optimization when compared with x86, and the only visible result is a 'guard_not_invalidated' being present or not at that point (and once it is emitted, it's not emitted again a few instructions below). Or there are some other reasons. But as a general rule, we can mostly ignore the position of 'guard_not_invalidated'. It should have no effect except in corner cases.
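To make the "ignore the position of 'guard_not_invalidated'" rule concrete, here is a minimal sketch of comparing two operation lists while disregarding that guard. The helper name `ops_match_ignoring` and both traces are hypothetical and made up for illustration; this is not how the pypyjit tests actually match traces.

```python
def ops_match_ignoring(expected, actual, ignored=("guard_not_invalidated",)):
    """Return True if both op-name sequences are equal once ignored ops are dropped."""
    def strip(ops):
        # Keep the relative order of every op that is not in the ignore list.
        return [op for op in ops if op not in ignored]
    return strip(expected) == strip(actual)


# Hypothetical traces: same operations, but the guard sits at a different position.
x86_trace = ["guard_not_invalidated", "int_add", "int_lt", "guard_true", "jump"]
rv64_trace = ["int_add", "guard_not_invalidated", "int_lt", "guard_true", "jump"]

if __name__ == "__main__":
    print(ops_match_ignoring(x86_trace, rv64_trace))
```

Dropping every occurrence, rather than moving a single one, also covers the case where one backend's log contains the guard and the other does not. A real difference in the remaining operations still shows up as a mismatch.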
Also, I found that reduce_logical_and (failed) and reduce_logical_xor (passed) are very different.
Is there more information on the details of this test? Any ideas for debugging this test case are very welcome! Thanks.
A possibility is that some of these tests are flaky in the sense of passing half by chance on x86---I vaguely remember having some troubles sometimes. It's sometimes hard to write tests without testing too many details. Others may have better comments about them. Generally, it's OK to look at what you got and compare it with what the test expects. If you can come up with a reason for why what you got is correct too, and free of real performance issues, then that's good enough.
A bientôt, Armin