Hi all,
Recently, we've got a few more of the common bug report "cannot find gc roots!". This is caused by asmgcc somehow failing to parse the ".s" files produced by gcc on Linux.
I'm investigating what can be done to improve the situation of asmgcc in a more definitive way. There are basically two solutions:
1) we improve shadowstack. This is the alternative to asmgcc, which is used on any non-Linux platform already. So far it is around 10% slower than asmgcc.
2) we improve asmgcc by finding some better way than parsing assembler files.
I worked during the past month in the branch "shadowstack-perf-2". This gives a major improvement on the placement of pushing and popping GC roots on the shadow stack. I think it's worth merging that branch in any case. On x86, it gives roughly 3-4% speed improvements; I'd guess on arm it is slightly more. (I'm comparing the performance outside JITted machine code; the JITted machine code we produce is more similar.)
The problem is that asmgcc used to be ~10% better. IMHO, 3-4% is not quite enough to be happy and kill asmgcc. Improving beyond these 3-4% seems to require some new ideas.
So I'm also thinking about ways to fix asmgcc more generally, this time focusing on Linux only; asmgcc contains old code that tries to parse MSVC output, and I bet we tried with clang at some point, but these attempts both failed. So let's focus on Linux and gcc only.
Asmgcc does two things with the parsed assembler: it computes the stack size at every point, and it tracks some marked variables backward until the previous "call" instruction.
I think we can assume that the version of gcc is not older than, say, the one on tannit32 (Ubuntu 12.04), which is gcc 4.6. At least from that version, both on x86-32 and x86-64, gcc will emit "CFI directives" (https://sourceware.org/binutils/docs/as/CFI-directives.html). These are a saner way to get the information about the current stack size.
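To make the CFI idea concrete, here is a minimal sketch of reading the stack size off the directives instead of interpreting every push/sub/add. It is not asmgcc's code: the function name `cfa_offsets` and the sample assembly are made up for illustration, and it assumes the function never switches the CFA to a frame pointer (no `.cfi_def_cfa_register`), which a real tool would have to handle. Only three directives are interpreted; the default starting offset of 8 is the x86-64 value at function entry (the return address), and would be 4 on x86-32.

```python
import re

# Only the CFI directives that change the CFA offset are interpreted here.
_CFI_RE = re.compile(
    r"^\s*\.cfi_(def_cfa_offset|adjust_cfa_offset|def_cfa)\s+(.*)$")


def cfa_offsets(asm_text, initial_offset=8):
    """Return (instruction, cfa_offset) pairs for one function body.

    The offset paired with an instruction is the one in effect just
    before that instruction executes, because gcc emits the directive
    after the instruction that moved the stack pointer.
    """
    offset = initial_offset
    result = []
    for line in asm_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.endswith(":"):
            continue                      # blank line or label
        match = _CFI_RE.match(stripped)
        if match:
            kind, args = match.groups()
            if kind == "def_cfa_offset":
                offset = int(args)
            elif kind == "adjust_cfa_offset":
                offset += int(args)
            else:                         # def_cfa: "register, offset"
                offset = int(args.split(",")[1])
            continue
        if stripped.startswith("."):
            continue                      # any other assembler directive
        result.append((stripped, offset))
    return result


# Hand-written sample in the style of gcc's x86-64 output (illustrative).
SAMPLE = """
foo:
    .cfi_startproc
    pushq   %rbx
    .cfi_def_cfa_offset 16
    .cfi_offset 3, -16
    subq    $32, %rsp
    .cfi_def_cfa_offset 48
    call    bar
    addq    $32, %rsp
    .cfi_def_cfa_offset 16
    popq    %rbx
    .cfi_def_cfa_offset 8
    ret
    .cfi_endproc
"""

for instruction, offset in cfa_offsets(SAMPLE):
    print(offset, instruction)
```

For this sample the `call bar` line is reported with an offset of 48, i.e. the frame size at the call site, without the parser knowing anything about `pushq` or `subq`.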
About the backward tracking, we need to have a complete understanding of all instructions, even if e.g. for any xmm instruction we just say "can't handle GC pointers". The backward tracking itself is often foiled because the assembler is lacking a way to know clearly "this call never returns" (e.g. calls to abort(), or to some RPython helper that prints stuff and aborts). In other words, the control flow is sometimes hard to get correctly, because a "call" generally returns, but not always. Such mistakes can produce bogus results (including "cannot find gc roots!").
What can we do about that? Maybe we can compile with "-S -fdump-final-insns". This dumps a gcc-specific summary of the RTL, which is the final intermediate representation, and which looks like it is in one-to-one correspondence with the actual assembly. It would be a better input for the backward-tracker, because we don't have to handle tons of instructions with unknown effects, and because it contains explicit points at which control flow cannot pass. On the other hand, we'd need to parse both the .s and this dump in parallel, matching them as we go along. But I still think it would be better than now.
Of course the best would be to get rid of asmgcc completely...
This mail is meant to be a dump of my current mind's state :-)
A bientôt,
Armin.
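As an illustration of the "this call never returns" problem described above, here is a small sketch of how one missing piece of knowledge changes the control flow a backward tracker sees. It is a toy model, not asmgcc's code: the helper names `successors` and `predecessors`, the `NORETURN` set, and the instruction list are all hypothetical, and instructions are simplified to "opcode ... target" strings.

```python
# Assumed set of functions known never to return (hypothetical list).
NORETURN = {"abort", "exit"}

# Toy instruction stream; a trailing ":" marks a label.
INSNS = [
    "testq %rax, %rax",   # 0
    "jne .L2",            # 1
    "call abort",         # 2: never returns
    ".L2:",               # 3
    "movq %rbx, %rdi",    # 4
    "call bar",           # 5
    "ret",                # 6
]


def successors(insns, noreturn=NORETURN):
    """Map each instruction index to the indices control can reach next."""
    labels = {ins[:-1]: i for i, ins in enumerate(insns) if ins.endswith(":")}
    succ = {}
    for i, ins in enumerate(insns):
        parts = ins.split()
        op, target = parts[0], parts[-1]
        fallthrough = [i + 1] if i + 1 < len(insns) else []
        if op == "ret":
            succ[i] = []
        elif op == "jmp":
            succ[i] = [labels[target]]
        elif op == "call" and target in noreturn:
            succ[i] = []                  # control flow stops here
        elif op.startswith("j"):
            succ[i] = [labels[target]] + fallthrough   # conditional jump
        else:
            succ[i] = fallthrough
    return succ


def predecessors(succ, index):
    """Indices a backward tracker must visit when walking up from `index`."""
    return sorted(i for i, nxt in succ.items() if index in nxt)


# With the noreturn knowledge, .L2 is only reached from the jump.
print(predecessors(successors(INSNS), 3))
# Without it, the tracker also walks into the abort() path.
print(predecessors(successors(INSNS, noreturn=set()), 3))
```

In the second case a tracker following %rbx backward from "call bar" would also have to make sense of whatever the aborting path left in the registers, which is the kind of bogus path that can end in "cannot find gc roots!".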
Hi Armin,
I don't have very deep opinions, but I'm worried about one particular thing. GCC tends to change its IR with every release; wouldn't parsing this be a nightmare that has to be updated with each new release of gcc?
On Mon, May 30, 2016 at 9:18 AM, Armin Rigo <arigo@tunes.org> wrote:
Hi all,
Recently, we've got a few more of the common bug report "cannot find gc roots!". This is caused by asmgcc somehow failing to parse the ".s" files produced by gcc on Linux.
I'm investigating what can be done to improve the situation of asmgcc in a more definitive way. There are basically two solutions:
1) we improve shadowstack. This is the alternative to asmgcc, which is used on any non-Linux platform already. So far it is around 10% slower than asmgcc.
2) we improve asmgcc by finding some better way than parsing assembler files.
I worked during the past month in the branch "shadowstack-perf-2". This gives a major improvement on the placement of pushing and popping GC roots on the shadow stack. I think it's worth merging that branch in any case. On x86, it gives roughly 3-4% speed improvements; I'd guess on arm it is slightly more. (I'm comparing the performance outside JITted machine code; the JITted machine code we produce is more similar.)
The problem is that asmgcc used to be ~10% better. IMHO, 3-4% is not quite enough to be happy and kill asmgcc. Improving beyond these 3-4% seems to require some new ideas.
So I'm also thinking about ways to fix asmgcc more generally, this time focusing on Linux only; asmgcc contains old code that tries to parse MSVC output, and I bet we tried with clang at some point, but these attempts both failed. So let's focus on Linux and gcc only.
Asmgcc does two things with the parsed assembler: it computes the stack size at every point, and it tracks some marked variables backward until the previous "call" instruction.
I think we can assume that the version of gcc is not older than, say, the one on tannit32 (Ubuntu 12.04), which is gcc 4.6. At least from that version, both on x86-32 and x86-64, gcc will emit "CFI directives" (https://sourceware.org/binutils/docs/as/CFI-directives.html). These are a saner way to get the information about the current stack size.
About the backward tracking, we need to have a complete understanding of all instructions, even if e.g. for any xmm instruction we just say "can't handle GC pointers". The backward tracking itself is often foiled because the assembler is lacking a way to know clearly "this call never returns" (e.g. calls to abort(), or to some RPython helper that prints stuff and aborts). In other words, the control flow is sometimes hard to get correctly, because a "call" generally returns, but not always. Such mistakes can produce bogus results (including "cannot find gc roots!").
What can we do about that? Maybe we can compile with "-S -fdump-final-insns". This dumps a gcc-specific summary of the RTL, which is the final intermediate representation, and which looks like it is in one-to-one correspondence with the actual assembly. It would be a better input for the backward-tracker, because we don't have to handle tons of instructions with unknown effects, and because it contains explicit points at which control flow cannot pass. On the other hand, we'd need to parse both the .s and this dump in parallel, matching them as we go along. But I still think it would be better than now.
Of course the best would be to get rid of asmgcc completely...
This mail is meant to be a dump of my current mind's state :-)
A bientôt,
Armin.
Participants (2): Armin Rigo, Maciej Fijalkowski