asmgcc versus shadowstack - pypy-dev

30 May 2016

Hi all,

Recently, we've received a few more instances of the common bug report
"cannot find gc roots!".  This is caused by asmgcc somehow failing to
parse the ".s" files produced by gcc on Linux.

I'm investigating what can be done to improve the situation of asmgcc
in a more definitive way.  There are basically two solutions:

1) we improve shadowstack.  This is the alternative to asmgcc, and it
is already used on every non-Linux platform.  So far it is around 10%
slower than asmgcc.

2) we improve asmgcc by finding some better way than parsing assembler files.

Over the past month I have been working in the branch
"shadowstack-perf-2".  It gives a major improvement in the placement
of the pushes and pops of GC roots on the shadow stack.  I think it's
worth merging that branch in any case.  On x86, it gives a speed
improvement of roughly 3-4%; I'd guess on ARM it is slightly more.
(I'm comparing the performance outside JITted machine code; the
JITted machine code we produce is more similar in both cases.)
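
For readers unfamiliar with the idea, here is a toy Python model of
what "pushing and popping GC roots on the shadow stack" means.  It is
only a conceptual sketch: the real mechanism lives in the generated C
code, and the names ShadowStack, caller and callee are made up for
this illustration.

```python
class ShadowStack:
    """Toy model: an explicit side stack holding the live GC roots."""

    def __init__(self):
        self._roots = []

    def push(self, *objs):
        self._roots.extend(objs)

    def pop(self, count):
        # Pop `count` roots and return them.  A moving GC may have
        # updated them, so the caller reloads its locals from here.
        if count == 0:
            return []
        popped = self._roots[-count:]
        del self._roots[-count:]
        return popped

    def walk(self):
        # What a collector would enumerate as roots right now.
        return list(self._roots)


def callee(ss, seen):
    # Pretend a collection happens here: record the roots it would find.
    seen.append(ss.walk())


def caller(ss, seen):
    a, b = "obj_a", "obj_b"   # stand-ins for GC-managed pointers
    ss.push(a, b)             # save the roots before the call
    callee(ss, seen)
    a, b = ss.pop(2)          # reload the roots after the call
    return a, b
```

Every push/pop pair around a call costs a few instructions, which is
why the placement of these operations matters for performance.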

The problem is that asmgcc used to be ~10% better.  IMHO, 3-4% is not
quite enough to be happy and kill asmgcc.  Improving beyond these 3-4%
seems to require some new ideas.

So I'm also thinking about ways to fix asmgcc more generally, this
time focusing on Linux only; asmgcc contains old code that tries to
parse MSVC output, and I bet we tried with clang at some point, but
these attempts both failed.  So let's focus on Linux and gcc only.

Asmgcc does two things with the parsed assembler: it computes the
stack size at every point, and it tracks some marked variables
backward until the previous "call" instruction.

I think we can assume that the version of gcc is not older than, say,
the one on tannit32 (Ubuntu 12.04), which is gcc 4.6.  At least from
that version, both on x86-32 and x86-64, gcc will emit "CFI
directives" (https://sourceware.org/binutils/docs/as/CFI-directives.html).
These are a saner way to get the information about the current stack
size.
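
As a rough illustration of reading the stack size from CFI directives,
here is a minimal Python sketch.  It is not the asmgcc code; the
function name and the sample assembly are invented for the example.
It assumes x86-64 (the CFA offset is 8 on function entry) and only
understands ".cfi_def_cfa_offset" and ".cfi_adjust_cfa_offset"; a real
tool would also need ".cfi_def_cfa", ".cfi_remember_state" and
".cfi_restore_state".

```python
import re

# Only the two directives that directly set or adjust the CFA offset.
CFI_OFFSET_RE = re.compile(r'\.cfi_(def|adjust)_cfa_offset\s+(-?\d+)$')


def track_cfa_offsets(asm_lines, entry_offset=8):
    """Return (instruction, CFA offset before it runs) pairs."""
    offset = entry_offset
    result = []
    for raw in asm_lines:
        line = raw.strip()
        if not line:
            continue
        if line == '.cfi_startproc':
            offset = entry_offset      # a new function starts here
            continue
        m = CFI_OFFSET_RE.match(line)
        if m:
            value = int(m.group(2))
            if m.group(1) == 'def':
                offset = value         # absolute offset
            else:
                offset += value        # relative adjustment
            continue
        if line.startswith('.cfi_'):
            continue                   # other CFI directives: ignored here
        result.append((line, offset))
    return result


SAMPLE = """
foo:
    .cfi_startproc
    pushq %rbx
    .cfi_def_cfa_offset 16
    subq $32, %rsp
    .cfi_def_cfa_offset 48
    call bar
    addq $32, %rsp
    .cfi_def_cfa_offset 16
    popq %rbx
    .cfi_def_cfa_offset 8
    ret
    .cfi_endproc
""".splitlines()

offsets = dict(track_cfa_offsets(SAMPLE))
```

With this sample, the offset recorded at "call bar" is 48, without the
tool having to understand what "pushq" or "subq" do to the stack
pointer.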

About the backward tracking, we need to have a complete understanding
of all instructions, even if e.g. for any xmm instruction we just say
"can't handle GC pointers".  The backward tracking itself is often
foiled because the assembler is lacking a way to know clearly "this
call never returns" (e.g. calls to abort(), or to some RPython helper
that prints stuff and aborts).  In other words, the control flow is
sometimes hard to get correctly, because a "call" generally returns,
but not always.  Such mistakes can produce bogus results (including
"cannot find gc roots!").

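To make the control-flow problem concrete, here is a small Python
sketch of computing instruction successors.  The instruction format,
the function name and the set of non-returning functions are all
assumptions made for the example, not what asmgcc does internally.

```python
def successors(instrs, noreturn=frozenset(['abort'])):
    """Map each instruction index to the indices control may reach next.

    `instrs` is a list of (opcode, operand) pairs; `noreturn` is the
    set of callee names assumed never to return.
    """
    labels = {operand: i
              for i, (opcode, operand) in enumerate(instrs)
              if opcode == 'label'}
    succ = {}
    for i, (opcode, operand) in enumerate(instrs):
        fallthrough = [i + 1] if i + 1 < len(instrs) else []
        if opcode == 'jmp':
            succ[i] = [labels[operand]]
        elif opcode == 'jcc':
            succ[i] = fallthrough + [labels[operand]]
        elif opcode == 'ret':
            succ[i] = []
        elif opcode == 'call' and operand in noreturn:
            succ[i] = []               # control never comes back
        else:
            succ[i] = fallthrough
    return succ


CODE = [
    ('jcc', 'fail'),        # 0
    ('call', 'work'),       # 1
    ('ret', None),          # 2
    ('label', 'fail'),      # 3
    ('call', 'abort'),      # 4: never returns
    ('label', 'unrelated'), # 5: code that merely follows in the file
]

with_info = successors(CODE)
without_info = successors(CODE, noreturn=frozenset())
```

Without the noreturn information, instruction 4 appears to fall
through into instruction 5, so a backward tracker starting at 5 would
wrongly walk into the "abort" path.
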
What can we do about that?  Maybe we can compile with "-S
-fdump-final-insns".  This dumps a gcc-specific summary of the RTL,
gcc's final intermediate representation, which appears to be in
one-to-one correspondence with the actual assembly.  It would be a
better input for the backward-tracker, because we don't have to handle
tons of instructions with unknown effects, and because it contains
explicit points at which control flow cannot pass.  On the other hand,
we'd need to parse both the .s and this dump in parallel, matching
them as we go along.  But I still think it would be better than now.

Of course the best would be to get rid of asmgcc completely...

This mail is meant to be a dump of my current mind's state :-)

See you soon,

Armin.

Participants: Armin Rigo, Maciej Fijalkowski