Given those facts, I think including pybench is a mistake. It does not allow a fair or meaningful comparison between implementations, which is one of the things the suite is supposed to support in the future.
That makes the results of this particular benchmark easy to misinterpret, and it degrades the quality of the performance data as a whole.
The same applies to several Unladen Swallow microbenchmarks, such as bm_call_method_*, bm_call_simple, and bm_unpack_sequence.
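For context, microbenchmarks of this kind time one tiny bytecode-level operation in a tight loop, so the numbers mostly reflect interpreter internals rather than workload performance. A minimal sketch (not the actual Unladen Swallow code) of the pattern something like bm_unpack_sequence measures:

```python
import time

def bench_unpack_sequence(loops: int = 100_000) -> float:
    """Time repeated tuple unpacking, the kind of single-operation
    pattern an unpack_sequence-style microbenchmark isolates.
    Note: illustrative only; the real benchmark differs in detail."""
    seq = tuple(range(10))
    t0 = time.perf_counter()
    for _ in range(loops):
        # The unpack itself is the entire measured workload.
        a, b, c, d, e, f, g, h, i, j = seq
    return time.perf_counter() - t0

elapsed = bench_unpack_sequence()
```

Because the loop exercises a single opcode path, one implementation's specific optimization (or lack of one) dominates the result, which is why such numbers say little when compared across implementations.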
I don't think we should exclude implementation-specific benchmarks from a common suite.
They won't necessarily allow comparisons between implementations, but they do provide important information about the progress made in optimizing a particular implementation.