[Python-Dev] Compact ordered set

Tue Feb 26 11:33:28 EST 2019

On Wed, Feb 27, 2019 at 12:37 AM Victor Stinner <vstinner at redhat.com> wrote:
>
> On Tue, Feb 26, 2019 at 12:33 INADA Naoki <songofacandy at gmail.com> wrote:
> > - unpickle_list: 8.48 us +- 0.09 us -> 12.8 us +- 0.5 us: 1.52x slower (+52%)
> > ...
> > unpickle and unpickle_list show a massive slowdown.  I suspect this slowdown
> > is not caused by the set change.  Linux perf shows that many page faults happen
> > in pymalloc_malloc.  I think the change in memory usage accidentally hits a weak
> > point of pymalloc.  I will try to investigate it.
>
> Please contact me to get access to speed.python.org server. *Maybe*
> your process to run benchmarks is not reliable and you are getting
> "noise" in results.

My company gives me a dedicated Linux machine with a Core(TM) i7-6700,
so I don't think the issue is my machine.

perf shows that this line causes many page faults:
https://github.com/python/cpython/blob/c606a9cbd48f69d3f4a09204c781dda9864218b7/Objects/obmalloc.c#L1513

This line is executed when pymalloc can't reuse an existing pool and has to use a new one.
So I suspect pymalloc has a weak point here, and adding more hysteresis
may help.  But I'm not sure yet.  I'll investigate it later.
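To make the hysteresis idea concrete, here is a toy model. It is not pymalloc's actual code, and `ToyAllocator`, `keep_empty` and `fresh_chunks` are hypothetical names. Memory is handed out in fixed-size chunks standing in for pools; a chunk that becomes empty is either released immediately or kept in a small cache. A workload that oscillates across a chunk boundary makes the no-hysteresis version grab brand-new memory on every cycle; in a real allocator, freshly obtained memory is typically faulted in on first touch.

```python
class ToyAllocator:
    """Toy LIFO allocator: blocks live in fixed-size chunks (think "pools").

    Hypothetical model for illustration only; real pymalloc is more complex.
    """

    def __init__(self, blocks_per_chunk, keep_empty=0):
        self.blocks_per_chunk = blocks_per_chunk
        self.keep_empty = keep_empty  # hysteresis: empty chunks kept instead of released
        self.live_blocks = 0
        self.chunks = 0               # chunks holding at least one live block
        self.cached = 0               # empty chunks kept around for reuse
        self.fresh_chunks = 0         # never-touched memory (would page-fault on first use)

    def alloc(self):
        if self.live_blocks == self.chunks * self.blocks_per_chunk:
            # All live chunks are full, so another chunk is needed.
            if self.cached:
                self.cached -= 1      # reuse a kept chunk: its pages are already mapped
            else:
                self.fresh_chunks += 1
            self.chunks += 1
        self.live_blocks += 1

    def free(self):
        self.live_blocks -= 1
        if self.live_blocks == (self.chunks - 1) * self.blocks_per_chunk:
            # LIFO workload: the last chunk just became empty.
            self.chunks -= 1
            if self.cached < self.keep_empty:
                self.cached += 1      # keep it (hysteresis)
            # otherwise it is released and must be obtained fresh next time


def fresh_chunks_for(keep_empty, cycles=1000, blocks_per_chunk=4):
    a = ToyAllocator(blocks_per_chunk, keep_empty)
    for _ in range(blocks_per_chunk):  # fill exactly one chunk
        a.alloc()
    for _ in range(cycles):            # oscillate across the chunk boundary
        a.alloc()
        a.free()
    return a.fresh_chunks


print(fresh_chunks_for(keep_empty=0))  # 1001
print(fresh_chunks_for(keep_empty=1))  # 2
```

With a single cached chunk, the same workload touches new memory only twice instead of on every cycle. Whether the unpickle benchmarks actually hit this kind of pattern is exactly what still needs investigating.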

If you want to reproduce it, try this commit.
https://github.com/methane/cpython/pull/16/commits/3178dc96305435c691af83515b2e4725ab6eb826

Another interesting point: this huge slowdown happens only when bm_pickle.py
is run through pyperformance.  When I run it directly, the slowdown is
much smaller.
So I think this issue is tightly coupled with how pages are mapped.

$ ./python -m performance.benchmarks.bm_pickle --compare-to ./py-master unpickle
py-master: ..................... 27.7 us +- 1.8 us
python: ..................... 28.7 us +- 2.5 us

Mean +- std dev: [py-master] 27.7 us +- 1.8 us -> [python] 28.7 us +- 2.5 us: 1.04x slower (+4%)
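As a rough cross-check that doesn't need perf, the standard `resource` module (Unix only) can report minor page faults for the current process. This is a sketch: `faults_during` is a hypothetical helper, the payload is made up and only loosely resembles unpickle_list (it is not the bm_pickle data), and absolute counts depend on the platform and on allocator state.

```python
import pickle
import resource  # Unix-only


def minor_faults():
    # ru_minflt: page faults serviced without I/O (e.g. first touch of fresh pages).
    return resource.getrusage(resource.RUSAGE_SELF).ru_minflt


def faults_during(func, repeat=100):
    """Return the number of minor page faults taken while calling func() repeatedly."""
    before = minor_faults()
    for _ in range(repeat):
        func()
    return minor_faults() - before


# Hypothetical workload loosely resembling "unpickle_list".
payload = pickle.dumps([list(range(100)) for _ in range(1000)])
print(faults_during(lambda: pickle.loads(payload)))
```

Running this under both `./py-master` and the patched `./python` could hint at whether the extra faults come from the interpreter change itself or from how the benchmark process is launched.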

>
> > On the other hand, meteor_contest shows a 13% speedup.  It uses sets.
> > Other benchmarks don't show significant performance changes.
>
> I recall that some benchmarks are unstable and depend a lot on how you
> run the benchmark and on how Python is compiled (e.g. with PGO or not).

From reading the bm_meteor_contest.py source, it uses frozenset heavily,
so I think this is a real performance gain.

Anyway, pyperformance is not perfect and doesn't cover all set
workloads.  I need to write more benchmarks.
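As a starting point for such benchmarks, here is a sketch using only the standard `timeit` module. The workloads and sizes are hypothetical, chosen to exercise operations a set layout change could affect: construction, hit and miss lookups, deletion/insertion churn, iteration, and a frozenset `isdisjoint` loop in the spirit of meteor_contest (not taken from it). For publishable numbers, pyperf's worker processes and calibration would be more reliable than bare `timeit`.

```python
import timeit

# Hypothetical set workloads; sizes are arbitrary.
N = 10_000
keys = list(range(N))
misses = list(range(N, 2 * N))
base = set(keys)
small = [frozenset(range(i, i + 8)) for i in range(0, N, 8)]
probe = frozenset(range(3, 11))


def build():
    return set(keys)


def lookup_hit():
    return sum(1 for k in keys if k in base)


def lookup_miss():
    return sum(1 for k in misses if k in base)


def churn():
    s = set(keys)
    for k in keys:  # interleave deletions and insertions
        s.discard(k)
        s.add(k)
    return s


def iterate():
    return sum(base)


def frozenset_disjoint():
    return sum(1 for fs in small if probe.isdisjoint(fs))


WORKLOADS = {f.__name__: f for f in
             (build, lookup_hit, lookup_miss, churn, iterate, frozenset_disjoint)}

if __name__ == "__main__":
    for name, func in WORKLOADS.items():
        best = min(timeit.repeat(func, number=10, repeat=5))
        print(f"{name:20s} {best / 10 * 1e6:8.1f} us")
```

Running the same script with `./py-master` and the patched `./python` gives a quick per-operation comparison before turning any of these into pyperformance benchmarks.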

-- 
INADA Naoki  <songofacandy at gmail.com>