[Python-Dev] Compact ordered set

26 Feb 2019

      Hello, folks.

I'm working on compact and ordered set implementation.
It has internal data structure similar to new dict from Python 3.6.

It is still work in progress.  Comments, tests, and documents
should be updated.  But it passes existing tests excluding
test_sys and test_gdb (both tests checks implementation detail)
https://github.com/methane/cpython/pull/16

Before completing this work, I want to evaluate it.
Following is my current thoughts about the compact ordered set.

## Preserving insertion order

Order is not fundamental for set.  There are no order in set in the
math world.

But it is convenient sometime in real world.  For example, it makes
doctest easy.  When writing set to logs, we can use "grep" command
if print order is stable.  pyc is stable without PYTHONHASHSEED=0 hack.

Additionally, consistency with dict is desirable.  It removes one pitfall for
new Python users.  "Remove duplicated items from list" idiom become
`list(set(duplicated))` from `list(dict.fromkeys(duplicated))`.

## Memory efficiency

Hash table has dilemma.  To reduce collision rate,  hash table
should be sparse.  But it wastes memory.

Since current set is optimized for both of hit and miss cases,
it is more sparse than dict.  (It is bit surprise that set typically uses
more memory than same size dict!)

New implementation partially solve this dilemma.  It has sparse
"index table" which items are small (1byte when table size <= 256,
2bytes when table size <= 65536), and dense entry table (each item
has key and hash, which is 16bytes on 64bit system).

I use 1/2 for capacity rate for now.  So new implementation is
memory efficient when len(s) <= 32768.  But memory efficiency is
roughly equal to current implementation when 32768 < len(s) <= 2**31,
and worse than current implementation when len(s) > 2**31.

Here is quick test about memory usage.
https://gist.github.com/methane/98b7f43fc00a84964f66241695112e91

# Performance

pyperformance result:

$ ./python -m perf compare_to master.json oset2.json -G --min-speed=2
Slower (3):
- unpickle_list: 8.48 us +- 0.09 us -> 12.8 us +- 0.5 us: 1.52x slower (+52%)
- unpickle: 29.6 us +- 2.5 us -> 44.1 us +- 2.5 us: 1.49x slower (+49%)
- regex_dna: 448 ms +- 3 ms -> 462 ms +- 2 ms: 1.03x slower (+3%)

Faster (4):
- meteor_contest: 189 ms +- 1 ms -> 165 ms +- 1 ms: 1.15x faster (-13%)
- telco: 15.8 ms +- 0.2 ms -> 15.3 ms +- 0.2 ms: 1.03x faster (-3%)
- django_template: 266 ms +- 6 ms -> 259 ms +- 3 ms: 1.03x faster (-3%)
- unpickle_pure_python: 818 us +- 6 us -> 801 us +- 9 us: 1.02x faster (-2%)

Benchmark hidden because not significant (49)

unpickle and unpickle_list shows massive slowdown.  I suspect this slowdown
is not caused from set change.  Linux perf shows many pagefault is happened
in pymalloc_malloc.  I think memory usage changes hit weak point of pymalloc
accidentally.  I will try to investigate it.

On the other hand, meteor_contest shows 13% speedup.  It uses set.
Other doesn't show significant performance changes.

I need to write more benchmarks for various set workload.
I expect new set is faster on simple creation, iteration and destruction.
Especially, sequential iteration and deletion will reduce cache misses.
(e.g. https://bugs.python.org/issue32846 )

On the other hand, new implementation will be slow on complex
(heavy random add & del) case.

-----

Any comments are welcome.  And any benchmark for set workloads
are very welcome.

Regards,
--
INADA Naoki