[New-bugs-announce] [issue46675] Allow more than 16 items in split-keys dicts and "virtual" object dicts.
Mark Shannon
report at bugs.python.org
Mon Feb 7 06:52:04 EST 2022
New submission from Mark Shannon <mark at hotpy.org>:
https://bugs.python.org/issue45340 and https://github.com/python/cpython/pull/28802 allowed "virtual" object dicts (see faster-cpython/ideas#72 for full details).
In order for this to work, we need to keep the insertion order on the values. The initial version (https://github.com/python/cpython/pull/28802) used a 64 bit value as a vector of 16 4-bit values, which allows only 16 items per values array.
Stats gathered from the standard benchmark suite and informal evidence from elsewhere suggests that this causes a significant (5% and upwards) of these dicts to be materialized due to exceeding the 16 item limit.
An alternative design that would allow up to ~254 items in the values array is to make the insertion order vector an array of bytes. The capacity is 254 as we need a byte for size, and another for capacity.
This will increase the size of the values a bit for sizes from 7 to 15, but save a lot of memory for sizes 17+, as keys could still be shared.
Pros:
No need to materialize dicts of size 16+, saving ~3/4 of the memory per dict and helping specialization.
Cons:
Extra memory write to store a value*
1 extra word for values of size 7 to 14, 2 extra for size 15.
Some extra complexity.
*In a hypothetical optimized JIT, the insertion order vector would be stored as a single write for several writes, so this would make no difference.
----------
assignee: Mark.Shannon
messages: 412735
nosy: Mark.Shannon
priority: normal
severity: normal
status: open
title: Allow more than 16 items in split-keys dicts and "virtual" object dicts.
versions: Python 3.11
_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue46675>
_______________________________________
More information about the New-bugs-announce
mailing list