Review request: issue 27350, compact ordered dict

Hi, devs. I've implemented compact and ordered dictionary [1], which PyPy implemented in 2015 [2]. Since it is my first large patch, I would like to have enough time for review cycle by Python 3.6 beta1. Could someone review it? [1] http://bugs.python.org/issue27350 [2] https://morepypy.blogspot.jp/2015/01/faster-more-memory-efficient-and-more.h... -- INADA Naoki <songofacandy@gmail.com>

Hello everyone. I did do only a cursory look on that one, but I would like to reiterate that this gives huge benefits in general and we measured nice speedups on pypy (where all the dicts are ordered forever): * you can essentially kill OrderedDict or make it almost OrderedDict = dict (in pypy it's a simple dict subclass that has one or two extra things that OrderedDict has in the API) * there are nice speedups * the C version of OrderedDict can be killed * it saves memory, on 64bit by quite a bit (not everyone stores more than 4bln items in a dictionary) * it solves the problem of tests relying on order in dictionaries In short, it has no downsides On Tue, Aug 9, 2016 at 3:12 PM, INADA Naoki <songofacandy@gmail.com> wrote:

Thanks you for supporting. BTW, my patch doesn't include OrderedDict speedup. (I'll try it when compact dict is merged.) In case of Python 3, OrderedDict.move_to_end(key, last=False) can append item at front. So implementing OrderedDict based on compact dict is not so easy as OrderedDict = dict. My idea is using compact_array (described in PyPy blog) as ring, not fixed capacity array. It means one of these two: a. dict has one additional word and support ring internally. b. OrderedDict reimplements many APIs (iterating, resizing, etc...) to support ring. I like (a) because it's easier to implement and maintain. But I don't know it can be in time for Python 3.6 beta. On Thu, Aug 11, 2016 at 12:06 AM Maciej Fijalkowski <fijall@gmail.com> wrote:

Hi Inada, On 10 August 2016 at 18:52, INADA Naoki <songofacandy@gmail.com> wrote:
a. dict has one additional word and support ring internally. b. OrderedDict reimplements many APIs (iterating, resizing, etc...) to support ring.
There is a solution "c." which might be simpler. Let's think first about what occurs with a normal dict (with your patch, or in PyPy) if the user repeatedly pops the oldest item and adds a new item. In this case, the dict will accumulate dead entries at the start of the list, and when it gets too many of them, it needs to make a complete copy of the list to get rid of them. (This is not a problem as it is amortized over many pop-and-add operations.) So now, when the user calls move_to_end(last=False), we can do the opposite. We look if there are already some deleted entries at the start, and if so, we add the item at the place of the last deleted entry. If there aren't any, then we make a copy of the list that *adds* some number of entries at the start, which are initially marked as deleted... A bientôt, Armin.

On 10/08/2016 17:06, Maciej Fijalkowski wrote:
* there are nice speedups
in this blog post https://morepypy.blogspot.fr/2015/01/faster-more-memory-efficient-and-more.h... it is mentioned big speedup only on microbenchmark and small speedups on pypy benchmark. is it what you call nice speedups or does there is other things ?

On Mon, Aug 15, 2016 at 6:02 AM, Xavier Combelle <xavier.combelle@gmail.com> wrote:
Yes, making dictionaries faster by a bit would not give you huge speedups everywhere. It'll give you a small, measurable speedup a bit everywhere. This is much better than a lot of things that cpython does which is a performance win. Note that there are two PEPs (sorted order in kwargs and sorted order in class names) which would be superseded by just reviewing this patch and merging it. Best regards, Maciej Fijalkowski

On Sun, Aug 28, 2016 at 2:05 PM, Guido van Rossum <guido@python.org> wrote:
Hopefully some core dev(s) can work on this during the core sprint, which is from Sept 5-9.
OK. While I'm in Japan (UTC+9) and cannot join the sprint, I'll be active as possible while the sprint. Thank you!
-- --Guido van Rossum (python.org/~guido)
-- INADA Naoki <songofacandy@gmail.com>

Hello everyone. I did do only a cursory look on that one, but I would like to reiterate that this gives huge benefits in general and we measured nice speedups on pypy (where all the dicts are ordered forever): * you can essentially kill OrderedDict or make it almost OrderedDict = dict (in pypy it's a simple dict subclass that has one or two extra things that OrderedDict has in the API) * there are nice speedups * the C version of OrderedDict can be killed * it saves memory, on 64bit by quite a bit (not everyone stores more than 4bln items in a dictionary) * it solves the problem of tests relying on order in dictionaries In short, it has no downsides On Tue, Aug 9, 2016 at 3:12 PM, INADA Naoki <songofacandy@gmail.com> wrote:

Thanks you for supporting. BTW, my patch doesn't include OrderedDict speedup. (I'll try it when compact dict is merged.) In case of Python 3, OrderedDict.move_to_end(key, last=False) can append item at front. So implementing OrderedDict based on compact dict is not so easy as OrderedDict = dict. My idea is using compact_array (described in PyPy blog) as ring, not fixed capacity array. It means one of these two: a. dict has one additional word and support ring internally. b. OrderedDict reimplements many APIs (iterating, resizing, etc...) to support ring. I like (a) because it's easier to implement and maintain. But I don't know it can be in time for Python 3.6 beta. On Thu, Aug 11, 2016 at 12:06 AM Maciej Fijalkowski <fijall@gmail.com> wrote:

Hi Inada, On 10 August 2016 at 18:52, INADA Naoki <songofacandy@gmail.com> wrote:
a. dict has one additional word and support ring internally. b. OrderedDict reimplements many APIs (iterating, resizing, etc...) to support ring.
There is a solution "c." which might be simpler. Let's think first about what occurs with a normal dict (with your patch, or in PyPy) if the user repeatedly pops the oldest item and adds a new item. In this case, the dict will accumulate dead entries at the start of the list, and when it gets too many of them, it needs to make a complete copy of the list to get rid of them. (This is not a problem as it is amortized over many pop-and-add operations.) So now, when the user calls move_to_end(last=False), we can do the opposite. We look if there are already some deleted entries at the start, and if so, we add the item at the place of the last deleted entry. If there aren't any, then we make a copy of the list that *adds* some number of entries at the start, which are initially marked as deleted... A bientôt, Armin.

On 10/08/2016 17:06, Maciej Fijalkowski wrote:
* there are nice speedups
in this blog post https://morepypy.blogspot.fr/2015/01/faster-more-memory-efficient-and-more.h... it is mentioned big speedup only on microbenchmark and small speedups on pypy benchmark. is it what you call nice speedups or does there is other things ?

On Mon, Aug 15, 2016 at 6:02 AM, Xavier Combelle <xavier.combelle@gmail.com> wrote:
Yes, making dictionaries faster by a bit would not give you huge speedups everywhere. It'll give you a small, measurable speedup a bit everywhere. This is much better than a lot of things that cpython does which is a performance win. Note that there are two PEPs (sorted order in kwargs and sorted order in class names) which would be superseded by just reviewing this patch and merging it. Best regards, Maciej Fijalkowski

On Sun, Aug 28, 2016 at 2:05 PM, Guido van Rossum <guido@python.org> wrote:
Hopefully some core dev(s) can work on this during the core sprint, which is from Sept 5-9.
OK. While I'm in Japan (UTC+9) and cannot join the sprint, I'll be active as possible while the sprint. Thank you!
-- --Guido van Rossum (python.org/~guido)
-- INADA Naoki <songofacandy@gmail.com>
participants (6)
-
Armin Rigo
-
Guido van Rossum
-
INADA Naoki
-
Maciej Fijalkowski
-
Xavier Combelle
-
Yury Selivanov