To begin with, please compare performance of dict-with-None-values to that of a set, for operations that can be expressed by both (e.g. both have update()).
Good idea. It would be useful to have a baseline comparison. I emboldened the comparisons with a notable difference. Tested on: Version: Python 3.8.0 OS: Linux kernel 5.4.5 CPU: Intel i5-4460 Initialization (about the same):
timeit.timeit("set(i for i in range(1000))", number=100_000) 4.878508095000143 timeit.timeit("{i: None for i in range(1000)}", number=100_000) 4.9192055170024105
Membership (about the same):
timeit.timeit("random.randint(0, 999) in s", setup="import random; s = set(i for i in range(1000))", number=10_000_000) 7.992674231001729 timeit.timeit("random.randint(0, 999) in d", setup="import random; d = {i: None for i in range(1000)}", number=10_000_000) 8.032404395999038
*Add* (much faster for dicts):
timeit.timeit("s = set(); s.add(0)", number=100_000_000) 13.330938750001224 timeit.timeit("d = {}; d[0] = None", number=100_000_000) 5.788865385999088
*Update* (much faster for updating sets):
timeit.timeit("s.update(other_s)", setup="s = set(); other_s = set(i for i in range(1000))", number=1_000_000) 6.2428832090008655 timeit.timeit("d.update(other_d)", setup="d = {}; other_d = {i: None for i in range(1000)}", number=1_000_000) 13.031371458000649
*Create list from keys* (faster for sets):
timeit.timeit("list(s)", setup="s = set(i for i in range(1000))", number=10_00_000) 5.24364303600305 timeit.timeit("list(d)", setup="d = {i: None for i in range(1000)}", number=10_00_000) 6.303017043999716
Removal isn't easily comparable, since s.pop(), s.discard(), d.pop(), and d.popitem() all behave quite differently. Also, I'm sure these benchmarks could be improved significantly, particularly with the "Add" ones. This should provide a decent general idea though. On Mon, Dec 23, 2019 at 8:12 PM Guido van Rossum <guido@python.org> wrote:
To begin with, please compare performance of dict-with-None-values to that of a set, for operations that can be expressed by both (e.g. both have update()).
On Mon, Dec 23, 2019 at 18:08 Kyle Stanley <aeros167@gmail.com> wrote:
Larry Hastings wrote:
a hypothetical collections.OrderedSet would probably work just fine.
I'm also in agreement with adding a collections.OrderedSet. There certainly seems to be a practical use case for a Set that preserves insertion order, but with significant push back to modifying the existing Set implementation (primarily due to performance concerns).
If later down the road an improvement to OrderedSet is made that gives it equal or better average performance across the board compared to Set, we can consider changing the default implementation *if* there's significant demand for it.
And he'd probably use it too, as that makes the code clearer / easier to read. If you read code using an OrderedSet, you know what the intent was and what the semantics are. If you see code using a dict but setting
Larry Hastings wrote: the values to None, you think "that's crazy" and now you have a puzzle to solve. Let's hope those comments are accurate!
This is an excellent point. My first thought if someone were to be using a Dict with all values set to None would be: why not use a Set here? As Larry said, the need for preservation of insertion order would have to be explicitly described in the code comments. Realistically speaking, code comments are not something that can be consistently relied upon, especially if it's part of some boilerplate format that gets replicated or if the container is used in multiple locations. I'd much rather have a dedicated data structure that clearly describes what it does based on the name alone, IMO that's a million times better for readability purposes.
Also, this is mostly speculation since I haven't ran any benchmarks for an OrderedSet implementation, but I strongly suspect that OrderedSet will end up having better average performance for add, pop, and update than trying to use a dictionary with None values (assuming it's implemented in C or at least with a C accelerator).
Not to mention allowing the ability to update (for both addition and removal) based on intersections and unions across ordered sets, which currently isn't possible to do with dictionaries (at least not directly with |=, &=, -=. and ^=). I'm not sure how much of a practical use case this would have for ordered sets specifically, but it's a nice bonus.
On Sat, Dec 21, 2019 at 7:59 AM Larry Hastings <larry@hastings.org> wrote:
(mixing together messages in the thread, sorry threaded-view readers)
On 12/19/19 3:15 PM, Tim Peters wrote:
[Nick]
I took Larry's request a slightly different way:
[...]
Only Larry can answer whether that would meet his perceived need. My _guess_ is that he wouldn't know OrderedSet existed, and, even if he did, he'd use a dict with None values anyway (because it's less hassle and does everything he wanted).
At last, something I can speak knowledgeably about: Larry's use case! Regarding Larry, I'd say
1. his use case was small enough that almost anything maintaining order would have worked, even a list, 2. he often *does have* a pretty good idea what goodies are salted away in the Python standard library, and 3. a hypothetical collections.OrderedSet would probably work just fine. And he'd probably use it too, as that makes the code clearer / easier to read. If you read code using an OrderedSet, you know what the intent was and what the semantics are. If you see code using a dict but setting the values to None, you think "that's crazy" and now you have a puzzle to solve. Let's hope those comments are accurate!
Also, speaking personally, at least once (maybe twice?) in this thread folks have asked "would YOU, Mr Larry, really want ordered sets if it meant sets were slightly slower?"
The first thing I'd say is, I'm not sure why people should care about what's best for me. That's sweet of you! But you really shouldn't.
The second thing is, my needs are modest, so the speed hit we're talking about would likely be statistically insignificant, for me.
And the third thing is, I don't really put the set() API through much of a workout anyway. I build sets, I add and remove items, I iterate over them, I do the occasional union, and only very rarely do I use anything else. Even then, when I write code where I reach for a fancy operation like intersection or symmetric_difference--what tends to happen is, I have a rethink and a refactor, and after I simplify my code I don't need the fancy operations anymore. I can't remember the last time I used one of these fancy operations where the code really stuck around for a long time.
So, personally? Sure, I'd take that tradeoff. As already established, I like that dict objects maintain their order, and I think it'd be swell if sets maintained their order too. I suspect the performance hit wouldn't affect *me* in any meaningful way, and I could benefit from the order-maintaining semantics. I bet it'd have other benefits too, like making regression tests more stable. And my (admittedly uninformed!) assumption is, the loss of performance would mostly be in these sophisticated operations that I never seem to use anyway.
But I don't mistake my needs for the needs of the Python community at large. I'd be mighty interested to read other folks coming forward and saying, for example, "please don't make set objects any slower!" and talking us through neat real-world use cases. Bonus points if you include mind-boggling numbers and benchmarks!
On 12/20/19 5:09 AM, Peter Wang wrote:
As Larry pointed out earlier, ordered dicts posed a problem for MicroPython.
Just a minor correction: that wasn't me, that was Petr Viktorin.
Ten days left until we retire 2.7,
*/arry* _______________________________________________ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-leave@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/JYSJ55CW... Code of Conduct: http://python.org/psf/codeofconduct/
_______________________________________________ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-leave@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/2KEAXZBQ... Code of Conduct: http://python.org/psf/codeofconduct/
-- --Guido (mobile)