Re the benchmarks:
  yes, they're microbenchmarks; the intent was to show that the performance can be non-impacting
  no, that doesn't invalidate them (it just scopes their usefulness; my sales pitch at the end was slightly over-egging things, but reasonable, imo)
  yes, I ignored directly quadratic behaviour in indexing, as I would never propose that as a goal
  adding iter()-based comparisons would be interesting, but it doesn't invalidate the list() option, as that is very often used as a solution to the problem
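For reference, the iter()-based comparison would look something like this (a minimal timeit sketch; the dict size here is arbitrary, chosen only to make the timings visible):

```python
import timeit

# Arbitrary example dict; contents don't matter for the comparison.
d = {i: i * 2 for i in range(1000)}

# Two current spellings of "take the first item of a dict":
t_list = timeit.timeit(lambda: list(d.items())[0], number=100_000)
t_iter = timeit.timeit(lambda: next(iter(d.items())), number=100_000)

print(f"list(d.items())[0]:    {t_list:.3f}s")
print(f"next(iter(d.items())): {t_iter:.3f}s")
```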

It's true, benchmarks that don't match your incentives and opinions always lie.

As for use-cases, I'll admit that I see this as a fairly minor quality-of-life issue. Finding use-cases is a bit tricky: the fact that dictionaries have a defined order is a recent feature, and I know I (and, I'm sure, many other people) am still adapting to take advantage of it. There's also the fact that in Python < 3, dict.keys(), values() and items() returned lists, so the impact of that change may still be being felt (yes, even decades later; the majority of the Python I've written to deal with 'messy' data involving lots of dictionaries has been in Python 2).

However, I've put together a set of cases that I would personally like to work as written (several of these are paraphrases of production code I've worked with):

--
>>> import random
>>> random.choice({'a': 1, 'b': 2}.keys())
'a'
--
>>> import numpy as np
>>> mapping_table = np.array(BIG_LOOKUP_DICT.items())
>>> mapping_table
array([[ 1, 99],
       [ 2, 23],
       ...])
--
>>> import sqlite3
>>> conn = sqlite3.connect(":memory:")
>>> params = {'a': 1, 'b': 2}
>>> placeholders = ', '.join(f':{p}' for p in params)
>>> statement = f"select {placeholders}"
>>> print(f"Running: {statement}")
Running: select :a, :b
>>> cur = conn.execute(statement, params.values())
>>> cur.fetchall()
[(1, 2)]
--
# This currently works, but is deprecated in 3.9
>>> import random
>>> dict(random.sample({'a': 1, 'b': 2}.items(), 2))
{'b': 2, 'a': 1}
--
>>> def min_max_keys(d):
...     min_key, min_val = d.items()[0]
...     max_key, max_val = min_key, min_val
...     for key, value in d.items():
...         if value < min_val:
...             min_key = key
...             min_val = value
...         if value > max_val:
...             max_key = key
...             max_val = value
...     return min_key, max_key

>>> min_max_keys({'a': 1, 'b': 2, 'c': -9999})
('c', 'b')
>>> min_max_keys({'a': 'x', 'b': 'y', 'c': 'z'})
('a', 'c')
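For contrast, here's how that function has to be written today; only the first line of the body differs (a sketch, not a claim that this is the only workaround):

```python
def min_max_keys(d):
    # Today the first item has to come via the iterator protocol,
    # since d.items() does not support indexing:
    min_key, min_val = next(iter(d.items()))
    max_key, max_val = min_key, min_val
    for key, value in d.items():
        if value < min_val:
            min_key, min_val = key, value
        if value > max_val:
            max_key, max_val = key, value
    return min_key, max_key

print(min_max_keys({'a': 1, 'b': 2, 'c': -9999}))  # -> ('c', 'b')
```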
--
>>> import os
>>> users = {'cups': 209, 'service': 991}
>>> os.setgroups(users.values())
--

Obviously, Python is a general-purpose, Turing-complete language, so each of these can be written in other ways.  But it would be nice if the simple, readable versions also worked :D
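For completeness, the workarounds the random-module examples above currently require look roughly like this (a sketch; a small literal dict stands in for the real data):

```python
import random

d = {'a': 1, 'b': 2}

# random.choice() needs a sequence, so the view must be materialized:
first_key = random.choice(list(d.keys()))

# Likewise for random.sample() (passing the view directly is the
# form deprecated in 3.9):
pairs = dict(random.sample(list(d.items()), 2))

# "First item" without building a full list:
key, value = next(iter(d.items()))
```

None of these are hard, but each adds a list() hop (or an iter()/next() dance) that indexable views would remove.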

The idea that there are future, unspecified changes to dicts that may or may not be hampered by allowing indexing sounds like FUD to me, unless there are concrete references?

Steve



On Thu, Jul 9, 2020 at 3:04 PM Inada Naoki <songofacandy@gmail.com> wrote:
On Thu, Jul 9, 2020 at 12:45 PM Christopher Barker <pythonchb@gmail.com> wrote:
>
> On Wed, Jul 8, 2020 at 7:13 PM Inada Naoki <songofacandy@gmail.com> wrote:
>>
>> I think this comparison is unfair.
>
> well, benchmarks always lie ....
>
>> > d.items()[0]    vs    list(d.items())[0]
>>
>> Should be compared with `next(iter(d.items())`
>
> why? the entire point of this idea is to have indexing syntax -- we can already use the iteration protocol top do this. Not that it's a bad idea to time that too, but since under the hood it's doing the same or less work, I'm not sure what the point is.
>

Because this code solves "take the first item in the dict".

If you need to benchmark index access, you should compare your
dict.items()[0] against list indexing.
You shouldn't create a list from d.items() on every loop iteration.

>> > d.keys()[-1]     vs      list(d.keys())[-1]
>>
>> Should be compared with `next(reversed(d.keys()))`, or `next(reversed(d))`.
>
>
> Same point - the idea is to have indexing syntax. Though yes, it would be good to see how it compares. But I know predicting performance is usually wrong, but this is going to require a full traversal of the underlying keys in either case.
>

Same here.  And note that dict and dict views now support reversed().

>>
>> > random.choice(d.items())   vs    random.choice(list(d.items()))
>>
>> Should be compared with `random.choice(items_list)` with `items_list =
>> list(d.items())` setup too.
>
> I don't follow this one -- could you explain? what is items_list ?

I explained `items_list = list(d.items())`.  Do it in setup (e.g. before the loop).
("setup" is term used by timeit module.)

>
> But what this didn't check is how bad the performance could be for what I expect would be a bad performance case -- indexing the keys repeatedly:
>
> for i in lots_of_indexes:
>      a_dict.keys()[i]
>
> vs:
>
> keys_list = list(a_dict.keys())
> for i in lots_of_indexes:
>      keys_list[i]
>

You should do this.

> I suspect it wouldn't take all that many indexes for making a list a better option.
>

If you need index access many times, creating a list is the recommended way.
You shouldn't ignore it.  That's why I said it's an unfair comparison.
You should compare the "current recommended way" vs the "proposed way".

> But again, we are back to use cases. As Stephen pointed out, no one has produced an actual production code use case.

I agree.

Regards,

--
Inada Naoki  <songofacandy@gmail.com>