But we're not talking about *dict*, we're talking about dict.items which
returns a set-like object:
py> from collections.abc import Set
py> isinstance({}.items(), Set)
True
This property is nice, as .keys() (for example) can operate like a set and give lots of convenience features. However this doesn't really mean that these objects are trying to *be* sets in any meaning way, dictionaries are now ordered, this is a property of dictionaries, whereas sets are explicitly unordered.
Plus, the following example shows how quickly the set emulation falls-down:
py> set({'a': [1]}.items())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
> for i in range(len(d)):
> do(d.items()[i]) # if dict_items supports index access.
>
> sample 1 is O(n) where n = len(d), but sample 2 is O(n^2).
>
> By not supporting index access, dict_items prevents to write such
> inefficient code.
I don't think that this is a compelling or strong argument. We can write
inefficient code because dicts do *not* support index access:
for i in range(len(d)):
for j,item in enumerate(d.items()):
if j == i:
do(item)
I guess that adding index read access to dicts could be done, perhaps
with a method:
d.getitem(3) # Returns (key, value)
and I don't think that would be expensive. My recollection is that
internally dicts have, or had, an internal array of keys. So it could be
a constant time operation to look up the key by index.
I don't think inventing new methods for emulating __getitem__ is that useful here, there is a well established standard for getting items from a container by index, I think we should use it :).
Here's my understanding of the performance implications of allowing __getitem__ on dictview objects:
There are currently two variant layouts of PyDictObjects, 'traditional' dicts, and split dicts.
Split dicts are stored as (two) compact arrays with an added hashtable, so O(1) lookup is possible.
However, split dicts are *not* really relevant here, because typically dicts in use aren't split() (for example mutating a split dict) returns a traditional' dict.
Traditional dicts are stored as an array based on insertion order (like split dicts), *but* deletions result in the entry being NULL'd out in-place. Therefore index access is O(n) because you have to walk the items in the DK_ENTRIES list counting non-NULL keys. This should be really fast because it's just walking a list of pointers, no dereferences or anything, however it isn't O(1) which does make me slightly hesitant about the proposal.
This is easy enough to solve with an augmented dict. A sketch of a
Implementing this functionality is trivial, but adding __getitem__ avoids these paper-cut like surprising examples, and fixes code that should "just work", for example `random.choice`
But even if we could do these efficiently, who would use them? If you
need this for a table, you probably need to support insertion and
deletion as well, maybe slices, and you probably need both columns and
rows, so I really don't think that dict is the right place for this.
I don't think so, you can see some examples I dug out earlier in the thread (when my posts get moderated), that support this use-case without needing mutation ability or any table structures.
Steve
--
Steven
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-leave@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/44LD4SI7SVATA52ZBLR4AIUHOQZKCRAR/
Code of Conduct: http://python.org/psf/codeofconduct/