I had a nice note almost written yesterday, but now there's been a bunch more discussion, so I'm going to try to hit a few points that have been made recently.

TL;DR: I personally think it would be a nice feature to add indexing to the dict views. But to be fair, the only real use case I've seen is random.choice(), so it's really not very compelling. And it seems the consensus is coming down on the side of no.

However, I find I disagree with many of the "no" arguments, other than "it's too much churn for little gain", so I'm going to make my points, thinking that maybe that trade-off will be rethought. So if you're firmly on the side of no, I guess there's no point in reading this, but I do hope it will be considered.

On Fri, Jul 10, 2020 at 12:45 PM David Mertz <mertz@gnosis.cx> wrote:
> The strongest argument I've seen is: `list(d.items())` adds six characters.

That's a misrepresentation: the reason to prefer not to use the list call is twofold:

1) Matching our mental model / usability: if I want the nth item (or a random item) from a dict, I want to ask for that -- I don't want to make a list, just to index it and throw it away. The list(d.items()) idiom is the right one if I actually need a list; otherwise it's a bit awkward.

2) Performance: making an entire list just to get one item out is a potentially expensive operation. Again, for the limited use cases, probably not a big deal: I'm having a really hard time imagining an application where that would be a bottleneck, but it is *a* reason, if not a compelling one.
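
For what it's worth, the copy can already be avoided today with itertools.islice. A minimal sketch follows; nth_item is a made-up helper name for illustration, not anything in the stdlib, and it is still O(n) in time, just without the temporary list:

```python
from itertools import islice


def nth_item(view, n):
    """Return the n-th element of a dict view (or any iterable) without copying it.

    Hypothetical helper, not a stdlib function. It steps through the
    view, so it is O(n) in time but needs no temporary list.
    """
    if n < 0:
        raise IndexError("negative indexes are not supported in this sketch")
    try:
        # islice skips the first n elements; next() takes the one after them.
        return next(islice(view, n, None))
    except StopIteration:
        raise IndexError("index out of range") from None


d = {"a": 1, "b": 2, "c": 3}
print(nth_item(d.items(), 1))  # same element as list(d.items())[1]
```

This is the "screwdriver" spelling of the same operation; the proposal is essentially to let the view do this stepping itself.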

> Moreover, even apart from the work of maintaining the feature itself, the attractive nuisance of getting O(N) behavior rather than O(1) seems like a strong anti-feature.

Yes, this is the only anti-feature I've seen described in this thread. But it's only an anti-feature for the use case of making multiple indexing operations from the same dict view, without changes to the dict. It's a feature if you need to make only one (or very few) indexing operations from the same non-mutated dict. After all, that's exactly why we have the dict views in the first place: you don't want to have to make an unnecessary copy if you don't need to. That clearly applies to iteration and membership: why not to the "getting one item out" case?

But of course, it is indeed an attractive nuisance in some cases, which is different from the other view use cases: they are the same or more efficient than the old "make a list" approach, whereas this would be more efficient in some cases, and less in others -- so users would need to evaluate the trade-offs, and many wouldn't even know they should think about that. Overall though, I think that folks would still need to make a list if they wanted to do any other MutableSequence operations (or be clearly working with a copy), so I don't think there's all that much danger in this feature being accidentally used.

On Fri, Jul 10, 2020, 1:07 PM Stestagg <stestagg@gmail.com> wrote:
> I don't mind the shooting down, as long as the arguments make sense :D.

I agree here for sure: I've no problem with folks having a different opinion about the value of the trade-offs, but I think the trade-offs have been misrepresented -- hence this post ... (no, I don't think anyone's misrepresenting anything on purpose -- this is about the technical issues)
 
> It seems like we're both in agreement that the cost of implementing & maintaining the change is non-zero.

Another note there: one of the big costs is implementation and documentation. But this is Open Source: we can all decide that a feature is a good idea, but it'll never get done unless someone(s) actually decides it is worth it, to them, to write the code and docs. If no one does, then it's not going to happen. So that part of the cost is self limiting. Granted, once written, it needs to be maintained, but that is a lesser cost, at least in this case, where it's not a whole new object or anything.
 
> I don't believe that this feature would steepen the language learning curve however, but actually help to shallow it slightly (Explained more below)

I agree here. Granted, it's again only the one use case, but when my newbie students have to figure out how to get a random key from a dict, there is no question that:

random.choice(the_dict.keys())

is a little easier than:

random.choice(list(the_dict.keys()))

and a lot easier than (untested):

idx = random.randrange(len(the_dict))  # 0 <= idx < len(the_dict)
it = iter(the_dict.keys())
for _ in range(idx + 1):  # step to the idx-th key
    choice = next(it)

getting an arbitrary one is a bit easier:

choice = next(iter(the_dict.keys()))

In practice, I use this as a teaching opportunity -- but the fact that it IS a teaching opportunity kind of makes my point.

Granted, if this feature were there, there'd be the need to teach folks about why they want to avoid the attractive nuisance discussed above -- so I'll call it net-zero.

> >>> import numpy as np
> >>> mapping_table = np.array(BIG_LOOKUP_DICT.items())

one note on numpy: the numpy array() function is very much designed for Sequences: partly due to history, but also for convenience and performance -- it needs to know what the size and data type of the array it is going to create is before it creates it.

And honestly, I'm not sure that array() would work with the dict views anyway if we added indexing -- we'd have to look at the logic inside array()

And numpy has fromiter() for working with iterators.

In short: "it would work with numpy" is NOT a reason to add this feature :-)

> And I expect that even if dict.items() was indexable, numpy would
> still have to copy the items. I don't know how numpy works in detail,
> but I doubt that it will be able to use a view of a hash table internals
> as a fast array without copying.

of course not -- but it makes a copy of the items in a list too -- so the extra copy for the list is still there.
(numpy works with homogeneous lower-level data types -- the actual bytes of the C datatype -- so it is always copying the values when it makes an array out of Python types, except for the numpy object dtype, but that's a special case.)

> What making dict_* types a Sequence will do is make this code (as written) behave:

For my part, I'm not asking for the dict views to be full-blown Sequences -- I think that *would* be an attractive nuisance. I'm thinking only of adding indexing.

> still think of concrete sequences and indexing as fundamental, while
> Python 3 has moved in the direction of making the iterator protocol and
> iterators as fundamental.

That is indeed a change in Python over the years, but I do think it was a practicality-driven change. In short: don't make copies you don't need to make. So I don't think we should use "Iterators are fundamental to Python" as a reason to NOT add Sequence-like behavior.

> You have a hammer (indexing), so you want views to be nails so you can
> hammer them. But views are screws, and need a screwdriver (iter and
> next).

But there are, in carpentry, many places where you can use either a screw or a nail, and some of us have even been known to hammer a screw in, even if we had a screwdriver handy, and knew what the heck we were doing. That is the argument here: when the screw can be well used, in a particular case, by hitting it with a hammer, then why not let me do that? To take the analogy way too far: don't take the hammer out of my toolbox just because there are some screwdrivers in there.

> The existing dictionary memory layout doesn't support direct indexing (without stepping), so this functionality is not being added as a requirement.

But it does make it much more efficient if the stepping is done inside the dict object by code that knows its internal structure, both because it can be in C, and because it can be done without any additional references or copying. Yes, it's all O(n), but with a very different constant.

> The fact that they can be indexed in reasonable time is not part of the
> design, just an accident of implementation, and being an accident, it
> could change in the future.

It *could*, but I can't imagine how you could have an efficient order-preserving data structure that could not be indexed reasonably -- in particular, more efficiently than making a full list copy first. And even so -- fine: performance characteristics are not guaranteed anyway.

> If random.choice should support non-sequence ordered container,
> just propose it to random.choice.

That would indeed solve the usability issue, and so may be a good idea, but:

The problem here is that there is no way for random.choice to efficiently work with generic Mappings. This whole discussion started because now that dicts preserve order, there is both a logical reason, and a practical implementation for indexing. But if that is not exposed, then neither random.choice() nor any other function can take advantage of it.

Which would lead to adding a random_choice protocol -- but THAT sure seems like overkill.
(OK, you could have the builtin random.choice check for an actual dict, and then use custom code to make a random selection, but that would really be a micro-optimization!)
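
To make that parenthetical concrete, here is a rough sketch of what such a special case could look like in pure Python. The name choice_from is hypothetical and this is not how random.choice is implemented; it also still steps through the dict from the outside, so it only saves the list copy, not the O(n) walk:

```python
import random
from itertools import islice

# The concrete dict view types are not exposed under public names,
# so this sketch grabs them from an empty dict.
_DICT_LIKE = (dict, type({}.keys()), type({}.values()), type({}.items()))


def choice_from(container):
    """Hypothetical wrapper: random.choice with a special case for dicts and dict views."""
    if isinstance(container, _DICT_LIKE):
        if not container:
            raise IndexError("cannot choose from an empty container")
        idx = random.randrange(len(container))
        # Step to the idx-th element instead of building a list first.
        return next(islice(container, idx, None))
    # Anything else goes through the normal len() + indexing path.
    return random.choice(container)


d = {"a": 1, "b": 2, "c": 3}
print(choice_from(d.items()))  # a random (key, value) pair
```

Note that this only helps for the types it knows about: any other ordered Mapping would still fall through to random.choice() and fail, which is the point above.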

> but they can't be Sequences, since they are already Sets. They would
> have to be a hybrid of the two, and that, I feel, comes with more
> baggage than just being one or the other.

I think this is where I fundamentally disagree, as far as language design and Python philosophy is concerned. I've been using Python for 20+ years (terrifying!) and I have always really liked the duck typing concept. In fact, even one better: it doesn't have to look, walk, and quack like a duck to be a duck -- if I only need it to quack, I don't care how it looks and walks.

Since those pre-2.0 days, Python has grown a lot more "structure" to its typing, notably ABCs and now facilities for static type checking. So far, those *enable* more formal typing, but don't *require* it. But as more folks start to use them, I'm going to have to start writing more strictly typed code if I want to use other libraries -- I'm hoping it won't come to that, but we'll see.

To bring this back to the case at hand:

I haven't looked at the code, but I'm pretty sure that random.choice() does not check for the Sequence ABC: it simply tries to get the length, and then index the object to get a random item. If that works, then it works -- this is shown by passing it a dict with integer keys in the right range:

In [28]: d
Out[28]: {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9}
In [29]: random.choice(d)
Out[29]: 9

I LIKE this -- so the argument that dict views shouldn't support indexing because they are a Set and can't be a proper Sequence is exactly backwards from how I think Python should work:

If a feature is useful, and doesn't conflict with another feature, then we can add it.
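
As a rough illustration of "only the quack": the wrapper below (IndexableView is a made-up name, and this is just a sketch of the idea, not a proposed implementation) adds nothing but len() and integer indexing on top of a dict view. It is not a Sequence, yet random.choice() is happy with it:

```python
import random
from collections.abc import Sequence
from itertools import islice


class IndexableView:
    """Hypothetical wrapper: a dict view plus integer indexing, nothing more."""

    def __init__(self, view):
        self._view = view

    def __len__(self):
        return len(self._view)

    def __getitem__(self, index):
        n = len(self._view)
        if index < 0:
            index += n  # support negative indexes the way lists do
        if not 0 <= index < n:
            raise IndexError("view index out of range")
        # Step through the live view; no copy is made.
        return next(islice(self._view, index, None))


d = {"a": 1, "b": 2, "c": 3}
keys = IndexableView(d.keys())
print(random.choice(keys))         # a random key from d
print(isinstance(keys, Sequence))  # not registered as a Sequence
```

The real thing would of course do the stepping inside the dict, in C, but the duck-typing point is the same.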

In the end though, while I think there is very little reason NOT to add indexing to dict views, unless someone comes up with a good use case beyond random.choice(), it may not be worth the churn.

-CHB

--
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython