[Python-3000] Iterators for dict keys, values, and items == annoying :)

Thu Mar 30 19:44:06 CEST 2006

On Wed, 2006-03-29 at 11:08 -0800, Brett Cannon wrote:
> On 3/29/06, Adam DePrince <adam.deprince at gmail.com> wrote:
> >
> > > set interface where we could have a __container__/__view__/__set__
> >
> > Why would I call a method to get a view on an object when the object can
> > just as well implement the view?  The *only* time we want to call a
> > method to get a view is when there is not one, single, completing
> > definition of the object's canonical perspective should be.
> >
> 
> That's my point.  What are we gaining by trying to augment iterators
> or come up with some specified way to know if something contains
> something else when we can get it off of the object itself.  Although,
> as Nick pointed out, dicts have multiple views of their data.

All views are the object itself, just different
perspectives.  .iter,.values would just return a parasitic object to the
dict that contains a different set of methods to support that
perspective.  

> But if you just want to know if a key or value exists, then you can
> either use a dict's __contains__ for keys or create a set to use for
> values.  That's what a view will end up having to do anyway underneath

There are three problems with your proposal.  

1. The semantics of the problem and your solution are different
2. You ignore the issue of mutability amoung the items
3. The performance in the simple case is reduced with no gain in the
larger case

1.

d = dict( .... 
'x' in set( d.items() )

This creates a snap shot of your items; the snapshot of your items is
then examined for membership.

'x' in d.items

This directly examines the data store.  

The difference is one of concurrency.  

2. Mutability.  

Items in dict.items can be mutable.  set fails for that.  

3. Performance.

If you have even one mutable item, then a linear scan is the best you
can do.  But a scan will always be faster than a copy and scan.  

If you have all unique, immutable items, then future dict
implementations are in a good position to do this optimization for you.
Some sort of heuristic, such as revere indexing the table after
lg(__len__) calls to values.__contains__, or maintaining a reverse index
once beyond a certain size until you start inserting immutable or
duplicate items might be other possiabilities.

But really, if set operations on .values are what you really want on a
regular basis, then the dict is the wrong data structure for you
anyway.  

> the covers, so I don't know if we are going to get much benefit from
> going through this beyond just saying dict.value_view() returns a set
> of the values in a dict.  Don't know if we really need to go through
> all of this formality.
> 
> > List has a single obvious view.  If you really want to see a list as a
> > setview, just pretend it doesn't implement OrderedView and
> > CollectionView.  Down-casting by neglect works fine.
> >
> > Dict doesn't, there are 4 possiable views.  The mapping view provided
> > directly by the dict, and keys/values/items.
> 
> Four?  In terms of "atomic" data views, there are keys and values. 

keys
values
items 

> Items could be counted, but that is just a way to pair the different
> types of data in a dict together so I don't know if I would count it
> as a view, let alone whether it would be useful outside of an
> iterator.  But counting the dict itself is just counting the key view
> twice since you can get all of that data directly from the dict
> without needing a view as you pointed out above.

There are 3 views, the items view is a valid perspective that is used
enough to consider special treatment.  

I separate the dict and dict.key uses for the same reason we do now ...
we had keys, we added iterkeys, now we have dict.__iter__ and nobody
wants to take the brave step of breaking old code by taking away
dict.keys()  In the collective minds of the python development community
there are 4, even if there are only 3.  

- Adam