[Python-3000] Iterators for dict keys, values, and items == annoying :)

Wed Mar 29 18:10:53 CEST 2006

On Wed, 2006-03-29 at 21:15 +1000, Nick Coghlan wrote:
> Paul Moore wrote:
> > On 3/29/06, Brett Cannon <brett at python.org> wrote:
> >> Without a direct reason in terms of the language needing a
> >> standardization of an interface, perhaps we just don't need views.  If
> >> people want their iterator to have a __len__ method, then fine, they
> >> can add it without breaking anything, just realize it isn't part of
> >> the iterator protocol and thus may limit what objects a function can
> >> accept, but that is their choice.
> > 
> > Good point. I think we need to start from strong use cases. With
> > these, I agree that the view concept is a good implementation
> > technique to consider. But let's not implement views just for the sake
> > of having them - I'm pretty sure that was never Guido's intention.
> 
> There are three big use cases:
> 
>    dict.keys
>    dict.values
>    dict.items

There are more than that.  Everybody who accesses a database has to jump
up and down to extract their fields.  Wouldn't it be nice if you could say
to your result set from a database:

>>> rs.execute( "select upc, description, price from my_table" )
>>> data = rs.fetch().fieldby( 'price','upc')
>>> print type( data )
<MultiViewMapping>

Or a tree implementation of a dictionary. 

>>> type( tree_dict.keys() )
<OrderedSet>

The idea is that there is so much more we could do if we had some
mechanism for identifying, at a higher level, the semantics of the data
structure.  While dict is pretty much it for core Python, there are a
lot of data stores in the wild, and views would give us better
interaction and abstraction than passing around lists or
their performance-modified twin, iter.

Consider, for instance, that you had two dictionaries, both of which are so
large you don't want to work on copies of their keys.  You want to know
which keys are in only the first ...

dicta.keys() - dictb.keys()

Because each supports the SetView interface, we need only provide a
single generic SetView.difference operator and move on.  This avoids
the ungainly conversion to sets first, which, while easy to write, is
slow, especially considering how well dicts implement sets in the first
place.
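
A minimal sketch of what such a generic difference could look like,
assuming a hypothetical KeySetView wrapper (the class name and its
methods are illustrative, not an existing API).  Neither key set is
copied; the difference is produced lazily:

```python
class KeySetView(object):
    """Hypothetical read-only set view over a mapping's keys (no copy)."""

    def __init__(self, mapping):
        self._mapping = mapping

    def __len__(self):
        return len(self._mapping)

    def __contains__(self, key):
        return key in self._mapping

    def __iter__(self):
        return iter(self._mapping)

    def difference(self, other):
        # Lazily yield keys found here but not in `other`;
        # `other` only needs to support the `in` operator.
        return (k for k in self if k not in other)

    # Let `view_a - view_b` use the same generic implementation.
    __sub__ = difference


# Small sample data, for illustration only.
dicta = {"a": 1, "b": 2, "c": 3}
dictb = {"b": 20, "d": 40}
only_in_a = sorted(KeySetView(dicta) - KeySetView(dictb))
```

Since the difference only relies on iteration and containment, the
same code would serve any store that offers those two operations.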

Cheers - Adam DePrince

> To give these views the benefits of having a real list, the following is all 
> that's really needed:
> 
>    1. implement __len__ (allows bool() and len() to work)
>        - all delegate to dict.__len__
> 
>    2. implement __contains__ (allows containment tests to work)
>        - delegate to dict.__contains__ for dict.keys()
>        - use (or fallback to) linear search for dict.values()
>        - check "dict[item[0]] == item[1]" for dict.items()
> 
>    3. implement __iter__ (allows iteration to work)
>        - make iter(dict.keys()) equivalent to current dict.iterkeys()
>        - make iter(dict.values()) equivalent to current dict.itervalues()
>        - make iter(dict.items()) equivalent to current dict.iteritems()
> 
> For an immutable view, that's all you need. IOW, take the iterable protocol 

Mutability isn't really a problem for views.  Unlike iters, views don't
store state; they are just wrappers.  For an iter created from a view,
yes, the normal iter mutation problems still exist.
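
To illustrate the "just wrappers" point, here is a rough sketch of the
three methods from Nick's list for values and items.  The class names
are hypothetical; each view holds nothing but a reference to the dict,
so it always reflects the dict's current contents:

```python
class ValuesView(object):
    """Hypothetical stateless wrapper over a dict's values."""

    def __init__(self, d):
        self._d = d  # a reference only; nothing is copied

    def __len__(self):
        return len(self._d)

    def __contains__(self, value):
        # Values are not hashed by the dict, so fall back to linear search.
        for key in self._d:
            if self._d[key] == value:
                return True
        return False

    def __iter__(self):
        # A fresh iterator on every call, so the view is reiterable.
        return (self._d[key] for key in self._d)


class ItemsView(ValuesView):
    """Hypothetical stateless wrapper over a dict's (key, value) pairs."""

    def __contains__(self, item):
        key, value = item
        return key in self._d and self._d[key] == value

    def __iter__(self):
        return ((key, self._d[key]) for key in self._d)


d = {"x": 1}
values = ValuesView(d)
items = ItemsView(d)
d["y"] = 2  # mutate the dict after the views were created
sizes = (len(values), len(items))
```

Both views report the new size after the mutation because they keep no
state of their own.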

Views do partly solve the iter mutability problem by letting a single
operation on the view replace work that would otherwise take place
inside an iteration.  Consider this:

unwanted_words = set( ... 

index = { .... 

for k in index.keys():
    if k in unwanted_words:
        del index[k]

But with a view, we could say:

index.keys() -= unwanted_words

Basically, my understanding of the idea behind a view is to eliminate
the need for a mutation-compatible iterator by reducing the demand for
one to a level where it can acceptably be ignored.
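
A rough sketch of how such an in-place difference could work.  Python's
grammar does not accept an augmented assignment whose target is a call,
so the sketch spells the operation as a method on a hypothetical
MutableKeysView; the class, the method name, and the sample data are
all assumptions made for illustration:

```python
class MutableKeysView(object):
    """Hypothetical mutable view over a dict's keys."""

    def __init__(self, d):
        self._d = d

    def difference_update(self, unwanted):
        # Iterate over `unwanted`, never over the dict being mutated,
        # so no iterator is invalidated by the deletions.
        for key in unwanted:
            self._d.pop(key, None)


# Sample data, for illustration only.
unwanted_words = set(["the", "a"])
index = {"the": [1, 4], "python": [2], "a": [3]}

MutableKeysView(index).difference_update(unwanted_words)
```

The caller never writes a loop over the dict, which is where the
mutation-during-iteration trouble normally comes from.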

> (an __iter__ that returns a new iterator when invoked) and add __len__ and 
> __contains__ to get a "container" protocol. Given that containment falls back 
> on __iter__ anyway, __len__ is the only essential addition to turn an iterable 
> into a container.
> 
> Note that adding __len__ to an *iterator* does NOT give you something that 
> would satisfy such a container protocol - invoking __iter__ again does not 
> give you a fresh iterator, so you can't easily iterate repeatedly.
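
A small sketch of that failure mode, using a hypothetical SizedIterator
(an iterator with a __len__ bolted on).  The second pass comes back
empty even though len() still claims three elements:

```python
class SizedIterator(object):
    """Hypothetical iterator that also reports a length."""

    def __init__(self, items):
        self._it = iter(items)
        self._n = len(items)

    def __iter__(self):
        # An iterator returns itself, not a fresh iterator.
        return self

    def __next__(self):
        return next(self._it)

    next = __next__  # Python 2 spelling of the same method

    def __len__(self):
        return self._n


it = SizedIterator([1, 2, 3])
first_pass = list(it)   # consumes the underlying iterator
second_pass = list(it)  # nothing left to yield
reported_len = len(it)
```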
> 
> With reiterability as a defining characteristic, other niceties become 
> possible (potentially available as a mixin):
> 
>    1. a generic container __str__ (not __repr__!) implementation:
> 
>        def __str__(self):
>            # keep default __repr__ since eval(repr(x)) won't round trip
>            name = type(self).__name__
>            guts = ", ".join(repr(x) for x in self)
>            return "%s([%s])" % (name, guts)
> 
>    2. generic container value based equality testing:
>        def __eq__(self, other):
>            if len(self) != len(other):
>                return False
>            # izip here is itertools.izip
>            for this, that in izip(self, other):
>                if this != that:
>                    return False
>            return True
> 
> Further refinement of such a container protocol to the minimal requirements 
> for a sequence protocol is already defined by such things as the requirements 
> of the reversed() builtin:
> 
>    for i, x in enumerate(seq):
>       assert seq[i] == x
> 
> Cheers,