[Python-3000] Iterators for dict keys, values, and items == annoying :)

Nick Coghlan ncoghlan at gmail.com
Thu Mar 30 11:46:26 CEST 2006


Adam DePrince wrote:
> On Wed, 2006-03-29 at 21:15 +1000, Nick Coghlan wrote:
>> Paul Moore wrote:
>>> On 3/29/06, Brett Cannon <brett at python.org> wrote:
>>>> Without a direct reason in terms of the language needing a
>>>> standardization of an interface, perhaps we just don't need views.  If
>>>> people want their iterator to have a __len__ method, then fine, they
>>>> can add it without breaking anything, just realize it isn't part of
>>>> the iterator protocol and thus may limit what objects a function can
>>>> accept, but that is there choice.
>>> Good point. I think we need to start from strong use cases. With
>>> these, I agree that the view concept is a good implementation
>>> technique to consider. But let's not implement views just for the sake
>>> of having them - I'm pretty sure that was never Guido's intention.
>> There are three big use cases:
>>
>>    dict.keys
>>    dict.values
>>    dict.items
> 
> There is more than that.

But those are the three where we *need* to do something for Py3k. We want to 
get rid of the copying that exists in Py 2.x, but get a result that is as easy 
to work with as a real set or list. We don't really have a convention for 
doing that at this point.

The other characteristic of these three is that they can be easily generalised 
to a simple protocol (add __len__ to a reiterable iterable to get something 
that implements the container protocol).

>  Everybody who accesses a database has to jump
> and down to extract their fields.  Wouldn't it be nice if you could say
> to your result set from a database: 
> 
>>>> rs.execute( "select upc, description, price from my_table" )
>>>> data = rs.fetch().fieldby( 'price','upc')
>>>> print type( data )
> <MultiViewMapping>

Think new protocols, not new types. This specific example is the old 
named-tuple problem. While expressing that as a wrapper type that works on an 
arbitrary underlying sequence is an interesting (and, IMO, good) idea, it 
doesn't require a new protocol, just a convenient type.

> Or a tree implementation of a dictionary. 
> 
>>>> type( tree_dict.keys() )
> <OrderedSet>
> 
> The idea that is there is so much more we can do if we had some
> mechanism of identifying at a higher level the semantics of the data
> structure.  While dict is pretty much it for core python, there are a
> lot of data stores in the wild, and the View's would give us the ability
> for better interaction and abstraction than passing around lists or
> their performance modified twin iter.

But we can get most of that benefit just by defining a container protocol, and 
implementing a few container views for things that currently return lists or 
iterators. We don't yet have any concrete use cases to justify going beyond 
the simple interface needed to make dict.keys(), .values() and .items() both 
memory efficient and easy to use.

> Consider for instance if you had to dictionaries, both of which are so
> large you don't want to work on copies of their keys.   You want to know
> which items are in only the first ... 
> 
> dicta.keys() - dictb.keys()
> 
> Because each supports the SetView interface, we need only provide a
> single generic SetView.difference operator and move on.  This prevents
> the ungainly conversion to sets first which, while easy to write, is
> slow, especially considering how well dict's implement sets in the first
> place.

But there's already an easy way to write that diff so it works based on the 
iterator and container protocols:

   def only_in_first(first, second):
       """Returns a set containing only items in the first iterable

           first can be any iterable
           second can be any container
             (preferably a container with O(1) item lookup such as a set)
       """
       first_only = set()
       for key in first:
           if key not in second:
               first_only.add(key)
       return first_only

Memory efficient creation of new collections almost always involves starting 
with an empty container and building it incrementally.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org


More information about the Python-3000 mailing list