[Python-3000] Interfaces for views and extended collections

Wed Apr 12 23:44:00 CEST 2006

On 4/11/06, Ian Bicking <ianb at colorstudy.com> wrote:
> So Guido asked for more concrete discussion of things like views.  A
> richer set of collections also fits in here, as for instance dict.keys()
> would be a view with a collection interface not exactly like other
> collections.  I wrote up some notes, which I'll pass on.
>
> So, I can imagine 3 attributes which seem orthogonal to me:
>
> a) Is it ordered?
> b) Can it contain multiple equal items?
> c) Is it associative?
>
> In addition there is mutability, though anything that can be mutable
> could also have an immutable form.  In general the only difference I see
> is the removal of methods, and the collection becomes hashable.
>
> There's 8 combinations.
>
> Associative collections:
>
>    dict: no multiple items, unordered (missing an immutable form)
>    multidict: multiple items, unordered
>    ordered multidict: multiple items, ordered
>    ordered dict: no multiple items, ordered (useful?)
>
> Non-associative:
>
>    Set/ImmutableSet: no multiple items, unordered
>    bag: multiple items, unordered
>    ordered set: no multiple items, ordered (useful? -- priority queue?)
>    list/tuple: multiple items, ordered
>
>
> So, ordered sets and ordered dicts seem questionable.  Also, it is
> unclear to me if there's a compelling reason for both ordered multidict
> and multidict, though the performance is likely to be different for the
> two (underneath multidict likely being a dictionary of sets, ordered
> multidict being a list of tuples).
>
> So, at least, we are missing bags, (ordered) multidicts.  In terms of
> views, .keys() is a set, multidict.keys() is a bag, and
> orderedmultidict.keys() is a list (or tuple, or immutable list-like object).
>

Making sure views work with bags and multidicts is good, but order, I
think, does not fall within this scope.  If people want them ordered
they can get the iterator and pass it to sorted().

> There's some other attributes.  For instance, we have defaultdict.
> Another option that could be applied to almost any of these is a key
> function.  E.g., KeyedDict(key=lambda x: x.lower()) would be a
> case-insensitive dictionary.  Because views introduce an relationship
> between these collections, if we add a keyed dictionary we should also
> have a keyed set (returned by KeyedDict.keys()).  If we build this
> directly into the collection, that's easy, if not then we double the
> number of collections to be implemented; which is perhaps fine.
>

If we make them mutable, then yes, having a way to automatically
transform all passed-in values would be handy.

> OK... so now we're this far; what will the interfaces be for these
> objects?  What follows is a long list of, I think, all the major methods
> and operators collections define.  This is mostly just source material,
> there's no conclusions or proposals.  I hope this will be useful to
> consider what the interfaces might look like, or to compare proposed
> interfaces against.
>
>
>
> Methods that don't imply mutability
> ===================================
>
> Combine two collections: +
> Currently only applies to lists and tuples.  Lists cannot be added to
> tuples.  This only applies to objects with order, that can contain
> multiple items.
>
> Combine two collections: |
> For sets, produces the union.  Applies to objects without order and
> cannot contain multiple items.  Should dicts implement this?  Which side
> would take precedence?
>
> Combine two collections: &
> For sets, produces the intersection.  Should dicts implement this?
>
> Combine two collections: ^
> For sets, produces the symmetric difference.  Dicts could implement
> this; neither side would have to take precedence.
>
> Difference: -
> For sets, a-b produces a new set with items from b removed.  Could apply
> to bags.  Maybe also to dicts?
>
> Repeat collection: *int
> For lists and tuples, repeats the contents.  Could apply to ordered
> multidicts.  Maybe also to bags and multidicts?
>
> Get item at index: [index]
> Returns the item at that index.  Doesn't apply to dicts, even ordered
> dicts, because it conflicts with another meaning of __getitem__.
> Doesn't apply to unordered collections
>
> Get item for association: [key]
> Returns the value for that key.  Applies to all dictionaries.  Do
> multidicts return all values or the first value?  Or the last value?
> Dictionaries essentially return the *last* value (where all previous
> values were disposed of).  An additional method is needed for the other
> option.  If multiple values, in what kind of container are those values?
>   Mutable?  A view (yuck)?
>

I would say multidicts return all values.

> Get an item, with not-found handling: .get()
> Returns the value for that key.  For multidicts, does it return None or
> an empty collection if nothing is found?  For whatever method added in
> addition to __getitem__, is there an equivalent get*() form?
>
> Count items: .count(item)
> This works on lists.  For some reason not on tuples.  Should work on
> bags.  Would this apply to multidicts?  Is item a key or (key, value) in
> that case?  Would there be a way to count values?  Value counting can
> apply to all associations.
>
> Get index: .index(item)
> This works on lists.  Also for ordered dicts?  Would item be key, or
> (key, value)?
>
> Count items: len(collection)
> This works on everything.  For multidicts, does it count values or keys?
>   For bags does it count unique items or all items?
>

Multidicts: keys
Bags: unique items.

> Test if an item is in collection: item in collection
> Works for keys and items.  No way to tell if a value is in a dictionary,
> or a (key, value).
>
> Test if a collection is a superset: .issuperset(collection)
> For sets; could apply to bags.  Could be defined in terms of
> __contains__ and __iter__.  Could be a function.  .issubset(collection)
> is related, of course.
>
> Get keys: .keys() (.iterkeys())
> Returns all the keys in a dictionary.  In py3k as a view.  .iterkeys()
> goes away?  For multidicts, do the keys show up multiple times?  For
> ordered multidicts, this seems necessary.
>
> Get values: .values() (.itervalues())
> Returns all values.
>
> Get items: .items() (.iteritems())
> Returns all (key, value).  Or (key, values) for multidict?
>
> Mutable methods
> ===============
>
> Delete index: del col[index]
> For lists only.
>
> Delete key: del col[key]
> For dicts.  For multidicts, does this delete the first value or all values?
>
> Remove first matching: .remove(item)
> For lists, sets, bags.  Could be used as a single-value-delete-only
> version of __delitem__ for multidicts.
>
> Discard: .discard(item)
> For sets, bags.  Removes, but doesn't signal error if missing.  Could be
> implement anywhere .remove() is implemented.
>
> Pop index: .pop(index=-1)
> Removes item at index, and returns that.
>
> Pop key: .pop(key)
> Removes an value at key, and returns.  For multidicts, removes just the
> first value?  No default key.
>
> Pop key/value: .popitem()
> Removes an arbitrary key/value and returns.  For multidicts, returns
> values one at a time?
>
>
> Set item/index: col[index_or_key] = value
> Sets the value, overwriting previous value (if any).  Cannot extend
> lists.  For multidicts, does this append a new value, or overwrite the
> value?
>
> Append: .append(item)
> For lists.  Would this apply to ordered dicts?  .append(key, value)?
>
> Extend from sequence: .extend(seq)
> Defined in terms of .append().
>
> Extend without sequence: .update(mapping)
> Defined for dict and set, should be for bag.  Defined in terms of
> __setitem__; has special behavior for sequences of tuples, vs. dict-like
> objects.
>
> Add item: .add()
> Applies to sets, bags.  Could apply to non-ordered multidicts; .add(key,
> value)?
>
> Rotate: .rotate()
> Currently implemented only in queues.  Could apply to any ordered
> collection.
>
> Copy: .copy()
> Applies to dicts, sets, bags.  Missing in lists.  Should all mutable
> collections have this?
>
> Clear: .clear()
> Removes all items.  Not implemented for lists for some reason.
>
> In-place versions of &|^-: .union(other), x.intersection(other),
> x.difference(other), x.symmetric_difference(other)
> Defined in terms of the operators (or vice versa).
>
> In-place sort: .sort(cmp=None, key=None, reverse=None)
> Does a sort.  Could apply to any ordered collection.  Sort on (key,
> value) or just key?
>
> Reverse: .reverse()
> Reverses; applies to all ordered collections.
>
>
>
> Missing
> =======
>
> There's no method for getting the key or keys for a value (similar to
> list.index).
>
> There's no way to invert a dictionary (change key->value to value->key).
>   With more types of dictionaries proper collections will exist to
> represent the inverse.  E.g., multidict is the inverse of dict.  This
> maybe is an argument for the full range of dictionaries.
>
> The inverted form could also be a view.  somedict.invert()['foo'] would
> return all the keys which had the value of 'foo'.  This is a clever way
> of adding lots of new concise lookups by leveraging existing methods,
> but I suspect very hard to read in practice.
>

For immutable views, the basics I think that are needed are __len__(),
__contains__(),  and __iter__().  I would normally say __getitem__(),
but that doesn't really work for sets unless we just say they always
return None and for lists it would need to return indexes to make
sense in terms of what the iterator returns.  We can add more methods
if we can agree on the bar minimum first.

I am ignoring mutable views for now to keep the discussion simple.

I think that should provide enough information to query about the
variaous data structures (sequences and collections) and what they
contain.  The iterator can be used to access the data or the original
data structure itself.

-Brett