An ordered dictionary for the Python library?

Mark Summerfield m.n.summerfield at googlemail.com
Fri Sep 14 09:12:48 CEST 2007


On 14 Sep, 02:35, Jürgen Urner <jUr... at arcor.de> wrote:
> Puh, what a discussion... most common use case I can think of is
>
> >> d = {'a': 1, 'b': 2, 'c': 3}
> >> for key in d:
> >>     # do something that relies on order of keys as specified in the constructor
>
> It's a bit tireing having to type
>
> >> for key in sorted(d.keys()):
> >>     do_somethig_with(d[key])
>
> instead of a trustfully
>
> >> for value in d.values():
> >>     do_somethig_with(value)
>
> As far as I can see much cleaner. No black magic needed ++ sort the
> dict
> to another order and rely on the sort order being stable would be a
> really
> a nice thing to have.
>
> My 2 cents,  Jürgen

What I envisage is:

d = ordereddict(a=1, x=20, b=35, m=4)
# some time later
d["e"] = 15
# later still
d["b"] = 70
d.keys() # returns ['a', 'b', 'e', 'm', 'x']
d.values() # returns [1, 70, 15, 4, 20]

So no matter _when_ you add because the underlying data structure is
ordered (by key) you can always get access in key order. Also, you
can't "sort" the ordereddict; it is in key order and that's it. The
whole point is that you can use these things and never have to sort;
if you need different orders you create extra ordereddicts with
suitably munged keys and with values that are object references to the
objects you are interested in.

In reply to James Stroud:

- The reason you can't assign a slice is that the index positions
depend on the keys, so if you do:
    od[newkey] = newvalue
where newkey goes (i.e., its index position) depends on how it orders
in relation to the rest, so it could go "anywhere".
I now feel that offering a slice is wrong because the [] syntax is
used for access by key, whereas a slice would be for access by index,
so using [] for both would be v. confusing.

- James proposed better method names which I've adopted (see later),
but I don't think set_value() should be set() because that would make
it seem to be the partner of get(), whereas get() uses a key and
set_value() uses an index for lookup.

- James also suggested a name and behaviour change:

    keys(fromindex : int = None, uptobutexcludingindex : int = None) -
> ordered list of keys

So keys() called on its own behaves just like dict.keys() (except that
all the keys are returned in order), but with one argument returns the
keys with index positions from the given index to the end, and with
two arguments returns the keys in the given range of indexes. (Note:
in an ordereddict values() returns all the values in key order.)

So from the earlier example:

d.key(2) # returns 'e'
d.item(3) # returns ('m', 4)
d.value(4) # returns 20
d.set_value(3, 99) # maybe returns 'm'; but anyway d.value(3) and
d['m'] now both return 99

- James (and some others) also felt that insertions should just go as
key/value pairs at the end. But that is not what I'm proposing (that's
a completely different data structure).

The whole point of ordereddict is that whatever you insert and
whenever you insert it, the ordereddict is always in key order.

Paul Rubin and at least one other person mentioned the use of an AVL
tree. At the moment I just want to see if I can get enough consensus
on an API and behaviour so that I could put in a PEP for just one
ordered data structure.

So to clarify, here's the entire API I'm proposing for ordereddict. In
all cases the ordereddict is always in (and kept in) key order, and
any method that returns a list or iterator always returns that list or
iterator (whether of keys or values) in key order:

ordereddict(dictionary=None)
    The argument can be a dict or another ordereddict; all the dict()
initializer
    approaches should also be supported.

ordereddict.update(dictonary=None, **kwargs)
    Same as dict.update()---except that key order is preserved (a
point I won't
    repeat in the others when I say "same as dict", but which is true
throughout)

@classmethod
ordereddict.fromkeys(cls, iterable, value=None) # Same as dict

ordereddict.key(index : int) -> key
    Returns the index-th item's key

ordereddict.item(index : int) -> (key, value)
    Returns the index-th item

ordereddict.value(index : int) -> value
    Returns the index-th item's value

ordereddict.set_value(index : int, value)
    Sets the index-th item's value to value; raises IndexError if
index is out of
    range. If not expensive, maybe return the key.

ordereddict.copy() # Same as dict.
ordereddict.clear() # Same as dict.
ordereddict.get(key, value=None) # Same as dict
ordereddict.setdefault(key, value) # Same as dict
ordereddict.pop(key, value) # Same as dict
ordereddict.popitem() # Same as dict

ordereddict.keys(fromindex : int = None, uptoindex : int : None) ->
list of keys
    Returns an ordered list of keys, or a slice of keys if one or two
indexes is given

ordereddict.values() # Same as dict
ordereddict.items() # Same as dict
ordereddict.iterkeys() # Same as dict
ordereddict.itervalues() # Same as dict
ordereddict.iteritems() # Same as dict
ordereddict.has_key() # Same as dict

Also the same as dict (and as always, working in key order):

for key in d: pass
if key in d: pass
len(d)
del d[key]
d[key]
d[key] = value

What this means is that users could drop in an ordereddict in place of
a plain dict and get the same behaviour (maybe not the same
performance!), but would now find that they could rely on the ordering
of keys, as well as having just four additional methods and one
existing method with additional optional arguments.





More information about the Python-list mailing list