Why are there no ordered dictionaries?
Anton Vredegoor
anton.vredegoor at gmail.com
Tue Nov 22 11:46:14 EST 2005
Christoph Zwerschke wrote:
> But of course, it will always be slower since it is constructed on top
> of the built-in dict. In end effect, you always have to maintain a
> sequence *plus* a dictionary, which will be always slower than a sheer
> dictionary. The ordered dictionary class just hides this uglyness of
> having to maintain a dictionary plus a sequence, so it's rather an issue
> of convenience in writing and reading programs than a performance issue.
>
> It may be different if the ordered dict would be implemented directly as
> an ordered hash table in C.
The problem with hashing is that one is storing data from a possibly
wildly varying range in a set of values with a limited range. That's
where the ordering problems come from in the first place. If one wants
to solve it once and for all one has to give up the speed advantage
that makes hashing so popular. I wonder if optimized C code using
bisect would be very much slower for small ranges?
The current set implementation uses dicts to implement sets, while sets
are a more basic data type than dicts. At least dicts and sets should
be derived from the same type. Just speaking from an intuitive point of
view of course :-)
Here's a sorted dict implementation without using hashes, which maybe
would be fast if implemented in C. The insertion order can be retrieved
using the keys and values lists from the object itself, items() gives a
sorted sequence.
Anton.
NB warning untested code.
When using mutables as keys which is possible by this implementation,
don't change the keys after they're used in the object.
from array import array
class VDict:
def __init__(self,sequence = []):
self.keys = []
self.values = []
self.unranks = array('L',[])
for key,value in sequence:
self[key] = value
def __setitem__(self,key,value):
keys = self.keys
values = self.values
unranks = self.unranks
n = len(keys)
i = self.bisect_left(key)
if i == n or keys[unranks[i]] != key:
keys.append(key)
values.append(value)
unranks.insert(i,n)
else:
values[i] = value
def __getitem__(self,key):
i = self.bisect_left(key)
return self.values[self.unranks[i]]
def bisect_left(self,key, lo=0, hi=None):
keys = self.keys
unranks = self.unranks
if hi is None:
hi = len(keys)
while lo < hi:
mid = (lo+hi)//2
if keys[unranks[mid]] < key:
lo = mid+1
else:
hi = mid
return lo
def __contains__(self,key):
keys = self.keys
unranks = self.unranks
n = len(keys)
i = self.bisect_left(key)
return i < n and keys[unranks[i]] == key
def __len__(self):
return len(self.keys)
def items(self):
keys = self.keys
values = self.values
unranks = self.unranks
return [(keys[i],values[i]) for i in unranks]
def __iter__(self):
return iter(self.items())
def remove(self,key):
keys = self.keys
values = self.values
unranks = self.unranks
n = len(keys)
i = self.bisect_left(key)
x = unranks[i]
if i < n and keys[x] == key:
del keys[x]
del values[x]
del unranks[i]
for j,k in enumerate(unranks):
if k > x:
unranks[j] -= 1
def test():
V = VDict()
L = [1,2,3]
V[L] = 10
print V[L]
V[L] = 20
print V[L]
V.remove(L)
print V.items()
V = VDict(zip('edcba',range(5)))
print V.items()
print V['d']
V.remove('d')
print V.items()
if __name__=='__main__':
test()
More information about the Python-list
mailing list