heapq.merge with key=
Scott David Daniels
Scott.Daniels at Acm.Org
Sat May 9 12:51:24 EDT 2009
Peter Otten wrote:
> Kevin D. Smith wrote:
>
>> On 2009-05-07 23:48:43 -0500, Chris Rebert <clp2 at rebertia.com> said:
>>
>>> On Thu, May 7, 2009 at 2:23 PM, Kevin D. Smith <Kevin.Smith at sas.com>
>>> wrote:
>>>> I need the behavior of heapq.merge to merge a bunch of results from a
>>>> database.... I would prefer to do this with generators.... I need
>>>> the key argument to sort on the correct field in the database...
>>> I think so.... [some code]
>> Ah, that's not a bad idea. I think it could be simplified by letting
>> Python do the comparison work as follows (also untested).
>>
>> def keyify(gen, key=lamda x:x):
>> for item in gen:
>> yield (key(item), item)
> ... This is not as general. It makes sorting unstable and can even fail
> if the items are unorderable:
> One way to fix it:
>>>> def decorate(gen, key):
> ... for index, item in enumerate(gen):
> ... yield key(item), index, item
>>>> sorted(decorate([1j, -1j, 1+1j], key=lambda c: c.real))
> [(0.0, 0, 1j), (0.0, 1, -1j), (1.0, 2, (1+1j))]
>>>> list(undecorate(_))
> [1j, -1j, (1+1j)]
However, a heap gets faster if equal items are equal, so you
really want the decorated value to be the sole component of a
comparison. (and remember, sorts are now stable).
As in your code, key generation is likely best calculated once.
Here's another fix:
class Keyed(object):
def __init__(self, obj, key):
self.obj = obj
self.key = key
def __lt__(self, other): # cmp is old-style magic
return self.key < other.key
def __eq__(self, other):
return self.key == other.key
def __repr__(self):
return '%s(%r, %r)' % (
self.__class__.__name__, self.obj, self.key)
def kmerge(key, *streams):
keyed_streams = [(Keyed(obj, key(obj)) for obj in stream)
for stream in streams]
for element in heapq.merge(*keyed_streams):
yield element.obj
--Scott David Daniels
Scott.Daniels at Acm.Org
More information about the Python-list
mailing list