longest sequence

Tim Peters tim.one at comcast.net
Mon Feb 17 23:15:39 CET 2003


[Brian Quinlan]
> You didn't fix the important part: this is still a O(nlogn) solution to
> an O(n) problem. Of course *args is likely to be small so who cares?

[Evan Simpson]
> True enough.  Who can resist the opportunity for a one-liner that uses
> a list comprehesion, though?
>
> def longest(*args):
>      return max([(len(s), s) for s in args])[1]

This is linear-time if all the sequences have different lengths, but, e.g.,
if all have the same length len(s)==N, can take time proportional to
N*len(args).

The difficulty is that, when comparing

    (n, seq1)
to
    (n, seq2)

the tie on n means seq1 gets compared to seq2 in order to try to break the
tie.  Suppose seq1 and seq2 are both range(1000000):  then Python has to do
a million more comparisons to decide seq1 and seq2 are equal.

This wouldn't be worth belaboring <wink>, except that the same thing
routinely happens in Decorate-Sort-Undecorate patterns, where the sequence
to be sorted is transformed into a list of tuples first:

    [(key1, obj1), (key2, obj2), ...]

Same thing applies:  whenever two keys compare equal, Python goes on to
compare the objects in order to try to break the tie -- and object
comparison can be arbitrarily expensive.

A cheap solution in both cases is to inject another element into the tuples
that never compares equal, and is known to be cheap to compare.  The most
natural thingie to inject is the index in the original sequence, like so in
Python 2.3:

def longest(*args):
     return max([(len(s), i, s) for i, s in enumerate(args)])[-1]

Now the second element of each tuple is unique, so comparing two tuples can
never tie, and never needs to compare the third tuple elements.  This is
reliably linear in len(args).

Note that this is the same trick used in DSU to force a stable sort.
Because of the speed implications when object comparison can be expensive,
this is still a valuable trick in Python 2.3, despite that 2.3's list.sort()
is stable without any help:  embedding sequence indices prevents falling
back to general-object comparison, as well as forcing stability.






More information about the Python-list mailing list