ANN: equivalence 0.1
Giuseppe Ottaviano
giuott at gmail.com
Mon Jun 2 10:05:01 EDT 2008
>
> Interesting.. it took me a while to figure out why the second case is
> so much slower and you're right, it is indeed quadratic. I don't know
> how likely would such pathological cases be in practice, given that
> the preferred way to merge a batch of objects is
> eq.merge(*xrange(10001)), which is more than 3 times faster than the
> non-pathologic first case (and which I optimized even further to avoid
> attribute lookups within the loop so it's more like 5 times faster
> now). Also the batch version in this case remains linear even if you
> merge backwards, eq.merge(*xrange(10000,-1,-1)), or in any order for
> that matter.
The example just showed what can happen when the merges are done in a
pathological order; it is not about batch merging. I think that
pathological cases like this do show up in practice: many
algorithms of near duplicate elimination and clustering reduce to
finding connected components of a graph whose edges are given as a
stream, so you can't control their order.
With this implementation, every time a component of size N is passed as
the second (or a later) argument to merge, you pay Omega(N).
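
To make the cost concrete, here is a minimal sketch that only models the
behaviour described above; it is not the code of the equivalence package.
The class NaiveEquivalence, its moves counter and the helper count_moves
are hypothetical names introduced for illustration. The assumption is that
merge(a, b) relabels every element of b's component into a's component,
so each call costs the size of the second component.

```python
class NaiveEquivalence:
    """Hypothetical model of a merge that relabels the later arguments."""

    def __init__(self):
        self.label = {}    # element -> label of its component
        self.members = {}  # component label -> list of its elements
        self.moves = 0     # total number of elements relabelled so far

    def _find(self, x):
        # Unknown elements start out in a singleton component.
        if x not in self.label:
            self.label[x] = x
            self.members[x] = [x]
        return self.label[x]

    def merge(self, first, *others):
        target = self._find(first)
        for other in others:
            source = self._find(other)
            if source == target:
                continue
            # Pay one relabelling per element of the absorbed component.
            moved = self.members.pop(source)
            for element in moved:
                self.label[element] = target
            self.members[target].extend(moved)
            self.moves += len(moved)


def count_moves(pairs):
    """Feed a stream of (a, b) edges as merge(a, b) and count relabellings."""
    eq = NaiveEquivalence()
    for a, b in pairs:
        eq.merge(a, b)
    return eq.moves


n = 1000
# The growing component is always the first argument: one move per merge.
benign = count_moves((0, i) for i in range(1, n + 1))
# The growing component is always the second argument: i moves on merge i.
pathological = count_moves((i, 0) for i in range(1, n + 1))
print(benign, pathological)
```

Under this model the first stream costs n relabellings in total, while
the second costs 1 + 2 + ... + n = n(n+1)/2, which is the quadratic
behaviour. With edges arriving as a stream, nothing prevents the second
ordering from occurring.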
> I am familiar with it and I will certainly consider it for the next
> version; for now I was primarily interested in functionality (API) and
> correctness.
Good :)