[Python-Dev] Issue 2986: difflib.SequenceMatcher is partly broken

Antoine Pitrou solipsis at pitrou.net
Wed Jul 7 20:40:08 CEST 2010


On Wed, 7 Jul 2010 19:44:31 +0200
Eli Bendersky <eliben at gmail.com> wrote:
> 
> For what it's worth, my benchmarking showed that modifying the
> heuristic to only kick in when there are more than 100 kinds of
> elements (Terry's option A) didn't affect the runtime of matching
> whatsoever, even when the heuristic *does* kick in. All it adds,
> really, is the overhead of a single 'if' statement. So it wouldn't be
> right to assume that somehow modifying the heuristic or allowing to
> turn it off will negatively affect performance in the special case Tim
> originally optimized for.

Just because it doesn't affect performance in your tests doesn't mean it
won't do so in the general case. Consider a case where Tim's junk
optimization kicked in and helped improve performance a lot, but where
there are still less than 100 alphabet symbols. The new heuristic will
ruin this use case.

That's why I'm advocating a dedicated flag instead.




More information about the Python-Dev mailing list