modifying standard library functionality (difflib)

Vlastimil Brom vlastimil.brom at gmail.com
Wed Jun 23 21:07:48 EDT 2010


Hi all,
I'd like to ask about the most reasonable (or recommended) way to
modify the functionality of a standard library module, if doing so is
recommended at all.
I'm using difflib.SequenceMatcher for character-wise comparisons of
texts. Although this might not be a usual use case, the results are
fine for the given task; however, there were some corner cases where
the reported differences were clearly larger than needed. As it turned
out, this is due to a kind of special-casing of relatively frequent
("popular") items; cf.
http://bugs.python.org/issue1528074#msg29269
http://bugs.python.org/issue2986
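The special-casing can be made concrete with a short sketch. It restates the "popular item" rule as it appears in the difflib source of later Python releases (3.2 and up; older versions may differ in detail). If the second sequence has at least 200 items, any item occurring more than len(b) // 100 + 1 times is dropped from the matcher's index. The helper name popular_items is hypothetical. bpopular is the attribute those releases expose for the same set.

```python
import difflib
from collections import Counter


def popular_items(b):
    """Hypothetical helper restating difflib's "popular" rule (Python 3.2+).

    Assumes no isjunk function, so no item is removed as junk first.
    """
    n = len(b)
    if n < 200:
        # The heuristic is skipped for sequences shorter than 200 items.
        return set()
    threshold = n // 100 + 1
    return {item for item, count in Counter(b).items() if count > threshold}


text = "the quick brown fox jumps over the lazy dog. " * 5  # 225 characters

# Character-wise, every distinct character here occurs 5+ times (> 3),
# so all of them, including the space, count as "popular".
print(sorted(popular_items(text)))

# SequenceMatcher applies the rule to its second sequence, b.
matcher = difflib.SequenceMatcher(None, "", text)
print(matcher.bpopular == popular_items(text))  # True
```

In ordinary prose, the space and the common letters easily cross the 1% line. That is why character-wise comparisons of longer texts are hit much harder than line-wise ones.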
The solution (or workaround) for me was to modify the SequenceMatcher
class by adding a parameter, checkpopular=True, which controls whether
the __chain_b method applies this special-casing. The possible speed
cost of turning the optimisation off (checkpopular=False) doesn't
really matter here, and the comparison results are much better for
my use cases.
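Later releases (Python 2.7.1 and 3.2) added an autojunk parameter to SequenceMatcher. It switches this heuristic off, much like the checkpopular flag described above. The sketch below assumes Python 3.2 or later. CheckPopularMatcher is a hypothetical name; it simply maps a checkpopular keyword onto autojunk. The texts are made up for illustration.

```python
import difflib


class CheckPopularMatcher(difflib.SequenceMatcher):
    """Hypothetical subclass exposing a 'checkpopular' switch by name.

    Assumes Python 3.2+, where SequenceMatcher accepts autojunk.
    """

    def __init__(self, isjunk=None, a="", b="", checkpopular=True):
        super().__init__(isjunk, a, b, autojunk=checkpopular)


old = "the quick brown fox jumps over the lazy dog. " * 5
new = old.replace("fox", "cat", 1)  # a single three-character edit

default = CheckPopularMatcher(None, old, new)                     # heuristic on
strict = CheckPopularMatcher(None, old, new, checkpopular=False)  # heuristic off

# With the heuristic on, every character of 'new' is "popular", so far
# fewer matching characters are found and the reported diff is inflated.
print(round(default.ratio(), 3), round(strict.ratio(), 3))

# Show only the non-equal opcodes of the heuristic-free comparison.
for tag, i1, i2, j1, j2 in strict.get_opcodes():
    if tag != "equal":
        print(tag, repr(old[i1:i2]), "->", repr(new[j1:j2]))
```

With checkpopular=False, the only non-equal opcode is the replacement of 'fox' by 'cat'. With the default, most of the text after the edit is reported as changed.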

However, I'd like to ask how best to maintain this modified
functionality in my source code.
I tried some possibilities which seem to work, but I'd appreciate
suggestions on the preferred way in such cases:
- It is simply possible to have a modified source file difflib.py in
the script directory.
- Furthermore, one can subclass difflib.SequenceMatcher and override
its __chain_b method (although the name doesn't look like that of a
"public" method).
- I guess it wouldn't be recommended to directly replace
difflib.SequenceMatcher._SequenceMatcher__chain_b ...
In all cases I keep either a copy of the whole file or a copy of the
respective method as part of my source.
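Python's private-name mangling matters for the subclassing and replacement options above. A method written as __chain_b inside class SequenceMatcher is stored as _SequenceMatcher__chain_b, and calls to self.__chain_b() inside that class are compiled to that mangled name. A subclass method spelled __chain_b is therefore mangled with the subclass's name and does not override the original. The toy classes below (Base, NaiveSub and WorkingSub, all hypothetical) illustrate this in plain Python rather than through difflib's internals, which differ between versions.

```python
class Base:
    def __init__(self):
        self.calls = []
        self.__setup()  # compiled as self._Base__setup()

    def __setup(self):
        self.calls.append("Base.__setup")


class NaiveSub(Base):
    # Mangled to _NaiveSub__setup, so Base.__init__ never calls it.
    def __setup(self):
        self.calls.append("NaiveSub.__setup")


class WorkingSub(Base):
    # Spelling out the mangled name really overrides the base method,
    # at the cost of depending on Base's private implementation detail.
    def _Base__setup(self):
        self.calls.append("WorkingSub._Base__setup")


print(NaiveSub().calls)    # ['Base.__setup']
print(WorkingSub().calls)  # ['WorkingSub._Base__setup']
```

The same applies to difflib. A subclass has to define _SequenceMatcher__chain_b to take effect. The body of that private method, and the attributes it sets, may change between Python releases.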

I'd appreciate comments or suggestions on this, or on other, better
approaches to this problem.

Thanks in advance,
           vbr


