SequenceMatcher bug ?

rdmurray at bitdance.com rdmurray at bitdance.com
Wed Dec 10 15:00:30 CET 2008


On Tue, 9 Dec 2008 at 22:15, eliben wrote:
> On Dec 10, 4:12 am, rdmur... at bitdance.com wrote:
>> On Mon, 8 Dec 2008 at 23:46, eliben wrote:
>>> This is about Python 2.5.2 - I don't know if there were fixes to this
>>> module in 2.6/3.0
>>
>>> I think I ran into a bug with the difflib.SequenceMatcher class.
>>> Specifically, its ratio() method. The following:
>>
>>> SequenceMatcher(None, [4] + [10] * 500 + [5], [10] * 500 + [5]).ratio()
>>
>>> returns 0.0
>>
>>> While the same with 500 replaced by 100 returns .99... something
>>> Looking at the code of SequenceMatcher there's some caching going on
>>> when the sequences are longer than 200 elements (and indeed, I can
>>> reproduce the bug above 200 but not below). Can anyone confirm that
>>> this misbehaves and suggest a workaround ?
>>
>> Python 2.5.2 (r252:60911, Sep 29 2008, 20:34:04)
>> [GCC 4.3.1] on linux2
>> Type "help", "copyright", "credits" or "license" for more information.
>> >>> from difflib import SequenceMatcher
>> >>> SequenceMatcher(None, [4] + [10] * 500 + [5], [10] * 500 + [5]).ratio()
>>
>> 0.99900299102691925
>>
>
> Strange. I could reproduce the problem both on ActiveState Python
> 2.5.2 for Windows, and in the online Try Python evaluator:
>
> http://try-python.mired.org/

My system is Gentoo, which installs Python from source.  Maybe Gentoo
applies patches that the binary releases don't have.
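
One way to narrow this down would be to run the same small script on each
install and see at which length the ratio changes. The following is only a
sketch: the helper name ratio_for and the particular sizes tried are my own
choices, picked to straddle the 200-element threshold mentioned above. If
the matcher finds the whole common run, ratio() (which is 2.0 * matches /
total length) should come out as 2*(n+1) / (2*n+3).

```python
import sys
from difflib import SequenceMatcher


def ratio_for(n):
    """Similarity ratio for the sequences from the report, with n repeats of 10.

    ratio_for is a hypothetical helper name, not part of difflib.
    """
    a = [4] + [10] * n + [5]
    b = [10] * n + [5]
    return SequenceMatcher(None, a, b).ratio()


if __name__ == "__main__":
    # Print the interpreter version so results from different installs
    # can be told apart when compared side by side.
    print(sys.version)
    # Sizes chosen (an assumption) to sit on both sides of 200 elements.
    for n in (100, 199, 200, 201, 500):
        print("n=%d ratio=%r" % (n, ratio_for(n)))
```

If two installs disagree only for the larger sizes, that would point at the
handling of long sequences rather than at ratio() itself.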

--RDM

