Bug in RE's?
Bengt Richter
bokr at oz.net
Thu Oct 17 19:30:37 EDT 2002
On 17 Oct 2002 09:49:13 -0700, robin.siebler at corp.palm.com (Robin Siebler) wrote:
>I have a list of words that I need to search for in a large number of
>files. I decided to put the patterns into one huge string and make an
>RE out of it. However, at a certain point the RE stops working. By
>this, I mean that the findall method does not return any matches for a
>line that contains matches. If I recompile the RE, it works fine. The
>RE always stops working in the same place and it always starts working
>again in the same place. I have no idea why. Has anyone ever seen
>this? If this is a bug, where would I report it?
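For reference, here is roughly what I assume the setup looks like. This is only a sketch, not your actual code: build_word_pattern and find_words are names I made up, and the longest-first ordering is my own choice so that a short word does not shadow a longer one that starts with it.

```python
import re

def build_word_pattern(words):
    # Escape each word so characters like '.' or '+' are matched literally,
    # and put longer words first so 'foobar' is tried before 'foo'.
    ordered = sorted(set(words), key=len, reverse=True)
    return re.compile('|'.join(re.escape(w) for w in ordered))

def find_words(pattern, lines):
    # Collect (line_number, matches) for every line with at least one hit.
    hits = []
    for lineno, line in enumerate(lines, 1):
        found = pattern.findall(line)
        if found:
            hits.append((lineno, found))
    return hits
```

If the failure depends on the size of the pattern, compiling this with a shorter word list should make it go away, which would be a useful data point for a bug report.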
I wonder if it is related to the 65535 problem seen before. That particular
magic value in a repeat x{65535} is apparently equivalent to x*:
>>> import re
>>> rxo = re.compile(r'xy{65535}(z+)')
>>> rxo.findall('x'+'y'*66000+'z'*5)
['zzzzz']
>>> rxo = re.compile(r'xy{65534}(z+)')
>>> rxo.findall('x'+'y'*66000+'z'*5)
[]
It seems to be storing the repeat count modulo 2**16:
>>> for i in range(65536-8,65536+8):
...     rxo = re.compile(r'xy{%d}(z+)'% i)
...     print '%6d: %s' % (i, rxo.findall('x'+'y'*i+'z'*5)),
...     print ' -- vs %6d: %s' % (i%65536, rxo.findall('x'+'y'*(i%65536)+'z'*5))
...
 65528: ['zzzzz']  -- vs  65528: ['zzzzz']
 65529: ['zzzzz']  -- vs  65529: ['zzzzz']
 65530: ['zzzzz']  -- vs  65530: ['zzzzz']
 65531: ['zzzzz']  -- vs  65531: ['zzzzz']
 65532: ['zzzzz']  -- vs  65532: ['zzzzz']
 65533: ['zzzzz']  -- vs  65533: ['zzzzz']
 65534: ['zzzzz']  -- vs  65534: ['zzzzz']
 65535: ['zzzzz']  -- vs  65535: ['zzzzz']
 65536: []  -- vs      0: ['zzzzz']
 65537: []  -- vs      1: ['zzzzz']
 65538: []  -- vs      2: ['zzzzz']
 65539: []  -- vs      3: ['zzzzz']
 65540: []  -- vs      4: ['zzzzz']
 65541: []  -- vs      5: ['zzzzz']
 65542: []  -- vs      6: ['zzzzz']
 65543: []  -- vs      7: ['zzzzz']
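The right-hand column is what you would expect if only the low 16 bits of the count were kept. The following is plain arithmetic to illustrate that guess, not the engine's actual code; LIMIT and wrapped_count are names I made up, and the 16-bit field is an assumption inferred from the output above.

```python
LIMIT = 2 ** 16  # assumed size of the field holding the repeat count

def wrapped_count(n):
    # Keep only the low 16 bits, i.e. n modulo 65536 for non-negative n.
    return n & (LIMIT - 1)

def wrap_table(start, stop):
    # Pair each requested count with the count that would actually be stored.
    return [(n, wrapped_count(n)) for n in range(start, stop)]
```

Under that assumption wrap_table(65534, 65538) pairs 65534 and 65535 with themselves and 65536 and 65537 with 0 and 1, which lines up with where the matches stop in the left-hand column.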
Regards,
Bengt Richter