Bug in RE's?

Bengt Richter bokr at oz.net
Thu Oct 17 19:30:37 EDT 2002


On 17 Oct 2002 09:49:13 -0700, robin.siebler at corp.palm.com (Robin Siebler) wrote:

>I have a list of words that I need to search for in a large number of
>files.  I decided to put the patterns into one huge string and make an
>RE out of it.  However, at a certain point the RE stops working.  By
>this, I mean that the findall method does not return any matches for a
>line that contains matches.  If I recompile the RE, it works fine.  The
>RE always stops working in the same place and it always starts working
>again in the same place.  I have no idea why.  Has anyone ever seen
>this?  If this is a bug, where would I report it?

I wonder if it is related to the 65535 problem seen before. Used as a repeat
count, that particular magic value apparently stops meaning "exactly 65535":
x{65535} seems to be equivalent to x*, whereas x{65534} still counts exactly:

>>> import re
>>> rxo = re.compile(r'xy{65535}(z+)')
>>> rxo.findall('x'+'y'*66000+'z'*5)
['zzzzz']
>>> rxo = re.compile(r'xy{65534}(z+)')
>>> rxo.findall('x'+'y'*66000+'z'*5)
[]
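
For anyone who wants to repeat that check on another interpreter, here is a
minimal sketch that wraps it in a function. The helper name
repeat_is_unbounded and the extra=500 padding are made up for illustration.
Going by the transcript above, an affected build should report True for 65535
and False for 65534; a build without the problem should report False for both.

```python
import re

def repeat_is_unbounded(n, extra=500):
    """Return True if y{n} also matches a run of n + extra y's."""
    rxo = re.compile(r'xy{%d}(z+)' % n)
    # With a correct exact count this subject must NOT match: the only 'x'
    # is followed by more than n y's before the first 'z'.
    subject = 'x' + 'y' * (n + extra) + 'z' * 5
    return rxo.findall(subject) == ['zzzzz']

for n in (65534, 65535):
    print('%d: unbounded=%s' % (n, repeat_is_unbounded(n)))
```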

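It seems to be storing the repeat count modulo 2**16. Each pattern xy{i}(z+)
is tried against a string with i y's (left) and one with i%65536 y's (right):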

>>> for i in range(65536-8,65536+8):
...     rxo = re.compile(r'xy{%d}(z+)'% i)
...     print '%6d: %s' % (i, rxo.findall('x'+'y'*i+'z'*5)),
...     print ' -- vs %6d: %s' % (i%65536, rxo.findall('x'+'y'*(i%65536)+'z'*5))
...
 65528: ['zzzzz']  -- vs  65528: ['zzzzz']
 65529: ['zzzzz']  -- vs  65529: ['zzzzz']
 65530: ['zzzzz']  -- vs  65530: ['zzzzz']
 65531: ['zzzzz']  -- vs  65531: ['zzzzz']
 65532: ['zzzzz']  -- vs  65532: ['zzzzz']
 65533: ['zzzzz']  -- vs  65533: ['zzzzz']
 65534: ['zzzzz']  -- vs  65534: ['zzzzz']
 65535: ['zzzzz']  -- vs  65535: ['zzzzz']
 65536: []  -- vs      0: ['zzzzz']
 65537: []  -- vs      1: ['zzzzz']
 65538: []  -- vs      2: ['zzzzz']
 65539: []  -- vs      3: ['zzzzz']
 65540: []  -- vs      4: ['zzzzz']
 65541: []  -- vs      5: ['zzzzz']
 65542: []  -- vs      6: ['zzzzz']
 65543: []  -- vs      7: ['zzzzz']
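
The right-hand column is what one would expect if the count ended up in an
unsigned 16-bit field. A minimal sketch of that arithmetic follows, using
struct purely as a stand-in for such a field. This is an assumption made for
illustration, not a reading of the sre source, and as_uint16 is a made-up
helper name.

```python
import struct

def as_uint16(n):
    """Truncate n to 16 bits the way an unsigned short field would."""
    # Mask to the low 16 bits, then round-trip through a real 2-byte field.
    packed = struct.pack('<H', n & 0xFFFF)
    return struct.unpack('<H', packed)[0]

for n in (65534, 65535, 65536, 65537, 65543):
    print('%6d -> %6d' % (n, as_uint16(n)))
```

Counts up to 65535 come back unchanged, while 65536 and above wrap around to
0, 1, 2, and so on, which lines up with the patterns that only matched the
shorter strings.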

Regards,
Bengt Richter


