Making regex suck less

Bengt Richter bokr at oz.net
Tue Sep 10 03:40:08 CEST 2002


On Thu, 05 Sep 2002 06:56:24 GMT, Ben Wolfson <wolfson at midway.uchicago.edu> wrote:

>On Thu, 05 Sep 2002 06:41:14 GMT, "Fredrik Lundh" <fredrik at pythonware.com>
>wrote:
>
>>Bengt Richter wrote:
>>
>>> >you can ask SRE to dump the internal parse tree
>>> >to stdout:
>>> >
>>> >>>> sre.compile("[a-z]\d*", sre.DEBUG)
>>> >in
>>> >  range (97, 122)
>>> >max_repeat 0 65535
>>> >  in
>>> >    category category_digit
>>> >
>>> >turning this into 'English' is left as an exercise etc.
>>>
>>> Interesting, thanks. Does the above mean that sre can't fully match
>>>  'a'+'9'*65537
>>> ?
>>
>>in this context, 65535 represents any number:
>
>Doesn't that cause problems for something like this?
>
>>>> m=re.compile(r'\d{0,65535}a').match(('9'*1000000)+'a')
>>>> len(m.group(0))
>1000001
>
Looks like a bug to me if {0,65535} acts like {0,}.
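
For comparison, here is a minimal sketch of what I would expect a literal upper bound to do. It uses a small stand-in bound of 3 rather than 65535 so that it runs instantly. That substitution is my assumption for illustration, and the variable names are made up. It does not reproduce the sre behavior shown above; it only spells out the bounded-versus-unbounded semantics being argued about.

```python
import re

# Small stand-in bound (3) instead of 65535, so the check runs instantly.
bounded = re.compile(r'\d{0,3}a')
unbounded = re.compile(r'\d{0,}a')

text = '9' * 5 + 'a'

# Anchored match: five digits exceed the bound, so a literal {0,3} must fail.
bounded_match = bounded.match(text)

# Unanchored search: the only legal hit is the last three digits plus the 'a'.
bounded_search = bounded.search(text)

# {0,} has no upper limit, so it swallows all five digits and the 'a'.
unbounded_match = unbounded.match(text)

print(bounded_match)
print(bounded_search.group(0))
print(len(unbounded_match.group(0)))
```

If {0,65535} were honored literally, the match against a million digits would be expected to fail the same way the {0,3} match fails here.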

BTW, a search for \d{0,65534} does seem to take the bound literally, and it
runs so slowly that I lost patience waiting. Not very optimized, I guess.

 >>> import re
 >>> m=re.compile(r'\d{0,65535}a').search(('9'*1000000)+'a')
 >>> len(m.group(0))
 1000001

That finished in reasonable time (though the result is wrong), but the next
one stalled. It must be brute-forcing something.

 >>> m=re.compile(r'\d{0,65534}a').search(('9'*1000000)+'a')
 ^C
 [18:50] C:\pywk\junk>
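
The one-liner above mixes compilation and searching, so it does not show which step is the slow one. A rough sketch for timing the two separately follows. The helper name time_compile_and_search is hypothetical, time.perf_counter assumes a current Python rather than the 2002 interpreter used above, and the input is deliberately shortened to 2000 digits so the run stays quick.

```python
import re
import time

def time_compile_and_search(pattern, text):
    """Hypothetical helper: return (compile_seconds, search_seconds, match)."""
    t0 = time.perf_counter()
    compiled = re.compile(pattern)   # compilation step, timed on its own
    t1 = time.perf_counter()
    match = compiled.search(text)    # search step, timed on its own
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, match

# Shortened input (2000 digits, not a million); this is an assumption made
# to keep the measurement fast, not the original test case.
compile_s, search_s, m = time_compile_and_search(r'\d{0,65534}a', '9' * 2000 + 'a')

print("compile: %.6f s, search: %.6f s" % (compile_s, search_s))
print(len(m.group(0)))
```

The timings depend on the interpreter and machine, so no figures are given here.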

Regards,
Bengt Richter


