Is the regular expression module written in C or Python?

Richie Hindle richie at entrian.com
Sat Oct 5 15:45:44 EDT 2002


Hi Ulli,

> >>> import re
> >>> re.findall(r"\[(.*?)\]", "["+"x"*10000+"]")
> Traceback (most recent call last):
> 
> If the part that .*? has to match exceeds 9996 bytes, Python raises the above
> exception. With this bug, re is unusable.

'Unusable' is putting it a bit strong:

>>> import re
>>> re.findall(r"\[([^\]]*)\]", "["+"x"*10000+"]")
['xxxxxxxxxx...
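For anyone who wants to check the two patterns side by side, here is a minimal sketch. It assumes a current Python 3 interpreter, where (as far as I know) the engine no longer trips over this limit. The names lazy and negated are just my labels. It also shows the one behavioural difference worth knowing before swapping one pattern for the other: "." stops at a newline unless re.DOTALL is given, whereas [^\]] does not.

```python
import re

# Ulli's pattern: a non-greedy ".*?" between literal brackets.
lazy = re.compile(r"\[(.*?)\]")
# The alternative: a negated character class that can never cross a "]".
negated = re.compile(r"\[([^\]]*)\]")

text = "[" + "x" * 10000 + "]"

lazy_result = lazy.findall(text)
negated_result = negated.findall(text)

# Both should return a single group of 10000 "x" characters.
print(len(lazy_result), len(lazy_result[0]))  # 1 10000
print(lazy_result == negated_result)          # True

# Behavioural difference: "." does not match "\n" without re.DOTALL,
# but the negated class [^\]] does.
multiline = "[a\nb]"
print(lazy.findall(multiline))     # []
print(negated.findall(multiline))  # ['a\nb']
```

So the two are interchangeable only when the bracketed text cannot contain newlines, or when re.DOTALL is added to the first pattern.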

I could be wrong, but I believe the negated character class version is
more efficient - I've a feeling that the non-greedy .*? construct, which
has to check for the closing ] after every character it consumes, makes
the RE potentially very slow (it may be an implementation issue).
Hopefully a passing RE expert will be along to support/correct me...?
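Rather than guess, the efficiency question can be measured. This is a rough timeit sketch, assuming the same input as Ulli's example and an arbitrary repeat count of 200. The absolute numbers, and even which pattern wins, will depend on the interpreter and version, so treat it as a way to check, not as a result.

```python
import re
import timeit

# Same input shape as Ulli's example: one long run inside brackets.
text = "[" + "x" * 10000 + "]"

lazy = re.compile(r"\[(.*?)\]")        # non-greedy: tries "\]" after each char
negated = re.compile(r"\[([^\]]*)\]")  # negated class: consumes the run directly

# Arbitrary repeat count; timeit returns the total elapsed seconds.
REPEATS = 200
lazy_time = timeit.timeit(lambda: lazy.findall(text), number=REPEATS)
negated_time = timeit.timeit(lambda: negated.findall(text), number=REPEATS)

print("non-greedy .*?: %.4f s" % lazy_time)
print("negated class : %.4f s" % negated_time)
```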

-- 
Richie Hindle
richie at entrian.com


