Is the regular expression module written in C or Python?

Changjune Kim juneaftn at REMOVETHIShanmail.net
Tue Oct 8 11:39:43 EDT 2002


r'\[-([^-]|-[^\]])*-\]'

You could make a function that returns the "anything-but-sequence" part
of the RE for convenience.
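
A minimal sketch of such a helper, in case it is useful. The names
anything_but and bracket_pattern are made up for illustration, and the
delimiter is assumed to be a single character. It uses the same
alternation as the RE above, so it shares one limitation: a body that
ends in the delimiter itself, such as "[-a--]", is not matched.

```python
import re

def anything_but(delim):
    """Return an RE fragment matching text that does not contain delim + ']'.

    Hypothetical helper; delim is assumed to be a single character.
    """
    d = re.escape(delim)
    # Either a character other than the delimiter, or the delimiter
    # followed by something other than ']'.
    return r'(?:[^%s]|%s[^\]])*' % (d, d)

def bracket_pattern(delims='-+$#'):
    """Build one RE covering "[- ... -]", "[+ ... +]", "[$ ... $]", "[# ... #]"."""
    parts = [r'\[%s%s%s\]' % (re.escape(c), anything_but(c), re.escape(c))
             for c in delims]
    return re.compile('|'.join(parts))

pattern = bracket_pattern()
text = "a [- one -] b [+ two +] c [$ three $] d [# four #]"
# No capturing groups are used, so findall returns the whole matches.
matches = pattern.findall(text)
```

The groups are non-capturing so that findall returns each bracketed
block whole. Wrap the anything_but(...) part in parentheses if you only
want the body.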


"Ulli Stein" <mennosimons at gmx.net> wrote in message
news:anuj1l$6vm$03$1 at news.t-online.com...
> Richie Hindle wrote:
>
> > Hi Ulli,
> >
> >> >>> import re
> >> >>> re.findall("\[(.*?)\]", "["+"x"*10000+"]")
> >> Traceback (most recent call last):
> >>
> >> If the part which .*? will match exceeds 9996 bytes python throws the
> >> above exception. Having this bug, re renders itself unusable.
> >
> > 'Unusable' is putting it a bit strong:
> >
> > >>> import re
> > >>> re.findall(r"\[([^\]]*)\]", "["+"x"*10000+"]")
> > ['xxxxxxxxxx...
> >
> > I could be wrong, but I believe the latter is more efficient - I've a
> > feeling that the lookahead construct makes the RE potentially very slow
> > (it may be an implementation issue).  Hopefully a passing RE expert
> > will be along to support/correct me...?
> >
>
> This way of replacing the lookahead works only in cases where you have
> only one char to look ahead for.
>
> I tried for a long time, without success, to replace the (.*?) part of an
> RE in which I am looking for "[- ... -]", "[+ ... +]", "[$ ... $]", and
> "[# ... #]". How would you replace the (.*?) for this RE?
>
> Ulli



