Bottleneck? More efficient regular expression?

Robin Becker robin at
Fri Sep 26 09:44:46 CEST 2003

In article <%nNcb.3635$RW4.47 at>, Andrew
Dalke <adalke at> writes
>Tina Li:
>> The lag is *perceivable* (this is what I meant; sorry) by a human user so
>> it's slower.
>Yup, that's what I meant.  Too many people make theoretical
>arguments for why to choose one (complicated) approach
>over a simpler one on the basis of performance, when it turns
>out performance isn't the issue.  My appreciation goes out to you
>for doing it the right way.
>You may also want to look at pyRXP from ReportLab.
>However, there seem to be some drastic problems on their
>site -- links on fail and goes
>to's site placeholder page.

yup we're reassembling everything again .... sigh :(

New more dynamic confusion ......

>It's a very fast XML parser for Python.
>> I in fact tried that before but the over-limit error still happened. So
>> not just the non-greedy .*? that's causing the problem. Hmm.
>No, I don't think it is.  The stack space increases by one for
>each ambiguity and the .*? should only produce one ambiguity.
>Usually there's a stack problem only if you have an ambiguity
>or empty match inside a repeat, and I didn't see that in your
>If you get really interested in tracking this down, you might look
>around for some of the GUI regexp debugging tools.  There's
>one in ActiveState's product, as I recall.  Err, but it's based on
>Perl's regexp parser and won't handle (?P<>)
>(I do have an experimental pure-Python regexp engine that
>I would offer for debugging, but it doesn't handle .*? yet and
>needs a rewrite before it does.)
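
To make the point about .*? concrete, here is a minimal sketch comparing a
non-greedy body with a negated character class. The sample text and both
patterns are hypothetical (the original pattern is not shown in this thread);
the only assumption carried over is that tags are generated without spaces.
Whether the second form avoids the over-limit error depends on the engine
version, so treat it as something to try rather than a confirmed fix.

```python
import re

# Hypothetical input: simple tags with no spaces and no nesting.
text = "<name>alpha</name><value>42</value>"

# Non-greedy body: the engine extends .*? one character at a time
# until the closing tag matches.
lazy = re.compile(r"<(?P<tag>\w+)>(?P<body>.*?)</(?P=tag)>", re.S)

# Negated character class: the body can never run past the next "<",
# which leaves the engine less room for backtracking.
strict = re.compile(r"<(?P<tag>\w+)>(?P<body>[^<]*)</(?P=tag)>")

lazy_hits = [(m.group("tag"), m.group("body")) for m in lazy.finditer(text)]
strict_hits = [(m.group("tag"), m.group("body")) for m in strict.finditer(text)]

print(lazy_hits)
print(strict_hits)
```

On input like this both patterns find the same tags; the second one only
works when a tag body never contains "<".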
>> It only handles tags without space because all tags are
>> guaranteed to be generated without space.
>Sure.  All I was saying was that if you're going to code for
>a specific layout then you don't need to be as general.
>You might even consider using "([^\n]*\n){5}" if you just
>want to skip 5 lines.
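
A small sketch of the skip-five-lines idea. The repeat is applied to a group
so that {5} counts whole lines; without the group, \n{5} would ask for five
consecutive newlines. The helper name and the sample text are made up for
illustration.

```python
import re

# (?:...) groups "rest of line plus newline" so {5} repeats the whole unit.
SKIP_FIVE_LINES = re.compile(r"(?:[^\n]*\n){5}")

def skip_five_lines(text):
    """Return what follows the first five lines, or None if there are fewer."""
    m = SKIP_FIVE_LINES.match(text)
    return text[m.end():] if m else None

sample = "a\nb\nc\nd\ne\nrest\n"
remainder = skip_five_lines(sample)

# The ungrouped form needs five newlines in a row, so it fails on this sample.
ungrouped = re.match(r"[^\n]*\n{5}", sample)

print(repr(remainder))
print(ungrouped)
```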
>                    Andrew
>                    dalke at
>  If you are doing anything open-sourceish, or using
>open source in bioinformatics, structural biology, and
>related fields, and will be at ISMB in Edinburgh next
>year, you might consider attending the Bioinformatics
>Open Source Conference.

Robin Becker
