Bottleneck? More efficient regular expression?

Tina Li tina_li23AThotmailDOTcom
Fri Sep 26 00:28:52 CEST 2003

Hi Andrew,

| > Thanks for the XML code. I've written up something
| > similar using xml.parsers.expat, but it's conceivably slower
| > than regexp.
| Conceivable, yes.  But 1) did you test it, and 2) would it make a
| difference?

I tested it for correctness. I didn't do any serious performance benchmarking, since speed isn't the top priority.
The lag is *perceivable* (this is what I meant; sorry) to a human user, so it is slower.
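
To put a number on that impression, something like the following could be timed. This is only a rough sketch under assumptions: the SAMPLE document, the function names and the repeat count are made up for illustration, and only the pdbcode tag comes from this thread. It extracts the same values once with a regexp and once with xml.parsers.expat, then times both with timeit.

```python
import re
import timeit
import xml.parsers.expat

# Hypothetical input: the real documents are not shown in this thread.
SAMPLE = "<hits>" + "<hit><pdbcode>1abc</pdbcode></hit>" * 1000 + "</hits>"

PDBCODE_RE = re.compile(r"<pdbcode>([^<]*)</pdbcode>")


def codes_with_regex(text):
    """Collect the text of every <pdbcode> element with a regexp."""
    return PDBCODE_RE.findall(text)


def codes_with_expat(text):
    """Collect the text of every <pdbcode> element with expat callbacks."""
    codes = []
    buf = []
    state = {"inside": False}

    def start(name, attrs):
        if name == "pdbcode":
            state["inside"] = True
            del buf[:]

    def end(name):
        if name == "pdbcode":
            codes.append("".join(buf))
            state["inside"] = False

    def chars(data):
        # expat may deliver the text of one element in several chunks.
        if state["inside"]:
            buf.append(data)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(text, True)
    return codes


def compare(number=20):
    """Return the seconds taken by each approach over `number` runs."""
    return {
        "regex": timeit.timeit(lambda: codes_with_regex(SAMPLE), number=number),
        "expat": timeit.timeit(lambda: codes_with_expat(SAMPLE), number=number),
    }
```

Calling compare() returns the two timings side by side. The numbers will depend on the input and the Python version, so they only mean something when measured on the real documents.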

| Here's a question for you.  When is it easier to read the
| documentation for existing code (which has been tested)
| than it is to write and debug your own code?

Is it a test question? =) I think it depends on the nature of the task. If it's intrinsically complex and error-prone,
and I have little exposure to it, I wouldn't risk re-inventing a wheel that doesn't turn. Also, what is the
opportunity cost: the time it takes to understand the documentation versus the time to code and test? Are the docs
long, drab and obscure? Sometimes existing code isn't easy to customize for a particular need, and then we'd have to redo

| You can also make the pattern a bit less ambiguous,
| eg, use [^<]* instead of .*? when you are inside an element,
| which turns
|    r'<pdbcode>(?P<pdbcode>.*?)</pdbcode>.*?'
| into
|    r'<pdbcode>(?P<pdbcode>[^<]*)</pdbcode>.*?'
| (and use [^"]* instead of .*? for getting the text of an
| attribute.)
| You can get rid of the other ambiguity (skipping characters
| until the start of the next tag) by using something like
|   ([^<]|<(?!queryGaps))*<queryGaps
| instead of
|   .*?<queryGaps

In fact I tried that before, but the over-limit error still happened. So it's not just the non-greedy .*? that's causing
the problem. Hmm.
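
For reference, the rewrites quoted above could be put together roughly as follows. It is a sketch under assumptions: the sample fragment, the score tag, the id attribute and the gaps group name are all invented for illustration, and only pdbcode and queryGaps come from this thread. It shows that the less ambiguous pattern matches the same text as the .*? version. It does not show whether either form avoids the over-limit error.

```python
import re

# Hypothetical fragment, written without spaces inside tags.
TEXT = "<hit><pdbcode>1abc</pdbcode><score>42</score><queryGaps>3</queryGaps></hit>"

# Original style: non-greedy .*? everywhere.
LAZY = re.compile(
    r"<pdbcode>(?P<pdbcode>.*?)</pdbcode>"
    r".*?"
    r"<queryGaps>(?P<gaps>.*?)</queryGaps>",
    re.DOTALL,
)

# Less ambiguous style: [^<]* for element text, and an explicit
# "anything except the start of <queryGaps" loop between the tags.
STRICT = re.compile(
    r"<pdbcode>(?P<pdbcode>[^<]*)</pdbcode>"
    r"(?:[^<]|<(?!queryGaps))*"
    r"<queryGaps>(?P<gaps>[^<]*)</queryGaps>"
)

# Attribute text: [^"]* stops at the closing quote.
ATTR = re.compile(r'<hit id="(?P<id>[^"]*)"')

lazy_fields = LAZY.search(TEXT).groupdict()
strict_fields = STRICT.search(TEXT).groupdict()
attr_fields = ATTR.search('<hit id="h1"><pdbcode>1abc</pdbcode></hit>').groupdict()
```

Both lazy_fields and strict_fields hold the same pdbcode and gaps values for this fragment. The difference is only in how much backtracking the engine may need to get there.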

But again, thanks for the tips. I'm sure they'll come in handy someday. I'm doing this for a quick "hack" rather than a
generic and robust parser module. It only handles tags without spaces because all tags are guaranteed to be generated
without spaces.


