re.sub() loops

Sun Apr 18 07:53:06 EDT 1999

On Sat, Apr 17, 1999 at 02:35:48PM +0000, Aahz Maruch wrote:
> In article <Pine.GSO.4.10.9904171556010.28418-100000 at moses.sz-sb.de>,
> Andreas Jung  <ajung at sz-sb.de> wrote:
> >
> >I am trying to do some lame processing on some HTML.
> >The following line tries to remove some unnecessary
> >code from an HTML file. However, Python hangs
> >in this call:
> >
> >data = re.sub('<TABLE.*?es.*?da.*?en.*?fi.*?sv.*?TABLE>','',data)    
> 
> Does the <TABLE>...</TABLE> contain *all* the strings "es", "da", "en",
> "fi", and "sv"?  Or are the strings supposed to be "?es" and so on?  In
> any event, with six ".*" patterns in there, you've got exponential
> processing time, even if it's not hanging.

The strings are all contained within the TABLE section. I used
".*?" to get the smallest match because there are several
TABLE sections in the HTML document. You're right - re did not
hang - after about 5 minutes I got a reply :) Meanwhile, however,
I have found another working solution for the problem.
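
One way to avoid the chained ".*?" wildcards (a rough sketch, not
necessarily the solution mentioned above) is to match each TABLE
block with a single lazy wildcard and then check the block for the
marker strings with plain string searches. The names MARKERS,
TABLE_RE, contains_in_order and strip_marked_tables are made up for
illustration. The sketch assumes that tables are not nested, that the
closing tag is written "</TABLE>" in upper case, and that a table may
span several lines (hence re.DOTALL, which the original call did not
use).

```python
import re

# Marker strings taken from the pattern in the original call.
MARKERS = ("es", "da", "en", "fi", "sv")

# A single lazy wildcard: one <TABLE ...> ... </TABLE> block at a time.
# re.DOTALL lets a block span several lines (an assumption about the input).
TABLE_RE = re.compile(r"<TABLE.*?</TABLE>", re.DOTALL)


def contains_in_order(text, markers=MARKERS):
    """Return True if every marker occurs in text, in the given order."""
    pos = 0
    for marker in markers:
        pos = text.find(marker, pos)
        if pos < 0:
            return False
        pos += len(marker)
    return True


def strip_marked_tables(data):
    """Remove each table block that contains all markers in order."""
    def replace(match):
        block = match.group(0)
        # Drop the block only if it holds all markers; otherwise keep it.
        return "" if contains_in_order(block) else block
    return TABLE_RE.sub(replace, data)
```

Calling strip_marked_tables(data) would then take the place of the
re.sub() call above. Because each table is scanned with str.find()
instead of six interacting wildcards, a table that lacks one of the
markers is rejected after a single pass over its text.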

Thanks,
Andreas