What is built-in method sub

John Machin sjmachin at lexicon.net
Tue Jan 12 01:02:38 CET 2010


On Jan 12, 7:30 am, Jeremy <jlcon... at gmail.com> wrote:
> On Jan 11, 1:15 pm, "Diez B. Roggisch" <de... at nospam.web.de> wrote:
>
>
>
> > Jeremy schrieb:
>
> > > On Jan 11, 12:54 pm, Carl Banks <pavlovevide... at gmail.com> wrote:
> > >> On Jan 11, 11:20 am, Jeremy <jlcon... at gmail.com> wrote:
>
> > >>> I just profiled one of my Python scripts and discovered that >99% of
> > >>> the time was spent in
> > >>> {built-in method sub}
> > >>> What is this function and is there a way to optimize it?
> > >> I'm guessing this is re.sub (or, more likely, a method sub of an
> > >> internal object that is called by re.sub).
>
> > >> If all your script does is to make a bunch of regexp substitutions,
> > >> then spending 99% of the time in this function might be reasonable.
> > >> Optimize your regexps to improve performance.  (We can help you if you
> > >> care to share any.)
>
> > >> If my guess is wrong, you'll have to be more specific about what your
> > >> script does, and maybe share the profile printout or something.
>
> > >> Carl Banks
>
> > > Your guess is correct.  I had forgotten that I was using that
> > > function.
>
> > > I am using the re.sub command to remove trailing whitespace from lines
> > > in a text file.  The commands I use are copied below.  If you have any
> > > suggestions on how they could be improved, I would love to know.
>
> > > Thanks,
> > > Jeremy
>
> > > lines = self._outfile.readlines()
> > > self._outfile.close()
>
> > > line = string.join(lines)
>
> > > if self.removeWS:
> > >     # Remove trailing white space on each line
> > >     trailingPattern = '(\S*)\ +?\n'
> > >     line = re.sub(trailingPattern, '\\1\n', line)
>
> > line = line.rstrip()?
>
> > Diez
>
> Yep.  I was trying to reinvent the wheel.  I just remove the trailing
> whitespace before joining the lines.

Actually, your regex doesn't do that: it doesn't remove all trailing whitespace. It has three components:

(1) (\S*) zero or more non-whitespace characters
(2) \ +? one or more (non-greedy) occurrences of SPACE
(3) \n a newline
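To see what the pattern actually does, here is a small Python 3 sketch that applies Jeremy's pattern and replacement (written as raw strings) to a made-up sample; the sample text is an assumption for illustration. The line ending in spaces is cleaned, but the line ending in a tab is left alone, because component (2) matches only the space character.

```python
import re

# Jeremy's original pattern and replacement, written as raw strings.
TRAILING_PATTERN = r'(\S*)\ +?\n'
REPLACEMENT = r'\1\n'

# Hypothetical sample: trailing spaces, a trailing tab, and a clean line.
sample = "alpha   \nbeta\t\ngamma\n"

result = re.sub(TRAILING_PATTERN, REPLACEMENT, sample)
print(repr(result))  # 'alpha\nbeta\t\ngamma\n'
```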

Component (2) should be \s+? so that tabs and other whitespace characters are matched, not just spaces.
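A sketch of that change on the same kind of made-up sample: with \s+? the tab-terminated line is cleaned too. One side effect worth checking before relying on it: \s also matches \n, so next to a blank line the pattern can consume one newline as "whitespace" and fold the blank line away.

```python
import re

# Component (2) widened from "\ +?" (spaces only) to "\s+?" (any whitespace).
WIDER_PATTERN = r'(\S*)\s+?\n'

sample = "alpha   \nbeta\t\ngamma\n"
print(repr(re.sub(WIDER_PATTERN, r'\1\n', sample)))      # 'alpha\nbeta\ngamma\n'

# Because \s also matches "\n", the blank line between the two lines is swallowed.
with_blank = "alpha\n\nbeta\n"
print(repr(re.sub(WIDER_PATTERN, r'\1\n', with_blank)))  # 'alpha\nbeta\n'
```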

In any case, this is a roundabout way of doing it. Try writing a regex
that does it simply: replace trailing whitespace with an empty string.
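One way to write that simpler regex, as a sketch: match a run of spaces or tabs anchored at the end of a line and replace it with nothing. re.MULTILINE makes $ match just before every newline as well as at the end of the string. Treating only spaces and tabs as trailing whitespace is an assumption here; [ \t] is used instead of \s so the pattern cannot eat the newlines themselves. The function name strip_trailing_whitespace is hypothetical.

```python
import re

# Spaces or tabs immediately before a line end (or the end of the text).
# Assumption: only spaces and tabs count as trailing whitespace here.
TRAILING_WS = re.compile(r'[ \t]+$', re.MULTILINE)

def strip_trailing_whitespace(text):
    """Return text with trailing spaces/tabs removed from every line."""
    return TRAILING_WS.sub('', text)

print(repr(strip_trailing_whitespace("alpha   \nbeta\t\ngamma  ")))
# 'alpha\nbeta\ngamma'
```

Because the anchor is $ rather than a literal \n, the last line is handled whether or not the text ends with a newline, and blank lines are left intact.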

Another problem with your approach: it doesn't work if the last line is not
terminated by \n, which is quite possible when the lines are read from a
file whose final line has no newline.
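A quick check of that failure mode, using a hypothetical list of lines such as readlines() might return when the file's last line has no newline: the original pattern leaves the final line's trailing spaces in place. Diez's per-line rstrip() suggestion does not depend on the \n at all. The helper name strip_lines is made up for this sketch; it re-adds a newline only where the input line had one.

```python
import re

# Hypothetical readlines() output: the last line has no terminating newline.
lines = ["alpha   \n", "beta  \n", "gamma  "]

# Jeremy's pattern needs a literal "\n" after the spaces, so "gamma  " is missed.
joined = "".join(lines)
print(repr(re.sub(r'(\S*)\ +?\n', r'\1\n', joined)))  # 'alpha\nbeta\ngamma  '

def strip_lines(lines):
    """Strip trailing whitespace from each line, keeping existing newlines."""
    return "".join(
        line.rstrip() + ("\n" if line.endswith("\n") else "")
        for line in lines
    )

print(repr(strip_lines(lines)))  # 'alpha\nbeta\ngamma'
```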

A wise person once said: Re-inventing the wheel is often accompanied
by forgetting to re-invent the axle.
