What is built-in method sub

Philip Semanchuk philip at semanchuk.com
Mon Jan 11 16:03:06 EST 2010


On Jan 11, 2010, at 3:30 PM, Jeremy wrote:

> On Jan 11, 1:15 pm, "Diez B. Roggisch" <de... at nospam.web.de> wrote:
>> Jeremy schrieb:
>>
>>> On Jan 11, 12:54 pm, Carl Banks <pavlovevide... at gmail.com> wrote:
>>>> On Jan 11, 11:20 am, Jeremy <jlcon... at gmail.com> wrote:
>>
>>>>> I just profiled one of my Python scripts and discovered that >99% of
>>>>> the time was spent in
>>>>> {built-in method sub}
>>>>> What is this function and is there a way to optimize it?
>>>> I'm guessing this is re.sub (or, more likely, a method sub of an
>>>> internal object that is called by re.sub).
>>
>>>> If all your script does is to make a bunch of regexp substitutions,
>>>> then spending 99% of the time in this function might be reasonable.
>>>> Optimize your regexps to improve performance.  (We can help you if
>>>> you care to share any.)
>>
>>>> If my guess is wrong, you'll have to be more specific about what your
>>>> script does, and maybe share the profile printout or something.
>>
>>>> Carl Banks
>>
>>> Your guess is correct.  I had forgotten that I was using that
>>> function.
>>
>>> I am using the re.sub command to remove trailing whitespace from lines
>>> in a text file.  The commands I use are copied below.  If you have any
>>> suggestions on how they could be improved, I would love to know.
>>
>>> Thanks,
>>> Jeremy
>>
>>> lines = self._outfile.readlines()
>>> self._outfile.close()
>>
>>> line = ''.join(lines)  # string.join(lines) defaults to a ' ' separator
>>
>>> if self.removeWS:
>>>     # Remove trailing white space on each line
>>>     trailingPattern = r'(\S*) +?\n'
>>>     line = re.sub(trailingPattern, r'\1\n', line)
>>
>> line = line.rstrip()?
>>
>> Diez
>
> Yep.  I was trying to reinvent the wheel.  I just remove the trailing
> whitespace before joining the lines.

I second the suggestion to use rstrip(), but for future reference you  
should also check out the compile() function in the re module. You  
might want to time the code above against a version using a compiled  
regex to see how much difference it makes.
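
A minimal sketch of that comparison, written for Python 3 rather than the
Python 2 of the original script. The sample data, the function names, and the
simplified pattern are assumptions made for illustration; they are not taken
from Jeremy's code. Note that the re module caches recently used patterns, so
the gain from compile() alone may turn out to be small.

```python
import re
import timeit

# Compiled once up front; matches a run of spaces directly before a newline.
# This is a simplified stand-in for the pattern in the quoted script.
TRAILING_SPACES = re.compile(r" +\n")


def strip_with_rstrip(lines):
    """Strip trailing spaces line by line, then join the lines."""
    # Assumes every line ends with a newline, as readlines() returns them
    # (except possibly the last line of a file).
    return "".join(line.rstrip(" \n") + "\n" for line in lines)


def strip_with_regex(text):
    """Strip trailing spaces from the joined text with the compiled regex."""
    return TRAILING_SPACES.sub("\n", text)


if __name__ == "__main__":
    # Hypothetical sample data standing in for the contents of the file.
    lines = ["alpha   \n", "beta\n", "  gamma \n"] * 1000
    text = "".join(lines)

    # Both approaches should produce the same result on this input.
    assert strip_with_rstrip(lines) == strip_with_regex(text)

    t_rstrip = timeit.timeit(lambda: strip_with_rstrip(lines), number=20)
    t_regex = timeit.timeit(lambda: strip_with_regex(text), number=20)
    print("rstrip:         %.4f s" % t_rstrip)
    print("compiled regex: %.4f s" % t_regex)
```

The timings depend on the machine and on the input, so treat the printed
numbers as a rough guide only.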

Cheers
Philip





