What is built-in method sub

Steven D'Aprano steve at REMOVE-THIS-cybersource.com.au
Tue Jan 12 00:25:44 CET 2010


On Mon, 11 Jan 2010 13:51:48 -0800, Chris Rebert wrote:

> On Mon, Jan 11, 2010 at 12:34 PM, Steven D'Aprano
> <steve at remove-this-cybersource.com.au> wrote: <snip>
>> If you can avoid regexes in favour of ordinary string methods, do so.
>> In general, something like:
>>
>> source.replace(target, new)
>>
>> will potentially be much faster than:
>>
>> regex = re.compile(target)
>> regex.sub(new, source)
>> # equivalent to re.sub(target, new, source)
>>
>> (assuming of course that target is just a plain string with no regex
>> specialness). If you're just cracking a peanut, you probably don't need
>> the 30 lb sledgehammer of regular expressions.
> 
> Of course, but is the regex library really not smart enough to
> special-case and optimize vanilla string substitutions?


Apparently not in Python 2.5:


>>> from timeit import Timer
>>> t1 = Timer('x.sub("Dutch", "Nobody expects the Spanish Inquisition!")',
... 'from re import compile; x = compile("Spanish")')
>>> t2 = Timer('x.replace("Spanish", "Dutch")', 
... 'x="Nobody expects the Spanish Inquisition!"')
>>>
>>> t1.repeat()
[3.7209370136260986, 2.7262279987335205, 2.6416280269622803]
>>> t2.repeat()
[2.2915709018707275, 1.2584249973297119, 1.2730350494384766]
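
For anyone who wants to repeat the measurement on a different interpreter, 
here is a rough sketch of the same comparison as a script. The name 
time_both and the default number/repeat values are my own choices for 
illustration, not anything from the timeit module, and the absolute 
timings will vary with the machine and Python version.

```python
from timeit import Timer

SOURCE = "Nobody expects the Spanish Inquisition!"


def time_both(number=100000, repeat=3):
    """Return (regex_best, replace_best): best time in seconds for
    `number` calls of each approach. Hypothetical helper for illustration."""
    # Pre-compiled regex substitution; compilation happens in the setup
    # statement, so it is not included in the timing.
    regex_timer = Timer(
        'x.sub("Dutch", s)',
        'from re import compile; x = compile("Spanish"); s = %r' % SOURCE,
    )
    # Plain string method doing the same substitution.
    replace_timer = Timer(
        's.replace("Spanish", "Dutch")',
        's = %r' % SOURCE,
    )
    # Take the minimum of the repeats, as the timeit docs suggest.
    regex_best = min(regex_timer.repeat(repeat, number))
    replace_best = min(replace_timer.repeat(repeat, number))
    return regex_best, replace_best


if __name__ == "__main__":
    regex_best, replace_best = time_both()
    print("regex sub:   %.4f s" % regex_best)
    print("str.replace: %.4f s" % replace_best)
```

Unlike the session above, both statements here operate on a variable set 
up beforehand, so the two timings are a little more directly comparable.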


Even if it did, I wouldn't rely on that sort of special casing unless the 
language guaranteed it. Keep in mind that regexes are essentially a 
programming language (although not a Turing-complete one), and the engine 
implementation may choose purity and simplicity over such optimizations.
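
On the caveat quoted earlier, that target must be a plain string with no 
regex specialness: the two approaches only agree when that holds. A small 
sketch of the difference follows. The helper replace_literal is a made-up 
name for illustration; it uses re.escape so the target is matched 
literally, and a function as the replacement so backslashes in the new 
text are not interpreted.

```python
import re


def replace_literal(source, target, new):
    """Regex substitution that treats `target` and `new` as literal text.
    Hypothetical helper; str.replace does the same job more simply."""
    return re.sub(re.escape(target), lambda match: new, source)


source = "a.b axb a.b"

# "." is a regex metacharacter, so the unescaped pattern also matches "axb".
naive = re.sub("a.b", "X", source)

# str.replace always treats its arguments as literal text.
plain = source.replace("a.b", "X")

# Escaping the target makes the regex version agree with str.replace.
escaped = replace_literal(source, "a.b", "X")

print(naive)    # X X X
print(plain)    # X axb X
print(escaped)  # X axb X
```

If the target really is just literal text, source.replace(target, new) 
says so directly and avoids the escaping question altogether.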


-- 
Steven


