What is built-in method sub
Steven D'Aprano
steve at REMOVE-THIS-cybersource.com.au
Mon Jan 11 15:34:22 EST 2010
On Mon, 11 Jan 2010 11:20:34 -0800, Jeremy wrote:
> I just profiled one of my Python scripts
Well done! I'm not being sarcastic, or condescending, but you'd be AMAZED
(or possibly not...) at how many people try to optimize their scripts
*without* profiling, and end up trying to speed up parts of the code that
don't matter while ignoring the actual bottlenecks.
> and discovered that >99% of the time was spent in
>
> {built-in method sub}
>
> What is this function
You don't give us enough information to answer with anything more than a
guess. You know what is in your scripts, we don't. I can do this:
>>> sub
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'sub' is not defined
So it's not a built-in function. Nor do strings have a sub method. So I'm
reduced to guessing. Based on your previous post, you're probably using
regexes, so:
>>> import re
>>> type(re.sub)
<type 'function'>
Getting closer, but that's a function, not a method.
>>> type(re.compile("x").sub)
<type 'builtin_function_or_method'>
That's probably the best candidate: you're probably calling the sub
method on a pre-compiled regular expression object.
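If you want to check that guess against your own profile, something along these lines should reproduce the entry. This is only a sketch: the pattern, the scrub function and the sample data are invented for illustration and have nothing to do with your script. Also note that recent Python 3 versions label the entry along the lines of "method 'sub' of 're.Pattern' objects" rather than "built-in method sub".

```python
import cProfile
import io
import pstats
import re

# Hypothetical pattern and data, for illustration only.
pattern = re.compile(r"\d+")

def scrub(lines):
    # Replace every run of digits with '#', one line at a time.
    return [pattern.sub("#", line) for line in lines]

profiler = cProfile.Profile()
profiler.enable()
result = scrub(["abc 123 def 456"] * 1000)
profiler.disable()

# Capture the report as a string so it can be searched or printed.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats()
report = buffer.getvalue()
print(report)
```

If the sub line dominates the report the way it does in your profile, the compiled-pattern method is the likely culprit.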
As for the second part of your question:
> and is there a way to optimize it?
I think you'll find that Python's regex engine is pretty much optimised
as well as it can be, short of a major re-write. But to quote Jamie
Zawinski:
    Some people, when confronted with a problem, think "I know,
    I'll use regular expressions." Now they have two problems.
The best way to optimize regexes is to use them only when necessary. They
are inherently expensive: a regex is a mini-programming language of its
own. Naturally some regexes are more expensive than others: some can
be *really* expensive, some are not.
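To give a feel for how expensive "really expensive" can get, here is a small sketch using a textbook pathological pattern with nested quantifiers. The pattern and the helper name are my own choices for illustration, not anything from your script, and the exact timings will depend on your machine and Python version.

```python
import re
import time

# Nested quantifiers: a well-known trigger for heavy backtracking.
slow = re.compile(r"(a+)+$")

def time_failed_match(n):
    # The trailing "b" guarantees the match fails, so the engine tries
    # many ways of splitting the run of n "a"s before giving up.
    text = "a" * n + "b"
    start = time.perf_counter()
    matched = slow.match(text)
    return matched, time.perf_counter() - start

for n in (10, 14, 18):
    matched, seconds = time_failed_match(n)
    print(n, matched, round(seconds, 4))
```

The time should grow very quickly as n increases, even though the input only gets a few characters longer.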
If you can avoid regexes in favour of ordinary string methods, do so. In
general, something like:
    source.replace(target, new)
will potentially be much faster than:
    regex = re.compile(target)
    regex.sub(new, source)
    # equivalent to re.sub(target, new, source)
(assuming of course that target is just a plain string with no regex
specialness). If you're just cracking a peanut, you probably don't need
the 30 lb sledgehammer of regular expressions.
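You can measure the difference yourself with timeit. The following is a rough sketch with made-up sample data (the source, target and new values are placeholders), so treat the numbers as indicative only. re.escape is used so the comparison stays fair even if target happened to contain regex metacharacters.

```python
import re
import timeit

# Placeholder data, for illustration only.
source = "spam and eggs and spam " * 1000
target = "spam"   # plain string, no regex specialness
new = "ham"

# Escape the target so it is matched literally by the regex engine.
regex = re.compile(re.escape(target))

# Both approaches should produce the same result for a literal target.
via_replace = source.replace(target, new)
via_regex = regex.sub(new, source)

replace_seconds = timeit.timeit(lambda: source.replace(target, new), number=200)
regex_seconds = timeit.timeit(lambda: regex.sub(new, source), number=200)

print("str.replace:", round(replace_seconds, 4), "seconds")
print("regex.sub:  ", round(regex_seconds, 4), "seconds")
```

One caveat: regex.sub treats backslashes in the replacement string specially, while str.replace does not, so the two are only interchangeable when new is also a plain string.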
Otherwise, we'd need to see the actual regexes that you are using in
order to comment on how you might optimize them.
--
Steven