any chance regular expressions are cached?
Steven D'Aprano
steve at REMOVE-THIS-cybersource.com.au
Sun Mar 9 23:39:22 EDT 2008
On Mon, 10 Mar 2008 00:42:47 +0000, mh wrote:
> I've got a bit of code in a function like this:
>
> s=re.sub(r'\n','\n'+spaces,s)
> s=re.sub(r'^',spaces,s)
> s=re.sub(r' *\n','\n',s)
> s=re.sub(r' *$','',s)
> s=re.sub(r'\n*$','',s)
>
> Is there any chance that these will be cached somewhere, and save me the
> trouble of having to declare some global re's if I don't want to have
> them recompiled on each function invocation?
At the interactive interpreter, type "help(re)" [enter]. A page or two
down, you will see:
    purge()
        Clear the regular expression cache
and looking at the source code I see many calls to _compile() which
starts off with:
def _compile(*key):
    # internal: compile pattern
    cachekey = (type(key[0]),) + key
    p = _cache.get(cachekey)
    if p is not None:
        return p
So yes, the re module caches its regular expressions.
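One way to watch the cache at work, as a minimal sketch: in CPython, compiling the same pattern twice normally hands back the very same pattern object until the cache is purged or the entry is evicted. That identity is an implementation detail rather than a documented guarantee, and the pattern used here is simply one of those from the question.

```python
import re

pattern = r' *\n'

first = re.compile(pattern)
second = re.compile(pattern)
# Both calls go through the module's internal cache, so in CPython
# they normally return the same compiled object.
same_before_purge = first is second

re.purge()  # documented: clears the regular expression cache
third = re.compile(pattern)
# 'first' is still referenced, so a fresh compile yields a different object.
same_after_purge = first is third

print(same_before_purge, same_after_purge)
```

The module-level functions such as re.sub() take the same route, which is why repeated calls with an identical pattern string do not pay the compile cost each time.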
Having said that, at least four out of the five examples you give are
good examples of when you SHOULDN'T use regexes.
re.sub(r'\n','\n'+spaces,s)
is better written as s.replace('\n', '\n'+spaces). Don't believe me?
Check this out:
>>> s = 'hello\nworld'
>>> spaces = " "
>>> from timeit import Timer
>>> Timer("re.sub('\\n', '\\n'+spaces, s)",
... "import re;from __main__ import s, spaces").timeit()
7.4031901359558105
>>> Timer("s.replace('\\n', '\\n'+spaces)",
... "import re;from __main__ import s, spaces").timeit()
1.6208670139312744
The regex takes about 4.6 times as long as the simple string replacement.
Similarly:
re.sub(r'^',spaces,s)
is better written as spaces+s, which is nearly eleven times faster.
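For anyone who wants to check that comparison themselves, here is a rough sketch using timeit on a current Python 3. The sample string and the four-space indent are assumptions picked for illustration, and the ratio you get will depend on your machine and Python version.

```python
import re
from timeit import timeit

s = 'hello\nworld'
spaces = '    '  # indent width chosen for illustration only

# Without re.MULTILINE, '^' matches only at the very start of the string,
# so the substitution simply prepends 'spaces'.
via_regex = re.sub(r'^', spaces, s)
via_concat = spaces + s

names = {'re': re, 's': s, 'spaces': spaces}
t_regex = timeit("re.sub(r'^', spaces, s)", globals=names, number=100_000)
t_concat = timeit("spaces + s", globals=names, number=100_000)
print(f"re.sub: {t_regex:.3f}s  concat: {t_concat:.3f}s")
```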
Also:
re.sub(r' *$','',s)
re.sub(r'\n*$','',s)
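are just slow ways of writing s.rstrip(' ') and s.rstrip('\n'). (Strictly
speaking, $ also matches just before a newline at the very end of the
string, but by this point your third substitution has already removed any
spaces sitting in front of a newline, so the results are the same here.)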
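Putting the pieces together, the function from the question might be sketched like this. The names indent_with_regex and indent_with_str are made up for the comparison, and the one substitution not discussed above (' *\n') is left as a regex.

```python
import re

def indent_with_regex(s, spaces):
    # The five substitutions exactly as posted in the question.
    s = re.sub(r'\n', '\n' + spaces, s)
    s = re.sub(r'^', spaces, s)
    s = re.sub(r' *\n', '\n', s)
    s = re.sub(r' *$', '', s)
    s = re.sub(r'\n*$', '', s)
    return s

def indent_with_str(s, spaces):
    # Same steps, using string methods where a regex is not needed.
    s = s.replace('\n', '\n' + spaces)  # indent every line after the first
    s = spaces + s                      # indent the first line
    s = re.sub(r' *\n', '\n', s)        # drop spaces before each newline
    s = s.rstrip(' ')                   # drop trailing spaces
    s = s.rstrip('\n')                  # drop trailing newlines
    return s

samples = ['hello\nworld', 'a  \nb  \n\n', 'x\n', '']
for text in samples:
    assert indent_with_regex(text, '  ') == indent_with_str(text, '  ')

print(repr(indent_with_str('hello\nworld', '  ')))
```

The two versions agree on these samples; that is a spot check, not a proof for every possible input.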
--
Steven