Precompiled regular expressions slower?

Raymond Hettinger othello
Wed Feb 27 06:52:52 CET 2002


"Sean 'Shaleh' Perry" <shalehperry at attbi.com> wrote in message
news:mailman.1014753464.19952.python-list at python.org...
> >
> > The times of the following three tests are:
> > 1.94322395325   # search(string, string)
> > 2.62431204319   # search(re_obj, string)
> > 0.667925000191  # re_obj.search(string)
> >
> > Someone care to explain why the second ends up slower than the first?
> >
>
>  1 def _compile(*key):
>  2    # internal: compile pattern
>  3    p = _cache.get(key)
>  4    if p is not None:
>  5        return p
>  6    pattern, flags = key
>  7    if type(pattern) not in sre_compile.STRING_TYPES:
>  8        return pattern
>  9    try:
> 10        p = sre_compile.compile(pattern, flags)
> 11    except error, v:
> 12        raise error, v # invalid expression
> 13    if len(_cache) >= _MAXCACHE:
> 14        _cache.clear()
> 15    _cache[key] = p
> 16    return p
>
> My guess is #2 is slower because of the if type() check on line 7.
>

I think the culprit is line 3.  Looking up a string in the cache
dictionary is very fast: the hashing and the comparison are both done in
C, or, better still, an interned string is matched with 'is'.  Looking up
a compiled regular expression object involves checking for Python class
definitions of __hash__ and __cmp__, failing to find them, and then
falling back to the default hash followed by an id()-based compare.
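
To make the difference between the two kinds of key concrete, here is a
minimal sketch in present-day Python.  It is only an illustration: Plain
is a hypothetical stand-in class (not the real compiled pattern type), and
the dictionary merely mimics the (pattern, flags) keys used by the quoted
_compile().  It shows the behaviour, not the cost.

```python
class Plain:
    """Hypothetical stand-in: defines neither __hash__ nor __eq__."""


first = Plain()
second = Plain()

# Keys mimic the (pattern, flags) tuples used by the quoted _compile().
cache = {("a+", 0): "string key", (first, 0): "instance key"}

# Build an equal string at run time so it is a separate object.
same_text = "".join(["a", "+"])

print(cache.get((same_text, 0)))  # strings match by value: "string key"
print(cache.get((first, 0)))      # same object: "instance key"
print(cache.get((second, 0)))     # different object: None

# Without __eq__, equality falls back to object identity.
print(first == first, first == second)  # True False
```

A string key is found by value, while an instance of a class with no
hashing or comparison methods of its own is found only when it is the very
same object.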

The moral is to give strings to functions that expect strings and make
instance.method calls when a pre-compiled instance is available.
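
For anyone who wants to repeat the measurement, the three call forms can
be timed with something like the sketch below.  The pattern, the subject
string and the helper names are my own assumptions, not the original
poster's test, and the relative timings depend on the Python version, so
no particular result is promised.

```python
import re
import timeit

PATTERN = r"\d+"            # hypothetical pattern
TEXT = "abc 12345 def"      # hypothetical subject string
COMPILED = re.compile(PATTERN)


def search_with_string():
    # Module-level function given a pattern string.
    return re.search(PATTERN, TEXT)


def search_with_compiled_arg():
    # Module-level function given an already compiled pattern.
    return re.search(COMPILED, TEXT)


def search_with_method():
    # Method call on the compiled pattern object.
    return COMPILED.search(TEXT)


def time_all(number=100_000):
    """Return a dict mapping each call form's name to its total time."""
    forms = (search_with_string, search_with_compiled_arg, search_with_method)
    return {f.__name__: timeit.timeit(f, number=number) for f in forms}


if __name__ == "__main__":
    for name, seconds in time_all().items():
        print(f"{name:26s} {seconds:.4f} s")
```

All three forms return the same match; only the amount of work done
before the search starts differs.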


Raymond Hettinger





