[Python-Dev] Investigating time for `import requests`

Paul Moore p.f.moore at gmail.com
Mon Oct 2 04:57:01 EDT 2017


On 2 October 2017 at 06:13, Raymond Hettinger
<raymond.hettinger at gmail.com> wrote:
>
>> On Oct 1, 2017, at 7:34 PM, Nathaniel Smith <njs at pobox.com> wrote:
>>
>> In principle re.compile() itself could be made lazy -- return a
>> regular expression object that just holds the string, and then compiles
>> and caches it the first time it's used. Might be tricky to do in a
>> backwards-compatible way if it moves detection of invalid regexes
>> from compile time to use time, but it could be an opt-in flag.
>
> ISTM that someone writing ``re.compile(pattern)`` is explicitly saying they want the regex to be pre-compiled. For cache on first use, we already have a way to do that with ``re.search(pattern, some_string)``, which compiles and then caches.

In practice, I don't think the fact that re.search() et al cache the
compiled expressions is that well known (it's mentioned in the
re.compile docs, but not in the re.search docs) and so people often
compile up front because they think it helps, rather than actually
measuring to check. Also, many regexes are long and complex, so
factoring them out as global variables is a reasonable practice. And
it's easy to imagine people deciding that putting the re.compile step
into the global, rather than having the global be a string that gets
passed to re.search, is a sensible thing to do (I know I'd do that,
without even thinking about it).
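
To make the two styles concrete, here is a minimal sketch. The pattern and
the function names are made up for illustration. The cache check at the end
relies on a CPython implementation detail (re.compile() goes through the
same module-level cache as re.search()), so treat it as an observation
rather than a documented guarantee.

```python
import re

# Hypothetical pattern, used only for illustration.
VERSION_PATTERN = r"(\d+)\.(\d+)\.(\d+)"

# Style 1: compile at import time and keep the compiled object in a global.
# The compilation cost is paid when the module is imported.
VERSION_RE = re.compile(VERSION_PATTERN)


def parse_compiled(text):
    m = VERSION_RE.search(text)
    return m.groups() if m else None


# Style 2: keep only the string in a global. re.search() compiles the
# pattern on first use and stores the result in the re module's cache.
def parse_string(text):
    m = re.search(VERSION_PATTERN, text)
    return m.groups() if m else None


def cache_hit(pattern):
    # CPython detail (assumption, not a documented guarantee): two calls
    # with the same pattern and flags return the same cached object.
    return re.compile(pattern) is re.compile(pattern)
```

Both functions return the same result for the same input; the difference
is only in when the compilation cost is paid.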

So I think that cache on first use is likely to be a useful
optimisation in practical terms. I don't have any feel for how many
uses of re.compile up front would be harmed if we defer compilation to
first use (other than "probably not many") but we could make it opt-in
if necessary - we'd hit the same problem of people not thinking to opt
in, though.
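
As a rough sketch of what "defer compilation to first use" could look
like, here is a small wrapper. LazyPattern is a hypothetical name and is
not part of the re module; it only illustrates the idea and the
compatibility concern, namely that an invalid regex is reported at first
use rather than at creation.

```python
import re


class LazyPattern:
    """Hypothetical wrapper: holds the pattern string and compiles it
    the first time a pattern method (search, match, ...) is requested."""

    def __init__(self, pattern, flags=0):
        self._pattern = pattern
        self._flags = flags
        self._compiled = None  # filled in on first use

    def _get(self):
        if self._compiled is None:
            # An invalid pattern raises re.error here, at first use,
            # not when the LazyPattern object was created.
            self._compiled = re.compile(self._pattern, self._flags)
        return self._compiled

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so the
        # attributes set in __init__ never reach this method.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self._get(), name)


# Usage: nothing is compiled until the first call.
NUMBER_RE = LazyPattern(r"\d+")
match = NUMBER_RE.search("issue 42")
```

With this approach, LazyPattern("(") succeeds silently and only fails
once something calls a method on it, which is the behaviour change that
would make an opt-in flag attractive.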

Paul

