On 2013-03-23, at 03:00, Nick Coghlan wrote:
On Fri, Mar 22, 2013 at 3:42 PM, Gregory P. Smith greg@krypto.org wrote:
On Fri, Mar 22, 2013 at 3:31 PM, Ronny Pfannschmidt Ronny.Pfannschmidt@gmx.de wrote:
Hi,
While reviewing urllib.parse I noticed a pretty ugly pattern.
Many functions have an attached global, and in their own code they compile a regex on first use and assign it to that global.
It's clear that compiling a regex is expensive, so having them compiled later, at first use, would be of some benefit.
It isn't expensive to do; it is expensive to do repeatedly for no reason. Thus the use of compiled regexes. Code like this would be better off refactored to reference a precompiled global rather than conditionally check if it needs compiling every time it is called.
Alternatively, if there are a lot of different regexes, it may be better to rely on the implicit cache inside the re module.
Wouldn't it be better if there are *few* different regexes? The module itself caches 512 expressions (100 in Python 2) and does not use an LRU or other "smart" cache: as far as I can see, it just clears the whole cache dict once the limit is breached. *And* any explicit call to re.compile will *still* use the internal cache, so even going through re.compile counts against the _MAXCACHE limit. All regex uses throughout the application (including the standard library et al.) will therefore count against the built-in cache and increase the chance that the regex we want cached gets thrown out, no?
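A quick way to see that re.compile goes through the shared cache, at least on CPython, is to compare object identity. This is a sketch that relies on an implementation detail rather than a documented guarantee; the pattern is hypothetical, and re.purge() stands in for the cache being emptied by other code.

```python
import re

PATTERN = r"[a-z]+://"  # hypothetical pattern, for illustration only

first = re.compile(PATTERN)
second = re.compile(PATTERN)

# On CPython the second call is answered from the module-level cache,
# so both names refer to the same compiled pattern object.
shared = first is second

# re.purge() empties that cache, standing in for an eviction caused by
# unrelated regex use elsewhere in the application.
re.purge()
third = re.compile(PATTERN)

# After the purge the pattern had to be compiled again from scratch.
survived = first is third
```

Holding your own reference to the compiled pattern (as in a precompiled global) sidesteps the question entirely, since the object stays alive regardless of what happens to the cache.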