Masklinn, 23.03.2013 14:26:
On 2013-03-23, at 03:00 , Nick Coghlan wrote:
On Fri, Mar 22, 2013 at 3:42 PM, Gregory P. Smith wrote:
On Fri, Mar 22, 2013 at 3:31 PM, Ronny Pfannschmidt wrote:
While reviewing urllib.parse I noticed a pretty ugly pattern: many functions had an attached global, and in their own code they would compile a regex on first use and assign it to that global.
It's clear that compiling a regex is expensive, so having them compiled later, at first use, would be of some benefit.
It isn't expensive to do, it is expensive to do repeatedly for no reason. Thus the use of compiled regexes. Code like this would be better off refactored to reference a precompiled global rather than conditionally check if it needs compiling every time it is called.
Alternatively, if there are a lot of different regexes, it may be better to rely on the implicit cache inside the re module.
Wouldn't it be better if there are *few* different regexes? The module itself caches 512 expressions (100 in Python 2) and does not use an LRU or other "smart" cache (as far as I can see, it just clears the whole cache dict once the limit is breached). *And* any explicit call to re.compile will *still* use the internal cache, so even going through re.compile counts against the _MAXCACHE limit. So all regex uses throughout the application (including the standard library et al.) count against the built-in cache and increase the chance that the regex we want cached gets thrown out, no?
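The claim that an explicit re.compile call still goes through the internal cache can be checked with a small sketch. This relies on a CPython implementation detail (a cache hit hands back the very same pattern object), so treat it as an observation about CPython rather than documented behaviour; the pattern used is arbitrary.

```python
import re

# Two separate re.compile calls with the same pattern and flags.
# Assumption: CPython's internal cache returns the same object on a hit.
first = re.compile(r"\d+-\d+")
second = re.compile(r"\d+-\d+")

same_object = first is second

# Holding your own reference keeps the compiled pattern usable no matter
# what happens to the internal cache afterwards.
match = first.search("range 10-20")
```

Keeping a reference in your own variable, as `first` does here, is what makes a precompiled global independent of whatever else the application pushes through the shared cache.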
Remember that any precompiled regex that got thrown out of the cache will be rebuilt as soon as it's used again. So the problem only ever arises when you really have more than _MAXCACHE different regexes that are all being used within the same loop, and even then, they'd have to be used in (mostly) the same order to render the cache completely useless. That's a very rare case, IMHO.
In all other cases, whenever the number of different regexes that are being used within a loop is lower than _MAXCACHE, the cache will immediately bring a substantial net win. And if a regex is not being used in a loop, then it's really unlikely that its compilation time will dominate the runtime of your application (assuming that your application is doing more than just compiling regexes...).
Stefan
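The argument above can be illustrated with a toy model of the cache policy as it is described in this thread: a plain dict that is emptied entirely once it is full. This is a simplified sketch, not the actual re module code; the class, the function, and the small limit are made up for illustration.

```python
class ClearAllCache:
    """Toy model: a dict that is cleared entirely once it holds `limit` entries."""

    def __init__(self, limit):
        self.limit = limit
        self.store = {}
        self.misses = 0

    def get(self, pattern):
        if pattern in self.store:
            return self.store[pattern]
        # Cache miss: this is where a real implementation would compile.
        self.misses += 1
        if len(self.store) >= self.limit:
            self.store.clear()
        self.store[pattern] = ("compiled", pattern)  # stand-in for a compiled regex
        return self.store[pattern]


def count_misses(n_patterns, limit, rounds):
    """Use n_patterns distinct patterns in the same order, `rounds` times over."""
    cache = ClearAllCache(limit)
    for _ in range(rounds):
        for i in range(n_patterns):
            cache.get("p%d" % i)
    return cache.misses


few = count_misses(n_patterns=3, limit=4, rounds=10)   # working set fits
many = count_misses(n_patterns=5, limit=4, rounds=10)  # working set exceeds limit
```

With three patterns and a limit of four, only the first pass misses (3 misses in 30 lookups). With five patterns cycled in a fixed order against a limit of four, every one of the 50 lookups misses, which is the pathological case described above: more distinct regexes than the limit, all used in the same loop and in the same order.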