[Python-ideas] No need to add a regex pattern literal

Stefan Behnel stefan_ml at behnel.de
Tue Jan 1 08:39:11 EST 2019


Ma Lin wrote on 31.12.18 at 14:02:
> On 18-12-31 19:47, Antoine Pitrou wrote:
>> The complaint is that the global cache is still too costly.
>> See measurements in https://bugs.python.org/issue35559
> 
> In this issue, using a global variable `_has_non_base16_digits` [1] gives
> a 30% speedup.
> Is the re module's internal cache [2] really that bad?
> 
> If we rewrite the re module's cache in C and use a custom data structure,
> maybe we will get a small speedup.
> 
> [1] `_has_non_base16_digits` in PR11287
> [1] https://github.com/python/cpython/pull/11287/files
> 
> [2] re module's internal cache code:
> [2] https://github.com/python/cpython/blob/master/Lib/re.py#L268-L295
> 
> _cache = {}  # ordered!
> _MAXCACHE = 512
> def _compile(pattern, flags):
>     # internal: compile pattern
>     if isinstance(flags, RegexFlag):
>         flags = flags.value
>     try:
>         return _cache[type(pattern), pattern, flags]
>     except KeyError:
>         pass
>     ...

I wouldn't be surprised if the slowest part here was the isinstance()
check. Maybe the RegexFlag class could implement "__hash__()" as "return
hash(self.value)", so that a flag could go into the cache key as-is and
the check could be dropped?

Stefan


