[Python-ideas] No need to add a regex pattern literal
Stefan Behnel
stefan_ml at behnel.de
Tue Jan 1 08:39:11 EST 2019
Ma Lin wrote on 31.12.18 at 14:02:
> On 18-12-31 19:47, Antoine Pitrou wrote:
>> The complaint is that the global cache is still too costly.
>> See measurements in https://bugs.python.org/issue35559
>
> In this issue, using a global variable `_has_non_base16_digits` [1] gives
> a 30% speedup.
> Is the re module's internal cache [2] really that bad?
>
> If we rewrote the re module's cache in C with a custom data structure,
> maybe we would get a small speedup.
>
> [1] `_has_non_base16_digits` in PR11287
> [1] https://github.com/python/cpython/pull/11287/files
>
> [2] re module's internal cache code:
> [2] https://github.com/python/cpython/blob/master/Lib/re.py#L268-L295
>
> _cache = {}  # ordered!
> _MAXCACHE = 512
> def _compile(pattern, flags):
>     # internal: compile pattern
>     if isinstance(flags, RegexFlag):
>         flags = flags.value
>     try:
>         return _cache[type(pattern), pattern, flags]
>     except KeyError:
>         pass
>     ...
I wouldn't be surprised if the slowest part here was the isinstance()
check. Maybe the RegexFlag class could implement "__hash__()" as "return
hash(self.value)"?
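
To make that concrete, here is a rough sketch, not the actual re module
code. DemoFlag is a hypothetical stand-in for RegexFlag, the two lookup
functions are made up for illustration, and the cache holds a placeholder
string instead of a compiled pattern. It only shows that with a value-based
__hash__() a flag member and its plain int value land on the same cache key,
so the isinstance() normalization could in principle be skipped on the
lookup path. Whether that is actually faster would have to be measured; the
timing helper prints numbers but no result is claimed here.

```python
import enum
import timeit


class DemoFlag(enum.IntFlag):
    """Hypothetical stand-in for re.RegexFlag (assumption, not the real class)."""
    IGNORECASE = 2
    MULTILINE = 8

    def __hash__(self):
        # The suggestion from this mail: hash by the underlying int value.
        return hash(self.value)


# Placeholder cache keyed like the quoted _compile(): (type, pattern, flags).
_cache = {(str, "a+b", 2): "compiled-pattern-placeholder"}


def lookup_with_isinstance(pattern, flags):
    # Mirrors the quoted code: normalize the flag to an int before the lookup.
    if isinstance(flags, DemoFlag):
        flags = flags.value
    return _cache[type(pattern), pattern, flags]


def lookup_direct(pattern, flags):
    # Relies on the flag hashing and comparing equal to its int value.
    return _cache[type(pattern), pattern, flags]


def time_lookups(number=100_000):
    # Prints raw timings for both variants; results vary by machine and version.
    for fn in (lookup_with_isinstance, lookup_direct):
        elapsed = timeit.timeit(
            lambda: fn("a+b", DemoFlag.IGNORECASE), number=number
        )
        print(f"{fn.__name__}: {elapsed:.4f}s for {number} calls")


if __name__ == "__main__":
    time_lookups()
```

Note that this only covers the cache-hit path; on a miss the flags would
still have to be turned into a plain int somewhere before compiling.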
Stefan