[Python-ideas] No need to add a regex pattern literal

Antoine Pitrou solipsis at pitrou.net
Mon Dec 31 06:23:16 EST 2018


On Thu, 27 Dec 2018 19:48:40 +0800
Ma Lin <malincns at 163.com> wrote:
> We can use this literal to represent a compiled pattern, for example:
> 
>  >>> p"(?i)[a-z]".findall("a1B2c3")  
> ['a', 'B', 'c']
> 
>  >>> compiled = p"(?<=abc)def"
>  >>> m = compiled.search('abcdef')
>  >>> m.group(0)  
> 'def'
> 
>  >>> rp'\W+'.split('Words, words, words.')  
> ['Words', 'words', 'words', '']
> 
> This allows the peephole optimizer to store the compiled pattern in the
> .pyc file, so we get a performance optimization similar to replacing a
> constant set with a frozenset in the .pyc file.
> 
> Then issues such as [1] can be solved cleanly.
> 
> [1] Optimize base64.b16decode to use compiled regex
>     https://bugs.python.org/issue35559

The simple solution to the perceived performance problem (I'm not sure
how much of a problem it is in real life) is to have a stdlib function
that lazily compiles a regex (*). It would work just like "re.compile",
but lazily: you don't bear the cost of compiling when simply importing
the module, and once the pattern is compiled, there is no overhead from
looking it up in a global cache dict, as happens with the module-level
functions such as "re.match".

No need for a dedicated literal.

(*) Let's call it "re.pattern", for example.
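
A minimal sketch of how such a helper could behave, written as a small
wrapper class. The name "LazyPattern" and its internals are made up for
illustration; nothing like "re.pattern" exists in the stdlib today, and
a real implementation could look quite different.

```python
import re


class LazyPattern:
    """Hypothetical lazy counterpart to re.compile (not a stdlib API)."""

    def __init__(self, pattern, flags=0):
        self._pattern = pattern
        self._flags = flags
        self._compiled = None  # nothing is compiled at import time

    def _get(self):
        # Compile on first use only, then keep the compiled object.
        if self._compiled is None:
            self._compiled = re.compile(self._pattern, self._flags)
        return self._compiled

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails, i.e. for
        # pattern methods such as search, findall or split.
        if name.startswith("__"):
            raise AttributeError(name)
        value = getattr(self._get(), name)
        # Store the result on the instance so later lookups bypass
        # __getattr__ and never touch the re module's global cache.
        self.__dict__[name] = value
        return value


# Defined at module level, but compiled only when first used.
NON_WORD = LazyPattern(r"\W+")
print(NON_WORD.split("Words, words, words."))
```

The module-level assignment costs almost nothing at import time; the
regex is compiled on the first call to "split" and reused afterwards.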

Regards

Antoine.



