[Python-ideas] Add regex pattern literal p""

Steven D'Aprano steve at pearwood.info
Sat Dec 29 01:52:44 EST 2018


On Sat, Dec 29, 2018 at 04:29:32PM +1100, Alexander Heger wrote:
> for regular strings one can write
> 
> "aaa" + "bbb"
> 
> which also works for f-strings, r-strings, etc.; in regular expressions,
> there is, e.g., parameter counting and references to numbered matches.  How
> would that be dealt with in a compound p-string?  Either it would have to
> be re-compiled or not; either way could lead to unexpected results.

What does Perl do?



> p"(\d)\1" + p"(\s)\1"

Since + is used for concatenation, that would obviously be the same as:

p"(\d)\1(\s)\1"

Whether it gets done at compile-time or run-time depends on how smart 
the peephole optimiser is. If it is smart enough to recognise regex 
literals, it could fold the two patterns together and regex-compile them 
at Python compile-time; otherwise it could be equivalent to:

_t1 = re.compile(r"(\d)\1")  # compile-time
_t2 = re.compile(r"(\s)\1")  # compile-time
re.compile(_t1.pattern + _t2.pattern)  # run-time


Obviously that defeats the purpose of using a p"" pre-compiled regex 
object, but the answer to that is either:

1. Don't do that then; or
2. We had better make sure the peephole optimiser is smarter.

Or we just ban concatenation. "P-strings" aren't strings, even though 
they look like them.
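
To make the group-numbering concern concrete, here is a minimal sketch 
using today's re module. The p"" literal does not exist, so re.compile() 
stands in for it, and the names t1, t2, combined and renumbered are made 
up for illustration. It assumes concatenation simply joins the two 
pattern sources, as in the run-time fallback above:

```python
import re

# re.compile() stands in for the hypothetical p"" literal.
t1 = re.compile(r"(\d)\1")  # a doubled digit, e.g. "11"
t2 = re.compile(r"(\s)\1")  # a doubled whitespace character, e.g. "  "

# Naive concatenation of the two pattern sources.
combined = re.compile(t1.pattern + t2.pattern)

# In the joined pattern the second \1 still refers to group 1 (the digit),
# not to the whitespace group, which has become group 2.
print(combined.pattern)                    # (\d)\1(\s)\1
print(bool(combined.fullmatch("11  ")))    # False
print(bool(combined.fullmatch("11 1")))    # True

# What someone who wrote the two pieces separately probably wanted:
renumbered = re.compile(r"(\d)\1(\s)\2")
print(bool(renumbered.fullmatch("11  ")))  # True
```

So plain concatenation is well defined, but it quietly changes what the 
second backreference means. Whether that is acceptable or surprising is 
the open question.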


> This brings me to the point that
> the key difference is that f- and r- strings actually return strings,

To be precise, f-"strings" are actually code that returns a string when 
executed at runtime; r-strings are literal syntax for strings.
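
One way to see the difference is to look at what each form parses to. 
This is only a sketch and assumes Python 3.8 or later, where string 
literals appear as ast.Constant nodes. The names raw and fmt are 
illustrative:

```python
import ast

# An r-string is an ordinary string constant; the prefix only changes
# how backslashes in the source text are read.
raw = ast.parse(r'r"\d+"', mode="eval").body

# An f-string parses to a JoinedStr node: an expression that builds a
# string when it is evaluated.
fmt = ast.parse('f"{x}"', mode="eval").body

print(type(raw).__name__, repr(raw.value))
print(type(fmt).__name__)
```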

> whereas p- string would return a different kind of object.
> That would seem certainly very confusing to novices - and also for the
> language standard as a whole.

Indeed. Perhaps something like

\\regex\\

would be better, *if* this feature is desired.


-- 
Steve

