<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr">On Sat, Dec 29, 2018 at 12:30 AM Alexander Heger <<a href="mailto:python@2sn.net">python@2sn.net</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div style="font-family:monospace,monospace">for regular strings one can write</div><div style="font-family:monospace,monospace"><br></div><div style="font-family:monospace,monospace">"aaa" + "bbb"</div><div style="font-family:monospace,monospace"><br></div><div style="font-family:monospace,monospace">which also works for f-strings, r-strings, etc.; in regular expressions, there is, e.g., parameter counting and references to numbered matches.  How would that be dealt with in a compound p-string?  Either it would have to be re-compiled or not; either way could lead to unexpected results</div><div style="font-family:monospace,monospace"><br></div><div style="font-family:monospace,monospace">p"(\d)\1" + p"(\s)\1"</div><div style="font-family:monospace,monospace"><br></div><div style="font-family:monospace,monospace">or </div><div style="font-family:monospace,monospace"><br></div><div style="font-family:monospace,monospace">p"^(\w)" + p"^(\d)"</div><div style="font-family:monospace,monospace"><br></div><div style="font-family:monospace,monospace">regular strings can be added, but the results of p-strings could not - well, they are not strings.  </div></div></blockquote><div><br></div><div>Isn't this a feature, not a bug, of encouraging patterns to be specified as p-literals: adding two patterns would raise an error (as is already the case when adding compiled patterns in the re and regex modules)? Currently, I find it easiest to use r-strings for patterns and call re.search() etc. without precompiling them, which means that I could accidentally concatenate two patterns and silently produce an unmatchable pattern.
Using p-literals for most patterns would mean I have to be explicit in the exceptional case where I do want to assemble a pattern from multiple parts:</div><div><br></div><div><div>FIRSTNAME = p"[A-Z][-A-Za-z']+"</div><div>LASTNAME = p"[-A-Za-z']([-A-Za-z' ]+[-A-Za-z'])?"</div></div><div>FULLNAME = FIRSTNAME + p' ' + LASTNAME # error<br></div><div><br></div><div>FIRSTNAME = r"[A-Z][-A-Za-z']+"</div><div>LASTNAME = r"[-A-Za-z']([-A-Za-z' ]+[-A-Za-z'])?"</div><div>FULLNAME = re.compile(FIRSTNAME + ' ' + LASTNAME) # success</div><div><br></div><div>Another potential advantage is that an ill-formed p-literal (such as a mismatched parenthesis) would be caught immediately, rather than when it is first used. This could pay off, for example, if I am defining a data structure with a bunch of regexes that would get used for different input. (But there may be performance tradeoffs here.)<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div style="font-family:monospace,monospace">This brings me to the point that  </div><div style="font-family:monospace,monospace">the key difference is that f- and r- strings actually return strings, whereas p- string would return a different kind of object. </div><div style="font-family:monospace,monospace">That would seem certainly very confusing to novices - and also for the language standard as a whole.</div><div style="font-family:monospace,monospace"><br></div></div></blockquote><div><br></div><div>The b prefix produces a bytes literal. Is a bytes object a kind of string, more so than a regex pattern is? I could see an argument that bytes is a particular encoding of sequential character data, whereas a regex pattern represents a string *language*, i.e. an abstraction over string data. But...this distinction starts to feel very theoretical rather than practical. 
If novices are expected to read code with regular expressions in it, why would they have trouble understanding that the "p" prefix means "pattern"?<br></div><div><br></div><div>As someone who works with text a lot, I think there's a decent practicality-beats-purity argument in favor of p-literals, which would make regex operations more easily accessible and prevent patterns from being mixed up with string data.<br></div><div><br></div><div>A potential downside, though, is that it will be tempting to introduce flags as prefixes, too. Do we want to go down the road of pui"my Unicode-compatible case-insensitive pattern"?</div><div><br></div><div>Nathan<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div style="font-family:monospace,monospace"></div><div style="font-family:monospace,monospace">-Alexander</div><div style="font-family:monospace,monospace"><br></div><div style="font-family:monospace,monospace"><br></div></div>

_______________________________________________<br>

Python-ideas mailing list<br>

<a href="mailto:Python-ideas@python.org" target="_blank">Python-ideas@python.org</a><br>

<a href="https://mail.python.org/mailman/listinfo/python-ideas" rel="noreferrer" target="_blank">https://mail.python.org/mailman/listinfo/python-ideas</a><br>

Code of Conduct: <a href="http://python.org/psf/codeofconduct/" rel="noreferrer" target="_blank">http://python.org/psf/codeofconduct/</a><br>

</blockquote></div></div>
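<div dir="ltr"><div>P.S. For anyone who wants to check the behaviour discussed above, here is a minimal sketch using only today's re module. It does not implement p-literals; it only shows what already happens with r-strings and compiled patterns. The names FIRSTNAME, LASTNAME and FULLNAME come from the example in this message; the other variable names and the test strings are made up for illustration.</div><div><br></div></div>

```python
import re

FIRSTNAME = r"[A-Z][-A-Za-z']+"
LASTNAME = r"[-A-Za-z']([-A-Za-z' ]+[-A-Za-z'])?"

# Plain strings concatenate, and the combined string compiles fine.
FULLNAME = re.compile(FIRSTNAME + " " + LASTNAME)
full_ok = FULLNAME.fullmatch("Ada Lovelace") is not None

# Compiled patterns do not support +, so the mistake is loud.
try:
    re.compile(FIRSTNAME) + re.compile(LASTNAME)
    add_raises = False
except TypeError:
    add_raises = True

# Concatenating strings that contain backreferences is silent:
# both \1 now refer to the first group, the digit.
combined = re.compile(r"(\d)\1" + r"(\s)\1")
digit_reused = combined.fullmatch("11 1") is not None
space_repeated = combined.fullmatch("11  ") is not None

# Two anchored patterns glued together can never match
# (without re.MULTILINE the second ^ is not at the start).
anchored = re.compile(r"^(\w)" + r"^(\d)")
never_matches = anchored.search("a1") is None

print(full_ok, add_raises, digit_reused, space_repeated, never_matches)
```

<div dir="ltr"><div>The string versions fail quietly (wrong group reference, unmatchable anchor), whereas adding compiled patterns fails with a TypeError, which is the behaviour I would want from p-literals.</div></div>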