On 4/1/2018 10:20 PM, Tim Peters wrote:
[MRAB]
A thread on python-ideas is talking about the prefixes of string literals, and the regex used in IDLE.
Line 25 of Lib\idlelib\colorizer.py is:
stringprefix = r"(?i:\br|u|f|fr|rf|b|br|rb)?"
which looks slightly wrong to me.
This must be a holdover from years ago, before I was involved. I have wondered about it but left it as is. Thanks for confirming that it is not right.
The \b will apply only to the first choice.
Shouldn't it be more like:
stringprefix = r"(?:\b(?i:r|u|f|fr|rf|b|br|rb))?"
?
See below.
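The precedence point is easy to check in a Python shell. The following is a minimal sketch using cut-down two-alternative patterns rather than the full IDLE prefix (the names `ungrouped` and `grouped` are illustrative only). Alternation binds more loosely than concatenation, so `\br|u` parses as `(\br)|(u)`:

```python
import re

# Cut-down versions of the two prefix patterns (illustrative only).
ungrouped = re.compile(r"\br|u")      # parsed as (\br) | (u)
grouped = re.compile(r"\b(?:r|u)")    # \b applies to both alternatives

# Inside the word "ku" there is no word boundary before the "u".
m = ungrouped.search("ku")
print(m.group(), m.start())           # the bare "u" alternative still matches
print(grouped.search("ku"))           # None: \b now guards "u" as well

# Inside "kr" the boundary check applies in both patterns.
print(ungrouped.search("kr"))         # None
print(grouped.search("kr"))           # None
```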
I believe the change would capture its real intent. It doesn't seem to matter a whole lot, though - IDLE isn't a syntax checker, and applies heuristics to color on the fly based on best guesses. As is, if you type this fragment into an IDLE shell:
kr"sdf"
only the last 5 characters get "string colored", presumably because of the leading \br in the original regexp. But if you type in
ku"sdf"
the last 6 characters get "string colored", because - as you pointed out - the \b part of the original regexp has no effect on anything other than the r following \b.
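The two fragments can be reproduced outside IDLE. This is a rough sketch, assuming a simplified double-quoted string body (`DQ_BODY` is a stand-in; the colorizer's real string pattern is more elaborate) and a hypothetical helper `colored` that returns what would be treated as a string:

```python
import re

# Simplified stand-in for the string body (assumption: the real colorizer
# pattern also handles escapes, single quotes and triple quotes).
DQ_BODY = r'"[^"\n]*"'

original = re.compile(r"(?i:\br|u|f|fr|rf|b|br|rb)?" + DQ_BODY)
proposed = re.compile(r"(?:\b(?i:r|u|f|fr|rf|b|br|rb))?" + DQ_BODY)

def colored(pattern, text):
    """Return the substring that would be treated as a string literal."""
    match = pattern.search(text)
    return match.group() if match else None

for fragment in ('kr"sdf"', 'ku"sdf"'):
    # With the original prefix, the "u" is swallowed but the "r" is not;
    # with the proposed prefix, neither letter is.
    print(fragment, colored(original, fragment), colored(proposed, fragment))
```

With the original prefix the match is 5 characters for the first fragment and 6 for the second; with the proposed prefix it is 5 for both.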
I tested with uf versus ur; both look plausibly legal, but neither is.
But in neither case is the fragment legit Python. If you do type in legit Python, it makes no difference (legit string literals always start at a word boundary, regardless of whether the regexp checks for that).
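A quick way to convince oneself of this is to run both prefixes over a few legitimate lines and compare what they pick out. This is a sketch under the same simplifying assumption about the string body; the sample lines and the helper `literals` are made up for illustration:

```python
import re

DQ_BODY = r'"[^"\n]*"'  # simplified string body (an assumption)
original = re.compile(r"(?i:\br|u|f|fr|rf|b|br|rb)?" + DQ_BODY)
proposed = re.compile(r"(?:\b(?i:r|u|f|fr|rf|b|br|rb))?" + DQ_BODY)

# Hypothetical sample lines; every literal starts at a word boundary.
legit_lines = [
    'x = "sdf"',
    'x = r"sdf"',
    'x = U"sdf"',
    'print(f"sdf")',
    'y = [Rb"sdf", bR"sdf"]',
    'z = fr"sdf" + RF"sdf"',
]

def literals(pattern, line):
    """Return every substring the pattern would treat as a string literal."""
    return [m.group() for m in pattern.finditer(line)]

# True when both patterns pick out exactly the same literals on every line.
same = all(literals(original, s) == literals(proposed, s) for s in legit_lines)
print(same)
```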
I want uniform behavior. I decided to drop the \b because I prefer coloring the maximal legal string rather than the minimum. I think the contrast between two chars legal by themselves, but differently colored when put together, makes the bug more obvious.

https://bugs.python.org/issue33204

--
Terry Jan Reedy
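For reference, a sketch of what dropping the \b does to the fragments mentioned in this thread. It assumes the change amounts to deleting \b from the prefix and nothing else (the exact patch is on the issue), and it uses a simplified string body; `no_boundary` and `colored` are illustrative names:

```python
import re

DQ_BODY = r'"[^"\n]*"'  # simplified string body (an assumption)
original = re.compile(r"(?i:\br|u|f|fr|rf|b|br|rb)?" + DQ_BODY)
# Assumed form of the prefix once \b is dropped.
no_boundary = re.compile(r"(?i:r|u|f|fr|rf|b|br|rb)?" + DQ_BODY)

def colored(pattern, text):
    """Return the substring that would be treated as a string literal."""
    match = pattern.search(text)
    return match.group() if match else None

for fragment in ('uf"x"', 'ur"x"', 'kr"sdf"', 'ku"sdf"'):
    print(fragment, colored(original, fragment), colored(no_boundary, fragment))
```

Without the \b, each fragment is colored from the last prefix letter onward, so the r and u cases behave the same way.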