<div dir="ltr">Heh. The good old manual approach. :-) How bad indeed?<br><br>>>> from idlelib import colorizer; colorizer.make_pat()<br>from idlelib import colorizer; colorizer.make_pat()<br>'\\b(?P<KEYWORD>False|None|True|and|as|assert|break|class|continue|def|del|elif|else|except|finally|for|from|global|if|import|in|is|lambda|nonlocal|not|or|pass|raise|return|try|while|with|yield)\\b|([^.\'\\"\\\\#]\\b|^)(?P<BUILTIN>ArithmeticError|AssertionError|AttributeError|BaseException|BlockingIOError|BrokenPipeError|BufferError|BytesWarning|ChildProcessError|ConnectionAbortedError|ConnectionError|ConnectionRefusedError|ConnectionResetError|DeprecationWarning|EOFError|Ellipsis|EnvironmentError|Exception|FileExistsError|FileNotFoundError|FloatingPointError|FutureWarning|GeneratorExit|IOError|ImportError|ImportWarning|IndentationError|IndexError|InterruptedError|IsADirectoryError|KeyError|KeyboardInterrupt|LookupError|MemoryError|ModuleNotFoundError|NameError|NotADirectoryError|NotImplemented|NotImplementedError|OSError|OverflowError|PendingDeprecationWarning|PermissionError|ProcessLookupError|RecursionError|ReferenceError|ResourceWarning|RuntimeError|RuntimeWarning|StopAsyncIteration|StopIteration|SyntaxError|SyntaxWarning|SystemError|SystemExit|TabError|TimeoutError|TypeError|UnboundLocalError|UnicodeDecodeError|UnicodeEncodeError|UnicodeError|UnicodeTranslateError|UnicodeWarning|UserWarning|ValueError|Warning|ZeroDivisionError|abs|all|any|ascii|bin|bool|bytearray|bytes|callable|chr|classmethod|compile|complex|copyright|credits|delattr|dict|dir|divmod|enumerate|eval|exec|exit|filter|float|format|frozenset|getattr|globals|hasattr|hash|help|hex|id|input|int|isinstance|issubclass|iter|len|license|list|locals|map|max|memoryview|min|next|object|oct|open|ord|pow|print|property|quit|range|repr|reversed|round|set|setattr|slice|sorted|staticmethod|str|sum|super|tuple|type|vars|zip)\\b|(?P<COMMENT>#[^\\n]*)|(?P<STRING>(?i:\\br|u|f|fr|rf|b|br|rb)?\'\'\'[^\'\\\\]*((\\\\.|\'(?!\'\'))[^\'
\\\\]*)*(\'\'\')?|(?i:\\br|u|f|fr|rf|b|br|rb)?"""[^"\\\\]*((\\\\.|"(?!""))[^"\\\\]*)*(""")?|(?i:\\br|u|f|fr|rf|b|br|rb)?\'[^\'\\\\\\n]*(\\\\.[^\'\\\\\\n]*)*\'?|(?i:\\br|u|f|fr|rf|b|br|rb)?"[^"\\\\\\n]*(\\\\.[^"\\\\\\n]*)*"?)|(?P<SYNC>\\n)'<br>>>><br></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Apr 2, 2018 at 11:32 AM, MRAB <span dir="ltr"><<a href="mailto:python@mrabarnett.plus.com" target="_blank">python@mrabarnett.plus.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On 2018-04-02 05:43, Guido van Rossum wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
My question for you: how on earth did you find this?! Speaking of a needle in a haystack. Did you run some kind of analysis program that looks for regexes? (We've received some good reports from someone who did that looking for possible DoS attacks.)<br>
<br>
</blockquote></span>
The thread was about string prefixes.<br>
<br>
Terry Reedy wrote "IDLE's colorizer does its parsing with a giant regex."<br>
<br>
I wondered: "How bad could it be?" (It's smaller now that the IGNORECASE flag can have a local scope.)<br>
<br>
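For readers unfamiliar with locally scoped flags, here is a minimal sketch, assuming Python 3.6 or later (where the <code>(?i:...)</code> form is available). The two patterns and their names are made up for illustration and are not taken from colorizer.py.<br>

```python
import re

# Scoped flag: only the optional prefix group ignores case.
scoped = re.compile(r"(?i:rb|br)?'abc'")

# Whole-pattern flag: the string body ignores case as well.
whole = re.compile(r"(?:rb|br)?'abc'", re.IGNORECASE)

print(bool(scoped.fullmatch("RB'abc'")))  # prefix case is ignored
print(bool(scoped.fullmatch("Rb'ABC'")))  # body case still matters
print(bool(whole.fullmatch("Rb'ABC'")))   # everything ignores case
```

With the scoped form, a pattern no longer has to spell out every upper- and lower-case variant of a prefix by hand just to keep the rest of the regex case-sensitive.<br>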
It wasn't hard to find because it was in a file called "colorizer.py" in a folder called "idlelib".<div class="HOEnZb"><div class="h5"><br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
On Sun, Apr 1, 2018 at 6:49 PM, MRAB <<a href="mailto:python@mrabarnett.plus.com" target="_blank">python@mrabarnett.plus.com</a>> wrote:<br>
<br>
    A thread on python-ideas is talking about the prefixes of string<br>
    literals, and the regex used in IDLE.<br>
<br>
    Line 25 of Lib\idlelib\colorizer.py is:<br>
<br>
        stringprefix = r"(?i:\br|u|f|fr|rf|b|br|rb)?"<br>
<br>
    which looks slightly wrong to me.<br>
<br>
    The \b will apply only to the first choice.<br>
<br>
    Shouldn't it be more like:<br>
<br>
        stringprefix = r"(?:\b(?i:r|u|f|fr|rf|b|br|rb))?"<br>
<br>
    ?<br>
<br>
</blockquote>
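The point about <code>\b</code> binding only to the first alternative can be checked directly. This is a rough sketch: the two prefix patterns are copied from the message above, while <code>BODY</code> and <code>first_string</code> are hypothetical stand-ins written for this example and are much simpler than what IDLE actually uses.<br>

```python
import re

# Both prefix patterns are copied from the message above.
CURRENT_PREFIX = r"(?i:\br|u|f|fr|rf|b|br|rb)?"
PROPOSED_PREFIX = r"(?:\b(?i:r|u|f|fr|rf|b|br|rb))?"

# Assumption: a simplified single-quoted string body, not IDLE's real one.
BODY = r"'[^'\n]*'"


def first_string(prefix, text):
    """Return the first string literal (with any prefix) found in text."""
    match = re.search(prefix + BODY, text)
    return match.group() if match else None


for sample in ("b'1'", "xr'1'", "xb'1'"):
    print(sample,
          first_string(CURRENT_PREFIX, sample),
          first_string(PROPOSED_PREFIX, sample))
```

For <code>xb'1'</code> the current pattern reports <code>b'1'</code>, treating the tail of an identifier as a string prefix, while the proposed pattern reports only <code>'1'</code>. For <code>xr'1'</code> both report <code>'1'</code>, because <code>r</code> is the one alternative that the <code>\b</code> already guards.<br>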
<br>
</div></div></blockquote></div><br><br clear="all"><br>-- <br><div class="gmail_signature" data-smartmail="gmail_signature">--Guido van Rossum (<a href="http://python.org/~guido" target="_blank">python.org/~guido</a>)</div>
</div>