[Python-ideas] PEP 8: raw strings & regular expressions

Nathaniel Smith njs at pobox.com
Wed Oct 21 23:31:07 EDT 2015

On Wed, Oct 21, 2015 at 7:44 PM, Ben Finney <ben+python at benfinney.id.au> wrote:
> Yury Selivanov <yselivanov.ml at gmail.com> writes:
>> In the process, we had to make a decision on how to highlight raw
>> string literals -- r''. Many existing highlighters assume that all raw
>> strings are regexps, and highlight them as such, i.e. '\s' and '\n'
>> will be highlighted.
> That is evidently a simple mistake. Merely knowing that a token is a raw
> string does not justify the assumption that the string is a regular
> expression, or a filesystem entry name, or a line in a network protocol,
> or anything except plain text.

This isn't necessarily true, just as a matter of like... epistemology.
For example, if hypothetically it turned out that 99% of raw strings
are in fact regular expressions, then knowing something is a raw
string would give you quite a bit of evidence that it's a regular
expression -- quite possibly enough to justify treating it as such for
something like code highlighting.

I haven't actually gathered any data to find out how strong the
association between raw strings and regexen is, but it'd be pretty
easy for someone to do. (Parse a large corpus of Python code to
extract all raw strings, randomly subsample 100 of them, review
manually to decide if each is a regex.)
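For what it's worth, the extraction and sampling half of that experiment might look roughly like the sketch below. This is an illustration under stated assumptions, not a tested survey tool. It assumes the corpus is a directory tree of `.py` files, uses the stdlib `tokenize` module to find string tokens whose prefix contains `r`/`R`, and draws a seeded random sample of 100. The function names (`raw_string_literals`, `corpus_raw_strings`, `sample_for_review`) are hypothetical. The "is this a regex?" judgement is still left to a human reviewer.

```python
import io
import random
import sys
import tokenize
from pathlib import Path

# Letters that may appear in a string prefix before the opening quote.
PREFIX_CHARS = "rRbBuUfF"


def raw_string_literals(source):
    """Yield the text of every raw string literal token in `source`.

    Note: on Python 3.12+ f-strings (including rf"...") are split into
    FSTRING_* tokens, so they are not counted by this sketch.
    """
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.STRING:
            continue
        # Everything before the first quote is the prefix, e.g. "r", "Rb", "br".
        prefix = tok.string[: len(tok.string) - len(tok.string.lstrip(PREFIX_CHARS))]
        if "r" in prefix.lower():
            yield tok.string


def corpus_raw_strings(root):
    """Yield (path, literal) pairs for raw strings in all .py files under root."""
    for path in Path(root).rglob("*.py"):
        try:
            source = path.read_text(encoding="utf-8")
            for literal in raw_string_literals(source):
                yield path, literal
        except (OSError, UnicodeDecodeError, SyntaxError, tokenize.TokenError):
            # Skip files that can't be read, decoded or tokenized cleanly.
            continue


def sample_for_review(root, k=100, seed=0):
    """Return up to k randomly chosen (path, literal) pairs for manual review."""
    found = list(corpus_raw_strings(root))
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    return rng.sample(found, min(k, len(found)))


if __name__ == "__main__":
    for path, literal in sample_for_review(sys.argv[1]):
        print(f"{path}: {literal}")
```

Run it as `python sample_raw.py /path/to/corpus`, then mark each printed literal as regex or not by hand. The fraction marked "regex" is the estimate of how strongly raw strings and regexen are associated.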

