[Python-ideas] PEP 8: raw strings & regular expressions

Fri Oct 23 12:14:37 EDT 2015

On 22 October 2015 at 05:31, Nathaniel Smith <njs at pobox.com> wrote:
> On Wed, Oct 21, 2015 at 7:44 PM, Ben Finney <ben+python at benfinney.id.au> wrote:
>> Yury Selivanov <yselivanov.ml at gmail.com> writes:
>>> In the process, we had to make a decision on how to highlight raw
>>> string literals -- r''. Many existing highlighters assume that all raw
>>> strings are regexps, and highlight them as such, i.e. '\s' and '\n'
>>> will be highlighted.
>>
>> That is evidently a simple mistake. Merely knowing that a token is a raw
>> string does not justify the assumption that the string is a regular
>> expression, or a filesystem entry name, or a line in a network protocol,
>> or anything except plain text.
>
> This isn't necessarily true, just as a matter of like... epistemology.
> For example, if hypothetically it turned out that 99% of raw strings
> are in fact regular expressions, then knowing something is a raw
> string would give you quite a bit of evidence that it's a regular
> expression -- quite possibly enough to justify treating it as such for
> something like code highlighting.
>
> I haven't actually gathered any data to find out how strong the
> association between raw strings and regexen is, but it'd be pretty
> easy for someone to do. (Parse a large corpus of python code to
> extract all raw strings, randomly subsample 100 of them, review
> manually to decide if each is a regex.)

I'd expect Windows filesystem paths to win handily if we could scan
all the Python code in the world, but they wouldn't show up in a scan
of POSIX-specific open source code.
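
For anyone who wants to try the sampling procedure quoted above, here
is a minimal sketch using the standard library's tokenize module. The
function names and the idea of passing source text in as strings are
my own assumptions for illustration, and the choice of corpus (which
decides whether Windows paths show up at all) is left to the caller.
On Python 3.12 and later, f-strings are no longer reported as STRING
tokens, so rf'...' literals would be missed by this version.

```python
import io
import random
import tokenize


def raw_string_literals(source):
    """Yield the source text of each raw string literal in `source`."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    for tok in tokens:
        if tok.type != tokenize.STRING:
            continue
        # The prefix is everything before the first quote character,
        # e.g. "r", "R", "rb", "Br"; a literal is raw if it contains r/R.
        quote_pos = min(i for i, ch in enumerate(tok.string) if ch in "'\"")
        if "r" in tok.string[:quote_pos].lower():
            yield tok.string


def sample_raw_strings(sources, k=100, seed=None):
    """Return up to `k` randomly chosen raw string literals.

    `sources` is an iterable of Python source texts (hypothetical
    input; reading the files of a corpus is not shown here).
    """
    literals = [lit for src in sources for lit in raw_string_literals(src)]
    rng = random.Random(seed)
    return rng.sample(literals, min(k, len(literals)))
```

The sample still has to be classified by hand (regex, Windows path,
something else); the code only does the extraction and subsampling.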

Cheers,
Nick

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia