[Python-ideas] PEP 8: raw strings & regular expressions

Andrew Barnert abarnert at yahoo.com
Mon Oct 26 07:55:55 EDT 2015


On Oct 26, 2015, at 04:33, Ned Batchelder <ned at nedbatchelder.com> wrote:
> 
> 
>> On 10/26/15 3:23 AM, Alexander Walters wrote:
>> 
>>> On 10/23/2015 14:40, Ned Batchelder wrote:
>>>> On 10/22/15 6:56 PM, Yury Selivanov wrote:
>>>> In principle, there is no reason why *both* of these groups 
>>>> of users can't use one tool and be happy.  I propose to 
>>>> establish a convention in PEP 8, explaining that, while both 
>>>> literals are semantically equivalent, 
>>>> 
>>>> - r'..' strings *should* be used for regexps, 
>>>> 
>>>> - R'..' strings *should* be used for unstyled raw strings, 
>>>> 
>>>> and tools *should* treat them as such. 
>>>> 
>>>> All of this is merely about codifying the current status quo.
>>> But you are not codifying the status quo.  The distinction you are proposing is one that you have invented.  I have never used R"" strings.
>>> 
>>> I think the best solution to the problem is to improve the highlighters, and luckily you have written one!  To me, it is clear which of these strings is the regex:
>>> 
>>>     r"\d+"
>>>     r"\dir"
>>> 
>>> If the highlighters tried some heuristics, they could do a better job "being helpful" by making better guesses about the meaning of programs.  I don't mind when highlighters make wrong guesses, as long as they don't ruin the entire rest of the file.  But better guesses will be better. :)
>>> 
>>> --Ned.
>> 
>> it should be noted that most regexes are also valid paths on NTFS.  is r'\dir[a-zA-Z0-9]\\' a path or a regex?
> I understand developers' penchant for getting everything precisely right and accounting for the darkest of corners and the farthest reaches of obscure edge cases.  But I'm talking about making a reasonable guess.  If the string contains square brackets, especially paired brackets with hyphens inside, it's probably a regex.

From working on music tagging software, I can tell you that an awful lot of users have mp3s with square brackets, hyphens, and other such things in their filenames, so if your software makes any assumptions about what filenames look like, their libraries will break your software.

And to verify that this isn't some weird artifact of the way people used to name files on piracy networks back when people traded individual songs, I went to The Pirate Bay and checked the most popular current download in any category, and its first file is named:

    [ www.CpasBien.pw ] Tomorrowland.2015.TRUEFRENCH.BDRip.VxiD-EXTREME.avi

So, I don't think you can assume that paired square brackets or hyphens mean something is not a Windows pathname.
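
To make that concrete, here is a rough sketch of the kind of guess being discussed. The function names and patterns are made up for illustration and are not taken from any real highlighter. The loose check treats any paired square brackets as a sign of a regex; the strict check also wants a hyphen inside the brackets.

```python
import re

# Hypothetical patterns, for illustration only.
ANY_BRACKET_PAIR = re.compile(r"\[[^\]]*\]")            # "[...]" anywhere
BRACKET_WITH_HYPHEN = re.compile(r"\[[^\]]*-[^\]]*\]")  # "[...-...]" anywhere


def looks_like_regex(s):
    """Loose guess: paired square brackets mean 'probably a regex'."""
    return bool(ANY_BRACKET_PAIR.search(s))


def has_hyphenated_class(s):
    """Strict guess: paired brackets with a hyphen inside them."""
    return bool(BRACKET_WITH_HYPHEN.search(s))


# The filename quoted above, and the ambiguous string from earlier in the thread.
filename = "[ www.CpasBien.pw ] Tomorrowland.2015.TRUEFRENCH.BDRip.VxiD-EXTREME.avi"
ambiguous = r"\dir[a-zA-Z0-9]\\"

for label, s in [("filename", filename), ("ambiguous", ambiguous)]:
    print(label, looks_like_regex(s), has_hyphenated_class(s))
```

The loose check calls the filename a regex. The strict check lets this particular filename through, because its hyphen sits outside the brackets, but it still cannot say whether r'\dir[a-zA-Z0-9]\\' is a path or a regex, since that string is legitimately both.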

Of course, with a wide enough corpus of the filenames people actually have to deal with, you could come up with a better heuristic. (Not many regexes have character classes that are dotted domain names, or match a standard language code followed by "-sub", or most of the other examples I see from a quick scan.) But guessing what filenames look like without looking at real ones is not going to get you that far.