Mailman 3 What to do about invalid escape sequences - Python-Dev

What to do about invalid escape sequences

raymond.hettinger＠gmail.com

4 Aug 2019

11:22 p.m.

We should revisit what we want to do (if anything) about invalid escape sequences.

For Python 3.8, the DeprecationWarning was converted to a SyntaxWarning which is visible by default. The intention is to make it a SyntaxError in Python 3.9.

This once seemed like a reasonable and innocuous idea to me; however, I've been using the 3.8 beta heavily for a month and no longer think it is a good idea. The warning crops up frequently, often due to third-party packages (such as docutils and bottle) that users can't easily do anything about. And during live demos and student workshops, it is especially distracting.

I now think our cure is worse than the disease. If code currently has a non-raw string with '\latex', do we really need Python to yelp about it (for 3.8) or reject it entirely (for 3.9)? If someone can't remember exactly which special characters need to be escaped, do we really need to stop them in their tracks during a data analysis session? Do we really need to reject ASCII art in docstrings: ` \-------> special case'?

IIRC, the original problem to be solved was false positives rather than false negatives: filename = '..\training\new_memo.doc'. The warnings and errors don't do (and likely can't do) anything about this.

If Python 3.8 goes out as-is, we may be punching our users in the nose and getting almost no gain from it. ISTM this is a job best left for linters. For a very long time, Python has been accepting the likes of 'more \latex markup' and has been silently converting it to 'more \\latex markup'. I now think it should remain that way. This issue in the 3.8 beta releases has been an almost daily annoyance for me and my customers. Depending on how you use Python, this may not affect you or it may arise multiple times per day.

Raymond

P.S. Before responding, it would be a useful exercise to think for a moment about whether you remember exactly which characters must be escaped or whether you habitually put in an extra backslash when you aren't sure. Then see: https://bugs.python.org/issue32912
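The two cases in the message above can be compared with a minimal sketch. It assumes an interpreter that still accepts invalid escapes with a warning; the warning category has differed between releases, so every category is recorded. The helper name `compile_literal` is hypothetical and exists only for this illustration.

```python
import warnings


def compile_literal(source):
    """Compile a string-literal expression; return (value, warning messages)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record even categories hidden by default
        code = compile(source, "<demo>", "eval")
    return eval(code), [str(w.message) for w in caught]


# Raw strings hold the *source text*, so this file has no bad escapes itself.
latex_value, latex_warnings = compile_literal(r"'more \latex markup'")
path_value, path_warnings = compile_literal(r"'..\training\new_memo.doc'")

# The harmless literal is the one that warns; its backslash is preserved.
print(repr(latex_value), latex_warnings)

# The broken filename compiles silently: \t and \n are valid escapes.
print(repr(path_value), path_warnings)
```

The second literal is the one that actually misbehaves (it now contains a tab and a newline), yet it is the first one that produces the diagnostic.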

Chris Angelico

5 Aug

3:07 a.m.

On Mon, Aug 5, 2019 at 2:24 PM <raymond.hettinger@gmail.com> wrote:

...

For Python 3.8, the DeprecationWarning was converted to a SyntaxWarning which is visible by default. The intention is to make it a SyntaxError in Python 3.9.

This once seemed like a reasonable and innocuous idea to me; however, I've been using the 3.8 beta heavily for a month and no longer think it is a good idea. The warning crops up frequently, often due to third-party packages (such as docutils and bottle) that users can't easily do anything about. And during live demos and student workshops, it is especially distracting.

Those third-party packages will need to be fixed in time for 3.9. This isn't the first time this has happened.

...

I now think our cure is worse than the disease. If code currently has a non-raw string with '\latex', do we really need Python to yelp about it (for 3.8) or reject it entirely (for 3.9)? If someone can't remember exactly which special characters need to be escaped, do we really need to stop them in their tracks during a data analysis session? Do we really need to reject ASCII art in docstrings: ` \-------> special case'?

IIRC, the original problem to be solved was false positives rather than false negatives: filename = '..\training\new_memo.doc'. The warnings and errors don't do (and likely can't do) anything about this.

Many MANY file names will include some that are and some that aren't. If you're lucky, it'll begin "C:\Users" and you'll get an immediate hard error; but if it starts out "C:\Program Files", this warning is what's going to catch it. (Still true in lowercase for both of those.)
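A short sketch of the three outcomes described above, assuming an interpreter where invalid escapes still produce a warning rather than an error. The function name `classify` and the sample paths are illustrative only.

```python
import warnings


def classify(source):
    """Return 'error', 'warning', or 'silent' for a string-literal source."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        try:
            compile(source, "<demo>", "eval")
        except SyntaxError:
            return "error"  # e.g. a truncated \U or \u escape
    return "warning" if caught else "silent"


samples = [
    r"'C:\Users'",          # \U starts an 8-digit Unicode escape
    r"'c:\users'",          # \u starts a 4-digit Unicode escape
    r"'C:\Program Files'",  # \P is not a recognized escape
    r"'c:\program files'",  # \p is not a recognized escape
    r"'C:\temp\new'",       # \t and \n are valid, so nothing is reported
]
results = {source: classify(source) for source in samples}
for source, outcome in results.items():
    print(f"{source:22} {outcome}")
```

Only the last sample changes meaning without any diagnostic, which is the case neither the hard error nor the warning can reach.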

...

If Python 3.8 goes out as-is, we may be punching our users in the nose and getting almost no gain from it. ISTM this is a job best left for linters. For a very long time, Python has been accepting the likes of 'more \latex markup' and has been silently converting it to 'more \\latex markup'. I now think it should remain that way. This issue in the 3.8 beta releases has been an almost daily annoyance for me and my customers. Depending on how you use Python, this may not affect you or it may arise multiple times per day.

P.S. Before responding, it would be a useful exercise to think for a moment about whether you remember exactly which characters must be escaped or whether you habitually put in an extra backslash when you aren't sure. Then see: https://bugs.python.org/issue32912

With alphabetics, I never put in an extra backslash just to be sure. My habit is to escape the quotation mark used to delimit the string, the escape character itself, and nothing else, unless I'm actually constructing a deliberate escape sequence. Except in a regular expression, and then I'm inevitably bitten by differences between grep, sed, Python, etc, etc, etc, so I have to just test it anyway.

What does Python actually gain by allowing errors to pass silently? Are there any other languages that allow arbitrary backslash sequences through unchanged? I tested Lua (syntax error), Pike (syntax warning), GCC in both C and C++ (warning), and Node.js (silent). Incidentally, I haven't found any other language in which "asdf\qwer" == "asdf\\qwer"; they're always just ignoring the backslash altogether. IMO Python's solution is better, but Lua's is best. A bad escape is an error.

ChrisA

raymond.hettinger＠gmail.com

3:11 p.m.

Thanks for looking at what other languages do. It gives some hope that this won't end up being a usability fiasco.

Serhiy Storchaka

3:53 a.m.

05.08.19 07:22, raymond.hettinger@gmail.com пише:

...

We should revisit what we want to do (if anything) about invalid escape sequences.

For Python 3.8, the DeprecationWarning was converted to a SyntaxWarning which is visible by default. The intention is to make it a SyntaxError in Python 3.9.

I do not think there is such intention. I think this warning can be kept for a long time.

...

This once seemed like a reasonable and innocuous idea to me; however, I've been using the 3.8 beta heavily for a month and no longer think it is a good idea. The warning crops up frequently, often due to third-party packages (such as docutils and bottle) that users can't easily do anything about. And during live demos and student workshops, it is especially distracting.

Since the bytecode is cached, a warning in third-party code is emitted at most once: when you install a package or when you use it for the first time. A warning in your own code is emitted every time you change it, until you fix it. In contrast, other deprecation warnings (for example, the one about importing Mapping from collections) are emitted on every run.
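The caching behaviour described here can be observed with a small experiment. This is a sketch under assumptions: bytecode writing is enabled, the temporary directory is writable, and `thirdparty_demo` is a made-up module standing in for a third-party package. Each import runs in a fresh interpreter with `-W always` so the warning is shown whatever its category.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def import_stderr(module_dir, module_name):
    """Import module_name in a fresh interpreter and return its stderr."""
    env = dict(os.environ, PYTHONPATH=str(module_dir))
    # Remove settings that would disable or relocate the bytecode cache.
    for name in ("PYTHONDONTWRITEBYTECODE", "PYTHONPYCACHEPREFIX", "PYTHONWARNINGS"):
        env.pop(name, None)
    result = subprocess.run(
        [sys.executable, "-W", "always", "-c", f"import {module_name}"],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stderr


with tempfile.TemporaryDirectory() as tmp:
    # The file on disk contains: MARKUP = 'more \latex markup'
    Path(tmp, "thirdparty_demo.py").write_text("MARKUP = 'more \\latex markup'\n")
    first_run = import_stderr(tmp, "thirdparty_demo")   # compiles the source
    second_run = import_stderr(tmp, "thirdparty_demo")  # loads the cached .pyc

print("first import: ", first_run.strip() or "(no output)")
print("second import:", second_run.strip() or "(no output)")
```

The warning comes from the compiler, so it appears only when the source is actually compiled; once the `.pyc` exists, later imports skip that step.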

...

I now think our cure is worse than the disease. If code currently has a non-raw string with '\latex', do we really need Python to yelp about it (for 3.8) or reject it entirely (for 3.9)?

Yes, because it is very likely that there is something like '\arrow' or '\newline' in the same string literal or in other string literals in the same file. I follow the fixes for incompatibilities with new Python versions in third-party projects, and it looks to me that in many (if not most) cases a warning about invalid escape sequences exposes a real bug. So there is a real benefit from such warnings.

...

If someone can't remember exactly which special characters need to be escaped, do we really need to stop them in their tracks during a data analysis session?

The only character which must be escaped is a backslash. And a quote if it happens to match a quote used for creating a string literal. It is a simple rule.

...

Do we really need to reject ASCII art in docstrings: ` \-------> special case'?

Do we reject ASCII art like '\xxx/'? A backslash should be escaped, otherwise use raw string literals.
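Both suggested fixes can be checked side by side. The sketch below assumes nothing beyond the standard library; the helper name `evaluate` is hypothetical. It also confirms that the '\xxx/' example is already rejected outright, because `\x` must be followed by two hex digits.

```python
import warnings


def evaluate(source):
    """Evaluate a literal's source text; return (value, number of warnings)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        value = eval(compile(source, "<demo>", "eval"))
    return value, len(caught)


# Fix 1: double the backslash. Fix 2: use a raw string literal.
escaped, escaped_warnings = evaluate(r"' \\-------> special case'")
raw, raw_warnings = evaluate(r"r' \-------> special case'")
print(escaped == raw, escaped_warnings, raw_warnings)

# ASCII art that happens to start with \x is a SyntaxError, not a warning.
try:
    compile(r"'\xxx/'", "<demo>", "eval")
    xxx_rejected = False
except SyntaxError:
    xxx_rejected = True
print(xxx_rejected)
```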

raymond.hettinger＠gmail.com

8:47 p.m.

End-user experience isn't something that can just be argued away. Steve and I are reporting a recurring annoyance. The point of a beta release is to elicit these kinds of reports so they can be addressed before it is too late. ISTM you are choosing not to believe the early feedback and don't want to provide a mitigation.

This decision reverses 25+ years of Python practice and is the software equivalent of telling users "you're holding it wrong". Instead of an awareness campaign to use the silent-by-default warnings, we're going directly towards breaking working code. That seems pretty user hostile to me.

Chris's language survey shows only one language, Lua, that treated this as an error. For compiled languages that emit warnings, the end-user will never see those warnings so there is no end-user consequence. In our case though, end-users will see the messages and may not have an ability to do anything about it.

I wish people with more product management experience would chime in; otherwise, 3.8 is going to ship with an intentional hard-to-ignore annoyance on the premise that we don't like the way people have been programming and that they need to change their code even if it was working just fine.

Chris Angelico

9:20 p.m.

On Tue, Aug 6, 2019 at 11:48 AM <raymond.hettinger@gmail.com> wrote:

...

End-user experience isn't something that can just be argued away. Steve and I are reporting a recurring annoyance. The point of a beta release is to elicit these kinds of reports so they can be addressed before it is too late. ISTM you are choosing not to believe the early feedback and don't want to provide a mitigation.

This decision reverses 25+ years of Python practice and is the software equivalent of telling users "you're holding it wrong". Instead of an awareness campaign to use the silent-by-default warnings, we're going directly towards breaking working code. That seems pretty user hostile to me.

Chris's language survey shows only one language, Lua, that treated this as an error. For compiled languages that emit warnings, the end-user will never see those warnings so there is no end-user consequence. In our case though, end-users will see the messages and may not have an ability to do anything about it.

But my (tiny) survey also found that most languages treat "asdf\qwer" as the same as "asdfqwer", not "asdf\\qwer". So if you're going to use that data to guide you, then Python has been flat-out wrong for those 25+ years. IMO Python's approach of inserting the backslashes was on par with allowing b"asdf" and u"asdf" to compare equal. In one common case, it magically does the right thing; but then it leads to data-dependent bugs elsewhere (eg "C:\Program Files" works but "C:\Users" doesn't). The ONLY reason this is causing pain is that there are fragile programs out there that currently happen to work, but might easily break at any time. Having tried to remotely diagnose bugs in my students' code, I would be *much* happier with an immediate error that tells them what's going wrong. One of the hardest things to debug (even for an expert, never mind about a novice) is the situation where an insignificant change completely breaks your code - "all I did was rename the file and change the open() call". The price is that there are some noisy errors when code is probably buggy. ChrisA

Toshio Kuratomi

7 Aug

11:55 p.m.

On Mon, Aug 5, 2019 at 6:47 PM <raymond.hettinger@gmail.com> wrote:

...

I wish people with more product management experience would chime in; otherwise, 3.8 is going to ship with an intentional hard-to-ignore annoyance on the premise that we don't like the way people have been programming and that they need to change their code even if it was working just fine.

I was resisting weighing in since I don't know the discussion around deprecating this language feature in the first place (other than what's given in this thread). However, in the product I work on we made a very similar change in our last release so I'll throw it out there for people to take what they will from it.

We have a long-standing feature which allows people to define groups of hosts and give them a name. In the past that name could include dashes, dots, and other characters which are not legal as Python identifiers. When users use those group names in our "DSL" (not truly a DSL but close enough), they can do it using either dictionary-lookup syntax (groupvars['groupname']) or using dotted attribute notation groupvars.groupname. We also have a longstanding problem where users will try to do something like groupvars.group-name.... using the dotted attribute notation with group names that aren't proper Python identifiers. This causes problems as the name then gets split on the characters that aren't legal in identifiers and results in something unexpected (undefined variable, an actual subtraction operation, etc).

In our last release we decided to deprecate and eventually make it illegal to use non-Python-identifiers for the group names. At first, product management *did* let us get away with this. But after some time and usage of the pre-releases, they came to realize that this was a major problem. Users had gotten used to being able to use these characters in their group names. They had defined their group names and gotten used to typing their group names and built up a whole body of playbooks that used these group names....

Product management still let us get away with this.. sort of. The scope of the change was definitely modified. Users were now allowed to select whether invalid group names were disallowed (so they could port their installations), allowed with a warning (presumably so they could do work but also see that they were affected) or allowed without a warning (presumably because they knew not to use these group names with dotted attribute notation). This feature was also no longer allowed to be deprecated... We could have a warning that said "Don't do this" but not remove the feature in the future.

Now... I said this was a config option.... So what we do have in the release is that the config option allows but warns by default and *the config option* has a deprecation warning. You see... we're planning on changing from warn by default now to disallowing by default in the future so the deprecation is flagging the change in config value.

And you know what? Users absolutely hate this. They don't like the warning. They don't like the implication that they're doing something wrong by using a long-standing feature. They don't like that we're going to change the default so that their current group names will break. They dislike that it's being warned about because of attribute-lookup-notation which they can just learn not to use with their group names. They dislike this so much that some of us have talked about abandoning this idea... instead, having a public group name that users use when they write in the "DSL" and an internal group name that we use when evaluating the group names. Perhaps that works, perhaps it doesn't, but I think that's where my story starts being specific to our feature and no longer applicable to Python and escape sequences....

Now like I said, I don't know the discussions that led to invalid escape sequences being deprecated so I don't know whether there's more compelling reasons for doing it but it seems to me that there's even less to gain by doing this than what we did in Ansible. The thing Ansible is complaining about can do the wrong thing when used in conjunction with certain other features of our "DSL". The things that the Python escape sequence warning is complaining about are never invalid (as was pointed out, it's complaining when a sequence of two characters will do what the user intended rather than complaining when a sequence of two characters will do something that the user did not intend). Like the Ansible feature, though, the problem is that over time we've discovered that it is hard to educate users about the exact characteristic of the feature (\k == k but \n == newline; groupvars['group-name'] works but groupvars.group-name does not) so we've both given up on continuing to educate the users in favor of attempting to nanny the user into not using the feature. That most emphatically has not worked for us and has spent a bunch of goodwill with our users but the Python userbase is not necessarily the same as ours so the audience may not be as tough to please.

-Toshio

P.S. One entry point to "discussions" about the Ansible feature: https://github.com/ansible/ansible/issues/56930 That bug starts off just asking for better documentation but becomes embroiled with other asks and just general dislike of the idea as a whole... getting linked to other issues, PRs, and referenced by mailing list threads..... (The dislike of this feature really is a many-headed hydra.)

Serhiy Storchaka

8 Aug

12:32 a.m.

08.08.19 07:55, Toshio Kuratomi пише:

...

Like the Ansible feature, though, the problem is that over time we've discovered that it is hard to educate users about the exact characteristic of the feature (\k == k but \n == newline;

No, \k == \\k. This differs from most other programming languages.
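The correction is easy to confirm directly. A minimal sketch, assuming an interpreter where invalid escapes are still accepted; the warning is silenced only to keep the output clean.

```python
import warnings

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # hide the invalid-escape warning for this demo
    unknown = eval(r"'\k'")  # unrecognized escape: the backslash is kept
    known = eval(r"'\n'")    # recognized escape: a single newline character

print(len(unknown), unknown == "\\k")  # → 2 True
print(len(known), known == "\n")       # → 1 True
```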

Facundo Batista

5 Aug

9:36 a.m.

El lun., 5 de ago. de 2019 a la(s) 01:25, <raymond.hettinger@gmail.com> escribió:

...

We should revisit what we want to do (if anything) about invalid escape sequences.

For Python 3.8, the DeprecationWarning was converted to a SyntaxWarning which is visible by default. The intention is to make it a SyntaxError in Python 3.9.

What about allowing in 3.8 to from __future__ import this SyntaxWarning (so anybody can start kicking own code's tires), then 3.9 could make it a real SyntaxWarning, and 4.0 (3.10?) make it a SyntaxError? Of course, in a year or so we could decide to delay the last two steps even more...

Regards,

-- . Facundo
Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org.ar/
Twitter: @facundobatista

Paul Moore

10:11 a.m.

On Mon, 5 Aug 2019 at 15:44, Facundo Batista <facundobatista@gmail.com> wrote:

...

El lun., 5 de ago. de 2019 a la(s) 01:25, <raymond.hettinger@gmail.com> escribió:

...
We should revisit what we want to do (if anything) about invalid escape sequences.

For Python 3.8, the DeprecationWarning was converted to a SyntaxWarning which is visible by default. The intention is to make it a SyntaxError in Python 3.9.

What about allowing in 3.8 to from __future__ import this SyntaxWarning (so anybody can start kicking own code's tires), then 3.9 could make it a real SyntaxWarning, and 4.0 (3.10?) make it a SyntaxError?

Do existing linters flag suspicious/incorrect use of escapes? Could (or should) they? I suspect that a future import will be used by a lot fewer projects than would benefit from this being flagged by linters.

Hmm, it looks like linters do flag this:

pycodestyle (used by flake8) - https://pep8.readthedocs.io/en/latest/intro.html#error-codes
pylint - https://pylint.readthedocs.io/en/latest/technical_reference/features.html#st...

This begs the question why libraries haven't already fixed this :-(

...

Of course, in a year or so we could decide to delay the last two steps even more...

It's way too easy to simply get into a cycle of repeatedly delaying something like this. IMO the only point in delaying the switch to a SyntaxWarning would be if there were a way to encourage libraries to fix the problem in the extra time before the switch occurred (to minimise the effect on end users). But from the look of it, that's not going to happen - even linter warnings don't seem to make much difference there :-(

So I don't see that a delay (or a future import) would actually help much. But I still suspect that we're in for a painful period where users get warnings from libraries that haven't been fixed.

All of the above ignores the question of whether we should make invalid escapes generate a (visible) warning. I've seen enough errors caused by the "false positives" (\U or \n in Windows filenames, for example) that I'm inclined to be strict (even though I accept that strictness hits "false negatives" rather than "false positives"). And I really don't see why it's so hard to fix code - use raw strings for docstrings, and for Windows filenames, and always double backslashes in "simple" strings unless you know you mean to use an escape. And use a linter to check when you make a typo!

Paul
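For readers without a linter at hand, the compiler's own diagnostics can serve as a rough stand-in. The sketch below is not how pycodestyle or pylint implement their checks; it simply compiles source text and collects any escape-related warnings. The function name `find_invalid_escapes` and the sample snippets are hypothetical.

```python
import warnings


def find_invalid_escapes(source, filename="<string>"):
    """Return (line, message) pairs for escape warnings raised while compiling."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, filename, "exec")  # compiles only; nothing is executed
    return [
        (w.lineno, str(w.message))
        for w in caught
        if "escape sequence" in str(w.message)
    ]


buggy = r"PATTERN = '\d+'" + "\n"   # regex written in a plain string
fixed = r"PATTERN = r'\d+'" + "\n"  # same regex in a raw string

print(find_invalid_escapes(buggy))
print(find_invalid_escapes(fixed))
```

Like the warning itself, this only reports unrecognized escapes; a path containing `\n` or `\t` passes without comment.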

Steve Dower

10:39 a.m.

On 04Aug2019 2122, raymond.hettinger@gmail.com wrote:

...

We should revisit what we want to do (if anything) about invalid escape sequences.

For Python 3.8, the DeprecationWarning was converted to a SyntaxWarning which is visible by default. The intention is to make it a SyntaxError in Python 3.9.

This once seemed like a reasonable and innocuous idea to me; however, I've been using the 3.8 beta heavily for a month and no longer think it is a good idea. The warning crops up frequently, often due to third-party packages (such as docutils and bottle) that users can't easily do anything about. And during live demos and student workshops, it is especially distracting.

I now think our cure is worse than the disease. If code currently has a non-raw string with '\latex', do we really need Python to yelp about it (for 3.8) or reject it entirely (for 3.9)? If someone can't remember exactly which special characters need to be escaped, do we really need to stop them in their tracks during a data analysis session? Do we really need to reject ASCII art in docstrings: ` \-------> special case'?

IIRC, the original problem to be solved was false positives rather than false negatives: filename = '..\training\new_memo.doc'. The warnings and errors don't do (and likely can't do) anything about this.

I broadly agree that the warning is very annoying, particularly when it comes from third-party packages (I see it from some of pip's vendored dependencies all the time), though I do also see many people bitten by FileNotFoundError because of a '\n' in their filename.

Raymond - a question if I may. How often do you see these occurring from docstrings, compared to regular strings? I feel like I only ever see the irrelevant warnings being raised from docstrings, so if others confirm this perhaps there's a way we could suppress the warnings where the string is the entire expression?

Cheers,
Steve
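What "the string is the entire expression" means can be made concrete with the `ast` module. This is only an illustration of the distinction, not a description of how CPython would implement such a suppression; the function name `standalone_string_lines` and the sample source are invented for the example.

```python
import ast
import warnings


def standalone_string_lines(source):
    """Line numbers of string literals that form an entire expression statement."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # parsing itself may warn about escapes
        tree = ast.parse(source)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Expr)
        and isinstance(node.value, ast.Constant)
        and isinstance(node.value.value, str)
    )


sample = (
    "def f():\n"
    r"    ' \-------> special case'" "\n"  # docstring: the whole statement
    r"    path = 'C:\Program Files'" "\n"  # regular string: part of an assignment
    "    return path\n"
)
lines = standalone_string_lines(sample)
print(lines)
```

Under the suggested rule, only the literal on the reported line would be exempt; the path on the following line would still warn.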

raymond.hettinger＠gmail.com

3:30 p.m.

...

I broadly agree that the warning is very annoying, particularly when it comes from third-party packages (I see it from some of pip's vendored dependencies all the time),

The same here as well. The other annoyance is that it pops up during live demos, student teaching sessions, and during ipython data analysis in a way that becomes a distractor and makes Python look and feel like it is broken. I haven't found a single case where it improved the user experience.

...

though I do also see many people bitten by FileNotFoundError because of a '\n' in their filename.

Yes, I've seen that as well. Unfortunately, the syntax warning or error doesn't detect that case. It only complains about invalid sequences, which weren't the actual problem we were trying to solve. The new warning (soon to be an error) breaks code that currently works and is otherwise innocuous.

...

Raymond - a question if I may. How often do you see these occurring from docstrings, compared to regular strings?

About half.

Thanks for weighing in. I think this is an important usability discussion. IMO it is the number one issue affecting the end user experience with this release. If we could get more people to actively use the beta release, the issue would stand-out front and center. But if people don't use the beta in earnest, we won't have confirmation until it is too late.

We really don't have to go this path. Arguably, the implicit conversion of '\latex' to '\\latex' is a feature that has existed for three decades, and now we're deciding to turn it off to define existing practices as errors. I don't think any commercial product manager would allow this to occur without a lot of end user testing.

Raymond

P.S. In the world of C compilers, I suspect that if the relatively new compiler warnings were treated as errors, the breakage would be widespread. Presumably that's why they haven't gone down this road.

Neil Schemenauer

6 Aug

11:59 a.m.

Making it an error so soon would be a mistake, IMHO. That will break currently working code for small benefit. When Python was a young language with a few thousand users, it was easier to make these kinds of changes. Now, we should be much more conservative and give people a long time and a lot of warning. Ideally, we should provide tools to fix code if possible.

Could PyPI and pip gain the ability to warn and even fix these issues? Having a warning from pip at install time could be better than a warning at import time. If linting was built into PyPI, we could even do a census to see how many packages would be affected by turning it into an error.

On 2019-08-05, raymond.hettinger@gmail.com wrote:

...

P.S. In the world of C compilers, I suspect that if the relatively new compiler warnings were treated as errors, the breakage would be widespread. Presumably that's why they haven't gone down this road.

The comparison with C compilers is relevant. C and C++ represent a fairly extreme position on not breaking working code. E.g. K & R style functional declarations were supported for decades. I don't think we need to go quite that far but also one or two releases is not enough time.

Regards,

Neil

Steve Dower

12:26 p.m.

On 06Aug2019 0959, Neil Schemenauer wrote:

...

Could PyPI and pip gain the ability to warn and even fix these issues? Having a warning from pip at install time could be better than a warning at import time. If linting was built into PyPI, we could even do a census to see how many packages would be affected by turning it into an error.

If the "generate .pyc" step is used, then it should trigger warnings at install time. Hopefully pip does not suppress these (nor treat them as fatal), though that would be one satisfactory way of hiding third-party warnings from end users. As Serhiy said, once the .pyc exists, you won't see this warning again. Cheers, Steve

Gregory P. Smith

5:52 p.m.

On Tue, Aug 6, 2019 at 10:06 AM Neil Schemenauer <nas-python@arctrix.com> wrote:

...

Making it an error so soon would be a mistake, IMHO. That will break currently working code for small benefit. When Python was a young language with a few thousand users, it was easier to make these kinds of changes. Now, we should be much more conservative and give people a long time and a lot of warning. Ideally, we should provide tools to fix code if possible.

Could PyPI and pip gain the ability to warn and even fix these issues? Having a warning from pip at install time could be better than a warning at import time. If linting was built into PyPI, we could even do a census to see how many packages would be affected by turning it into an error.

PyPI could warn on or reject packages at upload time when they contain these kinds of issues, which show up at compileall -> pyc time. That'd force the issue to be in front of the package owners rather than anyone else. Similarly, automation could proactively scan existing PyPI packages claiming py3 compatibility for the issue and reach out to owners.

But the issue becomes one of versions: we shouldn't be complaining about older versions of packages to the package owners. Yet people are free to list old versions or < constraints in their own packages' requirements regardless of what Python version they're running on.

-gps

...

On 2019-08-05, raymond.hettinger@gmail.com wrote:

...
P.S. In the world of C compilers, I suspect that if the relatively new compiler warnings were treated as errors, the breakage would be widespread. Presumably that's why they haven't gone down this road.

The comparison with C compilers is relevant. C and C++ represent a fairly extreme position on not breaking working code. E.g. K & R style functional declarations were supported for decades. I don't think we need to go quite that far but also one or two releases is not enough time.

Regards,

Neil _______________________________________________ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-leave@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/V2EDFDJG...

Christian Tismer

8 Aug

5:31 a.m.

Hey friends,

This is IMHO a great idea. If a package claims to be Python 3.8 compatible, then it has to be correct concerning invalid escapes. A new pip version could perhaps even refuse packages with such literals when it claims to be supporting Python 3.8.

But how can it actually happen that a pre-3.8 package gets installed when you install Python 3.8? Does pip allow installation without a section that defines the allowed versions? Ok, maybe packages are claimed for Python 3.8 and not further checked.

But let's assume the third-party things that Raymond sees do _not_ come from pip, but elsewhere. Pre-existing stuff that is somehow copied into the newer Python version? Sure, quite possible!

But then it is quite likely that those third-party things still have their creation date from pre-3.8 time. What about the simple heuristic that a Python module with a creation date earlier than xxx does simply not issue the annoying warning? Maybe that already cures the disease in enough cases?

just a wild idea - \leave \old \code \untouched -- ciao - \Chris

On 06.08.19 18:59, Neil Schemenauer wrote:

...

Making it an error so soon would be a mistake, IMHO. That will break currently working code for small benefit. When Python was a young language with a few thousand users, it was easier to make these kinds of changes. Now, we should be much more conservative and give people a long time and a lot of warning. Ideally, we should provide tools to fix code if possible.

Could PyPI and pip gain the ability to warn and even fix these issues? Having a warning from pip at install time could be better than a warning at import time. If linting was built into PyPI, we could even do a census to see how many packages would be affected by turning it into an error.

On 2019-08-05, raymond.hettinger@gmail.com wrote:

...
P.S. In the world of C compilers, I suspect that if the relatively new compiler warnings were treated as errors, the breakage would be widespread. Presumably that's why they haven't gone down this road.

The comparison with C compilers is relevant. C and C++ represent a fairly extreme position on not breaking working code. E.g. K & R style functional declarations were supported for decades. I don't think we need to go quite that far but also one or two releases is not enough time.

Regards,

Neil

-- Christian Tismer :^) tismer@stackless.com Software Consulting : http://www.stackless.com/ Karl-Liebknecht-Str. 121 : https://github.com/PySide 14482 Potsdam : GPG key -> 0xFB7BEE0E phone +49 173 24 18 776 fax +49 (30) 700143-0023

Eric V. Smith

9:09 a.m.

On 8/5/2019 4:30 PM, raymond.hettinger@gmail.com wrote:

...

Thanks for weighing in. I think this is an important usability discussion. IMO it is the number one issue affecting the end user experience with this release. If we could get more people to actively use the beta release, the issue would stand-out front and center. But if people don't use the beta in earnest, we won't have confirmation until it is too late.

We really don't have to go this path. Arguably, the implicit conversion of '\latex' to '\\latex' is a feature that has existed for three decades, and now we're deciding to turn it off to define existing practices as errors. I don't think any commercial product manager would allow this to occur without a lot of end user testing.

As much as I'd love to force this change through [0], it really does seem like we're forcing it. Especially given Nathaniel's point about the discoverability problems with compile-time warnings, I think we should delay a visible warning about this. Possibly in 3.9 we can do something about making these warnings visible at run time, not just compile time. I had a similar problem with f-strings (can't recall the details now, since resolved), and the compile-time-only nature made it difficult to notice. I realize a run-time warning for this would require a fair bit of work that might not be worth it.

I think Raymond's point goes beyond this. I think he's proposing that we never make this change. I'm sympathetic to that, too. But the first step is to change 3.8's behavior to not make this visible. That is, we should restore the 3.7 warning behavior.

Eric

[0] And the real reason I'd like this is so we can add \e

eryk sun

6 Aug

9:34 a.m.

On 8/5/19, Steve Dower <steve.dower@python.org> wrote:

...

though I do also see many people bitten by FileNotFoundError because of a '\n' in their filename.

Thankfully the common filesystems used in Windows reserve ASCII control characters in filenames (except not in stream names or named-pipe names). So a mistaken string literal usually fails with a more obvious ERROR_INVALID_NAME or C EINVAL instead of a mysterious ERROR_FILE_NOT_FOUND or C ENOENT.

Michael

1:14 a.m.

On 05/08/2019 06:22, raymond.hettinger@gmail.com wrote: I have read through (most) of the thread, and visited the issue referenced.

...

We should revisit what we want to do (if anything) about invalid escape sequences.

IMHO - revisit is okay, generally - but that was actually done a long time ago. Now it is, to me, just another example of "Python" being indecisive.

For Python 3.8, the DeprecationWarning was converted to a SyntaxWarning which is visible by default. The intention is to make it a SyntaxError in Python 3.9.

Sounds like this has been discussed in depth - and decided.

This once seemed like a reasonable and innocuous idea to me; however, I've been using the 3.8 beta heavily for a month and no longer think it is a good idea. The warning crops up frequently, often due to third-party packages (such as docutils and bottle) that users can't easily do anything about. And during live demos and student workshops, it is especially distracting.

Because it is not innocuous? My experience with developers (you mention 3rd party) is that they are lazy. If something is not up there, "in the face", they will always have a reason to say "tomorrow". Or, perhaps, since this has been a silent issue (and they are too lazy to read "What's New"), they do not even know. The "head buried in the sand" sort of thing.

As to demos and workshops - YOU know this - so use it as an example to explain how Python development works and DEPENDS on 3rd-party developers paying attention.

Yes, I am sure you are concerned about speeding adoption of Python3.latest-is-greatest, but that is not the world. For example, RHEL8 is (coming) out. iirc, the way it comes out is what they intend to support for 10 years, so chances are it will be Python 3.7 (at best) for several years. I have a system with CentOS 7 and its default python is python2:

    [root@t430 ~]# python3
    bash: python3: command not found...
    Similar command is: 'python'
    [root@t430 ~]# python
    Python 2.7.5 (default, Jun 20 2019, 20:27:34)
    ...

...

I now think our cure is worse than the disease. If code currently has a non-raw string with '\latex', do we really need Python to yelp about it (for 3.8) or reject it entirely (for 3.9)? If someone can't remember exactly which special characters need to be escaped, do we really need to stop them in their tracks during a data analysis session? Do we really need to reject ASCII art in docstrings: ` \-------> special case'?

Simply put - yes, reject. You decided. There is a solution - perhaps boring to implement - but as is mentioned - there are 'linters', so an automated approach is likely possible. If not today, someone will write a module.

IIRC, the original problem to be solved was false positives rather than false negatives: filename = '..\training\new_memo.doc'. The warnings and errors don't do (and likely can't do) anything about this.

For "filenames" you could, perhaps, make an exception in the calls that use them, e.g., when they are hard-coded in something such as open("..\training\new_memo.doc"). iirc, Windows can (and does) accept forward slashes in file names for system calls like open. The "shell", cmd.exe, does not, because it uses "/" the way POSIX shells use "-" (as in /h and -h for the "option" h).

If Python 3.8 goes out as-is, we may be punching our users in the nose and getting almost no gain from it. ISTM this is a job best left for linters. For a very long time, Python has been accepting the likes of 'more \latex markup' and has been silently converting it to 'more \\latex markup'. I now think it should remain that way. This issue in the 3.8 beta releases has been an almost daily annoyance for me and my customers. Depending on how you use Python, this may not affect you or it may arise multiple times per day.

IMHO - Python will not be punching anyone. Python will be delivering "a promise", being decisive, being clear. Not following through only creates insecurity - will they ever do it? Nah - no guts (these are 3rd-party developers chatting). Users are your friend. If they really want Python3.8+ and they get lots of warning messages - THEY will complain - and be heard - in ways CPython never will (or was). Again - revisit is fine - and I hope my 2 cents helps you stay the course! Michael

Rob Cliffe

10:21 a.m.

On 06/08/2019 07:14:35, Michael wrote:

...

For "filenames" you could, perhaps, make an exception in the calls that use them. e.g., when they are hard-coded in something such as open("..\training\new_memo.doc").

Sorry, that won't work. Strings are parsed at compile time, open() is executed at run-time. And the use of open might be masked by using a synonym for it; open might be shadowed by the program assigning to it; the argument to open() might be an expression such as ("..training\new_memo" + extn) etc., etc. Rob Cliffe
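A small illustration of Rob's point, offered as a sketch rather than anything posted in the thread: by the time open() (or any other function) receives the string, the escape processing has already happened, so the callee only ever sees a tab and a newline, never the backslashes. The helper name `control_chars` is made up for this example.

```python
def control_chars(text):
    """Return the ASCII control characters found in text, in order."""
    return [ch for ch in text if ord(ch) < 32]


# The compiler has already turned \t and \n into a tab and a newline here,
# long before any function gets to look at the value.
literal = '..\training\new_memo.doc'

# A raw string keeps both backslashes, so no control characters appear.
raw = r'..\training\new_memo.doc'

print(control_chars(literal))   # ['\t', '\n']
print(control_chars(raw))       # []
print(len(literal), len(raw))   # 22 24
```

Both escapes in this filename are valid ones, which is why neither the warning nor the proposed error would catch it.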

Greg Ewing

5:41 p.m.

Rob Cliffe via Python-Dev wrote:

...

Sorry, that won't work. Strings are parsed at compile time, open() is executed at run-time.

It could check for control characters, which are probably the result of a backslash accident. Maybe even auto-correct them... -- Greg

Steven D'Aprano

5:56 p.m.

On Wed, Aug 07, 2019 at 10:41:25AM +1200, Greg Ewing wrote:

...

Rob Cliffe via Python-Dev wrote:

...
Sorry, that won't work. Strings are parsed at compile time, open() is executed at run-time.

It could check for control characters, which are probably the result of a backslash accident. Maybe even auto-correct them...

http://www.catb.org/jargon/html/D/DWIM.html -- Steven

Rob Cliffe

10 Aug

6:29 a.m.

On 06/08/2019 23:41:25, Greg Ewing wrote:

...

Rob Cliffe via Python-Dev wrote:

...
Sorry, that won't work. Strings are parsed at compile time, open() is executed at run-time.

It could check for control characters, which are probably the result of a backslash accident. Maybe even auto-correct them...

By "It", do you mean open() ? If so: It already checks for control characters, at least with Python 2.7 on Windows:

...

...
...
    >>> open('mydir\test')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    IOError: [Errno 22] invalid mode ('r') or filename: 'mydir\test'

As for auto-correct (presumably "\a" to "\\a", "\b" to "\\b" etc.), I hope you're not serious. "In the face of gibberish, refuse the temptation to show how smart your guessing is."

Steven D'Aprano

6 Aug

6:07 p.m.

On Tue, Aug 06, 2019 at 08:14:35AM +0200, Michael wrote:

...

For "filenames" you could, perhaps, make an exception in the calls that use them. e.g., when they are hard-coded in something such as open("..\training\new_memo.doc").

Functions such as open cannot tell whether their argument was provided as a string literal or a variable or other expression. So if you are thinking we change open() to be something like this:

    if filename contains control characters, and filename is a literal:
        fix filename

that's not going to work.

Also, it is important that open() be able to work with filenames which contain control characters, because some files actually do have control characters in them. They are super-hard to deal with in typical POSIX shells, but easy to work with in Python. Unless Python tries to be "helpful" and auto-corrects (auto-corrupts) filenames for us. I don't want the interpreter trying to guess what I meant and running that instead of what I actually wrote.

-- Steven

Ivan Pozdeev

2:28 a.m.

On 05.08.2019 7:22, raymond.hettinger@gmail.com wrote:

...

We should revisit what we want to do (if anything) about invalid escape sequences.

For Python 3.8, the DeprecationWarning was converted to a SyntaxWarning which is visible by default. The intention is to make it a SyntaxError in Python 3.9.

I don't see a problem with the DeprecationWarning approach. It's our standard procedure for removing features. Just make it an error in a later release -- whoever didn't heed the warning can't say we didn't warn them well in advance that this will stop working in the future. It doesn't matter if this was a "perfectly working code": we change Python over time and what worked in the past may stop in the future. That's not something unexpected and we don't guarantee compatibility between minor releases. It looks like either the core team has no unifying vision on what the deprecation process is and/or what each warning is for, or that this was a user experience experiment and Raymond's complaint is a part of its feedback (so the Mysterious Omniscient Council is probably going to incorporate it into other feedback and decide what they want to do next).

...

This once seemed like a reasonable and innocuous idea to me; however, I've been using the 3.8 beta heavily for a month and no longer think it is a good idea. The warning crops up frequently, often due to third-party packages (such as docutils and bottle) that users can't easily do anything about. And during live demos and student workshops, it is especially distracting.

I now think our cure is worse than the disease. If code currently has a non-raw string with '\latex', do we really need Python to yelp about it (for 3.8) or reject it entirely (for 3.9)? If someone can't remember exactly which special characters need to be escaped, do we really need to stop them in their tracks during a data analysis session? Do we really need to reject ASCII art in docstrings: ` \-------> special case'?

IIRC, the original problem to be solved was false positives rather than false negatives: filename = '..\training\new_memo.doc'. The warnings and errors don't do (and likely can't do) anything about this.

If Python 3.8 goes out as-is, we may be punching our users in the nose and getting almost no gain from it. ISTM this is a job best left for linters. For a very long time, Python has been accepting the likes of 'more \latex markup' and has been silently converting it to 'more \\latex markup'. I now think it should remain that way. This issue in the 3.8 beta releases has been an almost daily annoyance for me and my customers. Depending on how you use Python, this may not affect you or it may arise multiple times per day.

Raymond

P.S. Before responding, it would be a useful exercise to think for a moment about whether you remember exactly which characters must be escaped or whether you habitually put in an extra backslash when you aren't sure. Then see: https://bugs.python.org/issue32912

I use raw strings whenever I have literals with backslashes. Very rarely, I need to insert \n-as-newline -- which I do into a regular string. Since the first one appears in regexes and parameters that are Windows paths and the second one in output messages, they do not intersect.

...


-- Regards, Ivan

Random832

9:26 a.m.

On Mon, Aug 5, 2019, at 00:29, raymond.hettinger@gmail.com wrote:

...

The warning crops up frequently, often due to third-party packages (such as docutils and bottle) that users can't easily do anything about. And during live demos and student workshops, it is especially distracting.

Maybe what we really need is a general solution for warning visibility in third-party packages. Isn't that the whole reason DeprecationWarning isn't visible by default, for that matter?

Matt Billenstein

11:32 a.m.

On Mon, Aug 05, 2019 at 04:22:50AM -0000, raymond.hettinger@gmail.com wrote:

...

This once seemed like a reasonable and innocuous idea to me; however, I've been using the 3.8 beta heavily for a month and no longer think it is a good idea. The warning crops up frequently, often due to third-party packages (such as docutils and bottle) that users can't easily do anything about.

Perhaps those packages could be flagged now via pylint and problems raised with the respective package maintainers before the actual 3.8 release? Checking the top 100 or top 1000 packages on PyPI? m -- Matt Billenstein matt@vazor.com http://www.vazor.com/

MRAB

12:13 p.m.

On 2019-08-06 17:32, Matt Billenstein wrote:

...

On Mon, Aug 05, 2019 at 04:22:50AM -0000, raymond.hettinger@gmail.com wrote:

...
This once seemed like a reasonable and innocuous idea to me; however, I've been using the 3.8 beta heavily for a month and no longer think it is a good idea. The warning crops up frequently, often due to third-party packages (such as docutils and bottle) that users can't easily do anything about.

Perhaps those packages could be flagged now via pylint and problems raised with the respective package maintainers before the actual 3.8 release? Checking the top 100 or top 1000 packages on PyPI?

Or it could be deferred until Python 4.0. Are there any other issues where we could say that from Python 4.0 you shouldn't do X?

Paul Moore

12:37 p.m.

On Tue, 6 Aug 2019 at 17:39, Matt Billenstein <matt@vazor.com> wrote:

...

On Mon, Aug 05, 2019 at 04:22:50AM -0000, raymond.hettinger@gmail.com wrote:

...
This once seemed like a reasonable and innocuous idea to me; however, I've been using the 3.8 beta heavily for a month and no longer think it is a good idea. The warning crops up frequently, often due to third-party packages (such as docutils and bottle) that users can't easily do anything about.

Perhaps those packages could be flagged now via pylint and problems raised with the respective package maintainers before the actual 3.8 release? Checking the top 100 or top 1000 packages on PyPI?

I don't see issues reported in the bug trackers for docutils and bottle. Maybe as a start, someone could raise issues there? And any other projects Raymond encountered issues with? If nothing else, it's polite to give these projects a warning now that they should be stricter about how they use escape sequences, because the core devs intend to deprecate and ultimately remove the current permissive behaviour. That's what the 3.8 betas are for, after all.

If the feedback from bug reports like this is that projects consider it an unacceptable burden to change, then maybe we would then rethink the timescales of the deprecation, or even whether we should do it at all. If they just release a quick fix, maybe we're worrying over the wrong thing here?

I remain ambivalent about the change myself. The point that it's the false positives we *want* to address, but this change hits the false negatives, is a telling one for me. I don't think I'd support the change now if we were discussing it for the first time.

But on the other hand, the discussion has already happened, and the decision was made, and while I'm OK with responding to the evidence that having loud user-visible warnings is a bad UX, I don't think there's any new evidence here that significantly changes the facts on which the decision to eventually make invalid escapes an error was made - just evidence that the way we chose to introduce that change may be a problem. (And honestly, to my knowledge, we've never yet found a case where there *hasn't* been controversy about warnings from libraries being triggered by user code, so it's not like this is a new problem).

Paul

Brett Cannon

5:37 p.m.

Paul Moore wrote:

...

On Tue, 6 Aug 2019 at 17:39, Matt Billenstein matt@vazor.com wrote:

...
On Mon, Aug 05, 2019 at 04:22:50AM -0000, raymond.hettinger@gmail.com wrote:

This once seemed like a reasonable and innocuous idea to me; however, I've been using the 3.8 beta heavily for a month and no longer think it is a good idea. The warning crops up frequently, often due to third-party packages (such as docutils and bottle) that users can't easily do anything about.

Perhaps those packages could be flagged now via pylint and problems raised with the respective package maintainers before the actual 3.8 release? Checking the top 100 or top 1000 packages on PyPI?

I don't see issues reported in the bug trackers for docutils and bottle. Maybe as a start, someone could raise issues there? And any other projects Raymond encountered issues with? If nothing else, it's polite to give these projects a warning now that they should be stricter about how they use escape sequences, because the core devs intend to deprecate and ultimately remove the current permissive behaviour. That's what the 3.8 betas are for, after all. If the feedback from bug reports like this is that projects consider it an unacceptable burden to change, then maybe we would then rethink the timescales of the deprecation, or even whether we should do it at all. If they just release a quick fix, maybe we're worrying over the wrong thing here?

I think this is a good example of how the community is not running tests with warnings on and making sure that their code is warnings-free. This warning has existed for at least one full release and fixing it doesn't require some crazy work-around for backwards compatibility, and so this tells me people are simply either ignoring the warnings or they are not aware of them.

If it's the case that people are choosing to ignore warnings then that's on them and there's not much we can do there.

But my suspicion is it's the latter case of people simply not thinking about running with warnings on and making sure to check for them. For instance, are people running their CI with warnings turned on? How about making sure to check the output of their CI to make sure there are no warnings? Or even better, how many people are running CI with warnings turned into exceptions? My guess is all of this is rather low because people are probably just doing `pytest` without thinking of turning on warnings as exceptions to trigger a CI failure and are only looking for CI passing versus checking its output.

Quick, how many pytest users here know what CLI arg to pass to pytest to have it turn warnings into exceptions? The answer is `-W` with the same type of argument as `python` takes and I had to look that up myself.

So this seems to show a communication issue. In this specific instance I'm torn because this isn't a new thing and maybe we need to shock people into starting to care about warnings? Or maybe we should make a concerted effort in this beta cycle to get people to really test their code by telling them they probably have a warning hiding in their code that they don't know about and if we don't see better uptake we hold off on a release with the SyntaxWarning and make a **really** big push for people to run their tests under Python 3.8 with warnings flipped on in some fashion and we don't back down in 3.9?
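For readers who want to see what "warnings turned into exceptions" means for this particular warning, here is a minimal sketch using only the standard `warnings` module and the built-in `compile()`. The function name `rejects_invalid_escapes` is hypothetical, and the exact exception raised may vary between Python versions (CPython typically reports it as a SyntaxError once the warning is escalated), so the sketch catches both possibilities.

```python
import warnings


def rejects_invalid_escapes(source):
    """Return True if compiling source fails once warnings are errors."""
    with warnings.catch_warnings():
        # Same effect as running the interpreter with -W error,
        # but limited to this block.
        warnings.simplefilter("error")
        try:
            compile(source, "<example>", "exec")
        except (SyntaxError, Warning):
            return True
    return False


# Raw strings are used here so that this example does not itself
# contain an invalid escape sequence.
print(rejects_invalid_escapes(r"s = 'more \latex markup'"))   # True
print(rejects_invalid_escapes(r"s = r'more \latex markup'"))  # False
```

Because the source is compiled in memory on every call, this check does not depend on whether a cached .pyc file already exists.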

Nathaniel Smith

9:58 p.m.

On Tue, Aug 6, 2019 at 3:44 PM Brett Cannon <brett@python.org> wrote:

...

I think this is a good example of how the community is not running tests with warnings on and making sure that their code is warnings-free. This warning has existed for at least one full release and fixing it doesn't require some crazy work-around for backwards compatibility, and so this tells me people are simply either ignoring the warnings or they are not aware of them.

If it's the case that people are choosing to ignore warnings then that's on them and there's not much we can do there.

But my suspicion is it's the latter case of people simply not thinking about running with warnings on and making sure to check for them. For instance, are people running their CI with warnings turned on? How about making sure to check the output of their CI to make sure there are no warnings? Or even better, how many people are running CI with warnings turned into exceptions? My guess is all of this is rather low because people are probably just doing `pytest` without thinking of turning on warnings as exceptions to trigger a CI failure and are only looking for CI passing versus checking its output.

There's an important point here that I think has been missed. These days deprecation warnings are much more visible in general, because all the major test systems enable them by default. BUT, this SPECIFIC warning almost completely circumvented all those systems, so almost no-one saw it. For example, all my projects run tests with deprecation warnings enabled and warnings turned into errors, but I never saw any of these warnings. What happens is: the warning is issued when the .py file is byte-compiled; but at this point, deprecation warnings probably aren't visible. Later on, when pytest imports the file, it has warnings enabled... but now the warning isn't issued. Quoting Aaron Meurer from the bpo thread:

...

As an anecdote, for SymPy's CI, we went through five (if I am counting correctly) iterations of trying to test this. Each of the first four were subtly incorrect, until we finally managed to find the correct one (for reference, 'python -We:invalid -m compileall -f -q module/'). So most library authors who will attempt to add tests against this will get it wrong.

Since folks don't seem to be reading that thread, I'll re-post my comment from it as well:

...

I think we haven't *actually* done a proper DeprecationWarning period for this. We tried, but because of the issue with byte-compiling, the warnings were unconditionally suppressed for most users -- even the users who are diligent enough to enable warnings and look at warnings in their test suites. I can see a good argument for making the change, but if we're going to do it then it's obviously the kind of change that requires a proper deprecation period, and that hasn't happened. Maybe .pyc files need to be extended to store a list of syntax-related DeprecationWarnings and SyntaxWarnings, that are re-issued every time the .pyc is loaded? Then we'd at least have the technical capability to deprecate this properly.
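The byte-compilation problem described above can be sidestepped in a test suite by compiling the sources directly instead of importing them. The following is only a sketch in the spirit of the `compileall` command quoted from SymPy, not a replacement for it; the function name `find_invalid_escapes` and the tuple layout it returns are assumptions made for this example.

```python
import pathlib
import warnings


def find_invalid_escapes(root):
    """Compile every .py file under root in memory and collect warnings.

    Returns a list of (path, line number, message) tuples. Nothing is
    imported and no .pyc cache is consulted, so compile-time warnings
    are seen on every run.
    """
    findings = []
    for path in sorted(pathlib.Path(root).rglob("*.py")):
        source = path.read_bytes()
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")
            try:
                compile(source, str(path), "exec")
            except (SyntaxError, ValueError) as exc:
                # Files that do not compile at all (for example Python 2
                # sources) are reported rather than silently skipped.
                findings.append((str(path), getattr(exc, "lineno", None), str(exc)))
                continue
        for item in caught:
            findings.append((str(path), item.lineno, str(item.message)))
    return findings
```

A CI job could call `find_invalid_escapes("path/to/package")` and fail when the list is non-empty. Note that this records every compile-time warning, not only the invalid-escape one, which is arguably what a warnings-clean test run wants anyway.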

-n -- Nathaniel J. Smith -- https://vorpus.org

Steven D'Aprano

11:05 p.m.

On Tue, Aug 06, 2019 at 07:58:12PM -0700, Nathaniel Smith wrote:

...

For example, all my projects run tests with deprecation warnings enabled and warnings turned into errors, but I never saw any of these warnings. What happens is: the warning is issued when the .py file is byte-compiled; but at this point, deprecation warnings probably aren't visible. Later on, when pytest imports the file, it has warnings enabled... but now the warning isn't issued.

This! If Nathaniel's analysis is correct, and I think it is, we've identified a flaw in our deprecation process. We've assumed that library devs will see the warnings long before end users. Since the benefit of this breaking change is quite small, let's delay it long enough to fix the deprecation process. -- Steven

Serhiy Storchaka

7 Aug

1:21 a.m.

07.08.19 01:37, Brett Cannon wrote:

...

I think this is a good example of how the community is not running tests with warnings on and making sure that their code is warnings-free. This warning has existed for at least one full release and fixing it doesn't require some crazy work-around for backwards compatibility, and so this tells me people are simply either ignoring the warnings or they are not aware of them.

There are several PRs for fixing warnings on GitHub every month. And it seems a deprecation warning about importing ABCs from collections is at least as common (if not more so) as the warning about "invalid escape sequences". The former is more visible to end users because it is emitted on every run, not only at the first bytecode compilation.

Serhiy Storchaka

1:03 a.m.

06.08.19 20:37, Paul Moore wrote:

...

I don't see issues reported in the bug trackers for docutils and bottle. Maybe as a start, someone could raise issues there?

The warning in docutils was fixed. https://sourceforge.net/p/docutils/code/8255/

Matt Billenstein

6 Aug

5:29 p.m.

On Tue, Aug 06, 2019 at 04:32:04PM +0000, Matt Billenstein wrote:

...

Perhaps those packages could be flagged now via pylint and problems raised with the respective package maintainers before the actual 3.8 release? Checking the top 100 or top 1000 packages on PyPI?

fwiw, ran pylint on the top 100 pypi pkgs from: https://hugovk.github.io/top-pypi-packages/top-pypi-packages-30-days.json The list of packages is pretty small: https://gist.github.com/mattbillenstein/ad862d032b8575f8d6e08384850f2223 but some have quite a few errors... m -- Matt Billenstein matt@vazor.com http://www.vazor.com/

Steven D'Aprano

7:01 p.m.

This really is a hairy one.

The current behaviour encourages people to use a single backslash when they should be using a double, but that works only sometimes. Should you include an escape or not? Sometimes the backslash stays and sometimes it disappears:

    py> "abc \d \' xyz"
    "abc \\d ' xyz"

To be honest, I'm kind of surprised that I haven't seen a bug report for the backslash disappearing when followed by a quote.

This behaviour also prevents us from adding new kinds of escape sequences in the future without a deprecation period. For example, there's a long-missing escape sequence often found in C, \e for ESC, but we can't add it now even if we wanted it because it would break code that relies on "\e" being the same as "\\e". So I remain +1 on changing the behaviour.

On the other hand, I see Raymond's point. Having tried it with a few contrived modules, I agree that it would be intimidating and annoying for many users. As an educator, Raymond could easily teach people how to silence the warnings. But people outside the classroom are going to be hit with these warnings too, and many of them are not going to know how to silence them or even that it is possible to silence them.

I see Raymond's point about not breaking ASCII art, but I think that there are work-arounds for that, using raw strings and string concatenation. Besides, it's 2019 and we use UTF-8 by default. Unicode provides alternative ways to draw ASCII art than pure ASCII:

    Raymond's example: '\-------> special case'
    Unicode: '└───────▷ special case'

So I'm coming around to the position that while we should continue with the plan to make invalid escape sequences an error, we should slow down a bit. This isn't a critical problem that needs to be fixed soonest. Let's push everything back one release:

- Keep the SyntaxWarning silent by default for 3.8. That gives us another year or more to gently pressure third-party libraries to fix their code, and to find ways to encourage developers to run with warnings enabled.
- In 3.9, we can try making the warnings visible again.
- And aim to make it an error in 4.0/3.10.

-- Steven
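The "sometimes the backslash stays and sometimes it disappears" behaviour can be reproduced without putting an invalid escape into one's own source file. This is a sketch only: the helper name `evaluate_literal` is made up, and it assumes an interpreter that still accepts invalid escapes with a warning; a future version that rejects them would raise SyntaxError here instead.

```python
import ast
import warnings


def evaluate_literal(literal_source):
    """Evaluate the text of a string literal, silencing compile warnings."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return ast.literal_eval(literal_source)


# The raw string below holds the *source text* of the literal from the
# message above, including its surrounding double quotes.
value = evaluate_literal(r'''"abc \d \' xyz"''')

# \d is not a recognised escape, so its backslash is kept;
# \' is a recognised escape, so its backslash is consumed.
print(value == "abc \\d ' xyz")
print(value.count("\\"))
```

Only one backslash survives in the resulting string, which is the inconsistency the message points out.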

Chris Angelico

7:14 p.m.

On Wed, Aug 7, 2019 at 10:03 AM Steven D'Aprano <steve@pearwood.info> wrote:

...

- Keep the SyntaxWarning silent by default for 3.8. That gives us another year or more to gently pressure third-party libraries to fix their code, and to find ways to encourage developers to run with warnings enabled.

How do you propose to apply this pressure? How about: whenever a third-party library uses a potentially-wrong escape sequence, it creates a message on the console. Then when someone sees that message, they can post a bug report against the package. In other words, a non-silent warning. ChrisA

Rob Cliffe

7:31 p.m.

On 07/08/2019 01:14:08, Chris Angelico wrote:

...

On Wed, Aug 7, 2019 at 10:03 AM Steven D'Aprano <steve@pearwood.info> wrote:

...
- Keep the SyntaxWarning silent by default for 3.8. That gives us another year or more to gently pressure third-party libraries to fix their code, and to find ways to encourage developers to run with warnings enabled. How do you propose to apply this pressure?

How about: whenever a third-party library uses a potentially-wrong escape sequence, it creates a message on the console. Then when someone sees that message, they can post a bug report against the package.

In other words, a non-silent warning.

ChrisA _______________________________________________

The interpreter knows which module contains the questionable string. So: is it feasible for the warning message to include something like "... If you are not the maintainer of xxxmodule.py, please contact them, or post a bug report on ..."

Gregory P. Smith

7:57 p.m.

People distribute code via PyPI. If we reject uploads of packages with these problems and link to fixers (modernize can be taught what to do), we prevent them from spreading further. A few years after doing that, we can revisit how much pain, and for whom, making this a SyntaxWarning or even SyntaxError would actually be.

It isn't useful to tell *users* of packages to spend time figuring out who to complain to that some package's code that they somehow depend on (often transitively) is not modern enough.

On Tue, Aug 6, 2019 at 5:39 PM Rob Cliffe via Python-Dev <python-dev@python.org> wrote:

...

On 07/08/2019 01:14:08, Chris Angelico wrote:

...
On Wed, Aug 7, 2019 at 10:03 AM Steven D'Aprano <steve@pearwood.info> wrote:

...
- Keep the SyntaxWarning silent by default for 3.8. That gives us another year or more to gently pressure third-party libraries to fix their code, and to find ways to encourage developers to run with warnings enabled. How do you propose to apply this pressure?

How about: whenever a third-party library uses a potentially-wrong escape sequence, it creates a message on the console. Then when someone sees that message, they can post a bug report against the package.

In other words, a non-silent warning.

ChrisA _______________________________________________

The interpreter knows which module contains the questionable string. So: is it feasible for the warning message to include something like "... If you are not the maintainer of xxxmodule.py, please contact them, or post a bug report on ..."

Glenn Linderman

9:37 p.m.

On 8/6/2019 5:57 PM, Gregory P. Smith wrote:

...

People distribute code via pypi. if we reject uploads of packages with these problems and link to fixers (modernize can be taught what to do), we prevent them from spreading further. A few years after doing that, we can revisit how much pain and for whom making this a SyntaxWarning or even SyntaxError would actually be.

it isn't useful to tell /users/ of packages to spend time figuring out who to complain to that some packages code that they somehow depend on (often transitively) is not modern enough.

People also distribute code other ways. A few years after cleaning up pypi, you still wouldn't know how much pain and for whom making this a SyntaxWarning or SyntaxError would actually be. Just do it and get it over with. Glenn

Serhiy Storchaka

7 Aug

1:29 a.m.

07.08.19 03:57, Gregory P. Smith wrote:

...

People distribute code via PyPI. If we reject uploads of packages with these problems and link to fixers (modernize can be taught what to do), we prevent them from spreading further.

How can we check that there are such problems in the package? Pass all *.py files through a linter? But the package can contain "incorrect" files, for example files for Python 2 or earlier Python 3 versions. Even the CPython test suite contains bad Python files for testing purposes.

Serhiy Storchaka

1:24 a.m.

07.08.19 03:31, Rob Cliffe via Python-Dev wrote:

...

...
How about: whenever a third-party library uses a potentially-wrong escape sequence, it creates a message on the console. Then when someone sees that message, they can post a bug report against the package.

Wouldn't it just increase the amount of noise? The main complaint about the new warnings is the noise.

Steven D'Aprano

6 Aug

10:53 p.m.

On Wed, Aug 07, 2019 at 10:14:08AM +1000, Chris Angelico wrote:

...

On Wed, Aug 7, 2019 at 10:03 AM Steven D'Aprano <steve@pearwood.info> wrote:

...
- Keep the SyntaxWarning silent by default for 3.8. That gives us another year or more to gently pressure third-party libraries to fix their code, and to find ways to encourage developers to run with warnings enabled.

How do you propose to apply this pressure?

We already have some good information about the offending libraries. (Remember, the libraries here haven't done anything wrong. They were using a documented feature. We've just changed our mind about that feature.) Raymond mentioned two, docutils and bottle, and Matt did a scan of the top 100 downloads on PyPI. We can start by reporting this as a bug to them.

...

How about: whenever a third-party library uses a potentially-wrong escape sequence, it creates a message on the console. Then when someone sees that message, they can post a bug report against the package.

You're right, of course, and if we were talking about one or two warnings a week, affecting a handful of users, I don't think Raymond would have said anything. But apparently this is a widespread problem with common third-party libraries. That means it's going to affect lots of people.

We have a few problems:

- The people affected will mostly be the end users, not the developers.

- These sorts of SyntaxWarnings are scary and intimidating to beginners, even when they are harmless. Many of them will not know how to silence warnings, or who to report it as a bug to.

- Since end users rarely search for existing bug reports before adding a new one, or upgrade to the latest version, we're effectively sentencing the third-party library authors to be flooded with potentially dozens of identical bug reports long after they have fixed the issue.

- I expect that many end users will report it as a *Python* bug, so we're going to share some of that pain too.

- The benefit of the desired change is relatively low.

The intention was for the developers, not end users, to see the warning. If end users see more than a tiny number of these warnings, our plan failed.

That's okay: since the benefit of the breaking change is small, we can rethink the plan, delay the breaking change, and try to come up with a better system that ensures developers see these warnings before their users do.

We're not fixing a major security issue here, or adding a new feature that will make people's code enormously better. We're breaking people's code to force them to write "better" code, so that *maybe* some day in the future we can add new escape sequences. That's a really small benefit for breaking backwards compatibility.

We don't break backwards compatibility lightly because of the knock-on effects of code churn, libraries that stop working, frustrated users, obsoleted blog posts and books, questions asked on Stack Overflow etc. When the benefit is small, we require the pain to be correspondingly small. That's not going to be the case if we continue with the plan.

Don't think of this as a failure. Think of it as an opportunity: we've identified a weakness in our deprecation process. Let's fix that process, make sure that *developers* will see the warning in 3.8 or 3.9, and not raise an exception until 4.0 or 4.1.

I know people just want to get it over and done with, I do too. But we have responsibilities to the community, and we've lived with the current behaviour for 25+ years, another 2-3 years won't kill us.

-- Steven
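The message above notes that many end users will not know how to silence these warnings. One way to do it from Python code is sketched below, using only the standard `warnings` module. The helper name `quiet_invalid_escapes` and the `third_party_source` string are hypothetical stand-ins, and the filter covers both DeprecationWarning and SyntaxWarning because the category has differed between releases.

```python
import contextlib
import warnings


@contextlib.contextmanager
def quiet_invalid_escapes():
    """Temporarily ignore invalid-escape-sequence warnings (hypothetical helper)."""
    with warnings.catch_warnings():
        # Cover both categories, since the one used has varied by release.
        for category in (DeprecationWarning, SyntaxWarning):
            warnings.filterwarnings(
                "ignore",
                message=r".*invalid escape sequence",
                category=category,
            )
        yield


# Stand-in for third-party code: the literal contains a backslash followed
# by "l", which is not a recognized escape sequence.
third_party_source = "text = 'more \\latex markup'\n"

with quiet_invalid_escapes():
    namespace = {}
    exec(compile(third_party_source, "<third_party>", "exec"), namespace)

print(namespace["text"])
```

In real use the `with` block would more likely wrap an `import` of the affected package than a `compile()` call. The same effect is available without touching code through the interpreter's `-W` option or the `PYTHONWARNINGS` environment variable.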

Chris Angelico

11:33 p.m.

On Wed, Aug 7, 2019 at 1:54 PM Steven D'Aprano <steve@pearwood.info> wrote:

...

Don't think of this as a failure. Think of it as an opportunity: we've identified a weakness in our deprecation process. Let's fix that process, make sure that *developers* will see the warning in 3.8 or 3.9, and not raise an exception until 4.0 or 4.1.

So HOW are you going to make sure developers see it? Currently it requires some extra steps or flags, which are not well known. What change are you proposing for 3.8 that will ensure that this actually gets solved? Otherwise, all you're doing is saying "I wish this problem would just go away".

Library authors can start _right now_ fixing their code so it's more 3.8 compatible. ("More" because 3.8 doesn't actually break anything.) What is actually gained by waiting longer, and how do you propose to make this transition easier?

ChrisA

Steven D'Aprano

7 Aug

4:31 a.m.

On Wed, Aug 07, 2019 at 02:33:51PM +1000, Chris Angelico wrote:

...

On Wed, Aug 7, 2019 at 1:54 PM Steven D'Aprano <steve@pearwood.info> wrote:

...
Don't think of this as a failure. Think of it as an opportunity: we've identified a weakness in our deprecation process. Let's fix that process, make sure that *developers* will see the warning in 3.8 or 3.9, and not raise an exception until 4.0 or 4.1.

So HOW are you going to make sure developers see it?

I've only just started thinking about it, give me a couple of minutes! *wink*

What's the rush? Let's be objective here: what benefit are we going to get from this change? Is there anyone hanging out desperately for "\d" and "\-" to become SyntaxErrors, so they can... do what?

Because our processes don't work the way we assumed, it turns out that in practice we haven't given developers the deprecation period we thought we had. Read Nathaniel's post, if you haven't already done so:

https://mail.python.org/archives/list/python-dev@python.org/message/E7QCC74O...

He makes a compelling case that while we might have had the promised deprecation period by the letter of the law, in practice most developers will have never seen it, and we will be breaking the spirit of the promise if we continue with the unmodified plan.

Quite frankly, if we continue with the unmodified plan, third-party devs who are affected will have the right to feel mightily pissed off at us. We make an implicit, if not explicit, promise that we won't break backwards compatibility lightly, but if we do, we will give them plenty of notice except under the most dire circumstances (such as a serious security vulnerability).

And yet here we are rushing through a breaking change in an accelerated manner, for a change of marginal benefit.

Sure, we can say that *technically* we gave them all the notice promised, it was at the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying "Beware of The Leopard".

https://www.goodreads.com/quotes/40705-but-the-plans-were-on-display-on-disp...

I'm sure that the affected devs will understand why it was *their* fault they couldn't see the warnings, when even people from a first-class library like SymPy took four iterations to do it right.

...

Currently it requires some extra steps or flags, which are not well known. What change are you proposing for 3.8 that will ensure that this actually gets solved?

Absolutely nothing. I don't have to: we're an entire community, this doesn't have to fall only on my shoulders. I'm not even the messenger: that's Raymond. I'm just (partly) agreeing with him. Just because I don't have a solution for this problem doesn't mean the problem doesn't exist.

...

Otherwise, all you're doing is saying "I wish this problem would just go away".

No, I'm saying we don't have to rush this into 3.8. Let's keep the warning silent and push everything back a release.

Now is better than never. Although never is often better than *right* now.

Right now, we're looking at a seriously compromised user-experience for 3.8. People are going to hate these warnings, many of them won't know what to do with them and will be sure that Python is buggy, and for very little benefit.

Let's slow down and put it off for another release, giving us time to solve the warnings problem, and library authors the deprecation period promised.

...

Library authors can start _right now_ fixing their code so it's more 3.8 compatible.

Provided that (1) they are aware that this is a problem that needs to be fixed, and (2) they have the round tuits to actually fix it by 3.8.0. Neither are guaranteed.

It's not a big fix, but people have other priorities, like work, family, a life, etc. That's why we normally give developers *multiple years* of warnings to fix problems, not weeks. This change is not so important that we have to push it through in an accelerated time frame.

...

("More" because 3.8 doesn't actually break anything.) What is actually gained by waiting longer

We gain the avoidance of a painful experience in 3.8 for a significant number of users and third-party devs.

The question we haven't had answered is what we gain by pushing through with the original plan. Plenty of people have said "Let's just do it" but as far as I can see not one has explained *why* we should put end users and library developers through this frustrating and annoying rushed deprecation period.

-- Steven

Paul Moore

4:44 a.m.

On Wed, 7 Aug 2019 at 10:32, Steven D'Aprano <steve@pearwood.info> wrote:

...

No, I'm saying we don't have to rush this into 3.8. Let's keep the warning silent and push everything back a release.

Now is better than never. Although never is often better than *right* now.

Right now, we're looking at a seriously compromised user-experience for 3.8. People are going to hate these warnings, many of them won't know what to do with them and will be sure that Python is buggy, and for very little benefit.

Let's slow down and put it off for another release, giving us time to solve the warnings problem, and library authors the deprecation period promised.

+1 from me. The arguments made here are pretty compelling to me, and I agree that we should take a breath and not rush this warning into 3.8, given what we now know. Paul

Chris Angelico

4:47 a.m.

On Wed, Aug 7, 2019 at 7:33 PM Steven D'Aprano <steve@pearwood.info> wrote:

...

What's the rush? Let's be objective here: what benefit are we going to get from this change? Is there anyone hanging out desperately for "\d" and "\-" to become SyntaxErrors, so they can... do what?

So that problems can start to be detected. Time and again, Python users on Windows get EXTREMELY confused by the way their code worked perfectly with one path, then bizarrely fails with another. That is a very real problem, and the problem is that it appeared to work when actually it was wrong.

Python has a history of fixing these problems. It used to be that b"\x61\x62\x63\x64" was equal to u"abcd", but now Python sees these as fundamentally different. Data-dependent bugs caused by a syntactic oddity are a language flaw that needs to be fixed.
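To make the data-dependent failure concrete, here is a small illustration using the filename from the opening message of this thread. A path such as `..\docs\memo.doc` happens to survive in a non-raw literal because `\d` and `\m` are not recognized escapes, while the path below is silently altered because `\t` and `\n` are. The variable names are only for illustration.

```python
# A non-raw literal: "\t" becomes a tab and "\n" a newline, with no warning.
broken = '..\training\new_memo.doc'

# The same text as a raw string keeps every backslash.
intended = r'..\training\new_memo.doc'

print(repr(broken))
print(repr(intended))
print(len(broken), len(intended))
```

Note that neither literal triggers the invalid-escape warning under discussion: every escape in the broken one is valid, which is why the warning cannot catch this particular mistake.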

...

Because our processes don't work the way we assumed, it turns out that in practice we haven't given developers the deprecation period we thought we had. Read Nathaniel's post, if you haven't already done so:

https://mail.python.org/archives/list/python-dev@python.org/message/E7QCC74O...

He makes a compelling case that while we might have had the promised deprecation period by the letter of the law, in practice most developers will have never seen it, and we will be breaking the spirit of the promise if we continue with the unmodified plan.

Yes, that's a fair complaint. But merely pushing the deprecation back by a version is not solving it. There has to be SOMETHING done differently.

...

And yet here we are rushing through a breaking change in an accelerated manner, for a change of marginal benefit.

It's not a marginal benefit. For people who try to teach Python on multiple operating systems, this is a very very real benefit. Just because YOU don't see the benefit doesn't mean it isn't there.

...

...
Otherwise, all you're doing is saying "I wish this problem would just go away".

No, I'm saying we don't have to rush this into 3.8. Let's keep the warning silent and push everything back a release.

Now is better than never. Although never is often better than *right* now.

Not sure how the Zen supports what you're saying there, since you're specifically saying "not never, not now, just later". But what do you actually mean by not rushing this into 3.8?

...

Right now, we're looking at a seriously compromised user-experience for 3.8. People are going to hate these warnings, many of them won't know what to do with them and will be sure that Python is buggy, and for very little benefit.

Then the problem is that people blame Python for these warnings. That is a problem to be solved; we need people to understand that a warning emitted by a library is a *library bug* not a language flaw.

...

...
Library authors can start _right now_ fixing their code so it's more 3.8 compatible.

Provided that (1) they are aware that this is a problem that needs to be fixed, and (2) they have the round tuits to actually fix it by 3.8.0. Neither are guaranteed.

(1) Yes it is, see above; (2) fair point, but this is restricted to string literals and can be detected simply by compiling the code, so it's a reasonably findable problem.
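As a rough sketch of "detected simply by compiling the code": the snippet below compiles source text with every warning recorded and reports those that mention an invalid escape sequence. The function name `find_invalid_escapes` and the sample source are hypothetical, and the message match is a substring test because the exact wording and category of the warning have varied between Python versions.

```python
import warnings


def find_invalid_escapes(source, filename="<string>"):
    """Compile source text and return the invalid-escape warnings it triggers.

    Hypothetical helper: the source is only compiled, never executed.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure nothing is filtered out
        compile(source, filename, "exec")
    return [
        (w.lineno, str(w.message))
        for w in caught
        if "invalid escape sequence" in str(w.message)
    ]


# Two lines of sample source: the first uses a non-raw literal containing a
# backslash followed by "d", the second uses a raw string.
sample = "pattern = '\\d+'\nok = r'\\d+'\n"
for lineno, message in find_invalid_escapes(sample, "sample.py"):
    print(f"sample.py:{lineno}: {message}")
```

A library author could run something like this over each file read from disk. The standard `compileall` module combined with `-W error` is another route, though the details of that invocation are left as an assumption here.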

...

...
("More" because 3.8 doesn't actually break anything.) What is actually gained by waiting longer

We gain the avoidance of a painful experience in 3.8 for a significant number of users and third-party devs.

The question we haven't had answered is what we gain by pushing through with the original plan. Plenty of people have said "Let's just do it" but as far as I can see not one has explained *why* we should put end users and library developers through this frustrating and annoying rushed deprecation period.

And unless you have a plan to do something different in 3.8 that ensures that library devs see the warnings, there's no justification for the delay. All you'll do is defer the exact same problem by another eighteen months. If the warning remains silent in 3.8, how will library devs get any indication that they need to fix something?

If you can offer a better plan, then by all means, do so. But deferring without a change is of no real value, and it means ANOTHER eighteen months added onto the time before novice programmers get to be told about string literal problems.

ChrisA

Joao S. O. Bueno

9:56 a.m.

From what I can see, the majority of new users in an interactive environment who see the warning will do so because the incorrect string is in _their_ code. The benefits are immediate, as people change to either using raw strings or using forward slashes for file paths.

The examples at the beginning of this thread, where changing a file path to "C:\users" suddenly breaks the code, speak for themselves: this is a _fix_.

Broken libraries will be fixed within weeks of a Py 3.8 release. People will either be using an old install, with Python 3.7, or they keep everything up to date, and for those, after 2 months max the library warnings will be all but gone.

In the meantime, what is possible is to publicize more how to disable these warnings on the end user's side, since we all agree that few people know how to do that.

On Wed, 7 Aug 2019 at 06:51, Chris Angelico <rosuav@gmail.com> wrote:

...

On Wed, Aug 7, 2019 at 7:33 PM Steven D'Aprano <steve@pearwood.info> wrote:

...
What's the rush? Let's be objective here: what benefit are we going to get from this change? Is there anyone hanging out desperately for "\d" and "\-" to become SyntaxErrors, so they can... do what?

So that problems can start to be detected. Time and again, Python users on Windows get EXTREMELY confused by the way their code worked perfectly with one path, then bizarrely fails with another. That is a very real problem, and the problem is that it appeared to work when actually it was wrong.

Python has a history of fixing these problems. It used to be that b"\x61\x62\x63\x64" was equal to u"abcd", but now Python sees these as fundamentally different. Data-dependent bugs caused by a syntactic oddity are a language flaw that needs to be fixed.

...
Because our processes don't work the way we assumed, it turns out that in practice we haven't given developers the deprecation period we thought we had. Read Nathaniel's post, if you haven't already done so:

https://mail.python.org/archives/list/python-dev@python.org/message/E7QCC74O...

...
He makes a compelling case that while we might have had the promised deprecation period by the letter of the law, in practice most developers will have never seen it, and we will be breaking the spirit of the promise if we continue with the unmodified plan.

Yes, that's a fair complaint. But merely pushing the deprecation back by a version is not solving it. There has to be SOMETHING done differently.

...
And yet here we are rushing through a breaking change in an accelerated manner, for a change of marginal benefit.

It's not a marginal benefit. For people who try to teach Python on multiple operating systems, this is a very very real benefit. Just because YOU don't see the benefit doesn't mean it isn't there.

...
...
Otherwise, all you're doing is saying "I wish this problem would just go away".

No, I'm saying we don't have to rush this into 3.8. Let's keep the warning silent and push everything back a release.

Now is better than never. Although never is often better than *right* now.

Not sure how the Zen supports what you're saying there, since you're specifically saying "not never, not now, just later". But what do you actually mean by not rushing this into 3.8?

...
Right now, we're looking at a seriously compromised user-experience for 3.8. People are going to hate these warnings, many of them won't know what to do with them and will be sure that Python is buggy, and for very little benefit.

Then the problem is that people blame Python for these warnings. That is a problem to be solved; we need people to understand that a warning emitted by a library is a *library bug* not a language flaw.

...
...
Library authors can start _right now_ fixing their code so it's more 3.8 compatible.

Provided that (1) they are aware that this is a problem that needs to be fixed, and (2) they have the round tuits to actually fix it by 3.8.0. Neither are guaranteed.

(1) Yes it is, see above; (2) fair point, but this is restricted to string literals and can be detected simply by compiling the code, so it's a reasonably findable problem.

...
...
("More" because 3.8 doesn't actually break anything.) What is actually gained by waiting longer

We gain the avoidance of a painful experience in 3.8 for a significant number of users and third-party devs.

The question we haven't had answered is what we gain by pushing through with the original plan. Plenty of people have said "Let's just do it" but as far as I can see not one has explained *why* we should put end users and library developers through this frustrating and annoying rushed deprecation period.

And unless you have a plan to do something different in 3.8 that ensures that library devs see the warnings, there's no justification for the delay. All you'll do is defer the exact same problem by another eighteen months. If the warning remains silent in 3.8, how will library devs get any indication that they need to fix something?

If you can offer a better plan, then by all means, do so. But deferring without a change is of no real value, and it means ANOTHER eighteen months added onto the time before novice programmers get to be told about string literal problems.

ChrisA

Steve Dower

10:47 a.m.

On 07Aug2019 0247, Chris Angelico wrote:

...

On Wed, Aug 7, 2019 at 7:33 PM Steven D'Aprano <steve@pearwood.info> wrote:

...
What's the rush? Let's be objective here: what benefit are we going to get from this change? Is there anyone hanging out desperately for "\d" and "\-" to become SyntaxErrors, so they can... do what?

So that problems can start to be detected. Time and again, Python users on Windows get EXTREMELY confused by the way their code worked perfectly with one path, then bizarrely fails with another. That is a very real problem, and the problem is that it appeared to work when actually it was wrong. [...] If you can offer a better plan, then by all means, do so. But deferring without a change is of no real value, and it means ANOTHER eighteen months added onto the time before novice programmers get to be told about string literal problems.

Allow me to offer one:

* change the SyntaxWarning into a default-silenced one that fires every time a .pyc is loaded (this is the hard part, but it's doable)
* change pathlib.PureWindowsPath, os.fsencode and os.fsdecode to explicitly warn when the path contains control characters
* change the PyErr_SetExcFromWindowsErrWithFilenameObjects function to append (or chain) an extra message when either of the filenames contains control characters (or change OSError to do it, or the default sys.excepthook)

I don't care whether the changes are applied to all platforms rather than just Windows, but since Windows developers hit the problem and (some) Linux developers like to use control characters in filenames, I can see a justification for only warning on Windows.

Long term we can still deprecate and eventually block unrecognized escape sequences, but the long-standing behaviour can stand for a few more years without creating more harm.

Cheers,
Steve
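To illustrate the second suggestion in the message above, the sketch below shows roughly what a control-character check on a path could look like. It is a standalone stand-in, not a patch to pathlib.PureWindowsPath, os.fsencode or os.fsdecode; the helper name `warn_on_control_chars`, the warning text, and the choice of "code point below 32" as the test are all assumptions.

```python
import warnings


def warn_on_control_chars(path):
    """Warn if path contains ASCII control characters (hypothetical helper).

    A tab or newline inside a Windows path is usually left over from an
    unintended escape in a non-raw string literal.
    """
    bad = sorted({ch for ch in path if ord(ch) < 32})
    if bad:
        names = ", ".join(repr(ch) for ch in bad)
        warnings.warn(
            f"path {path!r} contains control characters ({names}); "
            "was a raw string or an escaped backslash intended?",
            stacklevel=2,
        )
    return path


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_on_control_chars('..\training\new_memo.doc')   # warns: tab and newline
    warn_on_control_chars(r'..\training\new_memo.doc')  # silent: raw string

print(len(caught))
```

Unlike the compile-time warning, a check of this kind fires on exactly the paths that were silently altered, which is the case the existing warning cannot see.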

eryk sun

2:06 p.m.

On 8/7/19, Steve Dower <steve.dower@python.org> wrote:

...

* change the PyErr_SetExcFromWindowsErrWithFilenameObjects function to append (or chain) an extra message when either of the filenames contains control characters (or change OSError to do it, or the default sys.excepthook)

On a related note for Windows, if the error is specifically ERROR_INVALID_NAME, we could extend this to look for and warn about the five reserved wildcard characters (asterisk, question mark, double quote, less than, greater than), pipe, and colon. It's only sometimes the case for colon because it's allowed in device names and used as the name and type delimiter for stream names.

Kernel object names don't reserve wildcard characters, pipe, and colon. So I wouldn't want anything but the control-character warning if it's say ERROR_FILE_NOT_FOUND. An example would be SharedMemory(name='Global\test'), or a similar error for registry key and value names such as OpenKey(hkey, 'spam\test'), that is if winreg were updated to include the name in the exception. Note that forward slash is just a name character in these cases, not a path separator, so we have to use backslash, even if just via replace('/', '\\').
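A small sketch of the character classes described above, for illustration only. The helper `suspicious_name_chars` is hypothetical and does not inspect any Windows error code, so it cannot by itself tell the ERROR_INVALID_NAME case from ERROR_FILE_NOT_FOUND. It only shows which characters each check would look at.

```python
# Characters named in the message: five wildcards, plus pipe and colon.
RESERVED = set('*?"<>') | {'|', ':'}


def suspicious_name_chars(name):
    """Return (control_chars, reserved_chars) found in name (hypothetical helper)."""
    control = sorted({ch for ch in name if ord(ch) < 32})
    reserved = sorted({ch for ch in name if ch in RESERVED})
    return control, reserved


print(suspicious_name_chars('Global\test'))    # the backslash-t became a tab
print(suspicious_name_chars(r'Global\test'))   # raw string: nothing to report
print(suspicious_name_chars('report?.txt'))    # a reserved wildcard
```

As the message points out, the colon is only sometimes invalid, so a real check would need more context than this set-membership test provides.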

Steve Holden

5:43 p.m.

This whole thread would be an excellent justification for following 3.9 with 4.0. It's as near as we ever want to get to a breaking change, and a major version number would indicate the need to review. If increasing strictness of escape code interpretation in string literals is the only incompatibility there would surely be general delight.

Kind regards,
Steve Holden

On Wed, Aug 7, 2019 at 8:19 PM eryk sun <eryksun@gmail.com> wrote:

...

On 8/7/19, Steve Dower <steve.dower@python.org> wrote:

...
* change the PyErr_SetExcFromWindowsErrWithFilenameObjects function to append (or chain) an extra message when either of the filenames contains control characters (or change OSError to do it, or the default sys.excepthook)

On a related note for Windows, if the error is specifically ERROR_INVALID_NAME, we could extend this to look for and warn about the five reserved wildcard characters (asterisk, question mark, double quote, less than, greater than), pipe, and colon. It's only sometimes the case for colon because it's allowed in device names and used as the name and type delimiter for stream names.

Kernel object names don't reserve wildcard characters, pipe, and colon. So I wouldn't want anything but the control-character warning if it's say ERROR_FILE_NOT_FOUND. An example would be SharedMemory(name='Global\test'), or a similar error for registry key and value names such as OpenKey(hkey, 'spam\test'), that is if winreg were updated to include the name in the exception. Note that forward slash is just a name character in these cases, not a path separator, so we have to use backslash, even if just via replace('/', '\\').

MRAB

6:24 p.m.

On 2019-08-07 23:43, Steve Holden wrote:

...

This whole thread would be an excellent justification for following 3.9 with 4.0. It's as near as we ever want to get to a breaking change, and a major version number would indicate the need to review. If increasing strictness of escape code interpretation in string literals is the only incompatibility there would surely be general delight.

I can think of another possible one: import * requires __all__. [snip]

Stephen J. Turnbull

9 Aug

12:14 a.m.

Steve Holden writes:

...

This whole thread would be an excellent justification for following 3.9 with 4.0. It's as near as we ever want to get to a breaking change, and a major version number would indicate the need to review. If increasing strictness of escape code interpretation in string literals is the only incompatibility there would surely be general delight.

This should be the first chapter in the Beautiful Version Numbering book! I love it!

brian.skinn＠gmail.com

9:10 a.m.

...

This whole thread would be an excellent justification for following 3.9 with 4.0. It's as near as we ever want to get to a breaking change, and a major version number would indicate the need to review. If increasing strictness of escape code interpretation in string literals is the only incompatibility there would surely be general delight.

Kind regards, Steve Holden

I rather doubt that allowing breaking changes into a Python 4.0 would end up with this as the only proposed incompatibility. Once word got out, a flood of incompat requests would probably get raised. I personally have a change I'd like made to doctest (https://bugs.python.org/issue36714), and I know of another in argparse (https://bugs.python.org/issue33109) that I'm personally neutral on but that others have stronger feelings about.

Guido van Rossum

10:30 a.m.

This discussion looks like there's no end in sight. Maybe the Steering Council should take a vote? -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him/his **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>

Serhiy Storchaka

11:05 a.m.

09.08.19 18:30, Guido van Rossum wrote:

...

This discussion looks like there's no end in sight. Maybe the Steering Council should take a vote?

Possible options:

1. SyntaxWarning in 3.8+ (the current status).
2. DeprecationWarning in 3.8, SyntaxWarning in 3.9+ (revert changes in 3.8 only).
3. DeprecationWarning in 3.8 and 3.9 (revert changes in master and 3.8).
4. No warnings at all.

Steve Dower

11:39 a.m.

On 09Aug2019 0905, Serhiy Storchaka wrote:

...

09.08.19 18:30, Guido van Rossum wrote:

...
This discussion looks like there's no end in sight. Maybe the Steering Council should take a vote?

Possible options:

1. SyntaxWarning in 3.8+ (the current status).
2. DeprecationWarning in 3.8, SyntaxWarning in 3.9+ (revert changes in 3.8 only).
3. DeprecationWarning in 3.8 and 3.9 (revert changes in master and 3.8).
4. No warnings at all.

I also posted another possible option that helps solve the real problem faced by users, and not just the "we want to have a warning" problem that is purely ours.

...

* change the SyntaxWarning into a default-silenced one that fires every time a .pyc is loaded (this is the hard part, but it's doable)
* change pathlib.PureWindowsPath, os.fsencode and os.fsdecode to explicitly warn when the path contains control characters
* change the PyErr_SetExcFromWindowsErrWithFilenameObjects function to append (or chain) an extra message when either of the filenames contains control characters (or change OSError to do it, or the default sys.excepthook)

Cheers, Steve

Paul Moore

12:15 p.m.

On Fri, 9 Aug 2019 at 17:55, Steve Dower <steve.dower@python.org> wrote:

...

...
* change the SyntaxWarning into a default-silenced one that fires every time a .pyc is loaded (this is the hard part, but it's doable)
* change pathlib.PureWindowsPath, os.fsencode and os.fsdecode to explicitly warn when the path contains control characters
* change the PyErr_SetExcFromWindowsErrWithFilenameObjects function to append (or chain) an extra message when either of the filenames contains control characters (or change OSError to do it, or the default sys.excepthook)

The second and third parts of this seem like they are both independent of the first, and useful improvements in their own right.

Paul

Serhiy Storchaka

10 Aug

1:41 a.m.

09.08.19 19:39, Steve Dower wrote:

...

I also posted another possible option that helps solve the real problem faced by users, and not just the "we want to have a warning" problem that is purely ours.

Warnings solve two problems:

* Teaching users that a backslash has special meaning and should be escaped unless it is used for that special meaning.
* Avoiding breakage or newly introduced bugs if we add new escape sequences (like \e).
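The second point can be illustrated with a short sketch. On Python versions where unrecognized escapes are still accepted, a non-raw literal containing a backslash followed by "e" keeps both characters. If \e were later given a meaning, the same source text would quietly produce a different string. The helper `evaluate_literal` is hypothetical and exists only so that the example itself compiles without a warning.

```python
import warnings


def evaluate_literal(source):
    """Evaluate a string-literal expression with warnings suppressed."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return eval(source)


# The raw string below holds the *source text* of the non-raw literal '\e[0m'.
value = evaluate_literal(r"'\e[0m'")

# Today the backslash is kept, so the result is five characters long.
print(len(value), repr(value))
```

Code relying on that five-character result is exactly the code that a future \e escape would change, which is why the warning is meant to flush such literals out first.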

...

...
* change the SyntaxWarning into a default-silenced one that fires every time a .pyc is loaded (this is the hard part, but it's doable)

It was considered an advantage that these warnings are shown only once, at compile time. So they will be shown to the author of the code, but the user of the code will not see them (except at installation time).

Actually we need to distinguish the author and the user of the code and show warnings only to the author. Using .pyc files was just a heuristic: the author compiles the Python code, and the user uses compiled .pyc files. It would be nice to have a more reliable way to determine ownership of the code. This is related not only to SyntaxWarnings, but also to runtime DeprecationWarnings. Maybe silence warnings only for read-only files and make files installed by pip read-only?

...

...
* change pathlib.PureWindowsPath, os.fsencode and os.fsdecode to explicitly warn when the path contains control characters

This can cause additional harm. Currently you get the expected FileNotFoundError when you use a bad user-specified path; it can be caught and handled. But with warnings you will get either noise in the output or an unexpected unhandled error.

...

...
* change the PyErr_SetExcFromWindowsErrWithFilenameObjects function to append (or chain) an extra message when either of the filenames contains control characters (or change OSError to do it, or the default sys.excepthook)

I do not understand what goal will be achieved by this.

Neil Schemenauer

12 Aug 12 Aug

4:02 p.m.

On 2019-08-10, Serhiy Storchaka wrote:

...

Actually we need to distinguish the author and the user of the code and show warnings only to the author. Using .pyc files was just a heuristic: the author compiles the Python code, and the user uses compiled .pyc files. It would be nice to have a more reliable way to determine who owns the code. This relates not only to SyntaxWarnings, but also to runtime DeprecationWarnings. Maybe silence warnings only for read-only files and make files installed by pip read-only?

Identifying the author vs the user seems like a good idea. Relying on the OS filesystem seems like a solution that would cause some challenges. Can we embed that information in the .pyc file instead? That way, Python knows that it is a module/package that has been installed with pip or similar and the end user is likely not the author.

Nick Coghlan

9 Aug 9 Aug

11:08 a.m.

On Sat, 10 Aug 2019 at 01:44, Guido van Rossum <guido@python.org> wrote:

...

This discussion looks like there's no end in sight. Maybe the Steering Council should take a vote?

I find the "Our deprecation warnings were even less visible than normal" argument for extending the deprecation period compelling. I also think the UX of the warning itself could be reviewed to provide a more explicit nudge towards using raw strings when folks want to allow arbitrary embedded backslashes. Consider: SyntaxWarning: invalid escape sequence \, vs something like: SyntaxWarning: invalid escape sequence \, (Note: adding the raw string literal prefix, r, will accept all non-trailing backslashes) After all, the habit we're trying to encourage is "If I want to include backslashes without escaping them all, I should use a raw string", not "I should memorize the entire set of valid escape sequences" or even "I should always escape backslashes". Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Jonathan Goble

1:28 p.m.

On Fri, Aug 9, 2019 at 12:34 PM Nick Coghlan <ncoghlan@gmail.com> wrote:

...

I find the "Our deprecation warnings were even less visible than normal" argument for extending the deprecation period compelling.

Outsider's 2 cents from reading this discussion (with no personal experience with this warning): I am perplexed at the opinion, seemingly espoused by multiple people in this thread, that because a major part of the problem is that the warnings were not visible enough, somehow the proposed solution is making them not visible enough again? It's too late, in my understanding, in the 3.8 cycle to add a new feature like a change to how these warnings are produced (it seems a significant change to the .pyc structure is needed to emit them at runtime), so this supposed "solution" is nothing but kicking the can down the road. When 3.9 rolls around, public exposure to the problem of invalid escape sequences will still be approximately what it is now (because if nobody saw the warnings in 3.7, they certainly won't see them in 3.8 with this "fix"), so you'll end up with the same complaints about SyntaxWarning that started this discussion, end up back on DeprecationWarning for 3.9 (hopefully with support for emitting them at runtime instead of just compile-time), then have to wait until 3.10/4.0 for SyntaxWarning and eventually the next version to actually make them errors. It seems to me, in my humble but uneducated opinion, that if people are not seeing the warnings, then continuing to give them warnings they won't see isn't a solution to anything. Put the warning front and center. The argument of third-party packages will always be an issue, even if we wait ten years. So put these warnings front and center now so package and code maintainers actually see it, and I'll bet the problematic escape sequences get fixed rather quickly. What am I missing here?

Eric V. Smith

1:36 p.m.

On 8/9/2019 2:28 PM, Jonathan Goble wrote:

...

On Fri, Aug 9, 2019 at 12:34 PM Nick Coghlan <ncoghlan@gmail.com> wrote:

...
I find the "Our deprecation warnings were even less visible than normal" argument for extending the deprecation period compelling.

Outsider's 2 cents from reading this discussion (with no personal experience with this warning):

I am perplexed at the opinion, seemingly espoused by multiple people in this thread, that because a major part of the problem is that the warnings were not visible enough, somehow the proposed solution is making them not visible enough again? It's too late, in my understanding, in the 3.8 cycle to add a new feature like a change to how these warnings are produced (it seems a significant change to the .pyc structure is needed to emit them at runtime), so this supposed "solution" is nothing but kicking the can down the road. When 3.9 rolls around, public exposure to the problem of invalid escape sequences will still be approximately what it is now (because if nobody saw the warnings in 3.7, they certainly won't see them in 3.8 with this "fix"), so you'll end up with the same complaints about SyntaxWarning that started this discussion, end up back on DeprecationWarning for 3.9 (hopefully with support for emitting them at runtime instead of just compile-time), then have to wait until 3.10/4.0 for SyntaxWarning and eventually the next version to actually make them errors.

Yes, I think that's the idea: Deprecation warning in 3.9, but more visible than what 3.7 has. That is, not just at compile time but at run time. What's required to make that happen is an open question.

...

It seems to me, in my humble but uneducated opinion, that if people are not seeing the warnings, then continuing to give them warnings they won't see isn't a solution to anything. Put the warning front and center. The argument of third-party packages will always be an issue, even if we wait ten years. So put these warnings front and center now so package and code maintainers actually see it, and I'll bet the problematic escape sequences get fixed rather quickly.

What am I missing here?

Hopefully the warnings in 3.9 would be more visible than what we saw in 3.7, so that library authors can take notice and do something about it before 3.10 rolls around. Eric

Jonathan Goble

1:49 p.m.

On Fri, Aug 9, 2019 at 2:36 PM Eric V. Smith <eric@trueblade.com> wrote:

...

On 8/9/2019 2:28 PM, Jonathan Goble wrote:

...
On Fri, Aug 9, 2019 at 12:34 PM Nick Coghlan <ncoghlan@gmail.com> wrote:

...
I find the "Our deprecation warnings were even less visible than normal" argument for extending the deprecation period compelling.

Outsider's 2 cents from reading this discussion (with no personal experience with this warning):

I am perplexed at the opinion, seemingly espoused by multiple people in this thread, that because a major part of the problem is that the warnings were not visible enough, somehow the proposed solution is making them not visible enough again? It's too late, in my understanding, in the 3.8 cycle to add a new feature like a change to how these warnings are produced (it seems a significant change to the .pyc structure is needed to emit them at runtime), so this supposed "solution" is nothing but kicking the can down the road. When 3.9 rolls around, public exposure to the problem of invalid escape sequences will still be approximately what it is now (because if nobody saw the warnings in 3.7, they certainly won't see them in 3.8 with this "fix"), so you'll end up with the same complaints about SyntaxWarning that started this discussion, end up back on DeprecationWarning for 3.9 (hopefully with support for emitting them at runtime instead of just compile-time), then have to wait until 3.10/4.0 for SyntaxWarning and eventually the next version to actually make them errors.

Yes, I think that's the idea: Deprecation warning in 3.9, but more visible than what 3.7 has. That is, not just at compile time but at run time. What's required to make that happen is an open question.

...
It seems to me, in my humble but uneducated opinion, that if people are not seeing the warnings, then continuing to give them warnings they won't see isn't a solution to anything. Put the warning front and center. The argument of third-party packages will always be an issue, even if we wait ten years. So put these warnings front and center now so package and code maintainers actually see it, and I'll bet the problematic escape sequences get fixed rather quickly.

What am I missing here?

Hopefully the warnings in 3.9 would be more visible than what we saw in 3.7, so that library authors can take notice and do something about it before 3.10 rolls around.

OK, so I'm at least understanding the plan correctly. I just don't get the idea of kicking the can down the road on the hope that in 3.9 people will see the warning (knowing that you are still using a warning that is disabled by default and thus has a high chance of not being seen until 3.10), when we already have the ability to push out a visible-by-default warning now in 3.8 and get people to take notice two whole feature releases (= about 3 years) earlier. The SyntaxWarning disruption (or SyntaxError disruption) has to happen eventually, and while I support the idea of making compile-time DeprecationWarnings be emitted at runtime, I really don't think that a disabled-by-default warning is going to change a whole lot. Sure, the major packages will likely see it and update their code, but lots of smaller specialty packages and independent developers won't see it in 3.9. The bulk of the change isn't going to happen until we go to SyntaxWarning, so why not just get it over with instead of dragging it out for three years?

brian.skinn＠gmail.com

1:51 p.m.

Eric V. Smith wrote:

...

Hopefully the warnings in 3.9 would be more visible than what we saw in 3.7, so that library authors can take notice and do something about it before 3.10 rolls around. Eric

Apologies for the ~double-post on the thread, but: the SymPy team has figured out the right pytest incantation to expose these warnings. Given the extensive adoption of pytest, perhaps it would be good to combine (1) a FR on pytest to add a convenience flag enabling this mix of options with (2) an aggressive "marketing push", encouraging library maintainers to add it to their testing/CI.

Nathaniel Smith

3:52 p.m.

On Fri, Aug 9, 2019 at 12:07 PM <brian.skinn@gmail.com> wrote:

...

Eric V. Smith wrote:

...
Hopefully the warnings in 3.9 would be more visible than what we saw in 3.7, so that library authors can take notice and do something about it before 3.10 rolls around. Eric

Apologies for the ~double-post on the thread, but: the SymPy team has figured out the right pytest incantation to expose these warnings. Given the extensive adoption of pytest, perhaps it would be good to combine (1) a FR on pytest to add a convenience flag enabling this mix of options with (2) an aggressive "marketing push", encouraging library maintainers to add it to their testing/CI.

Unfortunately, their solution isn't a pytest incantation, it's a separate 'compileall' invocation they run on their source tree. I'm not sure how you'd convert this into a pytest feature, because I don't think pytest always knows which parts of your code are your code versus which parts are supporting libraries. -n -- Nathaniel J. Smith -- https://vorpus.org
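To make the "separate compile step over your own source tree" idea concrete, here is a small sketch. It is not SymPy's actual setup: the function `find_invalid_escapes` is a hypothetical helper that uses the built-in `compile()` directly instead of the compileall module, so that the warnings can be collected and returned.

```python
import pathlib
import warnings


def find_invalid_escapes(root):
    """Compile every .py file under `root` and collect compile-time findings.

    Returns a list of (filename, line number, message) tuples.
    """
    findings = []
    for path in sorted(pathlib.Path(root).rglob("*.py")):
        source = path.read_text(encoding="utf-8")
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")
            try:
                compile(source, str(path), "exec")
            except SyntaxError as exc:
                # Covers interpreters or settings where the warning is an error.
                findings.append((str(path), exc.lineno, str(exc.msg)))
                continue
        for w in caught:
            if issubclass(w.category, (DeprecationWarning, SyntaxWarning)):
                findings.append((str(path), w.lineno, str(w.message)))
    return findings
```

Because the caller chooses `root`, this only looks at the project's own files, which sidesteps the question of telling your code apart from supporting libraries. A CI job could fail the build whenever the returned list is non-empty.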

brian.skinn＠gmail.com

3:59 p.m.

Nathaniel Smith wrote:

...

Unfortunately, their solution isn't a pytest incantation, it's a separate 'compileall' invocation they run on their source tree. I'm not sure how you'd convert this into a pytest feature, because I don't think pytest always know which parts of your code are your code versus which parts are supporting libraries. -n

Ahh, did not appreciate this. :-( Never mind, then!

Gregory P. Smith

6:07 p.m.

On Fri, Aug 9, 2019 at 11:37 AM Eric V. Smith <eric@trueblade.com> wrote:

...

On 8/9/2019 2:28 PM, Jonathan Goble wrote:

...
On Fri, Aug 9, 2019 at 12:34 PM Nick Coghlan <ncoghlan@gmail.com> wrote:

...
I find the "Our deprecation warnings were even less visible than normal" argument for extending the deprecation period compelling.

Outsider's 2 cents from reading this discussion (with no personal experience with this warning):

I am perplexed at the opinion, seemingly espoused by multiple people in this thread, that because a major part of the problem is that the warnings were not visible enough, somehow the proposed solution is making them not visible enough again? It's too late, in my understanding, in the 3.8 cycle to add a new feature like a change to how these warnings are produced (it seems a significant change to the .pyc structure is needed to emit them at runtime), so this supposed "solution" is nothing but kicking the can down the road. When 3.9 rolls around, public exposure to the problem of invalid escape sequences will still be approximately what it is now (because if nobody saw the warnings in 3.7, they certainly won't see them in 3.8 with this "fix"), so you'll end up with the same complaints about SyntaxWarning that started this discussion, end up back on DeprecationWarning for 3.9 (hopefully with support for emitting them at runtime instead of just compile-time), then have to wait until 3.10/4.0 for SyntaxWarning and eventually the next version to actually make them errors.

Yes, I think that's the idea: Deprecation warning in 3.9, but more visible than what 3.7 has. That is, not just at compile time but at run time. What's required to make that happen is an open question.

I've lost track of who suggested what in this thread, but yes, that concept has been rolling over in my mind as a potentially good idea after someone suggested it. Compile time warnings should turn into bytecode for a warnings.warn call in the generated pyc. I haven't spent time trying to reason if that actually addresses the real issues we're having moving forward with a syntax warning change though. A reasonable feature to ask for in 3.9 or later perhaps. -gps
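A source-level sketch of what "replaying" compile-time warnings at run time could look like is shown below. This is only an illustration of the idea, not how CPython or .pyc files work today: the `RECORDED` list and the `replay_compile_warnings` function are hypothetical, standing in for whatever the compiler would store in the bytecode.

```python
import warnings

# Hypothetical record of what the compiler noticed: (line number, message).
RECORDED = [
    (3, "invalid escape sequence '\\l'"),
    (7, "invalid escape sequence '\\d'"),
]


def replay_compile_warnings(recorded, filename, category=DeprecationWarning):
    """Re-issue stored compile-time warnings against their original location."""
    for lineno, message in recorded:
        # warn_explicit lets the warning point at the original file and line
        # instead of at the code doing the replaying.
        warnings.warn_explicit(message, category, filename, lineno)
```

Because the warnings go through the normal warnings machinery on every load, the usual filters (including "ignore by default" for DeprecationWarning) would still decide whether the user sees them.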

Glenn Linderman

8:41 p.m.

On 8/9/2019 4:07 PM, Gregory P. Smith wrote:

...

On Fri, Aug 9, 2019 at 11:37 AM Eric V. Smith <eric@trueblade.com> wrote:

On 8/9/2019 2:28 PM, Jonathan Goble wrote:

On Fri, Aug 9, 2019 at 12:34 PM Nick Coghlan <ncoghlan@gmail.com> wrote:

I find the "Our deprecation warnings were even less visible than normal" argument for extending the deprecation period compelling.

Outsider's 2 cents from reading this discussion (with no personal experience with this warning):

I am perplexed at the opinion, seemingly espoused by multiple people in this thread, that because a major part of the problem is that the warnings were not visible enough, somehow the proposed solution is making them not visible enough again? It's too late, in my understanding, in the 3.8 cycle to add a new feature like a change to how these warnings are produced (it seems a significant change to the .pyc structure is needed to emit them at runtime), so this supposed "solution" is nothing but kicking the can down the road. When 3.9 rolls around, public exposure to the problem of invalid escape sequences will still be approximately what it is now (because if nobody saw the warnings in 3.7, they certainly won't see them in 3.8 with this "fix"), so you'll end up with the same complaints about SyntaxWarning that started this discussion, end up back on DeprecationWarning for 3.9 (hopefully with support for emitting them at runtime instead of just compile-time), then have to wait until 3.10/4.0 for SyntaxWarning and eventually the next version to actually make them errors.

Yes, I think that's the idea: Deprecation warning in 3.9, but more visible than what 3.7 has. That is, not just at compile time but at run time. What's required to make that happen is an open question.

I've lost track of who suggested what in this thread, but yes, that concept has been rolling over in my mind as a potentially good idea after someone suggested it. Compile time warnings should turn into bytecode for a warnings.warn call in the generated pyc. I haven't spent time trying to reason if that actually addresses the real issues we're having moving forward with a syntax warning change though. A reasonable feature to ask for in 3.9 or later perhaps.

The documentation actually claims it was deprecated in version 3.6. So it has already been 2 releases worth of deprecation, visible warning or not. Ship it.

Steven D'Aprano

6:12 p.m.

On Fri, Aug 09, 2019 at 02:28:13PM -0400, Jonathan Goble wrote:

...

I am perplexed at the opinion, seemingly espoused by multiple people in this thread, that because a major part of the problem is that the warnings were not visible enough, somehow the proposed solution is making them not visible enough again?

Making the warnings invisible by default is only the first step, not the entire solution. We don't break backwards compatibility lightly, and the current behaviour is not an accident, it is a documented feature which developers are entitled to rely on. We are choosing to change that behaviour, breaking backwards compatibility, to the inconvenience of end-users, library authors, and developers on Mac/Unix/Linux, for two benefits: 1. To possibly allow the addition of new escape sequences such as \e some time in the future. 2. To strongly discourage newbie Windows developers from hard-coding paths using backslashes, but to use forward slashes instead. Especially on Python-Ideas, time and time again we hear the mantra that we should only break backwards compatibility if the benefit strongly outweighs the cost of change. Raymond has given compelling (to me at least) testimony that right now, the cost of change is far too high for the two minor benefits gained. So *right now*, it looks like we ought to be prepared to back away from the change altogether. We thought that the balance would be: "it will be a little bit painful, but the benefit will outweigh the pain" justifying breaking backwards compatibility, but we have found that the pain is greater than expected. If we cannot reduce the pain, and move the balance into the "nett positive" rather than the "nett negative" we have right now, we ought to cancel the deprecation. Making the deprecation silent by default will reduce the pain. That's the first step. Pushing the deprecation schedule back a release or more will give us time to rethink the deprecation process, fix the technical issues we discovered about SyntaxWarnings, and give library authors time to eliminate the warnings from their libraries.

...

It's too late, in my understanding, in the 3.8 cycle to add a new feature like a change to how these warnings are produced (it seems a significant change to the .pyc structure is needed to emit them at runtime), so this supposed "solution" is nothing but kicking the can down the road.

Is that a problem? Any deadline we have to make unrecognised backslash escapes an error is a self-imposed deadline. We lived with this feature for more than a quarter of a century, we can keep kicking the can down the road until the benefit outweighs the pain. If that means "forever", then I personally will be sad, but so be it. However, even if it is too late to add any new tools or features to Python 3.8 (and that's not clear: this won't be a *language* change, so the feature freeze may not apply) all is not lost. We're aware of the problem, and can start pointing library authors at this thread, and the relevant b.p.o. ticket, and push them in the right direction. Raymond mentioned two libraries by name, bottle and docutils, and Matt scanned the top 100 packages on PyPI. That's a good place to start for anyone wanting to contribute: raise bug reports on the individual library trackers. (If they haven't already been raised.) https://github.com/bottlepy/bottle/issues (I'd do that myself except I have technical problems using Github.) I have reported it to docutils: https://sourceforge.net/p/docutils/bugs/373/ [...]

...

So put these warnings front and center now so package and code maintainers actually see it

The problem is that this seriously and negatively affects the experience for many end-users. That's what we're trying to prevent, or at least mitigate. -- Steven

Paul Moore

10 Aug 10 Aug

3:33 a.m.

On Sat, 10 Aug 2019 at 00:36, Steven D'Aprano <steve@pearwood.info> wrote:

...

2. To strongly discourage newbie Windows developers from hard-coding paths using backslashes, but to use forward slashes instead.

(Side issue) As a Windows developer, who has seen far too many cases where use of slashes in filenames implies a Unix-based developer not thinking sufficiently about Windows compatibility, or where it leads to people hard coding '/' rather than using os.sep (or better, pathlib), I strongly object to this characterisation. Rather, I would simply say "to make Windows users more aware of the clash in usage between backslashes in filenames and backslashes as string escapes". There are *many* valid ways to write Windows pathnames in your code: 1. Raw strings 2. Doubling the backslashes 3. Using pathlib (possibly with slash as a directory separator, where it's explicitly noted as a portable option) 4. Using slashes IMO, using slashes is the *worst* of these. But this latter is a matter of opinion - I've no objection to others believing differently, but I *do* object to slashes being presented as the only option, or the recommended option without qualification. Paul

Chris Angelico

6:05 a.m.

On Sat, Aug 10, 2019 at 6:39 PM Paul Moore <p.f.moore@gmail.com> wrote:

...

There are *many* valid ways to write Windows pathnames in your code:

1. Raw strings 2. Doubling the backslashes 3. Using pathlib (possibly with slash as a directory separator, where it's explicitly noted as a portable option) 4. Using slashes

IMO, using slashes is the *worst* of these. But this latter is a matter of opinion - I've no objection to others believing differently, but I *do* object to slashes being presented as the only option, or the recommended option without qualification.

Please expand on why this is the worst? ChrisA

Richard Damon

6:37 a.m.

New subject: [SPAM?] Re: What to do about invalid escape sequences

On 8/10/19 7:05 AM, Chris Angelico wrote:

...

On Sat, Aug 10, 2019 at 6:39 PM Paul Moore <p.f.moore@gmail.com> wrote:

...
There are *many* valid ways to write Windows pathnames in your code:

1. Raw strings 2. Doubling the backslashes 3. Using pathlib (possibly with slash as a directory separator, where it's explicitly noted as a portable option) 4. Using slashes

IMO, using slashes is the *worst* of these. But this latter is a matter of opinion - I've no objection to others believing differently, but I *do* object to slashes being presented as the only option, or the recommended option without qualification.

Please expand on why this is the worst?

ChrisA

One big issue with trying to get used to using / on Windows for the directory separator is that it doesn't work for many Windows programs, because on Windows the / character is defined to be the option character (instead of - for *nix). Yes, you can write your program to use the foreign convention of using - for options, and because the system calls accept either \ or / as the directory separator, paths which use the 'wrong' separator will work, but your program will be violating the conventions of the host environment. -- Richard Damon

Paul Moore

9:03 a.m.

On Sat, 10 Aug 2019 at 12:06, Chris Angelico <rosuav@gmail.com> wrote:

...

On Sat, Aug 10, 2019 at 6:39 PM Paul Moore <p.f.moore@gmail.com> wrote:

...
There are *many* valid ways to write Windows pathnames in your code:

1. Raw strings 2. Doubling the backslashes 3. Using pathlib (possibly with slash as a directory separator, where it's explicitly noted as a portable option) 4. Using slashes

IMO, using slashes is the *worst* of these. But this latter is a matter of opinion - I've no objection to others believing differently, but I *do* object to slashes being presented as the only option, or the recommended option without qualification.

Please expand on why this is the worst?

I did say it was a matter of opinion, so I'm not going to respond if people say that any of the following is "wrong", but since you asked: 1. Backslash is the native separator, whereas slash is not (see eryk sun's post for *way* more detail). 2. People who routinely use slash have a tendency to forget to use os.sep rather than a literal slash in places where it *does* matter. 3. Using slash, in my experience, ends up with paths with "mixed" separators (os.path.join("C:/work/apps", "foo") -> 'C:/work/apps\\foo') which are messy to deal with, and ugly for the user. 4. If a path with slashes is displayed directly to the user without normalisation, it looks incorrect and can confuse users who are only used to "native" Windows programs. Etc. Paul
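The four spellings, and the "mixed separators" effect from point 3, can be sketched as follows. The example uses `ntpath` and `pathlib.PureWindowsPath` (the Windows flavours of `os.path` and `pathlib`) so that it behaves the same on any platform; the path `C:\work\apps\foo` is just the illustrative path from the message above.

```python
import ntpath  # os.path is ntpath on Windows; imported directly for portability
from pathlib import PureWindowsPath

# 1. Raw string: backslashes are taken literally.
raw = r"C:\work\apps\foo"

# 2. Doubled backslashes in an ordinary string.
doubled = "C:\\work\\apps\\foo"

# 3. pathlib, written with slashes but rendered with the native separator.
via_pathlib = str(PureWindowsPath("C:/work/apps") / "foo")

# 4. Plain slashes; joining then mixes the two separators.
mixed = ntpath.join("C:/work/apps", "foo")

print(raw == doubled == via_pathlib)    # True
print(mixed)                            # C:/work/apps\foo
print(ntpath.normpath(mixed) == raw)    # True
```

The first three spellings all yield the same native path, while the slash form needs an explicit `normpath` (or pathlib) before it matches.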

Glenn Linderman

12:16 p.m.

...

...
On Sat, Aug 10, 2019 at 6:39 PM Paul Moore <p.f.moore@gmail.com> wrote:

...
There are *many* valid ways to write Windows pathnames in your code:

1. Raw strings 2. Doubling the backslashes 3. Using pathlib (possibly with slash as a directory separator, where it's explicitly noted as a portable option) 4. Using slashes

IMO, using slashes is the *worst* of these. But this latter is a matter of opinion - I've no objection to others believing differently, but I *do* object to slashes being presented as the only option, or the recommended option without qualification.

On Sat, 10 Aug 2019 at 12:06, Chris Angelico <rosuav@gmail.com> wrote:

Please expand on why this is the worst?

On 8/10/2019 7:03 AM, Paul Moore wrote:

I did say it was a matter of opinion, so I'm not going to respond if people say that any of the following is "wrong", but since you asked:

1. Backslash is the native separator, whereas slash is not (see eryk sun's post for *way* more detail). 2. People who routinely use slash have a tendency to forget to use os.sep rather than a literal slash in places where it *does* matter. 3. Using slash, in my experience, ends up with paths with "mixed" separators (os.path.join("C:/work/apps", "foo") -> 'C:/work/apps\\foo') which are messy to deal with, and ugly for the user. 4. If a path with slashes is displayed directly to the user without normalisation, it looks incorrect and can confuse users who are only used to "native" Windows programs.

Etc.

Not to mention the problem of passing paths with / to other Windows programs via system or subprocess.

Terry Reedy

1:16 p.m.

On 8/10/2019 4:33 AM, Paul Moore wrote:

...

(Side issue)

This deserves its own thread.

...

As a Windows developer, who has seen far too many cases where use of slashes in filenames implies a Unix-based developer not thinking sufficiently about Windows compatibility, or where it leads to people hard coding '/' rather than using os.sep (or better, pathlib), I strongly object to this characterisation. Rather, I would simply say "to make Windows users more aware of the clash in usage between backslashes in filenames and backslashes as string escapes".

There are *many* valid ways to write Windows pathnames in your code:

1. Raw strings 2. Doubling the backslashes 3. Using pathlib (possibly with slash as a directory separator, where it's explicitly noted as a portable option) 4. Using slashes

IMO, using slashes is the *worst* of these. But this latter is a matter of opinion - I've no objection to others believing differently, but I *do* object to slashes being presented as the only option, or the recommended option without qualification.

Perhaps Python Setup and Usage, 3. Using Python on Windows, should have a section of file paths, at most x.y.z, so visible in the TOC listed by https://docs.python.org/3/using/index.html -- Terry Jan Reedy

Glenn Linderman

2:10 p.m.

On 8/10/2019 11:16 AM, Terry Reedy wrote:

...

On 8/10/2019 4:33 AM, Paul Moore wrote:

...
(Side issue)

This deserves its own thread.

...
As a Windows developer, who has seen far too many cases where use of slashes in filenames implies a Unix-based developer not thinking sufficiently about Windows compatibility, or where it leads to people hard coding '/' rather than using os.sep (or better, pathlib), I strongly object to this characterisation. Rather, I would simply say "to make Windows users more aware of the clash in usage between backslashes in filenames and backslashes as string escapes".

There are *many* valid ways to write Windows pathnames in your code:

1. Raw strings

As pointed out elsewhere, Raw strings have limitations, paths ending in \ cannot be represented, and such do exist in various situations, not all of which can be easily avoided... except by the "extra character contortion" of "C:\directory\ "[:-1] (does someone know a better way?)

It would be useful to make a "really raw" string that doesn't treat \ special in any way. With 4 different quoting possibilities ( ' " ''' """ ) there isn't really a reason to treat \ special at the end of a raw string, except for backward compatibility.

I wonder how many raw strings actually use the \" escape productively? Maybe that should be deprecated too! ? I can't think of a good and necessary use for it, can anyone?

Or invent "really raw" in some spelling, such as rr"c:\directory\" or e for exact, or x for exact, or <your favorite character here>"c:\directory\"

And that brings me to the thought that if \e wants to become an escape for escape, that maybe there should be an "extended escape" prefix... if you want to use more escapes, define ee"string where \\ can only be used as an escape or escaped character, \e means the ASCII escape character, and \ followed by a character with no escape definition would be an error."

Of course "extended escape" could be spelled lots of different ways too, but not the same way as "really raw" :)
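The raw-string limitation described here can be checked directly. The sketch below shows the slicing contortion from the message, one alternative that relies on implicit concatenation of adjacent literals (offered only as one possible answer to "a better way?"), and what `\"` does inside a raw string. The helper `compiles` is a made-up name for this example.

```python
# The slicing contortion: add a trailing space, then cut it off.
trimmed = r"C:\directory\ "[:-1]

# Alternative: adjacent literals are concatenated at compile time, so the
# trailing backslash can come from an ordinary (non-raw) literal.
concatenated = r"C:\directory" "\\"

# Inside a raw string, \" keeps the quote from ending the literal, but the
# backslash stays in the result.
quoted = r"\""


def compiles(source):
    """Return True if `source` is a syntactically valid expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


# Source text: r"C:\directory\"  (the final \" does not close the string)
trailing_ok = compiles('r"C:\\directory\\"')

print(trimmed == concatenated, len(quoted), trailing_ok)
```

Both workarounds give the same string ending in a single backslash, `quoted` is two characters long, and the raw literal with a trailing backslash fails to compile.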

...

...
2. Doubling the backslashes 3. Using pathlib (possibly with slash as a directory separator, where it's explicitly noted as a portable option) 4. Using slashes

IMO, using slashes is the *worst* of these. But this latter is a matter of opinion - I've no objection to others believing differently, but I *do* object to slashes being presented as the only option, or the recommended option without qualification.

Perhaps Python Setup and Usage, 3. Using Python on Windows, should have a section of file paths, at most x.y.z, so visible in the TOC listed by https://docs.python.org/3/using/index.html

Guido van Rossum

2:19 p.m.

Regular expressions. On Sat, Aug 10, 2019 at 12:12 Glenn Linderman <v+python@g.nevcal.com> wrote:

...

On 8/10/2019 11:16 AM, Terry Reedy wrote:

On 8/10/2019 4:33 AM, Paul Moore wrote:

(Side issue)

This deserves its own thread.

As a Windows developer, who has seen far too many cases where use of slashes in filenames implies a Unix-based developer not thinking sufficiently about Windows compatibility, or where it leads to people hard coding '/' rather than using os.sep (or better, pathlib), I strongly object to this characterisation. Rather, I would simply say "to make Windows users more aware of the clash in usage between backslashes in filenames and backslashes as string escapes".

There are *many* valid ways to write Windows pathnames in your code:

1. Raw strings

As pointed out elsewhere, Raw strings have limitations, paths ending in \ cannot be represented, and such do exist in various situations, not all of which can be easily avoided... except by the "extra character contortion" of "C:\directory\ "[:-1] (does someone know a better way?)

It would be useful to make a "really raw" string that doesn't treat \ special in any way. With 4 different quoting possibilities ( ' " ''' """ ) there isn't really a reason to treat \ special at the end of a raw string, except for backward compatibility.

I wonder how many raw strings actually use the \" escape productively? Maybe that should be deprecated too! ? I can't think of a good and necessary use for it, can anyone?

Or invent "really raw" in some spelling, such as rr"c:\directory\" or e for exact, or x for exact, or <your favorite character here>"c:\directory\"

And that brings me to the thought that if \e wants to become an escape for escape, that maybe there should be an "extended escape" prefix... if you want to use more escapes, define ee"string where \\ can only be used as an escape or escaped character, \e means the ASCII escape character, and \ followed by a character with no escape definition would be an error."

Of course "extended escape" could be spelled lots of different ways too, but not the same way as "really raw" :)

2. Doubling the backslashes
3. Using pathlib (possibly with slash as a directory separator, where it's explicitly noted as a portable option)
4. Using slashes

IMO, using slashes is the *worst* of these. But this latter is a matter of opinion - I've no objection to others believing differently, but I *do* object to slashes being presented as the only option, or the recommended option without qualification.

Perhaps Python Setup and Usage, 3. Using Python on Windows, should have a section of file paths, at most x.y.z, so visible in the TOC listed by https://docs.python.org/3/using/index.html

_______________________________________________ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-leave@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/5MZAXJJY...

-- --Guido (mobile)

Glenn Linderman

2:33 p.m.

On 8/10/2019 12:19 PM, Guido van Rossum wrote:

...

Regular expressions.

I assume that is in response to the "good use for \" escape" question? But can't you just surround them with ' instead of " ? Or ''' ?

...

On Sat, Aug 10, 2019 at 12:12 Glenn Linderman <v+python@g.nevcal.com> wrote:

On 8/10/2019 11:16 AM, Terry Reedy wrote:

...
On 8/10/2019 4:33 AM, Paul Moore wrote:

...
(Side issue)

This deserves its own thread.

...
As a Windows developer, who has seen far too many cases where use of slashes in filenames implies a Unix-based developer not thinking sufficiently about Windows compatibility, or where it leads to people hard coding '/' rather than using os.sep (or better, pathlib), I strongly object to this characterisation. Rather, I would simply say "to make Windows users more aware of the clash in usage between backslashes in filenames and backslashes as string escapes".

There are *many* valid ways to write Windows pathnames in your code:

1. Raw strings

As pointed out elsewhere, Raw strings have limitations, paths ending in \ cannot be represented, and such do exist in various situations, not all of which can be easily avoided... except by the "extra character contortion" of "C:\directory\ "[:-1] (does someone know a better way?)

It would be useful to make a "really raw" string that doesn't treat \ special in any way. With 4 different quoting possibilities ( ' " ''' """ ) there isn't really a reason to treat \ special at the end of a raw string, except for backward compatibility.

I wonder how many raw strings actually use the \" escape productively? Maybe that should be deprecated too! ? I can't think of a good and necessary use for it, can anyone?

Or invent "really raw" in some spelling, such as rr"c:\directory\" or e for exact, or x for exact, or <your favorite character here>"c:\directory\"

And that brings me to the thought that if \e wants to become an escape for escape, that maybe there should be an "extended escape" prefix... if you want to use more escapes, define ee"string where \\ can only be used as an escape or escaped character, \e means the ASCII escape character, and \ followed by a character with no escape definition would be an error."

Of course "extended escape" could be spelled lots of different ways too, but not the same way as "really raw" :)

...
...
2. Doubling the backslashes
3. Using pathlib (possibly with slash as a directory separator, where it's explicitly noted as a portable option)
4. Using slashes

IMO, using slashes is the *worst* of these. But this latter is a matter of opinion - I've no objection to others believing differently, but I *do* object to slashes being presented as the only option, or the recommended option without qualification.

Perhaps Python Setup and Usage, 3. Using Python on Windows, should have a section of file paths, at most x.y.z, so visible in the TOC listed by https://docs.python.org/3/using/index.html


-- --Guido (mobile)

Greg Ewing

5:36 p.m.

Glenn Linderman wrote:

...

I wonder how many raw strings actually use the \" escape productively? Maybe that should be deprecated too! ? I can't think of a good and necessary use for it, can anyone?

Quite rare, I expect, but it's bound to break someone's code. It might be better to introduce a new string prefix, e.g. 'v' for 'verbatim':

v"C:\Users\Fred\"

-- Greg

Glenn Linderman

5:44 p.m.

On 8/10/2019 3:36 PM, Greg Ewing wrote:

...

Glenn Linderman wrote:

...
I wonder how many raw strings actually use the \" escape productively? Maybe that should be deprecated too! ? I can't think of a good and necessary use for it, can anyone?

Quite rare, I expect, but it's bound to break someone's code. It might be better to introduce a new string prefix, e.g. 'v' for 'verbatim':

v"C:\Users\Fred\"

Which is why I suggested rr"C:\directory\", but allowed as how there might be better spellings.... I like your v for verbatim !

Steve Dower

12 Aug

11:58 a.m.

On 10Aug2019 1544, Glenn Linderman wrote:

...

On 8/10/2019 3:36 PM, Greg Ewing wrote:

...
It might be better to introduce a new string prefix, e.g. 'v' for 'verbatim':

v"C:\Users\Fred\"

Which is why I suggested rr"C:\directory\", but allowed as how there might be better spellings.... I like your v for verbatim !

The only new prefix I would support is 'p' to construct a pathlib.Path object directly from the string literal. But that doesn't change any of the existing discussion (apart from please take all the new prefix suggestions to python-ideas).

People have been solving the trailing backslash problem for a long time, and it's not a big enough burden to need a new fix. Unintentional escapes in paths are a much bigger burden for new users and deserve a fix, but our current warning about the upcoming change is not targeted at the right people.

Because we intend to fix the warning, delaying it by a release is not just "kicking the can down the road". But we need some agreement on what that looks like. The bug is already at https://bugs.python.org/issue32912

Cheers, Steve

Serhiy Storchaka

11 Aug

3:26 a.m.

10.08.19 22:10, Glenn Linderman пише:

...

As pointed out elsewhere, Raw strings have limitations, paths ending in \ cannot be represented, and such do exist in various situations, not all of which can be easily avoided... except by the "extra character contortion" of "C:\directory\ "[:-1] (does someone know a better way?)

Another common idiom is

r"C:\directory" "\\"

...

I wonder how many raw strings actually use the \" escape productively? Maybe that should be deprecated too! ? I can't think of a good and necessary use for it, can anyone?

This is an interesting question. I have performed some experiments. 15 files in the stdlib (not counting the tokenizer) use \' or \" in raw strings. And one test (test_venv) fails because third-party code uses them. All cases are in regular expressions. It is possible to rewrite them, but that is a less trivial task than fixing invalid escape sequences. So changing this would require a much longer deprecation period.
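
To make the case under discussion concrete: inside a raw string, \" keeps the literal going, but the backslash stays in the resulting string. A regular expression is one place where that extra backslash is harmless. A small sketch (the pattern and sample text are made up for illustration and are not taken from the stdlib):

```python
import re

# In a raw string, \" does not end the literal, and both characters
# (backslash and quote) remain in the resulting string.
two_chars = r"\""

# A pattern for double-quoted runs, written with \" inside a raw string.
quoted = re.compile(r"\"[^\"]*\"")
found = quoted.findall('say "hi" and "bye"')
```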

Glenn Linderman

3:07 p.m.

On 8/11/2019 1:26 AM, Serhiy Storchaka wrote:

...

10.08.19 22:10, Glenn Linderman пише:

...
As pointed out elsewhere, Raw strings have limitations, paths ending in \ cannot be represented, and such do exist in various situations, not all of which can be easily avoided... except by the "extra character contortion" of "C:\directory\ "[:-1] (does someone know a better way?)

Another common idiom is

r"C:\directory" "\\"

I suppose that concatenation happens at compile time; less sure about [:-1], I would guess not. Thanks for this.
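
One way to check the compile-time question is to look at the constants stored in the compiled code object. Adjacent string literals are joined before any bytecode runs, so the combined path should appear there as a single constant. This sketch only checks the concatenation idiom; whether the [:-1] slice is folded is left open, since that depends on the optimizer of the particular CPython version.

```python
# Source text containing the idiom r"C:\directory" "\\".
source = r'''location = r"C:\directory" "\\"'''

code = compile(source, "<example>", "exec")

# Run the compiled code to see the value it assigns.
namespace = {}
exec(code, namespace)

# The already-joined string is stored among the code object's constants.
joined_at_compile_time = "C:\\directory\\" in code.co_consts
```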

...

...
I wonder how many raw strings actually use the \" escape productively? Maybe that should be deprecated too! ? I can't think of a good and necessary use for it, can anyone?

This is an interesting question. I have performed some experiments. 15 files in the stdlib (not counting the tokenizer) use \' or \" in raw strings. And one test (test_venv) fails because third-party code uses them. All cases are in regular expressions. It is possible to rewrite them, but that is a less trivial task than fixing invalid escape sequences. So changing this would require a much longer deprecation period.

Couldn't they be rewritten using the above idiom? Why would that be less trivial? Or by using triple quotes, so the \" could be written as " ? That seems trivial.

Serhiy Storchaka

12 Aug

2:11 a.m.

11.08.19 23:07, Glenn Linderman пише:

...

On 8/11/2019 1:26 AM, Serhiy Storchaka wrote:

...
10.08.19 22:10, Glenn Linderman пише:

...
I wonder how many raw strings actually use the \" escape productively? Maybe that should be deprecated too! ? I can't think of a good and necessary use for it, can anyone?

This is an interesting question. I have performed some experiments. 15 files in the stdlib (not counting the tokenizer) use \' or \" in raw strings. And one test (test_venv) fails because third-party code uses them. All cases are in regular expressions. It is possible to rewrite them, but that is a less trivial task than fixing invalid escape sequences. So changing this would require a much longer deprecation period.

Couldn't they be rewritten using the above idiom? Why would that be less trivial? Or by using triple quotes, so the \" could be written as " ? That seems trivial.

Yes, they could. You can use a different quote character, triple quotes, or string literal concatenation. There are many options, and you should choose what is applicable and optimal in each particular case. You need to analyze the whole string literal, and the code transformation is usually more complex than just doubling a backslash or adding the `r` prefix. For example, in many cases `\"` can be replaced with `"'"'r"`, but the result is not very readable. See https://github.com/python/cpython/pull/15217.

Glenn Linderman

2:51 p.m.

On 8/12/2019 12:11 AM, Serhiy Storchaka wrote:

...

11.08.19 23:07, Glenn Linderman пише:

...
On 8/11/2019 1:26 AM, Serhiy Storchaka wrote:

...
10.08.19 22:10, Glenn Linderman пише:

...
I wonder how many raw strings actually use the \" escape productively? Maybe that should be deprecated too! ? I can't think of a good and necessary use for it, can anyone?

This is an interesting question. I have performed some experiments. 15 files in the stdlib (not counting the tokenizer) use \' or \" in raw strings. And one test (test_venv) fails because third-party code uses them. All cases are in regular expressions. It is possible to rewrite them, but that is a less trivial task than fixing invalid escape sequences. So changing this would require a much longer deprecation period.

Couldn't they be rewritten using the above idiom? Why would that be less trivial? Or by using triple quotes, so the \" could be written as " ? That seems trivial.

Yes, they could. You can use a different quote character, triple quotes, or string literal concatenation. There are many options, and you should choose what is applicable and optimal in each particular case. You need to analyze the whole string literal, and the code transformation is usually more complex than just doubling a backslash or adding the `r` prefix. For example, in many cases `\"` can be replaced with `"'"'r"`, but the result is not very readable.

No, that is not readable. But neither does it seem to be valid syntax, or else I'm not sure what you are saying. Ah, maybe you were saying that a sequence like the '\"' that is already embedded in a raw string can be converted to the sequence `"'"'r"` also embedded in the raw string. That makes the syntax work, but if that is what you were saying, your translation dropped the \ from before the ", since the raw string preserves both the \ and the ".

Regarding the readability, I think any use of implicitly concatenated strings should have at least two spaces or a newline between them to make the implicit concatenation clearer.

Serhiy Storchaka

13 Aug

12:52 a.m.

12.08.19 22:51, Glenn Linderman пише:

...

On 8/12/2019 12:11 AM, Serhiy Storchaka wrote:

...
For example, in many cases `\"` can be replaced with `"'"'r"`, but the result is not very readable.

No, that is not readable. But neither does it seem to be valid syntax, or else I'm not sure what you are saying. Ah, maybe you were saying that a sequence like the '\"' that is already embedded in a raw string can be converted to the sequence `"'"'r"` also embedded in the raw string. That makes the syntax work, but if that is what you were saying, your translation dropped the \ from before the ", since the raw string preserves both the \ and the ".

Yes, this is what I meant. Thank you for the correction. I dropped the `\` because in the context of a regular expression `\"` and `"` are the same, and the backslash is only used to prevent `"` from ending the string literal. This is why `\"` is so rarely used in other strings: only in regular expressions does the `\` before `"` not matter.
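
The rewrite being discussed can be checked with a small character class. This is an illustrative sketch, not code from the pull request: the original spelling uses \" inside a raw string, and the rewritten spelling closes the raw string, adds '"', and reopens it.

```python
import re

original = r"[\"']"            # raw string containing \"
rewritten = r"["  '"'  r"']"   # implicit concatenation, no backslash

# The two strings differ by one backslash, but as regular expressions
# they describe the same character class.
sample = "a \"b\" 'c'"
same_matches = re.findall(original, sample) == re.findall(rewritten, sample)
```

The extra spaces between the concatenated literals follow the readability suggestion made earlier in the exchange.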

...

Regarding the readability, I think any use of implicitly concatenated strings should have at least two spaces or a newline between them to make the implicit concatenation clearer.

Agreed. I wrote it without spaces for dramatic effect.

Steven D'Aprano

11 Aug

4:50 a.m.

On Sat, Aug 10, 2019 at 12:10:55PM -0700, Glenn Linderman wrote:

...

Or invent "really raw" in some spelling, such as rr"c:\directory\" or e for exact, or x for exact, or <your favorite character here>"c:\directory\"

And that brings me to the thought that if \e wants to become an escape for escape, that maybe there should be an "extended escape" prefix... if you want to use more escapes, define ee"string where \\ can only be used as an escape or escaped character, \e means the ASCII escape character, and \ followed by a character with no escape definition would be an error."

Please no.

We already have b-strings, r-strings, u-strings, f-strings, br-strings, rb-strings, fr-strings, rf-strings, each of which comes in four varieties (single quote, double quote, triple single quote and triple double quote). Now you're talking about adding rr-strings, v-strings (Greg suggested that) and ee-strings, presumably some or all of which will need b*- and *b- or f*- and *f- varieties too.

If the plan to deprecate unrecognised escapes and then make them an exception goes ahead, and I expect that it will, in a few more releases this "extended escape" ee-string will be completely redundant. If \e is required, we will be able to add it to regular strings as needed, likewise for any future new escapes we might want. (If any.)

And if we end up keeping the existing behaviour, oh well, we can always write \x1B instead. New escapes are a Nice To Have, not a Must Have.

"Really raw" rr'' versus "nearly raw" r'' is a source of confusion just waiting to happen, when people use the wrong numbers of r's, or are simply unclear which they should use.

It's not like we have no other options:

location = r'C:\directory\subdirectory' '\\'

works fine. So does this:

location = 'directory/subdirectory/'.replace('/', os.sep)

Even better, instead of hard-coding our paths in the source code, we can read them from a config file or database.

It is unfortunate that Windows is so tricky with backslashes and forwards slashes, and that it clashes with the escape character, but I'm sure that other languages which use \ for escaping haven't proliferated four or more kinds of strings with different escaping rules in response.

-- Steven
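
As a concrete illustration of the two points about escapes: \x1B already spells the ASCII escape character, while an unrecognised escape such as \e is currently kept as a backslash plus the letter, with a warning issued at compile time. The sketch below compiles the literal from a string so the warning can be captured. The exact warning category depends on the Python version, and the behaviour would change if unrecognised escapes ever become an error.

```python
import warnings

esc = "\x1B"   # the ASCII escape character, code point 27

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # The compiled source is a non-raw string literal containing \e.
    value = eval(compile(r"'\e'", "<example>", "eval"))

# Collect the categories of whatever warnings the compiler issued.
categories = {w.category for w in caught}
```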

Glenn Linderman

3:18 p.m.

On 8/11/2019 2:50 AM, Steven D'Aprano wrote:

...

On Sat, Aug 10, 2019 at 12:10:55PM -0700, Glenn Linderman wrote:

...
Or invent "really raw" in some spelling, such as rr"c:\directory\" or e for exact, or x for exact, or <your favorite character here>"c:\directory\"

And that brings me to the thought that if \e wants to become an escape for escape, that maybe there should be an "extended escape" prefix... if you want to use more escapes, define ee"string where \\ can only be used as an escape or escaped character, \e means the ASCII escape character, and \ followed by a character with no escape definition would be an error."

Please no.

We already have b-strings, r-strings, u-strings, f-strings, br-strings, rb-strings, fr-strings, rf-strings, each of which comes in four varieties (single quote, double quote, triple single quote and triple double quote). Now you're talking about adding rr-strings, v-strings (Greg suggested that) and ee-strings, presumably some or all of which will need b*- and *b- or f*- and *f- varieties too.

Don't forget the upper & lower case varieties :)

...

If the plan to deprecate unrecognised escapes and then make them an exception goes ahead, and I expect that it will, in a few more releases this "extended escape" ee-string will be completely redundant. If \e is required, we will be able to add it to regular strings as needed, likewise for any future new escapes we might want. (If any.)

So unrecognized escapes were deprecated in 3.6. And didn't get removed in 3.7. And from all indications, aren't going to be removed in 3.8. What makes you think the same arguments won't happen again for 3.9?

...

And if we end up keeping the existing behaviour, oh well, we can always write \x1B instead. New escapes are a Nice To Have, not a Must Have.

"Really raw" rr'' versus "nearly raw" r'' is a source of confusion just waiting to happen, when people use the wrong numbers of r's, or are simply unclear which they should use.

I agree that Greg's v is far better than rr, especially if someone tried to write rfr or rbr.

It's not like we have no other options:

location = r'C:\directory\subdirectory' '\\'

works fine.

But I never thought of that, until Serhiy mentioned it in his reply, so there are probably lots of other stupid people that didn't think of it either. It's not like it is even suggested in the documentation as a way to work around the non-rawness of raw strings. And it still requires doubling one of the \, so it is more consistent and understandable to just double them all.

...

So does this:

location = 'directory/subdirectory/'.replace('/', os.sep)

This is a far greater run-time cost with the need to scan the string. Granted the total cost isn't huge, unless it is done repeatedly.

...

Even better, instead of hard-coding our paths in the source code, we can read them from a config file or database.

Yep, I do that sometimes. But hard-coded paths make good defaults in many circumstances.

...

It is unfortunate that Windows is so tricky with backslashes and forwards slashes, and that it clashes with the escape character, but I'm sure that other languages which use \ for escaping haven't proliferated four or more kinds of strings with different escaping rules in response.

I agree with this. But Bill didn't consult Guido about the matter.

Eric V. Smith

10:40 p.m.

On 8/11/2019 4:18 PM, Glenn Linderman wrote:

...

On 8/11/2019 2:50 AM, Steven D'Aprano wrote:

...
On Sat, Aug 10, 2019 at 12:10:55PM -0700, Glenn Linderman wrote:

...
Or invent "really raw" in some spelling, such as rr"c:\directory\" or e for exact, or x for exact, or <your favorite character here>"c:\directory\"

And that brings me to the thought that if \e wants to become an escape for escape, that maybe there should be an "extended escape" prefix... if you want to use more escapes, define ee"string where \\ can only be used as an escape or escaped character, \e means the ASCII escape character, and \ followed by a character with no escape definition would be an error."

Please no.

We already have b-strings, r-strings, u-strings, f-strings, br-strings, rb-strings, fr-strings, rf-strings, each of which comes in four varieties (single quote, double quote, triple single quote and triple double quote). Now you're talking about adding rr-strings, v-strings (Greg suggested that) and ee-strings, presumably some or all of which will need b*- and *b- or f*- and *f- varieties too.

Don't forget the upper & lower case varieties :)

And all orders!

...

...
...
_all_string_prefixes()
{'', 'b', 'BR', 'bR', 'B', 'rb', 'F', 'RF', 'rB', 'FR', 'Rf', 'Fr', 'RB', 'f', 'r', 'rf', 'rF', 'R', 'u', 'fR', 'U', 'Br', 'Rb', 'fr', 'br'}
len(_all_string_prefixes())
25

And if you add just 'bv' and 'fv', it's 41:

{'', 'fr', 'Bv', 'BR', 'F', 'rb', 'Fv', 'VB', 'vb', 'vF', 'br', 'FV', 'vf', 'FR', 'fV', 'bV', 'Br', 'Vb', 'Rb', 'RF', 'bR', 'r', 'R', 'Vf', 'fv', 'U', 'RB', 'B', 'rB', 'vB', 'Fr', 'rF', 'fR', 'Rf', 'BV', 'VF', 'bv', 'b', 'u', 'f', 'rf'}

There would be no need for 'uv' (not needed for backward compatibility) or 'rv' (can't be both raw and verbatim).

I'm not in any way serious about this. I just want people to realize how many wacky combinations there would be. And heaven forbid we ever add some combination of 3 characters. If 'rfv' were actually also valid, you get to 89:

{'', 'br', 'vb', 'fR', 'F', 'rFV', 'fRv', 'fV', 'rVF', 'Rfv', 'u', 'vRf', 'fVR', 'rfV', 'Fvr', 'vrf', 'fVr', 'vB', 'Vb', 'Rvf', 'Fv', 'Fr', 'FVr', 'B', 'rVf', 'FVR', 'vfr', 'VB', 'VrF', 'BR', 'VRf', 'vfR', 'FR', 'Br', 'RFV', 'Rf', 'fvR', 'f', 'rb', 'VfR', 'VFR', 'fr', 'vFR', 'VRF', 'frV', 'bR', 'b', 'FrV', 'r', 'R', 'RVF', 'FV', 'rvF', 'FRV', 'Vrf', 'rvf', 'FRv', 'Frv', 'vF', 'bV', 'VF', 'fv', 'RF', 'RB', 'rB', 'vRF', 'RFv', 'RVf', 'Rb', 'Vfr', 'vrF', 'rf', 'Bv', 'vf', 'rF', 'U', 'bv', 'FvR', 'RfV', 'Vf', 'VFr', 'vFr', 'fvr', 'BV', 'rFv', 'rfv', 'fRV', 'frv', 'RvF'}

If only we could deprecate upper case prefixes!

Eric
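
The counts quoted above (25, 41 and 89) can be reproduced by expanding every ordering and every upper/lower casing of each base prefix. A minimal sketch follows; the helper name all_prefixes is made up for this example and is not the private tokenize function, and the base list is an assumption reflecting the prefixes that were valid at the time of the thread.

```python
from itertools import permutations, product

def all_prefixes(base):
    """Every ordering and upper/lower casing of each prefix in base."""
    result = {""}
    for prefix in base:
        for ordering in permutations(prefix):
            # Each character may independently be lower or upper case.
            for cased in product(*[(c, c.upper()) for c in ordering]):
                result.add("".join(cased))
    return result

current = ["b", "r", "u", "f", "br", "fr"]
count_current = len(all_prefixes(current))
count_with_v = len(all_prefixes(current + ["bv", "fv"]))
count_with_rfv = len(all_prefixes(current + ["bv", "fv", "rfv"]))
```

A three-character prefix alone contributes 6 orderings times 8 casings, which is where the jump from 41 to 89 comes from.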

Glenn Linderman

12 Aug

12:59 a.m.

On 8/11/2019 8:40 PM, Eric V. Smith wrote:

...

On 8/11/2019 4:18 PM, Glenn Linderman wrote:

...
On 8/11/2019 2:50 AM, Steven D'Aprano wrote:

...
On Sat, Aug 10, 2019 at 12:10:55PM -0700, Glenn Linderman wrote:

...
Or invent "really raw" in some spelling, such as rr"c:\directory\" or e for exact, or x for exact, or <your favorite character here>"c:\directory\"

And that brings me to the thought that if \e wants to become an escape for escape, that maybe there should be an "extended escape" prefix... if you want to use more escapes, define ee"string where \\ can only be used as an escape or escaped character, \e means the ASCII escape character, and \ followed by a character with no escape definition would be an error."

Please no.

We already have b-strings, r-strings, u-strings, f-strings, br-strings, rb-strings, fr-strings, rf-strings, each of which comes in four varieties (single quote, double quote, triple single quote and triple double quote). Now you're talking about adding rr-strings, v-strings (Greg suggested that) and ee-strings, presumably some or all of which will need b*- and *b- or f*- and *f- varieties too.

Don't forget the upper & lower case varieties :)

And all orders!

...
...
...
_all_string_prefixes()
{'', 'b', 'BR', 'bR', 'B', 'rb', 'F', 'RF', 'rB', 'FR', 'Rf', 'Fr', 'RB', 'f', 'r', 'rf', 'rF', 'R', 'u', 'fR', 'U', 'Br', 'Rb', 'fr', 'br'}
len(_all_string_prefixes())
25

And if you add just 'bv' and 'fv', it's 41:

{'', 'fr', 'Bv', 'BR', 'F', 'rb', 'Fv', 'VB', 'vb', 'vF', 'br', 'FV', 'vf', 'FR', 'fV', 'bV', 'Br', 'Vb', 'Rb', 'RF', 'bR', 'r', 'R', 'Vf', 'fv', 'U', 'RB', 'B', 'rB', 'vB', 'Fr', 'rF', 'fR', 'Rf', 'BV', 'VF', 'bv', 'b', 'u', 'f', 'rf'}

There would be no need for 'uv' (not needed for backward compatibility) or 'rv' (can't be both raw and verbatim).

I'm not in any way serious about this. I just want people to realize how many wacky combinations there would be. And heaven forbid we ever add some combination of 3 characters. If 'rfv' were actually also valid, you get to 89:

{'', 'br', 'vb', 'fR', 'F', 'rFV', 'fRv', 'fV', 'rVF', 'Rfv', 'u', 'vRf', 'fVR', 'rfV', 'Fvr', 'vrf', 'fVr', 'vB', 'Vb', 'Rvf', 'Fv', 'Fr', 'FVr', 'B', 'rVf', 'FVR', 'vfr', 'VB', 'VrF', 'BR', 'VRf', 'vfR', 'FR', 'Br', 'RFV', 'Rf', 'fvR', 'f', 'rb', 'VfR', 'VFR', 'fr', 'vFR', 'VRF', 'frV', 'bR', 'b', 'FrV', 'r', 'R', 'RVF', 'FV', 'rvF', 'FRV', 'Vrf', 'rvf', 'FRv', 'Frv', 'vF', 'bV', 'VF', 'fv', 'RF', 'RB', 'rB', 'vRF', 'RFv', 'RVf', 'Rb', 'Vfr', 'vrF', 'rf', 'Bv', 'vf', 'rF', 'U', 'bv', 'FvR', 'RfV', 'Vf', 'VFr', 'vFr', 'fvr', 'BV', 'rFv', 'rfv', 'fRV', 'frv', 'RvF'}

If only we could deprecate upper case prefixes!

Eric

Yes. Happily while there is a combinatorial explosion in spellings and casings, there is no cognitive overload: each character has an independent effect on the interpretation and use of the string, so once you understand the 5 existing types (b r u f and plain) you understand them all.

Should we add one or two more, it would be with the realization (hopefully realized in the documentation also) that v and e would effectively be replacements for r and plain, rather than being combined with them.

Were I to design a new language with similar string syntax, I think I would use plain quotes for verbatim strings only, and have the following prefixes, in only a single case:

(no prefix) - verbatim UTF-8 (at this point, I see no reason not to require UTF-8 for the encoding of source files)
b - for verbatim bytes
e - allow (only explicitly documented) escapes
f - format strings

Actually, the above could be done as a preprocessor for python, or a future import.

In other words, what you see is what you get, until you add a prefix to add additional processing. The only combinations that seem useful are eb and ef. I don't know whether constraining the order of the prefixes would be helpful or not; if it is helpful, I have no problem with a canonical ordering being prescribed.

As a future import, one could code modules to either the current combinatorial explosion with all its gotchas, special cases, and passing of undefined escapes; or one could code to the clean limited cases above.

Another thing that seems awkward about the current strings is that {{ and }} become "special escapes". If it were not for the permissive usage of \{ and \} in the current plain string processing, \{ and \} could have been used to escape the non-format-expression uses of { and }, which would be far more consistent with other escapes. Perhaps the future import could regularize that, also. A future import would have no backward compatibility issues to disrupt a simplified, more regular syntax.

Does anyone know of an existing feature that couldn't be expressed in a straightforward manner with only the above capabilities?

The only other thing that I have heard about regarding strings is that multi-line strings have their first line indented, and other lines not. Some have recommended making the first line blank, and just chopping off the first \n, others have recommended indenting all lines, and replacing "\n" followed by the number of indented spaces by "\n", so the text can be aligned in the code like it will be aligned for use. Both techniques seem to have their place in aiding code readability.

Both techniques could be used together, in practice, using one more prefix character for triple quotes only:

longstring = l"""
The traditional first blank line form could be used as it has."""

If the first character of a long-string is a newline character, then it will be removed. If the string wants to have an initial newline character, a second one can be provided, which would not be removed.

longstring = l"""The traditional indented form could be used as it has, also."""

This would be contracted by removing up to the number of space characters to reach the first character of the first line of the string (if the lexer can provide that) after newlines within the string. If fewer space characters are available after a newline, only the number available would be removed. If there are more, they would be retained.

A new form would also be permitted:

longstring = l"""
    An indented form that isn't pushed as far right as the traditional indented form could also be used."""

If the first character of an l-string is a newline and the second character is a space character, this form would count the number of space characters in the second line, and remove up to that many space characters from all lines, as well as removing the initial newline character.

If l-strings were implemented (l for layout), they could be combined with f and/or e.

Are there any other string feature workarounds in common use that could be codified in a future import scenario?

Glenn
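
For comparison, the "blank first line" and "indent all lines" workarounds mentioned above can already be combined with the standard library. A minimal sketch using textwrap.dedent; the l prefix itself is only a proposal and is not used here, and the sample text is made up.

```python
import textwrap

# Blank first line plus indentation in the source. dedent removes the
# common leading whitespace, and the slice drops the initial newline.
text = textwrap.dedent("""
    An indented form that is
      aligned in the code.""")[1:]
```

Unlike the proposed prefix, this does its work at run time on every evaluation, which is part of why a lexer-level form was being suggested.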

Greg Ewing

1:52 a.m.

Eric V. Smith wrote:

...

I'm not in any way serious about this. I just want people to realize how many wacky combinations there would be.

It doesn't matter how many combinations there are, as long as multiple prefixes combine in the way you would expect, which they do as far as I can see.

-- Greg

Eric V. Smith

5:34 a.m.

On 8/12/2019 2:52 AM, Greg Ewing wrote:

...

Eric V. Smith wrote:

...
I'm not in any way serious about this. I just want people to realize how many wacky combinations there would be.

It doesn't matter how many combinations there are, as long as multiple prefixes combine in the way you would expect, which they do as far as I can see.

In general I agree, although there's some cognitive overhead to which combinations are valid or not. There's no "fu" strings, for example.

But for reading code that doesn't matter, so your point stands.

Eric
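
Which combinations are valid can be checked by asking the compiler directly. A small sketch; the helper name is_valid_prefix is made up for this example, and it is only meaningful for candidate prefixes made of letters.

```python
def is_valid_prefix(prefix):
    """Return True if prefix followed by 'x' compiles as an expression."""
    try:
        compile(prefix + "'x'", "<prefix-check>", "eval")
    except SyntaxError:
        return False
    return True

# A few valid prefixes, plus the invalid "fu", "ur" and "fb".
checks = {p: is_valid_prefix(p) for p in ["f", "rb", "Rb", "fr", "fu", "ur", "fb"]}
```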

Terry Reedy

2:07 p.m.

On 8/12/2019 6:34 AM, Eric V. Smith wrote:

...

On 8/12/2019 2:52 AM, Greg Ewing wrote:

...
Eric V. Smith wrote:

...
I'm not in any way serious about this. I just want people to realize how many wacky combinations there would be.

It doesn't matter how many combinations there are, as long as multiple prefixes combine in the way you would expect, which they do as far as I can see.

In general I agree, although there's some cognitive overhead to which combinations are valid or not. There's no "fu" strings, for example.

But for reading code that doesn't matter, so your point stands.

Please no more combinations. The presence of both legal and illegal combinations is already a mild nightmare for processing and testing. idlelib.colorizer has the following re to detect legal combinations

stringprefix = r"(?i:r|u|f|fr|rf|b|br|rb)?"

and the following test strings to make sure it works

"# All valid prefixes for unicode and byte strings should be colored.\n"
"'x', '''x''', \"x\", \"\"\"x\"\"\"\n"
"r'x', u'x', R'x', U'x', f'x', F'x'\n"
"fr'x', Fr'x', fR'x', FR'x', rf'x', rF'x', Rf'x', RF'x'\n"
"b'x',B'x', br'x',Br'x',bR'x',BR'x', rb'x', rB'x',Rb'x',RB'x'\n"
"# Invalid combinations of legal characters should be half colored.\n"
"ur'x', ru'x', uf'x', fu'x', UR'x', ufr'x', rfu'x', xf'x', fx'x'\n"

Or, if another prefix is added, please add an expanded guaranteed-correct regex to the stdlib somewhere.

-- Terry Jan Reedy
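
The regex quoted above can be exercised against the same prefixes that appear in the test strings. A minimal sketch; the two lists are transcribed from those test strings, and using re.fullmatch is a choice made for this example rather than how idlelib applies the pattern.

```python
import re

stringprefix = r"(?i:r|u|f|fr|rf|b|br|rb)?"

# Prefixes taken from the "valid" and "invalid" test strings.
valid = ["", "r", "u", "R", "U", "f", "F", "fr", "Fr", "fR", "FR",
         "rf", "rF", "Rf", "RF", "b", "B", "br", "Br", "bR", "BR",
         "rb", "rB", "Rb", "RB"]
invalid = ["ur", "ru", "uf", "fu", "UR", "ufr", "rfu", "xf", "fx"]

# Keep only the candidates the pattern accepts in full.
accepted = [p for p in valid + invalid if re.fullmatch(stringprefix, p)]
```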

Random832

14 Aug

10:02 a.m.

On Mon, Aug 12, 2019, at 15:15, Terry Reedy wrote:

...

Please no more combinations. The presence of both legal and illegal combinations is already a mild nightmare for processing and testing. idlelib.colorizer has the following re to detect legal combinations

stringprefix = r"(?i:r|u|f|fr|rf|b|br|rb)?"

More advanced syntax highlighting editors have to handle each string type separately anyway, because they highlight (valid) backslash-escapes and f-string formatters. The proposed 'v-string' type would need separate handling even in a simplistic editor like IDLE, because it's different at the basic level of \" not ending the string (whereas, for better or worse, all current string types have exactly the same rules for how to find the end delimiter)

Eric V. Smith

1:18 p.m.

On 8/14/2019 11:02 AM, Random832 wrote:

...

On Mon, Aug 12, 2019, at 15:15, Terry Reedy wrote:

...
Please no more combinations. The presence of both legal and illegal combinations is already a mild nightmare for processing and testing. idlelib.colorizer has the following re to detect legal combinations

stringprefix = r"(?i:r|u|f|fr|rf|b|br|rb)?"

More advanced syntax highlighting editors have to handle each string type separately anyway, because they highlight (valid) backslash-escapes and f-string formatters. The proposed 'v-string' type would need separate handling even in a simplistic editor like IDLE, because it's different at the basic level of \" not ending the string (whereas, for better or worse, all current string types have exactly the same rules for how to find the end delimiter)

The reason I defined f-strings as I did is so that lexer/parsers (editors, syntax highlighters, other implementations, etc.) could easily ignore them, at least as a first pass. They're literally like all other strings to the lexer. Python's lexer/parser says that a string is:

- some optional letters, making the string prefix
- an opening quote or triple quote
- some optional chars, with \ escaping
- a matching closing quote or triple quote

The parser then validates the string prefix ('f' is okay, 'b' is okay, 'fb' isn't okay, 'x' isn't okay, etc.) It then operates on the contents of the string, based on what the string prefix tells it to do.

So all an alternate lexer/parser has to do is add 'f' to the valid string prefixes, and it could then at least skip over f-strings. Somewhere in my notes I have 3 or 4 examples of projects that did this, and voila: they "supported" f-strings. Imagine a syntax highlighter that didn't want to highlight the inside of an f-string.

The proposed v-strings would indeed break this. I'm opposed to them for this reason, among others.

That all said, I am considering moving f-string parsing into the CPython parser. That would let you say things like:

f'some text {ord('a')}'

I'm not sure that's a great idea, but I've discussed it with several alternate implementations, and with authors of several editors, and they seem okay with it. I'm following Guido's parser experiment with some interest, to see how it might interact with this proposal. Might they also be okay with v-strings? Maybe. But it seems like a lot of hassle for a very minor feature.

Eric
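The four-part description of a string above can be sketched as a small scanner. This is only an illustration of the idea, not CPython's tokenizer: `STRING_RE`, `VALID_PREFIXES` and `scan_string` are hypothetical names, and for brevity the sketch lets single-quoted strings span lines, which the real lexer does not.

```python
import re

# A string as the lexer sees it: optional letters, an opening quote or
# triple quote, characters with \ escaping, and the matching closing quote.
STRING_RE = re.compile(
    r"(?P<prefix>[A-Za-z]*)"
    r"(?P<quote>'''|\"\"\"|'|\")"
    r"(?P<body>(?:\\.|(?!(?P=quote))[^\\])*)"
    r"(?P=quote)",
    re.DOTALL,
)

# Prefixes validated in a second step, compared case-insensitively.
VALID_PREFIXES = {"", "r", "u", "f", "fr", "rf", "b", "br", "rb"}


def scan_string(source, pos=0):
    """Return (prefix, body, end) for the string literal starting at pos."""
    match = STRING_RE.match(source, pos)
    if match is None:
        raise ValueError("no complete string literal at position %d" % pos)
    prefix = match.group("prefix")
    if prefix.lower() not in VALID_PREFIXES:
        raise ValueError("invalid string prefix %r" % prefix)
    return prefix, match.group("body"), match.end()


# An f-string is skipped over exactly like any other string.
prefix, body, end = scan_string("f'some text'")
assert (prefix, body) == ("f", "some text")

# A scanner like this ends the literal at the first unescaped quote, so the
# nested-quote example from the message is cut off after "{ord(".
prefix, body, end = scan_string("f'some text {ord('a')}'")
assert body == "some text {ord("
```

Because the end-of-string rule never looks at the prefix, adding 'f' to `VALID_PREFIXES` is the only change such a tool needs in order to skip f-strings.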

Glenn Linderman

3:49 p.m.

On 8/14/2019 8:02 AM, Random832 wrote:

...

On Mon, Aug 12, 2019, at 15:15, Terry Reedy wrote:

...
Please no more combinations. The presence of both legal and illegal combinations is already a mild nightmare for processing and testing. idlelib.colorizer has the following re to detect legal combinations

stringprefix = r"(?i:r|u|f|fr|rf|b|br|rb)?"

More advanced syntax highlighting editors have to handle each string type separately anyway, because they highlight (valid) backslash-escapes and f-string formatters. The proposed 'v-string' type would need separate handling even in a simplistic editor like IDLE, because it's different at the basic level of \" not ending the string (whereas, for better or worse, all current string types have exactly the same rules for how to find the end delimiter)

I had to read this several times, and then only after reading Eric's reply, it finally hit me that what you are saying is that \" doesn't end the string in any other form of string, but that sequence would end a v-string. It seems that also explains why Serhiy, in describing his experiment with really raw string literals, mentioned having to change the tokenizer as well as the parser (proving that it isn't impossible to deal with truly raw strings).

\" not ending a raw string was certainly a gotcha for me when I started using Python (with a background in C and Perl among other languages), and it convinced me not to use raw strings, that that gotcha was not worth the other benefits of raw strings. Serhiy said:

...

Currently a raw literal cannot end in a single backslash (e.g. in r"C:\User\"). Although there are reasons for this. It is an old gotcha, and there are many closed issues about it. This question is even included in FAQ.

which indicates that I am not the only one that has been tripped up by that over the years.

Trying to look at it from the eyes of a beginning programmer, the whole idea of backslash being an escape character is an unnatural artifice. I'm unaware (but willing to be educated) of any natural language, when using quotations, that has such a concept. Nested quotations exist, in various forms: use of a different quotation mark for the inner and outer quotations, and block quotations (which in English, have increased margin on both sides, and have a blank line before and after). Python actually supports constructs very similar to the natural language formats, allowing both " and ' for quotations and nested quotations, and the triple-quoted string with either " or ' is very similar in concept to a block quotation.

But _all_ the string forms are burdened with surprises for the beginning programmer: escape sequences of one sort or another must be learned and understood to avoid surprises when using the \ character.

Programming languages certainly need an escape character mechanism to deal with characters that cannot easily be typed on a keyboard (such as ¤ ¶ etc.), or which are visually indistinguishable from other characters or character sequences (various widths of white space), or which would be disruptive to the flow of code or syntax if represented by the usual character (newline, carriage return, formfeed, maybe others). But these are programming concepts, not natural language concepts.

The basic concept of a quoted string should best be borrowed directly from natural language, and then enhancements to that made to deal with programming concepts.

In Python, as in C, the escape characters are built into the basic string syntax; one must learn the quirks of the escaping mechanism in order to write strings.

In Perl, " strings include escapes, and ' strings do not. So there is a basic string syntax that is similar to natural language, and one that is extended to include programming concepts. [N.B. There are lots of reasons I switched from Perl to Python, and don't have any desire to go back, but I have to admit, that the lack of a truly raw string in Python was a disappointment.]

So that, together with the desire for new escape sequences, and the creation of a new escape mechanism in the f-string {} (which adds both { and } as escape characters by requiring them to be doubled to be treated as literal inside an f-string, instead of using \{ and \} as the escapes [which would have been possible, due to the addition of the f prefix]), and the issue that every current \-escape is defined to do something, is why I suggested elsewhere in this thread <https://mail.python.org/archives/list/python-dev@python.org/message/XJNS45JG...> that perhaps the whole irregular string syntax should be rebooted with a future import, and it seems it could both be simpler, more regular, and more powerful as a result. And by using a future import, there are no backward incompatibility issues, and migration can be module by module.

The more I think about this, the more tempting it is to attempt to fork Python just to have a better string syntax! But alas! So many other time commitments, and a lack of in-depth internals knowledge make that an impossibility.

I daresay, though, that if I get a free week, I might well write a preprocessor that converts my suggested future syntax to C-Python, so that I can use it in my own projects!

Greg Ewing

15 Aug

3:40 a.m.

If we want a truly raw string format that allows all characters, including any kind of quote, we could take a tip from Fortran: s = 31HThis is a "totally raw" string! -- Greg
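A minimal sketch of how a length-prefixed literal like the Fortran example above could be read. `parse_hollerith` is a hypothetical helper written for illustration; it is not part of any Python or Fortran tooling.

```python
import re


def parse_hollerith(source, pos=0):
    """Read '<count>H<text>' at pos and return (text, end_position).

    The count says exactly how many characters follow the H, so the text
    may contain any character, including quotes and backslashes.
    """
    match = re.compile(r"(\d+)H").match(source, pos)
    if match is None:
        raise ValueError("no Hollerith constant at position %d" % pos)
    count = int(match.group(1))
    start = match.end()
    text = source[start:start + count]
    if len(text) < count:
        raise ValueError("constant is shorter than its declared length")
    return text, start + count


text, end = parse_hollerith('31HThis is a "totally raw" string!')
assert text == 'This is a "totally raw" string!'
assert len(text) == 31
```

The reader never scans for a closing delimiter, which is what makes the format "totally raw"; the cost is that the count has to be kept in step with every edit to the text.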

Petr Viktorin

6:17 a.m.

On 8/15/19 10:40 AM, Greg Ewing wrote:

...

If we want a truly raw string format that allows all characters, including any kind of quote, we could take a tip from Fortran:

s = 31HThis is a "totally raw" string!

Or from Rust:

let s = r"Here's a raw string";
let s = r#"Here's a raw string with "quotes" in it"#;
let s = r##"Here's r#"the raw string syntax"# in raw string"##;
let s = r###"and here's a '##"' as well"###;
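To make the delimiter rule in these examples concrete, here is a minimal Python sketch of a scanner for the Rust-style form: count the # characters after the r, then look for a closing quote followed by the same number of #. `scan_hash_raw` is a hypothetical name, and the sketch is inferred from the four examples above rather than taken from Rust's own lexer.

```python
def scan_hash_raw(source, pos=0):
    """Scan r"...", r#"..."#, r##"..."##, ... and return (text, end)."""
    if not source.startswith("r", pos):
        raise ValueError("expected 'r' at position %d" % pos)
    i = pos + 1
    # Count the run of '#' between the 'r' and the opening quote.
    while i < len(source) and source[i] == "#":
        i += 1
    hashes = i - (pos + 1)
    if i >= len(source) or source[i] != '"':
        raise ValueError("expected opening quote")
    # The literal ends at the first quote followed by the same number of '#'.
    closer = '"' + "#" * hashes
    end = source.find(closer, i + 1)
    if end < 0:
        raise ValueError("unterminated raw string")
    return source[i + 1:end], end + len(closer)


examples = [
    ('r"Here\'s a raw string"', "Here's a raw string"),
    ('r#"Here\'s a raw string with "quotes" in it"#',
     'Here\'s a raw string with "quotes" in it'),
    ('r##"Here\'s r#"the raw string syntax"# in raw string"##',
     'Here\'s r#"the raw string syntax"# in raw string'),
    ('r###"and here\'s a \'##"\' as well"###',
     'and here\'s a \'##"\' as well'),
]
for literal, expected in examples:
    text, end = scan_hash_raw(literal)
    assert text == expected and end == len(literal)
```

No character inside the literal is special; the writer only has to pick enough # characters that the closing sequence does not occur in the text.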

Glenn Linderman

12:49 p.m.

On 8/15/2019 4:17 AM, Petr Viktorin wrote:

...

On 8/15/19 10:40 AM, Greg Ewing wrote:

...
If we want a truly raw string format that allows all characters, including any kind of quote, we could take a tip from Fortran:

s = 31HThis is a "totally raw" string!

Or from Rust:

let s = r"Here's a raw string";
let s = r#"Here's a raw string with "quotes" in it"#;
let s = r##"Here's r#"the raw string syntax"# in raw string"##;
let s = r###"and here's a '##"' as well"###;

Indeed, Fortran has raw strings, but comes with the disadvantage of having to count characters. This is poor form when edits want to change the length of the string, although it might be advantageous if the string must fit into a certain fixed-width on a line printer. Let's not go there.

Without reading the Rust spec, but from your examples, it seems that Rust has borrowed concepts from Perl's q and qq operators, both of which allowed specification of any non-alphanumeric character as the delimiter. Not sure if that included Unicode characters (certainly not in the early days before Unicode support was added), but it did have a special case for paired characters such as <> [] {} to allow those pairs to be used as delimiters, and still allow properly nested instances of themselves inside the string. It looks like Rust might only allow #, but any number of them, to delimit raw strings. This is sufficient, but for overly complex raw strings containing lots of # character sequences, it could get cumbersome, and starts to border on the problems of the Fortran solution, where character counting is an issue, whereas the choice of an alternative character or character sequence would result in a simpler syntax.

I don't know if Rust permits implicit string concatenation, but a quick search convinces me it doesn't. The combination of Python's triple-quote string literal, together with implicit concatenation, is a powerful way to deal with extremely complex string literals, although it does require breaking them into pieces occasionally, mostly when including a string describing the triple-quote syntax. Note that regex searching for triple-quotes can use "{3} or '{3} to avoid the need to embed triple-quotes in the regex.

Perl's "choice of delimiter" syntax is maybe a bit more convenient sometimes, but makes parsing of long strings mentally exhausting (although it is quick for the interpreter), due to needing to remember what character is being used as the delimiter.
My proposal isn't intended to change the overall flavor of Python's string syntax, just to regularize and simplify it, while allowing additional escapes and other extensions to be added in the future, without backward-compatibility issues.

Rob Cliffe

1 p.m.

On 15/08/2019 12:17:36, Petr Viktorin wrote:

...

On 8/15/19 10:40 AM, Greg Ewing wrote:

...
If we want a truly raw string format that allows all characters, including any kind of quote, we could take a tip from Fortran:

s = 31HThis is a "totally raw" string!

Or from Rust:

let s = r"Here's a raw string";
let s = r#"Here's a raw string with "quotes" in it"#;
let s = r##"Here's r#"the raw string syntax"# in raw string"##;
let s = r###"and here's a '##"' as well"###;

I rather like the idea! (Even though it would add to the proliferation of string types.) Obviously Python can't use # as the special character since that introduces a comment, and a lot of other possibilities are excluded because they would lead to ambiguous syntax. Say for the sake of argument we used "!" (exclamation mark). Possible variations include:

(1) Like Rust:

s = r"Here's a raw string";
s = r!"Here's a raw string with "quotes" in it"!;
s = r!!"Here's r!"the raw string syntax"! in raw string"!!;
s = r!!!"and here's a '!!"' as well"!!!;

(2) Same, but omit the leading 'r' when using !:

s = r"Here's a raw string";
s = !"Here's a raw string with "quotes" in it"!;
s = !!"Here's a raw string with "quotes" and !exclamation marks! in it"!!;
s = !!!"and here's a '!!"' as well"!!!;
# Cons: Would conflict with adding ! as an operator (or at minimum, as a unary operator) for some other purpose in future.
# Makes it less obvious that a !string! is a raw string.

(3) Allow the user to specify his own delimiting character:

s = r!|This raw string can't contain a "bar".|

(4) As above, but the "!" is not required:

s = r|This raw string can't contain a "bar".|
# In this case the delimiter ought not to be a letter
# (it might conflict with current or future string prefixes);
# this could be forbidden.

(5) Similar, but allow the user to specify his own delimiting *string* (specified between "!"s) (as long as it doesn't contain !):

let s = r!?@!Could this string could contain almost anything? Yes!?@
# The text in this string is:
# Could this string could contain almost anything? Yes!

(6) Same except the first "!" is not required. In this case the first character of the delimiting string should not be a letter:

let s = r?@!Could this string could contain almost anything? Yes!?@
# The text in this string is:
# Could this string could contain almost anything? Yes!

I can dream ... A point about the current syntax: It is not true that a raw string can't end in a backslash, as https://en.wikipedia.org/wiki/String_literal points out. It can't end in an *odd number* of backslashes. 42 is fine, 43 is no good. Which makes it seem even more of a language wart (think of program-generated art). Rob Cliffe
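The even/odd rule is easy to check mechanically. A minimal sketch, assuming only the standard library; `raw_literal_value` is a hypothetical helper that builds the source text of a raw literal consisting of n backslashes and asks Python to evaluate it.

```python
import ast


def raw_literal_value(n_backslashes):
    """Evaluate r"<n backslashes>"; return None if it is a syntax error."""
    source = 'r"' + "\\" * n_backslashes + '"'
    try:
        return ast.literal_eval(source)
    except SyntaxError:
        return None


for n in range(6):
    value = raw_literal_value(n)
    if n % 2 == 0:
        # An even run is accepted and every backslash is kept.
        assert value == "\\" * n
    else:
        # An odd run leaves the final backslash escaping the closing quote.
        assert value is None
```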

Glenn Linderman

9 Aug

3:12 p.m.

On 8/9/2019 9:08 AM, Nick Coghlan wrote:

...

On Sat, 10 Aug 2019 at 01:44, Guido van Rossum <guido@python.org> wrote:

...
This discussion looks like there's no end in sight. Maybe the Steering Council should take a vote?

I find the "Our deprecation warnings were even less visible than normal" argument for extending the deprecation period compelling.

I also think the UX of the warning itself could be reviewed to provide a more explicit nudge towards using raw strings when folks want to allow arbitrary embedded backslashes. Consider:

SyntaxWarning: invalid escape sequence \,

vs something like:

SyntaxWarning: invalid escape sequence \, (Note: adding the raw string literal prefix, r, will accept all non-trailing backslashes)

After all, the habit we're trying to encourage is "If I want to include backslashes without escaping them all, I should use a raw string", not "I should memorize the entire set of valid escape sequences" or even "I should always escape backslashes".

Cheers, Nick.

The reason I never use raw strings is in the documentation, it is because \ still has a special meaning, and the first several times I felt the need for raw strings, it was for directory names that wanted to end with \ and couldn't. Quoted below. Also relevant to the discussion is the "benefit" of leaving the backslash in the result of an illegal escape, which no one has mentioned in this huge thread.

...

Unlike Standard C, all unrecognized escape sequences are left in the string unchanged, i.e., /the backslash is left in the result/. (This behavior is useful when debugging: if an escape sequence is mistyped, the resulting output is more easily recognized as broken.) It is also important to note that the escape sequences only recognized in string literals fall into the category of unrecognized escapes for bytes literals.

Changed in version 3.6: Unrecognized escape sequences produce a DeprecationWarning. In some future version of Python they will be a SyntaxError.

Even in a raw literal, quotes can be escaped with a backslash, but the backslash remains in the result; for example, |r"\""| is a valid string literal consisting of two characters: a backslash and a double quote; |r"\"| is not a valid string literal (even a raw string cannot end in an odd number of backslashes). Specifically, /a raw literal cannot end in a single backslash/ (since the backslash would escape the following quote character). Note also that a single backslash followed by a newline is interpreted as those two characters as part of the literal, /not/ as a line continuation.
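The documented behaviour quoted above can be observed without tripping the warning in one's own source file by evaluating the literal from text. A minimal sketch, assuming only the standard library; `evaluate_literal` is a hypothetical helper, and the exact warning category is deliberately not checked because it differs between Python versions.

```python
import ast
import warnings


def evaluate_literal(source):
    """Evaluate literal source text; return (value, recorded warnings)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        value = ast.literal_eval(source)
    return value, caught


# Unrecognized escape: the backslash is left in the result, with a warning.
value, caught = evaluate_literal(r"'more \latex markup'")
assert value == "more \\latex markup"
assert len(caught) >= 1

# Recognized escapes are converted silently: \t and \n here are a tab and
# a newline, which is the path mistake described at the top of the thread.
value, caught = evaluate_literal(r"'..\training\new_memo.doc'")
assert value == ".." + "\t" + "raining" + "\n" + "ew_memo.doc"

# In a raw literal the backslash before a quote stays in the result.
assert r"\"" == "\\" + '"'
assert len(r"\"") == 2
```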

Steven D'Aprano

4:53 p.m.

On Fri, Aug 09, 2019 at 01:12:59PM -0700, Glenn Linderman wrote:

...

The reason I never use raw strings is in the documentation, it is because \ still has a special meaning, and the first several times I felt the need for raw strings, it was for directory names that wanted to end with \ and couldn't.

Can you elaborate? I find it unlikely that I would ever want a docstring that ends with a backslash:

def func():
    r"""Documentation goes here...
    more documentation...
    ending with a Windows path that needs a trailing backslash
    like this C:\directory\"""

That seems horribly contrived. Why use backslashes in the path when the strong recommendation is to use forward slashes?

And why not solve the problem by simply moving the closing quotes to the next line, as PEP 8 recommends?

r"""Documentation ...
C:\directory\
"""

[...]

...

...
Even in a raw literal, quotes can be escaped with a backslash

Indeed, they're not so much "raw" strings as only slightly blanched strings. -- Steven

Glenn Linderman

5:18 p.m.

On 8/9/2019 2:53 PM, Steven D'Aprano wrote:

...

On Fri, Aug 09, 2019 at 01:12:59PM -0700, Glenn Linderman wrote:

...
The reason I never use raw strings is in the documentation, it is because \ still has a special meaning, and the first several times I felt the need for raw strings, it was for directory names that wanted to end with \ and couldn't.

Can you elaborate? I find it unlikely that I would ever want a docstring

I didn't mention docstring. I just wanted a string with a path name ending in \.

...

that ends with a backslash:

def func():
    r"""Documentation goes here...
    more documentation...
    ending with a Windows path that needs a trailing backslash
    like this C:\directory\"""

That seems horribly contrived. Why use backslashes in the path when the strong recommendation is to use forward slashes?

Windows users are used to seeing backslashes in paths, I don't care to be the one to explain why my program uses / and all the rest use \.

...

And why not solve the problem by simply moving the closing quotes to the next line, as PEP 8 recommends?

r"""Documentation ... C:\directory\ """

This isn't my problem, I wasn't using docstrings, and including a newline in a path name doesn't work. I suppose one could "solve" the problem by using "c:\directory\ "[ :-1] but that is just as annoying as "c:\\directory\\" and back when I discovered the problem, I was still learning Python, and didn't think of the above solution either.
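For comparison, the spellings discussed in this exchange all produce the same string. A minimal sketch; the variable names are illustrative only, and the third form relies on implicit concatenation of adjacent string literals.

```python
# Ordinary literal with every backslash doubled.
doubled = "c:\\directory\\"

# Raw literal with a sacrificial trailing space that is sliced off again.
sliced = r"c:\directory\ "[:-1]

# Raw literal for the body, joined to an ordinary literal for the final \.
concatenated = r"c:\directory" "\\"

assert doubled == sliced == concatenated
assert doubled.endswith("\\") and len(doubled) == 13
```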

...

[...]

...
...
Even in a raw literal, quotes can be escaped with a backslash

Indeed, they're not so much "raw" strings as only slightly blanched strings.

Steven D'Aprano

5:56 p.m.

I'm not trying to be confrontational, I'm trying to understand your use-case(s) and see if it would be broken by the planned change to string escapes.

On Fri, Aug 09, 2019 at 03:18:29PM -0700, Glenn Linderman wrote:

...

On 8/9/2019 2:53 PM, Steven D'Aprano wrote:

...
On Fri, Aug 09, 2019 at 01:12:59PM -0700, Glenn Linderman wrote:

...
The reason I never use raw strings is in the documentation, it is because \ still has a special meaning, and the first several times I felt the need for raw strings, it was for directory names that wanted to end with \ and couldn't.

Can you elaborate? I find it unlikely that I would ever want a docstring

I didn't mention docstring. I just wanted a string with a path name ending in \.

You said you never used raw strings in the documentation. I read that as doc strings. What sort of documentation are you writing that isn't a doc string but is inside your .py files where the difference between raw and regular strings is meaningful?

...

Windows users are used to seeing backslashes in paths, I don't care to be the one to explain why my program uses / and all the rest use \.

If you don't use raw strings for paths, you get to explain why your program uses \\ and all the rest use \ *wink*

If they're Windows end users, they won't be reading your source code and will never know how you represent hard-coded paths in the source code.

If they're Windows developers, they ought to be aware that the Windows file system API allows / anywhere you can use \ and it is the common convention in Python to use forward slashes.

I'm also curious why the string needs to *end* with a backslash. Both of these are the same path:

C:\foo\bar\baz\
C:\foo\bar\baz

-- Steven

MRAB

6:08 p.m.

On 2019-08-09 23:56, Steven D'Aprano wrote:

...

I'm not trying to be confrontational, I'm trying to understand your use-case(s) and see if it would be broken by the planned change to string escapes.

On Fri, Aug 09, 2019 at 03:18:29PM -0700, Glenn Linderman wrote:

...
On 8/9/2019 2:53 PM, Steven D'Aprano wrote:

...
On Fri, Aug 09, 2019 at 01:12:59PM -0700, Glenn Linderman wrote:

...
The reason I never use raw strings is in the documentation, it is because \ still has a special meaning, and the first several times I felt the need for raw strings, it was for directory names that wanted to end with \ and couldn't.

Can you elaborate? I find it unlikely that I would ever want a docstring

I didn't mention docstring. I just wanted a string with a path name ending in \.

You said you never used raw strings in the documentation. I read that as doc strings. What sort of documentation are you writing that isn't a doc string but is inside your .py files where the difference between raw and regular strings is meaningful?

...
Windows users are used to seeing backslashes in paths, I don't care to be the one to explain why my program uses / and all the rest use \.

If you don't use raw strings for paths, you get to explain why your program uses \\ and all the rest use \ *wink*

If they're Windows end users, they won't be reading your source code and will never know how you represent hard-coded paths in the source code.

If they're Windows developers, they ought to be aware that the Windows file system API allows / anywhere you can use \ and it is the common convention in Python to use forward slashes.

I'm also curious why the string needs to *end* with a backslash. Both of these are the same path:

C:\foo\bar\baz\ C:\foo\bar\baz

The only time it's required is for the root directory of a drive: C:\

Glenn Linderman

9:30 p.m.

On 8/9/2019 4:08 PM, MRAB wrote:

...

On 2019-08-09 23:56, Steven D'Aprano wrote:

...
I'm not trying to be confrontational, I'm trying to understand your use-case(s) and see if it would be broken by the planned change to string escapes.

On Fri, Aug 09, 2019 at 03:18:29PM -0700, Glenn Linderman wrote:

...
On 8/9/2019 2:53 PM, Steven D'Aprano wrote:

...
On Fri, Aug 09, 2019 at 01:12:59PM -0700, Glenn Linderman wrote:

...
The reason I never use raw strings is in the documentation, it is because \ still has a special meaning, and the first several times I felt the need for raw strings, it was for directory names that wanted to end with \ and couldn't.

Can you elaborate? I find it unlikely that I would ever want a docstring

I didn't mention docstring. I just wanted a string with a path name ending in \.

You said you never used raw strings in the documentation. I read that as doc strings. What sort of documentation are you writing that isn't a doc string but is inside your .py files where the difference between raw and regular strings is meaningful?

...
Windows users are used to seeing backslashes in paths, I don't care to be the one to explain why my program uses / and all the rest use \.

If you don't use raw strings for paths, you get to explain why your program uses \\ and all the rest use \ *wink*

If they're Windows end users, they won't be reading your source code and will never know how you represent hard-coded paths in the source code.

If they're Windows developers, they ought to be aware that the Windows file system API allows / anywhere you can use \ and it is the common convention in Python to use forward slashes.

I'm also curious why the string needs to *end* with a backslash. Both of these are the same path:

C:\foo\bar\baz\ C:\foo\bar\baz

The only time it's required is for the root directory of a drive:

C:\

That's not the only time it's required, but it is a case that is far harder to specify in other ways. It's required any time you want to say + filename without writing + "\\" + filename, or os.path.join( "C:\\", filename )

Glenn Linderman

9:27 p.m.

On 8/9/2019 3:56 PM, Steven D'Aprano wrote:

...

I'm not trying to be confrontational, I'm trying to understand your use-case(s) and see if it would be broken by the planned change to string escapes.

Yeah, that's fine. Sometimes it is hard to communicate via email (versus saying a lot).

...

On Fri, Aug 09, 2019 at 03:18:29PM -0700, Glenn Linderman wrote:

...
On 8/9/2019 2:53 PM, Steven D'Aprano wrote:

...
On Fri, Aug 09, 2019 at 01:12:59PM -0700, Glenn Linderman wrote:

...
The reason I never use raw strings is in the documentation, it is because \ still has a special meaning, and the first several times I felt the need for raw strings, it was for directory names that wanted to end with \ and couldn't.

Can you elaborate? I find it unlikely that I would ever want a docstring

I didn't mention docstring. I just wanted a string with a path name ending in \.

You said you never used raw strings in the documentation. I read that as doc strings. What sort of documentation are you writing that isn't a doc string but is inside your .py files where the difference between raw and regular strings is meaningful?

No, what I said was that the reason is in the documentation. The reason that I don't use raw strings is in the Python documentation. I don't claim to use raw strings for documentation I write. The reason is because \" to end the string doesn't work, and the first good-sounding justification for using raw strings that I stumbled across was to avoid "c:\\directory\\" in favor of r"c:\directory\" but that doesn't work, and neither does r"c:\directory\\". Since then, I have not found any other compelling need for raw strings that overcome that deficiency... the benefit of raw strings is that you don't have to double the \\. But the benefit is contradicted by not being able to use one at the end of the string. If you can't use it at the end of the string, the utility of not doubling them in the middle of the string is just too confusing to make it worth figuring out the workarounds when you have a string full of \ that happens to end in \. Just easier to remember the "always double \" rule, than to remember the extra "but if your string containing \ doesn't have one at the end you can get away with using a raw string and not doubling the \."

...

...
Windows users are used to seeing backslashes in paths, I don't care to be the one to explain why my program uses / and all the rest use \.

If you don't use raw strings for paths, you get to explain why your program uses \\ and all the rest use \ *wink*

Wrong. Users don't look at the source code. They look at the output. I also don't want to have to write code to convert /-laden paths to \-laden paths when I display them to the user.

...

If they're Windows end users, they won't be reading your source code and will never know how you represent hard-coded paths in the source code.

They will if I display the path as a default value for an argument, or show them the path for other reasons, or if the path shows up in an exception message.

...

If they're Windows developers, they ought to be aware that the Windows file system API allows / anywhere you can use \ and it is the common convention in Python to use forward slashes.

This, we can agree on.

...

I'm also curious why the string needs to *end* with a backslash. Both of these are the same path:

C:\foo\bar\baz\ C:\foo\bar\baz

Sure. But only one of them can be used successfully with + filename (for example).
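A minimal sketch of the difference being described, using `ntpath` so that the Windows rules apply on any platform. The directory and file names are placeholders for illustration.

```python
import ntpath

filename = "memo.doc"  # placeholder name
with_sep = "C:\\foo\\bar\\baz\\"
without_sep = "C:\\foo\\bar\\baz"

# Plain concatenation only works when the directory ends in a backslash.
assert with_sep + filename == "C:\\foo\\bar\\baz\\memo.doc"
assert without_sep + filename == "C:\\foo\\bar\\bazmemo.doc"

# ntpath.join inserts the separator when it is missing.
assert ntpath.join(with_sep, filename) == ntpath.join(without_sep, filename)

# The drive root is the case where the trailing backslash changes meaning:
# "C:\" is the root directory, while "C:" alone is drive-relative.
assert ntpath.join("C:\\", filename) == "C:\\memo.doc"
assert ntpath.join("C:", filename) == "C:memo.doc"
assert ntpath.isabs("C:\\") and not ntpath.isabs("C:")
```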

eryk sun

10 Aug

5:50 a.m.

On 8/9/19, Steven D'Aprano <steve@pearwood.info> wrote:

...

I'm also curious why the string needs to *end* with a backslash. Both of these are the same path:

C:\foo\bar\baz\ C:\foo\bar\baz

The above two cases are equivalent. But that's not the case for the root directory. Unlike Unix, filesystem namespaces are implemented directly on devices. For example, "//./C:" might resolve to a volume device such as "\\Device\\HarddiskVolume2". With a trailing slash added, "//./C:/" resolves to "\\Device\\HarddiskVolume2\\", which is the root directory of the mounted filesystem on the volume.

Also, as a classic DOS path, "C:" without a trailing slash expands to the working directory on drive "C:". The system runtime library looks for this path in a hidden environment variable named "=C:". The Windows API never sets these hidden "=X:" drive variables. The C runtime sets them, as does Python's os.chdir.

Some volume-management functions require a trailing slash or backslash, such as GetVolumeInformationW [1]. GetVolumeNameForVolumeMountPointW [2] actually requires it to be a trailing backslash. It will not accept a trailing forward slash such as "C:\\Mount\\Volume/" (a bug since Windows 2000). The volume name (e.g. "\\\\?\\Volume{xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}\\") returned by the latter includes a trailing backslash, which must be present in the target path in order for a mountpoint to function properly as a directory, else it would resolve to the volume device instead of the root directory.

[1] https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-getvol...
[2] https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-getvol...

...

If they're Windows developers, they ought to be aware that the Windows file system API allows / anywhere you can use \ and it is the common convention in Python to use forward slashes.

The Windows file API actually does not allow slash to be used anywhere that we can use backslash. It's usually allowed, but not always. For the most part, the conditions where forward slash is not supported are intentional. Windows replaces forward slash with backslash in normal DOS paths and normal device paths. But sometimes we have to use a special form of device path that bypasses normalization. A path that isn't normalized can only use backslash as the path separator.

For example, the most common case is that the process doesn't have long paths enabled. In this case we're limited to MAX_PATH, which limits file paths to a paltry 259 characters (sans the terminating null); the current directory to 258 characters (sans a trailing backslash and null); and the path of a new directory to 247 characters (subtract 12 from 259 to leave space for an 8.3 filename). By skipping DOS normalization, we can access a path with up to about 32,750 characters (i.e. 32,767 sans the length of the device name in the final NT path under "\\Device\\").

(Long normalized paths are available starting in Windows 10, but the system policy that allows this is disabled by default, and even if enabled, each application has to declare itself to be long-path aware in its manifest. This is declared for python[w].exe in Python 3.6+.)

A device path is an explicit reference to a user's local device directory (in the object namespace), which shadows the global device directory. In NT, this directory is aliased to a special "\\??\\" prefix (backslash only). A local device directory is created for each logon session (not terminal session) by the security system that runs in terminal session 0 (i.e. the system services session). The per-logon directory is located at "\\Sessions\\0\\DosDevices\\<Logon Session ID>". In the Windows API, it's accessible as "//?/" or "//./", or with any mix of forward slashes or backslashes, but only the all-backslash form is special-cased to bypass the normalization step.
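The length limits above follow from simple arithmetic, and the backslash-only requirement can be shown with a small helper. A minimal sketch under stated assumptions: `to_extended_path` is a hypothetical function that handles only drive-letter paths (UNC paths are rejected), and it uses `ntpath` so that it behaves the same on any platform.

```python
import ntpath

MAX_PATH = 260                     # Windows constant, includes the null
MAX_FILE_PATH = MAX_PATH - 1       # 259 characters, sans terminating null
MAX_CWD = MAX_FILE_PATH - 1        # 258, room for a trailing backslash
MAX_NEW_DIR = MAX_FILE_PATH - 12   # 247, room for an 8.3 filename

assert (MAX_FILE_PATH, MAX_CWD, MAX_NEW_DIR) == (259, 258, 247)


def to_extended_path(path):
    r"""Return path in the \\?\ form, converting / to \ first.

    The prefixed form is not normalized by the Windows API, so every
    separator must already be a backslash before the prefix is added.
    """
    normalized = ntpath.normpath(path)
    drive, tail = ntpath.splitdrive(normalized)
    if len(drive) != 2 or drive[1] != ":" or not tail.startswith("\\"):
        raise ValueError("expected an absolute drive-letter path: %r" % path)
    return "\\\\?\\" + normalized


assert to_extended_path("C:/Temp/spam.txt") == r"\\?\C:\Temp\spam.txt"
```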

eryk sun

6:34 a.m.

On 8/10/19, eryk sun <eryksun@gmail.com> wrote:

...

The per-logon directory is located at "\\Sessions\\0\\DosDevices\\<Logon Session ID>". In the Windows API, it's accessible as "//?/" or "//./", or with any mix of forward slashes or backslashes, but only the all-backslash form is special-cased to bypass the normalization step.

Correction: I slipped up in that last sentence. Only the all-backslash form that's in the "?" namespace bypasses normalization, as most Windows users should at least have seen in passing. These special device paths pop up here and there. For example, r'\\?\C:\Temp\spam. . .' allows creating or opening a file named "spam. . .", which the Windows API would normalize as "spam". But I don't recommend sidestepping the normal rules -- except for the path length limit because there are ways to make long paths conveniently accessible (e.g. symbolic links, bind-like mountpoints, and subst drives). Sometimes people also come across "\\??\\" paths and come to the mistaken conclusion that these can be used in Windows API programs. No, they're for NT. The runtime library mangles them, e.g. nt._getfullpathname(r'\??\C:') == 'C:\\??\\C:'.
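As a rough illustration of the prefix rule described above, here is a minimal sketch that builds the all-backslash "\\?\" form from an ordinary absolute path. The helper name to_extended_path and its error messages are hypothetical, not part of any library; it only manipulates strings with the standard ntpath module and does not touch the file system, so it says nothing about whether Windows will actually open the result.

```python
import ntpath  # Windows path rules; importable on any platform

EXTENDED_PREFIX = "\\\\?\\"           # the all-backslash \\?\ prefix
EXTENDED_UNC_PREFIX = "\\\\?\\UNC\\"  # \\?\UNC\ prefix for network shares


def to_extended_path(path):
    """Hypothetical helper: return a \\\\?\\-prefixed form of an absolute path."""
    if path.startswith(EXTENDED_PREFIX):
        return path  # already in the form that skips normalization
    # A path that isn't normalized can only use backslash, so convert "/" first.
    backslashed = path.replace("/", "\\")
    if backslashed.startswith(("\\\\?\\", "\\\\.\\")):
        # Mixed-slash or \\.\ device paths are deliberately out of scope here.
        raise ValueError("device paths are not handled by this sketch: %r" % path)
    normalized = ntpath.normpath(backslashed)
    if normalized.startswith("\\\\"):
        # UNC path: \\server\share\... becomes \\?\UNC\server\share\...
        return EXTENDED_UNC_PREFIX + normalized[2:]
    if not ntpath.isabs(normalized):
        raise ValueError("an absolute path is required: %r" % path)
    return EXTENDED_PREFIX + normalized


print(to_extended_path("C:/Temp/spam.txt"))  # \\?\C:\Temp\spam.txt
```

Doing the slash conversion before adding the prefix matters because, per the message above, only the all-backslash form bypasses normalization.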

Rob Cliffe

1:24 p.m.

On 10/08/2019 11:50:35, eryk sun wrote:

...

...
On 8/9/19, Steven D'Aprano <steve@pearwood.info> wrote:

I'm also curious why the string needs to *end* with a backslash. Both of these are the same path:

C:\foo\bar\baz\
C:\foo\bar\baz

Also, the former is simply more *informative* - it tells the reader that baz is expected to be a directory, not a file.
Rob Cliffe

The above two cases are equivalent. But that's not the case for the root directory. Unlike Unix, filesystem namespaces are implemented directly on devices. For example, "//./C:" might resolve to a volume device such as "\\Device\\HarddiskVolume2". With a trailing slash added, "//./C:/" resolves to "\\Device\\HarddiskVolume2\\", which is the root directory of the mounted filesystem on the volume.

Also, as a classic DOS path, "C:" without a trailing slash expands to the working directory on drive "C:". The system runtime library looks for this path in a hidden environment variable named "=C:". The Windows API never sets these hidden "=X:" drive variables. The C runtime sets them, as does Python's os.chdir.

Some volume-management functions require a trailing slash or backslash, such as GetVolumeInformationW [1]. GetVolumeNameForVolumeMountPointW [2] actually requires it to be a trailing backslash. It will not accept a trailing forward slash such as "C:\\Mount\\Volume/" (a bug since Windows 2000). The volume name (e.g. "\\\\?\\Volume{xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}\\") returned by the latter includes a trailing backslash, which must be present in the target path in order for a mountpoint to function properly as a directory, else it would resolve to the volume device instead of the root directory.

[1] https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-getvol... [2] https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-getvol...

...
If they're Windows developers, they ought to be aware that the Windows file system API allows / anywhere you can use \ and it is the common convention in Python to use forward slashes.

The Windows file API actually does not allow slash to be used anywhere that we can use backslash. It's usually allowed, but not always. For the most part, the conditions where forward slash is not supported are intentional.

Windows replaces forward slash with backslash in normal DOS paths and normal device paths. But sometimes we have to use a special form of device path that bypasses normalization. A path that isn't normalized can only use backslash as the path separator. For example, the most common case is that the process doesn't have long paths enabled. In this case we're limited to MAX_PATH, which limits file paths to a paltry 259 characters (sans the terminating null); the current directory to 258 characters (sans a trailing backslash and null); and the path of a new directory to 247 characters (subtract 12 from 259 to leave space for an 8.3 filename). By skipping DOS normalization, we can access a path with up to about 32,750 characters (i.e. 32,767 sans the length of the device name in the final NT path under "\\Device\\").

(Long normalized paths are available starting in Windows 10, but the system policy that allows this is disabled by default, and even if enabled, each application has to declare itself to be long-path aware in its manifest. This is declared for python[w].exe in Python 3.6+.)

A device path is an explicit reference to a user's local device directory (in the object namespace), which shadows the global device directory. In NT, this directory is aliased to a special "\\??\\" prefix (backslash only). A local device directory is created for each logon session (not terminal session) by the security system that runs in terminal session 0 (i.e. the system services session). The per-logon directory is located at "\\Sessions\\0\\DosDevices\\<Logon Session ID>". In the Windows API, it's accessible as "//?/" or "//./", or with any mix of forward slashes or backslashes, but only the all-backslash form is special-cased to bypass the normalization step.

_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/3SDFM2EK...


eryk sun

3:44 p.m.

On 8/10/19, Rob Cliffe via Python-Dev <python-dev@python.org> wrote:

...

On 10/08/2019 11:50:35, eryk sun wrote:

...
On 8/9/19, Steven D'Aprano <steve@pearwood.info> wrote:

...
I'm also curious why the string needs to *end* with a backslash. Both of these are the same path:

C:\foo\bar\baz\ C:\foo\bar\baz

Also, the former is simply more *informative* - it tells the reader that baz is expected to be a directory, not a file.

This is an important point that I overlooked. The trailing backslash is more than just a redundant character to inform human readers.

Refer to [MS-FSA] 2.1.5.1 "Server Requests an Open of a File" [1]. A create/open fails with STATUS_OBJECT_NAME_INVALID if either of the following is true:

    * PathName contains a trailing backslash and CreateOptions.FILE_NON_DIRECTORY_FILE is TRUE.
    * PathName contains a trailing backslash and StreamTypeToOpen is DataStream

For NtCreateFile or NtOpenFile (in the NT API), the FILE_NON_DIRECTORY_FILE option restricts the call to a regular file, and FILE_DIRECTORY_FILE restricts it to a directory. With neither option, the call can target either a file or directory.

A trailing backslash is another information channel. It tells the filesystem that the target has to be a directory. If we specify FILE_NON_DIRECTORY_FILE with a trailing backslash on the name, this is an immediate failure as an invalid name without even checking the entry. If we specify neither option and use a trailing backslash, it's an invalid name if the filesystem finds a regular file or data stream. Had the call specified the FILE_DIRECTORY_FILE option, it would instead fail with STATUS_NOT_A_DIRECTORY.

We can see this in practice in the published source for the fastfat filesystem driver. FatCommonCreate [2] (for a create or open) has the following code to handle the second case (in this code, an FCB is a file control block for a regular file, and a DCB is a directory control block):

    if (NodeType(Fcb) == FAT_NTC_FCB) {

        //
        //  Check if we were only to open a directory
        //

        if (OpenDirectory) {
            DebugTrace(0, Dbg, "Cannot open file as directory\n", 0);
            try_return( Iosb.Status = STATUS_NOT_A_DIRECTORY );
        }

        DebugTrace(0, Dbg, "Open existing fcb, Fcb = %p\n", Fcb);

        if ( TrailingBackslash ) {
            try_return( Iosb.Status = STATUS_OBJECT_NAME_INVALID );
        }

We observe the first case with a typical CreateFileW call, which uses the option FILE_NON_DIRECTORY_FILE. In the following example "baz" is a regular file:

    >>> f = open(r'foo\bar\baz') # success
    >>> try: open('foo\\bar\\baz\\')
    ... except OSError as e: print(e)
    ...
    [Errno 22] Invalid argument: 'foo\\bar\\baz\\'

C EINVAL (22) is mapped from Windows ERROR_INVALID_NAME (123), which is mapped from NT STATUS_OBJECT_NAME_INVALID (0xC0000033).

We can observe the second case with os.stat(), which calls CreateFileW with backup semantics, which omits the FILE_NON_DIRECTORY_FILE option in order to allow the call to open either a file or directory. In this case the filesystem has to actually check that "baz" is a data file before it can fail the call, as was shown in the fastfat code snippet above:

    >>> try: os.stat('foo\\bar\\baz\\')
    ... except OSError as e: print(e)
    ...
    [WinError 123] The filename, directory name, or volume label syntax is incorrect: 'foo\\bar\\baz\\'

[1] https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-fsa/8ada5fbe...
[2] https://github.com/microsoft/Windows-driver-samples/blob/74200/filesys/fastf...
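The rules in this message can be summarised as a small decision function. The following is only a sketch of the logic as described here, not of any real API: the function open_status and its parameter names are made up for illustration, and the STATUS_FILE_IS_A_DIRECTORY branch is an assumption added for completeness, since the message does not discuss opening a directory with FILE_NON_DIRECTORY_FILE.

```python
def open_status(target_is_directory, trailing_backslash,
                directory_file=False, non_directory_file=False):
    """Return the NT status name this simplified model predicts for an open."""
    if trailing_backslash and non_directory_file:
        # First case: rejected as an invalid name without checking the entry.
        return "STATUS_OBJECT_NAME_INVALID"
    if not target_is_directory:
        # Same order as the fastfat excerpt: directory-only check first,
        # then the trailing-backslash check.
        if directory_file:
            return "STATUS_NOT_A_DIRECTORY"
        if trailing_backslash:
            return "STATUS_OBJECT_NAME_INVALID"
        return "STATUS_SUCCESS"
    if non_directory_file:
        # Assumption: this branch is not covered by the message above.
        return "STATUS_FILE_IS_A_DIRECTORY"
    return "STATUS_SUCCESS"


# open('foo\\bar\\baz\\') on a regular file: CreateFileW passes FILE_NON_DIRECTORY_FILE.
print(open_status(False, True, non_directory_file=True))
# os.stat('foo\\bar\\baz\\') on a regular file: neither option is set.
print(open_status(False, True))
```

Both calls model the failures shown in the two interpreter sessions; the difference is only whether the filesystem has to look at the entry before rejecting the name.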

Greg Ewing

5:30 p.m.

Rob Cliffe via Python-Dev wrote:

...

Also, the former is simply more *informative* - it tells the reader that baz is expected to be a directory, not a file.

On Windows you can usually tell that from the fact that filenames almost always have an extension, and directory names almost never do. -- Greg

Rob Cliffe

9:30 p.m.

On 10/08/2019 23:30:18, Greg Ewing wrote:

...

Rob Cliffe via Python-Dev wrote:

...
Also, the former is simply more *informative* - it tells the reader that baz is expected to be a directory, not a file.

On Windows you can usually tell that from the fact that filenames almost always have an extension, and directory names almost never do.

Usually, but not always. I have not infrequently used files with a blank extension. I can't recall using a directory name with an extension (but I can't swear that I never have). Rob Cliffe

Eric V. Smith

11 Aug 11 Aug

4:44 a.m.

On 8/10/2019 10:30 PM, Rob Cliffe via Python-Dev wrote:

...

On 10/08/2019 23:30:18, Greg Ewing wrote:

...
Rob Cliffe via Python-Dev wrote:

...
Also, the former is simply more *informative* - it tells the reader that baz is expected to be a directory, not a file.

On Windows you can usually tell that from the fact that filenames almost always have an extension, and directory names almost never do.

Usually, but not always. I have not infrequently used files with a blank extension. I can't recall using a directory name with an extension (but I can't swear that I never have).

I most commonly see this with bare git repositories <reponame>.git. And I've created directory names with "extensions" for my own use. Eric

Paul Moore

5:15 a.m.

On Sun, 11 Aug 2019 at 03:37, Rob Cliffe via Python-Dev <python-dev@python.org> wrote:

...

Usually, but not always. I have not infrequently used files with a blank extension. I can't recall using a directory name with an extension (but I can't swear that I never have).

I've often seen directory names like "1. Overview" on Windows. Technically, " Overview" would be the extension here. Of course, that's a silly example, but the point is that there's a difference between what's clear to a human and what's clear to a computer... Paul

Glenn Linderman

10 Aug 10 Aug

6:46 p.m.

New subject: An f-string issue [Was: Re: Re: What to do about invalid escape sequences]

Because of the "invalid escape sequence" and "raw string" discussion, when looking at the documentation, I also noticed the following description for f-strings:

...

Escape sequences are decoded like in ordinary string literals (except when a literal is also marked as a raw string). After decoding, the grammar for the contents of the string is:

followed by lots of stuff, followed by

Backslashes are not allowed in format expressions and will raise an error:

    f"newline: {ord('\n')}" # raises SyntaxError

What I don't understand is how, if f-strings are processed AS DESCRIBED, the \n is ever seen by the format expression.

The description is that they are first decoded like ordinary strings, and then parsed for the internal grammar containing {} expressions to be expanded. If that were true, the \n in the above example would already be a newline character, and the parsing of the format expression would not see the backslash. And if it were true, that would actually be far more useful for this situation.

So given that it is not true, why not? And why go to the extra work of prohibiting \ in the format expressions?

PEP 498, of course, has an apparently more accurate description, that the {} parsing actually happens before the escape processing. Perhaps this avoids making multiple passes over the string to do the work, as the literal pieces and format expression pieces have to be separate in the generated code, but that is just my speculation: I'd like to know the real reason.

Should the documentation be fixed to make the description more accurate? If so, I'd be glad to open an issue.

The PEP further contains the inaccurate statement:

...

Like all raw strings in Python, no escape processing is done for raw f-strings:

not mentioning the actual escape processing that is done for raw strings, regarding \" and \'.

Eric V. Smith

7:22 p.m.

New subject: An f-string issue [Was: Re: Re: What to do about invalid escape sequences]

On 8/10/2019 7:46 PM, Glenn Linderman wrote:

...

Because of the "invalid escape sequence" and "raw string" discussion, when looking at the documentation, I also noticed the following description for f-strings:

...
Escape sequences are decoded like in ordinary string literals (except when a literal is also marked as a raw string). After decoding, the grammar for the contents of the string is:

followed by lots of stuff, followed by

Backslashes are not allowed in format expressions and will raise an error:

    f"newline: {ord('\n')}" # raises SyntaxError

What I don't understand is how, if f-strings are processed AS DESCRIBED, how the \n is ever seen by the format expression.

If I recall correctly, the mentioned decoding is happening on the string literal parts of the f-strings (above, the "newline: " part), not the expression parts (inside the {}). But it's been a while and I don't recall all of the details.

...

The description is that they are first decoded like ordinary strings, and then parsed for the internal grammar containing {} expressions to be expanded. If that were true, the \n in the above example would already be a newline character, and the parsing of the format expression would not see the backslash. And if it were true, that would actually be far more useful for this situation.

So given that it is not true, why not? And why go to the extra work of prohibiting \ in the format expressions?

It's a future-proofing thing. See the discussion at https://mail.python.org/archives/list/python-dev@python.org/thread/EVXD72IYU... It has pointers to other parts of the discussion. At some point, I'm planning on switching the parsing of f-strings from the custom parser (see Python/ast.c, FstringParser_ConcatFstring()) to having the python parser itself parse the f-strings. This will be similar to PEP 536, which doesn't have much detail, but does describe some of the motivations.

...

The PEP 498, of course, has an apparently more accurate description, that the {} parsing actually happens before the escape processing. Perhaps this avoids making multiple passes over the string to do the work, as the literal pieces and format expression pieces have to be separate in the generated code, but that is just my speculation: I'd like to know the real reason.

Should the documentation be fixed to make the description more accurate? If so, I'd be glad to open an issue.

Sure. I'm always in favor of accuracy. The f-string documentation was a last-minute rush job that could have used a lot more editing, and more eyes are always welcome. But it will take a fair amount of research to understand it well enough to document it in more detail.

...

The PEP further contains the inaccurate statement:

...
Like all raw strings in Python, no escape processing is done for raw f-strings:

not mentioning the actual escape processing that is done for raw strings, regarding \" and \'.

It should probably just say it uses the same rules as raw strings. Eric
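For readers hitting this restriction in practice, the usual way around it is to move the value that needs a backslash out of the replacement field. The sketch below shows that, plus a small probe that asks the running interpreter whether it compiles a backslash inside {} at all; the function name accepts_backslash_in_expression is made up for this example, and the probe deliberately makes no claim about which versions return which answer.

```python
# Hoist the character that needs a backslash out of the {} expression.
newline = "\n"
message = f"newline: {ord(newline)}"
print(message)  # newline: 10


def accepts_backslash_in_expression():
    """Report whether this interpreter compiles a backslash inside {}."""
    # Raw triple-quoted text, so the probe source contains a literal \n pair.
    source = r'''f"newline: {ord('\n')}"'''
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


print(accepts_backslash_in_expression())
```

The hoisted form behaves the same whichever answer the probe gives, which is why it is the safer spelling for code that has to run on more than one Python version.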

Greg Ewing

7:32 p.m.

New subject: An f-string issue [Was: Re: Re: What to do about invalid escape sequences]

Glenn Linderman wrote:

...

If that were true, the \n in the above example would already be a newline character, and the parsing of the format expression would not see the backslash. And if it were true, that would actually be far more useful for this situation.

But then it would fail for a different reason -- the same reason that this is a syntax error:

    'hello
    world'

...

Why go to the extra work of prohibiting \ in the format expressions?

Maybe to avoid problems like the above? Or maybe because it would be confusing -- there are two levels of string literal processing going on, one on the outer f-string and one on the embedded string literal in the expression. What level is the backslash expansion done in? Is it done in both? To get a backslash in the embedded string, do I need two backslashes or four? Banning backslashes altogether sidesteps all these issues.

...

not mentioning the actual escape processing that is done for raw strings, regarding \" and \'.

Technically that's not part of "escape processing", since it takes place during lexical analysis -- it has to, because it affects how the input stream is divided into tokens. However, the backslash prohibition seems to apply even to this use in f-strings:

    >>> f"quote: {ord('\"')}"
      File "<stdin>", line 1
    SyntaxError: f-string expression part cannot include a backslash

So it seems that f-strings are even more special than r-strings when it comes to the treatment of backslashes. -- Greg
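The raw-string behaviour mentioned here is easy to check directly. A short sketch: in a raw string a backslash still stops the following quote from ending the literal, but the backslash itself stays in the value, and a raw string therefore cannot end in a single backslash. The variable names are only for illustration.

```python
# The backslash keeps the quote from ending the literal,
# yet both characters end up in the resulting string.
quote_in_raw = r'\''
assert len(quote_in_raw) == 2
assert quote_in_raw == "\\'"

# A raw string cannot end with a lone backslash: the source text r'\'
# is seen by the tokenizer as an unterminated literal.
try:
    compile("r'\\'", "<example>", "eval")
except SyntaxError:
    lone_backslash_compiles = False
else:
    lone_backslash_compiles = True

print(lone_backslash_compiles)  # False
```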

Glenn Linderman

8:08 p.m.

New subject: An f-string issue [Was: Re: Re: What to do about invalid escape sequences]

On 8/10/2019 5:32 PM, Greg Ewing wrote:

...

Glenn Linderman wrote:

...
If that were true, the \n in the above example would already be a newline character, and the parsing of the format expression would not see the backslash. And if it were true, that would actually be far more useful for this situation.

But then it would fail for a different reason -- the same reason that this is a syntax error:

    'hello
    world'

Would it really? Or would it, because it has already been lexed and parsed as string content by then, simply be treated as a new line that is part of the string? just like "hello\nworld" is treated after it is lexed and parsed? Of course, if it is passed back through the parser again, you would be correct. I don't know the internals that apply here. Anyway, Eric supplied the real reasons for the limitation, but it does seem like if it would be passed back through the "real" parser, that the real parser would have no problem handling the ord('\n') part of f"newline: {ord('\n')}" if it weren't prohibited by prechecking for \ and making it illegal. But there is also presently a custom parser involved, so whether the \ check is in there or in a preprocessing step before the parser, I don't know.

Random832

14 Aug 14 Aug

10:09 a.m.

New subject: An f-string issue [Was: Re: Re: What to do about invalid escape sequences]

On Sat, Aug 10, 2019, at 19:54, Glenn Linderman wrote:

...

Because of the "invalid escape sequence" and "raw string" discussion, when looking at the documentation, I also noticed the following description for f-strings:

...
Escape sequences are decoded like in ordinary string literals (except when a literal is also marked as a raw string). After decoding, the grammar for the contents of the string is:

followed by lots of stuff, followed by

Backslashes are not allowed in format expressions and will raise an error:

    f"newline: {ord('\n')}" # raises SyntaxError

What I don't understand is how, if f-strings are processed AS DESCRIBED, how the \n is ever seen by the format expression.

The description is that they are first decoded like ordinary strings, and then parsed for the internal grammar containing {} expressions to be expanded. If that were true, the \n in the above example would already be a newline character, and the parsing of the format expression would not see the backslash. And if it were true, that would actually be far more useful for this situation.

So given that it is not true, why not? And why go to the extra work of prohibiting \ in the format expressions?

AIUI there were strong objections to the "AS DESCRIBED" process (which would require almost all valid uses of backslashes inside to be doubled, and would incidentally leave your example *still* a syntax error), and disallowing backslashes is a way to pretend that it doesn't work that way and leave open the possibility of changing how it works in the future without breaking compatibility. The only dubious benefit to the described process with backslashes allowed would be that f-strings (or other strings, in the innermost level) could be infinitely nested as f'{f\'{f\\\'{...}\\\'}\'}', rather than being hard-limited to four levels as f'''{f"""{f'{"..."}'}"""}'''
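The four-level form quoted above can be tried as written. A minimal sketch, with the usual alternative of building the string one level at a time so that no expression has to contain another quoted f-string; the variable names are only illustrative.

```python
innermost = "..."

# Four nesting levels, each using a different quote style:
# ''' ... ''', then """ ... """, then ' ... ', then " ... ".
four_levels = f'''{f"""{f'{"..."}'}"""}'''
assert four_levels == innermost

# Building the value step by step avoids running out of quote styles.
level1 = f"{innermost}"
level2 = f"[{level1}]"
print(level2)  # [...]
```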

Glenn Linderman

3:53 p.m.

New subject: An f-string issue [Was: Re: Re: What to do about invalid escape sequences]

On 8/14/2019 8:09 AM, Random832 wrote:

...

On Sat, Aug 10, 2019, at 19:54, Glenn Linderman wrote:

...
Because of the "invalid escape sequence" and "raw string" discussion, when looking at the documentation, I also noticed the following description for f-strings:

...
Escape sequences are decoded like in ordinary string literals (except when a literal is also marked as a raw string). After decoding, the grammar for the contents of the string is:

followed by lots of stuff, followed by

Backslashes are not allowed in format expressions and will raise an error:

    f"newline: {ord('\n')}" # raises SyntaxError

What I don't understand is how, if f-strings are processed AS DESCRIBED, how the \n is ever seen by the format expression.

The description is that they are first decoded like ordinary strings, and then parsed for the internal grammar containing {} expressions to be expanded. If that were true, the \n in the above example would already be a newline character, and the parsing of the format expression would not see the backslash. And if it were true, that would actually be far more useful for this situation.

So given that it is not true, why not? And why go to the extra work of prohibiting \ in the format expressions?

AIUI there were strong objections to the "AS DESCRIBED" process (which would require almost all valid uses of backslashes inside to be doubled, and would incidentally leave your example *still* a syntax error), and disallowing backslashes is a way to pretend that it doesn't work that way and leave open the possibility of changing how it works in the future without breaking compatibility.

The only dubious benefit to the described process with backslashes allowed would be that f-strings (or other strings, in the innermost level) could be infinitely nested as f'{f\'{f\\\'{...}\\\'}\'}', rather than being hard-limited to four levels as f'''{f"""{f'{"..."}'}"""}'''

Sure. I am just pointing out (and did so in the issue I created for documentation as well) that the documentation does not currently describe the implementation correctly, which is misleading to the user. While I have opinions on how things could work better, my even stronger opinion is that documentation should *accurately* describe how things work, even if how it works is more complex than it should be.

Gregory P. Smith

9 Aug 9 Aug

6:04 p.m.

On Fri, Aug 9, 2019 at 8:43 AM Guido van Rossum <guido@python.org> wrote:

...

This discussion looks like there's no end in sight. Maybe the Steering Council should take a vote?

I've merged the PR reverting the behavior in 3.8 and am doing the same in the master branch. The sheer volume of email this is generating shows that we're not ready to do this to our users.

Recall the nightmare caused by md5.py and sha.py DeprecationWarning's in 2.5... this would be similar. We need owners of code to see the problems, not end users of other people's code.

FWIW, lest people think I don't like this change and just pushed the revert buttons as a result, wrong. I agree with the ultimate SyntaxError and believe we should move the language there (it is better for long term code quality). But it needs to be done in a way that disrupts the *right* people in the process, not disrupting an exponentially higher number of users of other people's code.

If the steering council does anything it should be deciding if we're still going to do this at all and, if so, planning how we do it without repeating past mistakes.

-gps

Serhiy Storchaka

10 Aug 10 Aug

2:01 a.m.

10.08.19 02:04, Gregory P. Smith пише:

...

I've merged the PR reverting the behavior in 3.8 and am doing the same in the master branch.

I was going to rebase it to master and go through the normal backporting process if we decide that DeprecationWarning should be in master. I was waiting for the end of the discussion.

...

Recall the nightmare caused by md5.py and sha.py DeprecationWarning's in 2.5... this would be similar.

It is very different because DeprecationWarning for md5.py and sha.py is emitted at runtime.

Steve Holden

3:14 a.m.

While not a total solution, it seems like it might be worthwhile forcing flake8 or similar checks when uploading PyPI modules. That would catch the illegal escape sequences where it really matters - before they enter the ecosystem.

    (general) fathead:pyxll-www sholden$ cat t.py
    "Docstring with illegal \escape sequence"
    (general) fathead:pyxll-www sholden$ flake8 t.py
    t.py:1:25: W605 invalid escape sequence '\e'

While this won't mitigate the case for existing packages, it should reduce the number of packages containing potentially erroneous string constants, preparing the ground for the eventual introduction of the syntax error.

Steve Holden

On Sat, Aug 10, 2019 at 8:07 AM Serhiy Storchaka <storchaka@gmail.com> wrote:

...

10.08.19 02:04, Gregory P. Smith пише:

...
I've merged the PR reverting the behavior in 3.8 and am doing the same in the master branch.

I was going to rebase it to master and go through the normal backporting process if we decide that DeprecationWarning should be in master. I was waiting for the end of the discussion.

...
Recall the nightmare caused by md5.py and sha.py DeprecationWarning's in 2.5... this would be similar.

It is very different because DeprecationWarning for md5.py and sha.py is emitted at runtime.
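A check along the lines suggested in this message can be approximated with the standard library alone, since the warning is produced at compile time. The following is a minimal sketch, not a replacement for flake8: the function name invalid_escape_messages is hypothetical, it matches on the message text because the warning category has differed between Python versions, and it reports any SyntaxError as a message too.

```python
import warnings


def invalid_escape_messages(source, filename="<string>"):
    """Compile source and collect messages about invalid escape sequences."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure the warning is not filtered out
        try:
            compile(source, filename, "exec")
        except SyntaxError as exc:
            # An interpreter that rejects the source outright ends up here.
            return [str(exc)]
        return [str(w.message) for w in caught
                if "invalid escape sequence" in str(w.message)]


# Raw strings are used so the sample sources reach compile() unchanged.
print(invalid_escape_messages(r'"Docstring with illegal \escape sequence"'))
print(invalid_escape_messages(r'"A tab \t is a valid escape"'))  # []
```

Because nothing is imported or executed, a check like this could run on uploaded source files without running any package code.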

Steven D'Aprano

9 Aug 9 Aug

8:19 a.m.

On Wed, Aug 07, 2019 at 07:47:45PM +1000, Chris Angelico wrote:

...

On Wed, Aug 7, 2019 at 7:33 PM Steven D'Aprano <steve@pearwood.info> wrote:

...
What's the rush? Let's be objective here: what benefit are we going to get from this change? Is there anyone hanging out desperately for "\d" and "\-" to become SyntaxErrors, so they can... do what?

So that problems can start to be detected. Time and again, Python users on Windows get EXTREMELY confused by the way their code worked perfectly with one path, then bizarrely fails with another. That is a very real problem, and the problem is that it appeared to work when actually it was wrong.

And this change won't fix that, because *good* paths that currently work today will fail in the future, but *bad* paths that silently do the wrong thing will continue to silently do the wrong thing.

    py> filename = "location\data" # will work correctly
    <stdin>:1: SyntaxWarning: invalid escape sequence \d
    py> filename = "location\temp" # doesn't work as expected, but no error
    py>

Effectively, we are hoping that Windows users will infer from the failure of "\d" (say) that they shouldn't use "\t" even though it doesn't raise. Perhaps some of them will, but I maintain we're talking about a small, incremental improvement, not something that will once and for all fix the problem.

I don't think this is a benefit for users of any operating system other than Windows. For Linux, Unix, Mac users, one could argue strongly that we're making the string escape experience a tiny bit *worse*, not better. Raymond's ASCII art example is one such case.

I think the subset of users that this will help is quite small:

- users on Windows;
- who haven't read or paid attention to the innumerable recommendations on the web and the documentation that they always use forward slashes in paths;
- who happen to use an escape like \d rather than \t;
- and will read and understand the eventual SyntaxWarning/Error;
- and infer from that error that they should change their path to use forward slashes instead of backslashes;
- and all this happens *before* they get bitten by the \t problem and they learn the hard way not to use backslashes in paths.

I'm not saying this isn't worth doing. I'm saying it's a small benefit that *right now* is a lot less than the cost to library authors and users.
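To make the "\t" case above concrete, here is a small sketch of the silent failure next to three spellings that avoid it. It uses pathlib.PureWindowsPath so that it behaves the same on any operating system; the variable names are only illustrative.

```python
from pathlib import PureWindowsPath

# The problem case: \t is a valid escape, so no warning is ever issued,
# and the string silently contains a tab instead of a backslash.
surprising = "location\temp"
assert "\t" in surprising and "\\" not in surprising

# Three spellings that keep the separator intact.
raw = r"location\temp"
doubled = "location\\temp"
forward = PureWindowsPath("location/temp")

assert raw == doubled
assert str(forward) == raw              # rendered with backslashes
assert forward == PureWindowsPath(raw)  # both spellings are the same path
```

None of these spellings depends on whether invalid escapes warn, raise, or stay silent, which is the point being made about "\t".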

...

Python has a history of fixing these problems. It used to be that b"\x61\x62\x63\x64" was equal to u"abcd", but now Python sees these as fundamentally different.

Yes, and we fixed that over a 10+ year period involving no fewer than three full releases in the Python 2.x series and eight full releases in the Python 3.x series, and the transition period is not over yet since 2.7 is not yet EOLed.

...

Data-dependent bugs caused by a syntactic oddity are a language flaw that needs to be fixed.

There is always a tradeoff between the severity of the flaw and how much pain we are willing to accept to fix it. I think Raymond has made a good case that in this instance, the pain of fixing it *now* is greater than the benefit. (I don't think he has made the case to reverse the deprecation altogether.)

If the benefit versus pain never moves into the black, then we should keep the status quo indefinitely, like any other language wart or misfeature we're stuck with due to backwards compatibility. ("Although never is often better than *right* now.")

But having said that, I'm confident that given an improved deprecation process that makes it easier for library authors to see the warning before end-users, we will be able to move forward in a release or two.

...

...
Because our processes don't work the way we assumed, it turns out that in practice we haven't given developers the deprecation period we thought we had. Read Nathaniel's post, if you haven't already done so:

https://mail.python.org/archives/list/python-dev@python.org/message/E7QCC74O...

He makes a compelling case that while we might have had the promised deprecation period by the letter of the law, in practice most developers will have never seen it, and we will be breaking the spirit of the promise if we continue with the unmodified plan.

Yes, that's a fair complaint. But merely pushing the deprecation back by a version is not solving it. There has to be SOMETHING done differently.

"We must do SOMETHING!!! This is something, therefore we must do it!!!" I agree that we ought to fix the problem with the deprecation warnings. What I don't agree with is the demand that unless I can give a fix for the deprecation warning issue *right now* we must stay the course no matter how annoying and painful it is for users and library authors.

...

...
And yet here we are rushing through a breaking change in an accelerated manner, for a change of marginal benefit.

It's not a marginal benefit. For people who try to teach Python on multiple operating systems, this is a very very real benefit.

You mean people like Raymond? *wink* As an educator, you can enable the warning for your students. You could even turn it into an error. You don't have to wait for the warning to be enabled by default.
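For readers who want to try that suggestion, here is a minimal sketch of escalating the warning to an error for one piece of code. The helper name `rejects_invalid_escapes` is hypothetical, and the warning category is an assumption that has varied between releases (DeprecationWarning in 3.7, SyntaxWarning in the 3.8 betas), so the filter below is deliberately broad.

```python
import warnings


def rejects_invalid_escapes(source):
    """Return True if compiling `source` fails once warnings are errors.

    Hypothetical helper for illustration only.
    """
    with warnings.catch_warnings():
        # Escalate every warning raised while compiling to an exception.
        warnings.simplefilter("error")
        try:
            compile(source, "<student code>", "exec")
        except (SyntaxError, Warning):
            # CPython reports an escalated escape warning as a SyntaxError;
            # catching Warning as well covers other implementations.
            return True
    return False


# Raw strings keep the sample sources free of escape processing here.
bad = r"text = 'more \latex markup'"
good = r"text = 'more \\latex markup'"

print(rejects_invalid_escapes(bad))
print(rejects_invalid_escapes(good))
```

For a whole session, the interpreter's `-W` option (for example `python -W error script.py`) applies the same kind of filter without touching the code, at the cost of escalating every other warning too.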

...

...
Now is better than never. Although never is often better than *right* now.

Not sure how the Zen supports what you're saying there, since you're specifically saying "not never, not now, just later". But what do you actually mean by not rushing this into 3.8?

This warning was silent by default in 3.7. It should go back to being silent by default in 3.8, with the provisional aim of making it visible by default in 3.9 and an error in 4.0.

...

...
Right now, we're looking at a seriously compromised user-experience for 3.8. People are going to hate these warnings, many of them won't know what to do with them and will be sure that Python is buggy, and for very little benefit.

Then the problem is that people blame Python for these warnings. That is a problem to be solved; we need people to understand that a warning emitted by a library is a *library bug* not a language flaw.

Indeed, but let's be realistic here. How are you going to do that? We could take out an ad on the Superbowl, hire Instagram influencers, send people to re-education camps... *wink* Most experienced developers know that library bugs should be reported to the library, but there's always a few that missed the memo, or who think that b.p.o. is a tracker for all Python projects, or who simply aren't too clear on which libraries are in the std lib and which are not.

...

...
...
Library authors can start _right now_ fixing their code so it's more 3.8 compatible.

Provided that (1) they are aware that this is a problem that needs to be fixed, and (2) they have the round tuits to actually fix it by 3.8.0. Neither is guaranteed.

(1) Yes it is, see above; (2) fair point, but this is restricted to string literals and can be detected simply by compiling the code, so it's a reasonably findable problem.

I'm not saying this is a huge and onerous problem to fix for library authors. Most, I expect, will be able to fix it pretty quick. But there's lag between each step of this:

- users notice the warning;
- users get motivated to report it to the library;
- the library gets patched;
- users upgrade to the patched version.

The first two steps are the reason for the annoyance factor and poor user-experience. That's why we prefer the library authors see the warnings first, before the end-users, and that's where the current process failed. [...]
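The remark that the problem "can be detected simply by compiling the code" can be sketched as a small checker a library author might run before users ever see the warning. This is only an illustration: `find_invalid_escapes` is a hypothetical name, and matching on the message text is an assumption, since the wording and category of the warning differ between Python versions.

```python
import warnings


def find_invalid_escapes(source, filename="<string>"):
    """Compile `source` and return (line, message) for each escape warning.

    Hypothetical helper; nothing is executed, the code is only compiled.
    """
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even ones hidden by the default filters.
        warnings.simplefilter("always")
        compile(source, filename, "exec")
    return [
        (w.lineno, str(w.message))
        for w in caught
        # The category has differed between releases, so accept both.
        if issubclass(w.category, (DeprecationWarning, SyntaxWarning))
        and "escape sequence" in str(w.message)
    ]


sample = "\n".join([
    r"ok = 'tab\there'",         # valid escape: no warning
    r"art = ' \-------> case'",  # invalid escape: reported
])

for line, message in find_invalid_escapes(sample):
    print(line, message)
```

Because the source is compiled but never run, a check like this could be applied to every file of a package without importing it.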

...

And unless you have a plan to do something different in 3.8 that ensures that library devs see the warnings, there's no justification for the delay. All you'll do is defer the exact same problem by another eighteen months. If the warning remains silent in 3.8, how will library devs get any indication that they need to fix something?

We have good anecdotal evidence that most library authors test their libraries with warnings enabled. It is an accident of this specific warning that it didn't work for them. Libraries are, for the most part, trying to do the right thing. -- Steven

Chris Angelico

8:30 a.m.

On Fri, Aug 9, 2019 at 11:22 PM Steven D'Aprano <steve@pearwood.info> wrote:

...

And this change won't fix that, because *good* paths that currently work today will fail in the future, but *bad* paths that silently do the wrong thing will continue to silently do the wrong thing.

Except that many paths can be both "good" and "bad", because paths have multiple components. So the warning has a VERY high probability of happening. But I've given up on this debate. No more posts from me. Some things aren't worth fighting for. With the number of words posted in this thread saying "we need convenience, not correctness!!!!", I'm done arguing. ChrisA

brian.skinn＠gmail.com

7 Aug

10:52 a.m.

Steven D'Aprano wrote:

...

...
Currently it requires some extra steps or flags, which are not well known. What change are you proposing for 3.8 that will ensure that this actually gets solved?

Absolutely nothing. I don't have to: we're an entire community, this doesn't have to fall only on my shoulders. I'm not even the messenger: that's Raymond. I'm just (partly) agreeing with him. Just because I don't have a solution for this problem doesn't mean the problem doesn't exist.

Because our processes don't work the way we assumed, it turns out that in practice we haven't given developers the deprecation period we thought we had. Read Nathaniel's post, if you haven't already done so: https://mail.python.org/archives/list/python-dev@python.org/message/E7QCC74O... He makes a compelling case that while we might have had the promised deprecation period by the letter of the law, in practice most developers will have never seen it, and we will be breaking the spirit of the promise if we continue with the unmodified plan. ... I'm sure that the affected devs will understand why it was their fault they couldn't see the warnings, when even people from a first-class library like SymPy took four iterations to do it right.

As the SymPy team has figured out the right pytest incantation to expose these warnings, perhaps a feature request on pytest to encapsulate that mix of options into a single flag would be a good idea?

raymond.hettinger＠gmail.com

5:57 p.m.

For me, these warnings are continuing to arise almost daily. See two recent examples below. In both cases, the code previously had always worked without complaint.

----- Example from yesterday's class ----

'''
How old-style formatting works with positional placeholders

print('The answer is %d today, but was %d yesterday' % (new, old))
\--------------------o
\------------------------------------o
'''

SyntaxWarning: invalid escape sequence \-

----- Example from today's class ----

# Cut and pasted from:
# https://en.wikipedia.org/wiki/VCard#vCard_2.1
vcard = '''
BEGIN:VCARD
VERSION:2.1
N:Gump;Forrest;;Mr.
FN:Forrest Gump
ORG:Bubba Gump Shrimp Co.
TITLE:Shrimp Man
PHOTO;GIF:http://www.example.com/dir_photos/my_photo.gif
TEL;WORK;VOICE:(111) 555-1212
TEL;HOME;VOICE:(404) 555-1212
ADR;WORK;PREF:;;100 Waters Edge;Baytown;LA;30314;United States of America
LABEL;WORK;PREF;ENCODING=QUOTED-PRINTABLE;CHARSET=UTF-8:100 Waters Edge=0D=
=0ABaytown\, LA 30314=0D=0AUnited States of America
ADR;HOME:;;42 Plantation St.;Baytown;LA;30314;United States of America
LABEL;HOME;ENCODING=QUOTED-PRINTABLE;CHARSET=UTF-8:42 Plantation St.=0D=0A=
Baytown, LA 30314=0D=0AUnited States of America
EMAIL:forrestgump@example.com
REV:20080424T195243Z
END:VCARD
'''

SyntaxWarning: invalid escape sequence \,

Chris Angelico

6:07 p.m.

On Thu, Aug 8, 2019 at 8:58 AM <raymond.hettinger@gmail.com> wrote:

...

For me, these warnings are continuing to arise almost daily. See two recent examples below. In both cases, the code previously had always worked without complaint.

----- Example from yesterday's class ----

''' How old-style formatting works with positional placeholders

print('The answer is %d today, but was %d yesterday' % (new, old)) \--------------------o \------------------------------------o '''

SyntaxWarning: invalid escape sequence \-

I've no idea why this is even a string literal, but if it absolutely has to be, then you could use a character other than backslash.

...

----- Example from today's class ----

# Cut and pasted from: # https://en.wikipedia.org/wiki/VCard#vCard_2.1 vcard = ''' ... LABEL;WORK;PREF;ENCODING=QUOTED-PRINTABLE;CHARSET=UTF-8:100 Waters Edge=0D= =0ABaytown\, LA 30314=0D=0AUnited States of America ... '''

SyntaxWarning: invalid escape sequence \,

When you take a text string and create a string literal to represent it, sometimes you have to modify it to become syntactically valid. This is exactly the sort of thing that SHOULD be being warned about, because it's sometimes going to work and sometimes not, depending on the exact data you're working with. Please don't teach people the habit of pretending that the backslash isn't significant. If the warning were changed to be silent for 3.8, what would you do differently? How would having extra time to solve this problem help you? ChrisA

raymond.hettinger＠gmail.com

8:13 p.m.

This isn't about me. As a heavy user of the 3.8 beta, I'm just the canary in the coal mine. After many encounters with these warnings, I'm starting to believe that Python's long-standing behavior was convenient for users. Effectively, "\-" wasn't an error, it was just a way of writing "\-". For the most part, that worked out fine. Sure, we've all seen interactive prompt errors from having \t in a pathname, but not in production (likely because a FileNotFoundError would surface immediately).
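The pathname case mentioned above can be reproduced in a few lines. The sketch below is illustrative only (the helper `evaluate` is a hypothetical name): it compiles the path from the first message of the thread as a plain literal and as a raw literal, and counts the warnings each produces.

```python
import warnings

# Source text for two assignments, held in raw strings so this example
# itself involves no escape processing.
cooked_src = r"filename = '..\training\new_memo.doc'"
raw_src = r"filename = r'..\training\new_memo.doc'"


def evaluate(source):
    """Compile and run one assignment; return (value, warning_count)."""
    namespace = {}
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        exec(compile(source, "<demo>", "exec"), namespace)
    return namespace["filename"], len(caught)


cooked, cooked_warnings = evaluate(cooked_src)
raw, raw_warnings = evaluate(raw_src)

print(repr(cooked), cooked_warnings)  # tab and newline inside the "path"
print(repr(raw), raw_warnings)        # both backslashes preserved
```

Both forms compile silently, because \t and \n are valid escapes; only the raw literal still names the intended file.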

Glenn Linderman

11:09 p.m.

...

On 8/7/2019 6:13 PM, raymond.hettinger@gmail.com wrote:

This isn't about me. As a heavy user of the 3.8 beta, I'm just the canary in the coal mine.

Are you, with an understanding of the issue, submitting bug reports on the issues you find, thus helping to alleviate the problem, and educate the package maintainers? Or are you just carping here? I'll apologize in advance for using the word "carping" if the answer to my first question is yes. :) Glenn

Jeroen Demeyer

8 Aug

4:03 a.m.

...

When you take a text string and create a string literal to represent it, sometimes you have to modify it to become syntactically valid.

Even simpler: use r""" instead of """. The only case where that won't work is when you need actual escape sequences. But I find this very rare in practice for triple-quoted strings.
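A small sketch of that suggestion, using a shortened, made-up version of the ASCII-art example from earlier in the thread: the raw triple-quoted string and the doubled-backslash spelling produce the same text, and the last line shows the one thing a raw string cannot do.

```python
# A raw triple-quoted string leaves every backslash alone, so the
# ASCII-art arrow needs no doubling and triggers no warning.
diagram = r'''
print('The answer is %d today' % new)
  \-------o
'''

# The equivalent non-raw spelling doubles the backslash instead.
doubled = '''
print('The answer is %d today' % new)
  \\-------o
'''

print(diagram == doubled)  # True

# Raw strings cannot express real escapes: r'\n' is two characters.
print(len(r'\n'), len('\n'))  # 2 1
```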

Dima Tisnek

4:31 a.m.

These two ought to be converted to raw strings, shouldn't they?

On Thu, 8 Aug 2019 at 08:04, <raymond.hettinger@gmail.com> wrote:

...

For me, these warnings are continuing to arise almost daily. See two recent examples below. In both cases, the code previously had always worked without complaint.

----- Example from yesterday's class ----

''' How old-style formatting works with positional placeholders

print('The answer is %d today, but was %d yesterday' % (new, old)) \--------------------o \------------------------------------o '''

SyntaxWarning: invalid escape sequence \-

----- Example from today's class ----

# Cut and pasted from: # https://en.wikipedia.org/wiki/VCard#vCard_2.1 vcard = ''' BEGIN:VCARD VERSION:2.1 N:Gump;Forrest;;Mr. FN:Forrest Gump ORG:Bubba Gump Shrimp Co. TITLE:Shrimp Man PHOTO;GIF:http://www.example.com/dir_photos/my_photo.gif TEL;WORK;VOICE:(111) 555-1212 TEL;HOME;VOICE:(404) 555-1212 ADR;WORK;PREF:;;100 Waters Edge;Baytown;LA;30314;United States of America LABEL;WORK;PREF;ENCODING=QUOTED-PRINTABLE;CHARSET=UTF-8:100 Waters Edge=0D= =0ABaytown\, LA 30314=0D=0AUnited States of America ADR;HOME:;;42 Plantation St.;Baytown;LA;30314;United States of America LABEL;HOME;ENCODING=QUOTED-PRINTABLE;CHARSET=UTF-8:42 Plantation St.=0D=0A= Baytown, LA 30314=0D=0AUnited States of America EMAIL:forrestgump@example.com REV:20080424T195243Z END:VCARD '''

SyntaxWarning: invalid escape sequence \,

Terry Reedy

12 Aug

12:19 p.m.

On 8/8/2019 5:31 AM, Dima Tisnek wrote:

...

These two ought to be converted to raw strings, shouldn't they?

For the first example, yes or no. It depends ;-) See below. The problem is that string literals in python code are, by default, half-baked. The interpretation of '\' by the python parser, and the resulting string object, depends on the next char. I can see how this is sometimes a convenience, but I consider it a design bug. There is no way for a user to say "I intend for this string to be fully baked, so if it cannot be, I goofed." And the convenience gets used when it must not be.

...

On Thu, 8 Aug 2019 at 08:04, <raymond.hettinger@gmail.com> wrote:

...
For me, these warnings are continuing to arise almost daily. See two recent examples below. In both cases, the code previously had always worked without complaint.

----- Example from yesterday's class ----

''' How old-style formatting works with positional placeholders

print('The answer is %d today, but was %d yesterday' % (new, old)) \--------------------o \------------------------------------o '''

SyntaxWarning: invalid escape sequence \-

For true ascii-only character art, where one will never want '\' baked, an 'r' prefix is appropriate. It is in fact mandatory when '\' may be followed by a legal escape code. If one is making unicode art, with '\u' and '\U' escapes used, one must not use the 'r' prefix, but should instead use '\\' for unbaked backslashes. The unicode escapes have already thrown off column alignments.
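As an illustration of the second case, here is a sketch with made-up box-drawing art: the \u escapes have to be cooked, so the string stays non-raw and the one literal backslash inside the box is doubled. The art itself is a hypothetical example, not taken from the thread.

```python
# Box-drawing art built with \u escapes must stay non-raw, so a literal
# backslash inside it is written as \\ (hypothetical example art).
box = '\u250c\u2500\u2510\n\u2502\\\u2502\n\u2514\u2500\u2518'

# Every rendered row is three characters wide, although the source
# spelling of each row has a very different width.
for row in box.splitlines():
    print(len(row))

# With an r prefix the \u sequence would stay as six literal characters.
raw_corner = r'\u250c'
print(len('\u250c'), len(raw_corner))
```

The mismatch between source width and rendered width is the column-alignment cost mentioned above.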

...

...
----- Example from today's class ----

# Cut and pasted from: # https://en.wikipedia.org/wiki/VCard#vCard_2.1 vcard = ''' BEGIN:VCARD VERSION:2.1 N:Gump;Forrest;;Mr. FN:Forrest Gump ORG:Bubba Gump Shrimp Co. TITLE:Shrimp Man PHOTO;GIF:http://www.example.com/dir_photos/my_photo.gif TEL;WORK;VOICE:(111) 555-1212 TEL;HOME;VOICE:(404) 555-1212 ADR;WORK;PREF:;;100 Waters Edge;Baytown;LA;30314;United States of America LABEL;WORK;PREF;ENCODING=QUOTED-PRINTABLE;CHARSET=UTF-8:100 Waters Edge=0D= =0ABaytown\, LA 30314=0D=0AUnited States of America ADR;HOME:;;42 Plantation St.;Baytown;LA;30314;United States of America LABEL;HOME;ENCODING=QUOTED-PRINTABLE;CHARSET=UTF-8:42 Plantation St.=0D=0A= Baytown, LA 30314=0D=0AUnited States of America EMAIL:forrestgump@example.com REV:20080424T195243Z END:VCARD '''

SyntaxWarning: invalid escape sequence \,

Based on my reading of the Wikipedia vCard page linked above, the vCard protocol mandates use of '\' chars that must be passed through unbaked to a vCard processor. (I don't know why '\,', but it does not matter.) So vCard strings using '\' should generally have 'r' prefixes, just as for regex and latex strings. For version 2.1, it appears that one can currently, in 3.7-, get away with omitting 'r'. In versions 3.0 and 4.0, embedded 'newline' is represented by '\n' instead of '=0D=0A'. It must not be baked by python, but passed on as is. So omitting 'r' becomes a bug for those versions. To me, this is one of the major problems with the half-baked default. People who want string literals left as is sometimes get away with omitting explicit mention of that fact, but sometimes don't. Note: when we added '\u' and '\U' escapes, we broke working code that had Windows paths like "C:\Users\Terry". But we did it anyway. -- Terry Jan Reedy
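The closing note about Windows paths is easy to check on any Python 3. In the sketch below (the helper name `compiles` is hypothetical), the path literal is rejected outright, while the raw and doubled spellings compile.

```python
def compiles(source):
    """Return True if `source` compiles, False on SyntaxError."""
    try:
        compile(source, "<demo>", "exec")
    except SyntaxError:
        return False
    return True


# Each candidate is held in a raw string so this example itself compiles.
broken = r'path = "C:\Users\Terry"'         # \U needs eight hex digits
raw_fix = r'path = r"C:\Users\Terry"'       # raw literal
doubled_fix = r'path = "C:\\Users\\Terry"'  # doubled backslashes

for candidate in (broken, raw_fix, doubled_fix):
    print(candidate, "->", compiles(candidate))
```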

Steve Holden

12:47 p.m.

On Mon, Aug 12, 2019 at 6:26 PM Terry Reedy <tjreedy@udel.edu> wrote:

...

On 8/8/2019 5:31 AM, Dima Tisnek wrote: [...]

To me, this is one of the major problems with the half-baked default. People who want string literals left as is sometimes get away with omitting explicit mention of that fact, but sometimes don't.

Note: when we added '\u' and '\U' escapes, we broke working code that had Windows paths like "C:\Users\Terry". But we did it anyway.

It might be helpful if there were some sort of declaration that the ultimate goal, despite the backwards incompatibility it would entail, is removing this wart from the language. While practicality does indeed often beat purity, I feel this particular case may be the exception that proves the rule. Onwards to 4.0!

Jim J. Jewett

8 Aug

3:40 p.m.

FWIW, the web archive https://mail.python.org/archives/list/python-dev@python.org/thread/ZX2JLOZDO... does not seem to display the problems ... apparently the individual messages are not included in view source, and are cleaned up for chrome's inspect. I'm not sure whether that counts as a bug in the archiving or not.

Terry Reedy

12 Aug

1:35 p.m.

On 8/7/2019 6:57 PM, raymond.hettinger@gmail.com wrote:

...

For me, these warnings are continuing to arise almost daily. See two recent examples below.

Both examples are fragile, as explained below. They make me more in favor of no longer guessing what \ means in the default mode. The transition is a different matter. I wonder if future imports could be (or have been) used.

...

In both cases, the code previously had always worked without complaint.

Because they are in the subset of examples of the type that work without adding an r prefix. Others in the class require an r prefix. Ascii art:

...

''' How old-style formatting works with positional placeholders print('The answer is %d today, but was %d yesterday' % (new, old)) \--------------------o \------------------------------------o '''

In general, ascii art needs an r prefix. Even if one example gets away without, an edited version or a new example may not. In the example above, the o looks weird. Suppose '\' were used instead. Suppose one pointed to parentheses instead and ended up with this teaching example.

'''Sample code with parentheses:
print('The answer is %d today, but was %d yesterday' % (new, old))
\-------\
\----------------------------------------------------------\
These parentheses are properly nested.
'''

Whoops. This is what I mean by fragile. A new example:

alpha_slide = '''
-----
\abcd
*\bcd
**\cd
***\d
****\
-----
'''
print(alpha_slide)

# This looks nice in source, but the result is

-----
bcd
*cd
**\cd
***\d
****-----

where the appearance of \a and \b depends on the output device. Ascii art never needs cooking. I would teach "Always prefix ascii art with r" in preference to "Don't bother prefixing ascii art with r unless you really have to because you use one of a memorized list of escapes, and promise yourself to recheck and add it if needed every time you edit and are able to keep that promise".

vCard data item:

...

# Cut and pasted from: # https://en.wikipedia.org/wiki/VCard#vCard_2.1 vcard = ''' BEGIN:VCARD VERSION:2.1 N:Gump;Forrest;;Mr. FN:Forrest Gump ORG:Bubba Gump Shrimp Co. TITLE:Shrimp Man PHOTO;GIF:http://www.example.com/dir_photos/my_photo.gif TEL;WORK;VOICE:(111) 555-1212 TEL;HOME;VOICE:(404) 555-1212 ADR;WORK;PREF:;;100 Waters Edge;Baytown;LA;30314;United States of America LABEL;WORK;PREF;ENCODING=QUOTED-PRINTABLE;CHARSET=UTF-8:100 Waters Edge=0D= =0ABaytown\, LA 30314=0D=0AUnited States of America ADR;HOME:;;42 Plantation St.;Baytown;LA;30314;United States of America LABEL;HOME;ENCODING=QUOTED-PRINTABLE;CHARSET=UTF-8:42 Plantation St.=0D=0A= Baytown, LA 30314=0D=0AUnited States of America EMAIL:forrestgump@example.com REV:20080424T195243Z END:VCARD '''

Thank you for including the link so I could learn more. In general, vCard representations should be raw. The above uses the vCard 2.1 spec. The more commonly used 3.0 and 4.0 specs replace "=0D=0A=" in the 2.1 spec with a raw "\n". If the above were updated, it might appear to 'work', but would, I believe, fail if fed to a vCard processor. This is what I mean by 'fragile'. I would rather teach beginners the easily remembered "Always prefix vCard representations with 'r'" rather than "Only prefix vCard representations with 'r' if you use the more common newer specs and use '\n', as you often would." (I don't know if raw '\t' is ever used; if so, add that.) The above is based on the idea that while bytes and strings are 'sequences of characters (codes)', they are usually used to represent instances of often undeclared types of data. If the strings of a data type never need cooking, and may contain backslashes that could be cooked but must not be, the easiest rule is to always prefix with 'r'. (Those with experience can refine it if they wish.) If instances contain some backslashes that must be cooked, omit 'r' and double any backslashes that must be left alone. -- Terry Jan Reedy
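The vCard point can be made concrete with a short sketch. The property line below is a hypothetical fragment modeled on the newer-spec style described above, not text copied from the spec, and `literal_value` is an illustrative helper name.

```python
# A hypothetical vCard 3.0-style property in which the two characters
# backslash + n mark a line break inside the value.
wire_text = r"LABEL:100 Waters Edge\nBaytown"


def literal_value(prefix):
    """Evaluate wire_text written as a Python literal with the given prefix."""
    namespace = {}
    source = "label = " + prefix + "'" + wire_text + "'"
    exec(compile(source, "<vcard>", "exec"), namespace)
    return namespace["label"]


cooked = literal_value("")   # plain literal: Python bakes the escape
raw = literal_value("r")     # raw literal: backslash + n survives

print(cooked == wire_text)  # False: a real newline replaced the escape
print(raw == wire_text)     # True: the text reaches the processor intact
```

Neither spelling produces a warning, which is why the non-raw version can appear to work until the data reaches a vCard processor.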

Dima Tisnek

8 Aug

4:25 a.m.

I feel this is one of the cases where we're expecting early adopters to proactively post pull requests against affected libraries or, failing that, to open issues against them. I was ready to do just that, but alas didn't even have to! Matt's analysis shows that it's not too hard. What was hard for me were the rules. In fact, not being up to date, I couldn't even find the PEP that specified the change. What the Python devs could do is to guide users on how to update existing code. Something like python3.8 -c 'print(repr("\b\l\a\h"))' but with sensible output. And instructions for those who support both py2 and py3 from the same codebase. I could hope for a feature in psf/black, but maybe that's not for everyone. Just my 2c :)

On Mon, 5 Aug 2019 at 13:30, <raymond.hettinger@gmail.com> wrote:

...

We should revisit what we want to do (if anything) about invalid escape sequences.

For Python 3.8, the DeprecationWarning was converted to a SyntaxWarning which is visible by default. The intention is to make it a SyntaxError in Python 3.9.

This once seemed like a reasonable and innocuous idea to me; however, I've been using the 3.8 beta heavily for a month and no longer think it is a good idea. The warning crops up frequently, often due to third-party packages (such as docutils and bottle) that users can't easily do anything about. And during live demos and student workshops, it is especially distracting.

I now think our cure is worse than the disease. If code currently has a non-raw string with '\latex', do we really need Python to yelp about it (for 3.8) or reject it entirely (for 3.9)? If someone can't remember exactly which special characters need to be escaped, do we really need to stop them in their tracks during a data analysis session? Do we really need to reject ASCII art in docstrings: ` \-------> special case'?

IIRC, the original problem to be solved was false positives rather than false negatives: filename = '..\training\new_memo.doc'. The warnings and errors don't do (and likely can't do) anything about this.

If Python 3.8 goes out as-is, we may be punching our users in the nose and getting almost no gain from it. ISTM this is a job best left for linters. For a very long time, Python has been accepting the likes of 'more \latex markup' and has been silently converting it to 'more \\latex markup'. I now think it should remain that way. This issue in the 3.8 beta releases has been an almost daily annoyance for me and my customers. Depending on how you use Python, this may not affect you or it may arise multiple times per day.

Raymond

P.S. Before responding, it would be a useful exercise to think for a moment about whether you remember exactly which characters must be escaped or whether you habitually put in an extra backslash when you aren't sure. Then see: https://bugs.python.org/issue32912


138 comments

36 participants

participants (36)

Brett Cannon
brian.skinn＠gmail.com
Chris Angelico
Christian Tismer
Dima Tisnek
Eric V. Smith
eryk sun
Facundo Batista
Glenn Linderman
Greg Ewing
Gregory P. Smith
Guido van Rossum
Ivan Pozdeev
Jeroen Demeyer
Jim J. Jewett
Joao S. O. Bueno
Jonathan Goble
Matt Billenstein
Michael
MRAB
Nathaniel Smith
Neil Schemenauer
Nick Coghlan
Paul Moore
Petr Viktorin
Random832
raymond.hettinger＠gmail.com
Richard Damon
Rob Cliffe
Serhiy Storchaka
Stephen J. Turnbull
Steve Dower
Steve Holden
Steven D'Aprano
Terry Reedy
Toshio Kuratomi
