On 8/8/2019 5:31 AM, Dima Tisnek wrote:
These two ought to be converted to raw strings, shouldn't they?
For the first example, yes or no. It depends ;-) See below.

The problem is that string literals in Python code are, by default, half-baked. How the Python parser interprets '\', and therefore what the resulting string object contains, depends on the next character. I can see how this is sometimes a convenience, but I consider it a design bug. There is no way for a user to say "I intend for this string to be fully baked, so if it cannot be, I goofed." And the convenience gets used when it must not be.
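A minimal sketch of what "half-baked" means in practice. The helper name `bake` is made up for this illustration; it evaluates the source text of a literal and silences the invalid-escape warning so the result can be inspected. It assumes a Python version where an unrecognized escape is still only a warning.

```python
import ast
import warnings


def bake(literal_source: str) -> str:
    """Evaluate the source text of a string literal as the parser would.

    Hypothetical helper for this illustration only.
    """
    with warnings.catch_warnings():
        # Silence the "invalid escape sequence" warning for the demo.
        warnings.simplefilter("ignore")
        return ast.literal_eval(literal_source)


recognized = bake(r"'a\nb'")      # \n is a recognized escape: baked to a newline
unrecognized = bake(r"'a\-b'")    # \- is not recognized: the backslash is kept
explicit_raw = bake(r"r'a\nb'")   # r prefix: nothing is baked

print(len(recognized))    # 3 characters: a, newline, b
print(len(unrecognized))  # 4 characters: a, backslash, -, b
print(len(explicit_raw))  # 4 characters: a, backslash, n, b
```

The same backslash is consumed in one literal and kept in another, depending only on the character that follows it.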
On Thu, 8 Aug 2019 at 08:04, <raymond.hettinger@gmail.com> wrote:
For me, these warnings are continuing to arise almost daily. See two recent examples below. In both cases, the code previously had always worked without complaint.
----- Example from yesterday's class ----
'''
How old-style formatting works with positional placeholders

print('The answer is %d today, but was %d yesterday' % (new, old))
                                        \--------------------o
                      \------------------------------------o
'''
SyntaxWarning: invalid escape sequence \-
For true ASCII-only character art, where one will never want '\' baked, an 'r' prefix is appropriate. It is in fact mandatory when '\' may be followed by a legal escape code. If one is making Unicode art, with '\u' and '\U' escapes used, one must not use the 'r' prefix, but should instead use '\\' for unbaked backslashes. The Unicode escapes have already thrown off column alignments.
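A short sketch of the two cases, using made-up art fragments. U+2500 (BOX DRAWINGS LIGHT HORIZONTAL) is chosen here only as an example of a character one might write with a '\u' escape.

```python
# ASCII-only art: the r prefix keeps every backslash as written.
ascii_art = r"\-----o"

# The same text without the r prefix needs the backslash doubled.
ascii_art_escaped = "\\-----o"

# Unicode art: no r prefix, so that \u2500 is baked into one character,
# and the literal backslash is written as \\.
unicode_art = "\\\u2500\u2500o"

# With an r prefix the \u escape is NOT baked; it stays as six characters.
raw_unicode = r"\u2500"

print(ascii_art == ascii_art_escaped)  # True
print(len(unicode_art))                # 4: backslash, two line characters, o
print(len(raw_unicode))                # 6: backslash, u, 2, 5, 0, 0
```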
----- Example from today's class ----
# Cut and pasted from:
# https://en.wikipedia.org/wiki/VCard#vCard_2.1
vcard = '''
BEGIN:VCARD
VERSION:2.1
N:Gump;Forrest;;Mr.
FN:Forrest Gump
ORG:Bubba Gump Shrimp Co.
TITLE:Shrimp Man
PHOTO;GIF:http://www.example.com/dir_photos/my_photo.gif
TEL;WORK;VOICE:(111) 555-1212
TEL;HOME;VOICE:(404) 555-1212
ADR;WORK;PREF:;;100 Waters Edge;Baytown;LA;30314;United States of America
LABEL;WORK;PREF;ENCODING=QUOTED-PRINTABLE;CHARSET=UTF-8:100 Waters Edge=0D=
=0ABaytown\, LA 30314=0D=0AUnited States of America
ADR;HOME:;;42 Plantation St.;Baytown;LA;30314;United States of America
LABEL;HOME;ENCODING=QUOTED-PRINTABLE;CHARSET=UTF-8:42 Plantation St.=0D=0A=
Baytown, LA 30314=0D=0AUnited States of America
EMAIL:forrestgump@example.com
REV:20080424T195243Z
END:VCARD
'''
SyntaxWarning: invalid escape sequence \,
Based on my reading of the Wikipedia vCard page linked above, the vCard protocol mandates use of '\' chars that must be passed through unbaked to a vCard processor. (I don't know why '\,', but it does not matter.) So vCard strings using '\' should generally have 'r' prefixes, just as for regex and LaTeX strings.

For vCard version 2.1, it appears that one can currently, in Python 3.7 and earlier, get away with omitting 'r'. In vCard versions 3.0 and 4.0, an embedded newline is represented by '\n' instead of '=0D=0A'. It must not be baked by Python, but passed on as is. So omitting 'r' becomes a bug for those versions.

To me, this is one of the major problems with the half-baked default. People who want string literals left as is sometimes get away with omitting explicit mention of that fact, but sometimes don't.

Note: when we added '\u' and '\U' escapes, we broke working code that had Windows paths like "C:\Users\Terry". But we did it anyway.

--
Terry Jan Reedy
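A small sketch of the last two points. The vCard property line is a hypothetical example written for this illustration, not copied from the vCard page, and `compiles` is a made-up helper name.

```python
# Hypothetical vCard 3.0-style property line (illustration only).
# With the r prefix, backslash + n reaches the vCard processor unbaked.
raw_label = r"LABEL;TYPE=HOME:42 Plantation St.\nBaytown\, LA 30314"

# Without the r prefix, Python bakes \n into a real newline,
# which silently splits the property across two lines.
baked_label = "LABEL;TYPE=HOME:42 Plantation St.\nBaytown, LA 30314"

print(len(raw_label.splitlines()))    # 1: still one property line
print(len(baked_label.splitlines()))  # 2: the property has been broken


def compiles(literal_source: str) -> bool:
    """Return True if the given source text is a valid Python expression."""
    try:
        compile(literal_source, "<demo>", "eval")
    except SyntaxError:
        return False
    return True


# The Windows path from the note: \U starts a Unicode escape in Python 3,
# so the plain literal is rejected outright, while the raw literal is fine.
plain_path_ok = compiles(r'"C:\Users\Terry"')
raw_path_ok = compiles(r'r"C:\Users\Terry"')

print(plain_path_ok)  # False
print(raw_path_ok)    # True
```

The vCard case fails silently and the Windows path fails loudly, which is the "sometimes get away with it, sometimes don't" behavior described above.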